id: mobile-ab-testing
title: Mobile A/B Testing: Variant / Tracking / Cleanup
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: mobile, ab-testing, experiment, vibe-coding
tech_stack:
  language: Swift / Kotlin / TS
  applicable_to: iOS, Android, React Native
applied_in:
aliases: A/B test, mobile experiment, Firebase Remote Config, feature flag mobile, in-app variant

Mobile A/B Testing

Native apps ship through releases, so at any moment users are spread across different app versions. Core ingredients: Remote Config + sticky variant + analytics + statistical significance. Tools: Firebase / Statsig / Optimizely / GrowthBook.

📖 Core Concepts

  • Variant: control vs treatment.
  • Sticky: the same user always gets the same variant.
  • Statistical significance: how far the result can be trusted.
  • Cleanup: remove the code of finished experiments.

💻 Code Patterns

Firebase Remote Config (iOS)

import FirebaseRemoteConfig

let config = RemoteConfig.remoteConfig()
let settings = RemoteConfigSettings()
settings.minimumFetchInterval = 0  // dev
config.configSettings = settings

config.setDefaults([
    "checkout_button_color": "blue" as NSObject,
    "show_new_onboarding": false as NSObject,
])

// Fetch + activate
config.fetchAndActivate { status, error in
    let color = config.configValue(forKey: "checkout_button_color").stringValue ?? "blue"
    DispatchQueue.main.async {
        button.backgroundColor = color == "red" ? .red : .blue
    }
}

Firebase A/B testing

Firebase Console → A/B Testing → new experiment
- Variant A: checkout_button_color = "blue"
- Variant B: checkout_button_color = "red"
- Goal metric: purchases
- Audience: 50% Korean iOS users

→ Firebase handles the traffic split and computes significance automatically.

Statsig (modern, fast)

// Package and method names differ across Statsig SDK generations; verify against the version you install.
import Statsig from 'react-native-statsig';
import DeviceInfo from 'react-native-device-info';

// Stable user ID: fall back to a device ID before login so assignment stays sticky
const deviceId = await DeviceInfo.getUniqueId();
await Statsig.initialize('client-key', { userID: user?.id ?? deviceId });

const config = Statsig.getDynamicConfig('checkout');
const buttonColor = config.get('button_color', 'blue');
const showNew = Statsig.checkGate('new_onboarding');

// Track
await Statsig.logEvent('purchase', amount, { product_id });

GrowthBook (OSS)

import { GrowthBook } from '@growthbook/growthbook-react';

const gb = new GrowthBook({
  apiHost: '...',
  clientKey: '...',
  attributes: { id: user.id, country: locale.country },
});

await gb.loadFeatures();

const variant = gb.getFeatureValue('checkout_button', 'blue');

Sticky variant (consistent hash)

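// murmurhash: assumed to be any deterministic string hash returning a non-negative 32-bit integer
function getVariant(userId: string, experimentKey: string): string {
  const hash = murmurhash(`${experimentKey}:${userId}`);
  return hash % 2 === 0 ? 'control' : 'treatment';
}

// Same user + same experiment = always the same variant

→ A user must never see a different variant from one session to the next.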
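
The hash-based assignment above leaves the hash function unspecified. A minimal self-contained sketch, assuming FNV-1a is an acceptable substitute for the murmurhash helper; `fnv1a32`, `assignVariant`, and the `treatmentPercent` parameter are illustrative names, not part of any SDK:

```typescript
// FNV-1a 32-bit hash: deterministic across sessions, devices, and platforms.
// Hashes UTF-16 code units, so it matches byte-wise FNV-1a only for ASCII input.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force unsigned 32-bit
}

// Maps a user to a bucket in [0, 100), then to a variant by rollout percentage.
function assignVariant(
  userId: string,
  experimentKey: string,
  treatmentPercent: number = 50,
): 'control' | 'treatment' {
  const bucket = fnv1a32(`${experimentKey}:${userId}`) % 100;
  return bucket < treatmentPercent ? 'treatment' : 'control';
}
```

Including the experiment key in the hashed string keeps assignments independent across experiments, so the same users do not always land in treatment together.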

Variant exposure tracking

const variant = getVariant(userId, 'checkout_v2');

// Track exposure (once per session)
analytics.track('experiment_exposed', {
  experiment: 'checkout_v2',
  variant,
});

// Render
if (variant === 'treatment') return <NewCheckout />;
return <OldCheckout />;

→ Exposure is the starting point of measurement.
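
The snippet above says "once per session" but would fire on every render. One possible way to enforce that, sketched with a hypothetical `ExposureTracker` class (one instance per session) wrapping whatever analytics call the app uses:

```typescript
type ExposureEvent = { experiment: string; variant: string };

// Drops repeat exposures for the same experiment within one tracker instance.
class ExposureTracker {
  private seen: Set<string> = new Set<string>();
  private send: (event: ExposureEvent) => void;

  constructor(send: (event: ExposureEvent) => void) {
    this.send = send;
  }

  // Returns true if the event was sent, false if it was a duplicate.
  track(experiment: string, variant: string): boolean {
    if (this.seen.has(experiment)) return false;
    this.seen.add(experiment);
    this.send({ experiment, variant });
    return true;
  }
}
```

Creating a new tracker at session start resets the dedupe set, which gives the once-per-session behaviour.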

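Native (Swift): hand-rolled implementation

struct ExperimentConfig {
    // hashValue is reseeded on every launch, so it is not sticky; use a deterministic FNV-1a hash instead.
    private static func stableHash(_ text: String) -> UInt32 {
        text.utf8.reduce(UInt32(2166136261)) { ($0 ^ UInt32($1)) &* 16777619 }
    }

    static var checkoutV2: String {
        let userId = User.current.id
        return stableHash("\(userId):checkout_v2") % 100 < 50 ? "control" : "treatment"
    }
}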

if ExperimentConfig.checkoutV2 == "treatment" {
    showNewCheckout()
} else {
    showOldCheckout()
}

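Variants by app version

// New feature ships only in v2.0+
// Compare numerically: as strings, '10.0' >= '2.0' is false
const major = Number(appVersion.split('.')[0]);
if (major >= 2 && variant === 'treatment') {
  showNew();
} else {
  showOld();
}

→ Prevents old versions from breaking on a new feature.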
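
Version gates need a numeric comparison, because comparing version strings directly is lexicographic. A minimal sketch, assuming purely numeric dotted versions; `compareVersions` and `isEligible` are illustrative helpers:

```typescript
// Compares dotted numeric versions segment by segment ("2.10.0" > "2.9.3").
// Returns a negative number, zero, or a positive number, like a sort comparator.
// Pre-release tags such as "-beta" are not handled.
function compareVersions(a: string, b: string): number {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0); // missing segments count as 0
    if (diff !== 0) return diff;
  }
  return 0;
}

// A user sees the new feature only on a supported version and in treatment.
function isEligible(appVersion: string, minVersion: string, variant: string): boolean {
  return compareVersions(appVersion, minVersion) >= 0 && variant === 'treatment';
}
```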

Server-side (works everywhere)

// Backend decides the user's variants
const features = await assignFeatures(userId);
return res.json({ ...data, features });

// Client
if (response.features.checkout_v2) showNew();

→ No dependence on client-side hashing. The server is the source of truth.

Power calculation

Sample size:
n = (Z_α/2 + Z_β)² × 2σ² / δ²

α = 0.05 (95% confidence)
β = 0.20 (80% power)
δ = MDE (minimum detectable effect)
σ = baseline std dev

→ Often 1,000-10,000 users per variant, depending on the baseline rate and the MDE.

→ Statsig / Firebase compute this automatically.
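
The formula above can be evaluated directly for a conversion-rate metric. A minimal sketch, assuming σ² = p(1 − p) for a baseline conversion rate p and the standard normal quantiles 1.96 (two-sided α = 0.05) and 0.8416 (80% power); `sampleSizePerVariant` is an illustrative name:

```typescript
// Per-variant sample size: n = (Z_{α/2} + Z_β)² × 2σ² / δ²
// baselineRate: current conversion rate p, so σ² = p(1 − p)
// mde: minimum detectable effect δ as an absolute difference in rate
function sampleSizePerVariant(
  baselineRate: number,
  mde: number,
  zAlphaHalf: number = 1.96,
  zBeta: number = 0.8416,
): number {
  const variance = baselineRate * (1 - baselineRate);
  const n = ((zAlphaHalf + zBeta) ** 2 * 2 * variance) / mde ** 2;
  return Math.ceil(n);
}

// 10% baseline, detect an absolute change of 2 percentage points
const usersPerVariant = sampleSizePerVariant(0.10, 0.02);
```

With these inputs the result is roughly 3,500 users per variant. Halving the MDE roughly quadruples the requirement, since δ enters squared.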

Statistical significance

P-value < 0.05 = significant.

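But:
- Multiple testing (comparing 10 variants inflates false positives): apply a Bonferroni correction
- Peeking (checking results daily inflates false positives): use sequential testing
- Sample ratio mismatch: check whether the variant split is broken

→ Statsig / Optimizely apply these corrections automatically.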
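
For intuition about what those platforms compute, the checks above can be sketched by hand. This assumes a two-proportion z-test with pooled variance for the conversion comparison and a chi-square statistic for sample ratio mismatch; all function names are illustrative:

```typescript
// Two-proportion z statistic with pooled variance.
// |z| > 1.96 corresponds to p < 0.05 (two-sided).
function twoProportionZ(convA: number, nA: number, convB: number, nB: number): number {
  const pA = convA / nA;
  const pB = convB / nB;
  const pooled = (convA + convB) / (nA + nB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB));
  return (pB - pA) / se;
}

// Bonferroni: with m comparisons, test each one at alpha / m.
function bonferroniAlpha(alpha: number, comparisons: number): number {
  return alpha / comparisons;
}

// Sample ratio mismatch: chi-square statistic of observed counts against the
// expected split. For two arms (1 degree of freedom), a value above 3.841
// corresponds to p < 0.05.
function srmChiSquare(observed: number[], expectedRatios: number[]): number {
  const total = observed.reduce((a, b) => a + b, 0);
  return observed.reduce((chi, obs, i) => {
    const expected = total * expectedRatios[i];
    return chi + (obs - expected) ** 2 / expected;
  }, 0);
}
```

As an example, 500/5,000 conversions in control against 580/5,000 in treatment gives z of about 2.58, and a 5,200 / 4,800 split on an intended 50/50 allocation gives a chi-square of 16, a clear mismatch.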

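Cleanup (the most important step)

// Experiment finished → clean up the code

// ❌ Left in place forever
if (variant === 'treatment') {
  // new code
} else {
  // old code
}

// ✅ After the decision, remove one branch
// Keep only the new code
# Linter rule (custom)
# Detect experiment flags older than 90 days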
Feature flag vs Experiment

Feature flag: on/off switch, gradual rollout.
Experiment:   compares 2+ variants and measures the outcome.

→ Often handled by the same framework (Statsig / GrowthBook / LaunchDarkly).

Funnel measurement

// Track every step
analytics.track('checkout_started');
// The user may drop off here
analytics.track('checkout_completed');

// Analysis:
// - Conversion (started → completed)
// - Dropoff at each step
// - Comparison by variant
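
The per-variant conversion in that analysis can be computed from raw events. A minimal sketch, assuming each event carries a user ID and the assigned variant; the `FunnelEvent` shape is an assumption, not an analytics SDK type:

```typescript
type FunnelEvent = {
  userId: string;
  variant: string;
  name: 'checkout_started' | 'checkout_completed';
};

// Conversion per variant = distinct users who completed / distinct users who started.
function conversionByVariant(events: FunnelEvent[]): Map<string, number> {
  const started = new Map<string, Set<string>>();
  const completed = new Map<string, Set<string>>();
  for (const event of events) {
    const bucket = event.name === 'checkout_started' ? started : completed;
    const users = bucket.get(event.variant) ?? new Set<string>();
    users.add(event.userId);
    bucket.set(event.variant, users);
  }
  const result = new Map<string, number>();
  started.forEach((users, variant) => {
    const done = completed.get(variant);
    // Count only completions from users who also started.
    const converted = done ? Array.from(users).filter((u) => done.has(u)).length : 0;
    result.set(variant, converted / users.size);
  });
  return result;
}
```

Counting distinct users rather than raw events keeps retries and duplicate events from inflating the rate.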

Cohort analysis

Day 0 install → Day 1, 7, 30 retention.
Compare retention by variant; it matters more than raw clicks.
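
Day-N retention per variant can be sketched as follows, assuming each user record lists the days since install on which the user was active; the `UserActivity` shape is hypothetical:

```typescript
// activeDays: days since install (0 = install day) on which the user opened the app
type UserActivity = { variant: string; activeDays: number[] };

// Day-N retention per variant: share of installed users active on day N.
function retentionByVariant(users: UserActivity[], day: number): Map<string, number> {
  const totals = new Map<string, number>();
  const retained = new Map<string, number>();
  for (const user of users) {
    totals.set(user.variant, (totals.get(user.variant) ?? 0) + 1);
    if (user.activeDays.indexOf(day) !== -1) {
      retained.set(user.variant, (retained.get(user.variant) ?? 0) + 1);
    }
  }
  const result = new Map<string, number>();
  totals.forEach((total, variant) => {
    result.set(variant, (retained.get(variant) ?? 0) / total);
  });
  return result;
}
```

Calling it with day 1, 7, and 30 gives the three retention points to compare across variants.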

Fast iteration

1. Hypothesis: "A red button gets more clicks"
2. Implement: make the button color config-driven
3. Run: 1-2 weeks (enough data for significance)
4. Analyze: metrics by variant
5. Decide: adopt / reject / rerun
6. Cleanup: remove the experiment code

Bandit (multi-armed)

Static A/B = fixed 50/50 split.
Bandit = automatically shifts traffic toward the better-performing variant.

→ Finds a winner quickly, but significance is harder to verify.
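
One simple bandit strategy is epsilon-greedy, shown here as an illustration only; the source does not name a specific algorithm, and production systems often use Thompson sampling or similar. `Arm`, `chooseArm`, and `recordReward` are illustrative names:

```typescript
type Arm = { name: string; pulls: number; rewards: number };

// Observed reward rate; untested arms rank first so each gets tried.
function rewardRate(arm: Arm): number {
  return arm.pulls === 0 ? Infinity : arm.rewards / arm.pulls;
}

// Epsilon-greedy: explore a random arm with probability epsilon, otherwise
// exploit the arm with the best observed rate. `rand` is injectable for testing.
function chooseArm(arms: Arm[], epsilon: number, rand: () => number = Math.random): Arm {
  if (rand() < epsilon) {
    return arms[Math.floor(rand() * arms.length)];
  }
  let best = arms[0];
  for (const arm of arms) {
    if (rewardRate(arm) > rewardRate(best)) best = arm;
  }
  return best;
}

// Updates an arm after the user either converted or did not.
function recordReward(arm: Arm, converted: boolean): void {
  arm.pulls += 1;
  if (converted) arm.rewards += 1;
}
```

Because the traffic split changes as data arrives, the fixed-sample significance calculation no longer applies directly, which is the verification difficulty noted above.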

Mobile-specific concerns

- App version: old versions cannot show new variants.
- Update lag: users do not move to the latest version right away.
- Offline: variant assignments must be cached.
- Push: different messages per variant?
- Analyze iOS and Android separately.
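
The offline point can be sketched as a cache-first lookup. This assumes a synchronous key-value store for brevity (real mobile storage is usually asynchronous); `VariantStore` and `resolveVariant` are hypothetical names:

```typescript
// Minimal key-value interface; a real app would back this with persistent storage.
interface VariantStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

// Returns the cached variant when present, so the user keeps the same variant
// offline. Otherwise asks the remote source; a null result (offline, nothing
// cached) yields the fallback without caching it.
function resolveVariant(
  store: VariantStore,
  experimentKey: string,
  fetchRemote: () => string | null,
  fallback: string,
): string {
  const cached = store.get(experimentKey);
  if (cached !== undefined) return cached;
  const remote = fetchRemote();
  if (remote === null) return fallback;
  store.set(experimentKey, remote);
  return remote;
}
```

Not caching the fallback means the user receives a real assignment the next time the device is online.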

Privacy / GDPR

// User opted out of analytics
if (!user.analyticsConsent) {
  // Serve only the default variant
  return defaultExperience;
}

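🤔 Decision Criteria

| Environment | Recommendation |
| --- | --- |
| Already using Firebase | Firebase A/B + Remote Config |
| Fast iteration / TS-first | Statsig |
| OSS / self-hosted | GrowthBook |
| Large / enterprise | Optimizely / Amplitude |
| Server-side critical | LaunchDarkly + server |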

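Anti-patterns

  • No sticky variant: users see a different variant each time.
  • Ignoring significance (shipping at p > 0.05): a change with no proven effect.
  • Peeking (checking results daily and stopping early): false positives.
  • No cleanup: spaghetti code.
  • Too many concurrent experiments: noise.
  • Ignoring sample ratio mismatch: a broken split goes unnoticed.
  • Ignoring app version: breaks users on old versions.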

🤖 LLM Usage Hints

  • Firebase / Statsig are the simplest options.
  • Sticky + tracking + cleanup.
  • Power calculation + significance.
  • Account for mobile app versions.

🔗 Related Documents