---
id: mobile-ab-testing
title: Mobile A/B Testing — Variant / Tracking / Cleanup
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [mobile, ab-testing, experiment, vibe-coding]
tech_stack: { language: "Swift / Kotlin / TS", applicable_to: ["iOS", "Android", "React Native"] }
applied_in: []
aliases: [A/B test, mobile experiment, Firebase Remote Config, feature flag mobile, in-app variant]
---
# Mobile A/B Testing
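> On native apps, every release leaves users spread across different app versions. Core toolkit: **Remote Config + sticky variant + analytics + statistical significance**. Tools: Firebase / Statsig / Optimizely / GrowthBook.
## 📖 Core Concepts
- Variant: control vs treatment.
- Sticky: the same user always gets the same variant.
- Statistical significance: confidence that a result is not due to chance.
- Cleanup: remove the code of finished experiments.
## 💻 Code Patterns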
### Firebase Remote Config (iOS)
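```swift
import FirebaseRemoteConfig

let config = RemoteConfig.remoteConfig()
let settings = RemoteConfigSettings()
settings.minimumFetchInterval = 0 // dev only; the production default is 12 hours
config.configSettings = settings

// In-app defaults apply until the first successful fetch (and while offline)
config.setDefaults([
    "checkout_button_color": "blue" as NSObject,
    "show_new_onboarding": false as NSObject,
])

// Fetch + activate; on failure the defaults / last activated values stay in effect
config.fetchAndActivate { status, error in
    if status == .error {
        print("Remote Config fetch failed: \(error?.localizedDescription ?? "unknown error")")
    }
    let color = config.configValue(forKey: "checkout_button_color").stringValue ?? "blue"
    DispatchQueue.main.async {
        button.backgroundColor = color == "red" ? .red : .blue // `button`: your UIButton
    }
}
```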
### Firebase A/B testing
```
Firebase Console → A/B Testing → create a new experiment
- Variant A: checkout_button_color = "blue"
- Variant B: checkout_button_color = "red"
- Goal metric: purchases
- Audience: 50% of Korean iOS users
→ Firebase assigns users automatically and calculates significance.
```
### Statsig (modern, fast)
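```ts
// Legacy statsig-react-native SDK assumed; the newer @statsig/* SDKs use a client instance and different method names
import { Statsig } from 'statsig-react-native';

await Statsig.initialize('client-key', { userID: user.id });

const config = Statsig.getConfig('checkout'); // dynamic config
const buttonColor = config.get('button_color', 'blue'); // 'blue' if the key is unset
const showNew = Statsig.checkGate('new_onboarding'); // boolean feature gate

// Track the goal event: value + string metadata
Statsig.logEvent('purchase', amount, { product_id });
```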
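```ts
import DeviceInfo from 'react-native-device-info';

// Stable user ID before login: fall back to a device ID
const deviceId = await DeviceInfo.getUniqueId();
// Caveat: switching from deviceId to user.id at login can move the user to a different variant
await Statsig.initialize('key', { userID: user?.id ?? deviceId });
```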
### GrowthBook (OSS)
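```ts
import { GrowthBook } from '@growthbook/growthbook-react';

const gb = new GrowthBook({
  apiHost: '...',
  clientKey: '...',
  attributes: { id: user.id, country: locale.country },
  // Called when the user is bucketed into an experiment: send the exposure event
  trackingCallback: (experiment, result) => {
    analytics.track('experiment_exposed', {
      experiment: experiment.key,
      variant: result.variationId,
    });
  },
});
await gb.loadFeatures();
const variant = gb.getFeatureValue('checkout_button', 'blue');
```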
### Sticky variant (consistent hash)
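```ts
// murmurhash: assumed to return an unsigned 32-bit integer (e.g. MurmurHash3)
function getVariant(userId: string, experimentKey: string): string {
  const bucket = murmurhash(`${experimentKey}:${userId}`) % 100; // 0..99
  return bucket < 50 ? 'control' : 'treatment'; // 50/50 split
}
// Same user + same experiment = always the same variant
```
→ A user must never see a different variant from one session to the next.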
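A self-contained sketch of the same idea without a third-party hash, assuming 32-bit FNV-1a over UTF-16 code units (`fnv1a32`, `assignVariant`, and the `Variant` shape are illustrative names; production SDKs use their own hash functions). Weights allow uneven splits such as 90/10:

```typescript
/** 32-bit FNV-1a hash of a string's UTF-16 code units. */
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    // Multiply by the FNV prime 16777619, keeping the low 32 bits
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

interface Variant {
  name: string;
  weight: number; // relative weight, e.g. 50 / 50 or 90 / 10
}

/** Deterministically maps (experiment, user) to a variant according to the weights. */
function assignVariant(userId: string, experimentKey: string, variants: Variant[]): string {
  const total = variants.reduce((sum, v) => sum + v.weight, 0);
  // Bucket in [0, 10000) so weights can be fine-grained
  const bucket = fnv1a32(`${experimentKey}:${userId}`) % 10000;
  let cumulative = 0;
  for (const v of variants) {
    cumulative += (v.weight / total) * 10000;
    if (bucket < cumulative) return v.name;
  }
  return variants[variants.length - 1].name; // guard against floating-point rounding
}
```

Because the experiment key is part of the hashed input, a user's bucket in one experiment does not predict their bucket in another.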
### Variant exposure tracking
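```tsx
const variant = getVariant(userId, 'checkout_v2');

// Track exposure once per session, right where the variant is actually shown.
// exposedThisSession: a Set<string> kept for the app session (illustrative)
if (!exposedThisSession.has('checkout_v2')) {
  exposedThisSession.add('checkout_v2');
  analytics.track('experiment_exposed', { experiment: 'checkout_v2', variant });
}

// Render
if (variant === 'treatment') return <NewCheckout />;
return <OldCheckout />;
```
→ Exposure is where measurement starts: only users who actually saw a variant belong in the analysis.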
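### Native (Swift): hand-rolled implementation
```swift
struct ExperimentConfig {
    // String.hashValue is randomly seeded per process launch, so it is NOT sticky.
    // Use a deterministic hash instead (64-bit FNV-1a here).
    static func bucket(_ key: String) -> Int {
        var hash: UInt64 = 0xcbf29ce484222325
        for byte in key.utf8 {
            hash = (hash ^ UInt64(byte)) &* 0x100000001b3
        }
        return Int(hash % 100) // 0...99
    }
    static var checkoutV2: String {
        let userId = User.current.id
        return bucket("\(userId):checkout_v2") < 50 ? "control" : "treatment"
    }
}
if ExperimentConfig.checkoutV2 == "treatment" {
    showNewCheckout()
} else {
    showOldCheckout()
}
```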
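### Variant by app version
```ts
// New feature ships only in v2.0+
// Don't compare version strings directly: '10.0' >= '2.0' is false lexicographically.
// compareVersions: numeric, segment-by-segment comparison (your own helper)
if (compareVersions(appVersion, '2.0') >= 0 && variant === 'treatment') {
  showNew();
} else {
  showOld();
}
```
→ Prevents old app versions from breaking when assigned a feature they don't contain.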
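A minimal `compareVersions` sketch, assuming plain dotted numeric versions with no pre-release tags such as `-beta` (the function names are illustrative, not a library API). It compares segment by segment as numbers, which is exactly what a plain string comparison gets wrong:

```typescript
/**
 * Compares dotted numeric versions segment by segment ("2.10.0" > "2.9").
 * Returns a negative number, 0, or a positive number.
 * Assumption: numeric segments only; "2.0-beta" would yield NaN.
 */
function compareVersions(a: string, b: string): number {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0); // missing segments count as 0
    if (diff !== 0) return diff;
  }
  return 0;
}

/** True when `version` is the same as or newer than `minimum`. */
function isAtLeast(version: string, minimum: string): boolean {
  return compareVersions(version, minimum) >= 0;
}
```

With this, `isAtLeast('10.0', '2.0')` is `true`, while the string comparison `'10.0' >= '2.0'` is `false`.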
### Server-side (one source of truth for every client)
```ts
// The backend decides the user's variants
const features = await assignFeatures(userId);
return res.json({ ...data, features });

// Client
if (response.features.checkout_v2) showNew();
```
→ No dependence on client-side hashing; the server is the source of truth.
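One way an `assignFeatures` function could work, sketched under assumptions: each assignment is drawn at random once and persisted, so every later request from any platform gets the same answer. `AssignmentStore`, `InMemoryStore`, and this synchronous signature are hypothetical; in production the store would be a database or cache.

```typescript
type Variant = 'control' | 'treatment';

interface AssignmentStore {
  get(key: string): Variant | undefined;
  set(key: string, variant: Variant): void;
}

// In-memory stand-in for a persistent store (database, Redis, ...)
class InMemoryStore implements AssignmentStore {
  private data = new Map<string, Variant>();
  get(key: string): Variant | undefined {
    return this.data.get(key);
  }
  set(key: string, variant: Variant): void {
    this.data.set(key, variant);
  }
}

/** Returns { experimentKey: isTreatment } for the user, assigning on first sight. */
function assignFeatures(
  userId: string,
  experiments: string[],
  store: AssignmentStore,
  random: () => number = Math.random,
): Record<string, boolean> {
  const features: Record<string, boolean> = {};
  for (const exp of experiments) {
    const key = `${exp}:${userId}`;
    let variant = store.get(key);
    if (variant === undefined) {
      variant = random() < 0.5 ? 'control' : 'treatment'; // 50/50 split
      store.set(key, variant); // persist: later requests reuse this assignment
    }
    features[exp] = variant === 'treatment';
  }
  return features;
}
```

Persisting the assignment, rather than recomputing it, also means the split can be changed later without reshuffling users who were already enrolled.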
### Power calculation
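```
Sample size per variant:
n = (Z_α/2 + Z_β)² × 2σ² / δ²
α = 0.05 (two-sided, 95% confidence) → Z_α/2 ≈ 1.96
β = 0.20 (80% power)                 → Z_β ≈ 0.84
δ = MDE (minimum detectable effect)
σ = baseline std dev (for a conversion rate p: σ² = p(1 − p))
→ Typically 1,000-10,000 users per variant; a smaller MDE needs more.
```
→ Statsig / Firebase calculate this automatically.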
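The formula above in code, assuming a conversion-rate metric so that σ² = p(1 − p) is taken from the baseline rate, and hard-coding the z-values for two-sided α = 0.05 and 80% power (`sampleSizePerVariant` is an illustrative name):

```typescript
const Z_ALPHA_HALF = 1.96; // two-sided alpha = 0.05
const Z_BETA = 0.8416; // power = 0.80

/**
 * Users needed per variant to detect an absolute lift of `mdeAbsolute`
 * on a conversion rate of `baselineRate`: n = (Z_a/2 + Z_b)^2 * 2 * sigma^2 / delta^2.
 */
function sampleSizePerVariant(baselineRate: number, mdeAbsolute: number): number {
  const variance = baselineRate * (1 - baselineRate); // Bernoulli variance
  const n = ((Z_ALPHA_HALF + Z_BETA) ** 2 * 2 * variance) / mdeAbsolute ** 2;
  return Math.ceil(n); // round up: a fractional user is not enough
}
```

For a 10% baseline and a 1-percentage-point MDE this gives roughly 14,100 users per variant; doubling the MDE to 2 points divides that by 4, since n scales with 1/δ².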
### Statistical significance
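```
p-value < 0.05 = statistically significant.
Caveats:
- Multiple testing: comparing 10 variants inflates false positives (correct with Bonferroni)
- Peeking: checking results daily and stopping early inflates false positives (use sequential testing)
- Sample ratio mismatch (SRM): verify the observed split matches the configured split
```
→ Statsig / Optimizely apply these corrections automatically.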
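A from-scratch sketch of three of these checks (two-proportion z-test, Bonferroni threshold, SRM), assuming large samples (normal approximation), a two-arm experiment, and an intended 50/50 split. `erf` uses the Abramowitz and Stegun 7.1.26 approximation; all function names are illustrative. It is meant for sanity-checking a dashboard, not for replacing a platform's stats engine:

```typescript
/** erf via Abramowitz & Stegun 7.1.26 (max absolute error about 1.5e-7). */
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

/** Standard normal CDF. */
function normalCdf(z: number): number {
  return 0.5 * (1 + erf(z / Math.SQRT2));
}

/** Two-sided p-value for the difference between two conversion rates (pooled SE). */
function twoProportionZTest(convA: number, nA: number, convB: number, nB: number) {
  const pA = convA / nA;
  const pB = convB / nB;
  const pooled = (convA + convB) / (nA + nB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB));
  if (se === 0) return { z: 0, pValue: 1 }; // no variation at all
  const z = (pB - pA) / se;
  return { z, pValue: 2 * (1 - normalCdf(Math.abs(z))) };
}

/** Bonferroni: with m comparisons, test each one at alpha / m. */
function bonferroniAlpha(alpha: number, comparisons: number): number {
  return alpha / comparisons;
}

/** SRM check: chi-square test (1 df) of observed counts against a 50/50 split. */
function srmPValue(nA: number, nB: number): number {
  const expected = (nA + nB) / 2;
  const chi2 = ((nA - expected) ** 2 + (nB - expected) ** 2) / expected;
  // With 1 degree of freedom chi2 = z^2, so the normal tail gives the p-value
  return 2 * (1 - normalCdf(Math.sqrt(chi2)));
}
```

For 100/1,000 vs 130/1,000 conversions, `twoProportionZTest` gives p ≈ 0.035: significant at 0.05, but not at the Bonferroni threshold of 0.005 if ten comparisons were run. A 5,000 vs 5,300 split gives an SRM p-value of about 0.003, a sign the assignment itself may be broken.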
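### Cleanup (most important)
```ts
// Experiment finished → clean up the code
// ❌ Left in forever
if (variant === 'treatment') {
  // new code
} else {
  // old code
}
// ✅ Once decided, remove one branch
// keep only the winning code (e.g. the new one)
```
```bash
# Linter rule (custom)
# Detect experiment flags older than 90 days
```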
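A minimal sketch of the check behind such a rule, assuming flags are declared in a registry that records a creation date (the `ExperimentFlag` shape and `findStaleFlags` name are hypothetical). A CI step could run it and fail the build, or open a cleanup ticket, when the result is non-empty:

```typescript
// Hypothetical flag registry entry; adapt to wherever your flags are declared
interface ExperimentFlag {
  key: string;
  createdAt: string; // ISO date, e.g. "2026-01-15"
  owner: string;
}

const DAY_MS = 24 * 60 * 60 * 1000;

/** Returns flags that are maxAgeDays old or older. */
function findStaleFlags(flags: ExperimentFlag[], now: Date, maxAgeDays = 90): ExperimentFlag[] {
  return flags.filter((f) => {
    const ageDays = (now.getTime() - new Date(f.createdAt).getTime()) / DAY_MS;
    return ageDays >= maxAgeDays;
  });
}
```

Recording an `owner` alongside each flag gives the report someone to notify.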
### Feature flag vs Experiment
```
Feature flag: on/off switch, gradual rollout.
Experiment: compare 2+ variants and measure the outcome.
→ Often served by the same platform (Statsig / GrowthBook / LaunchDarkly).
```
### Funnel measurement
```ts
// Track every step
analytics.track('checkout_started');
// ... the user may drop off here
analytics.track('checkout_completed');
// Analysis:
// - Conversion (started → completed)
// - Drop-off at each step
// - Comparison per variant
```
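A sketch of the per-variant analysis, assuming events arrive as `(userId, variant, step)` records (the `FunnelEvent` shape and `funnelByVariant` name are hypothetical). It builds an ordered funnel: a user counts at a step only if they also reached the previous step.

```typescript
interface FunnelEvent {
  userId: string;
  variant: string;
  step: string; // e.g. 'checkout_started'
}

interface StepStats {
  step: string;
  users: number; // users who reached this step and every earlier step
  conversionFromPrev: number; // users / previous step's users (1 for the first step)
}

function funnelByVariant(events: FunnelEvent[], steps: string[]): Map<string, StepStats[]> {
  // variant -> step -> set of users who fired that step's event
  const reached = new Map<string, Map<string, Set<string>>>();
  for (const e of events) {
    const byStep = reached.get(e.variant) ?? new Map<string, Set<string>>();
    const users = byStep.get(e.step) ?? new Set<string>();
    users.add(e.userId);
    byStep.set(e.step, users);
    reached.set(e.variant, byStep);
  }

  const result = new Map<string, StepStats[]>();
  reached.forEach((byStep, variant) => {
    const stats: StepStats[] = [];
    let prev: Set<string> | null = null;
    for (const step of steps) {
      const all = byStep.get(step) ?? new Set<string>();
      const p = prev;
      // Ordered funnel: keep only users who also reached the previous step
      const current = p === null ? all : new Set(Array.from(all).filter((u) => p.has(u)));
      stats.push({
        step,
        users: current.size,
        conversionFromPrev: p === null ? 1 : p.size === 0 ? 0 : current.size / p.size,
      });
      prev = current;
    }
    result.set(variant, stats);
  });
  return result;
}
```

Drop-off at a step is `1 - conversionFromPrev`; comparing that per step across variants shows where a treatment helps or hurts.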
### Cohort analysis
```
Day 0 install → Day 1, 7, 30 retention.
Compare retention per variant: it matters more than raw clicks.
```
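A sketch of day-N retention per variant, assuming each user record carries an install date and the dates they were active, and that every user is at least 30 days past install (otherwise D30 is undercounted). `UserActivity` and `retentionByVariant` are illustrative names:

```typescript
interface UserActivity {
  userId: string;
  variant: string;
  installDate: string; // 'YYYY-MM-DD'
  activeDates: string[]; // 'YYYY-MM-DD' days with at least one session
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Date.parse treats 'YYYY-MM-DD' as UTC midnight, so this is a whole-day difference
function daysBetween(from: string, to: string): number {
  return Math.round((Date.parse(to) - Date.parse(from)) / MS_PER_DAY);
}

/** variant -> { N: fraction of that variant's users active exactly N days after install } */
function retentionByVariant(
  users: UserActivity[],
  days: number[] = [1, 7, 30],
): Map<string, Record<number, number>> {
  const groups = new Map<string, UserActivity[]>();
  for (const u of users) {
    const list = groups.get(u.variant) ?? [];
    list.push(u);
    groups.set(u.variant, list);
  }
  const result = new Map<string, Record<number, number>>();
  groups.forEach((list, variant) => {
    const rates: Record<number, number> = {};
    for (const n of days) {
      const retained = list.filter((u) =>
        u.activeDates.some((d) => daysBetween(u.installDate, d) === n),
      ).length;
      rates[n] = retained / list.length;
    }
    result.set(variant, rates);
  });
  return result;
}
```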
### Fast iteration loop
```
1. Hypothesis: "A red button gets more clicks"
2. Implement: make the button color config-driven
3. Run: 1-2 weeks (long enough to reach significance)
4. Analyze: metrics per variant
5. Decide: adopt / reject / rerun
6. Cleanup: remove the losing code
```
### Bandit (multi-armed)
```
Static A/B = fixed 50/50 split.
Bandit = automatically shifts more traffic to the better-performing variant.
→ Finds a winner faster, but significance is harder to verify.
```
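The note does not name a bandit algorithm; this sketch swaps in epsilon-greedy, one of the simplest: with probability `epsilon` it explores a random arm, otherwise it exploits the arm with the best observed conversion rate. `Arm`, `chooseArm`, and `recordResult` are hypothetical names, and the random source is injectable for testing.

```typescript
interface Arm {
  name: string;
  pulls: number; // times this arm was shown
  rewards: number; // e.g. conversions
}

function chooseArm(arms: Arm[], epsilon: number, random: () => number = Math.random): Arm {
  if (random() < epsilon) {
    return arms[Math.floor(random() * arms.length)]; // explore
  }
  // Exploit: untried arms rank first (rate = Infinity) so every arm gets sampled
  let best = arms[0];
  let bestRate = -1;
  for (const arm of arms) {
    const rate = arm.pulls === 0 ? Infinity : arm.rewards / arm.pulls;
    if (rate > bestRate) {
      best = arm;
      bestRate = rate;
    }
  }
  return best;
}

function recordResult(arm: Arm, converted: boolean): void {
  arm.pulls += 1;
  if (converted) arm.rewards += 1;
}
```

Because it picks an arm per decision, pair it with a cached assignment if a user must keep seeing the same variant.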
### Mobile-specific concerns
```
- App version: older app versions can't show the new variant.
- Update lag: many users don't update to the latest version.
- Offline: variant assignments must be cached.
- Push: send different messages per variant?
- Analyze iOS and Android separately.
```
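For the offline point, a minimal sketch: persist the last assignment and reuse it when the fetch fails, falling back to a safe default. `KeyValueStore` is a hypothetical synchronous stand-in for a persistent store such as AsyncStorage, UserDefaults, or SharedPreferences, and `resolveVariant` is an illustrative name.

```typescript
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// In-memory stand-in for persistent device storage
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

/**
 * Fresh remote value if available (and persist it),
 * otherwise the cached one, otherwise the safe default.
 */
function resolveVariant(
  store: KeyValueStore,
  experimentKey: string,
  remoteVariant: string | undefined, // undefined when offline or the fetch failed
  defaultVariant = 'control',
): string {
  const cacheKey = `exp:${experimentKey}`;
  if (remoteVariant !== undefined) {
    store.setItem(cacheKey, remoteVariant);
    return remoteVariant;
  }
  return store.getItem(cacheKey) ?? defaultVariant;
}
```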
### Privacy / GDPR
```ts
// User opted out of analytics
if (!user.analyticsConsent) {
  // Serve the default variant only
  return defaultExperience;
}
```
## 🤔 Decision Guide
| Situation | Recommendation |
|---|---|
| Already using Firebase | Firebase A/B Testing + Remote Config |
| Fast iteration / TS-first | Statsig |
| OSS / self-hosted | GrowthBook |
| Large scale / enterprise | Optimizely / Amplitude |
| Server-side critical | LaunchDarkly + server-side evaluation |
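## ❌ Anti-patterns
- **No sticky variant**: the user sees a different variant every time.
- **Ignoring significance (shipping at p > 0.05)**: you ship a change with no demonstrated effect.
- **Peeking (checking daily and stopping early)**: false positives.
- **Skipping cleanup**: spaghetti code.
- **Too many concurrent experiments**: noise.
- **Ignoring sample ratio mismatch**: a broken split goes unnoticed.
- **Ignoring app version**: breaks users on old versions.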
## 🤖 LLM Usage Hints
- Firebase / Statsig are the simplest options.
- Always cover sticky assignment + exposure tracking + cleanup.
- Include power calculation + significance.
- Account for mobile app versions.
## 🔗 Related Documents
- [[Backend_Feature_Flags_Deep]]
- [[AI_LLM_Eval_Patterns]]
- [[Mobile_Crash_Free_SLO]]