Data Flywheel Effect

In one line

"Model → product → users → data → better model." This loop is the source of a defensible moat for AI products. The cold start is the hardest part. In the LLM era it takes the form of a quality flywheel (RLHF, user feedback). Critique: data quantity alone is not a moat.

Core cycle

1. Better model.
2. Better product / UX.
3. More users.
4. More data (interactions).
5. Model improvement, which feeds back into step 1.
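
The loop above can be made concrete with a toy simulation. This is only a sketch: every coefficient below (`data_per_user`, `user_gain`, the logarithmic quality curve) is a made-up assumption for illustration, not a measurement. The point is that each stage feeds the next, and that diminishing returns from data shape how fast the wheel turns.

```python
import math

def simulate_flywheel(cycles, users=1_000.0, data=10_000.0,
                      data_per_user=5.0, user_gain=0.5):
    """Toy model of the cycle: quality -> users -> data -> quality.

    All coefficients are hypothetical and chosen only for illustration.
    """
    history = []
    for _ in range(cycles):
        # Steps 1-2: model/product quality grows with diminishing returns in data.
        quality = math.log10(data)
        # Step 3: a better product attracts proportionally more users.
        users *= 1 + user_gain * quality / 100
        # Step 4: every user contributes interaction data.
        data += users * data_per_user
        # Step 5: the next iteration "retrains" on the larger dataset.
        history.append((round(users), round(data), round(quality, 2)))
    return history

for users, data, quality in simulate_flywheel(3):
    print(users, data, quality)
```

Setting `data_per_user=0` in this sketch freezes quality and flattens user growth to a constant rate, which is the "data is collected but never used" failure mode in miniature.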

Conditions for a flywheel

Network effect of data: data from one user benefits other users.
Reinvestment: collected data is looped back into model improvement.
Speed: how quickly each turn of the cycle completes.
Quality matters: more noise degrades the model.

Examples

Strong flywheel

Google Search: clicks → ranking.
Tesla FSD: miles driven → model.
Spotify: listens → recommendations.
Waze: traffic reports → routing.
Duolingo: mistakes → SRS (spaced repetition) scheduling.

Weak / Failed

Many AI startups: data is collected but never used.
Generic chatbot: no user feedback is captured.

Moat strength factors

Data exclusivity: data that only you own.
Data quality: noise filtering.
Data freshness: update speed.
Network density: interactions among users.
Switching cost: lock-in.
Privacy compliance: GDPR.

Cold start strategies

Hand-curate: serve the first 1,000 users manually.
Synthetic data: simulate examples.
Open data: Wikipedia, Common Crawl.
Acquisition: buy a dataset.
Lighthouse customer: data from one large customer.
Product-led growth: free tier.

Modern (LLM era)

RLHF: collect user preferences.
Implicit feedback: dwell time and similar behavioral signals, alongside explicit thumbs up / down.
A/B: compare model variants.
User correction: manual edits to model output.

Risks

Bias amplification: the biases of your own user base get reinforced.
Echo chamber: the data narrows over time.
Privacy: PII.
Regulatory: EU AI Act.
Model collapse: from training on synthetic data.

Critique

"Data is not the new oil — it's the new sand." (cheap, abundant)
In the LLM era, base models are becoming commoditized.
Quality matters more than quantity.
Differentiation happens at the application layer.

💻 Patterns

Flywheel measurement

def flywheel_health(metrics):
    """Year-over-year health indicators; every denominator must be non-zero."""
    return {
        'data_growth_rate': (metrics.data_now - metrics.data_year_ago) / metrics.data_year_ago,
        'model_improvement_rate': (metrics.eval_now - metrics.eval_year_ago) / metrics.eval_year_ago,
        # Relative growth, on the same scale as the two rates above.
        'user_growth_rate': (metrics.users_now - metrics.users_year_ago) / metrics.users_year_ago,
        'data_per_user': metrics.data_now / metrics.users_now,
        'feedback_rate': metrics.feedback_count / metrics.user_interaction_count,
    }
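
One way to read the data and model growth rates together is to check whether new data actually translates into model improvement. The sketch below is an assumption-laden illustration: the `FlywheelMetrics` dataclass, the `diagnose` function, and the `min_ratio` threshold are all hypothetical names and values, not part of the original note.

```python
from dataclasses import dataclass

@dataclass
class FlywheelMetrics:
    # Hypothetical container; field names mirror the measurement function above.
    data_now: float
    data_year_ago: float
    eval_now: float
    eval_year_ago: float

def growth(now, before):
    """Relative year-over-year growth; the baseline must be positive."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (now - before) / before

def diagnose(m, min_ratio=0.1):
    """Classify the flywheel state. `min_ratio` is an arbitrary illustrative cutoff."""
    data_growth = growth(m.data_now, m.data_year_ago)
    model_growth = growth(m.eval_now, m.eval_year_ago)
    if data_growth <= 0:
        return "stalled: no new data"
    if model_growth / data_growth < min_ratio:
        return "hoarding: data grows but the model does not"
    return "spinning"

print(diagnose(FlywheelMetrics(200, 100, 0.80, 0.79)))
print(diagnose(FlywheelMetrics(200, 100, 0.90, 0.60)))
```

The "hoarding" branch corresponds to the data-hoarding anti-pattern: data volume doubles while the evaluation score barely moves.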

Implicit feedback collection

from datetime import datetime, timezone

def collect_implicit_feedback(user_id, response_id, signal_type, value):
    """Record one implicit signal: dwell time, scroll depth, copy, share, or edit."""
    # `db` is assumed to be an existing document-store handle.
    db.feedback.insert({
        'user_id': user_id,
        'response_id': response_id,
        'signal': signal_type,  # e.g. 'dwell', 'copy', 'share', 'edit'
        'value': value,
        'timestamp': datetime.now(timezone.utc),
    })

# Example heuristic: dwell > 30 sec → positive signal.
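
The dwell-time heuristic above can be turned into a small labeling function. This is a minimal sketch: only the "dwell > 30 sec is positive" rule comes from the note. Treating `copy` and `share` as positive and `edit` as negative is an assumption made here for illustration, as are the function names.

```python
def label_from_signal(signal, value, dwell_threshold_sec=30.0):
    """Map one implicit signal to +1 (positive), -1 (negative), or 0 (no evidence)."""
    if signal == "dwell":
        # Rule from the note: a long dwell counts as positive.
        return 1 if value > dwell_threshold_sec else 0
    if signal in ("copy", "share"):
        # Assumption: copying or sharing a response signals usefulness.
        return 1 if value else 0
    if signal == "edit":
        # Assumption: a manual edit means the response needed correction.
        return -1 if value else 0
    return 0

def aggregate_label(events):
    """Sum per-signal votes for one response and return the sign of the total."""
    total = sum(label_from_signal(signal, value) for signal, value in events)
    return (total > 0) - (total < 0)

print(aggregate_label([("dwell", 45), ("copy", True), ("edit", True)]))
```

Summing votes keeps a single noisy signal from deciding the label on its own, which matters because implicit signals are individually weak.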

RLHF data pipeline

def rlhf_pipeline():
    # The helper functions called below are placeholders for project-specific code.
    # 1. Collect user interactions.
    interactions = collect_interactions()

    # 2. Generate preference pairs.
    pairs = []
    for i in interactions:
        if i.has_thumbs_up_and_down_in_session:
            pairs.append({
                'prompt': i.prompt,
                'chosen': i.thumbs_up_response,
                'rejected': i.thumbs_down_response,
            })

    # 3. Filter for quality.
    pairs = filter_quality(pairs)

    # 4. Train with DPO / RLHF.
    new_model = train_dpo(pairs)

    # 5. Shadow-deploy the candidate model.
    shadow_test_new_model(new_model)

    # 6. Roll out gradually, starting with a 5% canary.
    canary_deploy(new_model, percentage=5)
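
Step 2 of the pipeline, pair generation, can be sketched from raw feedback rows. The row schema (`session_id`, `prompt`, `response`, `rating`) and the function name are assumptions for this example. The output keeps the `prompt` / `chosen` / `rejected` layout used in the pipeline above.

```python
from collections import defaultdict

def build_preference_pairs(feedback):
    """Pair each thumbs-up response with each thumbs-down response
    given for the same prompt within the same session."""
    grouped = defaultdict(lambda: {"up": [], "down": []})
    for row in feedback:
        if row["rating"] in ("up", "down"):
            key = (row["session_id"], row["prompt"])
            grouped[key][row["rating"]].append(row["response"])

    pairs = []
    for (_, prompt), votes in grouped.items():
        for chosen in votes["up"]:
            for rejected in votes["down"]:
                if chosen != rejected:  # skip contradictory votes on one response
                    pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs

rows = [
    {"session_id": "s1", "prompt": "Summarize X", "response": "A", "rating": "up"},
    {"session_id": "s1", "prompt": "Summarize X", "response": "B", "rating": "down"},
    {"session_id": "s2", "prompt": "Summarize X", "response": "C", "rating": "up"},
]
print(build_preference_pairs(rows))
```

Session `s2` yields no pair because it has only a positive vote. A preference pair needs both sides for the same prompt.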

Cold start: synthetic data

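def bootstrap_cold_start(use_case, n=1000):
    """Generate synthetic examples to train the first model."""
    # `generate_seed_for` and `llm` are assumed project-specific helpers.
    examples = []
    for _ in range(n):
        seed = generate_seed_for(use_case)
        # Include the seed in the prompt so that examples differ from one another.
        synthetic = llm.generate(f"""Generate a realistic example for: {use_case}
Seed: {seed}
Input: ...
Expected output: ...""")
        examples.append(synthetic)
    return examples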

A/B test (model improvement signal)

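import hashlib

def ab_test_model(model_old, model_new, traffic_pct=10):
    def assign(user_id):
        # Built-in hash() is salted per process for strings, so use a stable digest.
        bucket = int(hashlib.sha256(str(user_id).encode()).hexdigest(), 16) % 100
        return 'new' if bucket < traffic_pct else 'old'

    # `collect_metrics_by_variant`, `statistical_significance`, and `promote` are placeholders.
    metrics = collect_metrics_by_variant(assign)
    if statistical_significance(metrics) and metrics['new'] > metrics['old']:
        promote(model_new)
    # Otherwise model_old keeps serving all traffic.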
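
The significance check in the A/B test is left abstract above. One common choice, assuming the metric is a success rate such as the thumbs-up rate, is a one-sided two-proportion z-test. The function name and arguments below are illustrative, and the note does not say which test the original author had in mind.

```python
import math

def two_proportion_z_test(success_old, n_old, success_new, n_new):
    """One-sided test of H1: the new variant's success rate is higher.

    Returns (z, p_value), using the pooled-variance normal approximation.
    """
    p_old = success_old / n_old
    p_new = success_new / n_new
    pooled = (success_old + success_new) / (n_old + n_new)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_old + 1 / n_new))
    if se == 0:
        return 0.0, 1.0  # all successes or all failures: no evidence either way
    z = (p_new - p_old) / se
    # P(Z >= z) for a standard normal Z, via the complementary error function.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

z, p = two_proportion_z_test(success_old=500, n_old=1000, success_new=560, n_new=1000)
print(f"z={z:.2f}, p={p:.4f}")
```

The normal approximation is only reasonable when each variant has a decent number of successes and failures. With small samples, an exact test is the safer choice.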

Data quality scoring

def score_training_example(example, base_model):
    """Estimate the quality of one training example as a weighted score in [0, 1]."""
    # The four checks are assumed helpers that each return a bool or a value in [0, 1].
    score = 0
    score += has_diverse_vocab(example) * 0.2
    score += not_repetitive(example) * 0.2
    score += factually_consistent(example) * 0.3
    score += task_clarity(example) * 0.3
    return score

# Select the top-K examples for training.
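
The top-K selection step can be sketched with the standard library. `select_top_k` and the stand-in scorer `distinct_word_ratio` are hypothetical names for this example. In practice the scoring function above would take the scorer's place.

```python
import heapq

def distinct_word_ratio(text):
    """Stand-in quality score: share of distinct words (1.0 means no repetition)."""
    words = text.split()
    return len(set(words)) / len(words) if words else 0.0

def select_top_k(examples, score_fn, k, min_score=0.0):
    """Keep the k highest-scoring examples at or above min_score.

    Ties are broken in favor of the example that appeared first.
    """
    scored = [(score_fn(ex), index, ex) for index, ex in enumerate(examples)]
    kept = [item for item in scored if item[0] >= min_score]
    best = heapq.nlargest(k, kept, key=lambda item: (item[0], -item[1]))
    return [ex for _, _, ex in best]

examples = ["a a a a", "the quick brown fox", "go go team"]
print(select_top_k(examples, distinct_word_ratio, k=2))
```

The `min_score` floor matters as much as `k`: without it, a small or noisy batch still fills all k slots with low-quality examples.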

Privacy-preserving learning

# Federated learning
def federated_update(global_model, client_data_chunks):
    # Assumes `weights` is a NumPy-like array and `copy()` returns an independent model.
    local_updates = []
    for client_chunk in client_data_chunks:
        local_model = global_model.copy()
        local_model.train(client_chunk)
        local_updates.append(local_model.weights - global_model.weights)

    # Apply only the averaged update; raw data never leaves the client.
    global_model.weights += sum(local_updates) / len(local_updates)
    return global_model
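
For sensitive data, federated averaging is often combined with update clipping and added noise, in the spirit of differential privacy. The sketch below uses plain Python lists and hypothetical function names. The `noise_std` value is not calibrated to any (ε, δ) privacy budget, so this illustrates the mechanism rather than a guarantee.

```python
import math
import random

def clip_update(update, max_norm):
    """Scale an update vector down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]

def private_average(updates, max_norm=1.0, noise_std=0.0, rng=None):
    """Average clipped client updates, then add Gaussian noise per coordinate."""
    rng = rng or random.Random()
    clipped = [clip_update(u, max_norm) for u in updates]
    n, dim = len(clipped), len(clipped[0])
    averaged = [sum(u[j] for u in clipped) / n for j in range(dim)]
    return [a + rng.gauss(0.0, noise_std) for a in averaged]

client_updates = [[3.0, 4.0], [0.1, -0.2], [0.0, 0.0]]
print(private_average(client_updates, max_norm=1.0, noise_std=0.01, rng=random.Random(0)))
```

Clipping bounds how much any single client can move the global model, and that bound is what makes the added noise meaningful.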

Defensibility audit

def defensibility_score(metrics):
    score = 0
    if metrics.proprietary_data_exclusivity: score += 3
    if metrics.user_lock_in > 0.5: score += 2
    if metrics.network_density > 0.7: score += 2
    if metrics.data_quality_unique: score += 2
    if metrics.regulatory_barrier: score += 1
    return f'Moat strength: {score}/10'

Decision criteria

Situation	Strategy
Cold start	Synthetic + open data + lighthouse customer
Growing	Implicit feedback + A/B
Scale	RLHF + automation
Sensitive	Federated + DP
Specialized	Quality > quantity (curate)
Generic	Network effect (UGC)

Default: implicit feedback + quality classifier + RLHF + A/B test.

🔗 Graph

Parent: Defensibility
Variants: Network-Effect · Data-Moat · Cold-Start
Applications: RLHF · DPO · Federated-Learning
Adjacent: Concept-Drift · Cost-Benefit Analysis in AI · Asset-Specific-Knowledge · Algorithmic Fairness

🤖 LLM usage

When to use: AI startup strategy, product roadmaps, moat assessment, and fundraising differentiators. When not to use: commodity products (no flywheel possible).

❌ Anti-patterns

Data hoarding (no use): no flywheel forms.
Ignoring quality: noise gets amplified.
No feedback collection: the cycle breaks.
Privacy violation: regulatory exposure and loss of trust.
Unconditional trust in "data is a moat": LLMs are becoming a commodity.
Synthetic data only: model collapse.

🧪 Verification / Duplicates

Verified (Andreessen Horowitz "Data Network Effects", Reid Hoffman, Tesla / Google case studies).
Confidence: B.
Related: Cost-Benefit Analysis in AI · Concept-Drift · Asset-Specific-Knowledge · CV_Synthesis · Algorithmic Fairness.

🕓 Changelog

Date	Change
2026-05-08	Phase 1
2026-05-10	Manual cleanup: cycle + cold start + RLHF / A/B / federated / quality code
