[G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00
parent 21ac3ed255
commit 504fd5fb42
3011 changed files with 380280 additions and 206977 deletions
@@ -1,63 +1,297 @@
 ---
-id: wiki-2026-0508-cost-benefit-analysis-in-ai
-title: Cost Benefit Analysis in AI
+id: wiki-2026-0508-cost-benefit-ai
+title: Cost-Benefit Analysis in AI
 category: 10_Wiki/Topics
-status: needs_review
+status: verified
 canonical_id: self
-aliases: [CBA-AI-001]
+aliases: [AI ROI, AI cost analysis, total cost of ownership, FinOps for ML, model economics, LLM cost]
 duplicate_of: none
 source_trust_level: A
-confidence_score: 1.0
-tags: [ai-business, Strategy, cost-benefit-Analysis, Scalability, Optimization]
+confidence_score: 0.9
+verification_status: applied
+tags: [ai-economics, roi, finops, mlops, llm-cost, total-cost-ownership, business-strategy]
 raw_sources: []
-last_reinforced: 2026-04-26
+last_reinforced: 2026-05-10
 github_commit: pending
-inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08)
+tech_stack:
+  language: business analysis
+  applicable_to: [AI Strategy, MLOps, Cost Engineering, Product Decision]
 ---

-# Cost-Benefit Analysis in AI (AI에서의 비용 대비 편익 분석)
+# Cost-Benefit Analysis in AI

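-## 📌 One-Line Insight (The Karpathy Summary)
-> "Measure whether the value created by a single model parameter justifies its compute cost." A strategic framework for quantitatively comparing the infrastructure, training, and maintenance costs of adopting AI against the business value, efficiency gains, and risk reduction it produces.
+## One-line summary
+> **"Does the value of each model parameter justify its cost?"** Weigh the costs (GPU bills, data, MLOps) against the benefits (revenue, time saved, risk reduced). Modern concerns: LLM token economics, build vs. buy vs. API, and sustainability.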

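-## 📖 Structured Knowledge (Synthesized Content)
- **Extracted pattern:** A decision-making pattern that identifies where gains in technical performance (accuracy, F1 score, etc.) translate into actual business revenue, and determines the threshold at which marginal utility starts to diminish.
- **Key analysis items:**
-    - **Costs:** GPU compute cost, data collection and labeling cost, [[MLOps|MLOps]] infrastructure build-out cost, model serving latency.
-    - **Benefits:** Labor cost savings from task automation, revenue growth from improved prediction accuracy, better user experience (UX) and retention.
-    - **Intangible Factors:** Stronger brand image, early technical advantage, protection against data security and ethical risks.
- **ROI optimization strategies:** Model compression, choosing between open source and in-house builds, incremental adoption (MVP first).
+## Core concepts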

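-## ⚠️ Contradictions & Updates
- **Conflict with past data:** A shift from the early research-centric mindset of simply looking for "the best-performing model" to an engineering and business perspective that considers "sustainable cost efficiency".
- **Policy change:** Before introducing a new AI feature into the wiki, the Antigravity project runs a cost-benefit analysis to check whether the improvement in knowledge-retrieval quality outweighs the server maintenance cost.
+### Cost components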

-## 🔗 Knowledge Links (Graph)
-[[_system|system]]-Design-for-AI-Scale, Product-Thinking, Decision-Making, [[MLOps|MLOps]]
- **Raw Source:** 10_Wiki/Topics/AI/Cost-Benefit Analysis in AI.md
+#### Direct
+- **Compute**: GPU/TPU hours.
+- **Storage**: model weights, data, and logs.
+- **Inference**: per-token or per-image cost.
+- **Training**: each fine-tuning run.
+- **Network**: egress charges.

-## 🤖 LLM Usage Hints (How to Use This Knowledge)
+#### Indirect
+- **Data**: collection, labeling, and cleaning.
+- **MLOps**: platform team.
+- **Engineering**: development and integration.
+- **Maintenance**: retraining and monitoring.
+- **Compliance**: audits and security.

-**When to use this knowledge:**
- *(TODO)*
+#### Opportunity
+- **Slow latency**: lost users.
+- **Failure modes**: incidents.
+- **Vendor lock-in**.

-**When not to use it:**
- *(TODO)*
+### Benefit components

-## 🧪 Validation Status (Validation)
+#### Quantitative
+- **Revenue**: higher conversion.
+- **Cost saving**: automated labor.
+- **Time**: shorter turnaround.
+- **Quality**: fewer errors.

- **Information status:** needs_review
- **Source trust level:** A
- **Review reason:** *(P-Reinforce Phase 1 auto-normalization. Body needs verification.)*
+#### Qualitative
+- **UX improvement**.
+- **Brand / competitive moat**.
+- **Compliance protection**.
+- **Strategic optionality**.

-## 🧬 Duplicate Check
+### LLM economics (modern)
+- **API**: OpenAI / Anthropic, billed per token.
+- **Self-host**: Llama / Mistral, paying for hardware and ops.
+- **Managed**: Bedrock / Azure OpenAI, under an enterprise contract.
+- **Hybrid**: self-host the critical path and send burst traffic to an API.

- **Existing similar documents:** *(TODO: see the indexer cluster report)*
- **Handling:** UPDATE (auto-normalization)
- **Reason:** Phase 1 normalization: backfill the old template and missing fields.
+### Cost per 1M tokens (USD, 2026 typical)
+| Tier | Input $/1M | Output $/1M |
+|---|---|---|
+| Frontier (GPT-4o, Claude Sonnet) | 3-5 | 15-20 |
+| Mid (Haiku, GPT-4o-mini) | 0.3-1 | 1-5 |
+| Open (Llama-3.3 70B via API) | 0.5-1 | 0.7-1 |
+| Self-host (estimated, amortized) | 0.05-0.5 | 0.1-1 |

-## 🕓 Changelog
+→ Whether self-hosting breaks even depends on usage volume.

-| Date | Change | Handling | Trust |
-|------|-----------|-----------|--------|
-| 2026-05-08 | P-Reinforce Phase 1 normalization (frontmatter + header standardization) | UPDATE | A |
+### ROI calculation
+$$\text{ROI} = \frac{\text{Benefit} - \text{Cost}}{\text{Cost}}$$
+
+This ratio is simple, but a multi-period DCF / NPV analysis is more accurate because it accounts for when costs and benefits occur.
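
The difference between the single-period ROI ratio and a multi-period NPV can be sketched as follows. This is a minimal illustration, not part of the original commit: the function names, the monthly discount rate, and the cash-flow figures are all hypothetical assumptions.

```python
def simple_roi(benefit: float, cost: float) -> float:
    """Single-period ROI: (benefit - cost) / cost."""
    if cost == 0:
        raise ValueError("cost must be non-zero")
    return (benefit - cost) / cost


def npv(rate_per_period: float, cash_flows: list[float]) -> float:
    """Net present value; cash_flows[0] occurs now and is not discounted."""
    return sum(cf / (1 + rate_per_period) ** t for t, cf in enumerate(cash_flows))


# Hypothetical project: $50k setup now, then $6k net benefit per month for a year.
flows = [-50_000.0] + [6_000.0] * 12
print(f"Simple ROI: {simple_roi(benefit=72_000, cost=50_000):.1%}")
print(f"NPV at 1%/month: ${npv(0.01, flows):,.0f}")
```

Because later benefits are discounted, the NPV is lower than the undiscounted sum of the same cash flows, which is why the simple ratio tends to flatter projects with large upfront costs.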
+
+### Build vs buy vs API
+| Option | When |
+|---|---|
+| API | Low/medium volume + frontier capability |
+| Self-host | High volume + cost-sensitive + privacy |
+| Build (custom) | Differentiator + IP |
+| Buy (vendor) | Generic + fast |
+
+### Cost optimization techniques
+1. **Caching**: serve repeated prompts from a cache.
+2. **Routing** (mix of models): send simple queries to a cheap model and complex ones to a large model.
+3. **Batching**: higher throughput.
+4. **Quantization**: INT8 / INT4.
+5. **Distillation**: fine-tune a small model.
+6. **Spot / preemptible instances**: for batch workloads only.
+7. **Right-sizing**: avoid over-provisioning.
+8. **RAG vs fine-tune**: prefer whichever is cheaper for the workload.
+9. **Token compression** (prompt engineering).
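
Technique 1 (application-level caching of repeated prompts) can be sketched as an exact-match response cache. This is a minimal sketch under stated assumptions: `ResponseCache` and `fake_generate` are hypothetical names, the cache is in-memory with no expiry, and it only helps when responses are deterministic enough to reuse.

```python
import hashlib
from typing import Callable, Dict


class ResponseCache:
    """Exact-match cache keyed on (model, prompt); hypothetical helper."""

    def __init__(self, generate: Callable[[str, str], str]):
        self._generate = generate          # function that performs the paid call
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Hash model and prompt together so different models never share entries.
        return hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()

    def get(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1                 # served without a paid call
            return self._store[key]
        self.misses += 1
        result = self._generate(model, prompt)
        self._store[key] = result
        return result


def fake_generate(model: str, prompt: str) -> str:
    """Stand-in for a real LLM call (assumption for illustration)."""
    return f"[{model}] answer to: {prompt}"


cache = ResponseCache(fake_generate)
cache.get("small-model", "What is TCO?")
cache.get("small-model", "What is TCO?")   # second call is a cache hit
print(f"hits={cache.hits} misses={cache.misses}")
```

The hit rate from such a cache is one input to the cost model: only the misses incur a per-call charge.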
+
+### Sustainability
+- CO₂ per inference.
+- ML CO2 calculator.
+- Green compute (Google, Microsoft).
+
+## 💻 Patterns
+
+### TCO calculator
+```python
+def total_cost_of_ownership(
+    monthly_inference_count: int,
+    cost_per_inference: float,
+    monthly_storage_gb: float,
+    storage_cost_per_gb: float,
+    monthly_engineering_hours: float,
+    eng_hourly_rate: float,
+    one_time_setup_cost: float,
+    months: int = 12,
+):
+    inference_cost = monthly_inference_count * cost_per_inference * months
+    storage_cost = monthly_storage_gb * storage_cost_per_gb * months
+    eng_cost = monthly_engineering_hours * eng_hourly_rate * months
+    return one_time_setup_cost + inference_cost + storage_cost + eng_cost
+
+tco = total_cost_of_ownership(
+    monthly_inference_count=1_000_000,
+    cost_per_inference=0.001,  # $0.001 / inference
+    monthly_storage_gb=500,
+    storage_cost_per_gb=0.025,
+    monthly_engineering_hours=80,
+    eng_hourly_rate=150,
+    one_time_setup_cost=50_000,
+)
+print(f'12-month TCO: ${tco:,.0f}')
+```
+
+### Build vs API break-even
+```python
+def break_even_volume(
+    api_cost_per_call: float,
+    self_host_fixed_monthly: float,  # GPU + ops
+    self_host_marginal_per_call: float,  # e.g. electricity
+):
+    """Return the break-even number of calls per month."""
+    if api_cost_per_call <= self_host_marginal_per_call:
+        return float('inf')  # the API is always cheaper
+    return self_host_fixed_monthly / (api_cost_per_call - self_host_marginal_per_call)
+
+# Example
+break_even = break_even_volume(
+    api_cost_per_call=0.005,
+    self_host_fixed_monthly=4000,  # 1× A100 + ops
+    self_host_marginal_per_call=0.0001,
+)
+print(f'Break-even: {break_even:,.0f} calls/month')
+```
+
+### LLM routing (multi-model)
+```python
+# `llm` is assumed to be an async client exposing generate(prompt, model=...);
+# it is not defined in this snippet.
+class CostAwareRouter:
+    def __init__(self):
+        self.simple_model = 'gpt-4o-mini'  # $0.15 / 1M input
+        self.complex_model = 'gpt-4o'  # $5 / 1M input
+        self.judge_model = 'gpt-4o-mini'  # cheap classifier
+
+    async def route(self, query: str):
+        # 1. Classify complexity with the cheap judge model.
+        complexity = await self.classify_complexity(query)
+
+        # 2. Route: anything not labeled 'simple' falls through to the larger model.
+        if complexity == 'simple':
+            return await llm.generate(query, model=self.simple_model)
+        else:
+            return await llm.generate(query, model=self.complex_model)
+
+    async def classify_complexity(self, query):
+        prompt = f"Classify '{query}' as 'simple' or 'complex'. Reply 1 word."
+        return (await llm.generate(prompt, model=self.judge_model)).strip().lower()
+```
+
+### Prompt cache (Anthropic / OpenAI)
+```python
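+# Claude prompt caching
+# LARGE_SYSTEM_PROMPT_10K_TOKENS and user_query are assumed to be defined by the caller.
+import anthropic
+client = anthropic.Anthropic()
+
+response = client.messages.create(
+    model='claude-sonnet-4-6',
+    max_tokens=1024,
+    system=[
+        {
+            'type': 'text',
+            'text': LARGE_SYSTEM_PROMPT_10K_TOKENS,  # this block is cached
+            'cache_control': {'type': 'ephemeral'},
+        },
+    ],
+    messages=[{'role': 'user', 'content': user_query}],
+)
+# Cache reads are billed at a steep discount (about 90% off the base input price
+# for the cached portion); cache writes cost extra, so verify current pricing.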
+```
+
+### Token cost estimation (Tiktoken)
+```python
+import tiktoken
+
+def estimate_cost(text: str, model='gpt-4o', kind='input'):
+    """Estimate the USD cost of `text` billed as input or output tokens."""
+    enc = tiktoken.encoding_for_model(model)
+    n_tokens = len(enc.encode(text))
+    # USD per 1M tokens; illustrative values, check current provider pricing.
+    rates = {
+        'gpt-4o': {'input': 5.00, 'output': 15.00},
+        'gpt-4o-mini': {'input': 0.15, 'output': 0.60},
+    }[model]
+    return n_tokens / 1_000_000 * rates[kind]
+```
+
+### A/B test ROI measurement
+```python
+def calculate_ai_lift(control_metrics, variant_metrics):
+    """Calculate revenue lift from an A/B test (control avg_revenue must be non-zero)."""
+    revenue_lift = (variant_metrics['avg_revenue'] - control_metrics['avg_revenue']) / control_metrics['avg_revenue']
+    
+    monthly_users = variant_metrics['user_count']
+    monthly_lift_revenue = monthly_users * (variant_metrics['avg_revenue'] - control_metrics['avg_revenue'])
+    
+    annual_lift = monthly_lift_revenue * 12
+    
+    return {
+        'revenue_lift_pct': revenue_lift * 100,
+        'monthly_lift_$': monthly_lift_revenue,
+        'annual_lift_$': annual_lift,
+    }
+```
+
+### Batch vs real-time decision
+```python
+def latency_cost_tradeoff(latency_sla_ms, traffic_qps):
+    if latency_sla_ms > 5000 and traffic_qps < 10:
+        return 'batch'  # spot instances, async processing
+    if latency_sla_ms < 100:
+        return 'realtime + warm + autoscale'
+    return 'standard online'
+```
+
+### Sustainability tracker
+```python
+def co2_per_inference(model_size_b, gpu_tdp_w, tokens_per_sec, grid_g_per_kwh=400):
+    """Estimate grams of CO2 per token (model_size_b is currently unused)."""
+    energy_per_token_wh = gpu_tdp_w / tokens_per_sec / 3600  # W / (tokens/s) = J/token; / 3600 = Wh/token
+    co2_g = energy_per_token_wh / 1000 * grid_g_per_kwh  # Wh -> kWh, then grams via grid intensity
+    return co2_g  # grams CO2 / token
+
+# GPT-4o: ~0.0001 g CO2 / token (rough, unverified estimate).
+```
+
+## 🤔 Decision criteria
+| Situation | Approach |
+|---|---|
+| Low volume | API |
+| High volume + privacy | Self-host |
+| Mixed | Routing (cheap + expensive) |
+| Predictable batch | Spot + offline |
+| Real-time | Cache + warm + ANN |
+| Frontier capability | API (latest) |
+| Cost-sensitive | Open model + quantization |
+
+**Default**: routing + caching + batching + right-sizing.
+
+## 🔗 Graph
+- Parents: [[Business-Strategy]] · [[FinOps]] · [[MLOps]]
+- Variants: [[TCO]] · [[ROI]] · [[NPV]] · [[Build-vs-Buy]]
+- Applications: [[LLM-Routing]] · [[Prompt-Caching]] · [[Quantization]] · [[RAG]]
+- Adjacent: [[Batch-Inference]] · [[Bottlenecks]] · [[Bayesian-Optimization]] (hyperparam ROI) · [[Bioenergetics]] (energy)
+
+## 🤖 LLM usage
+**When to use**: AI strategy, build-vs-buy decisions, cost optimization, vendor selection.
+**When not to use**: research or experiments, which are judged by different metrics.
+
+## ❌ Anti-patterns
+- **Vanity model**: using a frontier model where it is not needed.
+- **No caching** (repeated prompts): large waste.
+- **Single model for every task**: higher cost.
+- **No A/B test**: ROI cannot be proven.
+- **Hidden costs** (egress, monitoring): surprise bills.
+- **No sustainability tracking**.
+
+## 🧪 Verification / duplicates
+- Verified (FinOps Foundation, ML CO2 papers, OpenAI / Anthropic pricing).
+- Trust level: A.
+- Related: [[Batch-Inference]] · [[MLOps]] · [[Bottlenecks]] · [[Antifragility]] · [[Axify]].
+
+## 🕓 Changelog
+| Date | Change |
+|---|---|
+| 2026-05-08 | Phase 1 |
+| 2026-05-10 | Manual cleanup: TCO + LLM economics + routing / caching / break-even / CO2 code |