---
id: wiki-2026-0508-cost-benefit-ai
title: Cost-Benefit Analysis in AI
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [AI ROI, AI cost analysis, total cost of ownership, FinOps for ML, model economics, LLM cost]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [ai-economics, roi, finops, mlops, llm-cost, total-cost-ownership, business-strategy]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: business analysis
  applicable_to: [AI Strategy, MLOps, Cost Engineering, Product Decision]
---

# Cost-Benefit Analysis in AI

## In one line

> **"Is the cost of every model parameter justified by its value?"** Weigh the GPU bill, data, and MLOps costs against revenue, time saved, and risk reduced. Modern concerns: LLM token economics, build vs buy vs API, and sustainability.

## Core concepts

### Cost components

#### Direct

- **Compute**: GPU/TPU hours.
- **Storage**: model weights, data, and logs.
- **Inference**: cost per token or per image.
- **Training**: each fine-tuning run.
- **Network**: egress.

#### Indirect

- **Data**: collecting, labeling, cleaning.
- **MLOps**: the platform team.
- **Engineering**: development and integration.
- **Maintenance**: retraining and monitoring.
- **Compliance**: audits and security.

#### Opportunity

- **Slow latency**: lost users.
- **Failure modes**: incidents.
- **Vendor lock-in**.

### Benefit components

#### Quantitative

- **Revenue**: higher conversion.
- **Cost saving**: automated labor.
- **Time**: shorter turnaround.
- **Quality**: fewer errors.

#### Qualitative

- **UX improvement**.
- **Brand / competitive moat**.
- **Compliance protection**.
- **Strategic optionality**.

### LLM economics (modern)

- **API** (OpenAI / Anthropic): per-token pricing.
- **Self-host** (Llama / Mistral): hardware plus ops.
- **Managed** (Bedrock / Azure OpenAI): enterprise contract.
- **Hybrid**: critical traffic self-hosted, bursts sent to an API.
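The hybrid option above (critical traffic self-hosted, bursts sent to an API) can be sketched as a simple monthly cost model. This is a minimal sketch under stated assumptions: the function name `hybrid_monthly_cost`, the idea of a fixed monthly self-host capacity, and every price in the example are hypothetical illustrations, not vendor quotes.

```python
def hybrid_monthly_cost(
    monthly_calls: int,
    self_host_capacity_calls: int,       # assumption: calls/month the self-hosted fleet can absorb
    self_host_fixed_monthly: float,      # assumption: GPU + ops, paid regardless of load
    self_host_marginal_per_call: float,  # assumption: e.g. electricity per call
    api_cost_per_call: float,            # assumption: blended API price per call
) -> dict:
    """Serve traffic from the self-hosted fleet up to capacity; overflow goes to the API."""
    self_hosted_calls = min(monthly_calls, self_host_capacity_calls)
    api_calls = monthly_calls - self_hosted_calls

    self_host_cost = self_host_fixed_monthly + self_hosted_calls * self_host_marginal_per_call
    api_cost = api_calls * api_cost_per_call

    return {
        'self_hosted_calls': self_hosted_calls,
        'api_calls': api_calls,
        'self_host_cost': self_host_cost,
        'api_cost': api_cost,
        'total_cost': self_host_cost + api_cost,
    }


# Illustrative numbers only: 1.5M calls, fleet sized for 1M calls/month.
result = hybrid_monthly_cost(
    monthly_calls=1_500_000,
    self_host_capacity_calls=1_000_000,
    self_host_fixed_monthly=4000.0,
    self_host_marginal_per_call=0.0001,
    api_cost_per_call=0.005,
)
print(f"Hybrid total: ${result['total_cost']:,.0f} / month")
```

Sizing the self-hosted fleet for the baseline rather than the peak keeps the fixed cost low, at the price of paying API rates for the overflow.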

### Cost per 1M tokens (2026 typical)

Approximate list-price ranges in USD; check the provider's current pricing before relying on them.

| Tier | Input ($ / 1M tokens) | Output ($ / 1M tokens) |
|---|---|---|
| Frontier (GPT-4o, Claude Sonnet) | 3-5 | 15-20 |
| Mid (Haiku, GPT-4o-mini) | 0.3-1 | 1-5 |
| Open (Llama-3.3 70B via API) | 0.5-1 | 0.7-1 |
| Self-host (estimated, amortized) | 0.05-0.5 | 0.1-1 |

→ Whether self-hosting breaks even depends on usage volume.

### ROI calculation

$$\text{ROI} = \frac{\text{Benefit} - \text{Cost}}{\text{Cost}}$$

This is the simple form; a multi-period DCF / NPV analysis is more accurate.

### Build vs buy vs API

| Option | When |
|---|---|
| API | Low/medium volume + frontier capability |
| Self-host | High volume + cost-sensitive + privacy |
| Build (custom) | Differentiator + IP |
| Buy (vendor) | Generic + fast |

### Cost optimization techniques

1. **Caching**: an identical prompt is served from cache.
2. **Routing** (mix of models): simple queries go to a cheap model, complex ones to a large model.
3. **Batching**: higher throughput.
4. **Quantization**: INT8 / INT4.
5. **Distillation**: fine-tune a small model.
6. **Spot / preemptible instances**: batch workloads only.
7. **Right-sizing**: avoid over-provisioning.
8. **RAG vs fine-tune**: pick the cheaper option.
9. **Token compression** (prompt engineering).

### Sustainability

- CO₂ per inference.
- ML CO2 calculator.
- Green compute (Google, Microsoft).
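
Returning to the ROI calculation above: the note that a multi-period NPV is more accurate than simple ROI can be made concrete with a short sketch. The function names `simple_roi` and `npv`, the per-period discounting convention (cash flows at the end of periods 1..T), and the example figures are assumptions chosen for illustration.

```python
from typing import Sequence


def simple_roi(benefit: float, cost: float) -> float:
    """Single-period ROI = (benefit - cost) / cost."""
    if cost <= 0:
        raise ValueError('cost must be positive')
    return (benefit - cost) / cost


def npv(upfront_cost: float, net_cash_flows: Sequence[float], discount_rate: float) -> float:
    """Net present value.

    NPV = -upfront_cost + sum over t = 1..T of CF_t / (1 + r)^t,
    where CF_t is the net cash flow (benefit minus running cost) in period t
    and r is the discount rate per period.
    """
    discounted = sum(
        cf / (1 + discount_rate) ** t
        for t, cf in enumerate(net_cash_flows, start=1)
    )
    return -upfront_cost + discounted


# Illustrative numbers only: $50k setup, $6k net benefit per month for 12 months,
# discounted at 1% per month.
monthly_flows = [6_000.0] * 12
print(f'Simple ROI: {simple_roi(sum(monthly_flows), 50_000.0):.1%}')
print(f'NPV: ${npv(50_000.0, monthly_flows, 0.01):,.0f}')
```

Simple ROI treats a dollar in month 12 the same as a dollar today; NPV discounts later cash flows, so projects with slow payback look less attractive.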

## 💻 Patterns

### TCO calculator

```python
def total_cost_of_ownership(
    monthly_inference_count: int,
    cost_per_inference: float,
    monthly_storage_gb: float,
    storage_cost_per_gb: float,
    monthly_engineering_hours: float,
    eng_hourly_rate: float,
    one_time_setup_cost: float,
    months: int = 12,
) -> float:
    """Sum one-time setup cost and recurring inference, storage, and engineering cost."""
    inference_cost = monthly_inference_count * cost_per_inference * months
    storage_cost = monthly_storage_gb * storage_cost_per_gb * months
    eng_cost = monthly_engineering_hours * eng_hourly_rate * months
    return one_time_setup_cost + inference_cost + storage_cost + eng_cost


tco = total_cost_of_ownership(
    monthly_inference_count=1_000_000,
    cost_per_inference=0.001,  # $0.001 / inference
    monthly_storage_gb=500,
    storage_cost_per_gb=0.025,
    monthly_engineering_hours=80,
    eng_hourly_rate=150,
    one_time_setup_cost=50_000,
)
print(f'12-month TCO: ${tco:,.0f}')
```

### Build vs API break-even

```python
def break_even_volume(
    api_cost_per_call: float,
    self_host_fixed_monthly: float,      # GPU + ops
    self_host_marginal_per_call: float,  # e.g. electricity
) -> float:
    """Return the calls per month at which self-hosting matches API cost."""
    if api_cost_per_call <= self_host_marginal_per_call:
        return float('inf')  # the API is always cheaper
    return self_host_fixed_monthly / (api_cost_per_call - self_host_marginal_per_call)


# Example
break_even = break_even_volume(
    api_cost_per_call=0.005,
    self_host_fixed_monthly=4000,  # 1× A100 + ops
    self_host_marginal_per_call=0.0001,
)
print(f'Break-even: {break_even:,.0f} calls/month')
```

### LLM routing (multi-model)

```python
class CostAwareRouter:
    """Send each query to a cheap or an expensive model, chosen by a cheap classifier."""

    def __init__(self, llm):
        # `llm` is a hypothetical async client exposing
        # `generate(prompt, model=...) -> str`; adapt it to your SDK.
        self.llm = llm
        self.simple_model = 'gpt-4o-mini'  # $0.15 / 1M input (illustrative)
        self.complex_model = 'gpt-4o'      # $5 / 1M input (illustrative)
        self.judge_model = 'gpt-4o-mini'   # cheap classifier

    async def route(self, query: str) -> str:
        # 1. Classify complexity with the cheap judge model.
        complexity = await self.classify_complexity(query)
        # 2. Route: only an explicit 'simple' label goes to the cheap model.
        model = self.simple_model if complexity == 'simple' else self.complex_model
        return await self.llm.generate(query, model=model)

    async def classify_complexity(self, query: str) -> str:
        prompt = f"Classify '{query}' as 'simple' or 'complex'. Reply 1 word."
        label = (await self.llm.generate(prompt, model=self.judge_model)).strip().lower()
        # Tolerate trailing punctuation; anything unexpected falls back to 'complex'.
        return 'simple' if label.startswith('simple') else 'complex'
```

### Prompt cache (Anthropic / OpenAI)

```python
# Claude prompt caching
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LARGE_SYSTEM_PROMPT_10K_TOKENS = '...'  # placeholder: long, stable system prompt
user_query = '...'                      # placeholder: the per-request question

response = client.messages.create(
    model='claude-sonnet-4-6',
    max_tokens=1024,
    system=[
        {
            'type': 'text',
            'text': LARGE_SYSTEM_PROMPT_10K_TOKENS,  # cached
            'cache_control': {'type': 'ephemeral'},
        },
    ],
    messages=[{'role': 'user', 'content': user_query}],
)
# Cache reads cost roughly 90% less than regular input tokens on the cached portion;
# the first request that writes the cache is billed at a premium.
```

### Token cost estimation (Tiktoken)

```python
import tiktoken


def estimate_cost(text: str, model: str = 'gpt-4o', kind: str = 'input') -> float:
    """Estimate the USD cost of `text` as input or output tokens for `model`."""
    enc = tiktoken.encoding_for_model(model)
    n_tokens = len(enc.encode(text))
    # Illustrative $ per 1M tokens; check the provider's current price list.
    rates = {
        'gpt-4o': {'input': 5.00, 'output': 15.00},
        'gpt-4o-mini': {'input': 0.15, 'output': 0.60},
    }[model]
    return n_tokens / 1_000_000 * rates[kind]
```

### A/B test ROI measurement

```python
def calculate_ai_lift(control_metrics, variant_metrics):
    """Compute revenue lift from an A/B test.

    Both dicts need 'avg_revenue' (assumed: average revenue per user per month);
    variant_metrics also needs 'user_count' (assumed: monthly users).
    """
    control_revenue = control_metrics['avg_revenue']
    if control_revenue == 0:
        raise ValueError('control avg_revenue must be non-zero to compute relative lift')
    revenue_delta = variant_metrics['avg_revenue'] - control_revenue
    revenue_lift = revenue_delta / control_revenue
    monthly_users = variant_metrics['user_count']
    monthly_lift_revenue = monthly_users * revenue_delta
    annual_lift = monthly_lift_revenue * 12
    return {
        'revenue_lift_pct': revenue_lift * 100,
        'monthly_lift_$': monthly_lift_revenue,
        'annual_lift_$': annual_lift,
    }
```

### Batch vs real-time decision

```python
def latency_cost_tradeoff(latency_sla_ms: float, traffic_qps: float) -> str:
    """Pick a serving mode from the latency SLA and traffic level."""
    if latency_sla_ms > 5000 and traffic_qps < 10:
        return 'batch'  # spot instances, async
    if latency_sla_ms < 100:
        return 'realtime + warm + autoscale'
    return 'standard online'
```

### Sustainability tracker

```python
def co2_per_inference(model_size_b, gpu_tdp_w, tokens_per_sec, grid_g_per_kwh=400):
    """Estimate grams of CO2 per generated token.

    `model_size_b` is currently unused; it is kept so callers can record it.
    """
    # W / (tokens/s) = J per token; divide by 3600 to get Wh per token.
    energy_per_token_wh = gpu_tdp_w / tokens_per_sec / 3600
    # Wh -> kWh, then multiply by grid carbon intensity (g CO2 / kWh).
    co2_g = energy_per_token_wh / 1000 * grid_g_per_kwh
    return co2_g  # grams CO2 / token


# GPT-4o: roughly 0.0001 g CO2 / token (estimated).
```

## 🤔 Decision criteria

| Situation | Approach |
|---|---|
| Low volume | API |
| High volume + privacy | Self-host |
| Mixed | Routing (cheap + expensive) |
| Predictable batch | Spot + offline |
| Real-time | Cache + warm + ANN |
| Frontier capability | API (latest) |
| Cost-sensitive | Open model + quantization |

**Default**: routing + caching + batching + right-sizing.

## 🔗 Graph

- Parents: [[Business-Strategy]] · [[FinOps]] · [[MLOps]]
- Applications: [[LLM_Optimization_and_Deployment_Strategies|Quantization]] · [[RAG]]
- Adjacent: [[Batch-Inference]] · [[Bottlenecks]] · [[Bayesian-Optimization]] (hyperparam ROI) · [[Bioenergetics]] (energy)

## 🤖 LLM usage

**When to use**: AI strategy, build vs buy decisions, cost optimization, vendor selection.

**When not to use**: research / experiments (these are judged by different metrics).

## ❌ Anti-patterns

- **Vanity model**: using a frontier model where it is not needed.
- **No caching** (repeated prompts): a large waste.
- **Single model for every task**: higher cost.
- **No A/B test**: ROI cannot be proven.
- **Hidden costs** (egress, monitoring): surprise bills.
- **No sustainability tracking**.

## 🧪 Verification / duplicates

- Verified (FinOps Foundation, ML CO2 papers, OpenAI / Anthropic pricing).
- Trust level: A.
- Related: [[Batch-Inference]] · [[MLOps]] · [[Bottlenecks]] · [[Antifragility]] · [[Axify]].

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: TCO + LLM economics + routing / caching / break-even / CO2 code |