---
id: wiki-2026-0508-cost-benefit-ai
title: Cost-Benefit Analysis in AI
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [AI ROI, AI cost analysis, total cost of ownership, FinOps for ML, model economics, LLM cost]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [ai-economics, roi, finops, mlops, llm-cost, total-cost-ownership, business-strategy]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: business analysis
  applicable_to: [AI Strategy, MLOps, Cost Engineering, Product Decision]
---
# Cost-Benefit Analysis in AI
## In One Line
> **"Does the value a model delivers justify what it costs?"** Weigh GPU bills, data, and MLOps spend against revenue gained, time saved, and risk reduced. Modern concerns include LLM token economics, build vs. buy vs. API, and sustainability.
## Core Concepts
### Cost components
#### Direct
- **Compute**: GPU/TPU hours.
- **Storage**: model weights, data, and logs.
- **Inference**: per-token or per-image cost.
- **Training**: fine-tuning runs.
- **Network**: data egress.
#### Indirect
- **Data**: collection, labeling, and cleaning.
- **MLOps**: the platform team.
- **Engineering**: development and integration.
- **Maintenance**: retraining and monitoring.
- **Compliance**: audits and security.
#### Opportunity
- **Slow latency**: users lost to slow responses.
- **Failure modes**: incidents.
- **Vendor lock-in**.
### Benefit components
#### Quantitative
- **Revenue**: higher conversion.
- **Cost saving**: automated labor.
- **Time**: faster turnaround.
- **Quality**: fewer errors.
#### Qualitative
- **UX improvement**.
- **Brand / competitive moat**.
- **Compliance protection**.
- **Strategic optionality**.
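Labor automation is usually the easiest benefit to put a number on. A minimal sketch, assuming savings scale linearly with task volume; `annual_labor_savings`, the loaded hourly rate, and the adoption factor are hypothetical inputs, not a standard formula:

```python
def annual_labor_savings(
    tasks_per_month: int,
    minutes_saved_per_task: float,
    loaded_hourly_rate: float,
    adoption_rate: float = 1.0,
) -> float:
    """Annual USD value of staff time freed by automating a task (hypothetical helper).

    adoption_rate is the share of tasks that actually go through the AI path.
    """
    hours_saved_per_month = tasks_per_month * minutes_saved_per_task / 60
    return hours_saved_per_month * loaded_hourly_rate * adoption_rate * 12


# Hypothetical: 20,000 tickets/month, 3 minutes saved each, $40/h loaded cost, 60% adoption.
savings = annual_labor_savings(20_000, 3, 40, adoption_rate=0.6)
print(f"Annual labor savings: ${savings:,.0f}")
```

The result (about $288k/year with these inputs) is a gross benefit. It still has to be weighed against the total cost of running the system.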
### LLM economics (modern)
- **API**: OpenAI / Anthropic, billed per token.
- **Self-host**: Llama / Mistral; you pay for hardware and operations.
- **Managed**: Bedrock / Azure OpenAI, under enterprise contracts.
- **Hybrid**: critical traffic runs on self-hosted models, and bursts go to an API.
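The hybrid setup can be costed as a fixed self-hosted fleet plus pay-per-call overflow. A minimal sketch, assuming all traffic above the fleet's capacity spills to an API and ignoring the fleet's own per-call marginal cost; the function name and figures are hypothetical:

```python
def hybrid_monthly_cost(
    monthly_requests: int,
    self_host_capacity: int,         # requests/month the owned fleet can serve
    self_host_fixed_monthly: float,  # GPUs + ops, paid whether used or not
    api_cost_per_request: float,     # price of each overflow (burst) request
) -> float:
    """Self-host serves traffic up to capacity; the overflow is sent to a paid API."""
    overflow = max(0, monthly_requests - self_host_capacity)
    return self_host_fixed_monthly + overflow * api_cost_per_request


# Hypothetical: 1.2M requests, fleet sized for 1M, $4,000/month fleet, $0.005 per API call.
print(f"${hybrid_monthly_cost(1_200_000, 1_000_000, 4_000, 0.005):,.0f}/month")
```

Sizing the fleet near the steady-state load keeps the fixed cost working. Only peaks are billed per call.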
### Cost per 1M tokens (typical, 2026)
| Tier | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|
| Frontier (GPT-4o, Claude Sonnet) | 2.5-3 | 10-15 |
| Mid (Haiku, GPT-4o-mini) | 0.15-1 | 0.6-5 |
| Open (Llama-3.3 70B via API) | 0.5-1 | 0.7-1 |
| Self-host (estimated, amortized) | 0.05-0.5 | 0.1-1 |
→ Whether self-hosting breaks even depends on usage volume. Its fixed hardware and ops costs only fall below API prices at high volume.
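Per-token prices become easier to compare once they are turned into a cost per request. A minimal sketch, assuming a fixed request shape (2,000 prompt tokens, 500 completion tokens); the prices are illustrative values chosen from inside the table's ranges, not vendor quotes:

```python
def cost_per_request(input_tokens: int, output_tokens: int,
                     input_usd_per_m: float, output_usd_per_m: float) -> float:
    """USD for one request, given $/1M-token prices for input and output."""
    return (input_tokens * input_usd_per_m + output_tokens * output_usd_per_m) / 1_000_000


# Illustrative (input, output) $/1M prices picked from within the table's ranges.
tiers = {
    "frontier": (3.00, 15.00),
    "mid": (0.30, 1.50),
    "open_api": (0.60, 0.80),
}
for name, (p_in, p_out) in tiers.items():
    per_million_requests = cost_per_request(2_000, 500, p_in, p_out) * 1_000_000
    print(f"{name:>9}: ${per_million_requests:,.0f} per 1M requests")
```

Output-heavy workloads shift the comparison. Rerun it with your real token counts per request.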
### 매 ROI calculation
$$\text{ROI} = \frac{\text{Benefit} - \text{Cost}}{\text{Cost}}$$
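Simple ROI is easy to compute. For multi-period projects, a discounted cash flow (DCF) / net present value (NPV) analysis is more accurate because it accounts for when costs and benefits occur.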
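The difference between the ROI formula above and a discounted view shows up as soon as benefits arrive later than costs. A minimal sketch using the standard NPV definition; the project figures and the 10% discount rate are hypothetical:

```python
def simple_roi(benefit: float, cost: float) -> float:
    """Single-period ROI = (Benefit - Cost) / Cost."""
    return (benefit - cost) / cost


def npv(rate: float, cash_flows: list[float]) -> float:
    """Net present value; cash_flows[0] is at t=0, cash_flows[t] at the end of period t."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


# Hypothetical project: $300k up front, then $120k net benefit per year for 3 years.
flows = [-300_000, 120_000, 120_000, 120_000]
print(f"Simple ROI: {simple_roi(360_000, 300_000):.0%}")  # ignores timing
print(f"NPV @ 10%: {npv(0.10, flows):,.0f} USD")         # discounts later years
```

The simple ROI here is 20%, yet the NPV at a 10% discount rate is slightly negative (about -1,578 USD) because the benefits arrive after the cost.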
### Build vs. buy vs. API
| Option | When to choose |
|---|---|
| API | Low/medium volume + frontier capability |
| Self-host | High volume + cost-sensitive + privacy |
| Build (custom) | Differentiator + IP |
| Buy (vendor) | Generic + fast |
### Cost optimization techniques
1. **Caching**: identical prompts are served from cache.
2. **Routing** (mix of models): simple queries go to a cheap model, complex ones to a large model.
3. **Batching**: higher throughput per GPU.
4. **Quantization**: INT8 / INT4 weights.
5. **Distillation**: fine-tune a smaller model to replace a larger one.
6. **Spot / preemptible instances**: for batch workloads only.
7. **Right-sizing**: avoid over-provisioning.
8. **RAG vs. fine-tuning**: RAG is often the cheaper way to add knowledge.
9. **Token compression** (prompt engineering): shorter prompts mean fewer billed tokens.
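Quantization and right-sizing both depend on how much accelerator memory the weights need. A back-of-the-envelope sketch, assuming memory ≈ parameter count × bits per weight / 8. The helper name is hypothetical, and the figure excludes KV cache, activations, and runtime overhead:

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate GB (1e9 bytes) needed to hold the model weights alone.

    Excludes KV cache, activations, and framework overhead, which come on top.
    """
    return params_billions * bits_per_weight / 8  # billions of params x bytes per param


for bits in (16, 8, 4):  # FP16/BF16, INT8, INT4
    print(f"70B @ {bits:>2}-bit: {weight_memory_gb(70, bits):.0f} GB")
```

For a 70B model this gives roughly 140 GB at 16-bit, 70 GB at INT8, and 35 GB at INT4. At INT4, the weights fit on a single 80 GB GPU with room left for the KV cache.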
### Sustainability
- CO₂ emitted per inference.
- The ML CO2 calculator.
- Green compute (Google, Microsoft).
## 💻 Patterns
### TCO calculator
```python
def total_cost_of_ownership(
    monthly_inference_count: int,
    cost_per_inference: float,
    monthly_storage_gb: float,
    storage_cost_per_gb: float,
    monthly_engineering_hours: float,
    eng_hourly_rate: float,
    one_time_setup_cost: float,
    months: int = 12,
) -> float:
    """Total cost over `months`: one-time setup plus recurring inference, storage, and engineering."""
    inference_cost = monthly_inference_count * cost_per_inference * months
    storage_cost = monthly_storage_gb * storage_cost_per_gb * months
    eng_cost = monthly_engineering_hours * eng_hourly_rate * months
    return one_time_setup_cost + inference_cost + storage_cost + eng_cost


tco = total_cost_of_ownership(
    monthly_inference_count=1_000_000,
    cost_per_inference=0.001,  # $0.001 / inference
    monthly_storage_gb=500,
    storage_cost_per_gb=0.025,  # $ per GB-month
    monthly_engineering_hours=80,
    eng_hourly_rate=150,
    one_time_setup_cost=50_000,
)
print(f'12-month TCO: ${tco:,.0f}')  # 12-month TCO: $206,150
```
### Build vs API break-even
```python
def break_even_volume(
    api_cost_per_call: float,
    self_host_fixed_monthly: float,  # GPU + ops, paid regardless of volume
    self_host_marginal_per_call: float,  # per-call variable cost, e.g. electricity
) -> float:
    """Monthly call volume at which self-hosting and the API cost the same."""
    if api_cost_per_call <= self_host_marginal_per_call:
        return float('inf')  # API is always cheaper: self-hosting never breaks even
    return self_host_fixed_monthly / (api_cost_per_call - self_host_marginal_per_call)


# Example
break_even = break_even_volume(
    api_cost_per_call=0.005,
    self_host_fixed_monthly=4000,  # 1× A100 + ops
    self_host_marginal_per_call=0.0001,
)
print(f'Break-even: {break_even:,.0f} calls/month')  # Break-even: 816,327 calls/month
```
### LLM routing (multi-model)
```python
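from typing import Protocol


class LLMClient(Protocol):
    """Hypothetical async client: any object with this method works."""

    async def generate(self, prompt: str, model: str) -> str: ...


class CostAwareRouter:
    def __init__(self, llm: LLMClient):
        self.llm = llm
        self.simple_model = 'gpt-4o-mini'  # $0.15 / 1M input tokens
        self.complex_model = 'gpt-4o'      # $2.50 / 1M input tokens
        self.judge_model = 'gpt-4o-mini'   # cheap classifier

    async def route(self, query: str) -> str:
        # 1. Classify complexity with the cheap judge model.
        complexity = await self.classify_complexity(query)
        # 2. Simple -> cheap model; anything else (incl. unexpected judge output) -> big model.
        model = self.simple_model if complexity == 'simple' else self.complex_model
        return await self.llm.generate(query, model=model)

    async def classify_complexity(self, query: str) -> str:
        prompt = f"Classify '{query}' as 'simple' or 'complex'. Reply with one word."
        reply = await self.llm.generate(prompt, model=self.judge_model)
        return reply.strip().strip('.').lower()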
```
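Whether the router above saves money depends on the traffic mix. Every query pays for the judge call, and only the share routed to the small model is cheaper. A minimal expected-cost sketch; `routed_cost_per_query` and all per-query costs are hypothetical, and the quality cost of misrouted queries is ignored:

```python
def routed_cost_per_query(
    p_simple: float,
    cheap_cost: float,
    expensive_cost: float,
    judge_cost: float,
) -> float:
    """Expected USD per query: judge call + cheap share + expensive share."""
    return judge_cost + p_simple * cheap_cost + (1 - p_simple) * expensive_cost


# Hypothetical per-query costs; 70% of traffic is classified as simple.
always_big = 0.010
routed = routed_cost_per_query(0.7, cheap_cost=0.0006, expensive_cost=0.010, judge_cost=0.0001)
print(f"routed ${routed:.5f} vs always-big ${always_big:.5f} per query "
      f"({1 - routed / always_big:.0%} saved)")
```

As `p_simple` falls toward zero, routing stops paying off. Every query then pays for the judge on top of the expensive model.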
### Prompt cache (Anthropic / OpenAI)
```python
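# Claude prompt caching: mark a large, stable prefix as cacheable.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
# Placeholder: a long, reused system prompt (must exceed the model's minimum cacheable length).
LARGE_SYSTEM_PROMPT_10K_TOKENS = "..."
user_query = "..."  # placeholder: the per-request user message

response = client.messages.create(
    model='claude-sonnet-4-6',
    max_tokens=1024,
    system=[
        {
            'type': 'text',
            'text': LARGE_SYSTEM_PROMPT_10K_TOKENS,  # cached prefix
            'cache_control': {'type': 'ephemeral'},
        },
    ],
    messages=[{'role': 'user', 'content': user_query}],
)
# Usage reports cache writes and cache reads separately.
print(response.usage.cache_creation_input_tokens, response.usage.cache_read_input_tokens)
# Cache reads cost ~10% of the base input price (about 90% off the cached portion);
# 5-minute cache writes cost ~25% more than base input.
# OpenAI caches prompts of 1,024+ tokens automatically, with no flag needed.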
```
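The discount on cached tokens applies only to cache reads, and the first request pays a write premium. A minimal sketch of the arithmetic follows. It makes three assumptions. Writes cost 1.25× and reads 0.1× the base input price (the 5-minute cache). Base input costs an assumed $3/1M. Every call after the first hits the cache. `cached_prompt_cost` is a hypothetical helper:

```python
def cached_prompt_cost(
    prefix_tokens: int,
    calls: int,
    input_usd_per_m: float,
    write_mult: float = 1.25,  # assumed cache-write multiplier
    read_mult: float = 0.10,   # assumed cache-read multiplier
) -> tuple[float, float]:
    """USD to send the same prefix `calls` times: (without caching, with caching).

    Assumes the first call writes the cache and every later call reads it (no expiry).
    """
    per_token = input_usd_per_m / 1_000_000
    uncached = prefix_tokens * calls * per_token
    cached = prefix_tokens * per_token * (write_mult + read_mult * (calls - 1))
    return uncached, cached


no_cache, with_cache = cached_prompt_cost(10_000, calls=100, input_usd_per_m=3.0)
print(f"without cache: ${no_cache:.2f}, with cache: ${with_cache:.2f}")
```

With these multipliers, caching already wins on the second call: 1.25 + 0.1 = 1.35 units, against 2 units uncached. Here, 100 calls cost about $0.33 instead of $3.00.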
### Token cost estimation (Tiktoken)
```python
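import tiktoken

# USD per 1M tokens (list prices at time of writing; update as pricing changes).
RATES = {
    'gpt-4o': {'input': 2.50, 'output': 10.00},
    'gpt-4o-mini': {'input': 0.15, 'output': 0.60},
}


def estimate_cost(text: str, model: str = 'gpt-4o', kind: str = 'input') -> float:
    """Estimate the USD cost of `text` billed as input or output tokens for `model`.

    Counts raw text tokens only; chat message formatting adds a few tokens per message.
    """
    enc = tiktoken.encoding_for_model(model)
    n_tokens = len(enc.encode(text))
    return n_tokens / 1_000_000 * RATES[model][kind]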
```
### A/B test ROI measurement
```python
def calculate_ai_lift(control_metrics: dict, variant_metrics: dict) -> dict:
    """Project the revenue lift from an A/B test of an AI feature.

    Assumes 'avg_revenue' is monthly revenue per user and 'user_count' is the
    monthly user base the lift is projected onto. Point estimate only.
    """
    delta = variant_metrics['avg_revenue'] - control_metrics['avg_revenue']
    revenue_lift = delta / control_metrics['avg_revenue']
    monthly_users = variant_metrics['user_count']
    monthly_lift_revenue = monthly_users * delta
    annual_lift = monthly_lift_revenue * 12
    return {
        'revenue_lift_pct': revenue_lift * 100,
        'monthly_lift_$': monthly_lift_revenue,
        'annual_lift_$': annual_lift,
    }
```
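`calculate_ai_lift` reports a point estimate. Before multiplying a lift by 12, check that it isn't noise. A minimal standard-library sketch, assuming per-user revenue samples for each arm and large samples. It uses a normal approximation to Welch's t-test rather than the exact t distribution. `lift_significance` is a hypothetical helper:

```python
from math import sqrt
from statistics import NormalDist, fmean, variance


def lift_significance(control: list[float], variant: list[float]) -> tuple[float, float]:
    """Mean revenue difference (variant - control) and a two-sided p-value.

    Normal approximation to Welch's two-sample test; only reasonable for large samples.
    """
    diff = fmean(variant) - fmean(control)
    se = sqrt(variance(control) / len(control) + variance(variant) / len(variant))
    if se == 0:
        raise ValueError("zero variance in both arms; p-value undefined")
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, p_value


# Hypothetical per-user monthly revenue samples.
control = [10.0, 12.0] * 500
variant = [11.0, 13.0] * 500
diff, p = lift_significance(control, variant)
print(f"lift: ${diff:.2f}/user, p = {p:.3g}")
```

Only a lift with a small p-value (and a sample size planned in advance) should feed the annual projection.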
### Batch vs real-time decision
```python
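def latency_cost_tradeoff(latency_sla_ms: float, traffic_qps: float) -> str:
    """Pick a serving mode from the latency SLA and the traffic level."""
    if latency_sla_ms > 5000 and traffic_qps < 10:
        return 'batch'  # async jobs, can run on spot capacity
    if latency_sla_ms < 100:
        return 'realtime + warm + autoscale'  # keep instances warm to avoid cold starts
    return 'standard online'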
```
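When `latency_cost_tradeoff` returns `'batch'`, the provider's batch endpoint is often the cheapest option. At the time of writing, OpenAI's Batch API and Anthropic's Message Batches API both advertised a 50% discount on asynchronous jobs. A minimal sketch, assuming a flat discount; `batch_savings` and the traffic figures are hypothetical:

```python
def batch_savings(
    monthly_requests: int,
    cost_per_request: float,
    batchable_share: float,
    batch_discount: float = 0.5,  # assumed provider batch discount; verify current pricing
) -> float:
    """Monthly USD saved by moving the latency-tolerant share of traffic to a batch endpoint."""
    if not 0 <= batchable_share <= 1:
        raise ValueError("batchable_share must be in [0, 1]")
    return monthly_requests * batchable_share * cost_per_request * batch_discount


# Hypothetical: 2M requests/month at $0.004 each; 40% of them can wait for a batch job.
print(f"Saved: ${batch_savings(2_000_000, 0.004, 0.4):,.0f}/month")
```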
### Sustainability tracker
```python
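def co2_per_inference(gpu_tdp_w: float, tokens_per_sec: float, grid_g_per_kwh: float = 400) -> float:
    """Rough grams of CO2 per generated token (multiply by tokens per request for a per-inference figure).

    Treats GPU TDP as the power draw and ignores CPU, cooling (PUE), and idle time.
    """
    energy_per_token_wh = gpu_tdp_w / tokens_per_sec / 3600  # W / (tokens/s) = J/token; / 3600 -> Wh
    co2_g = energy_per_token_wh / 1000 * grid_g_per_kwh  # Wh -> kWh, times grid intensity (g/kWh)
    return co2_g  # grams CO2 / token

# GPT-4o: ~0.0001 g CO2 / token (rough, unofficial estimate).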
```
## 🤔 Decision Criteria
| Situation | Approach |
|---|---|
| Low volume | API |
| High volume + privacy | Self-host |
| Mixed | Routing (cheap + expensive) |
| Predictable batch | Spot + offline |
| Real-time | Cache + warm + ANN |
| Frontier capability | API (latest) |
| Cost-sensitive | Open model + quantization |
**Default**: routing + caching + batching + right-sizing.
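Caching can also sit in front of the provider. An exact-match response cache skips the API call entirely for repeated prompts. A minimal in-process sketch, assuming identical answers are acceptable for identical prompts. `ResponseCache` and the injected `generate` function are hypothetical, not a library API:

```python
import hashlib
from collections import OrderedDict
from collections.abc import Callable


class ResponseCache:
    """Exact-match LRU cache for LLM responses (hypothetical helper)."""

    def __init__(self, generate: Callable[[str, str], str], max_items: int = 10_000):
        self._generate = generate  # any function (model, prompt) -> response text
        self._store: OrderedDict[str, str] = OrderedDict()
        self._max_items = max_items
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Hash model + prompt so that different models never share entries.
        return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

    def get(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._store[key]
        self.misses += 1
        text = self._generate(model, prompt)  # the only paid call
        self._store[key] = text
        if len(self._store) > self._max_items:
            self._store.popitem(last=False)  # evict the least recently used entry
        return text
```

`hits / (hits + misses)` is the hit rate. Multiplied by the average cost per call, it gives the saving. Exact matching only helps where prompts genuinely repeat (FAQ-style traffic), because paraphrased queries miss.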
## 🔗 Graph
- Parent: [[Business-Strategy]] · [[FinOps]] · [[MLOps]]
- Applied: [[LLM_Optimization_and_Deployment_Strategies|Quantization]] · [[RAG]]
- Adjacent: [[Batch-Inference]] · [[Bottlenecks]] · [[Bayesian-Optimization]] (hyperparam ROI) · [[Bioenergetics]] (energy)
## 🤖 LLM Usage
**When**: AI strategy, build vs. buy decisions, cost optimization, vendor selection.
**When not**: research and experimentation, where success is measured by different metrics.
## ❌ Anti-patterns
- **Vanity model**: using a frontier model where it isn't needed.
- **No caching** (repeated prompts): large, avoidable waste.
- **One model for every task**: higher cost.
- **No A/B test**: ROI can't be proven.
- **Hidden costs** (egress, monitoring): billing surprises.
- **No sustainability tracking**.
## 🧪 Verification / Duplicates
- Verified against FinOps Foundation material, ML CO2 papers, and OpenAI / Anthropic pricing.
- Trust level: A.
- Related: [[Batch-Inference]] · [[MLOps]] · [[Bottlenecks]] · [[Antifragility]] · [[Axify]].
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: TCO + LLM economics + routing / caching / break-even / CO2 code |