---
id: wiki-2026-0508-cost-benefit-ai
title: Cost-Benefit Analysis in AI
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [AI ROI, AI cost analysis, total cost of ownership, FinOps for ML, model economics, LLM cost]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [ai-economics, roi, finops, mlops, llm-cost, total-cost-ownership, business-strategy]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: business analysis
  applicable_to: [AI Strategy, MLOps, Cost Engineering, Product Decision]
---

# Cost-Benefit Analysis in AI

## One-line summary
> **"Is the cost of every model parameter justified by the value it delivers?"** Weigh the costs (GPU bill, data, MLOps) against the benefits (revenue, time saved, risk reduced). Modern concerns: LLM token economics, build vs buy vs API, and sustainability.

## Key points

### Cost components

#### Direct
- **Compute**: GPU/TPU hours.
- **Storage**: model weights, data, and logs.
- **Inference**: per-token or per-image cost.
- **Training**: each fine-tuning run.
- **Network**: egress.

#### Indirect
- **Data**: collection, labeling, cleaning.
- **MLOps**: the platform team.
- **Engineering**: development and integration.
- **Maintenance**: retraining and monitoring.
- **Compliance**: audits and security.

#### Opportunity
- **Slow latency**: lost users.
- **Failure modes**: incidents.
- **Vendor lock-in**.

### Benefit components

#### Quantitative
- **Revenue**: higher conversion.
- **Cost saving**: automated labor.
- **Time**: shorter turnaround.
- **Quality**: fewer errors.

#### Qualitative
- **UX improvement**.
- **Brand / competitive moat**.
- **Compliance protection**.
- **Strategic optionality**.

### LLM economics (modern)
- **API**: OpenAI / Anthropic, billed per token.
- **Self-host**: Llama / Mistral, paid for as hardware plus ops.
- **Managed**: Bedrock / Azure OpenAI, under an enterprise contract.
- **Hybrid**: self-host the critical workloads and use an API for bursts.

### Cost per 1M tokens (typical, 2026)
| Tier | Input $/1M | Output $/1M |
|---|---|---|
| Frontier (GPT-4o, Claude Sonnet) | 3-5 | 15-20 |
| Mid (Haiku, GPT-4o-mini) | 0.3-1 | 1-5 |
| Open (Llama-3.3 70B via API) | 0.5-1 | 0.7-1 |
| Self-host (estimated, amortized) | 0.05-0.5 | 0.1-1 |

→ The break-even point for self-hosting depends on usage volume.

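
To make the table concrete, the sketch below turns per-1M-token prices into a monthly bill. The prices are mid-points of the ranges in the table, and the workload (2B input and 500M output tokens per month) is a hypothetical assumption. The names `TokenPrice`, `PRICES`, and `monthly_token_cost` are illustrative and come from no library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Price in USD per 1M tokens (illustrative mid-points, not vendor quotes)."""
    input_per_m: float
    output_per_m: float


# Assumed mid-points of the ranges in the table; replace with real quotes.
PRICES = {
    "frontier": TokenPrice(4.0, 17.5),
    "mid": TokenPrice(0.65, 3.0),
    "open_api": TokenPrice(0.75, 0.85),
    "self_host": TokenPrice(0.275, 0.55),
}


def monthly_token_cost(price: TokenPrice, input_tokens: int, output_tokens: int) -> float:
    """Monthly spend in USD for a given token volume at a given price."""
    return (input_tokens / 1_000_000 * price.input_per_m
            + output_tokens / 1_000_000 * price.output_per_m)


# Hypothetical workload: 2B input tokens and 500M output tokens per month.
for tier, price in PRICES.items():
    cost = monthly_token_cost(price, 2_000_000_000, 500_000_000)
    print(f"{tier:10s} ${cost:,.0f}/month")
```

Output tokens dominate the frontier-tier bill even at a quarter of the input volume, so output length is usually the first thing to budget.
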
### ROI calculation
$$\text{ROI} = \frac{\text{Benefit} - \text{Cost}}{\text{Cost}}$$

This formula is simple; a multi-period DCF / NPV analysis is more accurate.

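
A minimal sketch of the difference between the two measures, under stated assumptions: a hypothetical project with a $50k setup cost, $10k monthly benefit, $4k monthly running cost, and a 10% annual discount rate split evenly across months (a simplification). The helpers `simple_roi` and `npv` are illustrative names.

```python
def simple_roi(total_benefit: float, total_cost: float) -> float:
    """Undiscounted ROI = (Benefit - Cost) / Cost."""
    return (total_benefit - total_cost) / total_cost


def npv(rate_per_period: float, cash_flows: list[float]) -> float:
    """Net present value; cash_flows[0] occurs now, cash_flows[t] at the end of period t."""
    return sum(cf / (1 + rate_per_period) ** t for t, cf in enumerate(cash_flows))


# Hypothetical project figures (assumptions for illustration only).
setup, monthly_benefit, monthly_cost, months = 50_000, 10_000, 4_000, 12

roi = simple_roi(monthly_benefit * months, setup + monthly_cost * months)

flows = [-setup] + [monthly_benefit - monthly_cost] * months
monthly_rate = 0.10 / 12  # assumed 10% annual rate, simple monthly split
project_npv = npv(monthly_rate, flows)

print(f"Simple ROI: {roi:.1%}")
print(f"NPV at 10%/yr: ${project_npv:,.0f}")
```

The undiscounted net gain here is $22,000; discounting shrinks it because the setup cost is paid up front while the benefits arrive later.
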
### Build vs buy vs API
| Option | When |
|---|---|
| API | Low/medium volume + frontier capability |
| Self-host | High volume + cost-sensitive + privacy |
| Build (custom) | Differentiator + IP |
| Buy (vendor) | Generic + fast |

### Cost optimization techniques
1. **Caching**: serve repeated prompts from a cache.
2. **Routing** (mix of models): simple queries go to a cheap model, complex ones to a large model.
3. **Batching**: higher throughput.
4. **Quantization**: INT8 / INT4.
5. **Distillation**: fine-tune a small model.
6. **Spot / preemptible instances**: for batch workloads only.
7. **Right-sizing**: avoid over-provisioning.
8. **RAG vs fine-tune**: compare both and pick the cheaper one (often RAG).
9. **Token compression** (prompt engineering).

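
The first technique in the list, caching, can be sketched as an exact-match response cache. This is a minimal in-memory illustration, not a production design: `ResponseCache` and `fake_generate` are hypothetical names, the store is an unbounded dict with no expiry, and exact-match caching only helps when identical prompts recur and a reused answer is acceptable.

```python
import hashlib
from typing import Callable, Dict


class ResponseCache:
    """Exact-match cache keyed on (model, prompt)."""

    def __init__(self, generate: Callable[[str, str], str]):
        self._generate = generate          # the billable call being wrapped
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Hash model and prompt together so different models never share entries.
        return hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()

    def get(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._generate(model, prompt)
        self._store[key] = result
        return result


def fake_generate(model: str, prompt: str) -> str:
    """Stand-in for a real, paid API call (assumption for the demo)."""
    return f"[{model}] answer to: {prompt}"


cache = ResponseCache(fake_generate)
for p in ["What is TCO?", "What is TCO?", "Define ROI", "What is TCO?"]:
    cache.get("small-model", p)
print(f"hits={cache.hits} misses={cache.misses}")
```

Every hit is a call that was not billed, so the hit rate on real traffic is the number to measure before investing in anything more elaborate.
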
### Sustainability
- CO₂ per inference.
- ML CO2 calculator.
- Green compute (Google, Microsoft).

## 💻 Patterns

### TCO calculator
```python
def total_cost_of_ownership(
    monthly_inference_count: int,
    cost_per_inference: float,
    monthly_storage_gb: float,
    storage_cost_per_gb: float,
    monthly_engineering_hours: float,
    eng_hourly_rate: float,
    one_time_setup_cost: float,
    months: int = 12,
) -> float:
    """Return total cost over `months`: one-time setup plus recurring costs."""
    inference_cost = monthly_inference_count * cost_per_inference * months
    storage_cost = monthly_storage_gb * storage_cost_per_gb * months
    eng_cost = monthly_engineering_hours * eng_hourly_rate * months
    return one_time_setup_cost + inference_cost + storage_cost + eng_cost


tco = total_cost_of_ownership(
    monthly_inference_count=1_000_000,
    cost_per_inference=0.001,  # $0.001 / inference
    monthly_storage_gb=500,
    storage_cost_per_gb=0.025,
    monthly_engineering_hours=80,
    eng_hourly_rate=150,
    one_time_setup_cost=50_000,
)
print(f'12-month TCO: ${tco:,.0f}')  # prints: 12-month TCO: $206,150
```

### Build vs API break-even
```python
def break_even_volume(
    api_cost_per_call: float,
    self_host_fixed_monthly: float,      # GPU + ops
    self_host_marginal_per_call: float,  # electricity
) -> float:
    """Return the calls per month at which self-hosting matches the API cost."""
    if api_cost_per_call <= self_host_marginal_per_call:
        return float('inf')  # the API is always cheaper
    return self_host_fixed_monthly / (api_cost_per_call - self_host_marginal_per_call)


# Example
break_even = break_even_volume(
    api_cost_per_call=0.005,
    self_host_fixed_monthly=4000,  # 1× A100 + ops
    self_host_marginal_per_call=0.0001,
)
print(f'Break-even: {break_even:,.0f} calls/month')  # prints: Break-even: 816,327 calls/month
```

### LLM routing (multi-model)
```python
class CostAwareRouter:
    """Send simple queries to a cheap model and everything else to a larger one."""

    def __init__(self, llm):
        # `llm` is a hypothetical async client exposing
        # `await llm.generate(prompt, model=...) -> str`; it is not a real library API.
        self.llm = llm
        self.simple_model = 'gpt-4o-mini'  # $0.15 / 1M input
        self.complex_model = 'gpt-4o'      # $5 / 1M input
        self.judge_model = 'gpt-4o-mini'   # cheap classifier

    async def route(self, query: str) -> str:
        # 1. Classify complexity with the cheap judge model.
        complexity = await self.classify_complexity(query)

        # 2. Route: only an explicit 'simple' verdict goes to the cheap model.
        if complexity == 'simple':
            return await self.llm.generate(query, model=self.simple_model)
        else:
            return await self.llm.generate(query, model=self.complex_model)

    async def classify_complexity(self, query: str) -> str:
        prompt = f"Classify '{query}' as 'simple' or 'complex'. Reply 1 word."
        return (await self.llm.generate(prompt, model=self.judge_model)).strip().lower()
```

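
Whether a router like the one above pays off depends on how much traffic it sends to the cheap model and on what the judge call costs. The sketch below estimates the blended cost per request; the function names and every number in the example are hypothetical assumptions, not measurements.

```python
def blended_cost_per_request(
    simple_share: float,
    simple_cost: float,
    complex_cost: float,
    judge_cost: float = 0.0,
) -> float:
    """Expected cost per request when `simple_share` of traffic goes to the cheap model.

    The judge cost is paid on every request because each one is classified first.
    """
    if not 0.0 <= simple_share <= 1.0:
        raise ValueError("simple_share must be between 0 and 1")
    return judge_cost + simple_share * simple_cost + (1 - simple_share) * complex_cost


def routing_savings(
    simple_share: float,
    simple_cost: float,
    complex_cost: float,
    judge_cost: float = 0.0,
) -> float:
    """Fractional saving versus sending every request to the complex model."""
    routed = blended_cost_per_request(simple_share, simple_cost, complex_cost, judge_cost)
    return (complex_cost - routed) / complex_cost


# Hypothetical figures: 70% of queries are simple, costs are USD per request.
savings = routing_savings(
    simple_share=0.7,
    simple_cost=0.0003,
    complex_cost=0.01,
    judge_cost=0.00005,
)
print(f"Estimated saving from routing: {savings:.1%}")
```

When almost no traffic qualifies as simple, the judge call becomes pure overhead and the saving turns negative, which is worth checking before deploying a router.
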
### Prompt cache (Anthropic / OpenAI)
```python
# Claude prompt caching
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholders for illustration; substitute your own prompt and query.
LARGE_SYSTEM_PROMPT_10K_TOKENS = 'long, stable system prompt goes here'
user_query = 'user question goes here'

response = client.messages.create(
    model='claude-sonnet-4-6',
    max_tokens=1024,
    system=[
        {
            'type': 'text',
            'text': LARGE_SYSTEM_PROMPT_10K_TOKENS,  # cached prefix
            'cache_control': {'type': 'ephemeral'},
        },
    ],
    messages=[{'role': 'user', 'content': user_query}],
)
# Cache reads cost about 90% less than regular input tokens on the cached portion.
```

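
The 90% figure applies only to cache reads, so the net saving depends on how often a cached prefix is reused. The sketch below compares input cost with and without caching. The read multiplier of 0.10 follows the figure quoted above; the write multiplier of 1.25 and the $3 per 1M input price are assumptions to check against the provider's current pricing page. It also assumes every request arrives before the cache entry expires.

```python
def prefix_input_cost(
    prefix_tokens: int,
    requests: int,
    price_per_m: float,
    write_mult: float = 1.25,  # assumed premium for the request that writes the cache
    read_mult: float = 0.10,   # cached reads billed at ~10% of the base input price
) -> tuple[float, float]:
    """Return (cost_without_cache, cost_with_cache) in USD for the shared prefix only."""
    if requests < 1:
        raise ValueError("requests must be at least 1")
    per_request = prefix_tokens / 1_000_000 * price_per_m
    without_cache = per_request * requests
    # One cache write, then (requests - 1) cache reads.
    with_cache = per_request * (write_mult + read_mult * (requests - 1))
    return without_cache, with_cache


# Hypothetical workload: a 10K-token system prompt reused across 1,000 requests.
without_cache, with_cache = prefix_input_cost(10_000, 1_000, price_per_m=3.0)
print(f"Without cache: ${without_cache:.2f}")
print(f"With cache:    ${with_cache:.2f}")
print(f"Saving:        {1 - with_cache / without_cache:.1%}")
```

With a single request the cached path costs more than the uncached one because of the write premium, so caching pays off only for prefixes that are actually reused.
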
### Token cost estimation (Tiktoken)
```python
import tiktoken

# USD per 1M tokens. Illustrative list prices from this note; verify before use.
RATES = {
    'gpt-4o': {'input': 5.00, 'output': 15.00},
    'gpt-4o-mini': {'input': 0.15, 'output': 0.60},
}


def estimate_cost(text: str, model: str = 'gpt-4o', kind: str = 'input') -> float:
    """Estimate the USD cost of `text` billed as input or output tokens of `model`."""
    enc = tiktoken.encoding_for_model(model)
    n_tokens = len(enc.encode(text))
    return n_tokens / 1_000_000 * RATES[model][kind]
```

### A/B test ROI measurement
```python
def calculate_ai_lift(control_metrics, variant_metrics):
    """Compute the revenue lift of the variant over the control in an A/B test."""
    # Absolute revenue difference per user between variant and control.
    revenue_diff = variant_metrics['avg_revenue'] - control_metrics['avg_revenue']
    revenue_lift = revenue_diff / control_metrics['avg_revenue']

    # Assumes the variant's user_count is the monthly user base.
    monthly_users = variant_metrics['user_count']
    monthly_lift_revenue = monthly_users * revenue_diff

    annual_lift = monthly_lift_revenue * 12

    return {
        'revenue_lift_pct': revenue_lift * 100,
        'monthly_lift_$': monthly_lift_revenue,
        'annual_lift_$': annual_lift,
    }
```

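
A lift figure only supports an ROI claim if it is distinguishable from noise. The sketch below adds a large-sample z-test on per-user revenue using only the standard library. It is an approximation that assumes hundreds of users per arm and non-constant data; `lift_significance` and the synthetic samples are illustrative, not real results.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def lift_significance(control: list[float], variant: list[float]) -> dict:
    """Two-sided z-test for a difference in mean revenue per user (normal approximation)."""
    diff = mean(variant) - mean(control)
    # Standard error of the difference between two independent sample means.
    se = sqrt(variance(control) / len(control) + variance(variant) / len(variant))
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"diff": diff, "z": z, "p_value": p_value}


# Synthetic per-user revenue (assumption for the demo): variant is +1.0 per user.
control = [10.0, 12.0, 11.0, 9.0] * 50
variant = [x + 1.0 for x in control]

result = lift_significance(control, variant)
print(f"diff={result['diff']:.2f} z={result['z']:.2f} p={result['p_value']:.4f}")
```

A common convention is to project annual lift only when the p-value falls below a threshold chosen before the test started, such as 0.05.
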
### Batch vs real-time decision
```python
def latency_cost_tradeoff(latency_sla_ms, traffic_qps):
    """Pick a serving mode from the latency SLA (ms) and traffic (queries/sec)."""
    if latency_sla_ms > 5000 and traffic_qps < 10:
        return 'batch'  # spot instances, async processing
    if latency_sla_ms < 100:
        return 'realtime + warm + autoscale'
    return 'standard online'
```

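### Sustainability tracker
```python
def co2_per_inference(model_size_b, gpu_tdp_w, tokens_per_sec, grid_g_per_kwh=400):
    """Estimate grams of CO2 per generated token.

    Assumes the GPU draws its full TDP while generating.
    `model_size_b` is accepted for reference but is not used in this estimate.
    """
    # W / (tokens/s) = joules per token; dividing by 3600 converts to Wh.
    energy_per_token_wh = gpu_tdp_w / tokens_per_sec / 3600
    # Wh -> kWh, then multiply by grid carbon intensity (g CO2 / kWh).
    co2_g = energy_per_token_wh / 1000 * grid_g_per_kwh
    return co2_g  # grams CO2 / token


# GPT-4o: ~0.0001 g CO2 / token (estimated).
```
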
## 🤔 Decision criteria
| Situation | Approach |
|---|---|
| Low volume | API |
| High volume + privacy | Self-host |
| Mixed | Routing (cheap + expensive) |
| Predictable batch | Spot + offline |
| Real-time | Cache + warm + ANN |
| Frontier capability | API (latest) |
| Cost-sensitive | Open model + quantization |

**Default**: routing + caching + batching + right-sizing.

## 🔗 Graph
- Parent: [[Business-Strategy]] · [[FinOps]] · [[MLOps]]
- Applications: [[LLM_Optimization_and_Deployment_Strategies|Quantization]] · [[RAG]]
- Adjacent: [[Batch-Inference]] · [[Bottlenecks]] · [[Bayesian-Optimization]] (hyperparam ROI) · [[Bioenergetics]] (energy)

## 🤖 LLM usage
**When**: AI strategy, build vs buy decisions, cost optimization, vendor selection.
**When not**: research / experiments (different metrics apply).

## ❌ Anti-patterns
- **Vanity model**: using a frontier model where it is not needed.
- **No caching** (repeated prompts): a huge waste.
- **Single model for every task**: drives cost up.
- **No A/B testing**: ROI cannot be proven.
- **Hidden costs** (egress, monitoring): surprise bills.
- **No sustainability tracking**.

## 🧪 Verification / duplicates
- Verified (FinOps Foundation, ML CO2 papers, OpenAI / Anthropic pricing).
- Trust level: A.
- Related: [[Batch-Inference]] · [[MLOps]] · [[Bottlenecks]] · [[Antifragility]] · [[Axify]].

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: TCO + LLM economics + routing / caching / break-even / CO2 code |