Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
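Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture / unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Link-name normalization: ~2,400 broken-link fixes, direct redirect links, and concept mappings
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections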

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00


id: wiki-2026-0508-cost-benefit-ai
title: Cost-Benefit Analysis in AI
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: AI ROI, AI cost analysis, total cost of ownership, FinOps for ML, model economics, LLM cost
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: ai-economics, roi, finops, mlops, llm-cost, total-cost-ownership, business-strategy
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: business analysis
  applicable_to: AI Strategy, MLOps, Cost Engineering, Product Decision

Cost-Benefit Analysis in AI

One-line summary

"Is the cost of each model parameter justified by the value it delivers?" Weigh the GPU bill, data, and MLOps costs against revenue, time saved, and risk reduction. Modern concerns: LLM token economics, build vs buy vs API, and sustainability.

Key points

Cost components

Direct

  • Compute: GPU/TPU hour.
  • Storage: model weight + data + log.
  • Inference: per-token or per-image cost.
  • Training: each fine-tuning run.
  • Network: egress.

Indirect

  • Data: collection, labeling, cleaning.
  • MLOps: the platform team.
  • Engineering: development and integration.
  • Maintenance: retraining and monitoring.
  • Compliance: audits and security.

Opportunity

  • Slow latency: lost users.
  • Failure modes: incidents.
  • Vendor lock-in.

Benefit components

Quantitative

  • Revenue: higher conversion.
  • Cost saving: labor automation.
  • Time: shorter turnaround.
  • Quality: fewer errors.

Qualitative

  • UX improvement.
  • Brand / competitive moat.
  • Compliance protection.
  • Strategic optionality.

LLM economics (modern)

  • API: OpenAI / Anthropic, billed per token.
  • Self-host: Llama / Mistral, paying for hardware and operations.
  • Managed: Bedrock / Azure OpenAI, under an enterprise contract.
  • Hybrid: critical workloads self-hosted, burst traffic sent to an API.
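
The hybrid option can be sketched as a simple cost model: a self-hosted baseline serves traffic up to its capacity and the overflow goes to an API. The function name, its parameters, and the capacity figure are illustrative assumptions; the prices mirror the break-even example later in this note.

```python
def hybrid_monthly_cost(
    monthly_calls: int,
    self_host_capacity_calls: int,
    self_host_fixed_monthly: float,
    self_host_marginal_per_call: float,
    api_cost_per_call: float,
) -> dict:
    """Monthly cost when self-hosted hardware serves the baseline and an API absorbs overflow."""
    # Calls the self-hosted deployment can absorb (hypothetical capacity limit).
    self_hosted_calls = min(monthly_calls, self_host_capacity_calls)
    # Anything above capacity bursts to the API.
    api_calls = monthly_calls - self_hosted_calls
    self_host_cost = self_host_fixed_monthly + self_hosted_calls * self_host_marginal_per_call
    api_cost = api_calls * api_cost_per_call
    return {
        "self_hosted_calls": self_hosted_calls,
        "api_calls": api_calls,
        "total_cost": self_host_cost + api_cost,
    }


# Hypothetical month: 1.2M calls against a 1M-call self-hosted capacity.
result = hybrid_monthly_cost(
    monthly_calls=1_200_000,
    self_host_capacity_calls=1_000_000,
    self_host_fixed_monthly=4000.0,
    self_host_marginal_per_call=0.0001,
    api_cost_per_call=0.005,
)
print(result)
```

With these inputs the total is roughly $5,100, versus about $6,000 if every call went to the API. The model ignores queueing, latency differences, and the engineering cost of running two paths.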

Cost per 1M tokens (2026 typical)

| Tier | Input ($/1M tokens) | Output ($/1M tokens) |
| --- | --- | --- |
| Frontier (GPT-4o, Claude Sonnet) | 3-5 | 15-20 |
| Mid (Haiku, GPT-4o-mini) | 0.3-1 | 1-5 |
| Open (Llama-3.3 70B via API) | 0.5-1 | 0.7-1 |
| Self-host (estimated, amortized) | 0.05-0.5 | 0.1-1 |

→ The break-even point for self-hosting depends on usage volume.
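
To see what the per-token rates mean for a monthly bill, the sketch below prices one hypothetical workload on each tier. The rates are simply the midpoints of the ranges in the table above, and the workload size and all names are assumptions chosen for illustration. The self-host rate is amortized, so it only holds at the utilization it was estimated for.

```python
# Illustrative $/1M-token rates: midpoints of the ranges in the table above.
TIER_RATES = {
    "frontier": {"input": 4.0, "output": 17.5},
    "mid": {"input": 0.65, "output": 3.0},
    "open_via_api": {"input": 0.75, "output": 0.85},
    "self_host": {"input": 0.275, "output": 0.55},
}


def monthly_token_cost(input_tokens: int, output_tokens: int, tier: str) -> float:
    """USD cost of a month's tokens on the given tier."""
    rates = TIER_RATES[tier]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000


# Hypothetical workload: 2B input tokens and 500M output tokens per month.
costs = {
    tier: monthly_token_cost(2_000_000_000, 500_000_000, tier)
    for tier in TIER_RATES
}
for tier, cost in costs.items():
    print(f"{tier:>13}: ${cost:,.0f}/month")
```

At this volume the tiers span roughly $825 to $16,750 per month, which is why the choice of tier matters more as volume grows.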

ROI calculation

\text{ROI} = \frac{\text{Benefit} - \text{Cost}}{\text{Cost}}

This simple ratio works for a quick check, but a multi-period DCF / NPV analysis is more accurate.
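
A minimal sketch of the difference between the single-period ROI ratio and a discounted multi-period view. The project figures and the 10% discount rate are hypothetical, and the helper names are not from the original note.

```python
def simple_roi(benefit: float, cost: float) -> float:
    """Single-period ROI: (benefit - cost) / cost."""
    return (benefit - cost) / cost


def npv(rate: float, cash_flows: list[float]) -> float:
    """Net present value; cash_flows[0] occurs today (period 0)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


# Hypothetical project: $100k upfront, then $60k net benefit per year for 3 years.
flows = [-100_000.0, 60_000.0, 60_000.0, 60_000.0]

roi = simple_roi(benefit=180_000.0, cost=100_000.0)
project_npv = npv(0.10, flows)

print(f"Simple ROI: {roi:.0%}")
print(f"NPV at 10%: ${project_npv:,.0f}")
```

The undiscounted ratio reports an 80% return, while the NPV at a 10% discount rate is about $49,000, because later benefits are worth less than the upfront cost.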

Build vs buy vs API

| Option | When |
| --- | --- |
| API | Low/medium volume + frontier capability |
| Self-host | High volume + cost-sensitive + privacy |
| Build (custom) | Differentiator + IP |
| Buy (vendor) | Generic + fast |

Cost optimization techniques

  1. Caching: identical prompts are served from cache.
  2. Routing (mix of models): simple queries go to a cheap model, complex ones to a large model.
  3. Batching: higher throughput.
  4. Quantization: INT8 / INT4.
  5. Distillation: fine-tune a small model.
  6. Spot / preemptible instances: batch workloads only.
  7. Right-sizing: avoid over-provisioning.
  8. RAG vs fine-tune: choose whichever is cheaper for the use case.
  9. Token compression (prompt engineering).
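
The combined effect of caching and routing from the list above can be estimated with a small expected-cost model. This is a sketch under simplifying assumptions: cache hits are treated as free, the cost of the classifier and cache infrastructure is ignored, and the hit rate, routing share, and per-call prices are hypothetical.

```python
def blended_cost_per_call(
    base_cost: float,
    cheap_cost: float,
    cache_hit_rate: float,
    cheap_route_share: float,
) -> float:
    """Expected cost per call when cache hits are free and misses are routed."""
    if not (0.0 <= cache_hit_rate <= 1.0 and 0.0 <= cheap_route_share <= 1.0):
        raise ValueError("rates must be within [0, 1]")
    miss_rate = 1.0 - cache_hit_rate
    # Among cache misses, a share goes to the cheap model and the rest to the large one.
    routed_cost = cheap_route_share * cheap_cost + (1.0 - cheap_route_share) * base_cost
    return miss_rate * routed_cost


# Baseline: every call hits the large model.
before = blended_cost_per_call(0.005, 0.0003, cache_hit_rate=0.0, cheap_route_share=0.0)
# Hypothetical: 30% cache hits, 60% of misses routed to the cheap model.
after = blended_cost_per_call(0.005, 0.0003, cache_hit_rate=0.3, cheap_route_share=0.6)
print(f"Savings: {1 - after / before:.0%}")
```

Because the two effects multiply, a moderate hit rate and a moderate routing share together remove most of the per-call cost in this example.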

Sustainability

  • CO₂ per inference.
  • ML CO2 calculator.
  • Green compute (Google, Microsoft).

💻 Patterns

TCO calculator

def total_cost_of_ownership(
    monthly_inference_count: int,
    cost_per_inference: float,
    monthly_storage_gb: float,
    storage_cost_per_gb: float,
    monthly_engineering_hours: float,
    eng_hourly_rate: float,
    one_time_setup_cost: float,
    months: int = 12,
):
    inference_cost = monthly_inference_count * cost_per_inference * months
    storage_cost = monthly_storage_gb * storage_cost_per_gb * months
    eng_cost = monthly_engineering_hours * eng_hourly_rate * months
    return one_time_setup_cost + inference_cost + storage_cost + eng_cost

tco = total_cost_of_ownership(
    monthly_inference_count=1_000_000,
    cost_per_inference=0.001,  # $0.001 / inference
    monthly_storage_gb=500,
    storage_cost_per_gb=0.025,
    monthly_engineering_hours=80,
    eng_hourly_rate=150,
    one_time_setup_cost=50_000,
)
print(f'12-month TCO: ${tco:,.0f}')

Build vs API break-even

def break_even_volume(
    api_cost_per_call: float,
    self_host_fixed_monthly: float,  # GPU + ops
    self_host_marginal_per_call: float,  # e.g. electricity
):
    """Return the monthly call volume at which self-hosting matches API cost."""
    if api_cost_per_call <= self_host_marginal_per_call:
        return float('inf')  # the API is always cheaper
    return self_host_fixed_monthly / (api_cost_per_call - self_host_marginal_per_call)

# Example
break_even = break_even_volume(
    api_cost_per_call=0.005,
    self_host_fixed_monthly=4000,  # 1× A100 + ops
    self_host_marginal_per_call=0.0001,
)
print(f'Break-even: {break_even:,.0f} calls/month')

LLM routing (multi-model)

class CostAwareRouter:
    # `llm` is assumed to be an async client exposing generate(prompt, model=...);
    # it is not defined in this snippet.
    def __init__(self):
        self.simple_model = 'gpt-4o-mini'  # ~$0.15 / 1M input tokens
        self.complex_model = 'gpt-4o'  # ~$5 / 1M input tokens
        self.judge_model = 'gpt-4o-mini'  # cheap classifier

    async def route(self, query: str):
        # 1. Classify complexity with the cheap judge model.
        complexity = await self.classify_complexity(query)

        # 2. Route. Only an explicit 'simple' verdict uses the cheap model,
        #    so unexpected classifier output falls back to the stronger model.
        if complexity == 'simple':
            return await llm.generate(query, model=self.simple_model)
        else:
            return await llm.generate(query, model=self.complex_model)

    async def classify_complexity(self, query):
        prompt = f"Classify '{query}' as 'simple' or 'complex'. Reply 1 word."
        return (await llm.generate(prompt, model=self.judge_model)).strip().lower()

Prompt cache (Anthropic / OpenAI)

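# Claude prompt caching
import anthropic
client = anthropic.Anthropic()

# Placeholders: supply your own long, stable system prompt and user query.
LARGE_SYSTEM_PROMPT_10K_TOKENS = '<long system prompt>'
user_query = '<user question>'

response = client.messages.create(
    model='claude-sonnet-4-6',
    max_tokens=1024,
    system=[
        {
            'type': 'text',
            'text': LARGE_SYSTEM_PROMPT_10K_TOKENS,  # this block is cached
            'cache_control': {'type': 'ephemeral'},
        },
    ],
    messages=[{'role': 'user', 'content': user_query}],
)
# Cache reads cost roughly 10% of the base input rate (about a 90% reduction
# on the cached portion); the initial cache write carries a surcharge.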

Token cost estimation (Tiktoken)

import tiktoken

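def estimate_cost(text: str, model='gpt-4o', kind='input'):
    """Estimate the USD cost of `text` as input or output tokens for `model`."""
    enc = tiktoken.encoding_for_model(model)
    n_tokens = len(enc.encode(text))
    # Illustrative USD rates per 1M tokens; check the provider's current price list.
    rates = {
        'gpt-4o': {'input': 5.00, 'output': 15.00},
        'gpt-4o-mini': {'input': 0.15, 'output': 0.60},
    }[model]
    return n_tokens / 1_000_000 * rates[kind]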

A/B test ROI measurement

def calculate_ai_lift(control_metrics, variant_metrics):
    """Calculate the revenue lift of the variant over the control in an A/B test."""
    revenue_lift = (variant_metrics['avg_revenue'] - control_metrics['avg_revenue']) / control_metrics['avg_revenue']
    
    monthly_users = variant_metrics['user_count']
    monthly_lift_revenue = monthly_users * (variant_metrics['avg_revenue'] - control_metrics['avg_revenue'])
    
    annual_lift = monthly_lift_revenue * 12
    
    return {
        'revenue_lift_pct': revenue_lift * 100,
        'monthly_lift_$': monthly_lift_revenue,
        'annual_lift_$': annual_lift,
    }
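
A measured lift only becomes an ROI statement once it is set against cost. The sketch below turns a monthly benefit (for example the `monthly_lift_$` value above) into a payback period. The function name and the $20,000 monthly benefit are hypothetical; the setup cost and monthly run cost reuse the inputs of the TCO example earlier in this section (1,000,000 × $0.001 + 500 × $0.025 + 80 × $150 = $13,012.50 per month).

```python
import math


def payback_months(one_time_cost: float, monthly_benefit: float, monthly_run_cost: float) -> float:
    """Months needed for net monthly benefit to repay the one-time cost."""
    net_monthly = monthly_benefit - monthly_run_cost
    if net_monthly <= 0:
        return math.inf  # running costs eat the whole benefit; never pays back
    return one_time_cost / net_monthly


months = payback_months(
    one_time_cost=50_000.0,
    monthly_benefit=20_000.0,   # hypothetical measured lift
    monthly_run_cost=13_012.5,  # inference + storage + engineering per month
)
print(f"Payback: {months:.1f} months")
```

With these figures the setup cost is repaid in a little over seven months. A benefit below the monthly run cost returns infinity, which flags a project that cannot pay back at any horizon.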

Batch vs real-time decision

def latency_cost_tradeoff(latency_sla_ms, traffic_qps):
    if latency_sla_ms > 5000 and traffic_qps < 10:
        return 'batch'  # spot instances, async processing
    if latency_sla_ms < 100:
        return 'realtime + warm + autoscale'
    return 'standard online'

Sustainability tracker

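def co2_per_inference(model_size_b, gpu_tdp_w, tokens_per_sec, grid_g_per_kwh=400):
    """Estimate grams of CO2 emitted per generated token.

    model_size_b is accepted for reference but is not used in the estimate.
    """
    # W / (tokens/s) = J per token; dividing by 3600 converts J to Wh.
    energy_per_token_wh = gpu_tdp_w / tokens_per_sec / 3600
    # Wh -> kWh, then multiply by grid carbon intensity (g CO2 / kWh).
    co2_g = energy_per_token_wh / 1000 * grid_g_per_kwh
    return co2_g  # grams of CO2 per token

# GPT-4o: ~0.0001 g CO2 / token (estimated).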

🤔 Decision criteria

| Situation | Approach |
| --- | --- |
| Low volume | API |
| High volume + privacy | Self-host |
| Mixed | Routing (cheap + expensive) |
| Predictable batch | Spot + offline |
| Real-time | Cache + warm + ANN |
| Frontier capability | API (latest) |
| Cost-sensitive | Open model + quantization |

Default: routing + caching + batching + right-sizing.

🔗 Graph

🤖 LLM usage

When to use: AI strategy, build vs buy decisions, cost optimization, vendor selection.
When not to use: research / experimentation (different metrics apply).

Anti-patterns

  • Vanity model: using a frontier model where it is not needed.
  • No caching for repeated prompts: a large waste.
  • A single model for every task: higher cost.
  • No A/B test: ROI cannot be proven.
  • Hidden costs (egress, monitoring): surprise bills.
  • No sustainability tracking.

🧪 Verification / duplicates

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: TCO + LLM economics + routing / caching / break-even / CO2 code |