Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
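Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture / incomplete stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Link-name normalization: fixed broken links, pointed redirects directly at their targets, mapped concepts (~2,400 items)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections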

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00

---
id: wiki-2026-0508-bottlenecks
title: Bottlenecks (Performance & Process)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [병목, bottleneck, theory of constraints, TOC, critical path, profiling]
duplicate_of: none
source_trust_level: A
confidence_score: 0.93
verification_status: applied
tags: [performance, bottleneck, profiling, theory-of-constraints, optimization, scalability, latency]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: any
  framework: profiling tools
---
# Bottlenecks
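## 📌 One-line insight
> **"The throat of the system."** The weakest link determines the throughput of the whole system, so improving a non-bottleneck wastes time. Goldratt's Theory of Constraints (TOC) prescribes a 5-step loop. In modern AI, HBM bandwidth and the network are the typical bottlenecks.
## 📖 Core ideas
### Types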
1. **Hardware**: CPU / GPU / RAM / disk / network.
2. **Software**: algorithm / blocking / lock contention.
3. **Process**: approval / single point of expertise.
4. **Data**: schema / indexing / partitioning.
5. **Cognitive** (team): meeting / context-switch.
### Theory of Constraints (Goldratt)
1. **Identify** the bottleneck.
2. **Exploit** it (keep it busy 100% of the time).
3. **Subordinate** everything else to it (don't over-feed it).
4. **Elevate** it (invest to widen it).
5. **Repeat** (a new bottleneck emerges).
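The loop can be sketched on a serial line whose throughput is the minimum stage capacity. A minimal sketch under stated assumptions: the stage names and capacities are hypothetical, steps 2-3 (exploit, subordinate) are assumed already done, and "elevate" is modeled as multiplying the bottleneck's capacity by a hypothetical `boost` factor.

```python
from typing import Dict, List, Tuple

def line_throughput(capacities: Dict[str, float]) -> float:
    """A serial line can only move as fast as its slowest stage."""
    return min(capacities.values())

def toc_rounds(capacities: Dict[str, float], boost: float = 2.0,
               rounds: int = 3) -> List[Tuple[str, float]]:
    """Repeat Identify -> Elevate; record (bottleneck, line throughput) per round.

    `boost` is a hypothetical elevation factor, not part of TOC itself.
    """
    caps = dict(capacities)
    history = []
    for _ in range(rounds):
        bottleneck = min(caps, key=caps.get)              # 1. Identify
        history.append((bottleneck, line_throughput(caps)))
        caps[bottleneck] *= boost                         # 4. Elevate
    return history                                        # 5. Repeat; it may move

stages = {"design": 10, "build": 6, "review": 4, "deploy": 12}  # items/day, hypothetical
print(toc_rounds(stages))
# Elevating a non-bottleneck leaves line throughput unchanged:
print(line_throughput({**stages, "deploy": 24}))
```

The history shows the bottleneck moving from `review` to `build` and back, which is why step 5 exists.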
### Amdahl's Law (related)
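- Speeding up 90% of the work by 100× yields only about 9.2× overall; even an infinite speedup of that 90% caps the total at 10×.
- The part you do not speed up becomes the bottleneck.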
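Amdahl's Law puts this in one formula: if a fraction `p` of the runtime is made `s` times faster, the overall speedup is `1 / ((1 - p) + p / s)`. A small helper to check the 90% / 100× case and its limit:

```python
import math

def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the runtime is made s times faster."""
    return 1.0 / ((1.0 - p) + p / s)  # the (1 - p) serial part never shrinks

print(f"{amdahl_speedup(0.9, 100):.2f}x")       # 90% of the work made 100x faster
print(f"{amdahl_speedup(0.9, math.inf):.2f}x")  # upper bound: 1 / 0.1
```

With `s = math.inf`, `p / s` evaluates to `0.0`, so the second call gives the ceiling set by the untouched 10%.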
### Modern hardware bottlenecks (LLM)
- **HBM bandwidth**: an H100 provides roughly 3 TB/s (3.35 TB/s for the SXM part). This is the dominant limit in LLM inference, especially token-by-token decoding.
- **NVLink**: GPU-to-GPU interconnect.
- **Network** (RDMA, InfiniBand): distributed training.
- **PCIe**: CPU-GPU transfers.
- **Storage**: NVMe SSD vs. spinning disk.
- **Power / cooling**: datacenter-level limits.
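A back-of-the-envelope check for why HBM bandwidth dominates decoding: at batch size 1, every generated token has to stream all weights from HBM once, so bandwidth divided by model size bounds tokens per second. A rough sketch only; it ignores KV-cache reads, compute time, and batching, and the 70B-parameter / FP16 / 3.35 TB/s figures are illustrative assumptions.

```python
def max_decode_tokens_per_sec(n_params: float, bytes_per_param: float,
                              hbm_bytes_per_sec: float) -> float:
    """Upper bound on batch-1 decode speed if weights are read once per token.

    Ignores KV-cache traffic, kernel overheads, and compute time.
    """
    weight_bytes = n_params * bytes_per_param
    return hbm_bytes_per_sec / weight_bytes

# Illustrative: 70B parameters in FP16 (2 bytes each) on ~3.35 TB/s of HBM.
bound = max_decode_tokens_per_sec(70e9, 2, 3.35e12)
print(f"<= {bound:.1f} tokens/s per sequence")
```

Batching amortizes each weight read across many sequences, which is why throughput-oriented serving batches aggressively.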
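### Software bottlenecks
- **CPU-bound**: compute-heavy work.
- **I/O-bound**: waiting on disk / network.
- **Memory-bound**: swapping / cache misses.
- **Lock contention**: threads waiting on a mutex.
- **GIL** (CPython): only one thread runs Python bytecode at a time, so CPU-bound threads do not scale.
- **N+1 query**: a typical ORM pitfall (one query for the list, then one more per row).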
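A quick first pass at telling CPU-bound work from waiting (I/O, locks, sleeps) is to compare CPU time with wall-clock time for the same call: a ratio near 1 means the process was computing, near 0 means it was mostly blocked. A minimal standard-library sketch; the 0.5 threshold is an arbitrary assumption, and `time.process_time()` sums all threads of the process, so multi-threaded code can exceed 1.

```python
import time
from typing import Callable, Tuple

def cpu_wall(fn: Callable[[], object]) -> Tuple[float, float]:
    """Run fn once and return (wall seconds, process CPU seconds)."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    fn()
    return time.perf_counter() - wall0, time.process_time() - cpu0

def classify(fn: Callable[[], object], threshold: float = 0.5) -> str:
    """Label a call by its CPU/wall ratio (threshold is an assumption)."""
    wall, cpu = cpu_wall(fn)
    ratio = cpu / wall if wall > 0 else 0.0
    return "cpu-bound" if ratio >= threshold else "waiting (I/O, locks, sleep)"

print(classify(lambda: sum(i * i for i in range(2_000_000))))  # busy loop
print(classify(lambda: time.sleep(0.2)))                        # blocked
```

This only says *whether* the code waits; a profiler or trace is still needed to find *where*.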
### Detection
- **Profiler**: cProfile, perf, async-profiler.
- **Trace**: distributed tracing (Jaeger).
- **Metric**: CPU/mem/disk/network util.
- **APM**: Datadog, NewRelic.
- **Flame graph**.
- **Critical path**.
### Process bottlenecks
- Approval chains.
- A single expert everyone depends on.
- Environment provisioning.
- Review SLAs.
- Meeting cadence.
→ Each of these is a component of DORA lead time for changes.
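Because these bottlenecks mostly add waiting rather than work, one way to locate them is to split each stage of a change's lead time into active work and queue wait, then compute flow efficiency (work time / total lead time). A sketch over hypothetical stage timings in hours; the stage names and numbers are made up for illustration.

```python
from typing import Dict, Tuple

def lead_time_breakdown(stages: Dict[str, Tuple[float, float]]) -> dict:
    """stages maps name -> (work_hours, wait_hours) for one change."""
    work = sum(w for w, _ in stages.values())
    wait = sum(q for _, q in stages.values())
    lead_time = work + wait
    longest_wait = max(stages, key=lambda name: stages[name][1])
    return {
        "lead_time_h": lead_time,
        "flow_efficiency": work / lead_time if lead_time else 0.0,
        "longest_wait_stage": longest_wait,  # likely process bottleneck
    }

change = {  # hypothetical (work, wait) in hours
    "code": (6.0, 0.0),
    "review": (1.0, 20.0),
    "approval": (0.5, 30.0),
    "deploy": (0.5, 2.0),
}
print(lead_time_breakdown(change))
```

Here most of the lead time is queueing, so speeding up coding (the largest *work* stage) would barely move it.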
### Data bottlenecks
- A single hot row.
- A missing index.
- Cross-shard transactions.
- Blocking schema migrations.
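A missing index usually shows up in the query plan as a full table scan. A self-contained sketch with the standard-library `sqlite3` module and a hypothetical `users` table; the exact plan wording varies by SQLite version (e.g. `SCAN users` vs. `SCAN TABLE users`), but the switch from a scan to an index search is what to look for.

```python
import sqlite3

def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Join the 'detail' column of EXPLAIN QUERY PLAN into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"u{i}@example.com", f"user{i}") for i in range(1000)],
)
sql = "SELECT * FROM users WHERE email = ?"
before = query_plan(conn, sql, ("u500@example.com",))  # full table scan
conn.execute("CREATE INDEX idx_users_email ON users(email)")
after = query_plan(conn, sql, ("u500@example.com",))   # index search
print("before:", before)
print("after: ", after)
```

Other databases expose the same idea through `EXPLAIN` / `EXPLAIN ANALYZE`, with their own output formats.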
### Distributed bottlenecks (modern)
- A single leader (Raft, Paxos): every write goes through one node.
- Cross-region calls.
- Synchronous replication.
- Connection pool limits.
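Connection-pool limits can be estimated with Little's Law (items in the system = arrival rate × time in the system), used here as a swapped-in sizing rule: a pool of N connections, each held W seconds per query, sustains at most N / W queries per second. A sketch assuming steady state and a fixed average hold time; the `headroom` factor and the numbers are hypothetical.

```python
import math

def pool_max_qps(pool_size: int, hold_seconds: float) -> float:
    """Throughput ceiling when each query holds a connection hold_seconds."""
    return pool_size / hold_seconds

def pool_size_needed(target_qps: float, hold_seconds: float,
                     headroom: float = 1.2) -> int:
    """Little's Law: busy connections = qps * hold time, times a hypothetical headroom."""
    return math.ceil(target_qps * hold_seconds * headroom)

print(pool_max_qps(20, 0.05))        # 20 connections, 50 ms per query
print(pool_size_needed(1000, 0.05))  # connections for 1,000 QPS at 50 ms
```

If the ceiling is below the offered load, requests queue for a connection and latency grows even though the database itself may be idle.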
## 💻 Patterns
### Profile (Python cProfile)
```python
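import cProfile, pstats
def expensive_call():  # hypothetical slow function
    return sum(i * i for i in range(1_000_000))
def cheap_call():  # hypothetical fast function
    return 42
def main():
    expensive_call()
    cheap_call()
cProfile.run('main()', 'out.prof')  # write raw stats to out.prof
stats = pstats.Stats('out.prof').sort_stats('cumulative')
stats.print_stats(20)  # top 20 functions by cumulative time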
```
### Linux perf (system-level)
```bash
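# Sample on-CPU stacks at 99 Hz for 10 s (-g records call graphs)
perf record -F 99 -g -p $PID -- sleep 10
perf report
# Flame graph (stackcollapse-perf.pl / flamegraph.pl come from Brendan Gregg's FlameGraph repo)
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > flame.svg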
```
### Async profiler (JVM)
```bash
# Sample lock contention for 30 s (async-profiler 3.x launcher `asprof`; 2.x ships ./profiler.sh)
asprof -e lock -d 30 -f lock.html $PID
# Wall-clock mode also captures threads blocked on I/O
asprof -e wall -d 30 -f wall.html $PID
```
### N+1 detect (Django)
```python
import logging
from django.test.utils import CaptureQueriesContext
from django.db import connection
# Assumes a hypothetical `Post` model with a ForeignKey `author`.
with CaptureQueriesContext(connection) as ctx:
    posts = Post.objects.all()
    for post in posts:
        print(post.author.name)  # one extra query per post: N+1
if len(ctx.captured_queries) > 5:  # threshold is arbitrary
    logging.warning('N+1 detected: %d queries', len(ctx.captured_queries))
# Fix: fetch authors in the same query via a JOIN
posts = Post.objects.select_related('author')  # 1 query
```
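The same N+1 pattern can be reproduced without Django. A self-contained sketch with the standard-library `sqlite3` module, counting executed statements via `Connection.set_trace_callback`; the `authors` / `posts` schema is hypothetical and stands in for the ORM models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER REFERENCES authors(id));
""")
conn.executemany("INSERT INTO authors (id, name) VALUES (?, ?)", [(i, f"a{i}") for i in range(10)])
conn.executemany("INSERT INTO posts (author_id) VALUES (?)", [(i % 10,) for i in range(50)])

statements = []
conn.set_trace_callback(statements.append)  # record every SQL statement executed

# N+1: one query for the posts, then one query per post for its author.
for (author_id,) in conn.execute("SELECT author_id FROM posts").fetchall():
    conn.execute("SELECT name FROM authors WHERE id = ?", (author_id,)).fetchone()
n_plus_one = len(statements)

statements.clear()
# Fix: a single JOIN, which is what select_related generates in Django.
conn.execute(
    "SELECT posts.id, authors.name FROM posts JOIN authors ON authors.id = posts.author_id"
).fetchall()
joined = len(statements)
print(n_plus_one, joined)
```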
### GPU bottleneck profile (PyTorch)
```python
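import torch.profiler as prof
# `model` and `inputs` are assumed: a CUDA model and a batch already on the GPU.
with prof.profile(
    activities=[prof.ProfilerActivity.CPU, prof.ProfilerActivity.CUDA],
    record_shapes=True,
    profile_memory=True,
) as p:
    model(inputs)
print(p.key_averages().table(sort_by='cuda_time_total', row_limit=20))
# Kernels with the most CUDA time are the hot spots. The profiler does not report
# HBM bandwidth directly; compare their bytes moved against peak bandwidth.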
```
### Lock contention detection
```python
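import logging
import threading
import time
class LockMonitor:
    """Wraps a lock and records how long each acquire() waited."""
    def __init__(self, lock):
        self.lock = lock
        self.wait_times = []
    def __enter__(self):
        start = time.perf_counter()
        self.lock.acquire()
        self.wait_times.append(time.perf_counter() - start)  # time spent blocked
        return self
    def __exit__(self, *args):
        self.lock.release()
    def report(self):
        if not self.wait_times:
            return
        avg = sum(self.wait_times) / len(self.wait_times)
        if avg > 0.1:  # threshold: 100 ms average wait
            logging.warning('Lock contention: avg wait %.1f ms', avg * 1000)
# Usage: mon = LockMonitor(threading.Lock()); `with mon: ...`; later mon.report()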
```
### Distributed trace (Jaeger)
```python
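from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
# Jaeger ingests OTLP natively; the older Jaeger Thrift exporter is deprecated.
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))  # default localhost:4317
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)
@tracer.start_as_current_span('handle_request')
def handle(req):
    with tracer.start_as_current_span('db_query') as span:
        span.set_attribute('db.statement', 'SELECT ...')
        result = db.query(...)  # `db` is a hypothetical client
    return result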
```
→ Identify the bottleneck visually: the longest spans in the trace timeline show where the request spends its time.
### Process bottleneck (workflow analysis)
```python
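def analyze_workflow(stage_durations):
    """Compare per-stage throughput; the slowest stage caps the whole flow.

    stage_durations maps stage name -> seconds per item.
    """
    rates = {stage: 1 / dur for stage, dur in stage_durations.items()}  # items/s
    bottleneck = min(rates, key=rates.get)
    overall_rate = rates[bottleneck]
    # Capacity above the bottleneck's rate cannot be used by the flow.
    waste = sum(r - overall_rate for r in rates.values() if r > overall_rate)
    return {
        'bottleneck': bottleneck,
        'overall_rate_per_min': overall_rate * 60,
        'capacity_wasted': waste,  # items/s of idle non-bottleneck capacity
    }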
```
### Critical path (DAG)
```python
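import networkx as nx
def critical_path(tasks):
    """Longest duration-weighted path through the task DAG."""
    G = nx.DiGraph()
    for task in tasks:
        # dag_longest_path weights EDGES, so each task's duration goes on its
        # incoming edges, including one from a virtual START node.
        G.add_edge('START', task.id, duration=task.duration)
        for dep in task.deps:
            G.add_edge(dep, task.id, duration=task.duration)
    path = nx.dag_longest_path(G, weight='duration')
    return [node for node in path if node != 'START']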
```
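The critical-path method can also be written with the standard library only, which makes slack visible: tasks with zero slack lie on the critical path, and delaying any of them delays the whole project. A sketch using a forward and a backward pass; the `Task` dataclass and the example tasks are hypothetical.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Dict, List

@dataclass
class Task:
    id: str
    duration: float
    deps: List[str] = field(default_factory=list)

def slack(tasks: List[Task]) -> Dict[str, float]:
    """Return each task's slack (latest finish - earliest finish)."""
    by_id = {t.id: t for t in tasks}
    order = list(TopologicalSorter({t.id: t.deps for t in tasks}).static_order())
    earliest_finish: Dict[str, float] = {}
    for tid in order:  # forward pass: start after all dependencies finish
        start = max((earliest_finish[d] for d in by_id[tid].deps), default=0.0)
        earliest_finish[tid] = start + by_id[tid].duration
    project_end = max(earliest_finish.values())
    successors: Dict[str, List[str]] = {t.id: [] for t in tasks}
    for t in tasks:
        for d in t.deps:
            successors[d].append(t.id)
    latest_finish: Dict[str, float] = {}
    for tid in reversed(order):  # backward pass: finish before successors must start
        latest_finish[tid] = min(
            (latest_finish[s] - by_id[s].duration for s in successors[tid]),
            default=project_end,
        )
    return {tid: latest_finish[tid] - earliest_finish[tid] for tid in order}

tasks = [
    Task("design", 3), Task("backend", 5, ["design"]),
    Task("frontend", 2, ["design"]), Task("release", 1, ["backend", "frontend"]),
]
s = slack(tasks)
print(s)
print([tid for tid, v in s.items() if v == 0])  # tasks on the critical path
```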
## 🤔 Decision criteria
| Symptom | Tool |
|---|---|
| Slow request | APM + distributed trace |
| CPU pegged | Flame graph (perf) |
| GPU underutilized | Memory bandwidth (PyTorch profiler) |
| Slow query | EXPLAIN + slow query log |
| Lock contention | async-profiler -e lock |
| Long lead time | Process / DORA analysis |
| Thundering herd | Coordination check |
**Default**: measure first, then optimize based on a hypothesis.
## 🔗 Graph
- Parent: [[System-Design]]
- Variant: [[CPU-Bound]]
- Applications: [[Theory-of-Constraints]] · [[Amdahls Law (암달의 법칙)]] · [[Critical-Path]]
- Tool: [[Profiling]] · [[Flame-Graph]] · [[Distributed-Tracing]]
- Adjacent: [[Optimization]] · [[Scalability]] · [[DORA-Metrics]]
## 🤖 LLM use
**When**: performance optimization, capacity planning, incident root-cause analysis, process improvement.
**When not**: optimizing without a hypothesis.
## ❌ Anti-patterns
- **Optimizing without measuring**: you optimize the wrong place.
- **Improving a non-bottleneck**: wasted time (TOC).
- **Investing equally in every part**: low ROI.
- **Trusting a single profile**: one run may not be representative.
- **Blaming people for a process bottleneck**: it is usually a system issue.
- **Premature optimization**: you lose simplicity.
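Trusting a single run is risky because timings vary with caches, CPU frequency, and background load. A minimal sketch with the standard-library `timeit.repeat`, reporting the spread across repeats instead of one number; the workload is a hypothetical placeholder and the repeat counts are arbitrary.

```python
import statistics
import timeit

def summarize_timings(stmt, repeat: int = 7, number: int = 100) -> dict:
    """Time `stmt` `repeat` times; each repeat executes it `number` times."""
    runs = [t / number for t in timeit.repeat(stmt, repeat=repeat, number=number)]
    return {
        "min_s": min(runs),        # least-noisy estimate of the best case
        "median_s": statistics.median(runs),
        "max_s": max(runs),        # a large gap to min signals a noisy machine
        "runs": len(runs),
    }

print(summarize_timings(lambda: sorted(range(10_000), reverse=True)))
```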
## 🧪 Verification / duplicates
- Verified (Goldratt TOC, Knuth premature optimization, Brendan Gregg Systems Performance).
- Trust level: A.
- Related: [[Amdahls Law (암달의 법칙)]] · [[Theory-of-Constraints]] · [[Profiling]] · [[Critical-Path]].
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: types + TOC + profile / N+1 / GPU / trace code |