---
id: wiki-2026-0508-bottlenecks
title: Bottlenecks (Performance & Process)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [병목, bottleneck, theory of constraints, TOC, critical path, profiling]
duplicate_of: none
source_trust_level: A
confidence_score: 0.93
verification_status: applied
tags: [performance, bottleneck, profiling, theory-of-constraints, optimization, scalability, latency]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: any
  framework: profiling tools
---

# Bottlenecks

## 📌 One-line insight

> **"The throat of the system."** The weakest link sets the throughput of the whole system, so improving a non-bottleneck is wasted time. Goldratt's Theory of Constraints (TOC) handles this in five steps. In modern AI systems the bottleneck is usually HBM bandwidth or the network.

## 📖 Core

### Types

1. **Hardware**: CPU, GPU, RAM, disk, network.
2. **Software**: algorithms, blocking calls, lock contention.
3. **Process**: approvals, a single point of expertise.
4. **Data**: schema, indexing, partitioning.
5. **Cognitive** (team): meetings, context switching.

### Theory of Constraints (Goldratt)

1. **Identify** the bottleneck.
2. **Exploit** it (keep it 100% utilized).
3. **Subordinate** everything else to it (don't over-feed it).
4. **Elevate** it (invest to widen it).
5. **Repeat** (a new bottleneck emerges).

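### Amdahl's Law (related)

- Speeding up 90% of the runtime by 100× yields only about 9.2× overall; the untouched 10% caps the total speedup at 10×.
- This is what leaving the bottleneck alone means: the part you did not improve bounds the whole.
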
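The cap follows from Amdahl's formula, speedup = 1 / ((1 - p) + p / s), where p is the fraction of runtime being accelerated and s is its local speedup. A minimal sketch of the arithmetic; the function names are illustrative and not taken from any library:

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the runtime is made s times faster."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


def amdahl_cap(p: float) -> float:
    """Upper bound on the speedup as s grows without limit."""
    return float("inf") if p == 1.0 else 1.0 / (1.0 - p)


if __name__ == "__main__":
    print(round(amdahl_speedup(0.9, 100), 2))  # 9.17
    print(round(amdahl_cap(0.9), 2))           # 10.0
```

The gap between 9.17× and 10× shows how little is left to gain from pushing the fast part further; the remaining 10% is now the bottleneck.
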
### Modern hardware bottlenecks (LLM)

- **HBM bandwidth**: roughly 3 TB/s on an H100. Usually the dominant limit for LLM inference.
- **NVLink**: GPU-to-GPU traffic.
- **Network** (RDMA, InfiniBand): distributed training.
- **PCIe**: GPU-to-CPU transfers.
- **Storage**: NVMe vs. spinning disks.
- **Power / cooling**: datacenter limits.

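To see why HBM bandwidth tends to dominate inference, note that single-stream decoding has to read every weight once per generated token, so bandwidth divided by model size bounds tokens per second. A back-of-the-envelope sketch, assuming batch size 1, no KV-cache traffic, and perfect bandwidth utilization (all three are simplifications; the function name and the example model size are hypothetical):

```python
def decode_tokens_per_sec_bound(
    n_params: float,
    bytes_per_param: float,
    hbm_bandwidth_bytes_per_s: float,
) -> float:
    """Upper bound on batch-1 decode speed for a memory-bound model."""
    model_bytes = n_params * bytes_per_param
    return hbm_bandwidth_bytes_per_s / model_bytes


if __name__ == "__main__":
    # Hypothetical 7B-parameter model with 16-bit weights on ~3 TB/s of HBM.
    bound = decode_tokens_per_sec_bound(7e9, 2, 3e12)
    print(f"{bound:.0f} tokens/s at most")  # about 214
```

Measured throughput lands below this bound. Batching raises it because one pass over the weights serves several sequences at once.
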
### Software bottlenecks

- **CPU-bound**: compute-heavy work.
- **I/O-bound**: waiting on disk or network.
- **Memory-bound**: swapping, cache misses.
- **Lock contention**: threads waiting on a mutex.
- **GIL** (Python): only one thread runs Python bytecode at a time, so CPU-bound threads do not run in parallel.
- **N+1 query**: a typical ORM problem.

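A quick way to separate the CPU-bound case from the waiting cases is to compare CPU time with wall-clock time for the same call. The sketch below uses only the standard library; the helper name `cpu_ratio` and the two example workloads are illustrative assumptions.

```python
import time
from typing import Callable


def cpu_ratio(fn: Callable[[], object]) -> float:
    """Return CPU time / wall time for one call of fn (current process)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else 0.0


def busy() -> int:
    # Pure computation: the ratio should be close to 1.
    return sum(i * i for i in range(2_000_000))


def waiting() -> None:
    # Stands in for a disk or network wait: the ratio should be close to 0.
    time.sleep(0.2)


if __name__ == "__main__":
    print(f"busy:    {cpu_ratio(busy):.2f}")
    print(f"waiting: {cpu_ratio(waiting):.2f}")
```

A ratio near 1 suggests profiling the code itself; a ratio near 0 suggests looking at I/O, locks, or downstream services instead. `time.process_time()` sums CPU time across all threads of the process, so multi-threaded native code can push the ratio above 1.
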
### Detection

- **Profiler**: cProfile, perf, async-profiler.
- **Trace**: distributed tracing (Jaeger).
- **Metrics**: CPU / memory / disk / network utilization.
- **APM**: Datadog, New Relic.
- **Flame graph**.
- **Critical path**.

### Process bottlenecks

- Approval chains.
- A single expert.
- Environment provisioning.
- Review SLAs.
- Meeting cadence.

→ Each of these is a component of DORA Lead Time.

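One way to see which of these components dominates is to split a change's lead time into per-stage waits and check which stage owns the largest share. A minimal sketch; the stage names and timestamps are made-up examples, and `lead_time_breakdown` is a hypothetical helper, not part of any DORA tooling.

```python
from datetime import datetime


def lead_time_breakdown(events: list[tuple[str, datetime]]) -> dict[str, float]:
    """Map each stage to its share of total lead time.

    events is an ordered list of (stage_name, time_the_stage_finished);
    the first entry marks the start, for example the commit.
    """
    total = (events[-1][1] - events[0][1]).total_seconds()
    if total <= 0:
        raise ValueError("events must span a positive duration")
    shares = {}
    for (_, prev_ts), (stage, ts) in zip(events, events[1:]):
        shares[stage] = (ts - prev_ts).total_seconds() / total
    return shares


if __name__ == "__main__":
    events = [
        ("commit", datetime(2026, 5, 1, 9, 0)),
        ("review", datetime(2026, 5, 2, 9, 0)),     # waited 24 h for review
        ("approval", datetime(2026, 5, 2, 15, 0)),  # 6 h for sign-off
        ("deploy", datetime(2026, 5, 2, 17, 0)),    # 2 h to provision and ship
    ]
    shares = lead_time_breakdown(events)
    print(max(shares, key=shares.get))  # review
```

In this invented example review owns 75% of the lead time, so speeding up deployment would barely move the total.
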
### Data bottlenecks

- A single hot row.
- A missing index.
- Cross-shard transactions.
- Blocking schema migrations.

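The missing-index case is easy to reproduce with SQLite's `EXPLAIN QUERY PLAN`, which reports whether a query scans the whole table or searches an index. A self-contained sketch; the `orders` table and the index name are made up for illustration.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list[str]:
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the description


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)

sql = "SELECT * FROM orders WHERE customer_id = ?"
before = query_plan(conn, sql, (42,))  # expected: a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = query_plan(conn, sql, (42,))   # expected: a search using the new index

print(before)
print(after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan starts with `SCAN` and the second names the index.
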
### Distributed bottlenecks (modern)

- A single leader (Raft, Paxos).
- Cross-region calls.
- Synchronous replication.
- Connection pool limits.

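Why cross-region calls and synchronous replication hurt can be seen with a simplified latency model: a leader commits once enough followers have acknowledged, so each commit waits for the slowest follower it still needs. A sketch under stated assumptions (follower round-trip times are fixed, the leader counts toward the quorum, and `commit_latency_ms` is a hypothetical helper):

```python
def commit_latency_ms(follower_rtts_ms: list[float], acks_needed: int) -> float:
    """Latency until acks_needed followers have acknowledged a write."""
    if not 1 <= acks_needed <= len(follower_rtts_ms):
        raise ValueError("acks_needed must be between 1 and the follower count")
    # The commit completes when the acks_needed-th fastest follower replies.
    return sorted(follower_rtts_ms)[acks_needed - 1]


if __name__ == "__main__":
    # Five-node cluster: two followers in-region, two in a remote region.
    rtts = [1.0, 2.0, 80.0, 90.0]
    majority = commit_latency_ms(rtts, acks_needed=2)    # leader + 2 of 4 followers
    fully_sync = commit_latency_ms(rtts, acks_needed=4)  # wait for every replica
    print(majority, fully_sync)  # 2.0 90.0
```

With a majority quorum the remote region stays off the commit path. Requiring every replica to acknowledge puts the slowest cross-region link on it.
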
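## 💻 Patterns

### Profile (Python cProfile)

```python
import cProfile
import pstats
import time

def expensive_call():
    time.sleep(0.2)   # placeholder for the slow path

def cheap_call():
    sum(range(1000))  # placeholder for the fast path

def main():
    expensive_call()
    cheap_call()

# Write raw stats to out.prof, then print the 20 entries with the highest cumulative time.
cProfile.run('main()', 'out.prof')
stats = pstats.Stats('out.prof').sort_stats('cumulative')
stats.print_stats(20)
```
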
### Linux perf (system-level)

```bash
# CPU profile: sample call stacks (-g) at 99 Hz for 10 seconds
perf record -F 99 -g -p $PID -- sleep 10
perf report

# Flame graph (scripts from Brendan Gregg's FlameGraph repository)
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > flame.svg
```

### Async profiler (JVM)

```bash
# Sample lock contention for 30 seconds
# (asprof is the launcher in async-profiler 3.x; 2.x ships ./profiler.sh with the same options)
asprof -e lock -d 30 -f lock.html $PID

# Wall-clock profile (also captures I/O-bound time)
asprof -e wall -d 30 -f wall.html $PID
```

### N+1 detect (Django)

```python
import logging

from django.db import connection
from django.test.utils import CaptureQueriesContext

# `Post` is assumed to be a model with an `author` ForeignKey.
with CaptureQueriesContext(connection) as ctx:
    posts = Post.objects.all()
    for post in posts:
        print(post.author.name)  # N+1: one extra query per post

if len(ctx.captured_queries) > 5:
    logging.warning('N+1 detected: %d queries', len(ctx.captured_queries))

# Fix: join the author in the same query
posts = Post.objects.select_related('author')  # 1 query
```

### GPU bottleneck profile (PyTorch)

```python
import torch.profiler as prof

# `model` and `inputs` are assumed to already live on the GPU.
with prof.profile(
    activities=[prof.ProfilerActivity.CPU, prof.ProfilerActivity.CUDA],
    record_shapes=True,
    profile_memory=True,
) as p:
    model(inputs)

print(p.key_averages().table(sort_by='cuda_time_total', row_limit=20))

# Kernels that top this table while doing little arithmetic per byte
# are candidates for an HBM bandwidth bottleneck.
```

### Lock contention detection

```python
import logging
import threading
import time

class LockMonitor:
    """Context manager that records how long callers wait to acquire a lock."""

    def __init__(self, lock):
        self.lock = lock
        self.wait_times = []

    def __enter__(self):
        start = time.perf_counter()
        self.lock.acquire()
        self.wait_times.append(time.perf_counter() - start)
        return self

    def __exit__(self, *args):
        self.lock.release()

    def report(self):
        if not self.wait_times:
            return
        avg = sum(self.wait_times) / len(self.wait_times)
        if avg > 0.1:  # flag average waits above 100 ms
            logging.warning('Lock contention: avg wait %.1f ms', avg * 1000)

monitor = LockMonitor(threading.Lock())  # use as: with monitor: ...
```

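### Distributed trace (Jaeger)

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.jaeger.thrift import JaegerExporter

# Attach the Jaeger exporter; without a span processor nothing is exported.
# Host and port are assumed defaults for a local Jaeger agent.
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(JaegerExporter(agent_host_name='localhost', agent_port=6831))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

@tracer.start_as_current_span('handle_request')
def handle(req):
    # `db` is an assumed database client; the nested span appears as a child in the trace.
    with tracer.start_as_current_span('db_query') as span:
        span.set_attribute('db.statement', 'SELECT ...')
        result = db.query(...)
    return result
```

→ The trace view makes the bottleneck span visible at a glance.
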
### Process bottleneck (workflow analysis)

```python
def analyze_workflow(stage_durations):
    """Compare throughput across stages.

    stage_durations maps stage name to time per item, assumed to be in
    seconds given the conversion to a per-minute rate below.
    """
    rates = {stage: 1 / dur for stage, dur in stage_durations.items()}
    # The slowest stage (lowest rate) sets the pace for the whole workflow.
    bottleneck = min(rates, key=rates.get)

    overall_rate = rates[bottleneck]
    # Capacity the faster stages have but cannot use.
    waste = sum(r - overall_rate for r in rates.values() if r > overall_rate)

    return {
        'bottleneck': bottleneck,
        'overall_rate_per_min': overall_rate * 60,
        'capacity_wasted': waste,
    }
```

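### Critical path (DAG)

```python
import networkx as nx

def critical_path(tasks):
    """Return the longest-duration path through the task DAG."""
    G = nx.DiGraph()
    for task in tasks:
        # dag_longest_path reads edge weights, not node attributes, so each
        # incoming edge carries the duration of the task it points to.
        for dep in task.deps:
            G.add_edge(dep, task.id, duration=task.duration)
        if not task.deps:
            # Virtual start node so root tasks contribute their duration too.
            G.add_edge('__start__', task.id, duration=task.duration)

    path = nx.dag_longest_path(G, weight='duration')
    return [node for node in path if node != '__start__']
```
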
## 🤔 Decision criteria

| Symptom | Tool |
|---|---|
| Slow request | APM + distributed trace |
| CPU pegged | Flame graph (perf) |
| GPU underutilized | Memory bandwidth check (PyTorch profiler) |
| Slow query | EXPLAIN + slow query log |
| Lock contention | async-profiler -e lock |
| Long lead time | Process / DORA analysis |
| Thundering herd | Coordination check |

**Default**: measure first, then optimize based on a hypothesis.

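Measuring first can be as simple as timing the suspected hot spot before and after a change, and keeping the change only if the numbers support the hypothesis. A minimal sketch using `timeit`; the two candidate functions are invented for the example.

```python
import timeit
from statistics import median


def measure(fn, repeat: int = 5, number: int = 200) -> float:
    """Median seconds per call over several timing runs."""
    runs = timeit.repeat(fn, repeat=repeat, number=number)
    return median(runs) / number


def concat_loop() -> str:
    # Hypothesis: repeated string concatenation is the hot spot.
    out = ""
    for i in range(1000):
        out += str(i)
    return out


def concat_join() -> str:
    # Candidate fix: build the string in one pass.
    return "".join(str(i) for i in range(1000))


if __name__ == "__main__":
    before, after = measure(concat_loop), measure(concat_join)
    print(f"before={before:.2e}s after={after:.2e}s speedup={before / after:.2f}x")
```

The two versions may well come out close on a given interpreter, which is the point: the measurement decides, not the intuition.
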
## 🔗 Graph

- Parent: [[System-Design]]
- Variant: [[CPU-Bound]]
- Applications: [[Theory-of-Constraints]] · [[Amdahls Law (암달의 법칙)]] · [[Critical-Path]]
- Tool: [[Profiling]] · [[Flame-Graph]] · [[Distributed-Tracing]]
- Adjacent: [[Optimization]] · [[Scalability]] · [[DORA-Metrics]]

## 🤖 LLM usage

**When**: performance optimization, capacity planning, incident root-cause analysis, process improvement.
**When not**: optimizing without a hypothesis.

## ❌ Anti-patterns

- **Optimizing without measuring**: you end up working on the wrong place.
- **Improving a non-bottleneck**: wasted time (TOC).
- **Investing equally in every part**: low ROI.
- **Trusting a single profile**: it may not be representative.
- **Blaming people for a process problem**: most of the time it is a system issue.
- **Premature optimization**: costs simplicity.

## 🧪 Verification / duplicates

- Verified against Goldratt (TOC), Knuth (premature optimization), and Brendan Gregg (*Systems Performance*).
- Trust level: A.
- Related: [[Amdahls Law (암달의 법칙)]] · [[Theory-of-Constraints]] · [[Profiling]] · [[Critical-Path]].

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: types, TOC, and profile / N+1 / GPU / trace code |