---
id: wiki-2026-0508-bottlenecks
title: Bottlenecks (Performance & Process)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [병목, bottleneck, theory of constraints, TOC, critical path, profiling]
duplicate_of: none
source_trust_level: A
confidence_score: 0.93
verification_status: applied
tags: [performance, bottleneck, profiling, theory-of-constraints, optimization, scalability, latency]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: any
  framework: profiling tools
---

# Bottlenecks

## 📌 One-line insight
> **"The throat of every system."** The weakest link determines throughput, so improving a non-bottleneck is wasted time. Goldratt's TOC gives five steps for working on it. In modern AI workloads, HBM bandwidth and the network are the typical bottlenecks.

## 📖 Core

### Types
1. **Hardware**: CPU / GPU / RAM / disk / network.
2. **Software**: algorithm / blocking / lock contention.
3. **Process**: approval / single point of expertise.
4. **Data**: schema / indexing / partitioning.
5. **Cognitive** (team): meetings / context switching.

### Theory of Constraints (Goldratt)
1. **Identify** the bottleneck.
2. **Exploit** it (use 100%).
3. **Subordinate** non-bottlenecks (don't over-feed).
4. **Elevate** it (invest to widen).
5. **Repeat** (a new bottleneck emerges).

### Amdahl's Law (related)
- Speeding up 90% of the work by 100× caps the overall gain at about 10× (1 / (0.1 + 0.9/100) ≈ 9.2×).
- This is why work outside the bottleneck buys little: the unimproved fraction sets the ceiling.

### Modern hardware bottlenecks (LLM)
- **HBM bandwidth**: roughly 3 TB/s on an H100. The dominant limit for LLM inference.
- **NVLink**: GPU-to-GPU.
- **Network** (RDMA, InfiniBand): distributed training.
- **PCIe**: GPU-to-CPU.
- **Storage**: NVMe vs. spinning disk.
- **Power / cooling**: datacenter limit.

### Software bottlenecks
- **CPU-bound**: compute heavy.
- **I/O-bound**: waiting on disk / network.
- **Memory-bound**: swapping / cache misses.
- **Lock contention**: mutexes.
- **GIL** (Python): CPU-bound code runs on one thread at a time.
- **N+1 query**: typical with ORMs.

### Detection
- **Profiler**: cProfile, perf, async-profiler.
- **Trace**: distributed tracing (Jaeger).
- **Metric**: CPU/mem/disk/network utilization.
- **APM**: Datadog, NewRelic.
- **Flame graph**.
- **Critical path**.

### Process bottlenecks
- Approval chain.
- Single expert.
- Environment provisioning.
- Review SLA.
- Meeting cadence.

→ Each is a component of DORA Lead Time.

### Data bottlenecks
- Single hot row.
- Missing index.
- Cross-shard transaction.
- Blocking schema migration.

### Distributed bottlenecks (modern)
- Single leader (Raft, Paxos).
- Cross-region call.
- Synchronous replication.
- Connection pool limit.

## 💻 Patterns

### Profile (Python cProfile)
```python
import cProfile
import pstats


def expensive_call():
    # Placeholder workload standing in for real work
    return sum(i * i for i in range(1_000_000))


def cheap_call():
    return 1 + 1


def main():
    expensive_call()
    cheap_call()


# Write raw stats to a file, then print the 20 entries with the highest cumulative time
cProfile.run('main()', 'out.prof')
stats = pstats.Stats('out.prof').sort_stats('cumulative')
stats.print_stats(20)
```

### Linux perf (system-level)
```bash
# CPU profile: sample at 99 Hz with call stacks (-g) for 10 seconds
perf record -F 99 -g -p "$PID" -- sleep 10
perf report

# Flame graph (scripts from Brendan Gregg's FlameGraph repository)
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > flame.svg
```

### Async profiler (JVM)
```bash
# Sample lock contention for 30 seconds
# (asprof is the launcher in recent async-profiler releases;
#  older releases use ./profiler.sh with the same flags)
asprof -e lock -d 30 -f lock.html "$PID"

# Wall-clock mode (also captures I/O-bound time)
asprof -e wall -d 30 -f wall.html "$PID"
```

### N+1 detect (Django)
```python
import logging

from django.db import connection
from django.test.utils import CaptureQueriesContext

logger = logging.getLogger(__name__)

# Post is assumed to be a model with a ForeignKey named `author`.
with CaptureQueriesContext(connection) as ctx:
    posts = Post.objects.all()
    for post in posts:
        print(post.author.name)  # N+1: one extra query per post

if len(ctx.captured_queries) > 5:
    logger.warning('N+1 detected: %d queries', len(ctx.captured_queries))

# Fix: fetch authors in the same query via a JOIN
posts = Post.objects.select_related('author')  # 1 query
```

### GPU bottleneck profile (PyTorch)
```python
import torch.profiler as prof

# `model` and `inputs` are assumed to be defined and already on the GPU.
with prof.profile(
    activities=[prof.ProfilerActivity.CPU, prof.ProfilerActivity.CUDA],
    record_shapes=True,
    profile_memory=True,
) as p:
    model(inputs)

print(p.key_averages().table(sort_by='cuda_time_total', row_limit=20))
# Kernels that dominate CUDA time while doing little arithmetic
# point toward an HBM bandwidth bottleneck.
```

### Lock contention detection
```python
import logging
import threading
import time

logger = logging.getLogger(__name__)


class LockMonitor:
    """Context manager that records how long callers wait to acquire a lock.

    Usage: monitor = LockMonitor(threading.Lock()), then `with monitor: ...`
    """

    def __init__(self, lock):
        self.lock = lock
        self.wait_times = []

    def __enter__(self):
        start = time.perf_counter()
        self.lock.acquire()
        self.wait_times.append(time.perf_counter() - start)
        return self

    def __exit__(self, *args):
        self.lock.release()

    def report(self):
        if not self.wait_times:
            return
        avg = sum(self.wait_times) / len(self.wait_times)
        if avg > 0.1:  # average wait above 100 ms
            logger.warning('Lock contention: avg wait %.1f ms', avg * 1000)
```

### Distributed trace (Jaeger)
```python
from opentelemetry import trace
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Assumes a Jaeger agent on localhost:6831. This thrift exporter is deprecated in
# newer OpenTelemetry releases, where an OTLP exporter is typically used instead.
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(JaegerExporter(agent_host_name='localhost', agent_port=6831))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)


@tracer.start_as_current_span('handle_request')
def handle(req):
    # `db` is a placeholder for the application's database client.
    with tracer.start_as_current_span('db_query') as span:
        span.set_attribute('db.statement', 'SELECT ...')
        result = db.query(...)
    return result
```

→ The trace view makes the bottleneck span visible.

### Process bottleneck (workflow analysis)
```python
def analyze_workflow(stage_durations):
    """Compare per-stage throughput. Durations are assumed to be seconds per item."""
    rates = {stage: 1 / dur for stage, dur in stage_durations.items()}
    bottleneck = min(rates, key=rates.get)  # slowest stage limits the whole flow
    overall_rate = rates[bottleneck]
    # Capacity the faster stages have beyond what the bottleneck can absorb
    waste = sum(r - overall_rate for r in rates.values() if r > overall_rate)
    return {
        'bottleneck': bottleneck,
        'overall_rate_per_min': overall_rate * 60,
        'capacity_wasted': waste,
    }
```

### Critical path (DAG)
```python
import networkx as nx


def critical_path(tasks):
    """Return the task ids on the longest-duration path through the DAG."""
    G = nx.DiGraph()
    for task in tasks:
        # dag_longest_path reads edge weights, so each incoming edge carries
        # the duration of the task it leads to; a virtual start node covers roots.
        G.add_edge('__start__', task.id, weight=task.duration)
        for dep in task.deps:
            G.add_edge(dep, task.id, weight=task.duration)
    path = nx.dag_longest_path(G, weight='weight')
    return [node for node in path if node != '__start__']
```

## 🤔 Decision criteria

| Symptom | Tool |
|---|---|
| Slow request | APM + distributed trace |
| CPU pegged | Flame graph (perf) |
| GPU underutilized | Memory bandwidth (PyTorch profiler) |
| Slow query | EXPLAIN + slow query log |
| Lock contention | async-profiler -e lock |
| Long lead time | Process / DORA analysis |
| Thundering herd | Coordination check |

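
Before reaching for a tool from the table, a rough per-stage timing combined with the Amdahl's Law bound can suggest where a speedup could matter at all. The sketch below is illustrative only: the function names (`amdahl_speedup`, `time_stages`, `rank_by_ceiling`) and the three lambda stages are hypothetical stand-ins, not part of any library or of the original note.

```python
import time


def amdahl_speedup(fraction, local_speedup):
    """Overall speedup when `fraction` of the runtime becomes `local_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


def time_stages(stages):
    """Run each (name, callable) pair once and return {name: elapsed seconds}."""
    timings = {}
    for name, fn in stages:
        start = time.perf_counter()
        fn()
        timings[name] = time.perf_counter() - start
    return timings


def rank_by_ceiling(timings):
    """Sort stages by the best-case overall speedup if that stage took zero time."""
    total = sum(timings.values())
    ceilings = {
        name: (float('inf') if secs >= total else total / (total - secs))
        for name, secs in timings.items()
    }
    return sorted(ceilings.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == '__main__':
    # Hypothetical stages standing in for real work.
    stages = [
        ('parse', lambda: sum(range(200_000))),
        ('compute', lambda: sum(i * i for i in range(2_000_000))),
        ('write', lambda: sum(range(100_000))),
    ]
    for name, ceiling in rank_by_ceiling(time_stages(stages)):
        print(f'{name}: at most {ceiling:.2f}x overall')

    # The 90% / 100x case from the Amdahl's Law section
    print(f'{amdahl_speedup(0.9, 100):.2f}')  # 9.17
```

A single timing run is noisy, so this only narrows down where to point a real profiler. The stage at the top of the ranking is the candidate bottleneck, and stages with a ceiling near 1× are not worth optimizing yet.
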
**Default**: measure first, then optimize against a hypothesis.

## 🔗 Graph
- Parent: [[System-Design]]
- Variant: [[CPU-Bound]]
- Application: [[Theory-of-Constraints]] · [[Amdahl's Law]] · [[Critical-Path]]
- Tool: [[Profiling]] · [[Flame-Graph]] · [[Distributed Tracing]]
- Adjacent: [[Optimization]] · [[Scalability]] · [[DORA-Metrics]]

## 🤖 LLM usage
**When**: performance optimization, capacity planning, incident root cause analysis, process improvement.

**When not**: optimizing without a hypothesis.

## ❌ Anti-patterns
- **Optimizing without measuring**: effort lands in the wrong place.
- **Improving a non-bottleneck**: wasted time (TOC).
- **Investing equally in every part**: low ROI.
- **Trusting a single profile**: it is not representative.
- **Blaming people for process problems**: most are system issues.
- **Premature optimization**: costs simplicity.

## 🧪 Verification / duplicates
- Verified (Goldratt TOC, Knuth premature optimization, Brendan Gregg Systems Performance).
- Trust level A.
- Related: [[Amdahl's Law]] · [[Theory-of-Constraints]] · [[Profiling]] · [[Critical-Path]].

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: types + TOC + profile / N+1 / GPU / trace code |