| id | title | category | status | canonical_id | aliases | duplicate_of | source_trust_level | confidence_score | verification_status | tags | raw_sources | last_reinforced | github_commit | tech_stack |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| wiki-2026-0508-bottlenecks | Bottlenecks (Performance & Process) | 10_Wiki/Topics | verified | self | | none | A | 0.93 | applied | | | 2026-05-10 | pending | |
Bottlenecks
📌 One-Line Insight
"The throat of every system." The weakest link sets the throughput, so improving a non-bottleneck is wasted time. Goldratt's TOC reduces this to five steps. In modern AI systems the bottlenecks are HBM bandwidth and the network.
📖 Core
Types
- Hardware: CPU / GPU / RAM / disk / network.
- Software: algorithm / blocking / lock contention.
- Process: approval / single point of expertise.
- Data: schema / indexing / partitioning.
- Cognitive (team): meeting / context-switch.
Theory of Constraints (Goldratt)
- Identify the bottleneck.
- Exploit it (use 100%).
- Subordinate non-bottleneck (don't over-feed).
- Elevate it (invest to widen).
- Repeat (new bottleneck emerges).
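The five steps can be read as a loop. A minimal sketch, assuming a hypothetical model in which each stage is just a capacity number and "elevating" means adding capacity to the slowest stage; steps 2 and 3 (exploit, subordinate) are not modeled, and the stage names and numbers are invented for illustration:

```python
def find_bottleneck(capacities):
    """Return the stage with the lowest capacity (units per hour)."""
    return min(capacities, key=capacities.get)


def toc_loop(capacities, budget, step=10):
    """Spend `budget` capacity units by repeatedly widening the current bottleneck.

    Hypothetical model: every stage is a capacity number and 'elevating'
    simply adds up to `step` units to the slowest stage.
    """
    capacities = dict(capacities)  # do not mutate the caller's data
    history = []
    while budget > 0:
        stage = find_bottleneck(capacities)   # step 1: identify
        spend = min(step, budget)
        capacities[stage] += spend            # step 4: elevate
        budget -= spend
        # step 5: repeat; record the new system throughput (the minimum)
        history.append((stage, min(capacities.values())))
    return capacities, history


stages = {"build": 40, "review": 15, "deploy": 30}
final, history = toc_loop(stages, budget=30)
for stage, throughput in history:
    print(f"widened {stage:<7} -> system throughput {throughput}/h")
```

In this run the bottleneck moves from review to deploy after the second round, which is the "new bottleneck emerges" case in step 5.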
Amdahl's Law (related)
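- Speeding up 90% of the work by 100× gives about 9.2× overall, and no speedup of that 90% can exceed the 10× cap.
- The fraction left untouched is the bottleneck: it bounds the whole.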
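The 90% example can be checked numerically with Amdahl's formula, speedup = 1 / ((1 - p) + p / s), where p is the fraction of runtime that is sped up and s is the speedup of that fraction. A small sketch; the function names are illustrative:

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the runtime is made s times faster."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


def amdahl_limit(p):
    """Upper bound on the overall speedup as s grows without bound."""
    return float("inf") if p == 1.0 else 1.0 / (1.0 - p)


print(round(amdahl_speedup(0.9, 100), 2))  # 9.17
print(round(amdahl_limit(0.9), 2))         # 10.0
```

The remaining 10% accounts for almost all of the runtime after the speedup, which is why it becomes the next place to look.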
Modern hardware bottlenecks (LLM)
- HBM bandwidth: about 3 TB/s on an H100. The dominant limit for LLM inference.
- NVLink: GPU-to-GPU.
- Network (RDMA, InfiniBand): distributed training.
- PCIe: GPU-to-CPU.
- Storage: NVMe vs. spinning disk.
- Power / cooling: the datacenter-level limit.
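Why HBM bandwidth dominates inference can be seen with back-of-envelope arithmetic: if every weight must be read from memory once per generated token, bandwidth divided by model size bounds the single-sequence decode rate. A rough sketch under that assumption; the 7e9-parameter, 16-bit model is an invented example, and the estimate ignores KV-cache reads, batching, and compute time:

```python
def max_decode_tokens_per_sec(n_params, bytes_per_param, bandwidth_bytes_per_sec):
    """Upper bound on decode speed when every weight is read once per token."""
    weight_bytes = n_params * bytes_per_param
    return bandwidth_bytes_per_sec / weight_bytes


# Assumed figures for illustration: a 7e9-parameter model with 2-byte weights
# on a device with 3 TB/s of memory bandwidth (the H100 figure above).
bound = max_decode_tokens_per_sec(7e9, 2, 3e12)
print(f"bandwidth-bound ceiling: {bound:.0f} tokens/s")
```

Adding more compute does not raise this ceiling; only fewer bytes per token (quantization, batching) or more bandwidth does.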
Software bottlenecks
- CPU-bound: compute-heavy work.
- I/O-bound: waiting on disk or network.
- Memory-bound: swapping / cache misses.
- Lock contention: threads waiting on a mutex.
- GIL (Python): CPU-bound threads run one at a time.
- N+1 query: typical of ORMs.
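A quick first cut between CPU-bound and wait-bound code is to compare CPU time with wall-clock time for the same call. A minimal sketch using only the standard library; the 0.5 threshold and the helper names are arbitrary choices, and cache-miss stalls still count as CPU time here, so this does not separate memory-bound from compute-bound work:

```python
import time


def cpu_fraction(fn, *args):
    """Run fn and return process CPU time divided by wall-clock time."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else 0.0


def classify(fn, *args, threshold=0.5):
    """Label a call 'cpu-bound' or 'wait-bound' using an arbitrary threshold."""
    return "cpu-bound" if cpu_fraction(fn, *args) >= threshold else "wait-bound"


def busy(n):
    # Pure computation: keeps the CPU busy for the whole call.
    return sum(i * i for i in range(n))


print(classify(busy, 2_000_000))
print(classify(time.sleep, 0.2))
```

A low fraction says the time is spent waiting (I/O, locks, sleep), so a wall-clock profiler or tracing is the next tool; a high fraction points to a CPU profiler.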
Detection
- Profiler: cProfile, perf, async-profiler.
- Trace: distributed tracing (Jaeger).
- Metric: CPU/mem/disk/network util.
- APM: Datadog, NewRelic.
- Flame graph.
- Critical path.
Process bottlenecks
- Approval chain.
- Single expert.
- Environment provisioning.
- Review SLA.
- Meeting cadence.
→ Each is a component of DORA lead time.
Data bottlenecks
- Single hot row.
- Missing index.
- Cross-shard transaction.
- Blocking schema migration.
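The missing-index case is easy to reproduce with the standard-library sqlite3 module: the query plan switches from a full scan to an index search once the index exists. A small sketch; the `orders` table, its columns, and the index name are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10_000)],
)

QUERY = "SELECT * FROM orders WHERE customer_id = 42"


def plan(connection, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


before = plan(conn, QUERY)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(conn, QUERY)   # the planner can now search the index

print("before:", before)
print("after: ", after)
```

The exact plan wording varies between SQLite versions, so compare for a scan versus an index search rather than for a literal string.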
Distributed bottlenecks (modern)
- Single leader (Raft, Paxos).
- Cross-region call.
- Synchronous replication.
- Connection pool limit.
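The connection pool limit and the cross-region call interact: if each query holds one connection for its full duration, the pool caps throughput at pool size divided by time per query, so added round-trip latency lowers the cap directly. A rough sketch under that assumption; the pool size and latencies are invented numbers:

```python
def pool_throughput_limit(pool_size, seconds_per_query):
    """Max queries per second when each query holds one connection throughout."""
    if pool_size <= 0 or seconds_per_query <= 0:
        raise ValueError("pool_size and seconds_per_query must be positive")
    return pool_size / seconds_per_query


# Assumed figures: 20 connections, 50 ms per query in-region,
# and an extra 80 ms round trip when the database is in another region.
same_region = pool_throughput_limit(20, 0.050)
cross_region = pool_throughput_limit(20, 0.050 + 0.080)

print(f"same region:  {same_region:.0f} queries/s")
print(f"cross region: {cross_region:.0f} queries/s")
```

Under these numbers the same pool serves well under half the traffic once the cross-region hop is added, without any change in database load.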
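💻 Patterns
Profile (Python cProfile)
import cProfile
import pstats
def expensive_call():
    return sum(i * i for i in range(1_000_000))  # placeholder slow path
def cheap_call():
    return sum(range(1_000))  # placeholder fast path
def main():
    expensive_call()
    cheap_call()
# Profile main() and write the raw stats to out.prof.
cProfile.run('main()', 'out.prof')
# Print the 20 entries with the highest cumulative time.
stats = pstats.Stats('out.prof').sort_stats('cumulative')
stats.print_stats(20)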
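Linux perf (system-level)
# CPU profile: sample at 99 Hz for 10 s; -g records the call stacks that the flame graph needs
perf record -F 99 -g -p $PID -- sleep 10
perf report
# Flame graph (the two .pl scripts come from Brendan Gregg's FlameGraph repository)
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > flame.svg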
Async profiler (JVM)
# Sample lock contention for 30 s (asprof is the async-profiler 3.x launcher; 2.x ships profiler.sh with the same options)
asprof -e lock -d 30 -f lock.html $PID
# Wall-clock profile, which also captures I/O-bound time
asprof -e wall -d 30 -f wall.html $PID
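N+1 detection (Django)
import logging
from django.db import connection
from django.test.utils import CaptureQueriesContext
from myapp.models import Post  # hypothetical app and model
logger = logging.getLogger(__name__)
with CaptureQueriesContext(connection) as ctx:
    posts = Post.objects.all()
    for post in posts:
        print(post.author.name)  # N+1: one extra query per post
if len(ctx.captured_queries) > 5:  # arbitrary threshold
    logger.warning('N+1 detected: %d queries', len(ctx.captured_queries))
# Fix: fetch the author in the same query with a JOIN
posts = Post.objects.select_related('author')  # 1 query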
GPU bottleneck profile (PyTorch)
import torch.profiler as prof
# model and inputs are placeholders for your own module and batch.
with prof.profile(
    activities=[prof.ProfilerActivity.CPU, prof.ProfilerActivity.CUDA],
    record_shapes=True,
    profile_memory=True,
) as p:
    model(inputs)
print(p.key_averages().table(sort_by='cuda_time_total', row_limit=20))
# Time dominated by memory-heavy kernels (copies, elementwise ops) hints at an HBM bandwidth bottleneck.
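Lock contention detection
import logging
import threading
import time
logger = logging.getLogger(__name__)
class LockMonitor:
    """Context manager that records how long each acquire of `lock` waits."""
    def __init__(self, lock):
        self.lock = lock  # e.g. threading.Lock()
        self.wait_times = []
    def __enter__(self):
        start = time.perf_counter()
        self.lock.acquire()
        self.wait_times.append(time.perf_counter() - start)
        return self
    def __exit__(self, *args):
        self.lock.release()
    def report(self):
        if not self.wait_times:
            return
        avg = sum(self.wait_times) / len(self.wait_times)
        if avg > 0.1:  # arbitrary threshold: 100 ms average wait
            logger.warning('Lock contention: avg wait %.1f ms', avg * 1000)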
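Distributed trace (Jaeger)
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
# Attach the exporter; without a span processor no spans leave the process.
# Host and port are assumptions (local Jaeger agent defaults).
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(JaegerExporter(agent_host_name='localhost', agent_port=6831))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)
@tracer.start_as_current_span('handle_request')
def handle(req):
    with tracer.start_as_current_span('db_query') as span:
        span.set_attribute('db.statement', 'SELECT ...')
        result = db.query(...)  # db is a placeholder for your database client
    return result
→ The trace view makes the slowest span, and so the bottleneck, visible.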
Process bottleneck (workflow analysis)
def analyze_workflow(stage_durations):
    """Compare per-stage throughput; stage_durations maps stage name to seconds per item."""
    rates = {stage: 1 / dur for stage, dur in stage_durations.items()}
    bottleneck = min(rates, key=rates.get)  # the slowest stage sets the pace
    overall_rate = rates[bottleneck]
    # Capacity the faster stages cannot use, in items per second.
    waste = sum(r - overall_rate for r in rates.values() if r > overall_rate)
    return {
        'bottleneck': bottleneck,
        'overall_rate_per_min': overall_rate * 60,
        'capacity_wasted': waste,
    }
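Critical path (DAG)
import networkx as nx
def critical_path(tasks):
    """Longest duration-weighted path through the task DAG."""
    G = nx.DiGraph()
    for task in tasks:
        # dag_longest_path reads edge weights, not node attributes, so store
        # each task's duration on its incoming edges. Tasks without deps hang
        # off a virtual 'START' node (assumes no real task uses that id).
        for dep in task.deps or ['START']:
            G.add_edge(dep, task.id, duration=task.duration)
    path = nx.dag_longest_path(G, weight='duration')
    return [node for node in path if node != 'START']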
🤔 Decision Criteria
| Symptom | Tool |
|---|---|
| Slow request | APM + distributed trace |
| CPU pegged | Flame graph (perf) |
| GPU underutilized | Memory bandwidth (PyTorch profiler) |
| Slow query | EXPLAIN + slow query log |
| Lock contention | async-profiler -e lock |
| Long lead time | Process / DORA analysis |
| Thundering herd | Coordination check |
Default: measure first, then optimize against a hypothesis.
🔗 Graph
- Parent: System-Design
- Variant: CPU-Bound
- Application: Theory-of-Constraints · Amdahl's Law · Critical-Path
- Tool: Profiling · Flame-Graph · Distributed Tracing
- Adjacent: Optimization · Scalability · DORA-Metrics
🤖 LLM Usage
When: performance optimization, capacity planning, incident root-cause analysis, process improvement. When not: optimizing without a hypothesis.
❌ Anti-patterns
- Optimizing without measuring: you fix the wrong place.
- Improving a non-bottleneck: wasted time (TOC).
- Investing equally in every part: low ROI.
- Trusting a single profile: it may not be representative.
- Blaming people for process problems: most are system issues.
- Premature optimization: costs simplicity.
🧪 Verification / Duplicates
- Verified (Goldratt TOC, Knuth premature optimization, Brendan Gregg Systems Performance).
- Trust level A.
- Related: Amdahl's Law · Theory-of-Constraints · Profiling · Critical-Path.
🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: types, TOC, and profile / N+1 / GPU / trace code |