---
id: wiki-2026-0508-theory-of-constraints-toc
title: Theory of Constraints (TOC)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [TOC, Bottleneck Theory, Goldratt]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [management, optimization, systems-thinking, ops]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: N/A
  framework: TOC methodology
---

# Theory of Constraints (TOC)

## In One Line

> **"System throughput is determined by a single bottleneck."** Established in Goldratt's 1984 *The Goal*. The weakest link in a system (the constraint) limits overall throughput, so optimizing any other part is wasted effort. The idea began in manufacturing and applies broadly to software (DevOps, ML pipelines, LLM agent throughput).

## Core Concepts

### 5-step focusing process

1. **Identify** the constraint.
2. **Exploit** it (maximize utilization without extra investment).
3. **Subordinate** everything else to the constraint.
4. **Elevate** the constraint (invest to expand capacity).
5. **Repeat**: once the constraint is relieved, it moves elsewhere.

### Core metrics (Throughput Accounting)

- **Throughput (T)**: the rate at which the system generates money through sales.
- **Inventory / Investment (I)**: the money tied up in the system.
- **Operating Expense (OE)**: the money spent to convert I into T.
- Priority: maximize T, minimize I, control OE.

### Modern applications (2026)

- **DevOps**: *The Phoenix Project* (Kim 2013) applies TOC and Lean to IT operations.
- **ML pipeline**: GPU bottleneck (training), tokenizer (inference batching), vector store (RAG retrieval).
- **LLM agent**: identify the dominant latency contributor among sequential tool calls, then parallelize.
- **Org / Team**: a single approval bottleneck, resolved by changing the workflow.

### Applications

1. Factory floor scheduling (DBR, Drum-Buffer-Rope).
2. Software delivery (CI bottleneck, code review queue).
3. ML inference cost (which stage dominates?).

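The one-line claim and step 5 of the focusing process can be checked with a toy model. The following is a minimal sketch, assuming a strictly serial line whose steady-state throughput equals the capacity of its slowest stage. The stage names, the capacities (units per hour), and the helper names `find_constraint` and `system_throughput` are hypothetical and chosen only for illustration.

```python
def find_constraint(capacities):
    """Return the name of the stage with the lowest capacity."""
    return min(capacities, key=capacities.get)


def system_throughput(capacities):
    """Steady-state throughput of a serial line: the slowest stage's capacity."""
    return min(capacities.values())


# Hypothetical capacities in units per hour (illustrative numbers only).
line = {"cut": 120, "weld": 45, "paint": 90}

# Local optimization: doubling a non-constraint leaves throughput unchanged.
faster_paint = {**line, "paint": 180}

# Elevating the constraint raises throughput, and the constraint then moves.
faster_weld = {**line, "weld": 100}

scenarios = [("baseline", line), ("paint x2", faster_paint), ("weld elevated", faster_weld)]
for label, caps in scenarios:
    print(f"{label}: constraint={find_constraint(caps)}, throughput={system_throughput(caps)}")
```

In this model, doubling paint capacity changes nothing, while raising weld capacity to 100 lifts throughput to 90 and makes paint the new constraint, which is why the process has to repeat. Real systems with variability and buffers will not behave this cleanly.
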
## 💻 Patterns

### Inference pipeline profiling

```python
import time

def profile_pipeline(stages, inp):
    """Run each (name, fn) stage in order, timing every stage."""
    timings = {}
    out = inp
    for name, fn in stages:
        t0 = time.perf_counter()
        out = fn(out)
        timings[name] = time.perf_counter() - t0
    # The slowest stage is the candidate constraint.
    bottleneck = max(timings, key=timings.get)
    return out, timings, bottleneck

# Example timings: {"tokenize": 0.001, "embed": 0.04, "vector_search": 0.32, "rerank": 0.18}
# Bottleneck: vector_search, so elevate it (HNSW tuning, GPU index, sharding).
```

### LLM agent: serialize → parallelize

```python
import asyncio

# fetch_user, get_weather, get_weather_for_user, and fetch_calendar are
# application-specific coroutines that are not defined here.

# BEFORE (serial: latency = sum of the three calls)
async def load_serial(uid):
    ctx = await fetch_user(uid)
    weather = await get_weather(ctx.city)
    calendar = await fetch_calendar(uid)
    return ctx, weather, calendar

# AFTER (parallel: latency = max of the three calls)
# get_weather_for_user(uid) is assumed to look up the city itself, which
# removes the dependency on ctx that forced the serial order.
async def load_parallel(uid):
    return await asyncio.gather(
        fetch_user(uid),
        get_weather_for_user(uid),
        fetch_calendar(uid),
    )
```

### DBR (Drum-Buffer-Rope) for a queue

```python
# Drum: the bottleneck stage paces the system.
# Buffer: a small queue in front of the bottleneck (prevents starvation).
# Rope: feedback that releases new work only when the buffer drains.
import asyncio

# next_work() and expensive_step() are application-specific placeholders.
buffer = asyncio.Queue(maxsize=8)  # Buffer

async def producer():
    while True:
        item = next_work()
        await buffer.put(item)  # Rope (blocks when the buffer is full)

async def bottleneck_consumer():  # Drum
    while True:
        item = await buffer.get()
        await expensive_step(item)
```

### Throughput accounting (simple)

```python
def evaluate_change(before, after):
    """Compare two snapshots with keys: throughput, inventory, op_expense."""
    delta_T = after["throughput"] - before["throughput"]
    delta_I = after["inventory"] - before["inventory"]
    delta_OE = after["op_expense"] - before["op_expense"]
    net = delta_T - delta_OE  # change in net profit (T - OE)
    # Accept only if net profit rises and inventory does not grow.
    return {"net_value": net, "OK": net > 0 and delta_I <= 0}
```

### 5-step driver (pseudocode)

```python
# Pseudocode: identify_constraint, exploit, subordinate_others_to,
# enough_capacity, and elevate are placeholders for system-specific logic.
def toc_loop(system, n_iterations=5):
    for _ in range(n_iterations):
        c = identify_constraint(system)         # 1
        exploit(c)                              # 2
        subordinate_others_to(c, system)        # 3
        if not enough_capacity(c, target=system.target):
            elevate(c)                          # 4
        # 5: repeat, because the constraint shifts
```

## Decision Criteria

| Situation | Approach |
|---|---|
| Multi-stage pipeline | Profile → identify the slowest stage → optimize that stage |
| LLM agent latency | Parallel tool calls + cache the hot path |
| Batch ML training | Check GPU utilization (`nvidia-smi`); if it is low, the bottleneck is likely I/O |
| Software delivery | Value stream map → find the single bottleneck |
| Cost optimization | Throughput accounting (T - OE), not unit cost |

**Default**: measure first, fix the single biggest bottleneck, then repeat. Avoid premature optimization.

## 🔗 Graph

- Parent: [[Systems_Thinking|Systems-Thinking]]
- Variants: [[Lean]] · [[DevOps]]

## 🤖 LLM Usage

**When to use**: analyzing the latency or cost of a multi-stage pipeline, optimizing ML or agent systems, finding team workflow bottlenecks.
**When not to use**: single-step tasks, where TOC framing is only overhead.

## ❌ Anti-patterns

- **Local optimization**: optimizing a non-bottleneck changes system throughput by 0%.
- **Multiple "bottlenecks"**: there is usually a single dominant one; guessing without measuring tends to pick the wrong stage.
- **Ignoring the shift**: after a bottleneck is fixed, another stage becomes the new constraint; without repeating the process, the system stays stuck.
- **Capacity addition without exploit**: doing step 4 (elevate) before step 2 (exploit) means buying expensive hardware that then sits underutilized.

## 🧪 Verification / Duplicates

- Verified (Goldratt 1984 *The Goal*, Kim *Phoenix Project* 2013, *Beyond the Phoenix Project* 2018).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: TOC + DevOps/ML pipeline modern applications |