---
id: wiki-2026-0508-theory-of-constraints-toc
title: Theory of Constraints (TOC)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [TOC, Bottleneck Theory, Goldratt]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [management, optimization, systems-thinking, ops]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: N/A
  framework: TOC methodology
---

# Theory of Constraints (TOC)

## One-line summary
> **"A system's throughput is set by its single bottleneck."** Formalized in Goldratt's 1984 novel *The Goal*. The weakest link (the constraint) limits the throughput of the whole system, so optimizing any other part is wasted effort. TOC started in manufacturing and applies broadly to software as well (DevOps, ML pipelines, LLM agent throughput).

## Core concepts

### The five focusing steps
1. **Identify** the constraint.
2. **Exploit** it (maximize utilization without extra investment).
3. **Subordinate** everything else to the constraint.
4. **Elevate** the constraint (invest to expand capacity).
5. **Repeat**: once the constraint is relieved it moves elsewhere, so return to step 1.

### Key metrics (Throughput Accounting)
- **Throughput (T)**: the rate at which the system generates money through sales.
- **Inventory / Investment (I)**: the money tied up in the system.
- **Operating Expense (OE)**: the money spent to turn I into T.
- Priority: maximize T, minimize I, control OE.
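
The three measures combine into the two usual TOC bottom-line figures: net profit (T - OE) and return on investment ((T - OE) / I). A minimal sketch follows; the `Period` class and all numbers are illustrative assumptions, not data from a real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    """Throughput-accounting figures for one period (same currency throughout)."""
    throughput: float         # T: money generated through sales
    investment: float         # I: money tied up in the system
    operating_expense: float  # OE: money spent turning I into T

    @property
    def net_profit(self) -> float:
        return self.throughput - self.operating_expense

    @property
    def roi(self) -> float:
        return self.net_profit / self.investment


# Hypothetical figures for illustration only.
baseline = Period(throughput=500_000, investment=1_000_000, operating_expense=400_000)
print(baseline.net_profit)  # 100000
print(baseline.roi)         # 0.1
```

Under this framing, a change is attractive when it raises T faster than OE without inflating I, which is the same ordering as the priority rule above.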

### Modern applications (2026)
- **DevOps**: *The Phoenix Project* (Kim 2013) applies TOC and Lean to IT operations.
- **ML pipeline**: typical constraints are the GPU (training), the tokenizer (batched inference), and the vector store (RAG retrieval).
- **LLM agent**: identify the dominant latency contributor among sequential tool calls, then parallelize.
- **Org / Team**: a single approval bottleneck calls for a workflow change.

### Applications
1. Factory floor scheduling (DBR, Drum-Buffer-Rope).
2. Software delivery (CI bottleneck, code review queue).
3. ML inference cost (which stage dominates?).

## 💻 Patterns

### Inference pipeline profiling
```python
import time


def profile_pipeline(stages, inp):
    """Run (name, fn) stages in order; return output, per-stage seconds, slowest stage."""
    timings = {}
    out = inp
    for name, fn in stages:
        t0 = time.perf_counter()
        out = fn(out)
        timings[name] = time.perf_counter() - t0
    bottleneck = max(timings, key=timings.get)
    return out, timings, bottleneck

# Example timings: {"tokenize": 0.001, "embed": 0.04, "vector_search": 0.32, "rerank": 0.18}
# Bottleneck: vector_search, so elevate it (tune HNSW, use a GPU index, shard).
```
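
A single run can be noisy, so it may help to accumulate timings over several inputs and compare each stage's share of total time. The sketch below is one way to do that; `profile_many` and the sleep-based stub stages are hypothetical stand-ins for real pipeline stages.

```python
import time
from collections import defaultdict


def profile_many(stages, inputs):
    """Accumulate per-stage wall time over several inputs; return shares and the slowest stage."""
    totals = defaultdict(float)
    for inp in inputs:
        out = inp
        for name, fn in stages:
            t0 = time.perf_counter()
            out = fn(out)
            totals[name] += time.perf_counter() - t0
    grand_total = sum(totals.values())
    shares = {name: t / grand_total for name, t in totals.items()}
    bottleneck = max(shares, key=shares.get)
    return shares, bottleneck


# Stub stages: the sleep durations are made-up stand-ins for real work.
def fast(x):
    time.sleep(0.001)
    return x


def slow(x):
    time.sleep(0.03)
    return x


stages = [("tokenize", fast), ("vector_search", slow), ("rerank", fast)]
shares, bottleneck = profile_many(stages, inputs=range(5))
print(bottleneck)  # vector_search
```

Shares make it easier to see whether one stage truly dominates or whether two stages are close enough that fixing the first will immediately expose the second.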

### LLM agent: serial → parallel
```python
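# Sketch only: assumes `import asyncio`, an enclosing `async def`, and the
# fetch_* / get_weather* helpers defined elsewhere.

# BEFORE (serial: total latency = sum of the three calls)
ctx = await fetch_user(uid)
weather = await get_weather(ctx.city)   # depends on ctx, so it cannot start early
calendar = await fetch_calendar(uid)

# AFTER (parallel: total latency = the slowest call)
# get_weather_for_user(uid) is assumed to look up the city itself, which
# removes the dependency on ctx and lets all three calls start together.
ctx, weather, calendar = await asyncio.gather(
    fetch_user(uid), get_weather_for_user(uid), fetch_calendar(uid),
)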
```
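
To see the sum-versus-max effect with numbers, the following self-contained sketch times both variants. `fake_call` and the delays are assumptions that simulate independent tool calls with `asyncio.sleep`; real calls with data dependencies cannot be gathered this freely.

```python
import asyncio
import time


async def fake_call(name: str, delay: float) -> str:
    """Hypothetical stand-in for a tool call; sleeps instead of doing I/O."""
    await asyncio.sleep(delay)
    return name


async def run_serial(delays):
    t0 = time.perf_counter()
    results = [await fake_call(n, d) for n, d in delays.items()]
    return results, time.perf_counter() - t0


async def run_parallel(delays):
    t0 = time.perf_counter()
    results = await asyncio.gather(*(fake_call(n, d) for n, d in delays.items()))
    return list(results), time.perf_counter() - t0


delays = {"user": 0.05, "weather": 0.10, "calendar": 0.08}
serial_results, serial_s = asyncio.run(run_serial(delays))
parallel_results, parallel_s = asyncio.run(run_parallel(delays))
print(f"serial   ~{serial_s:.2f}s (roughly the sum of the delays)")
print(f"parallel ~{parallel_s:.2f}s (roughly the longest delay)")
```

After parallelizing, the slowest single call becomes the new constraint, so that call is the next one worth caching or speeding up.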

### DBR (Drum-Buffer-Rope) for a queue
```python
# Drum: the bottleneck stage sets the pace of the whole system
# Buffer: a small queue in front of the bottleneck (prevents starvation)
# Rope: backpressure that releases new work only as the buffer drains
# next_work() and expensive_step() are placeholders for application code.

import asyncio

buffer = asyncio.Queue(maxsize=8)  # Buffer


async def producer():
    while True:
        item = next_work()
        await buffer.put(item)        # Rope (blocks while the buffer is full)


async def bottleneck_consumer():     # Drum
    while True:
        item = await buffer.get()
        await expensive_step(item)
```
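
The pattern above runs forever and relies on placeholder functions. A finite, runnable variant is sketched below, assuming a `None` sentinel to signal the end of work and a short sleep standing in for the slow step; `run_dbr` and its parameters are illustrative names.

```python
import asyncio


async def run_dbr(n_items: int, buffer_size: int) -> tuple[list[int], int]:
    buffer: asyncio.Queue = asyncio.Queue(maxsize=buffer_size)  # Buffer
    processed: list[int] = []
    max_depth = 0

    async def producer() -> None:
        nonlocal max_depth
        for item in range(n_items):
            await buffer.put(item)          # Rope: blocks while the buffer is full
            max_depth = max(max_depth, buffer.qsize())
        await buffer.put(None)              # sentinel: no more work

    async def drum() -> None:
        while (item := await buffer.get()) is not None:
            await asyncio.sleep(0.001)      # stand-in for the slow constraint step
            processed.append(item)

    await asyncio.gather(producer(), drum())
    return processed, max_depth


processed, max_depth = asyncio.run(run_dbr(n_items=20, buffer_size=4))
print(len(processed), max_depth)
```

The bounded queue is what keeps work-in-progress capped: the producer can never get more than `buffer_size` items ahead of the drum, no matter how fast it is.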

### Throughput accounting (simple)
```python
def evaluate_change(before, after):
    """Compare two snapshots; a change is OK if T - OE improves and I does not grow."""
    delta_T = after["throughput"] - before["throughput"]
    delta_I = after["inventory"] - before["inventory"]
    delta_OE = after["op_expense"] - before["op_expense"]
    net = delta_T - delta_OE
    return {"net_value": net, "OK": net > 0 and delta_I <= 0}
```

### Five-step driver (pseudocode)
```python
def toc_loop(system, n_iterations=5):
    for _ in range(n_iterations):
        c = identify_constraint(system)        # 1
        exploit(c)                             # 2
        subordinate_others_to(c, system)       # 3
        if not enough_capacity(c, target=system.target):
            elevate(c)                         # 4
        # 5: repeat, because the constraint may have shifted
```
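
The pseudocode leaves every helper undefined. As a toy illustration of how the constraint moves between iterations, the sketch below models a serial line whose throughput is the minimum stage capacity and "elevates" the constraint by doubling it. The stage names, capacities, and the doubling factor are all assumptions.

```python
def throughput(capacity: dict[str, float]) -> float:
    """A serial line can run no faster than its slowest stage."""
    return min(capacity.values())


def identify_constraint(capacity: dict[str, float]) -> str:
    return min(capacity, key=capacity.get)


def elevate(capacity: dict[str, float], stage: str, factor: float = 2.0) -> None:
    capacity[stage] *= factor


# Hypothetical capacities in items/hour.
capacity = {"build": 40.0, "test": 10.0, "review": 15.0, "deploy": 60.0}
history = []
for _ in range(3):
    c = identify_constraint(capacity)
    history.append((c, throughput(capacity)))
    elevate(capacity, c)

print(history)
# [('test', 10.0), ('review', 15.0), ('test', 20.0)]
```

The constraint alternates between `test` and `review` as each is elevated, which is why step 5 sends the loop back to step 1 rather than assuming the same stage stays the bottleneck.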

## Decision criteria
| Situation | Approach |
|---|---|
| Multi-stage pipeline | Profile → identify the slowest stage → optimize that stage |
| LLM agent latency | Parallelize tool calls and cache the hot path |
| Batch ML training | Check GPU utilization (`nvidia-smi`); if it is low, suspect an I/O bottleneck |
| Software delivery | Value stream map → find the single bottleneck |
| Cost optimization | Throughput accounting (T - OE), not unit cost |

**Default**: measure first, fix the single biggest bottleneck, then repeat. Do not optimize prematurely.

## 🔗 Graph
- Parent: [[Systems_Thinking|Systems-Thinking]]
- Variants: [[Lean]] · [[DevOps]]

## 🤖 LLM usage
**When**: analyzing latency or cost in a multi-stage pipeline, optimizing ML or agent systems, finding team workflow bottlenecks.
**When not**: single-step tasks, where TOC framing is just overhead.

## ❌ Anti-patterns
- **Local optimization**: optimizing a non-bottleneck leaves system throughput unchanged.
- **Multiple "bottlenecks"**: there is usually one dominant constraint; guessing instead of measuring tends to pick the wrong one.
- **Ignoring the shift**: after a bottleneck is fixed, another stage becomes the new constraint; without repeating the cycle, improvement stalls.
- **Capacity addition without exploit**: doing step 4 (elevate) before step 2 (exploit) means buying expensive hardware that then sits underutilized.
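
The local-optimization point can be checked with simple arithmetic. The sketch below assumes a serial pipeline whose throughput is the minimum stage capacity and asks how much doubling each stage alone would gain; the stage names and requests-per-second figures are made up for illustration.

```python
def line_throughput(capacity):
    """Throughput of a serial pipeline is capped by its slowest stage."""
    return min(capacity.values())


def gain_from_doubling(capacity, stage):
    """Throughput gained by doubling one stage while leaving the others alone."""
    improved = {**capacity, stage: capacity[stage] * 2}
    return line_throughput(improved) - line_throughput(capacity)


# Hypothetical capacities in requests/second.
capacity = {"tokenize": 1000.0, "embed": 25.0, "vector_search": 3.0, "rerank": 5.0}
gains = {stage: gain_from_doubling(capacity, stage) for stage in capacity}
print(gains)
# {'tokenize': 0.0, 'embed': 0.0, 'vector_search': 2.0, 'rerank': 0.0}
```

Only the constraint yields any gain, and even that gain is capped at 2.0 rather than 3.0 because `rerank` becomes the new constraint at 5.0.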

## 🧪 Verification / duplicates
- Verified (Goldratt 1984 *The Goal*, Kim *Phoenix Project* 2013, *Beyond the Phoenix Project* 2018).
- Trust level: A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — TOC + DevOps/ML pipeline modern application |