---
id: wiki-2026-0508-directed-acyclic-graph-dependenc
title: Directed Acyclic Graph Dependency Management
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [DAG, Build Graph, Task Dependency, Topological Sort]
duplicate_of: none
source_trust_level: A
confidence_score: 0.94
verification_status: applied
tags: [DAG, dependency, build-system, scheduler, topological-sort]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: networkx/airflow
---

# Directed Acyclic Graph Dependency Management

## In One Line
> **"DAG = nodes (tasks) + directed edges (must-run-before) + no cycles"**. From Make's build graph in the 1970s to 2026-era Airflow/Dagster pipelines, Bazel/Turborepo monorepos, Spark physical plans, Git commit history, and the React fiber tree, the DAG is the universal data structure for dependency resolution.

## Core

### Core Operations
- **Topological Sort**: a valid execution order. Kahn's algorithm or DFS, both O(V+E).
- **Cycle Detection**: the validity check for a DAG.
- **Transitive Reduction**: the minimal edge set with the same reachability.
- **Critical Path**: the longest path, which is a lower bound on the makespan.
- **Incremental Recompute**: re-run only the dirty subgraph.

### Applications
1. **Build systems**: Make, Bazel, Buck, Turborepo, Nx.
2. **Workflow orchestration**: Airflow, Dagster, Prefect, Argo Workflows.
3. **ML training pipelines**: Kubeflow, MLflow, ZenML.
4. **Spreadsheet recalc**: Excel, Google Sheets formula engine.
5. **VCS**: Git commit DAG, Mercurial.
6. **React/Solid reactivity**: signal dependency graph.

### Scheduling Strategies
- **List scheduling**: assign ready tasks to idle workers (greedy).
- **HEFT**: Heterogeneous Earliest Finish Time (cloud).
- **Critical Path Method (CPM)**: prioritization based on the longest path.
- **Work-stealing**: dynamic load balancing (Tokio, Rayon).
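The list-scheduling strategy above can be sketched as a small event simulation. This is a minimal illustration, not a production scheduler: `list_schedule` and its arguments are hypothetical names, every task in `edges` is assumed to appear in `durations`, workers are assumed identical, and ties are broken by task name.

```python
import heapq
from collections import defaultdict

def list_schedule(durations, edges, num_workers):
    """Greedy list scheduling (illustrative sketch).

    durations: {task: duration}; edges: [(u, v)] meaning u must finish before v.
    Returns (makespan, {task: (worker, start, finish)}).
    """
    if num_workers < 1:
        raise ValueError("need at least one worker")
    succ = defaultdict(list)
    indegree = {t: 0 for t in durations}
    for u, v in edges:
        succ[u].append(v)
        indegree[v] += 1

    ready = sorted(t for t in durations if indegree[t] == 0)  # sorted list is a valid heap
    idle = list(range(num_workers - 1, -1, -1))               # pop() hands out worker 0 first
    running = []                                              # min-heap of (finish, task, worker)
    now, plan = 0, {}

    while ready or running:
        # Greedy step: give every ready task to an idle worker while both exist.
        while ready and idle:
            task = heapq.heappop(ready)
            worker = idle.pop()
            finish = now + durations[task]
            plan[task] = (worker, now, finish)
            heapq.heappush(running, (finish, task, worker))
        # Advance the clock to the next completion and release its successors.
        now, task, worker = heapq.heappop(running)
        idle.append(worker)
        for v in succ[task]:
            indegree[v] -= 1
            if indegree[v] == 0:
                heapq.heappush(ready, v)

    if len(plan) != len(durations):
        raise ValueError("Cycle detected")  # some tasks never became ready
    return now, plan

durations = {"a": 2, "b": 3, "c": 1, "d": 2}
edges = [("a", "c"), ("b", "d"), ("c", "d")]
makespan, plan = list_schedule(durations, edges, num_workers=2)
print(makespan)  # 5, equal to the critical path a -> c -> d
```

With a single worker the same graph takes 8 time units (the sum of all durations), which shows the gap between serial execution and the critical-path lower bound.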
## 💻 Patterns

### Topological Sort (Kahn's Algorithm)
```python
from collections import deque, defaultdict

def topo_sort(nodes, edges):
    indegree = defaultdict(int)
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        indegree[v] += 1
    # Start from nodes that have no prerequisites.
    queue = deque([n for n in nodes if indegree[n] == 0])
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    # Nodes on a cycle never reach indegree 0, so they are missing from order.
    if len(order) != len(nodes):
        raise ValueError("Cycle detected")
    return order
```

### Parallel DAG Executor (asyncio)
```python
import asyncio

async def run_dag(tasks, deps):
    """tasks: {name: async_fn}, deps: {name: [prereqs]}. Assumes deps is acyclic."""
    running = {}

    async def run(name):
        # Wait for every prerequisite before running this task.
        await asyncio.gather(*(running[d] for d in deps.get(name, [])))
        return await tasks[name]()

    # create_task only schedules the coroutine; no run() body executes until
    # the await below, so `running` is fully populated before any lookup.
    for n in tasks:
        running[n] = asyncio.create_task(run(n))
    return await asyncio.gather(*running.values())
```

### Incremental Build (Content-Hash)
```python
def needs_rebuild(node, hashes, prev_hashes):
    # hash_inputs is assumed to be defined elsewhere: it combines the node's
    # sources with its dependencies' hashes. Call in topological order so
    # that hashes[d] already exists for every dependency d.
    own_hash = hash_inputs(node.sources, [hashes[d] for d in node.deps])
    hashes[node.name] = own_hash  # recorded either way for downstream nodes
    return prev_hashes.get(node.name) != own_hash
```

### Critical Path
```python
def critical_path(graph, durations):
    # graph is expected to expose .nodes, .edges, and .successors(u),
    # as a networkx.DiGraph does.
    order = topo_sort(graph.nodes, graph.edges)
    # earliest[n] is the earliest possible finish time of n.
    earliest = {n: durations[n] for n in graph.nodes}
    for u in order:
        for v in graph.successors(u):
            earliest[v] = max(earliest[v], earliest[u] + durations[v])
    return max(earliest.values()), earliest
```

### Cycle Detection (DFS)
```python
WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished

def has_cycle(graph):
    # graph: {node: [successors]}; every successor must also be a key.
    color = {n: WHITE for n in graph}

    def dfs(u):
        color[u] = GRAY
        for v in graph[u]:
            if color[v] == GRAY:  # back edge to the current path
                return True
            if color[v] == WHITE and dfs(v):
                return True
        color[u] = BLACK
        return False

    return any(dfs(n) for n in graph if color[n] == WHITE)
```

### Airflow DAG (Practical)
```python
from airflow.decorators import dag, task
from datetime import datetime
@dag(start_date=datetime(2026, 1, 1), schedule="@daily", catchup=False)
def etl():
    # fetch, clean, and warehouse are placeholders for project-specific code.
    @task
    def extract():
        return fetch()

    @task
    def transform(data):
        return clean(data)

    @task
    def load(rows):
        warehouse.write(rows)

    # Passing return values wires the edges: extract -> transform -> load.
    load(transform(extract()))

etl()
```

## Decision Criteria

| Situation | Approach |
|---|---|
| Deterministic, hermetic builds | Bazel / Buck (content-hash) |
| Scheduled data pipelines | Airflow / Dagster |
| JS/TS monorepo | Turborepo / Nx |
| ML experiment tracking | Kubeflow / MLflow / ZenML |
| In-process reactive UI | Signals (Solid/Vue/Svelte) |
| Real-time stream graph | Flink / Spark Structured Streaming |

**Default**: prefer an explicit, declarative DAG over implicit ordering, because it enables visualization, auditing, and parallel scheduling.

## 🔗 Graph
- Parent: [[Graph Theory]] · [[Topological Sort]]
- Variants: [[Build Graph]]
- Applications: [[Bazel]] · [[Airflow]] · [[Turborepo 환경 구성|Turborepo Setup]] · [[Spark]]
- Adjacent: [[Incremental-Computation]]

## 🤖 LLM Usage
**When**: expressing dependencies in a multi-step agent plan; orchestrating a RAG indexing pipeline.
**When not**: a cyclic feedback loop is intrinsic to the problem (RL, gradient descent). Handle those with unrolled iteration rather than a DAG.

## ❌ Anti-patterns
- **Hidden side effects**: a task mutates state directly → incremental builds break.
- **Ignoring transitive dependencies**: a missing edge → race condition.
- **Single-task megasinks**: a fan-in bottleneck; break it into shards.
- **Cycle by feature flag**: conditional dependencies can create implicit cycles.
- **Over-fine granularity**: nano-tasks → scheduler overhead exceeds the actual work.

## 🧪 Verification / Duplicates
- Verified (Kahn 1962; Bazel docs 2026; Airflow 3.x docs; CLRS Ch.22).
- Trust level: A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full content with topo, parallel exec, incremental, Airflow |