--- id: wiki-2026-0508-mark-sweep title: Mark-Sweep GC category: 10_Wiki/Topics status: verified canonical_id: self aliases: [Mark and Sweep, Tracing GC, M&S] duplicate_of: none source_trust_level: A confidence_score: 0.9 verification_status: applied tags: [gc, memory, runtime, algorithm] raw_sources: [] last_reinforced: 2026-05-10 github_commit: pending tech_stack: language: multi framework: jvm-v8-cpython --- # Mark-Sweep GC ## 매 한 줄 > **"매 reachable 만 살리고 — 매 나머지는 sweep."**. Mark-Sweep 은 매 1960년 McCarthy 가 Lisp 위해 발명한 매 tracing GC algorithm. 매 root 부터 reachable object 를 mark 하고 매 unmarked 를 sweep. 매 reference counting 의 cycle 문제 해결 — 매 modern GC (V8, JVM, CPython generational) 의 base. ## 매 핵심 ### 매 2-phase 1. **Mark**: 매 GC root (stack, globals, registers) 부터 매 graph traverse, 매 reachable 에 mark bit 1. 2. **Sweep**: 매 heap 전체 scan, 매 unmarked object 를 free list 에 추가. ### 매 GC root - 매 stack 의 local variable. - 매 global / static variable. - 매 CPU register. - 매 JNI / native handle. ### 매 변형 - **Mark-Compact**: sweep 대신 매 살아있는 object 를 압축 → 매 fragmentation 해결. - **Tri-color**: white / grey / black — 매 incremental / concurrent 가능. - **Generational**: young (Eden + Survivor) / old — 매 most objects die young 가설. ### 매 stop-the-world (STW) - 매 mark phase 동안 매 mutator (app thread) 정지 — 매 reference graph 일관성. - 매 concurrent / incremental GC 는 매 write barrier 로 STW 최소화. ## 💻 패턴 ### Pseudocode mark ```python def mark(obj): if obj is None or obj.marked: return obj.marked = True for ref in obj.references: mark(ref) def gc(): # 1. Mark for root in get_roots(): mark(root) # 2. Sweep for obj in heap: if obj.marked: obj.marked = False # reset for next cycle else: free(obj) ``` ### Tri-color iterative mark ```python WHITE, GREY, BLACK = 0, 1, 2 def mark_iterative(): # Initially all WHITE except roots → GREY grey = list(get_roots()) for r in grey: r.color = GREY while grey: obj = grey.pop() for ref in obj.references: if ref.color == WHITE: ref.color = GREY grey.append(ref) obj.color = BLACK # Sweep: WHITE → free, BLACK → reset to WHITE for obj in heap: if obj.color == WHITE: free(obj) else: obj.color = WHITE ``` ### V8 (JavaScript) GC config ```javascript // Inspect V8 GC behavior node --trace-gc app.js // → [GC] Scavenge ... Mark-sweep ... // Tune heap size node --max-old-space-size=4096 app.js // 4GB // Trigger GC manually (testing only) node --expose-gc -e "global.gc()" ``` ### JVM G1GC vs ZGC ```bash # G1GC (default since Java 9): regional mark-sweep with compaction java -XX:+UseG1GC -Xmx8g -XX:MaxGCPauseMillis=200 App # ZGC (Java 21+): concurrent, sub-ms pause java -XX:+UseZGC -Xmx32g App # → most work concurrent with mutator, STW phases <1ms ``` ### CPython gc module ```python import gc # CPython uses refcount + mark-sweep (for cycles only) gc.collect() # force cycle collection gc.get_count() # (gen0, gen1, gen2) — generational gc.set_threshold(700, 10, 10) # tune trigger # Detect cycles import objgraph objgraph.show_most_common_types(limit=20) ``` ### Write barrier (concurrent GC) ```c // When mutator writes a pointer, GC must know // (so concurrent mark doesn't miss new references) void write_field(Object* obj, int idx, Object* val) { if (gc_is_marking && obj->color == BLACK && val->color == WHITE) { val->color = GREY; grey_queue_push(val); } obj->fields[idx] = val; } ``` ## 매 결정 기준 | 상황 | GC choice | |---|---| | Throughput 중요, latency 무관 | Parallel GC (JVM) | | 균형 (default) | G1GC (JVM), V8 default | | 매 sub-ms pause | ZGC (JVM 21+), Shenandoah | | Embedded / RT | Manual / arena allocator | | Functional language | Generational copying (OCaml, Erlang per-process) | **기본값**: 매 generational + concurrent mark-sweep (G1, V8 Orinoco) — 매 modern runtime 의 standard. ## 🔗 Graph - 부모: [[Garbage Collection]] · [[Memory Management]] - 변형: [[Reference Counting]] - 응용: [[V8 Engine]] · [[JVM]] - Adjacent: [[Write Barrier]] · [[Memory_Leaks]] ## 🤖 LLM 활용 **언제**: GC algorithm 설계, runtime 선택, GC tuning 의 이론적 basis 가 필요할 때. **언제 X**: application-level memory leak 추적 — 매 [[Memory_Leaks]] 가 더 직접적. ## ❌ 안티패턴 - **"GC = no leaks"**: 매 reachable leak 이 매 GC lang 의 가장 흔한 issue. - **잦은 manual gc()**: 매 runtime 의 heuristic 보다 못함, 매 throughput 떨어뜨림. - **거대 heap**: 매 mark phase 가 heap size 에 비례 — 매 큰 heap = 큰 pause (non-concurrent GC). - **Finalizer 의존**: 매 unpredictable timing — 매 RAII / try-with-resources 가 정답. ## 🧪 검증 / 중복 - Verified (McCarthy 1960, "GC Handbook" Jones et al, V8 blog, OpenJDK docs). - 신뢰도 A. ## 🕓 Changelog | 날짜 | 변경 | |---|---| | 2026-05-08 | Phase 1 | | 2026-05-10 | Manual cleanup — 2-phase + tri-color + V8/JVM/CPython 패턴 |