Files
2nd/10_Wiki/Topics/Architecture/Mark-Sweep.md
T
Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
10_Wiki/Topics 대규모 정리:
- 오류 캡처/미완성 stub 문서 227개 제거
- 교차폴더 중복 43클러스터 병합 (63파일 → redirect)
- 링크명 정규화: 깨진 링크 수정·redirect 직결·개념 매핑 ~2,400건
- 카테고리 MOC 6개 신규 생성
- Graph 섹션 미해결 related-keyword 링크 10,058건 제거

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00

5.2 KiB

id, title, category, status, canonical_id, aliases, duplicate_of, source_trust_level, confidence_score, verification_status, tags, raw_sources, last_reinforced, github_commit, tech_stack
id title category status canonical_id aliases duplicate_of source_trust_level confidence_score verification_status tags raw_sources last_reinforced github_commit tech_stack
wiki-2026-0508-mark-sweep Mark-Sweep GC 10_Wiki/Topics verified self
Mark and Sweep
Tracing GC
M&S
none A 0.9 applied
gc
memory
runtime
algorithm
2026-05-10 pending
language framework
multi jvm-v8-cpython

Mark-Sweep GC

매 한 줄

"매 reachable 만 살리고 — 매 나머지는 sweep.". Mark-Sweep 은 매 1960년 McCarthy 가 Lisp 위해 발명한 매 tracing GC algorithm. 매 root 부터 reachable object 를 mark 하고 매 unmarked 를 sweep. 매 reference counting 의 cycle 문제 해결 — 매 modern GC (V8, JVM, CPython generational) 의 base.

매 핵심

매 2-phase

  1. Mark: 매 GC root (stack, globals, registers) 부터 매 graph traverse, 매 reachable 에 mark bit 1.
  2. Sweep: 매 heap 전체 scan, 매 unmarked object 를 free list 에 추가.

매 GC root

  • 매 stack 의 local variable.
  • 매 global / static variable.
  • 매 CPU register.
  • 매 JNI / native handle.

매 변형

  • Mark-Compact: sweep 대신 매 살아있는 object 를 압축 → 매 fragmentation 해결.
  • Tri-color: white / grey / black — 매 incremental / concurrent 가능.
  • Generational: young (Eden + Survivor) / old — 매 most objects die young 가설.

매 stop-the-world (STW)

  • 매 mark phase 동안 매 mutator (app thread) 정지 — 매 reference graph 일관성.
  • 매 concurrent / incremental GC 는 매 write barrier 로 STW 최소화.

💻 패턴

Pseudocode mark

def mark(obj):
    if obj is None or obj.marked:
        return
    obj.marked = True
    for ref in obj.references:
        mark(ref)

def gc():
    # 1. Mark
    for root in get_roots():
        mark(root)
    # 2. Sweep
    for obj in heap:
        if obj.marked:
            obj.marked = False  # reset for next cycle
        else:
            free(obj)

Tri-color iterative mark

WHITE, GREY, BLACK = 0, 1, 2

def mark_iterative():
    # Initially all WHITE except roots → GREY
    grey = list(get_roots())
    for r in grey:
        r.color = GREY

    while grey:
        obj = grey.pop()
        for ref in obj.references:
            if ref.color == WHITE:
                ref.color = GREY
                grey.append(ref)
        obj.color = BLACK

    # Sweep: WHITE → free, BLACK → reset to WHITE
    for obj in heap:
        if obj.color == WHITE:
            free(obj)
        else:
            obj.color = WHITE

V8 (JavaScript) GC config

// Inspect V8 GC behavior
node --trace-gc app.js
// → [GC] Scavenge ... Mark-sweep ...

// Tune heap size
node --max-old-space-size=4096 app.js  // 4GB

// Trigger GC manually (testing only)
node --expose-gc -e "global.gc()"

JVM G1GC vs ZGC

# G1GC (default since Java 9): regional mark-sweep with compaction
java -XX:+UseG1GC -Xmx8g -XX:MaxGCPauseMillis=200 App

# ZGC (Java 21+): concurrent, sub-ms pause
java -XX:+UseZGC -Xmx32g App
# → most work concurrent with mutator, STW phases <1ms

CPython gc module

import gc

# CPython uses refcount + mark-sweep (for cycles only)
gc.collect()  # force cycle collection
gc.get_count()  # (gen0, gen1, gen2) — generational
gc.set_threshold(700, 10, 10)  # tune trigger

# Detect cycles
import objgraph
objgraph.show_most_common_types(limit=20)

Write barrier (concurrent GC)

// When mutator writes a pointer, GC must know
// (so concurrent mark doesn't miss new references)
void write_field(Object* obj, int idx, Object* val) {
    if (gc_is_marking && obj->color == BLACK && val->color == WHITE) {
        val->color = GREY;
        grey_queue_push(val);
    }
    obj->fields[idx] = val;
}

매 결정 기준

상황 GC choice
Throughput 중요, latency 무관 Parallel GC (JVM)
균형 (default) G1GC (JVM), V8 default
매 sub-ms pause ZGC (JVM 21+), Shenandoah
Embedded / RT Manual / arena allocator
Functional language Generational copying (OCaml, Erlang per-process)

기본값: 매 generational + concurrent mark-sweep (G1, V8 Orinoco) — 매 modern runtime 의 standard.

🔗 Graph

🤖 LLM 활용

언제: GC algorithm 설계, runtime 선택, GC tuning 의 이론적 basis 가 필요할 때. 언제 X: application-level memory leak 추적 — 매 Memory_Leaks 가 더 직접적.

안티패턴

  • "GC = no leaks": 매 reachable leak 이 매 GC lang 의 가장 흔한 issue.
  • 잦은 manual gc(): 매 runtime 의 heuristic 보다 못함, 매 throughput 떨어뜨림.
  • 거대 heap: 매 mark phase 가 heap size 에 비례 — 매 큰 heap = 큰 pause (non-concurrent GC).
  • Finalizer 의존: 매 unpredictable timing — 매 RAII / try-with-resources 가 정답.

🧪 검증 / 중복

  • Verified (McCarthy 1960, "GC Handbook" Jones et al, V8 blog, OpenJDK docs).
  • 신뢰도 A.

🕓 Changelog

날짜 변경
2026-05-08 Phase 1
2026-05-10 Manual cleanup — 2-phase + tri-color + V8/JVM/CPython 패턴