---
id: wiki-2026-0508-flame-graphs
title: Flame Graphs
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Flamegraph, Stack Trace Visualization, Brendan Gregg Flame Graph]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [profiling, performance, observability, perf, ebpf]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: rust
  framework: perf-pyspy-pprof
---

# Flame Graphs

## In One Line

> **"A stack-trace hierarchy rendered as SVG."** A visualization created by Brendan Gregg (2011): the x-axis is sorted alphabetically (NOT by time), the y-axis is stack depth, and width is sample count. Hot paths show up immediately as wide, flat plateaus. As of 2026, most mainstream profilers (perf, eBPF tools, py-spy, async-profiler, pprof, Pyroscope) either render flame graphs directly or produce output that converts to one.

## Core Concepts

### How to Read

- **Width = time spent** (proportional to sample count). Wide = hot.
- **Y = stack depth**. Bottom = entry point, top = leaf.
- **Color = arbitrary** (typically a random hue per function, for visual separation only).
- **Plateau at top** = a leaf function that is itself burning CPU.
- **Tower** = a deep call chain (recursion or framework overhead).

### Variants

- **CPU flame graph**: on-CPU samples only. The classic form.
- **Off-CPU flame graph**: blocked time (I/O, lock waits). Used for latency analysis.
- **Differential flame graph**: the diff of two profiles. Red = more samples in the second profile (slower), blue = fewer (faster).
- **Icicle (inverted)**: drawn top-down. Good for analysis starting from the entry point.
- **Continuous profiling**: Pyroscope / Grafana Phlare (since merged into Grafana Pyroscope) stay enabled in production at all times.

### Tool Mapping

1. **Linux native**: `perf record -F 99 -g` + Brendan Gregg's FlameGraph Perl scripts.
2. **eBPF**: `bcc/profile`, `parca-agent`. Kernel and user stacks in one profile.
3. **Python**: `py-spy record -o flame.svg --pid $PID`.
4. **JVM**: `asprof -e cpu -d 30 -f flame.html $PID` (async-profiler 3.x; `./profiler.sh` in 2.x).
5. **Go**: `go tool pprof -http=:8080 cpu.prof` (built-in flame graph).
6. **Node.js**: `0x` or `clinic flame`.

## 💻 Patterns

### Linux perf → flame graph

```bash
# 1. Sample at 99 Hz for 30s, system-wide, capturing stacks
sudo perf record -F 99 -a -g -- sleep 30

# 2. Convert to folded format
sudo perf script | \
  ~/FlameGraph/stackcollapse-perf.pl > out.folded

# 3. Render SVG
~/FlameGraph/flamegraph.pl out.folded > flame.svg
# Open in a browser → click to zoom; search highlights frames matching a regex
```

### Differential flame graph (before/after)

```bash
# before.perf / after.perf are saved `perf script` outputs
~/FlameGraph/stackcollapse-perf.pl < before.perf > before.folded
~/FlameGraph/stackcollapse-perf.pl < after.perf > after.folded

# Drawn with the "after" widths: red = more samples than before, blue = fewer.
# (--negate is only needed when the argument order is reversed.)
~/FlameGraph/difffolded.pl before.folded after.folded | \
  ~/FlameGraph/flamegraph.pl > diff.svg
```

### Continuous profiling with Pyroscope (Go)

```go
package main

import (
	"log"

	"github.com/grafana/pyroscope-go"
)

func main() {
	// Start returns a profiler handle and an error; fail fast on bad config.
	_, err := pyroscope.Start(pyroscope.Config{
		ApplicationName: "checkout-service",
		ServerAddress:   "http://pyroscope:4040",
		Logger:          pyroscope.StandardLogger,
		Tags:            map[string]string{"region": "us-west-2"},
		ProfileTypes: []pyroscope.ProfileType{
			pyroscope.ProfileCPU,
			pyroscope.ProfileAllocObjects,
			pyroscope.ProfileInuseObjects,
		},
	})
	if err != nil {
		log.Fatalf("pyroscope: %v", err)
	}

	runServer() // application entry point, defined elsewhere
}
```

### py-spy on running Python service

```bash
# Sample for 30s at 100 Hz, then draw a flame graph
py-spy record -o flame.svg --pid 12345 --duration 30 --rate 100

# Native + Python frames combined
py-spy record -o flame.svg --pid 12345 --native

# Top-like live view
py-spy top --pid 12345
```

### async-profiler for JVM

```bash
# async-profiler 3.x ships bin/asprof in place of profiler.sh; the options below are the same.

# CPU profile (30s) → flame graph HTML
./profiler.sh -e cpu -d 30 -f flame.html $(jps | grep MyApp | awk '{print $1}')

# Allocation profile
./profiler.sh -e alloc -d 60 -f alloc.html $PID

# Wall-clock (off-CPU + on-CPU), split per thread
./profiler.sh -e wall -t -d 30 -f wall.html $PID
```

### Off-CPU flame graph (eBPF / bcc)

```bash
# Capture off-CPU stacks (blocked time) for 30s, in folded format
sudo /usr/share/bcc/tools/offcputime -df -p $PID 30 > offcpu.folded

# offcputime reports microseconds, so label the counts accordingly
~/FlameGraph/flamegraph.pl --color=io --title="Off-CPU" --countname=us \
  offcpu.folded > offcpu.svg
```

### pprof flame graph (Go built-in)

```go
package main

import (
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on the default mux
)

func main() {
	// Bound to localhost: tunnel or port-forward to reach it remotely.
	go func() { http.ListenAndServe("localhost:6060", nil) }()
	select {} // placeholder for the real application work
}

// Then on the dev machine:
//   go tool pprof -http=:8080 "http://localhost:6060/debug/pprof/profile?seconds=30"
// → opens a browser; choose "View" → "Flame Graph"
```

## Decision Criteria

| Situation | Approach |
|---|---|
| Production continuous | Pyroscope / Grafana Phlare / Polar Signals |
| Linux ad-hoc | perf + FlameGraph |
| Python | py-spy (zero-instrumentation) |
| JVM | async-profiler (allocation + CPU + wall) |
| Go | built-in pprof + go tool pprof |
| Node | 0x or clinic flame |
| Latency / blocked | Off-CPU flame graph (eBPF) |

**Default**: Pyroscope in production, the platform's native profiler in development.

## 🔗 Graph

- Parent: [[Profiling]]
- Applications: [[SRE]]
- Adjacent: [[eBPF]]

## 🤖 LLM Usage

**When**: identifying hot frames in a flame graph and suggesting optimizations, summarizing folded text in natural language, interpreting differential flame graphs.
**When not**: exact pixel-level reading of the image. Use the SVG itself for that.

## ❌ Anti-patterns

- **Sampling rate too low**: at 19 Hz, short-lived hot functions are missed. 99 Hz is the standard (99 rather than 100 avoids sampling in lockstep with periodic timers).
- **Without -g (no call graphs)**: omitting `-g` from `perf record` captures only the sampled leaf frame, so the graph has no call hierarchy.
- **No frame pointers**: stack unwinding fails for native code and libc builds compiled with `-fomit-frame-pointer`. Rebuild with `-fno-omit-frame-pointer` or unwind with DWARF (`perf record --call-graph dwarf`). Go is less affected here: its toolchain has maintained frame pointers on amd64 since Go 1.7.
- **Reading the x-axis as time order**: the x-axis is not time. Frames are sorted alphabetically.
- **Production profiling once a year**: this misses the value of continuous profiling.

## 🧪 Verification / Duplicates

- Verified (Brendan Gregg 2011, Pyroscope/Grafana Labs 2026).
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: flame graph reading guide + perf/py-spy/pprof recipes |
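
As a companion to the folded files produced by the recipes above (`out.folded`, `offcpu.folded`), here is a minimal Python sketch of how folded text can be summarized before handing it to a person or an LLM. It assumes well-formed folded lines of the form `frame;frame;frame count`. The names `parse_folded`, `hot_frames`, and `SAMPLE` are hypothetical and not part of the FlameGraph toolkit; the sample data is invented for illustration.

```python
from collections import Counter


def parse_folded(text):
    """Parse folded-stack lines ("a;b;c 42") into (frames, count) pairs."""
    stacks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The count is the last space-separated field; frame names may contain spaces.
        stack, _, count = line.rpartition(" ")
        stacks.append((stack.split(";"), int(count)))
    return stacks


def hot_frames(stacks):
    """Return (self_counts, total_counts) keyed by frame name."""
    self_counts, total_counts = Counter(), Counter()
    for frames, count in stacks:
        # Self time: samples where this frame is the leaf (top of the flame graph).
        self_counts[frames[-1]] += count
        # Inclusive time: count each frame once per stack, even under recursion.
        for name in set(frames):
            total_counts[name] += count
    return self_counts, total_counts


# Invented sample data in folded format (assumption, for illustration only).
SAMPLE = """\
main;handle_request;parse_json 30
main;handle_request;query_db;wait_io 10
main;handle_request;render 55
main;gc 5
"""

stacks = parse_folded(SAMPLE)
self_counts, total_counts = hot_frames(stacks)
total = sum(count for _, count in stacks)

for name, count in self_counts.most_common(3):
    print(f"{name}: {count} self samples ({100 * count / total:.0f}% of total)")
```

Self counts correspond to the plateaus at the top of a flame graph, while inclusive counts correspond to frame width. Ranking by self count is a quick way to find the leaf functions worth inspecting first.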