---
id: wiki-2026-0508-call-stack-analysis
title: Call Stack Analysis
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Call Stack, Stack Trace Analysis, Flame Graph, Profiling Stack]
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: [profiling, performance, flame-graph, debugging, observability]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Polyglot
  framework: perf/eBPF/pprof
---

# Call Stack Analysis

## In One Line

> **"Most performance bugs come down to 'where is CPU time spent?', and call stack sampling answers it."**

Sample stack traces statistically and render them as a flame graph, and the hot path becomes visible at a glance. The standard 2026 toolchain is Linux perf + eBPF, with inferno / Pyroscope / Datadog Continuous Profiler on top.

## Core Concepts

### Sampling vs instrumentation

- **Sampling profiler**: captures the stack N times per second (typically 99 or 999 Hz) → low overhead, statistical results. The slightly-off-round rates (99 rather than 100) avoid sampling in lockstep with periodic activity that runs at round frequencies.
- **Instrumented profiler**: hooks every function entry/exit → exact call counts, but can add 10-100x overhead.
- **Modern default**: sampling, because its overhead is low enough to be production-safe.

### Stack sources (unwinding)

- **Frame pointer (RBP) walk**: fastest; requires building with `-fno-omit-frame-pointer`.
- **DWARF unwind**: uses the `.eh_frame` unwind tables; no frame pointer needed, but expensive.
- **ORC unwinder**: the Linux kernel's lightweight, simplified alternative to DWARF unwind info (kernel stacks only).
- **eBPF stackmap**: collects user and kernel stacks together in one map.

### Visualization

- **Flame graph (Brendan Gregg)**: x-axis = share of samples (frames sorted alphabetically, not by time), y-axis = stack depth; the wider a box, the hotter that code path.
- **Icicle graph**: a flame graph flipped upside down, with the root at the top.
- **Differential flame graph**: the diff of two profiles, used to hunt performance regressions.

### Applications

1. **CPU bottleneck diagnosis**: identify the hot functions.
2. **Lock contention**: off-CPU profile + stacks captured at futex waits.
3. **GC pressure**: profile allocation stacks.
4. **Cold start**: flame graph of the startup phase.
5. **Continuous profiling**: sample production 24/7 → alert on regressions.
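The sampling → flame graph idea above can be sketched in a few lines of Python. This is a minimal illustration, not any tool's implementation. It assumes stacks have already been captured as root-first lists of frame names, and the `samples` list is hypothetical data rather than real profiler output. The helper names `fold` and `widths` are also hypothetical. `fold` produces the folded `frame;frame;frame count` lines that `stackcollapse-perf.pl` emits and `flamegraph.pl` consumes. `widths` computes the inclusive sample count of every node in the merged call tree, which is what sets a box's width.

```python
from collections import Counter


def fold(samples):
    """Collapse sampled stacks (root-first lists of frame names) into folded
    lines of the form 'root;child;leaf count', one line per unique stack."""
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


def widths(folded_lines):
    """Inclusive sample count for every node of the merged call tree, keyed by
    its root-first path. A flame graph draws each box with a width proportional
    to this count, so wide boxes are hot paths."""
    w = Counter()
    for line in folded_lines:
        stack, n = line.rsplit(" ", 1)  # the count follows the last space
        frames = stack.split(";")
        for depth in range(1, len(frames) + 1):
            w[tuple(frames[:depth])] += int(n)
    return w


# Hypothetical samples standing in for what a 99 Hz sampler might capture.
samples = (
    [["main", "handle", "parse"]] * 6
    + [["main", "handle", "db_query"]] * 3
    + [["main", "gc"]] * 1
)

folded = fold(samples)
print("\n".join(folded))
tree = widths(folded)
total = tree[("main",)]
for path, n in sorted(tree.items()):
    print(f"{';'.join(path):<22} {n / total:.0%}")
```

With these samples, `main;handle` covers 90% of the samples and `main;gc` covers 10%, so `handle` would be the obvious wide tower. Folding merges identical stacks and discards their order, which is why a flame graph's x-axis carries no time information.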
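## 💻 Patterns

### Linux perf (basic)

```bash
# 30 s system-wide sample at 99 Hz; -a usually needs root or a relaxed kernel.perf_event_paranoid
# --call-graph dwarf works without frame pointers but makes perf.data much larger;
# on frame-pointer builds, plain -g (frame-pointer walk) is cheaper
perf record -F 99 -a --call-graph dwarf -- sleep 30
perf script > out.stack
# render
git clone https://github.com/brendangregg/FlameGraph
./FlameGraph/stackcollapse-perf.pl out.stack | \
  ./FlameGraph/flamegraph.pl > flame.svg
```

### eBPF profile (BCC)

```bash
# -F = sampling rate, -f = folded output, 30 = duration in seconds; run as root
profile-bpfcc -F 99 -f 30 > out.folded
./FlameGraph/flamegraph.pl out.folded > flame.svg
# advantages: stacks are counted in the kernel (lower overhead), kernel + user stacks merged
```

### Go pprof

```go
package main
import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on http.DefaultServeMux
)
func main() {
	go func() { log.Println(http.ListenAndServe("localhost:6060", nil)) }()
	// ... app (must keep running: the process exits when main returns)
}
```

```bash
# quote the URL: '?' is a shell glob character
go tool pprof -http=:8080 'http://localhost:6060/debug/pprof/profile?seconds=30'
# interactive web UI in the browser, including a flame graph view
```

### Python py-spy (no code change)

```bash
# attaching to another process usually needs root (or ptrace permission); -n picks the newest match
sudo py-spy record -o flame.svg -d 30 --pid "$(pgrep -nf myapp.py)"
# zero instrumentation: samples an already running process
```

### Node.js / V8

```bash
node --prof app.js
# ... run workload, then stop the process ...
# isolate-0xNNN-v8.log is the log V8 wrote; the exact name varies per run
node --prof-process isolate-0xNNN-v8.log > profile.txt   # text summary, not a flame graph
# or 0x, which renders an interactive flame graph:
npx 0x -- node app.js
```

### JVM async-profiler

```bash
# attach to a running JVM for 60 s; .html output = flame graph
asprof -d 60 -f flame.html $PID
# other modes: -e alloc (allocations), -e lock (lock contention), -e wall (wall-clock)
```

### Rust: pprof crate

```rust
// Assumes Cargo.toml enables the pprof crate's "prost-codec" feature.
use pprof::protos::Message; // re-exported prost::Message: provides encode()

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let guard = pprof::ProfilerGuardBuilder::default()
        .frequency(999)
        .blocklist(&["libc", "libgcc", "pthread"])
        .build()?;
    // ... workload ...
    let report = guard.report().build()?;
    let mut content = Vec::new(); // prost encodes into a BufMut such as Vec<u8>
    report.pprof()?.encode(&mut content)?;
    std::fs::write("profile.pb", &content)?;
    Ok(())
}
```

### Continuous profiling (Pyroscope / Grafana)

```yaml
# Pyroscope agent running alongside the app. Illustrative only: exact key names
# depend on the agent (e.g. legacy Pyroscope agent vs Grafana Alloy).
pyroscope:
  server: http://pyroscope:4040
  app_name: api-prod
  spy_name: ebpfspy   # eBPF-based sampling, no profiling SDK inside the app
  sample_rate: 100
```

### Differential flame graph

```bash
# -n normalizes the first profile's sample count to match the second's
./FlameGraph/difffolded.pl -n before.folded after.folded | \
  ./FlameGraph/flamegraph.pl > diff.svg
# red = more samples (got slower), blue = fewer samples (got faster)
```

## Decision Guide

| Situation | Tool |
|---|---|
| Linux native (C/C++/Rust/Go) | perf + FlameGraph |
| Container / k8s (no CAP_SYS_ADMIN) | in-process pprof endpoint (e.g. Go `net/http/pprof`) |
| Python in production | py-spy |
| JVM in production | async-profiler |
| Continuous 24/7 | Pyroscope / Datadog |
| Off-CPU (lock/IO) | offcputime-bpfcc |

**Default**: 99 Hz sampling → folded stacks → flamegraph.pl. The hot path usually shows up within the first five minutes.

## 🔗 Graph

- Parent: [[Observability]]
- Variant: [[Flame_Graph]]
- Adjacent: [[eBPF]]

## 🤖 LLM Usage

**When**: analyzing hot functions, diffing regressions, interpreting profile output.

**When not**: tail latency / distributed traces (for distributed systems, use OpenTelemetry).

## ❌ Anti-patterns

- **Profiling with `time.time()` printf logs**: not statistical, and the logging itself perturbs hot loops.
- **Builds without frame pointers**: frame-pointer unwinding breaks and stacks come out truncated. Build with `-fno-omit-frame-pointer`, or fall back to DWARF unwinding.
- **Sample rate too low (10 Hz)**: 30 s yields only 300 samples, so noise dominates.
- **Sample rate too high (10 kHz)**: the profiler's own overhead distorts the measurement.
- **Looking at a single-run profile only**: run-to-run variance can mislead; at least 5 runs are recommended.

## 🧪 Verification / Duplicates

- Verified against Brendan Gregg, *Systems Performance*, 2nd ed. (2020); the Linux perf documentation; the async-profiler README (2024).
- Confidence: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: sampling profilers, flame graphs, multi-language tooling |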