id: wiki-2026-0508-call-stack-analysis
title: Call Stack Analysis
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: Call Stack, Stack Trace Analysis, Flame Graph, Profiling Stack
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: profiling, performance, flame-graph, debugging, observability
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language: Polyglot; framework: perf/eBPF/pprof

Call Stack Analysis

In one line

"95% of performance bugs come down to one question, 'where is the CPU time spent?', and call stack sampling answers it." Sample stack traces statistically and render them as a flame graph, and the hot path shows up immediately. The 2026 standard stack is Linux perf + eBPF, alongside inferno / pyroscope / Datadog Continuous Profiler.

Key points

Sampling vs instrumentation

  • Sampling profiler: captures the stack at N Hz (typically 99 or 999 Hz; the odd rate avoids sampling in lockstep with periodic activity). Low overhead, statistical.
  • Instrumented profiler: hooks every function entry and exit. Exact, but 10-100x overhead.
  • Modern default: sampling, because it is production-safe.
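
To make the sampling idea concrete, here is a minimal sketch in Python. It is not how perf or py-spy work internally: it assumes CPython (it relies on `sys._current_frames()`), samples from a helper thread instead of a timer interrupt, and the `busy` workload is hypothetical. It only illustrates "capture the stack N times per second and count identical stacks".

```python
import collections
import sys
import threading
import time


def fold_stack(frame):
    """Walk frame.f_back up to the root and return 'root;...;leaf'."""
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return ";".join(reversed(names))


def sample_thread(thread_id, hz=99, duration=1.0):
    """Sample one thread's stack roughly `hz` times per second for `duration` seconds."""
    counts = collections.Counter()
    interval = 1.0 / hz
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        frame = sys._current_frames().get(thread_id)  # CPython-specific API
        if frame is not None:
            counts[fold_stack(frame)] += 1
        time.sleep(interval)
    return counts


def busy(stop):
    # Hypothetical workload: spin until asked to stop.
    while not stop.is_set():
        sum(range(1000))


if __name__ == "__main__":
    stop = threading.Event()
    worker = threading.Thread(target=busy, args=(stop,))
    worker.start()
    counts = sample_thread(worker.ident, hz=99, duration=1.0)
    stop.set()
    worker.join()
    for stack, n in counts.most_common(3):
        print(n, stack)
```

The counter keys are already in the "folded" format (`root;...;leaf`) that flame graph tools consume, so the result is statistical by construction: a stack seen in 40% of samples was on-CPU roughly 40% of the time.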

Stack sources

  • Frame pointer (RBP) walk: fastest; requires building with -fno-omit-frame-pointer.
  • DWARF unwind: uses .eh_frame; needs no frame pointer but is expensive.
  • ORC unwinder: the Linux kernel's own lightweight unwind format, a simplified alternative to DWARF used for kernel stacks.
  • eBPF stackmap: collects user and kernel stacks together.
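
The frame pointer walk is cheap because it is just a linked-list traversal. The sketch below simulates it in Python over a toy memory dictionary. The addresses, the `symbols` table and the function names are made up for illustration, and it assumes the usual x86-64 layout where `[rbp]` holds the caller's saved RBP and `[rbp + 8]` holds the return address. Real symbolization maps address ranges, not exact addresses.

```python
def walk_frame_pointers(memory, rip, rbp, max_depth=64):
    """Return code addresses from the leaf (current RIP) outward to the root.

    memory[rbp] is the caller's saved RBP, memory[rbp + 8] the return address.
    A saved RBP of 0 marks the outermost frame in this toy model.
    """
    addresses = [rip]
    while rbp != 0 and len(addresses) < max_depth:
        addresses.append(memory[rbp + 8])  # where this frame returns to
        rbp = memory[rbp]                  # hop to the caller's frame
    return addresses


# Toy stack with three frames (all values are hypothetical).
memory = {
    0x7000: 0x7020, 0x7008: 0x401100,  # parse's frame, returns into handle_request
    0x7020: 0x7040, 0x7028: 0x401000,  # handle_request's frame, returns into main
    0x7040: 0x0,    0x7048: 0x400F00,  # main's frame, returns into _start
}
symbols = {
    0x401250: "parse",
    0x401100: "handle_request",
    0x401000: "main",
    0x400F00: "_start",
}

stack = [symbols[addr] for addr in walk_frame_pointers(memory, rip=0x401250, rbp=0x7000)]
print(";".join(reversed(stack)))  # prints "_start;main;handle_request;parse"
```

If the compiler omits frame pointers, RBP is a general-purpose register and `memory[rbp]` is arbitrary data, which is why the walk yields garbage or stops early on such builds.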

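Visualization

  • Flame graph (Brendan Gregg): y = stack depth, frame width = share of samples (wide = hot). The x-axis is not time; frames are sorted alphabetically so identical stacks merge.
  • Icicle graph: a flipped flame graph with the root at the top.
  • Differential flame graph: the diff of two profiles, used to hunt performance regressions.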
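
As a rough illustration of how frame widths fall out of folded stacks, the following Python sketch computes the share of samples for every rectangle a flame graph would draw. The folded input and function names are invented, and this is a simplified model of what flamegraph.pl computes, not its actual code.

```python
from collections import Counter


def parse_folded(text):
    """Parse 'a;b;c 12' lines into a Counter keyed by stack tuple."""
    stacks = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        stack, _, count = line.rpartition(" ")
        stacks[tuple(stack.split(";"))] += int(count)
    return stacks


def frame_widths(stacks):
    """Map each stack prefix (one rectangle) to its share of all samples."""
    total = sum(stacks.values())
    widths = Counter()
    for stack, count in stacks.items():
        for depth in range(1, len(stack) + 1):
            widths[stack[:depth]] += count
    return {prefix: n / total for prefix, n in widths.items()}


folded = """
main;handle;parse 60
main;handle;render 30
main;gc 10
"""
widths = frame_widths(parse_folded(folded))
for prefix in sorted(widths):  # alphabetical, like the x-axis of a flame graph
    indent = "  " * (len(prefix) - 1)
    print(f"{indent}{prefix[-1]}: {widths[prefix]:.0%}")
```

Each prefix corresponds to one rectangle: its depth is the y position and its share is the width, so `handle` is drawn at 90% of the graph width with `parse` (60%) and `render` (30%) stacked on top of it.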

Applications

  1. CPU bottleneck diagnosis: identify hot functions.
  2. Lock contention: off-CPU profile plus futex stacks.
  3. GC pressure: allocation-stack profile.
  4. Cold start: flame graph of the startup phase.
  5. Continuous profiling: sample production 24/7 and alert on regressions.

💻 Patterns

Linux perf — basic

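# Sample all CPUs at 99 Hz for 30 s (needs root or a relaxed perf_event_paranoid)
# --call-graph dwarf already implies -g; use plain -g for cheaper frame-pointer unwinding
sudo perf record -F 99 -a --call-graph dwarf -- sleep 30
sudo perf script > out.stack
# Render
git clone https://github.com/brendangregg/FlameGraph
./FlameGraph/stackcollapse-perf.pl out.stack | \
  ./FlameGraph/flamegraph.pl > flame.svg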

eBPF profile (BCC)

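# 99 Hz for 30 s, folded output (-f); BCC tools need root
sudo profile-bpfcc -F 99 -f 30 > out.folded
./FlameGraph/flamegraph.pl out.folded > flame.svg
# Advantages: lower overhead (stacks are counted in-kernel), kernel and user stacks merged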

Go pprof

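package main

import (
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
    go http.ListenAndServe(":6060", nil) // error ignored for brevity
    // ... app
}

# Collect a 30 s CPU profile and open the web UI (includes a flame graph view)
go tool pprof -http=:8080 "http://localhost:6060/debug/pprof/profile?seconds=30"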

Python py-spy (no code change)

py-spy record -o flame.svg -d 30 --pid "$(pgrep -of myapp.py)"
# No instrumentation: samples a running process from outside (attaching may need sudo)

Node.js / V8

node --prof app.js
# ... run workload ...
node --prof-process isolate-0xNNN-v8.log > profile.txt
# or 0x: npx 0x -- node app.js

JVM async-profiler

# attach to running JVM, 60s flame graph
asprof -d 60 -f flame.html $PID
# also captures lock contention, alloc, wall-clock

Rust — pprof crate

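// Assumes the pprof crate is built with its "prost-codec" feature.
use pprof::protos::Message; // brings `encode` into scope

let guard = pprof::ProfilerGuardBuilder::default()
    .frequency(999)
    .blocklist(&["libc", "libgcc", "pthread"])
    .build()?;
// ... workload ...
let report = guard.report().build()?;
let mut buf = Vec::new();
report.pprof()?.encode(&mut buf)?; // serialize to pprof protobuf bytes
std::fs::write("profile.pb", &buf)?;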

Continuous profiling (Pyroscope / Grafana)

# pyroscope agent — runs alongside app
pyroscope:
  server: http://pyroscope:4040
  app_name: api-prod
  spy_name: ebpfspy           # auto-detect language
  sample_rate: 100

Differential flame graph

./FlameGraph/difffolded.pl before.folded after.folded | \
  ./FlameGraph/flamegraph.pl > diff.svg
# red = got slower, blue = got faster
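
For intuition about what a differential flame graph encodes, here is a small Python sketch that diffs two folded profiles per stack. It compares each stack's share of samples rather than raw counts, which is a choice made for this sketch so that runs of different length stay comparable. The stack names and counts are invented.

```python
from collections import Counter


def load_folded(text):
    """Parse 'a;b;c 12' lines into a Counter keyed by the folded stack string."""
    stacks = Counter()
    for line in text.splitlines():
        line = line.strip()
        if line:
            stack, _, count = line.rpartition(" ")
            stacks[stack] += int(count)
    return stacks


def diff_profiles(before, after):
    """Return {stack: change in share of samples}; positive means it grew."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    return {
        stack: after[stack] / total_after - before[stack] / total_before
        for stack in before.keys() | after.keys()
    }


before = load_folded("main;parse 50\nmain;render 50")
after = load_folded("main;parse 80\nmain;render 40")

deltas = diff_profiles(before, after)
for stack, delta in sorted(deltas.items(), key=lambda item: -item[1]):
    print(f"{delta:+.1%}  {stack}")
```

Stacks with a positive delta correspond to the red frames and negative ones to the blue frames; a stack that exists in only one profile still shows up because a `Counter` returns 0 for missing keys.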

Decision criteria

| Situation | Tool |
| --- | --- |
| Linux native (C/C++/Rust/Go) | perf + FlameGraph |
| Container / k8s (no SYS_ADMIN) | pprof endpoint |
| Python prod | py-spy |
| JVM prod | async-profiler |
| Continuous 24/7 | Pyroscope / Datadog |
| Off-CPU (lock/IO) | offcputime-bpfcc |

Default: 99 Hz sampling → folded stacks → flamegraph.pl. The hot path is usually visible within the first 5 minutes.

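🤖 LLM usage

When to use: hot function analysis, regression diffs, interpreting profile results. When not to use: tail latency and distributed traces (in distributed environments use OpenTelemetry).

Anti-patterns

  • Profiling with time.time() / printf logs: not statistical, and the logging itself distorts the hot loop.
  • Builds without frame pointers: frame-pointer unwinding breaks. Build with -fno-omit-frame-pointer, or fall back to the more expensive DWARF unwind.
  • Sample rate too low (10 Hz): 30 s yields only 300 samples per CPU, so noise dominates.
  • Sample rate too high (10 kHz): the profiler's own overhead distorts the measurement.
  • Looking at a single-run profile only: results vary between runs, so at least 5 runs are recommended.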
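
The sample-rate points above can be checked with a little arithmetic. The sketch below treats each sample as an independent coin flip (a binomial model, which is a simplifying assumption since real samples are correlated) and estimates the standard error of a function whose true share is 5%, over 30 s on one busy CPU.

```python
import math


def share_standard_error(share, samples):
    """Binomial standard error of an estimated sample share."""
    return math.sqrt(share * (1.0 - share) / samples)


for hz in (10, 99, 999):
    n = hz * 30  # samples from one busy CPU over 30 s
    se = share_standard_error(0.05, n)
    print(f"{hz:>4} Hz: {n:>6} samples, a 5% function is measured as 5% ± {se:.2%}")
```

Under this model the 10 Hz run is off by about 1.3 percentage points on a 5% function (a relative error near 25%), while 99 Hz brings that down to about 0.4 points. Going from 99 Hz to 999 Hz only gains another factor of about three, which is why very high rates rarely pay for their overhead.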

🧪 Verification / duplicates

  • Verified (Brendan Gregg, "Systems Performance", 2nd ed., 2020; Linux perf docs; async-profiler README, 2024).
  • Trust level: A.

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: sampling profilers, flame graphs, multi-language tooling |