---
id: wiki-2026-0508-call-stack-analysis
title: Call Stack Analysis
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Call Stack, Stack Trace Analysis, Flame Graph, Profiling Stack]
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: [profiling, performance, flame-graph, debugging, observability]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Polyglot
  framework: perf/eBPF/pprof
---
# Call Stack Analysis
## One-liner
> **"95% of performance bugs come down to 'where is CPU time spent?', and call stack sampling answers it."** Sample stack traces statistically, visualize them as a flame graph, and the hot path is visible immediately. As of 2026 the standard stack is Linux perf + eBPF, alongside inferno / Pyroscope / Datadog Continuous Profiler.
## Core concepts
### Sampling vs instrumentation
- **Sampling profiler**: captures the stack at N Hz (typically 99 or 999 Hz) → low overhead, statistical.
- **Instrumented profiler**: hooks every function entry/exit → exact, but 10-100x overhead.
- **Modern default**: sampling, because it is production-safe.
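
The sampling idea above can be sketched in a few lines of Python. This is only an illustration, not how perf or py-spy work internally: the `Sampler` class, `fold_stack`, and `busy` are hypothetical names, and the sketch relies on CPython's `sys._current_frames()` to read another thread's stack from a timer thread.

```python
import collections
import sys
import threading
import time


def fold_stack(frame):
    """Walk f_back links and return 'root;...;leaf' function names."""
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return ";".join(reversed(names))


class Sampler:
    """Samples one thread's stack at a fixed rate (hypothetical helper)."""

    def __init__(self, target_thread_id, hz=99):
        self.target = target_thread_id
        self.interval = 1.0 / hz
        self.counts = collections.Counter()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait returns False on timeout, so this loops once per interval.
        while not self._stop.wait(self.interval):
            frame = sys._current_frames().get(self.target)
            if frame is not None:
                self.counts[fold_stack(frame)] += 1

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()


def busy(seconds):
    """CPU-bound loop standing in for a real workload."""
    deadline = time.perf_counter() + seconds
    total = 0
    while time.perf_counter() < deadline:
        total += 1
    return total


if __name__ == "__main__":
    with Sampler(threading.get_ident(), hz=99) as s:
        busy(0.5)
    for stack, n in s.counts.most_common(3):
        print(stack, n)
```

The workload itself is never modified; the only cost is the periodic stack read, which is why this style of profiler is usually considered safe for production.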
### Stack source
- **Frame pointer (RBP) walk**: fastest; requires `-fno-omit-frame-pointer`.
- **DWARF unwind**: uses `.eh_frame`; needs no frame pointer but is expensive.
- **ORC unwinder**: the Linux kernel's own simplified unwind format, a lightweight alternative to DWARF for kernel stacks.
- **eBPF stackmap**: captures user and kernel stacks together.
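
To make the frame pointer walk concrete, here is a toy simulation. It assumes the usual x86-64 layout when frame pointers are kept (`[rbp]` holds the caller's saved `rbp`, `[rbp + 8]` holds the return address); the `memory` dict, the addresses, and the symbol table are all made up for illustration.

```python
def walk_frame_pointers(memory, rbp, rip, max_depth=64):
    """Return code addresses from leaf to root by following saved rbp links.

    memory: dict mapping address -> 8-byte word (a toy stand-in for the stack).
    """
    stack = [rip]
    depth = 0
    while rbp != 0 and depth < max_depth:
        saved_rbp = memory.get(rbp)
        return_addr = memory.get(rbp + 8)
        if saved_rbp is None or return_addr is None:
            break  # unreadable memory: stop rather than guess
        stack.append(return_addr)
        # The stack grows down, so callers live at higher addresses.
        if saved_rbp != 0 and saved_rbp <= rbp:
            break  # chain goes the wrong way: treat as corrupt
        rbp = saved_rbp
        depth += 1
    return stack


# Toy stack for _start -> main -> handler -> hot_loop (addresses are invented).
memory = {
    0x7000: 0x7020, 0x7008: 0x401200,  # hot_loop's frame, returns into handler
    0x7020: 0x7040, 0x7028: 0x401100,  # handler's frame, returns into main
    0x7040: 0x0,    0x7048: 0x401000,  # main's frame, returns into _start
}
symbols = {0x401300: "hot_loop", 0x401200: "handler",
           0x401100: "main", 0x401000: "_start"}

addrs = walk_frame_pointers(memory, rbp=0x7000, rip=0x401300)
print(";".join(symbols[a] for a in reversed(addrs)))
# prints "_start;main;handler;hot_loop"
```

Each step is two memory reads, which is why this is the cheapest unwinder; it also shows why a build that reuses `rbp` as a general register breaks the chain.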
### Visualization
- **Flame graph (Brendan Gregg)**: x = share of samples (frames are sorted alphabetically, not by time), y = stack depth; wider frames are hotter.
- **Icicle graph**: a flipped flame graph with the root at the top.
- **Differential flame graph**: a diff of two profiles, used to hunt performance regressions.
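
As a rough sketch of how a flame graph's geometry falls out of folded stacks: every distinct stack prefix becomes one rectangle, its depth is the prefix length, and its width is the share of samples passing through it. The `frame_widths` function and the sample data are illustrative assumptions, not part of flamegraph.pl.

```python
import collections


def frame_widths(folded_lines):
    """Map each stack prefix (one rectangle) to its share of all samples."""
    inclusive = collections.Counter()
    total = 0
    for line in folded_lines:
        # Folded format: "root;child;leaf <count>"
        stack, _, count = line.rpartition(" ")
        n = int(count)
        total += n
        frames = stack.split(";")
        for depth in range(1, len(frames) + 1):
            inclusive[tuple(frames[:depth])] += n
    return {prefix: n / total for prefix, n in inclusive.items()}


folded = [
    "main;handle;parse 60",
    "main;handle;render 30",
    "main;gc 10",
]
widths = frame_widths(folded)
for prefix in sorted(widths):
    depth = len(prefix) - 1
    print(f"{'  ' * depth}{prefix[-1]:<8} depth={depth} width={widths[prefix]:.0%}")
```

An icicle graph uses the same widths and simply draws depth 0 at the top instead of the bottom.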
### Applications
1. **CPU bottleneck diagnosis**: identify hot functions.
2. **Lock contention**: off-CPU profile + futex stacks.
3. **GC pressure**: allocation-stack profile.
4. **Cold start**: flame graph of the startup phase.
5. **Continuous profiling**: sample production 24/7 → regression alerting.
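
For the first application, a minimal sketch of ranking hot functions from folded stacks. It separates self samples (the function was the leaf frame) from total samples (the function appeared anywhere in the stack); `hot_functions` and the sample profile are hypothetical.

```python
import collections


def hot_functions(folded_lines, top=3):
    """Return [(name, self_samples, total_samples)] ordered by self samples."""
    self_samples = collections.Counter()
    total_samples = collections.Counter()
    for line in folded_lines:
        stack, _, count = line.rpartition(" ")
        n = int(count)
        frames = stack.split(";")
        self_samples[frames[-1]] += n      # leaf frame: where the CPU actually was
        for name in set(frames):           # set(): count recursive frames once
            total_samples[name] += n
    return [(name, n, total_samples[name])
            for name, n in self_samples.most_common(top)]


profile = [
    "main;serve;json_encode 55",
    "main;serve;db_query;json_encode 15",
    "main;serve;db_query 20",
    "main;serve 10",
]
for name, self_n, total_n in hot_functions(profile):
    print(f"{name:<12} self={self_n:>3} total={total_n:>3}")
```

A function with high total but low self samples (like `serve` here) is usually a dispatcher; the one with high self samples is where optimization pays off.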
## 💻 Patterns
### Linux perf — basic
```bash
# 30s system-wide sample at 99Hz (--call-graph dwarf already implies -g)
perf record -F 99 -a --call-graph dwarf -- sleep 30
perf script > out.stack
# render
git clone https://github.com/brendangregg/FlameGraph
./FlameGraph/stackcollapse-perf.pl out.stack | \
./FlameGraph/flamegraph.pl > flame.svg
```
### eBPF profile (BCC)
```bash
profile-bpfcc -F 99 -f 30 > out.folded
flamegraph.pl out.folded > flame.svg
# advantages: lower overhead, kernel+user merged
```
### Go pprof
```go
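package main

import (
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on the default mux
)

func main() {
	go http.ListenAndServe(":6060", nil) // serve pprof in the background
	// ... app
}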
```
```bash
# quote the URL so the shell does not try to glob the "?"
go tool pprof -http=:8080 "http://localhost:6060/debug/pprof/profile?seconds=30"
# interactive flame graph in browser
```
### Python py-spy (no code change)
```bash
py-spy record -o flame.svg -d 30 --pid $(pgrep -f myapp.py)
# zero instrumentation, samples a running process
```
### Node.js / V8
```bash
node --prof app.js
# ... run workload ...
node --prof-process isolate-0xNNN-v8.log > profile.txt
# or 0x: npx 0x -- node app.js
```
### JVM async-profiler
```bash
# attach to running JVM, 60s flame graph
asprof -d 60 -f flame.html $PID
# also captures lock contention, alloc, wall-clock
```
### Rust — pprof crate
```rust
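// Assumes the pprof crate is built with its prost-based protobuf feature.
use pprof::protos::Message; // brings `encode` into scope
use std::io::Write;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let guard = pprof::ProfilerGuardBuilder::default()
        .frequency(999)
        .blocklist(&["libc", "libgcc", "pthread"])
        .build()?;
    // ... workload ...
    let report = guard.report().build()?;
    // Encode into a byte buffer first, then write the buffer to disk.
    let mut buf = Vec::new();
    report.pprof()?.encode(&mut buf)?;
    std::fs::File::create("profile.pb")?.write_all(&buf)?;
    Ok(())
}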
```
### Continuous profiling (Pyroscope / Grafana)
```yaml
# pyroscope agent, runs alongside the app
pyroscope:
  server: http://pyroscope:4040
  app_name: api-prod
  spy_name: ebpfspy # auto-detect language
  sample_rate: 100
```
### Differential flame graph
```bash
./FlameGraph/difffolded.pl before.folded after.folded | \
./FlameGraph/flamegraph.pl > diff.svg
# red = got slower, blue = got faster
```
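
What the diff computes can be sketched as follows. This is an illustration rather than a port of `difffolded.pl`: `parse_folded` and `diff_profiles` are hypothetical names, and the sketch compares each stack's share of samples (not raw counts) so that two profiles with different total sample counts stay comparable.

```python
import collections


def parse_folded(lines):
    """Parse 'a;b;c <count>' lines into a Counter keyed by stack."""
    counts = collections.Counter()
    for line in lines:
        stack, _, count = line.rpartition(" ")
        counts[stack] += int(count)
    return counts


def diff_profiles(before_lines, after_lines):
    """Return {stack: change in sample share}; positive means hotter afterwards."""
    before, after = parse_folded(before_lines), parse_folded(after_lines)
    b_total = sum(before.values()) or 1  # avoid dividing by zero on empty input
    a_total = sum(after.values()) or 1
    return {
        stack: after[stack] / a_total - before[stack] / b_total
        for stack in set(before) | set(after)
    }


before = ["main;parse 50", "main;render 50"]
after = ["main;parse 80", "main;render 40"]
delta = diff_profiles(before, after)
for stack, change in sorted(delta.items(), key=lambda kv: -kv[1]):
    color = "red" if change > 0 else "blue"
    print(f"{stack:<14} {change:+.1%} ({color})")
```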
## Decision criteria
| Situation | Tool |
|---|---|
| Linux native (C/C++/Rust/Go) | perf + FlameGraph |
| Container / k8s (no SYS_ADMIN) | pprof endpoint |
| Python prod | py-spy |
| JVM prod | async-profiler |
| Continuous 24/7 | Pyroscope / Datadog |
| Off-CPU (lock/IO) | offcputime-bpfcc |
**Default**: 99Hz sampling → folded → flamegraph.pl. The hot path usually shows up within the first 5 minutes.
## 🔗 Graph
- Parent: [[Performance_Engineering]] · [[Observability]]
- Variants: [[Flame_Graph]] · [[Off_CPU_Profile]] · [[Differential_Flame_Graph]]
- Applications: [[Continuous_Profiling]] · [[Performance_Regression_Detection]]
- Adjacent: [[eBPF]] · [[perf]] · [[pprof]] · [[Brendan_Gregg]]
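## 🤖 LLM usage
**When**: hot function analysis, regression diffs, interpreting profile output.
**When not**: tail latency / distributed traces (use OpenTelemetry in distributed environments).
## ❌ Anti-patterns
- **Profiling with `time.time()` / printf logs**: not statistical, and it perturbs the hot loop.
- **Builds without frame pointers**: frame-pointer unwinding breaks. Build with `-fno-omit-frame-pointer`, or fall back to the slower DWARF unwind.
- **Sample rate too low (10Hz)**: 30 s yields only 300 samples, so noise dominates.
- **Sample rate too high (10kHz)**: the profiler's own overhead distorts the measurement.
- **Looking at a single-run profile only**: run-to-run variance is real; at least 5 runs are recommended.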
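
The sample-rate arithmetic above can be made concrete with a back-of-the-envelope error estimate. This sketch treats samples as independent draws (a simplifying assumption; real samples are correlated) and uses the binomial standard error for a function whose true share of CPU time is `p`. The function name and the 5% example are illustrative.

```python
import math


def share_standard_error(p, hz, seconds):
    """Standard error of an estimated sample share p over hz * seconds samples."""
    n = hz * seconds
    return math.sqrt(p * (1 - p) / n)


for hz in (10, 99, 999):
    n = hz * 30
    se = share_standard_error(0.05, hz, 30)
    print(f"{hz:>4} Hz x 30 s = {n:>5} samples, 5% function: +/- {se:.2%} (1 sigma)")
```

At 10 Hz the one-sigma error on a 5% function is about 1.3 percentage points, a quarter of the value being measured; at 99 Hz it drops to about 0.4 points. Error shrinks only with the square root of the sample count, so going far beyond 99 to 999 Hz buys little accuracy for the added overhead.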
## 🧪 Verification / duplicates
- Verified (Brendan Gregg "Systems Performance" 2nd ed 2020, Linux perf docs, async-profiler README 2024).
- Trust level A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — sampling profilers, flame graphs, multi-language tooling |