---
id: wiki-2026-0508-call-stack-analysis
title: Call Stack Analysis
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Call Stack, Stack Trace Analysis, Flame Graph, Profiling Stack]
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: [profiling, performance, flame-graph, debugging, observability]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Polyglot
  framework: perf/eBPF/pprof
---

# Call Stack Analysis

## One-line summary

> **"About 95% of performance bugs reduce to 'where is CPU time spent?', and call stack sampling answers that."** Sample stack traces statistically, visualize them as a flame graph, and the hot path is visible at a glance. The standard 2026 stack is Linux perf + eBPF, together with inferno / Pyroscope / Datadog Continuous Profiler.

## Key points

### Sampling vs instrumentation

- **Sampling profiler**: captures the stack at N Hz (typically 99 or 999 Hz); low overhead, statistical.
- **Instrumented profiler**: hooks every function entry/exit; exact, but overhead can reach 10-100x.
- **Modern default**: sampling, because it is safe to run in production.
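
The sampling idea above can be sketched in a few lines of Python. This is a toy illustration, not how perf or py-spy work internally: it assumes CPython's `sys._current_frames()` is available, and `busy_loop` and `profile_busy_loop` are hypothetical names invented for the demo.

```python
import collections
import sys
import threading
import time


def folded_stack(frame):
    """Walk f_back links from the leaf frame and return 'root;...;leaf'."""
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return ";".join(reversed(names))


def sample_thread(target_id, hz, duration_s):
    """Sample one thread's stack at roughly `hz` for `duration_s` seconds."""
    counts = collections.Counter()
    interval = 1.0 / hz
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        frame = sys._current_frames().get(target_id)
        if frame is not None:
            counts[folded_stack(frame)] += 1
        time.sleep(interval)
    return counts


def busy_loop(seconds):
    """Hypothetical CPU-bound workload used only for the demo."""
    end = time.monotonic() + seconds
    total = 0
    while time.monotonic() < end:
        total += 1
    return total


def profile_busy_loop(hz=99, duration_s=0.5):
    """Run busy_loop on the calling thread while a sampler thread watches it."""
    result = {}
    caller_id = threading.get_ident()
    sampler = threading.Thread(
        target=lambda: result.update(sample_thread(caller_id, hz, duration_s))
    )
    sampler.start()
    busy_loop(duration_s)
    sampler.join()
    return collections.Counter(result)
```

Calling `profile_busy_loop()` returns a counter keyed by folded stack strings, the same `root;...;leaf` shape that flame graph tools consume. The workload itself is untouched; only the sampler thread adds cost, which is why sampling overhead stays low.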

### Stack sources

- **Frame pointer (RBP) walk**: fastest; requires building with `-fno-omit-frame-pointer`.
- **DWARF unwind**: uses `.eh_frame`; needs no frame pointer but is expensive.
- **ORC unwinder**: the Linux kernel's own lightweight unwind format, a simpler alternative to DWARF used for kernel stacks.
- **eBPF stackmap**: captures user and kernel stacks together.
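
To make the frame pointer walk concrete, here is a toy model in Python. It assumes the usual x86-64 layout with frame pointers enabled (`[rbp]` holds the caller's saved RBP, `[rbp + 8]` holds the return address) and that the chain ends at a saved RBP of 0. The addresses, the `memory` dict, and the symbol table are all made up for illustration.

```python
def walk_frame_pointers(memory, rbp, max_depth=64):
    """Return the return addresses found by following the saved-RBP chain.

    `memory` is a toy model: a dict mapping an address to the 8-byte value
    stored there. [rbp] is the caller's saved RBP, [rbp + 8] is the return
    address into the caller.
    """
    return_addresses = []
    while rbp != 0 and len(return_addresses) < max_depth:
        return_addresses.append(memory[rbp + 8])
        rbp = memory[rbp]  # hop to the caller's frame
    return return_addresses


# Hypothetical stack: _start -> main -> parse -> tokenize (leaf).
memory = {
    0x7000: 0x7040, 0x7008: 0x401200,  # tokenize's frame, returns into parse
    0x7040: 0x7080, 0x7048: 0x401100,  # parse's frame, returns into main
    0x7080: 0x0,    0x7088: 0x401000,  # main's frame, chain ends at RBP = 0
}
symbols = {0x401200: "parse", 0x401100: "main", 0x401000: "_start"}

callers = [symbols[addr] for addr in walk_frame_pointers(memory, 0x7000)]
print(callers)  # ['parse', 'main', '_start']
```

Each hop is two memory reads, which is why this method is the cheapest. It also shows why a build that omits frame pointers breaks it: without the saved-RBP chain there is nothing to follow.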

### Visualization

- **Flame graph (Brendan Gregg)**: x = share of samples (not time order), y = stack depth; wider frames are hotter.
- **Icicle graph**: an inverted flame graph with the root at the top.
- **Differential flame graph**: the diff of two profiles, used to hunt performance regressions.
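
The width rule can be sketched as follows. The folded-stack input format (`a;b;c count`) is the one the FlameGraph scripts use; the function names and the sample data are assumptions made for this example, and the code computes only the widths, not the SVG layout.

```python
import collections


def parse_folded(text):
    """Parse 'a;b;c 12' lines into a Counter of stack tuple -> sample count."""
    stacks = collections.Counter()
    for line in text.strip().splitlines():
        stack, _, count = line.rpartition(" ")
        stacks[tuple(stack.split(";"))] += int(count)
    return stacks


def frame_widths(stacks):
    """Return {stack prefix: share of all samples} for every drawn frame."""
    total = sum(stacks.values())
    widths = collections.Counter()
    for stack, count in stacks.items():
        # A frame at depth d is drawn once per distinct prefix of length d.
        for depth in range(1, len(stack) + 1):
            widths[stack[:depth]] += count
    return {prefix: n / total for prefix, n in widths.items()}


example = """
main;parse;tokenize 60
main;parse 10
main;render 30
"""
widths = frame_widths(parse_folded(example))
for prefix, share in sorted(widths.items()):
    print(f"depth {len(prefix)}  {';'.join(prefix):<22} {share:.0%}")
```

In this made-up profile, `parse` is 70% wide (60 samples inside `tokenize` plus 10 of its own), so the frame's width reflects total time under it, not only self time.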

### Applications

1. **CPU bottleneck diagnosis**: identify hot functions.
2. **Lock contention**: off-CPU profile plus futex stacks.
3. **GC pressure**: allocation-stack profile.
4. **Cold start**: flame graph of the startup phase.
5. **Continuous profiling**: sample production 24/7 and alert on regressions.

## 💻 Patterns

### Linux perf: basic

```bash
# 30s system-wide sample at 99Hz (--call-graph dwarf already implies -g)
perf record -F 99 -a --call-graph dwarf -- sleep 30
perf script > out.stack
# render
git clone https://github.com/brendangregg/FlameGraph
./FlameGraph/stackcollapse-perf.pl out.stack | \
  ./FlameGraph/flamegraph.pl > flame.svg
```

### eBPF profile (BCC)

```bash
# -F 99: sample at 99Hz, -f: folded output, 30: duration in seconds
profile-bpfcc -F 99 -f 30 > out.folded
./FlameGraph/flamegraph.pl out.folded > flame.svg
# advantages: lower overhead, kernel+user merged
```

### Go pprof

```go
package main

import (
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	go http.ListenAndServe(":6060", nil) // profiling endpoint; error ignored for brevity
	// ... app
}
```

```bash
# quote the URL so the shell does not interpret "?"
go tool pprof -http=:8080 "http://localhost:6060/debug/pprof/profile?seconds=30"
# interactive flame graph in browser
```

### Python py-spy (no code change)

```bash
# pgrep -o picks the oldest match if several processes match
py-spy record -o flame.svg -d 30 --pid "$(pgrep -of myapp.py)"
# zero instrumentation, samples a running process
```

### Node.js / V8

```bash
node --prof app.js
# ... run workload ...
node --prof-process isolate-0xNNN-v8.log > profile.txt
# or 0x: npx 0x -- node app.js
```

### JVM async-profiler

```bash
# attach to running JVM, 60s flame graph
asprof -d 60 -f flame.html "$PID"
# also captures lock contention, alloc, wall-clock (select with -e lock / -e alloc / -e wall)
```
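
### Rust: pprof crate

```rust
// Assumes the pprof crate is built with its prost-based protobuf feature.
use pprof::protos::Message; // trait that provides `encode`
use std::io::Write;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let guard = pprof::ProfilerGuardBuilder::default()
        .frequency(999)
        .blocklist(&["libc", "libgcc", "pthread"])
        .build()?;
    // ... workload ...
    let report = guard.report().build()?;
    let mut buf = Vec::new();
    report.pprof()?.encode(&mut buf)?; // prost encodes into a buffer, not a File
    std::fs::File::create("profile.pb")?.write_all(&buf)?;
    Ok(())
}
```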

### Continuous profiling (Pyroscope / Grafana)

```yaml
# pyroscope agent: runs alongside app
# (illustrative keys; exact names depend on the Pyroscope agent version)
pyroscope:
  server: http://pyroscope:4040
  app_name: api-prod
  spy_name: ebpfspy # auto-detect language
  sample_rate: 100
```

### Differential flame graph

```bash
./FlameGraph/difffolded.pl before.folded after.folded | \
  ./FlameGraph/flamegraph.pl > diff.svg
# red = got slower, blue = got faster
```
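
The comparison behind a differential flame graph can be sketched in Python. This is a simplified stand-in, not the `difffolded.pl` algorithm: it only subtracts raw sample counts per folded stack. The function names and the two tiny profiles are hypothetical.

```python
import collections


def load_folded(text):
    """Parse 'a;b;c 12' lines into a Counter of folded stack -> samples."""
    counts = collections.Counter()
    for line in text.strip().splitlines():
        stack, _, count = line.rpartition(" ")
        counts[stack] += int(count)
    return counts


def diff_folded(before, after):
    """Return (stack, before, after, delta) rows, largest growth first."""
    rows = []
    for stack in sorted(set(before) | set(after)):
        b, a = before[stack], after[stack]  # missing stacks count as 0
        rows.append((stack, b, a, a - b))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


before = load_folded("""
main;parse 40
main;render 60
""")
after = load_folded("""
main;parse 75
main;render 55
main;log 5
""")
rows = diff_folded(before, after)
for stack, b, a, delta in rows:
    print(f"{delta:+4d}  {stack}  ({b} -> {a})")
```

Stacks with a positive delta correspond to the red frames and negative deltas to the blue ones. If the two profiles have different total sample counts, the counts would need normalizing first, which this sketch does not do.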

## Decision criteria

| Situation | Tool |
|---|---|
| Linux native (C/C++/Rust/Go) | perf + FlameGraph |
| Container / k8s (no SYS_ADMIN) | pprof endpoint |
| Python prod | py-spy |
| JVM prod | async-profiler |
| Continuous 24/7 | Pyroscope / Datadog |
| Off-CPU (lock/IO) | offcputime-bpfcc |

**Default**: 99 Hz sampling → folded stacks → flamegraph.pl. The hot path usually shows up within the first 5 minutes.

## 🔗 Graph

- Parent: [[Observability]]
- Variant: [[Flame_Graph]]
- Adjacent: [[eBPF]]

## 🤖 LLM usage

**When to use**: hot function analysis, regression diffs, interpreting profile results.

**When not to use**: tail latency / distributed tracing (for distributed systems, use OpenTelemetry).

## ❌ Anti-patterns

- **Profiling with `time.time()` / printf logging**: not statistical, and it distorts hot loops.
- **Builds without frame pointers**: frame-pointer unwinding breaks; build with `-fno-omit-frame-pointer`.
- **Sample rate too low (10 Hz)**: 30 s yields only 300 samples per sampled CPU, so noise dominates.
- **Sample rate too high (10 kHz)**: the profiler's own overhead distorts the measurement.
- **Looking at a single-run profile only**: run-to-run variance is real; at least 5 runs are recommended.
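
The sample-rate trade-off above is easy to put numbers on. The sketch below treats samples as independent draws, which is an assumption (real samples are correlated), and uses the binomial standard error as a rough noise estimate. The 5% function share is a made-up example.

```python
import math


def sample_count(rate_hz, duration_s, cpus=1):
    """Number of stack samples collected over the run."""
    return rate_hz * duration_s * cpus


def share_standard_error(share, samples):
    """Binomial standard error of an observed sample share."""
    return math.sqrt(share * (1.0 - share) / samples)


for rate_hz in (10, 99, 999):
    n = sample_count(rate_hz, duration_s=30)
    se = share_standard_error(0.05, n)
    print(f"{rate_hz:>4} Hz: {n:>6} samples, "
          f"5% function measured as 5% +/- {100 * se:.2f} pts")
```

Under these assumptions, 10 Hz for 30 s gives 300 samples and a standard error of about 1.3 percentage points on a 5% function, so a real 5% hotspot is hard to tell apart from a 3% or 7% one. At 99 Hz the same run gives 2970 samples and the error drops to roughly 0.4 points.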

## 🧪 Verification / duplicates

- Verified (Brendan Gregg "Systems Performance" 2nd ed 2020, Linux perf docs, async-profiler README 2024).
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: sampling profilers, flame graphs, multi-language tooling |