Antigravity Agent 504fd5fb42 [G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00

Call Stack Analysis

In one line

"95% of performance bugs come down to 'where is CPU time spent?', and call stack sampling answers that." Sample stack traces statistically and visualize them as a flame graph, and the hot path is visible immediately. The standard stack as of 2026 is Linux perf + eBPF, together with inferno, Pyroscope, or Datadog Continuous Profiler.

Key concepts

Sampling vs. instrumentation

Sampling profiler: captures the stack at a fixed rate (typically 99 or 999 Hz); low overhead, statistical.
Instrumented profiler: hooks every function entry and exit; exact call counts, but overhead can reach 10-100x.
Modern default: sampling, because it is safe to run in production.
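To make the sampling idea concrete, here is a minimal sketch of a sampling profiler for a single Python thread. It assumes CPython (it relies on `sys._current_frames()`), and the names `sample_stacks`, `busy_loop`, and `profile` are hypothetical helpers made up for this illustration; real profilers such as py-spy sample from outside the process instead.

```python
import collections
import sys
import threading
import time


def sample_stacks(target_thread_id, counts, stop_event, hz=99):
    """Sample one thread's Python stack at roughly `hz` and count folded stacks."""
    interval = 1.0 / hz
    while not stop_event.is_set():
        frame = sys._current_frames().get(target_thread_id)
        names = []
        while frame is not None:              # walk leaf -> root
            names.append(frame.f_code.co_name)
            frame = frame.f_back
        if names:
            counts[";".join(reversed(names))] += 1   # folded: root;...;leaf
        time.sleep(interval)


def busy_loop(seconds):
    """Hypothetical hot function: burns CPU for `seconds`."""
    deadline = time.perf_counter() + seconds
    total = 0
    while time.perf_counter() < deadline:
        total += 1
    return total


def profile(workload, *args, hz=99):
    """Run `workload` on the calling thread while a sampler thread observes it."""
    counts = collections.Counter()
    stop_event = threading.Event()
    sampler = threading.Thread(
        target=sample_stacks,
        args=(threading.get_ident(), counts, stop_event, hz),
    )
    sampler.start()
    try:
        workload(*args)
    finally:
        stop_event.set()
        sampler.join()
    return counts


if __name__ == "__main__":
    for stack, n in profile(busy_loop, 0.5).most_common(5):
        print(n, stack)
```

The workload itself is never modified: the cost is one stack walk per tick regardless of how many function calls happen in between, which is why sampling stays cheap where entry/exit hooks do not.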

Stack sources

Frame pointer (RBP) walk: fastest; requires building with -fno-omit-frame-pointer.
DWARF unwind: uses .eh_frame; needs no frame pointer but is expensive.
ORC unwinder: the Linux kernel's own lightweight unwind format, a simplified alternative to DWARF used for kernel stacks.
eBPF stackmap: captures user and kernel stacks together, in-kernel.
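The frame-pointer walk can be sketched on a toy model of the stack. This is only an illustration under stated assumptions: `MEMORY` and `SYMBOLS` are made-up data, the layout follows the usual x86-64 convention where `[rbp]` holds the caller's saved RBP and `[rbp + 8]` holds the return address, and a real unwinder reads process memory rather than a dict.

```python
def walk_frame_pointers(memory, rbp, rip, max_depth=64):
    """Return code addresses leaf -> root by following the saved-RBP chain.

    `memory` is a toy model: a dict mapping an address to the 8-byte value
    stored there.
    """
    addresses = [rip]
    for _ in range(max_depth):
        if rbp == 0 or rbp not in memory:
            break
        return_address = memory.get(rbp + 8)
        if return_address is None:
            break
        addresses.append(return_address)
        next_rbp = memory[rbp]
        # The stack grows down, so a caller's frame must sit at a higher address.
        if next_rbp != 0 and next_rbp <= rbp:
            break
        rbp = next_rbp
    return addresses


def symbolize(addresses, symbols):
    """Map each address to the symbol whose [start, end) range contains it."""
    names = []
    for addr in addresses:
        name = next(
            (n for n, (lo, hi) in symbols.items() if lo <= addr < hi),
            "[unknown]",
        )
        names.append(name)
    return names


# Made-up symbol table and stack contents for illustration only.
SYMBOLS = {
    "_start": (0x0F00, 0x1000),
    "main": (0x1000, 0x1100),
    "handle": (0x1100, 0x1200),
    "parse": (0x1200, 0x1300),
}
MEMORY = {
    0x7F00: 0x7F40, 0x7F08: 0x1150,  # parse's frame: returns into handle
    0x7F40: 0x7F80, 0x7F48: 0x1050,  # handle's frame: returns into main
    0x7F80: 0x0000, 0x7F88: 0x0F10,  # main's frame: returns into _start, chain ends
}

if __name__ == "__main__":
    frames = symbolize(walk_frame_pointers(MEMORY, rbp=0x7F00, rip=0x1234), SYMBOLS)
    print(";".join(reversed(frames)))
```

Each step is two memory reads, which is why this is the fastest option; it also shows why a function compiled without a frame pointer breaks the chain for every caller above it.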

Visualization

Flame graph (Brendan Gregg): x-axis = share of samples (not time order), y-axis = stack depth; wider frames are hotter.
Icicle graph: a flipped flame graph, with the root at the top.
Differential flame graph: the diff of two profiles, used to hunt performance regressions.
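How folded stacks turn into flame graph rectangles can be sketched as follows. This is a simplified illustration, not the flamegraph.pl implementation; `build_tree`, `layout`, and the `FOLDED` sample data are hypothetical names introduced here.

```python
def build_tree(folded_lines):
    """Merge folded stacks ("root;child;leaf count") into a nested frame tree."""
    root = {"name": "all", "value": 0, "children": {}}
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        stack, _, count = line.rpartition(" ")   # count follows the last space
        count = int(count)
        root["value"] += count
        node = root
        for name in stack.split(";"):
            node = node["children"].setdefault(
                name, {"name": name, "value": 0, "children": {}}
            )
            node["value"] += count
    return root


def layout(node, total, x=0.0, depth=0, out=None):
    """Assign each frame a rectangle (depth, name, x offset, width) in [0, 1] units."""
    if out is None:
        out = []
    out.append((depth, node["name"], round(x, 4), round(node["value"] / total, 4)))
    child_x = x
    for name in sorted(node["children"]):        # siblings ordered alphabetically
        child = node["children"][name]
        layout(child, total, child_x, depth + 1, out)
        child_x += child["value"] / total
    return out


# Made-up sample data for illustration.
FOLDED = ["main;parse;lex 30", "main;parse 10", "main;render 60"]

if __name__ == "__main__":
    tree = build_tree(FOLDED)
    for depth, name, x, width in layout(tree, tree["value"]):
        print(f"{'  ' * depth}{name}: x={x} width={width}")
```

Because siblings are merged and sorted by name, the x-axis carries no time information; only the width of a frame is meaningful.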

Applications

CPU bottleneck diagnosis: identify the hot functions.
Lock contention: off-CPU profile plus futex stacks.
GC pressure: allocation-stack profile.
Cold start: flame graph of the startup phase.
Continuous profiling: sample production 24/7 and alert on regressions.
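Hot-function identification, the first application above, reduces to counting self and inclusive samples per function. A minimal sketch, assuming input in the folded-stack format (`root;child;leaf count`); `hot_functions` is a hypothetical helper, not part of any tool named here.

```python
from collections import Counter


def hot_functions(folded_lines, top=5):
    """Return (name, self_samples, inclusive_samples), hottest self time first."""
    self_counts, inclusive_counts = Counter(), Counter()
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        stack, _, count = line.rpartition(" ")
        frames, count = stack.split(";"), int(count)
        self_counts[frames[-1]] += count      # the leaf frame was on-CPU
        for name in set(frames):              # set(): count a recursive frame once
            inclusive_counts[name] += count
    ranked = sorted(
        inclusive_counts,
        key=lambda n: (-self_counts[n], -inclusive_counts[n], n),
    )
    return [(n, self_counts[n], inclusive_counts[n]) for n in ranked[:top]]


if __name__ == "__main__":
    sample = ["main;parse;lex 30", "main;parse 10", "main;render 60"]
    for name, self_n, incl_n in hot_functions(sample):
        print(f"{name:10} self={self_n:4} inclusive={incl_n:4}")
```

High self time points at the function doing the work; high inclusive time with low self time points at a caller worth restructuring.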

💻 Patterns

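Linux perf: basic

# Sample all CPUs at 99 Hz for 30 s; -g unwinds via frame pointers
perf record -F 99 -a -g -- sleep 30
# Binaries built without frame pointers: replace -g with --call-graph dwarf
# (larger perf.data, higher overhead)
perf script > out.stack
# Render
git clone https://github.com/brendangregg/FlameGraph
./FlameGraph/stackcollapse-perf.pl out.stack | \
  ./FlameGraph/flamegraph.pl > flame.svg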

eBPF profile (BCC)

# 99 Hz for 30 s, folded output (-f); needs root
sudo profile-bpfcc -F 99 -f 30 > out.folded
./FlameGraph/flamegraph.pl out.folded > flame.svg
# Advantages: lower overhead, kernel and user stacks merged

Go pprof

package main

import (
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on the default mux
)

func main() {
	go http.ListenAndServe(":6060", nil) // profiling endpoint
	// ... app
}

go tool pprof -http=:8080 "http://localhost:6060/debug/pprof/profile?seconds=30"
# interactive flame graph in browser

Python py-spy (no code change)

py-spy record -o flame.svg -d 30 --pid $(pgrep -f myapp.py)
# zero instrumentation, samples a running process

Node.js / V8

node --prof app.js
# ... run workload ...
node --prof-process isolate-0xNNN-v8.log > profile.txt
# or 0x: npx 0x -- node app.js

JVM async-profiler

# Attach to a running JVM and write a 60 s CPU flame graph
asprof -d 60 -f flame.html "$PID"
# Other modes: -e lock (contention), -e alloc (allocations), -e wall (wall-clock)

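Rust: pprof crate

// Requires the pprof crate's "prost-codec" feature; run inside a function
// returning Result<(), Box<dyn std::error::Error>>.
use pprof::protos::Message; // provides encode()
use std::io::Write;

let guard = pprof::ProfilerGuardBuilder::default()
    .frequency(999)
    .blocklist(&["libc", "libgcc", "pthread"])
    .build()?;
// ... workload ...
let report = guard.report().build()?;
let mut buf = Vec::new();
report.pprof()?.encode(&mut buf)?; // encode into a byte buffer, not the File
std::fs::File::create("profile.pb")?.write_all(&buf)?;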

Continuous profiling (Pyroscope / Grafana)

# Pyroscope agent config, runs alongside the app (illustrative; exact keys depend on the Pyroscope version)
pyroscope:
  server: http://pyroscope:4040
  app_name: api-prod
  spy_name: ebpfspy           # auto-detect language
  sample_rate: 100

Differential flame graph

./FlameGraph/difffolded.pl before.folded after.folded | \
  ./FlameGraph/flamegraph.pl > diff.svg
# red = got slower, blue = got faster
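The comparison behind a differential flame graph can be sketched in a few lines. Note the assumption: this sketch compares each stack's share of samples, so two profiles of different length stay comparable, whereas difffolded.pl works on the raw counts unless asked to normalize. `load_folded` and `diff_profiles` are hypothetical helper names, and both inputs are assumed to be non-empty.

```python
from collections import Counter


def load_folded(lines):
    """Parse folded-stack lines ("root;child;leaf count") into a Counter."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            stack, _, n = line.rpartition(" ")
            counts[stack] += int(n)
    return counts


def diff_profiles(before_lines, after_lines):
    """Per-stack change in sample share (after - before), regressions first."""
    before, after = load_folded(before_lines), load_folded(after_lines)
    total_before, total_after = sum(before.values()), sum(after.values())
    deltas = {}
    for stack in before.keys() | after.keys():
        # A Counter returns 0 for stacks that appear in only one profile.
        deltas[stack] = after[stack] / total_after - before[stack] / total_before
    return sorted(deltas.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    before = ["main;a 50", "main;b 50"]
    after = ["main;a 80", "main;b 20"]
    for stack, delta in diff_profiles(before, after):
        print(f"{delta:+.1%}  {stack}")
```

Positive deltas correspond to the red frames in diff.svg and negative deltas to the blue ones.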

Decision criteria

Situation	Tool
Linux native (C/C++/Rust/Go)	perf + FlameGraph
Container / k8s (no SYS_ADMIN)	pprof endpoint
Python prod	py-spy
JVM prod	async-profiler
Continuous 24/7	Pyroscope / Datadog
Off-CPU (lock/IO)	offcputime-bpfcc

Default: sample at 99 Hz, collapse to folded stacks, render with flamegraph.pl. The hot path usually shows up within the first five minutes.

🔗 Graph

Parents: Performance_Engineering · Observability
Variants: Flame_Graph · Off_CPU_Profile · Differential_Flame_Graph
Applications: Continuous_Profiling · Performance_Regression_Detection
Adjacent: eBPF · perf · pprof · Brendan_Gregg

🤖 LLM usage

When to use: hot-function analysis, regression diffs, interpreting profile output. When not to use: tail latency or distributed tracing (use OpenTelemetry for distributed systems).

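❌ Anti-patterns

Profiling with time.time() / printf logging: not statistical, and the added calls distort hot loops.
Build without frame pointers: frame-pointer unwinding breaks; compile with -fno-omit-frame-pointer (or fall back to DWARF unwinding).
Sample rate too low (10 Hz): 30 s yields only 300 samples per CPU, so noise dominates.
Sample rate too high (10 kHz): the profiler's own overhead distorts the measurement.
Looking at a single profile run only: results vary between runs; at least 5 runs are recommended.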
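The sample-rate trade-off can be put in numbers. As a rough sketch, assuming samples are independent draws (a binomial model, which real profiles only approximate), the standard error of a frame's estimated share p from n samples is sqrt(p(1 - p) / n). `share_standard_error` is a hypothetical helper for this arithmetic.

```python
import math


def share_standard_error(share, hz, seconds):
    """Standard error of an estimated sample share, assuming independent samples."""
    n = hz * seconds
    return math.sqrt(share * (1.0 - share) / n)


if __name__ == "__main__":
    # A frame that truly accounts for 5% of samples, profiled for 30 s on one CPU.
    for hz in (10, 99, 999):
        se = share_standard_error(0.05, hz, 30)
        print(f"{hz:>4} Hz x 30 s = {hz * 30:>5} samples, 5% frame: +/- {se:.2%}")
```

Under this model the 5% frame is uncertain by roughly 1.3 percentage points at 10 Hz but only about 0.4 at 99 Hz, which is why 10 Hz profiles of short runs are hard to compare with each other.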

🧪 Verification / duplicates

Verified (Brendan Gregg, "Systems Performance", 2nd ed., 2020; Linux perf docs; async-profiler README, 2024).
Confidence: A.

🕓 Changelog

Date	Change
2026-05-08	Phase 1
2026-05-10	Manual cleanup — sampling profilers, flame graphs, multi-language tooling
