---
id: wiki-2026-0508-cpu-bottleneck
title: CPU Bottleneck
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [CPU-Bound, Compute Bottleneck]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [performance, profiling, cpu]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: applied
tech_stack:
  language: C++/Rust/JS
  framework: perf/Instruments/Chrome DevTools
---
# CPU Bottleneck
## One-line summary
> **"If the GPU is idle while the main thread sits at 100%, it is a CPU bottleneck."** A CPU bottleneck is the state in which main-thread work does not finish within the frame budget: 16.7 ms at 60 fps, or about 11.1 ms at 90 fps (XR). As of 2026, diagnose with the Chrome Performance panel, perf, and Instruments; fix with WebWorker, WASM SIMD, off-main-thread work, and batching.
## Key points
### Diagnostic signals
- GPU utilization stays below 70% while FPS drops.
- Long tasks (> 50 ms) appear in the Performance panel.
- A single function dominates `perf top`.
- Self-time in the profile is concentrated in one function.
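
The first signal and the frame budget can be combined into a rough check. The sketch below is illustrative only: `frameBudgetMs` and `looksCpuBound` are hypothetical names, the 70% GPU threshold comes from the list above, and the 10% missed-frame threshold is an assumption chosen for the example.

```javascript
// Frame budget in milliseconds for a target frame rate (60 fps -> ~16.7 ms).
function frameBudgetMs(targetFps) {
  return 1000 / targetFps;
}

// Hypothetical heuristic: GPU has headroom while frames still miss the budget.
// gpuUtilization is a fraction in [0, 1]; frameTimesMs is an array of frame durations.
function looksCpuBound({ gpuUtilization, frameTimesMs, targetFps }) {
  const budgetMs = frameBudgetMs(targetFps);
  const missed = frameTimesMs.filter(t => t > budgetMs).length;
  const overBudgetRatio = frameTimesMs.length ? missed / frameTimesMs.length : 0;
  return {
    budgetMs,
    overBudgetRatio,
    // 0.7 mirrors the "< 70%" signal; 0.1 is an assumed tolerance for missed frames.
    cpuBound: gpuUtilization < 0.7 && overBudgetRatio > 0.1,
  };
}
```

A positive result only says where to look first; a profiler is still needed to find the hot function.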
### Bottleneck sources
- **Main-thread JS**: parse, layout, large loop.
- **Layout thrash**: read-write-read DOM.
- **GC pause**: allocation pressure.
- **Synchronous IO**: blocking syscall.
- **Unoptimized algorithm**: O(n²) on hot path.
- **Single-core saturation**: no parallelism.
### Fix strategy
1. **Profile first**: measure, do not guess.
2. **Off-main-thread**: WebWorker, OffscreenCanvas.
3. **Batch**: requestAnimationFrame, microtask.
4. **SIMD/WASM**: for hot inner loops.
5. **Algorithmic**: O(n²) → O(n log n).
6. **Cache**: memoize, weak-ref.
7. **Lazy**: defer, code-split.
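
As an illustration of the algorithmic step (item 5), here is a minimal sketch that replaces a pairwise comparison with a sort-based one. The duplicate-detection task and both function names are assumptions made up for the example, and the inputs are assumed to be finite numbers.

```javascript
// O(n^2): compares every pair of elements.
function hasDuplicateQuadratic(values) {
  for (let i = 0; i < values.length; i++) {
    for (let j = i + 1; j < values.length; j++) {
      if (values[i] === values[j]) return true;
    }
  }
  return false;
}

// O(n log n): sort a copy, then compare neighbours only.
function hasDuplicateSorted(values) {
  const sorted = [...values].sort((a, b) => a - b);
  for (let i = 1; i < sorted.length; i++) {
    if (sorted[i] === sorted[i - 1]) return true;
  }
  return false;
}
```

For this particular task a `Set` would bring the expected cost down further; the sorted version is shown because it matches the O(n²) → O(n log n) step in the list.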
## 💻 Patterns
### Detect long task
```javascript
const obs = new PerformanceObserver(list => {
  for (const e of list.getEntries()) {
    if (e.duration > 50) console.warn('long task', e.duration, e.name);
  }
});
obs.observe({ entryTypes: ['longtask'] });
```
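
Individual warnings are noisy, so it can help to aggregate the entries. The sketch below is an assumption-laden illustration: `summarizeLongTasks` is a hypothetical helper that accepts any objects with a `duration` field, and it counts the part of each task above 50 ms as blocking time, which is the same idea the Total Blocking Time metric uses.

```javascript
const LONG_TASK_THRESHOLD_MS = 50;

// Summarize long-task entries (or any objects with a numeric `duration`).
function summarizeLongTasks(entries) {
  let count = 0;
  let blockingMs = 0;
  let longestMs = 0;
  for (const e of entries) {
    if (e.duration <= LONG_TASK_THRESHOLD_MS) continue; // not a long task
    count += 1;
    blockingMs += e.duration - LONG_TASK_THRESHOLD_MS; // only the excess counts as blocking
    longestMs = Math.max(longestMs, e.duration);
  }
  return { count, blockingMs, longestMs };
}
```

In the observer callback this could be fed with `list.getEntries()`, accumulating the results over a session.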
### Move work to Worker
```javascript
// main.js
const w = new Worker('worker.js', { type: 'module' });
// Transfer the underlying buffer instead of copying it (zero-copy);
// bigArray is no longer usable on this thread afterwards.
w.postMessage({ data: bigArray }, [bigArray.buffer]);
w.onmessage = e => render(e.data);

// worker.js
self.onmessage = e => {
  const result = heavyCompute(e.data.data); // assumed to return a typed array
  self.postMessage(result, [result.buffer]);
};
```
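
When several workers are available, the input has to be split so that each chunk owns its own buffer and can be transferred independently. A minimal sketch, assuming the input is a typed array; `splitForWorkers` is a hypothetical helper, not part of any Worker API.

```javascript
// Split a typed array into at most `workerCount` chunks, each with its own ArrayBuffer.
function splitForWorkers(data, workerCount) {
  if (!Number.isInteger(workerCount) || workerCount < 1) {
    throw new RangeError('workerCount must be a positive integer');
  }
  const chunkSize = Math.ceil(data.length / workerCount);
  const chunks = [];
  for (let start = 0; start < data.length; start += chunkSize) {
    // TypedArray.prototype.slice copies into a fresh buffer, so the chunk is transferable
    // without detaching the original array.
    chunks.push(data.slice(start, start + chunkSize));
  }
  return chunks;
}
```

Each chunk could then be posted with `[chunk.buffer]` as the transfer list. The copy made by `slice` is not free, so this only pays off when the per-chunk computation is clearly heavier than the copy.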
### WASM SIMD hot loop (Rust)
```rust
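#[target_feature(enable = "simd128")]
unsafe fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::wasm32::*;
    // Only touch indices that are valid in both slices.
    let n = a.len().min(b.len());
    let simd_end = n - n % 4; // elements covered by full 4-lane chunks
    let mut sum = f32x4_splat(0.0);
    for i in (0..simd_end).step_by(4) {
        // v128_load has no alignment requirement, so the slices need not be 16-byte aligned.
        let va = v128_load(a.as_ptr().add(i) as *const v128);
        let vb = v128_load(b.as_ptr().add(i) as *const v128);
        sum = f32x4_add(sum, f32x4_mul(va, vb));
    }
    // Horizontal sum of the four lanes.
    let mut total = f32x4_extract_lane::<0>(sum) + f32x4_extract_lane::<1>(sum)
        + f32x4_extract_lane::<2>(sum) + f32x4_extract_lane::<3>(sum);
    // Scalar tail for lengths that are not a multiple of 4.
    for i in simd_end..n {
        total += a[i] * b[i];
    }
    total
}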
```
### Time-sliced loop (yield to event loop)
```javascript
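// Yield to the event loop: prefer scheduler.yield() where the browser supports it,
// otherwise fall back to setTimeout(0).
function yieldToMain() {
  if ('scheduler' in globalThis && 'yield' in globalThis.scheduler) {
    return globalThis.scheduler.yield();
  }
  return new Promise(r => setTimeout(r, 0));
}

async function processChunked(items) {
  const CHUNK = 200;
  for (let i = 0; i < items.length; i += CHUNK) {
    items.slice(i, i + CHUNK).forEach(processOne); // processOne is supplied by the caller
    await yieldToMain();
  }
}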
```
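
A fixed chunk size works when items cost roughly the same. When they do not, a time budget is one alternative: keep working until a deadline passes, then yield. The sketch below is an assumption, not an established API: `createDeadline` and `processWithBudget` are hypothetical names, and the 8 ms default is an arbitrary value meant to leave room inside a 16.7 ms frame.

```javascript
// Track elapsed time against a budget. `now` is injectable so the logic can be tested.
function createDeadline(budgetMs, now = () => performance.now()) {
  let start = now();
  return {
    shouldYield: () => now() - start >= budgetMs,
    reset: () => { start = now(); },
  };
}

// Process items one by one, yielding to the event loop whenever the budget is used up.
async function processWithBudget(items, processOne, budgetMs = 8) {
  const deadline = createDeadline(budgetMs);
  for (const item of items) {
    processOne(item);
    if (deadline.shouldYield()) {
      await new Promise(r => setTimeout(r, 0)); // give the event loop a turn
      deadline.reset();
    }
  }
}
```

The trade-off is one clock read per item, which is usually negligible next to work heavy enough to need slicing.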
### Batch DOM read/write
```javascript
// Anti-pattern: layout thrash (each read forces a reflow after the previous write)
items.forEach(el => { const w = el.offsetWidth; el.style.width = (w * 2) + 'px'; });
// Fix: read everything first, then write
const widths = items.map(el => el.offsetWidth);
items.forEach((el, i) => { el.style.width = (widths[i] * 2) + 'px'; });
```
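
When reads and writes are scattered across components, the same read-then-write ordering can be enforced with a small queue. This is a minimal sketch under stated assumptions: `createBatcher`, `measure`, and `mutate` are hypothetical names, and the scheduler defaults to `requestAnimationFrame` but is injectable so the ordering logic can run outside a browser.

```javascript
// Queue DOM reads and writes separately and flush them once per scheduled frame.
function createBatcher(schedule = cb => requestAnimationFrame(cb)) {
  const reads = [];
  const writes = [];
  let scheduled = false;

  function flush() {
    scheduled = false;
    // Take snapshots so callbacks queued during the flush run in the next one.
    const pendingReads = reads.splice(0);
    const pendingWrites = writes.splice(0);
    pendingReads.forEach(fn => fn());  // all reads first
    pendingWrites.forEach(fn => fn()); // then all writes
  }

  function enqueue(queue, fn) {
    queue.push(fn);
    if (!scheduled) {
      scheduled = true;
      schedule(flush);
    }
  }

  return {
    measure: fn => enqueue(reads, fn),
    mutate: fn => enqueue(writes, fn),
  };
}
```

A caller would wrap `el.offsetWidth` in `measure` and the style change in `mutate`; regardless of call order, the reads run before the writes within a flush.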
### Linux perf hot function
```bash
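# Sample the running process at 99 Hz with call graphs for 10 seconds.
sudo perf record -F 99 -g -p $(pidof myapp) -- sleep 10
# Text summary of the hottest call paths.
sudo perf report --stdio | head -40
# Flame graph; stackcollapse-perf.pl and flamegraph.pl come from Brendan Gregg's FlameGraph scripts.
sudo perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg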
```
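
The folded output of `stackcollapse-perf.pl` (one line per stack, frames joined by `;`, followed by a sample count) can also be ranked by self-time without rendering the SVG. A minimal sketch; `selfTimeByFunction` is a hypothetical helper and the sample data in the usage note is invented.

```javascript
// Rank functions by self-time from folded stacks ("a;b;c 42" per line).
function selfTimeByFunction(foldedText) {
  const totals = new Map();
  let totalSamples = 0;
  for (const line of foldedText.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    const lastSpace = trimmed.lastIndexOf(' ');
    if (lastSpace < 0) continue; // malformed: no sample count
    const count = Number(trimmed.slice(lastSpace + 1));
    if (!Number.isFinite(count)) continue; // malformed: count is not a number
    const frames = trimmed.slice(0, lastSpace).split(';');
    const leaf = frames[frames.length - 1]; // the leaf frame is where the sample landed
    totals.set(leaf, (totals.get(leaf) || 0) + count);
    totalSamples += count;
  }
  return [...totals.entries()]
    .map(([fn, samples]) => ({ fn, samples, share: samples / totalSamples }))
    .sort((a, b) => b.samples - a.samples);
}
```

For input such as `myapp;main;render;layout 30` and `myapp;main;update;layout 50`, the `layout` frame is credited with both counts, which makes the "self-time concentrated in one function" signal directly visible.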
## Decision criteria
| Situation | Approach |
|---|---|
| Long JS function | WebWorker / time-slice |
| Image/video pipeline | OffscreenCanvas |
| Number crunching | WASM SIMD / GPU compute |
| Layout thrash | read-then-write batch |
| GC pressure | object pool |
| Multi-core unused | Worker pool / parallel |

**Default**: measure → identify the hot function → apply an off-main-thread or algorithmic fix.
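
The "GC pressure → object pool" row can be sketched as follows. This is a minimal illustration, not a tuned implementation: the `ObjectPool` class and its `create`/`reset` callbacks are hypothetical names introduced for the example.

```javascript
// Reuse objects instead of allocating new ones on a hot path.
class ObjectPool {
  constructor(create, reset) {
    this.create = create; // builds a fresh object when the pool is empty
    this.reset = reset;   // clears an object's state before it is reused
    this.free = [];
  }

  acquire() {
    return this.free.length > 0 ? this.free.pop() : this.create();
  }

  release(obj) {
    this.reset(obj);
    this.free.push(obj);
  }
}
```

Usage might look like `const pool = new ObjectPool(() => ({ x: 0, y: 0 }), v => { v.x = 0; v.y = 0; })`, with `acquire()` at the start of a frame and `release()` at the end. Pooling trades allocation cost for manual lifetime management, so it is worth it mainly when a profile shows GC pauses.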
## 🔗 Graph
- Parent: [[Analyze runtime performance]] · [[Flame_Graphs]]
- Variant: [[Draw Call]]
- Application: [[Tree Shaking (번들 크기 최적화)]] · [[Frustum Culling]]
- Adjacent: [[Memory Management]] · [[Branch Prediction]]
## 🤖 LLM usage
**When to use**: interpreting profile flame graphs, suggesting hot-function refactors, annotating perf output.
**When not to use**: actual performance measurement; deterministic tools are accurate there.
## ❌ Anti-patterns
- **Premature optimization**: guessing without a profile leads to fixing the wrong part.
- **Worker overuse**: for small tasks, postMessage overhead exceeds the gain.
- **`while(true)` busy-wait**: use throttling or requestIdleCallback instead.
- **Synchronous XHR**: deprecated on the main thread, and it blocks it.
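
As an alternative to the busy-wait anti-pattern, a throttle lets a handler run at most once per interval and simply drops the calls in between. A minimal leading-edge sketch; `throttle` here is a hand-written helper, not a library function, and the injectable `now` parameter is an assumption added for testability.

```javascript
// Run `fn` at most once every `intervalMs`; calls that arrive too soon are dropped.
function throttle(fn, intervalMs, now = () => Date.now()) {
  let last = -Infinity;
  return (...args) => {
    const t = now();
    if (t - last < intervalMs) return undefined; // still inside the quiet interval
    last = t;
    return fn(...args);
  };
}
```

A scroll or resize handler wrapped this way does bounded work per second instead of saturating the main thread.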
## 🧪 Verification / Duplicates
- Verified: Chrome Performance docs; web.dev Long Tasks; Linux perf-tools (Brendan Gregg).
- Trust level: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — diagnosis + Worker/SIMD/yield patterns |