---
id: wiki-2026-0508-cpu-bottleneck
title: CPU Bottleneck
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [CPU-Bound, Compute Bottleneck]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [performance, profiling, cpu]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: applied
tech_stack:
  language: C++/Rust/JS
  framework: perf/Instruments/Chrome DevTools
---

# CPU Bottleneck

## In one line

> **"If the GPU is idle while the main thread sits at 100%, you have a CPU bottleneck."** A CPU bottleneck means main-thread work does not finish within the frame budget: 16.7 ms at 60 fps, or about 11 ms at 90 fps (XR). As of 2026, diagnose with the Chrome Performance panel, `perf`, and Instruments; fix with Web Workers, WASM SIMD, off-main-thread work, and batching.

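The frame budgets quoted above are simply 1000 ms divided by the target frame rate. A minimal sketch of that arithmetic follows; the helper names `frameBudgetMs` and `overBudgetRatio` are illustrative, not part of any API.

```javascript
// Frame budget in milliseconds for a target frame rate.
function frameBudgetMs(targetFps) {
  return 1000 / targetFps;
}

// Fraction of measured frames whose duration exceeded the budget.
function overBudgetRatio(frameDurationsMs, targetFps) {
  if (frameDurationsMs.length === 0) return 0;
  const budget = frameBudgetMs(targetFps);
  const over = frameDurationsMs.filter(d => d > budget).length;
  return over / frameDurationsMs.length;
}

console.log(frameBudgetMs(60).toFixed(1)); // 16.7
console.log(frameBudgetMs(90).toFixed(1)); // 11.1
console.log(overBudgetRatio([10, 20, 12, 30], 60)); // 0.5
```

In a browser, the frame durations could come from the deltas between successive `requestAnimationFrame` timestamps.
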
## Core concepts

### Diagnostic signals

- GPU utilization stays below 70% while FPS drops.
- Long Tasks (> 50 ms) show up in the Performance panel.
- `perf top` shows a single hot function.
- Self-time in the profile is concentrated in one function.

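The last signal depends on self-time, which is a function's total time minus the time spent in its callees. A rough sketch of that computation, assuming a hypothetical profile node shape of `{ name, totalMs, children }` (real profiler exports differ):

```javascript
// Accumulate self-time per function name: total time minus children's total time.
function selfTimes(node, out = new Map()) {
  const childTotal = node.children.reduce((sum, c) => sum + c.totalMs, 0);
  out.set(node.name, (out.get(node.name) ?? 0) + (node.totalMs - childTotal));
  for (const child of node.children) selfTimes(child, out);
  return out;
}

// Return the function with the largest self-time and its share of the root's time.
function hottest(root) {
  const entries = [...selfTimes(root)].sort((a, b) => b[1] - a[1]);
  const [name, selfMs] = entries[0];
  return { name, selfMs, share: selfMs / root.totalMs };
}

// Hypothetical 20 ms frame, used only for illustration.
const profile = {
  name: 'frame', totalMs: 20, children: [
    { name: 'update', totalMs: 15, children: [
      { name: 'physics', totalMs: 12, children: [] },
    ] },
    { name: 'render', totalMs: 4, children: [] },
  ],
};

console.log(hottest(profile)); // { name: 'physics', selfMs: 12, share: 0.6 }
```

Here `update` has a large total time (15 ms) but only 3 ms of self-time; the fix belongs in `physics`.
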
### Bottleneck sources

- **Main-thread JS**: parsing, layout, large loops.
- **Layout thrash**: interleaved read-write-read DOM access.
- **GC pauses**: allocation pressure.
- **Synchronous I/O**: blocking syscalls.
- **Unoptimized algorithm**: O(n²) on the hot path.
- **Single-core saturation**: no parallelism.

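To make the "unoptimized algorithm" source concrete, here is one illustrative case (the problem is chosen for this sketch and does not come from a specific codebase): finding the smallest gap between any two numbers. The pairwise version is O(n²); sorting first makes it O(n log n), because the closest pair must be adjacent after sorting.

```javascript
// O(n^2): compare every pair.
function minGapQuadratic(values) {
  let best = Infinity;
  for (let i = 0; i < values.length; i++) {
    for (let j = i + 1; j < values.length; j++) {
      best = Math.min(best, Math.abs(values[i] - values[j]));
    }
  }
  return best;
}

// O(n log n): sort a copy, then only compare neighbours.
function minGapSorted(values) {
  const sorted = [...values].sort((a, b) => a - b);
  let best = Infinity;
  for (let i = 1; i < sorted.length; i++) {
    best = Math.min(best, sorted[i] - sorted[i - 1]);
  }
  return best;
}

console.log(minGapQuadratic([9, 1, 14, 4, 20])); // 3
console.log(minGapSorted([9, 1, 14, 4, 20]));    // 3
```
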
### Fix strategy

1. **Profile first**: measure, don't guess.
2. **Move work off the main thread**: Web Workers, OffscreenCanvas.
3. **Batch**: requestAnimationFrame, microtasks.
4. **SIMD/WASM**: for hot inner loops.
5. **Algorithmic**: O(n²) → O(n log n).
6. **Cache**: memoization, weak references.
7. **Lazy**: defer work, code-split.

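The caching item can be sketched with a `WeakMap`, which keys results by object without keeping that object alive. This is a minimal sketch under assumptions: `memoizeByObject` and `manhattanLength` are hypothetical names, and the memoized function is assumed to take a single object argument that is not mutated afterwards.

```javascript
// Memoize a one-argument function whose argument is an object.
// WeakMap entries can be garbage-collected together with their key object.
function memoizeByObject(fn) {
  const cache = new WeakMap();
  return obj => {
    if (!cache.has(obj)) cache.set(obj, fn(obj));
    return cache.get(obj);
  };
}

// Hypothetical expensive derivation: Manhattan length of a polyline.
let computeCount = 0;
const manhattanLength = memoizeByObject(points => {
  computeCount++;
  let total = 0;
  for (let i = 1; i < points.length; i++) {
    total += Math.abs(points[i][0] - points[i - 1][0])
           + Math.abs(points[i][1] - points[i - 1][1]);
  }
  return total;
});

const route = [[0, 0], [3, 4], [3, 10]];
console.log(manhattanLength(route), manhattanLength(route), computeCount); // 13 13 1
```

The cache does not notice in-place mutation of `route`; if inputs change, pass a new object or drop the cached entry.
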
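## 💻 Patterns

### Detect long tasks

```javascript
// Long Task entries are only reported for main-thread tasks of 50 ms or more.
if (PerformanceObserver.supportedEntryTypes?.includes('longtask')) {
  const obs = new PerformanceObserver(list => {
    for (const e of list.getEntries()) {
      console.warn('long task', e.duration, e.name);
    }
  });
  // buffered: true also delivers entries recorded before the observer was created.
  obs.observe({ type: 'longtask', buffered: true });
}
```
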
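Once long-task durations are collected, a single number is often easier to track than a log. The sketch below sums the portion of each task above the 50 ms threshold, in the spirit of the Total Blocking Time metric; `blockingTimeMs` is an illustrative helper, and the real metric is measured over a specific page-load window rather than an arbitrary list.

```javascript
const LONG_TASK_THRESHOLD_MS = 50;

// Sum of the time each task spent beyond the long-task threshold.
function blockingTimeMs(taskDurationsMs) {
  return taskDurationsMs.reduce(
    (sum, d) => sum + Math.max(0, d - LONG_TASK_THRESHOLD_MS),
    0,
  );
}

console.log(blockingTimeMs([120, 30, 65])); // 85
```

A 120 ms task contributes 70 ms, a 65 ms task contributes 15 ms, and the 30 ms task contributes nothing.
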
### Move work to a Worker

```javascript
// main.js
const w = new Worker('worker.js', { type: 'module' });
// Transfer the underlying buffer instead of copying it (zero-copy).
// After this call, bigArray is detached and unusable on the main thread.
w.postMessage({ data: bigArray }, [bigArray.buffer]);
w.onmessage = e => render(e.data);

// worker.js
self.onmessage = e => {
  const result = heavyCompute(e.data.data); // assumed to return a typed array
  self.postMessage(result, [result.buffer]); // transfer the result back
};
```

### WASM SIMD hot loop (Rust)

```rust
#[cfg(target_arch = "wasm32")]
#[target_feature(enable = "simd128")]
unsafe fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::wasm32::*;
    // Only read indices that exist in both slices.
    let n = a.len().min(b.len());
    let simd_len = n - n % 4;
    let mut sum = f32x4_splat(0.0);
    for i in (0..simd_len).step_by(4) {
        // v128_load accepts unaligned pointers.
        let va = v128_load(a.as_ptr().add(i) as *const v128);
        let vb = v128_load(b.as_ptr().add(i) as *const v128);
        sum = f32x4_add(sum, f32x4_mul(va, vb));
    }
    // Horizontal sum of the four lanes.
    let mut total = f32x4_extract_lane::<0>(sum) + f32x4_extract_lane::<1>(sum)
        + f32x4_extract_lane::<2>(sum) + f32x4_extract_lane::<3>(sum);
    // Scalar tail for lengths that are not a multiple of 4.
    for i in simd_len..n {
        total += a[i] * b[i];
    }
    total
}
```

### Time-sliced loop (yield to event loop)

```javascript
// Prefer scheduler.yield() where the browser supports it; otherwise fall back to a macrotask.
function yieldToMain() {
  if (globalThis.scheduler?.yield) return globalThis.scheduler.yield();
  return new Promise(r => setTimeout(r, 0));
}

async function processChunked(items) {
  const CHUNK = 200;
  for (let i = 0; i < items.length; i += CHUNK) {
    items.slice(i, i + CHUNK).forEach(item => processOne(item));
    await yieldToMain(); // let input handling and rendering run between chunks
  }
}
```

### Batch DOM read/write

```javascript
// Anti-pattern: layout thrash (each read forces a reflow after the previous write)
items.forEach(el => { const w = el.offsetWidth; el.style.width = (w * 2) + 'px'; });
// Fix: do all reads first, then all writes
const widths = items.map(el => el.offsetWidth);
items.forEach((el, i) => { el.style.width = (widths[i] * 2) + 'px'; });
```

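When reads and writes are spread across many call sites, the read-then-write order can be enforced with a small queue. The following is a sketch, not an established library API: `createDomBatcher`, `measure`, and `mutate` are hypothetical names, and the scheduler is injectable so the batcher also runs outside a browser.

```javascript
// Queue reads and writes separately; flush all reads before all writes once per frame.
function createDomBatcher(schedule = cb => requestAnimationFrame(cb)) {
  const reads = [];
  const writes = [];
  let scheduled = false;

  function request() {
    if (!scheduled) { scheduled = true; schedule(flush); }
  }

  function flush() {
    for (const fn of reads.splice(0)) fn();  // all measurements first
    for (const fn of writes.splice(0)) fn(); // then all mutations
    scheduled = false;
    if (reads.length || writes.length) request(); // work queued during the flush
  }

  return {
    measure(fn) { reads.push(fn); request(); },
    mutate(fn) { writes.push(fn); request(); },
  };
}

// Browser usage sketch (`items` is an array of elements):
// const batcher = createDomBatcher();
// items.forEach(el => batcher.measure(() => {
//   const w = el.offsetWidth;
//   batcher.mutate(() => { el.style.width = (w * 2) + 'px'; });
// }));
```

Writes queued from inside a read callback run in the same flush, so each frame performs one read phase followed by one write phase.
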
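### Linux perf: find the hot function

```bash
# Sample the running process at 99 Hz with call graphs for 10 seconds.
sudo perf record -F 99 -g -p $(pidof myapp) -- sleep 10
# Show the hottest functions.
sudo perf report --stdio | head -40
# Render a flame graph (stackcollapse-perf.pl and flamegraph.pl come from the FlameGraph scripts).
sudo perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg
```
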
## Decision criteria

| Situation | Approach |
|---|---|
| Long JS function | Web Worker / time-slicing |
| Image/video pipeline | OffscreenCanvas |
| Number crunching | WASM SIMD / GPU compute |
| Layout thrash | Batch reads, then writes |
| GC pressure | Object pool |
| Multi-core unused | Worker pool / parallelism |

**Default**: measure → identify the hot function → move it off the main thread or fix the algorithm.

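The "object pool" row can be illustrated as follows. This is a minimal sketch with a hypothetical `ObjectPool` class: objects are recycled rather than reallocated every frame, which reduces allocation pressure and therefore GC pauses.

```javascript
class ObjectPool {
  constructor(create, reset) {
    this.create = create; // builds a fresh object when the pool is empty
    this.reset = reset;   // returns a recycled object to a clean state
    this.free = [];
    this.allocations = 0;
  }

  acquire() {
    if (this.free.length > 0) return this.free.pop();
    this.allocations++;
    return this.create();
  }

  release(obj) {
    this.reset(obj);
    this.free.push(obj);
  }
}

// Hypothetical particle objects, used only for illustration.
const particles = new ObjectPool(
  () => ({ x: 0, y: 0, life: 0 }),
  p => { p.x = 0; p.y = 0; p.life = 0; },
);

for (let frame = 0; frame < 1000; frame++) {
  const p = particles.acquire();
  p.x = frame;
  p.life = 1;
  particles.release(p);
}

console.log(particles.allocations); // 1
```

A pool only pays off when a profile shows allocation on the hot path; otherwise it adds bookkeeping and the risk of using an object after release.
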
## 🔗 Graph

- Parent: [[Analyze runtime performance]] · [[Flame_Graphs]]
- Variant: [[Draw Call]]
- Applications: [[Tree Shaking (번들 크기 최적화)]] · [[Frustum Culling]]
- Adjacent: [[Memory Management]] · [[Branch Prediction]]

## 🤖 LLM usage

**When to use**: interpreting profile flame graphs, suggesting refactors for hot functions, annotating perf output.
**When not to use**: actual performance measurement. Deterministic tools are the accurate source.

## ❌ Anti-patterns

- **Premature optimization**: guessing without a profile leads to fixing the wrong part.
- **Worker overuse**: for small tasks, `postMessage` overhead exceeds the gain.
- **`while(true)` busy-wait**: use throttling or `requestIdleCallback` instead.
- **Synchronous XHR**: deprecated, and it blocks the main thread.

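As an alternative to busy-waiting, a throttle limits how often a handler actually runs. The sketch below is one simple leading-edge variant; `throttle` is written here for illustration (utility libraries ship more complete versions), and the injectable `now` clock is an assumption that makes the behaviour easy to check.

```javascript
// Run fn at most once per intervalMs; calls inside the window are dropped.
function throttle(fn, intervalMs, now = () => Date.now()) {
  let last = -Infinity;
  return (...args) => {
    const t = now();
    if (t - last < intervalMs) return undefined; // too soon: drop this call
    last = t;
    return fn(...args);
  };
}

// Deterministic demonstration with a fake clock.
let fakeTime = 0;
let runs = 0;
const onScroll = throttle(() => { runs++; }, 100, () => fakeTime);

for (fakeTime = 0; fakeTime <= 250; fakeTime += 10) onScroll();
console.log(runs); // 3 (at t = 0, 100, 200)
```
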
## 🧪 Verification / duplicates

- Verified against: Chrome Performance docs; web.dev Long Tasks; Linux perf-tools (Brendan Gregg).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: diagnosis + Worker/SIMD/yield patterns |