---
id: wiki-2026-0508-cpu-bottleneck
title: CPU Bottleneck
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [CPU-Bound, Compute Bottleneck]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [performance, profiling, cpu]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: applied
tech_stack:
  language: C++/Rust/JS
  framework: perf/Instruments/Chrome DevTools
---

# CPU Bottleneck

## In One Line

> **"If the GPU is idle while the main thread sits at 100%, you have a CPU bottleneck."**

A CPU bottleneck is the state in which main-thread work does not finish within the frame budget. That budget is 16.7 ms at 60 fps, or about 11 ms at 90 fps (XR).

- **Diagnosis (2026 tooling)**: the Chrome DevTools Performance panel, Linux `perf`, and Apple Instruments.
- **Fixes**: Web Workers, WASM SIMD, other off-main-thread work, and batching.

## Core

### Diagnostic Signals

- GPU utilization is below 70% while FPS drops.
- Long tasks (> 50 ms) appear in the Performance panel.
- A single function stays at the top of `perf top`.
- Self time in the profile is concentrated in one function.

### Bottleneck Sources

- **Main-thread JS**: script parsing, layout, large loops.
- **Layout thrash**: interleaved DOM reads and writes (read, write, read) that force repeated synchronous layout.
- **GC pauses**: allocation pressure.
- **Synchronous I/O**: blocking syscalls.
- **Unoptimized algorithm**: O(n²) on a hot path.
- **Single-core saturation**: no parallelism.

### Fix Strategy

1. **Profile first**: measure, don't guess.
2. **Off-main-thread**: Web Workers, OffscreenCanvas.
3. **Batch**: `requestAnimationFrame`, microtasks.
4. **SIMD/WASM**: for hot inner loops.
5. **Algorithmic**: O(n²) → O(n log n).
6. **Cache**: memoization, weak references.
7. **Lazy**: defer work, code-split.

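
Each frame budget above is just 1000 ms divided by the target fps. The snippet below is a minimal sketch of the "profile first" step, assuming the work is a synchronous piece of code you can wrap in a function. `frameBudgetMs` and `measureAgainstBudget` are hypothetical helpers. A single `performance.now()` sample is only a rough sanity check and does not replace the Performance panel or `perf`.

```javascript
// Frame budget in milliseconds for a target refresh rate (1000 ms / fps).
function frameBudgetMs(fps) {
  if (!(fps > 0)) throw new RangeError('fps must be positive');
  return 1000 / fps;
}

// Hypothetical helper: time one synchronous call with performance.now()
// and report whether that call alone would exceed the frame budget.
function measureAgainstBudget(fn, fps = 60) {
  const start = performance.now();
  const result = fn();
  const elapsedMs = performance.now() - start;
  const budgetMs = frameBudgetMs(fps);
  return { result, elapsedMs, budgetMs, overBudget: elapsedMs > budgetMs };
}

// Usage: a busy loop standing in for real main-thread work, checked against 90 fps.
const report = measureAgainstBudget(() => {
  let acc = 0;
  for (let i = 0; i < 1e6; i++) acc += i;
  return acc;
}, 90);
console.log(report.budgetMs.toFixed(1), 'ms budget,', report.elapsedMs.toFixed(2), 'ms used');
```

One timing is noisy because of JIT warm-up, background load, and CPU frequency scaling. Treat `overBudget` as a hint to go profile, not as a verdict.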
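
Item 5 (algorithmic) is often the biggest win. The sketch below uses duplicate detection over an array of finite numbers as a stand-in hot-path task; both function names are hypothetical. A `Set`-based version would run in expected O(n). The sort-based version is shown because it matches the O(n log n) target in the list.

```javascript
// O(n²): compares every pair. Fine for tiny inputs, costly on a hot path.
function hasDuplicateNaive(values) {
  for (let i = 0; i < values.length; i++) {
    for (let j = i + 1; j < values.length; j++) {
      if (values[i] === values[j]) return true;
    }
  }
  return false;
}

// O(n log n): sort a copy (the input is not mutated); after sorting,
// equal values must be adjacent, so one linear scan finds any duplicate.
function hasDuplicateSorted(values) {
  const sorted = [...values].sort((a, b) => a - b); // numeric comparator
  for (let i = 1; i < sorted.length; i++) {
    if (sorted[i] === sorted[i - 1]) return true;
  }
  return false;
}

console.log(hasDuplicateSorted([4, 8, 15, 16, 23, 42, 8])); // true
```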
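
Item 6 (cache) can be sketched with a `Map` for primitive arguments and a `WeakMap` for object arguments. Reading "weak-ref" as `WeakMap` is an assumption. A `WeakMap` holds its *keys* weakly, which is related to, but not the same as, the `WeakRef` API. `memoize` and `memoizeByObject` are hypothetical helper names, and both assume a pure function of one argument.

```javascript
// Cache results of a pure, single-argument function keyed by primitive values.
function memoize(fn) {
  const cache = new Map();
  return arg => {
    if (cache.has(arg)) return cache.get(arg);
    const value = fn(arg);
    cache.set(arg, value);
    return value;
  };
}

// Same idea for object arguments: a WeakMap does not keep its keys alive,
// so cached entries disappear once the key object is otherwise unreachable.
function memoizeByObject(fn) {
  const cache = new WeakMap();
  return obj => {
    if (cache.has(obj)) return cache.get(obj);
    const value = fn(obj);
    cache.set(obj, value);
    return value;
  };
}

// Usage: recursive calls go through the memoized wrapper, turning an
// exponential-time Fibonacci into a linear number of evaluations.
const fib = memoize(n => (n < 2 ? n : fib(n - 1) + fib(n - 2)));
console.log(fib(40)); // 102334155
```

The `Map` cache never evicts entries. If the argument space is large, bound it, for example with an LRU policy.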
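
A common mitigation for the **GC pauses** source is an object pool, which reuses objects instead of allocating fresh ones every frame. The snippet below is a minimal sketch. `ObjectPool` and its `create`/`reset` callbacks are hypothetical names, not a library API, and the 2D-vector use case is only illustrative.

```javascript
// Minimal object pool: hand out recycled objects to cut per-frame
// allocations, which lowers allocation pressure on the garbage collector.
class ObjectPool {
  constructor(create, reset, initialSize = 0) {
    this.create = create; // factory for brand-new objects
    this.reset = reset;   // restores an object to a clean state on release
    this.free = [];
    for (let i = 0; i < initialSize; i++) this.free.push(create());
  }

  // Reuse a free object if one exists; otherwise allocate a new one.
  acquire() {
    return this.free.length > 0 ? this.free.pop() : this.create();
  }

  // Reset the object and return it to the free list.
  release(obj) {
    this.reset(obj);
    this.free.push(obj);
  }

  get available() {
    return this.free.length;
  }
}

// Usage: a pool of 2D vectors pre-filled with 4 entries.
const vecPool = new ObjectPool(() => ({ x: 0, y: 0 }), v => { v.x = 0; v.y = 0; }, 4);
const v = vecPool.acquire();
v.x = 3; v.y = 4;
vecPool.release(v); // v is reset to { x: 0, y: 0 } and can be handed out again
```

The previous owner must not use an object after releasing it, and this pool never shrinks. Check both trade-offs against a heap profile before adopting pooling.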
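## 💻 Patterns

### Detect long task

```javascript
// Long Tasks API: reports main-thread tasks longer than 50 ms (browser support varies).
const obs = new PerformanceObserver(list => {
  for (const e of list.getEntries()) {
    // Entries already exceed 50 ms by definition; the check keeps the threshold explicit.
    if (e.duration > 50) console.warn('long task', e.duration, e.name);
  }
});
obs.observe({ entryTypes: ['longtask'] });
```

### Move work to Worker

```javascript
// main.js
// bigArray: a typed array (e.g. Float32Array); render() is app-specific.
const w = new Worker('worker.js', { type: 'module' });
w.onmessage = e => render(e.data); // register the handler before sending
// Transfer the underlying ArrayBuffer (zero-copy); bigArray is detached afterwards.
w.postMessage({ data: bigArray }, [bigArray.buffer]);

// worker.js
// heavyCompute() must return a typed array so its buffer can be transferred back.
self.onmessage = e => {
  const result = heavyCompute(e.data.data);
  self.postMessage(result, [result.buffer]);
};
```

### WASM SIMD hot loop (Rust)

```rust
// Compiles only for wasm32 with the simd128 feature; callers need an unsafe block.
#[cfg(target_arch = "wasm32")]
#[target_feature(enable = "simd128")]
unsafe fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::wasm32::*;
    assert_eq!(a.len(), b.len(), "slices must have equal length");
    let n = a.len() - a.len() % 4; // largest multiple of 4: the SIMD-processed prefix
    let mut sum = f32x4_splat(0.0);
    for i in (0..n).step_by(4) {
        // v128_load does an unaligned load; i + 4 <= n keeps both reads in bounds.
        let va = v128_load(a.as_ptr().add(i) as *const v128);
        let vb = v128_load(b.as_ptr().add(i) as *const v128);
        sum = f32x4_add(sum, f32x4_mul(va, vb));
    }
    let lanes = f32x4_extract_lane::<0>(sum) + f32x4_extract_lane::<1>(sum)
        + f32x4_extract_lane::<2>(sum) + f32x4_extract_lane::<3>(sum);
    // Scalar tail for the remaining 0..=3 elements.
    lanes + a[n..].iter().zip(&b[n..]).map(|(x, y)| x * y).sum::<f32>()
}
```

### Time-sliced loop (yield to event loop)

```javascript
// Yield so input handling and rendering can run between chunks.
// Prefers scheduler.yield() (2025+) where available; otherwise falls back to setTimeout(0).
function yieldToMain() {
  if ('scheduler' in globalThis && 'yield' in globalThis.scheduler) {
    return globalThis.scheduler.yield();
  }
  return new Promise(resolve => setTimeout(resolve, 0));
}

async function processChunked(items, processOne) {
  const CHUNK = 200; // tune so one chunk stays well under the frame budget
  for (let i = 0; i < items.length; i += CHUNK) {
    items.slice(i, i + CHUNK).forEach(processOne);
    await yieldToMain();
  }
}
```

### Batch DOM read/write

```javascript
// items: an array of elements, e.g. Array.from(document.querySelectorAll('.item'))

// Anti-pattern: layout thrash. Each offsetWidth read after a style write forces a synchronous layout.
items.forEach(el => { const w = el.offsetWidth; el.style.width = (w * 2) + 'px'; });

// Fix: do all reads first, then all writes.
const widths = items.map(el => el.offsetWidth);
items.forEach((el, i) => { el.style.width = (widths[i] * 2) + 'px'; });
```

### Linux perf hot function

```bash
# Sample one process at 99 Hz with call graphs for 10 s (pidof -s returns a single PID).
sudo perf record -F 99 -g -p "$(pidof -s myapp)" -- sleep 10
# Top of the text report: the hottest functions and call chains.
sudo perf report --stdio | head -40
# Flame graph; stackcollapse-perf.pl and flamegraph.pl come from Brendan Gregg's FlameGraph repo and must be on PATH.
sudo perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg
```

## Decision Criteria

| Situation | Approach |
|---|---|
| Long JS function | Web Worker / time-slicing |
| Image/video pipeline | OffscreenCanvas |
| Number crunching | WASM SIMD / GPU compute |
| Layout thrash | Read-then-write batching |
| GC pressure | Object pool |
| Multi-core unused | Worker pool / parallelism |

**Default**: measure → identify the hot function → move it off the main thread or fix the algorithm.

## 🔗 Graph

- Parent: [[Analyze runtime performance]] · [[Flame_Graphs]]
- Variant: [[Draw Call]]
- Applications: [[Tree Shaking (번들 크기 최적화)]] · [[Frustum Culling]]
- Adjacent: [[Memory Management]] · [[Branch Prediction]]

## 🤖 Using LLMs

**When**: interpreting profile flame graphs, suggesting hot-function refactors, annotating `perf` output.

**When not**: actual performance measurement; deterministic tools are the accurate source.

## ❌ Anti-patterns

- **Premature optimization**: guessing without a profile leads to fixing the wrong part.
- **Worker overuse**: for small tasks, the `postMessage` overhead outweighs the gain.
- **`while(true)` busy-wait**: use throttling or `requestIdleCallback` instead.
- **Synchronous XHR**: deprecated on the main thread because it blocks it.

## 🧪 Verification / Duplicates

- Verified against: Chrome Performance docs; web.dev Long Tasks; Linux perf-tools (Brendan Gregg).
- Confidence: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: diagnosis + Worker/SIMD/yield patterns |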