2nd/10_Wiki/Topics/DevOps_and_Security/CPU Bottleneck.md
2026-05-10 22:08:15 +09:00

id: wiki-2026-0508-cpu-bottleneck
title: CPU Bottleneck
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: CPU-Bound, Compute Bottleneck
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: performance, profiling, cpu
raw_sources:
last_reinforced: 2026-05-10
github_commit: applied
tech_stack: language: C++/Rust/JS; framework: perf/Instruments/Chrome DevTools

CPU Bottleneck

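One-line summary

"If the GPU is idle while the main thread sits at 100%, you have a CPU bottleneck." A CPU bottleneck is the state in which main-thread work does not finish within the frame budget: 16.7 ms at 60 fps, or about 11.1 ms at 90 fps (XR). Diagnosis as of 2026: Chrome Performance panel, perf, and Instruments. Fixes: Web Workers, WASM SIMD, moving work off the main thread, and batching.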

Key points

Diagnostic signals

  • GPU utilization stays below 70% while FPS drops.
  • Long Tasks (> 50 ms) show up in the Performance panel.
  • A single function dominates the perf top output.
  • Self time in the profile is concentrated in one function.
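The last signal, self time concentrated in one function, can be checked mechanically once a profile has been exported. The following is a minimal sketch, assuming the profile has already been reduced to a list of `{ name, selfMs }` samples; that shape and the function name `hottestFunction` are hypothetical and not tied to any particular profiler's export format.

```javascript
// Sum self time per function and report the hottest one with its share of the total.
// samples: [{ name: string, selfMs: number }, ...]  (hypothetical shape)
function hottestFunction(samples) {
  const totals = new Map();
  let total = 0;
  for (const { name, selfMs } of samples) {
    totals.set(name, (totals.get(name) ?? 0) + selfMs);
    total += selfMs;
  }
  let best = null;
  for (const [name, selfMs] of totals) {
    if (best === null || selfMs > best.selfMs) best = { name, selfMs };
  }
  if (best === null) return null; // empty profile
  return { ...best, share: total > 0 ? best.selfMs / total : 0 };
}

const hot = hottestFunction([
  { name: 'parse', selfMs: 10 },
  { name: 'layout', selfMs: 5 },
  { name: 'parse', selfMs: 25 },
  { name: 'gc', selfMs: 10 },
]);
console.log(hot.name, hot.share); // parse 0.7
```

A share well above that of every other function suggests a single hot spot worth optimizing first; a flat distribution points toward structural fixes such as batching or moving work off the main thread.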

Bottleneck sources

  • Main-thread JS: parse, layout, large loops.
  • Layout thrash: interleaved DOM reads and writes (read-write-read).
  • GC pause: allocation pressure.
  • Synchronous IO: blocking syscalls.
  • Unoptimized algorithm: O(n²) on the hot path.
  • Single-core saturation: no parallelism.
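To illustrate the "O(n²) on the hot path" source, here is a small sketch that computes the smallest gap between any two numbers in two ways. The problem and the function names are chosen purely for illustration; the point is that sorting first replaces the pairwise comparison with a single pass over adjacent elements.

```javascript
// O(n^2): compares every pair of values.
function minGapQuadratic(values) {
  let best = Infinity;
  for (let i = 0; i < values.length; i++) {
    for (let j = i + 1; j < values.length; j++) {
      best = Math.min(best, Math.abs(values[i] - values[j]));
    }
  }
  return best;
}

// O(n log n): after sorting, the closest pair must be adjacent.
function minGapSorted(values) {
  const sorted = [...values].sort((a, b) => a - b); // copy so the input is not mutated
  let best = Infinity;
  for (let i = 1; i < sorted.length; i++) {
    best = Math.min(best, sorted[i] - sorted[i - 1]);
  }
  return best;
}

console.log(minGapQuadratic([10, 3, 7, 15, 8])); // 1
console.log(minGapSorted([10, 3, 7, 15, 8]));    // 1
```

Both functions return `Infinity` for inputs with fewer than two values. For small arrays the quadratic version may well be fast enough, which is another reason to profile before rewriting.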

Fix strategy

  1. Profile first: measure, don't guess.
  2. Off-main-thread: WebWorker, OffscreenCanvas.
  3. Batch: requestAnimationFrame, microtasks.
  4. SIMD/WASM: for hot inner loops.
  5. Algorithmic: O(n²) → O(n log n).
  6. Cache: memoize, weak references.
  7. Lazy: defer work, code-split.
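Step 6 (cache with memoization and weak references) can be sketched with a `WeakMap`, which lets cached results be garbage-collected together with their key objects. The helper name `memoizeByObject` is hypothetical, and the sketch assumes the wrapped function takes a single object argument that is not mutated after the first call.

```javascript
// Memoize a one-argument function whose argument is an object.
// WeakMap keys do not keep the object alive, so the cache cannot leak it.
function memoizeByObject(fn) {
  const cache = new WeakMap();
  return function (obj) {
    if (cache.has(obj)) return cache.get(obj);
    const result = fn(obj);
    cache.set(obj, result);
    return result;
  };
}

let calls = 0;
const area = memoizeByObject(rect => {
  calls++; // count how often the expensive function actually runs
  return rect.w * rect.h;
});

const rect = { w: 3, h: 4 };
console.log(area(rect), area(rect), calls); // 12 12 1
```

If the key object is mutated after the first call, the cached value goes stale, so this pattern fits immutable or rarely changing inputs best.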

💻 Patterns

Detect long task

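const obs = new PerformanceObserver(list => {
  for (const e of list.getEntries()) {
    // Only tasks longer than 50 ms are reported as longtask entries.
    console.warn('long task', e.duration, e.name);
  }
});
// buffered: true also delivers long tasks recorded before observe() was called.
obs.observe({ type: 'longtask', buffered: true });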

Move work to Worker

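// main.js
const w = new Worker('worker.js', { type: 'module' });
// bigArray is assumed to be a typed array. Listing its buffer as a transferable
// moves it to the worker without copying; bigArray is detached on this thread afterwards.
w.postMessage({ data: bigArray }, [bigArray.buffer]);
w.onmessage = e => render(e.data);

// worker.js
self.onmessage = e => {
  const result = heavyCompute(e.data.data);   // assumed to return a typed array
  self.postMessage(result, [result.buffer]);  // transfer the result back, also zero-copy
};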

WASM SIMD hot loop (Rust)

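#[target_feature(enable = "simd128")]
unsafe fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::wasm32::*;
    let n = a.len().min(b.len());
    let simd_end = n - n % 4; // largest multiple of 4 that fits in both slices
    let mut sum = f32x4_splat(0.0);
    for i in (0..simd_end).step_by(4) {
        // v128_load accepts unaligned pointers
        let va = v128_load(a.as_ptr().add(i) as *const v128);
        let vb = v128_load(b.as_ptr().add(i) as *const v128);
        sum = f32x4_add(sum, f32x4_mul(va, vb));
    }
    // Horizontal sum of the four lanes
    let mut total = f32x4_extract_lane::<0>(sum) + f32x4_extract_lane::<1>(sum)
        + f32x4_extract_lane::<2>(sum) + f32x4_extract_lane::<3>(sum);
    // Scalar tail for the remaining 0 to 3 elements (avoids reading past the end)
    for i in simd_end..n {
        total += a[i] * b[i];
    }
    total
}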

Time-sliced loop (yield to event loop)

// Prefer scheduler.yield() where the browser supports it; fall back to setTimeout.
const yieldToMain = () =>
  globalThis.scheduler?.yield
    ? globalThis.scheduler.yield()
    : new Promise(r => setTimeout(r, 0));

async function processChunked(items) {
  const CHUNK = 200;
  for (let i = 0; i < items.length; i += CHUNK) {
    items.slice(i, i + CHUNK).forEach(processOne);
    await yieldToMain();   // yield so input handling and rendering can run between chunks
  }
}

Batch DOM read/write

// Anti-pattern: layout thrash (each read forces a layout after the previous write)
items.forEach(el => { const w = el.offsetWidth; el.style.width = (w*2)+'px'; });
// Fix: do all reads first, then all writes (items is assumed to be an array of elements)
const widths = items.map(el => el.offsetWidth);
items.forEach((el, i) => { el.style.width = (widths[i]*2)+'px'; });

Linux perf hot function

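# Sample the running process at 99 Hz with call graphs for 10 seconds
sudo perf record -F 99 -g -p $(pidof myapp) -- sleep 10
# Text summary of the hottest call paths
sudo perf report --stdio | head -40
# Flame graph; stackcollapse-perf.pl and flamegraph.pl are Brendan Gregg's FlameGraph scripts
sudo perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg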

Decision criteria

| Situation | Approach |
| --- | --- |
| Long JS function | WebWorker / time-slice |
| Image/video pipeline | OffscreenCanvas |
| Number crunching | WASM SIMD / GPU compute |
| Layout thrash | Read-then-write batch |
| GC pressure | Object pool |
| Multi-core unused | Worker pool / parallel |

Default: measure → identify the hot function → move it off the main thread or fix the algorithm.
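For the GC-pressure case, an object pool reuses short-lived objects instead of allocating a new one on every frame. This is a minimal sketch; the class name `Vec2Pool` and its `acquire`/`release` methods are hypothetical, and it assumes callers stop using an object once they have released it.

```javascript
// A tiny pool of {x, y} objects. Reusing them keeps per-frame allocations,
// and therefore GC work, close to zero once the pool has warmed up.
class Vec2Pool {
  constructor() {
    this.free = []; // released objects waiting to be reused
  }
  acquire(x = 0, y = 0) {
    const v = this.free.pop() ?? { x: 0, y: 0 }; // allocate only when the pool is empty
    v.x = x;
    v.y = y;
    return v;
  }
  release(v) {
    this.free.push(v);
  }
}

const pool = new Vec2Pool();
const a = pool.acquire(1, 2);
pool.release(a);
const b = pool.acquire(5, 6);
console.log(a === b, b.x, b.y); // true 5 6
```

Pools add bookkeeping and make use-after-release bugs possible, so they tend to pay off only where the profile actually shows GC pauses.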

🔗 Graph

🤖 LLM usage

When to use: interpreting profile flame graphs, suggesting hot-function refactors, annotating perf output. When not to use: actual performance measurement; deterministic tools are the accurate source for that.

Anti-patterns

  • Premature optimization: guessing without a profile leads to fixing the wrong part.
  • Worker overuse: for small tasks, postMessage overhead outweighs the gain.
  • while(true) busy-wait: use throttling or requestIdleCallback instead.
  • Synchronous XHR: deprecated on the main thread, and it blocks that thread.
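As an alternative to the busy-wait anti-pattern, a throttle lets a function run at most once per interval and simply drops the calls in between. The sketch below is one possible shape, not a reference implementation; the `now` parameter is an assumption added here so the clock can be replaced in tests.

```javascript
// Run fn at most once every intervalMs; calls arriving earlier are dropped.
// now is injectable so the behaviour can be checked without real timers.
function throttle(fn, intervalMs, now = Date.now) {
  let last = -Infinity; // time of the last accepted call
  return (...args) => {
    const t = now();
    if (t - last >= intervalMs) {
      last = t;
      fn(...args);
    }
  };
}

// Usage with a fake clock
let clock = 0;
let runs = 0;
const tick = throttle(() => { runs++; }, 100, () => clock);
tick();              // accepted (first call)
clock = 50;  tick(); // dropped (only 50 ms since the last accepted call)
clock = 100; tick(); // accepted
console.log(runs);   // 2
```

In a browser, the throttled function would typically be attached to a scroll or resize handler, while work that can wait for idle time fits requestIdleCallback better.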

🧪 Verification / duplicates

  • Verified: Chrome Performance docs; web.dev Long Tasks; Linux perf-tools (Brendan Gregg).
  • Trust level: A.

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: diagnosis + Worker/SIMD/yield patterns |