---
id: wiki-2026-0508-multi-threaded-architecture
title: Multi-threaded Architecture
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Multithreading, Concurrent Architecture, MT Architecture]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [architecture, concurrency, threading, performance]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: cpp
  framework: stdthread-tbb-rayon
---

# Multi-threaded Architecture

## One-line summary

> **"Distribute work across multiple threads to gain throughput and responsiveness at the same time."**

Multi-threaded design grew out of the 1990s SMP era. As of 2026 the mainstream is manycore CPUs (Apple M4 Max 16-core, AMD Threadripper 96-core), GPU offload, and the async/await coroutine model. Game engines, servers, ML inference, and browser engines all treat multi-threaded design as the default.

## Core concepts

### Thread model types

- **OS thread (1:1)**: pthread, std::thread. Kernel-scheduled, with expensive context switches.
- **Green thread / fiber**: Go goroutine, Java 21 virtual thread. Userland scheduler, M:N mapping.
- **Coroutine / async task**: C++20 coroutine, Rust async, Kotlin coroutine. Stackless, await-resume.
- **Task-based**: Intel TBB, .NET TPL, Apple GCD. The runtime schedules tasks onto a worker pool (work-stealing in TBB and TPL), so there is no manual thread management.

### Architectural patterns

- **Producer-consumer**: a bounded queue provides backpressure.
- **Pipeline**: one thread per stage, connected by ring buffers (LMAX Disruptor).
- **Fork-join**: divide and conquer, work-stealing.
- **Actor**: message passing (Akka, Erlang, Pony), no shared state.
- **Data parallelism**: SIMD + thread pool, e.g. Rayon `par_iter()`, OpenMP `#pragma omp parallel for`.

### Applications

1. Game engine: render thread + game thread + audio thread + IO thread.
2. Browser: process-per-tab + GPU process + utility processes.
3. Database: connection pool + worker threads + background flush.
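
The producer-consumer pattern listed under the architectural patterns can be sketched as a bounded blocking queue: when the queue is full, `push` blocks, and that blocking is the backpressure on the producer. This is a minimal sketch, assuming C++17 and a single mutex with two condition variables. `BoundedQueue` and `sum_through_queue` are hypothetical names made up for illustration, not part of any library.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical illustration type: a mutex-based bounded blocking queue.
template <typename T>
class BoundedQueue {
    std::queue<T> items;
    std::mutex mtx;
    std::condition_variable not_full, not_empty;
    const std::size_t capacity;
    bool closed = false;

public:
    explicit BoundedQueue(std::size_t cap) : capacity(cap) { assert(cap > 0); }

    // Blocks while the queue is full (backpressure on the producer).
    // Returns false if the queue has been closed.
    bool push(T v) {
        std::unique_lock lk(mtx);
        not_full.wait(lk, [&] { return closed || items.size() < capacity; });
        if (closed) return false;
        items.push(std::move(v));
        lk.unlock();
        not_empty.notify_one();
        return true;
    }

    // Blocks while the queue is empty.
    // Returns std::nullopt once the queue is closed and fully drained.
    std::optional<T> pop() {
        std::unique_lock lk(mtx);
        not_empty.wait(lk, [&] { return closed || !items.empty(); });
        if (items.empty()) return std::nullopt;
        T v = std::move(items.front());
        items.pop();
        lk.unlock();
        not_full.notify_one();
        return v;
    }

    // Marks the queue closed and wakes every waiting thread.
    void close() {
        { std::lock_guard lk(mtx); closed = true; }
        not_full.notify_all();
        not_empty.notify_all();
    }
};

// Demo: one producer pushes 1..n through a queue of capacity 4
// while one consumer thread sums whatever it receives.
long long sum_through_queue(int n) {
    BoundedQueue<int> q(4);
    long long total = 0;
    std::thread consumer([&] {
        while (auto v = q.pop()) total += *v;
    });
    for (int i = 1; i <= n; ++i) q.push(i);
    q.close();
    consumer.join();  // total is only read after the consumer has finished
    return total;
}
```

Calling `sum_through_queue(100)` returns 5050 however the two threads interleave, because the producer can never run more than four items ahead of the consumer. The capacity of 4 is arbitrary; a real system would size it from the latency and memory budget.
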
## 💻 Patterns

### Thread pool with work queue (C++20)

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <stop_token>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
    std::vector<std::jthread> workers;
    std::queue<std::function<void()>> tasks;
    std::mutex mtx;
    std::condition_variable cv;
    bool stop = false;

public:
    explicit ThreadPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers.emplace_back([this](std::stop_token st) {
                while (!st.stop_requested()) {
                    std::function<void()> task;
                    {
                        std::unique_lock lk(mtx);
                        cv.wait(lk, [&]{ return stop || !tasks.empty(); });
                        if (stop && tasks.empty()) return;  // drained, shut down
                        task = std::move(tasks.front());
                        tasks.pop();
                    }
                    task();  // run outside the lock
                }
            });
    }

    // Without this, workers stay blocked in cv.wait() and would outlive
    // mtx/cv, which are destroyed before `workers` (reverse declaration order).
    ~ThreadPool() {
        { std::lock_guard lk(mtx); stop = true; }
        cv.notify_all();
        for (auto& w : workers) w.join();
    }

    template <class F>
    void submit(F&& f) {
        { std::lock_guard lk(mtx); tasks.emplace(std::forward<F>(f)); }
        cv.notify_one();
    }
};
```

### Rust Rayon data parallelism

```rust
use rayon::prelude::*;

// `Item`, `Output`, and `expensive_compute` are application-defined placeholders.
fn process_batch(items: &[Item]) -> Vec<Output> {
    items.par_iter()
        .filter(|i| i.valid())
        .map(|i| expensive_compute(i))
        .collect()
}
// Rayon work-steals across all cores automatically.
```

### Go goroutine + channel (fan-out / fan-in)

```go
import (
	"runtime"
	"sync"
)

// Job, Result, and process are application-defined.
func pipeline(input <-chan Job) <-chan Result {
	out := make(chan Result, 100)
	var wg sync.WaitGroup
	// Fan-out: one worker per CPU, all reading from the same input channel.
	for i := 0; i < runtime.NumCPU(); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range input {
				out <- process(job)
			}
		}()
	}
	// Fan-in: close the output once every worker has finished.
	go func() { wg.Wait(); close(out) }()
	return out
}
```

### Lock-free SPSC ring buffer

```cpp
#include <atomic>
#include <cstddef>
#include <utility>

// Single producer, single consumer. Holds at most N - 1 elements.
template <typename T, std::size_t N>
class SPSCQueue {
    alignas(64) std::atomic<std::size_t> head{0};  // advanced by the consumer
    alignas(64) std::atomic<std::size_t> tail{0};  // advanced by the producer
    T buffer[N];

public:
    bool push(T v) {
        auto t = tail.load(std::memory_order_relaxed);
        auto next = (t + 1) % N;
        if (next == head.load(std::memory_order_acquire)) return false;  // full
        buffer[t] = std::move(v);
        tail.store(next, std::memory_order_release);
        return true;
    }

    bool pop(T& out) {
        auto h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire)) return false;  // empty
        out = std::move(buffer[h]);
        head.store((h + 1) % N, std::memory_order_release);
        return true;
    }
};
```

### Game engine 3-thread architecture

```cpp
#include <atomic>
#include <cstdint>
#include <semaphore>

// Main thread:   input + game logic
// Render thread: GPU command buffer
// IO thread:     asset streaming
struct FrameSync {
    std::atomic<std::uint64_t> game_frame{0};
    std::atomic<std::uint64_t> render_frame{0};
    std::counting_semaphore<2> render_ready{0};
};
// Double-buffer the scene state so that game tick N+1 can run in parallel
// with the rendering of frame N.
```

## Decision criteria

| Situation | Approach |
|---|---|
| CPU-bound, divisible work | Rayon / OpenMP / TBB |
| IO-heavy (10k+ connections) | async/await (Tokio, asyncio, Node) |
| Real-time game loop | dedicated threads + lock-free queue |
| Mixed workload | task-based (TBB, GCD) |
| Simple parallel-for | thread pool + work queue |
| Distributed across machines | actor (Akka) or message queue |

**Default**: a task-based scheduler (TBB/Rayon/Tokio), which avoids manual thread management.

## 🔗 Graph

- Parent: [[Concurrent_Rendering]] · [[Distributed-Systems|Distributed_Computing]]
- Variants: [[Fiber_Architecture]]
- Applications: [[Game_Loop]] · [[V8 엔진 힙 아키텍처|V8 Heap Architecture]] · [[Browser]]
- Adjacent: [[SharedArrayBuffer_보안_이슈와_Cross-Origin_Isolation]] · [[Memory_Leaks]]

## 🤖 LLM usage

**When to use**: throughput-critical workloads, multi-core utilization, real-time games/servers, ML inference batching.

**When not to use**: simple sequential scripts, IO-light short-lived tasks, single-core embedded targets. In these cases the threading overhead outweighs the benefit.

## ❌ Anti-patterns

- **Shared mutable state without sync**: data races and undefined behavior.
- **Coarse global lock**: can end up slower than single-threaded code because of lock contention.
- **Thread per request (10k+)**: stack memory blows up; use async or a thread pool.
- **Busy-wait spin**: burns 100% CPU; use a condition variable or semaphore.
- **False sharing**: different atomics on the same cache line; pad with `alignas(64)`.

## 🧪 Verification / duplicates

- Verified (Herb Sutter "The Free Lunch Is Over" 2005, Intel TBB docs, Rust async book 2026).
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full content (thread models, patterns, decision matrix) |