---
id: wiki-2026-0508-multi-threaded-architecture
title: Multi-threaded Architecture
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Multithreading, Concurrent Architecture, MT Architecture]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [architecture, concurrency, threading, performance]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: cpp
  framework: stdthread-tbb-rayon
---

# Multi-threaded Architecture

## One-line summary
> **"Distribute work across multiple threads to gain throughput and responsiveness at the same time."** The approach dates from the 1990s SMP era. As of 2026 the mainstream is manycore CPUs (Apple M4 Max with 16 cores, AMD Threadripper with 96 cores), GPU offload, and async/await coroutine models. Game engines, servers, ML inference, and browser engines all default to multi-threaded designs.

## Core concepts

### Thread models
- **OS thread (1:1)**: pthread, std::thread — kernel-scheduled, expensive context switch.
- **Green thread / fiber**: Go goroutine, Java 21 virtual thread — userland scheduler, M:N mapping.
- **Coroutine / async task**: C++20 coroutine, Rust async, Kotlin coroutine — stackless, await-resume.
- **Task-based**: Intel TBB, .NET TPL, Apple GCD. A scheduler (work-stealing in TBB and TPL) maps tasks onto a thread pool, so application code does not manage threads directly.
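
The difference between owning an OS thread and submitting a task can be sketched with the standard library alone. This is a minimal illustration, not a recommendation: `sum_range`, `sum_with_thread`, and `sum_with_task` are hypothetical names, and `std::async` only stands in for a task-style API here. It is not a work-stealing scheduler like TBB.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical workload used only for illustration.
long sum_range(const std::vector<int>& v) {
  return std::accumulate(v.begin(), v.end(), 0L);
}

// 1:1 OS thread: the caller creates the thread and must join it.
long sum_with_thread(const std::vector<int>& v) {
  long result = 0;
  std::thread t([&] { result = sum_range(v); });
  t.join();  // result is safe to read only after join()
  return result;
}

// Task style: the caller asks for a value and lets the library run the work.
long sum_with_task(const std::vector<int>& v) {
  std::future<long> f =
      std::async(std::launch::async, [&v] { return sum_range(v); });
  assert(f.valid());  // a future returned by std::async refers to shared state
  return f.get();     // blocks until the task has finished
}
```

Both functions return the same value. The task version has no explicit `join()` and no shared `result` variable to get wrong.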

### Architectural patterns
- **Producer-consumer**: a bounded queue provides backpressure.
- **Pipeline**: one thread per stage, connected by ring buffers (LMAX Disruptor).
- **Fork-join**: divide & conquer, work-stealing.
- **Actor**: message passing (Akka, Erlang, Pony) with no shared state.
- **Data parallelism**: SIMD + thread pool — Rayon `par_iter()`, OpenMP `#pragma omp parallel for`.
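
The producer-consumer entry above can be sketched as a bounded queue whose `push()` blocks while the queue is full, which is where the backpressure comes from. `BoundedQueue` and `sum_through_queue` are hypothetical names for illustration, and the sketch assumes a capacity greater than zero.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>

template <typename T>
class BoundedQueue {
  std::queue<T> items;
  std::mutex mtx;
  std::condition_variable not_full, not_empty;
  const std::size_t capacity;
  bool closed = false;
public:
  explicit BoundedQueue(std::size_t cap) : capacity(cap) { assert(cap > 0); }

  // Blocks while the queue is full; returns false if the queue was closed.
  bool push(T v) {
    std::unique_lock lk(mtx);
    not_full.wait(lk, [&] { return closed || items.size() < capacity; });
    if (closed) return false;
    items.push(std::move(v));
    not_empty.notify_one();
    return true;
  }

  // Blocks while the queue is empty; returns nullopt once closed and drained.
  std::optional<T> pop() {
    std::unique_lock lk(mtx);
    not_empty.wait(lk, [&] { return closed || !items.empty(); });
    if (items.empty()) return std::nullopt;
    T v = std::move(items.front());
    items.pop();
    not_full.notify_one();
    return v;
  }

  // Wakes all waiters; pending items can still be popped.
  void close() {
    { std::lock_guard lk(mtx); closed = true; }
    not_full.notify_all();
    not_empty.notify_all();
  }
};

// Usage sketch: one producer, one consumer, capacity 2.
long sum_through_queue(int n) {
  BoundedQueue<int> q(2);
  long total = 0;
  std::thread consumer([&] {
    while (auto v = q.pop()) total += *v;
  });
  for (int i = 1; i <= n; ++i) q.push(i);  // blocks whenever 2 items are pending
  q.close();
  consumer.join();  // total is safe to read after join()
  return total;
}
```

A small capacity keeps a fast producer from running ahead of a slow consumer. The cost is that the producer spends time blocked.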

### Applications
1. Game engine — render thread + game thread + audio thread + IO thread.
2. Browser — process-per-tab + GPU process + utility processes.
3. Database — connection pool + worker threads + background flush.

## 💻 Patterns

### Thread pool with work queue (C++20)
```cpp
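#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
  std::queue<std::function<void()>> tasks;
  std::mutex mtx;
  std::condition_variable cv;
  bool stop = false;
  // Declared last so the workers are joined before the members above are destroyed.
  std::vector<std::jthread> workers;
public:
  explicit ThreadPool(std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
      workers.emplace_back([this] {
        for (;;) {
          std::function<void()> task;
          {
            std::unique_lock lk(mtx);
            // Sleep until there is work or shutdown has been requested.
            cv.wait(lk, [&]{ return stop || !tasks.empty(); });
            if (stop && tasks.empty()) return;  // drain the queue before exiting
            task = std::move(tasks.front()); tasks.pop();
          }
          task();  // run outside the lock
        }
      });
  }
  // Wake every worker for shutdown; the jthread destructors then join them.
  ~ThreadPool() {
    { std::lock_guard lk(mtx); stop = true; }
    cv.notify_all();
  }
  template<class F> void submit(F&& f) {
    { std::lock_guard lk(mtx); tasks.emplace(std::forward<F>(f)); }
    cv.notify_one();
  }
};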
```

### Rust Rayon data parallelism
```rust
use rayon::prelude::*;

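// `Item`, `Output`, and `expensive_compute` are placeholders for caller-defined
// types and functions. Rayon requires `Item: Sync` and `Output: Send`.
fn process_batch(items: &[Item]) -> Vec<Output> {
    items.par_iter()
        .filter(|i| i.valid())
        .map(|i| expensive_compute(i))
        .collect()
}
// Rayon's work-stealing pool spreads the iterator across the available cores.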
```

### Go goroutine + channel (fan-out / fan-in)
```go
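import (
    "runtime"
    "sync"
)

// Job, Result, and process are assumed to be defined by the caller.
// pipeline fans jobs out to one worker per CPU and fans results into one channel.
func pipeline(input <-chan Job) <-chan Result {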
    out := make(chan Result, 100)
    var wg sync.WaitGroup
    for i := 0; i < runtime.NumCPU(); i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for job := range input {
                out <- process(job)
            }
        }()
    }
    go func() { wg.Wait(); close(out) }()
    return out
}
```

### Lock-free SPSC ring buffer
```cpp
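#include <atomic>
#include <cstddef>
#include <utility>

// Single-producer / single-consumer ring buffer; holds at most N - 1 items.
template<typename T, std::size_t N>
class SPSCQueue {
  alignas(64) std::atomic<std::size_t> head{0};  // advanced only by the consumer
  alignas(64) std::atomic<std::size_t> tail{0};  // advanced only by the producer
  T buffer[N];
public:
  bool push(T v) {  // producer thread only
    auto t = tail.load(std::memory_order_relaxed);
    auto next = (t + 1) % N;
    if (next == head.load(std::memory_order_acquire)) return false;  // full
    buffer[t] = std::move(v);
    tail.store(next, std::memory_order_release);  // publish the slot
    return true;
  }
  bool pop(T& out) {  // consumer thread only
    auto h = head.load(std::memory_order_relaxed);
    if (h == tail.load(std::memory_order_acquire)) return false;  // empty
    out = std::move(buffer[h]);
    head.store((h + 1) % N, std::memory_order_release);  // free the slot
    return true;
  }
};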
```

### Game engine 3-thread architecture
```cpp
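#include <atomic>
#include <cstdint>
#include <semaphore>

// Main thread: input + game logic
// Render thread: GPU command buffer
// IO thread: asset streaming
struct FrameSync {
  std::atomic<std::uint64_t> game_frame{0};    // game-thread tick counter
  std::atomic<std::uint64_t> render_frame{0};  // render-thread frame counter
  std::counting_semaphore<2> render_ready{0};  // signalled when a frame is ready to render
};
// Double-buffer the scene state so game tick N+1 can run in parallel with rendering frame N.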
```

## Decision criteria
| Situation | Approach |
|---|---|
| CPU-bound, divisible work | Rayon / OpenMP / TBB |
| IO-heavy (10k+ connections) | async/await (Tokio, asyncio, Node) |
| Real-time game loop | dedicated threads + lock-free queue |
| Mixed workload | task-based (TBB, GCD) |
| Simple parallel-for | thread pool + work queue |
| Distributed across machines | actor (Akka) or message queue |

**Default**: a task-based scheduler (TBB/Rayon/Tokio), which avoids manual thread management.

## 🔗 Graph
- Parent: [[Concurrent_Rendering]] · [[Distributed-Systems|Distributed_Computing]]
- Variants: [[Fiber_Architecture]]
- Applications: [[Game_Loop]] · [[V8 엔진 힙 아키텍처|V8 Heap Architecture]] · [[Browser]]
- Adjacent: [[SharedArrayBuffer_보안_이슈와_Cross-Origin_Isolation]] · [[Memory_Leaks]]

## 🤖 LLM usage
**When to use**: throughput-critical workloads, multi-core utilization, real-time games and servers, ML inference batching.
**When not to use**: simple sequential scripts, IO-light short-lived tasks, single-core embedded targets. In these cases the threading overhead outweighs the gain.

## ❌ Anti-patterns
- **Shared mutable state without sync**: data races and undefined behavior.
- **Coarse global lock**: lock contention can make it slower than a single thread.
- **Thread per request (10k+)**: stack memory explodes; use async or a thread pool.
- **Busy-wait spin**: burns 100% CPU; use a condition variable or semaphore.
- **False sharing**: independent atomics on the same cache line; pad with `alignas(64)`.

## 🧪 Verification / Duplicates
- Verified (Herb Sutter "The Free Lunch Is Over" 2005, Intel TBB docs, Rust async book 2026).
- Trust level: A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — full content (thread models, patterns, decision matrix) |