---
id: wiki-2026-0508-multi-threaded-architecture
title: Multi-threaded Architecture
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Multithreading, Concurrent Architecture, MT Architecture]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [architecture, concurrency, threading, performance]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: cpp
  framework: stdthread-tbb-rayon
---
# Multi-threaded Architecture
## In one line
> **"Distribute work across multiple threads to gain throughput and responsiveness at the same time."** The approach dates from the 1990s SMP era; as of 2026 the mainstream is manycore CPUs (Apple M4 Max 16-core, AMD Threadripper 96-core), GPU offload, and async/await coroutine models. Game engines, servers, ML inference, and browser engines all default to multi-threaded designs.
## Core concepts
### Thread model types
- **OS thread (1:1)**: pthread, std::thread — kernel-scheduled, expensive context switch.
- **Green thread / fiber**: Go goroutine, Java 21 virtual thread — userland scheduler, M:N mapping.
- **Coroutine / async task**: C++20 coroutine, Rust async, Kotlin coroutine — stackless, await-resume.
- **Task-based**: Intel TBB, .NET TPL, Apple GCD — work-stealing scheduler, no thread management.
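
To make the difference between the first and last models concrete, here is a minimal sketch using only the C++ standard library. The helper names `sum_on_os_thread` and `sum_as_task` are hypothetical. `std::async` stands in for a task API only because it ships with the standard library; it is not a work-stealing scheduler like TBB or GCD.

```cpp
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sum a vector on a dedicated OS thread (1:1 model).
// The caller owns the thread's lifetime and must join it explicitly.
long long sum_on_os_thread(const std::vector<int>& data) {
    long long result = 0;
    std::thread worker([&] {
        result = std::accumulate(data.begin(), data.end(), 0LL);
    });
    worker.join();  // join() makes the worker's write to result visible here
    return result;
}

// Hypothetical helper: the same work expressed as a task.
// The future carries the result back, so no shared variable is needed.
long long sum_as_task(const std::vector<int>& data) {
    std::future<long long> f = std::async(std::launch::async, [&] {
        return std::accumulate(data.begin(), data.end(), 0LL);
    });
    return f.get();  // blocks until the task has finished
}
```

Both helpers return the same value; the task form simply moves thread creation, joining, and result passing behind the future.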
### Architectural patterns
- **Producer-consumer**: a bounded queue provides backpressure.
- **Pipeline**: one thread per stage, connected by ring buffers (LMAX Disruptor).
- **Fork-join**: divide & conquer, work-stealing.
- **Actor**: message passing (Akka, Erlang, Pony), no shared state.
- **Data parallelism**: SIMD + thread pool — Rayon `par_iter()`, OpenMP `#pragma omp parallel for`.
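
As an illustration of the producer-consumer pattern, the sketch below shows one way a bounded queue can apply backpressure: `push` blocks while the queue is full. The class name `BoundedQueue` and its `close()` method are assumptions made for this example, not an API from any of the libraries named above.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <utility>

// BoundedQueue is a hypothetical name; the capacity is fixed at construction.
template <typename T>
class BoundedQueue {
    std::queue<T> items_;
    std::mutex mtx_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::size_t capacity_;
    bool closed_ = false;

public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

    // Blocks while the queue is full: this is the backpressure on producers.
    // Returns false if the queue was closed before the item could be enqueued.
    bool push(T value) {
        std::unique_lock lk(mtx_);
        not_full_.wait(lk, [&] { return closed_ || items_.size() < capacity_; });
        if (closed_) return false;
        items_.push(std::move(value));
        lk.unlock();
        not_empty_.notify_one();
        return true;
    }

    // Blocks while the queue is empty. Returns std::nullopt once the queue
    // is closed and fully drained.
    std::optional<T> pop() {
        std::unique_lock lk(mtx_);
        not_empty_.wait(lk, [&] { return closed_ || !items_.empty(); });
        if (items_.empty()) return std::nullopt;
        T value = std::move(items_.front());
        items_.pop();
        lk.unlock();
        not_full_.notify_one();
        return value;
    }

    // Wakes all waiters; consumers drain what is left, then see nullopt.
    void close() {
        { std::lock_guard lk(mtx_); closed_ = true; }
        not_full_.notify_all();
        not_empty_.notify_all();
    }
};
```

A producer thread would call `push` in a loop and `close()` when done, while each consumer loops on `pop()` until it returns `std::nullopt`.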
### Applications
1. Game engine — render thread + game thread + audio thread + IO thread.
2. Browser — process-per-tab + GPU process + utility processes.
3. Database — connection pool + worker threads + background flush.
## 💻 Patterns
### Thread pool with work queue (C++20)
```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
    std::queue<std::function<void()>> tasks;
    std::mutex mtx;
    std::condition_variable cv;
    bool stop = false;
    // Declared last so the workers are joined before the members above are destroyed.
    std::vector<std::jthread> workers;

public:
    explicit ThreadPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers.emplace_back([this] {
                for (;;) {
                    std::function<void()> task;
                    {
                        std::unique_lock lk(mtx);
                        cv.wait(lk, [&] { return stop || !tasks.empty(); });
                        if (stop && tasks.empty()) return;  // drain the queue before exiting
                        task = std::move(tasks.front());
                        tasks.pop();
                    }
                    task();  // run outside the lock
                }
            });
    }

    // Signal shutdown and wake every worker; each jthread joins in its own destructor.
    ~ThreadPool() {
        { std::lock_guard lk(mtx); stop = true; }
        cv.notify_all();
    }

    template <class F>
    void submit(F&& f) {
        { std::lock_guard lk(mtx); tasks.emplace(std::forward<F>(f)); }
        cv.notify_one();
    }
};
```
### Rust Rayon data parallelism
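```rust
use rayon::prelude::*;

// Placeholder types and helpers (assumptions for illustration, not part of Rayon).
struct Item(u64);
struct Output(u64);
impl Item { fn valid(&self) -> bool { self.0 != 0 } }
fn expensive_compute(i: &Item) -> Output { Output(i.0 * i.0) }

fn process_batch(items: &[Item]) -> Vec<Output> {
    items.par_iter()
        .filter(|i| i.valid())
        .map(|i| expensive_compute(i))
        .collect()
}
// Runs on Rayon's global work-stealing pool (one thread per logical core by default).
```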
### Go goroutine + channel (fan-out / fan-in)
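```go
package pipeline

import (
	"runtime"
	"sync"
)

// Job, Result, and process are placeholders assumed for illustration.
type Job int
type Result int

func process(j Job) Result { return Result(j * 2) }

// pipeline fans work out to one goroutine per CPU and fans results back into out.
func pipeline(input <-chan Job) <-chan Result {
	out := make(chan Result, 100)
	var wg sync.WaitGroup
	for i := 0; i < runtime.NumCPU(); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range input {
				out <- process(job)
			}
		}()
	}
	// Close out only after every worker has finished sending.
	go func() { wg.Wait(); close(out) }()
	return out
}
```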
### Lock-free SPSC ring buffer
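```cpp
#include <atomic>
#include <cstddef>
#include <utility>

// One slot stays empty to tell full from empty, so the usable capacity is N - 1.
template <typename T, std::size_t N>
class SPSCQueue {
    alignas(64) std::atomic<std::size_t> head{0};  // advanced only by the consumer
    alignas(64) std::atomic<std::size_t> tail{0};  // advanced only by the producer
    T buffer[N];

public:
    // Producer side. Returns false when the queue is full.
    bool push(T v) {
        auto t = tail.load(std::memory_order_relaxed);
        auto next = (t + 1) % N;
        if (next == head.load(std::memory_order_acquire)) return false;
        buffer[t] = std::move(v);
        tail.store(next, std::memory_order_release);
        return true;
    }
    // Consumer side. Returns false when the queue is empty.
    bool pop(T& out) {
        auto h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire)) return false;
        out = std::move(buffer[h]);
        head.store((h + 1) % N, std::memory_order_release);
        return true;
    }
};
```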
### Game engine 3-thread architecture
```cpp
#include <atomic>
#include <cstdint>
#include <semaphore>

// Main thread:   input + game logic
// Render thread: GPU command buffer
// IO thread:     asset streaming
struct FrameSync {
    std::atomic<std::uint64_t> game_frame{0};
    std::atomic<std::uint64_t> render_frame{0};
    std::counting_semaphore<2> render_ready{0};
};
// Double-buffer the scene state so game tick N+1 can run in parallel with the render of frame N.
```
## Decision criteria
| Situation | Approach |
|---|---|
| CPU-bound, divisible work | Rayon / OpenMP / TBB |
| IO-heavy (10k+ connections) | async/await (Tokio, asyncio, Node) |
| Real-time game loop | dedicated threads + lock-free queue |
| Mixed workload | task-based (TBB, GCD) |
| Simple parallel-for | thread pool + work queue |
| Distributed across machines | actor (Akka) or message queue |
**Default**: a task-based scheduler (TBB/Rayon/Tokio), which avoids manual thread management.
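
For the "CPU-bound, divisible work" row, the fork-join idea can be sketched with the standard library alone. `parallel_sum` and `kCutoff` are hypothetical names, and the cutoff is an arbitrary illustrative value. Note that this version starts a new thread per fork, whereas TBB or Rayon would reuse a fixed pool of workers, so treat it as a shape of the algorithm rather than a tuned implementation.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// kCutoff is an arbitrary illustrative threshold, not a tuned value.
inline constexpr std::size_t kCutoff = 1024;

// Hypothetical helper: sums [first, last) by recursive splitting.
long long parallel_sum(const int* first, const int* last) {
    const std::size_t n = static_cast<std::size_t>(last - first);
    if (n <= kCutoff)  // small enough: solve sequentially
        return std::accumulate(first, last, 0LL);
    const int* mid = first + n / 2;
    // Fork: the left half runs as an asynchronous task...
    auto left = std::async(std::launch::async, parallel_sum, first, mid);
    // ...while the current thread handles the right half.
    const long long right = parallel_sum(mid, last);
    return left.get() + right;  // Join: wait for the forked half
}
```

A caller with a `std::vector<int> v` would invoke it as `parallel_sum(v.data(), v.data() + v.size())`.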
## 🔗 Graph
- Parent: [[Concurrent_Rendering]] · [[Distributed-Systems|Distributed_Computing]]
- Variants: [[Fiber_Architecture]]
- Applications: [[Game_Loop]] · [[V8 엔진 힙 아키텍처|V8 Heap Architecture]] · [[Browser]]
- Adjacent: [[SharedArrayBuffer_보안_이슈와_Cross-Origin_Isolation]] · [[Memory_Leaks]]
## 🤖 LLM usage
**When to use**: throughput-critical workloads, multi-core utilization, real-time games/servers, ML inference batching.
**When not to use**: simple sequential scripts, IO-light short-lived tasks, single-core embedded targets; there the threading overhead outweighs the gain.
## ❌ Anti-patterns
- **Shared mutable state without sync**: a data race, which is undefined behavior (UB) in C++.
- **Coarse global lock**: can end up slower than single-threaded code because of lock contention.
- **Thread per request (10k+)**: stack memory blows up; use async or a thread pool instead.
- **Busy-wait spin**: burns 100% of a CPU core; use a condition variable or semaphore.
- **False sharing**: separate atomics that sit on the same cache line; pad them with `alignas(64)`.
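
The false-sharing fix in the last item can be sketched as follows. The 64-byte line size is an assumption that holds on common x86-64 parts but not on every CPU, and `PaddedCounter` and `count_in_parallel` are hypothetical names.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>

// 64 bytes is an assumed cache-line size; other CPUs may differ.
inline constexpr std::size_t kCacheLine = 64;

// Each counter occupies its own cache line, so two threads updating
// neighbouring counters do not invalidate each other's line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};
static_assert(sizeof(PaddedCounter) == kCacheLine);
static_assert(alignof(PaddedCounter) == kCacheLine);

// Hypothetical helper: two threads each bump their own counter n times.
std::uint64_t count_in_parallel(std::uint64_t n) {
    PaddedCounter counters[2];
    auto work = [&](int idx) {
        for (std::uint64_t i = 0; i < n; ++i)
            counters[idx].value.fetch_add(1, std::memory_order_relaxed);
    };
    std::thread a(work, 0);
    std::thread b(work, 1);
    a.join();
    b.join();
    return counters[0].value.load() + counters[1].value.load();
}
```

Removing the `alignas` would leave the result unchanged but place both atomics on one line, which is the layout the anti-pattern warns about.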
## 🧪 Verification / duplicates
- Verified (Herb Sutter "The Free Lunch Is Over" 2005, Intel TBB docs, Rust async book 2026).
- Trust level: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — full content (thread models, patterns, decision matrix) |