---
id: wiki-2026-0508-multi-threaded-architecture
title: Multi-threaded Architecture
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Multithreading, Concurrent Architecture, MT Architecture]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [architecture, concurrency, threading, performance]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: cpp
  framework: stdthread-tbb-rayon
---

# Multi-threaded Architecture

## One-line summary

> **"Distribute work across multiple threads to gain throughput and responsiveness at the same time."** The approach dates from the 1990s SMP era; as of 2026 the mainstream is manycore CPUs (Apple M4 Max with 16 cores, AMD Threadripper with 96 cores), GPU offload, and async/await coroutine models. Game engines, servers, ML inference, and browser engines all default to multi-threaded designs.

## Core concepts

### Thread model types

- **OS thread (1:1)**: pthread, `std::thread`. Kernel-scheduled, with comparatively expensive context switches.
- **Green thread / fiber**: Go goroutines, Java 21 virtual threads. Userland scheduler with M:N mapping onto OS threads.
- **Coroutine / async task**: C++20 coroutines, Rust async, Kotlin coroutines. Stackless; they suspend at an await point and resume later.
- **Task-based**: Intel TBB, .NET TPL, Apple GCD. The runtime schedules tasks onto a pool (work-stealing in TBB and TPL), so there is no manual thread management.

### Architectural patterns

- **Producer-consumer**: a bounded queue provides backpressure.
- **Pipeline**: one thread per stage, connected by ring buffers (LMAX Disruptor).
- **Fork-join**: divide and conquer with work-stealing.
- **Actor**: message passing (Akka, Erlang, Pony) with no shared state.
- **Data parallelism**: SIMD plus a thread pool, e.g. Rayon `par_iter()` or OpenMP `#pragma omp parallel for`.
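
To make the producer-consumer entry concrete, here is a minimal sketch of a bounded queue whose `push()` blocks when the queue is full, which is what gives the producer backpressure. The names `BoundedQueue` and `produce_and_consume` are hypothetical and exist only for this illustration; a production queue would also need shutdown handling.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical BoundedQueue: push() blocks while full (backpressure),
// pop() blocks while empty.
template <typename T>
class BoundedQueue {
    std::queue<T> items_;
    std::mutex mtx_;
    std::condition_variable not_full_, not_empty_;
    const std::size_t capacity_;
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }
    void push(T v) {
        std::unique_lock lk(mtx_);
        not_full_.wait(lk, [&] { return items_.size() < capacity_; });
        items_.push(std::move(v));
        lk.unlock();
        not_empty_.notify_one();  // wake a waiting consumer
    }
    T pop() {
        std::unique_lock lk(mtx_);
        not_empty_.wait(lk, [&] { return !items_.empty(); });
        T v = std::move(items_.front());
        items_.pop();
        lk.unlock();
        not_full_.notify_one();  // wake a waiting producer
        return v;
    }
};

// Demo: the producer can never run more than 2 items ahead of the consumer.
long long produce_and_consume(int count) {
    BoundedQueue<int> q(2);
    std::thread producer([&] { for (int i = 0; i < count; ++i) q.push(i); });
    long long sum = 0;
    for (int i = 0; i < count; ++i) sum += q.pop();
    producer.join();
    return sum;
}
```

The capacity is the tuning knob: a small value keeps memory bounded and latency low, while a larger one absorbs bursts at the cost of more queued work.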

### Applications

1. Game engine: render thread + game thread + audio thread + IO thread.
2. Browser: process-per-tab + GPU process + utility processes.
3. Database: connection pool + worker threads + background flush.

## 💻 Patterns

### Thread pool with work queue (C++20)

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <stop_token>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
    std::queue<std::function<void()>> tasks;
    std::mutex mtx;
    std::condition_variable_any cv;  // _any variant supports stop_token-aware waits
    // Declared last so the jthreads are stopped and joined before the
    // queue, mutex, and condition variable are destroyed.
    std::vector<std::jthread> workers;
public:
    explicit ThreadPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers.emplace_back([this](std::stop_token st) {
                for (;;) {
                    std::function<void()> task;
                    {
                        std::unique_lock lk(mtx);
                        // Wakes on new work or on a stop request; returns false
                        // only when stop was requested and the queue is empty.
                        if (!cv.wait(lk, st, [&] { return !tasks.empty(); })) return;
                        task = std::move(tasks.front());
                        tasks.pop();
                    }
                    task();  // run outside the lock
                }
            });
    }
    template <class F> void submit(F&& f) {
        { std::lock_guard lk(mtx); tasks.emplace(std::forward<F>(f)); }
        cv.notify_one();
    }
};
```
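
### Rust Rayon data parallelism

```rust
use rayon::prelude::*;

// `Item`, `Output`, and `expensive_compute` are placeholders for illustration.
struct Item { value: u64 }
struct Output(u64);

impl Item {
    fn valid(&self) -> bool { self.value > 0 }
}

fn expensive_compute(item: &Item) -> Output { Output(item.value * item.value) }

fn process_batch(items: &[Item]) -> Vec<Output> {
    items.par_iter()
        .filter(|i| i.valid())
        .map(|i| expensive_compute(i))
        .collect()
}
// Rayon splits the slice and work-steals across its global thread pool.
```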
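
### Go goroutine + channel (fan-out / fan-in)

```go
package pipeline

import (
	"runtime"
	"sync"
)

// Job, Result, and process are placeholders for illustration.
type Job int
type Result int

func process(j Job) Result { return Result(j) * 2 }

// pipeline fans jobs out to one worker per CPU and fans results into out.
func pipeline(input <-chan Job) <-chan Result {
	out := make(chan Result, 100)
	var wg sync.WaitGroup
	for i := 0; i < runtime.NumCPU(); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range input {
				out <- process(job)
			}
		}()
	}
	// Close out once every worker has drained input.
	go func() { wg.Wait(); close(out) }()
	return out
}
```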
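
### Lock-free SPSC ring buffer

```cpp
#include <atomic>
#include <cstddef>
#include <utility>

// Single-producer / single-consumer ring buffer.
// One slot stays empty to tell "full" from "empty", so usable capacity is N - 1.
template<typename T, std::size_t N>
class SPSCQueue {
    alignas(64) std::atomic<std::size_t> head{0};  // advanced by the consumer
    alignas(64) std::atomic<std::size_t> tail{0};  // advanced by the producer
    T buffer[N];
public:
    bool push(T v) {  // producer thread only
        auto t = tail.load(std::memory_order_relaxed);
        auto next = (t + 1) % N;
        if (next == head.load(std::memory_order_acquire)) return false;  // full
        buffer[t] = std::move(v);
        tail.store(next, std::memory_order_release);  // publish the slot
        return true;
    }
    bool pop(T& out) {  // consumer thread only
        auto h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire)) return false;  // empty
        out = std::move(buffer[h]);
        head.store((h + 1) % N, std::memory_order_release);  // free the slot
        return true;
    }
};
```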

### Game engine 3-thread architecture

```cpp
#include <atomic>
#include <cstdint>
#include <semaphore>

// Main thread: input + game logic
// Render thread: GPU command buffer
// IO thread: asset streaming
struct FrameSync {
    std::atomic<std::uint64_t> game_frame{0};
    std::atomic<std::uint64_t> render_frame{0};
    std::counting_semaphore<2> render_ready{0};  // frames ready to render
};
// Double-buffer the scene state so game tick N+1 can run in parallel with render N.
```

## Decision criteria

| Situation | Approach |
|---|---|
| CPU-bound, divisible work | Rayon / OpenMP / TBB |
| IO-heavy (10k+ connections) | async/await (Tokio, asyncio, Node) |
| Real-time game loop | dedicated threads + lock-free queue |
| Mixed workload | task-based (TBB, GCD) |
| Simple parallel-for | thread pool + work queue |
| Distributed across machines | actor (Akka) or message queue |

**Default**: a task-based scheduler (TBB/Rayon/Tokio), which avoids manual thread management.
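
As a rough illustration of the fork-join shape that task-based schedulers build on, the sketch below sums an array recursively with `std::async`. It is a standard-library stand-in, not TBB or Rayon: `std::async` with `std::launch::async` typically starts a new thread per call rather than feeding a work-stealing pool. The helper name `parallel_sum` and the `grain` threshold are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper: recursive fork-join sum over [first, last).
// `grain` is the size below which a range is summed sequentially.
long long parallel_sum(const int* first, const int* last,
                       std::size_t grain = std::size_t{1} << 14) {
    assert(grain > 0);
    const std::size_t n = static_cast<std::size_t>(last - first);
    if (n <= grain)  // small enough: no further splitting
        return std::accumulate(first, last, 0LL);
    const int* mid = first + n / 2;
    // Fork: the left half runs as a separate asynchronous task.
    auto left = std::async(std::launch::async, parallel_sum, first, mid, grain);
    // The right half runs on the current thread.
    const long long right = parallel_sum(mid, last, grain);
    return left.get() + right;  // join
}
```

Keep `grain` large relative to the per-task cost: a tiny grain on a large input forks a task per handful of elements, which is exactly the overhead a real task scheduler is designed to amortize.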

## 🔗 Graph

- Parent: [[Concurrent_Rendering]] · [[Distributed-Systems|Distributed_Computing]]
- Variant: [[Fiber_Architecture]]
- Applications: [[Game_Loop]] · [[V8 엔진 힙 아키텍처|V8 Heap Architecture]] · [[Browser]]
- Adjacent: [[SharedArrayBuffer_보안_이슈와_Cross-Origin_Isolation]] · [[Memory_Leaks]]

## 🤖 LLM usage

**When**: throughput-critical workloads, multi-core utilization, real-time games/servers, ML inference batching.
**When not**: simple sequential scripts, IO-light short-lived tasks, single-core embedded targets, where the threading overhead outweighs the benefit.

## ❌ Anti-patterns

- **Shared mutable state without sync**: a data race, which is undefined behavior in C++.
- **Coarse global lock**: can run slower than single-threaded code because of lock contention.
- **Thread per request (10k+)**: stack memory blows up; use async or a thread pool instead.
- **Busy-wait spin**: burns 100% of a core; use a condition variable or semaphore instead.
- **False sharing**: independent atomics on the same cache line; pad them apart with `alignas(64)`.

## 🧪 Verification / duplicates

- Verified (Herb Sutter "The Free Lunch Is Over" 2005, Intel TBB docs, Rust async book 2026).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full content (thread models, patterns, decision matrix) |