id: wiki-2026-0508-multi-threaded-architecture
title: Multi-threaded Architecture
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: Multithreading, Concurrent Architecture, MT Architecture
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: architecture, concurrency, threading, performance
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language: cpp; framework: stdthread-tbb-rayon

Multi-threaded Architecture

In one line

"Distribute work across multiple threads to gain throughput and responsiveness at the same time." The approach dates from the 1990s SMP era. As of 2026 the mainstream is manycore CPUs (Apple M4 Max with up to 16 cores, AMD Threadripper with up to 96 cores), GPU offload, and async/await coroutine models. Game engines, servers, ML inference, and browser engines all treat multi-threaded design as the default.

Key points

Thread model types

  • OS thread (1:1): pthread, std::thread — kernel-scheduled, expensive context switch.
  • Green thread / fiber: Go goroutine, Java 21 virtual thread — userland scheduler, M:N mapping.
  • Coroutine / async task: C++20 coroutine, Rust async, Kotlin coroutine — stackless, await-resume.
  • Task-based: Intel TBB, .NET TPL, Apple GCD; a runtime scheduler (work-stealing in TBB and TPL) maps tasks onto a thread pool, so there is no manual thread management.
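
The difference between owning an OS thread and handing work to a task API can be sketched in standard C++. This is a minimal illustration, not a benchmark: `sum_range`, `sum_on_os_thread`, and `sum_as_task` are hypothetical names, and `std::async` is only a stand-in for a real task scheduler (common implementations start a fresh thread per call and do no work stealing).

```cpp
#include <functional>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical workload used only for illustration.
inline long sum_range(const std::vector<int>& v) {
  return std::accumulate(v.begin(), v.end(), 0L);
}

// 1:1 OS thread: the caller creates the thread and must join it explicitly.
inline long sum_on_os_thread(const std::vector<int>& v) {
  long result = 0;
  std::thread worker([&] { result = sum_range(v); });
  worker.join();  // result is safe to read only after the join
  return result;
}

// Task style: the caller describes the work and gets a future back.
inline long sum_as_task(const std::vector<int>& v) {
  std::future<long> f = std::async(std::launch::async, sum_range, std::cref(v));
  return f.get();  // blocks until the task has finished
}
```

Both functions return the same value; the task version hides thread lifetime behind the future, which is the property that TBB-style schedulers build on.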

Architectural patterns

  • Producer-consumer: a bounded queue provides backpressure.
  • Pipeline: one thread per stage, connected by ring buffers (LMAX Disruptor).
  • Fork-join: divide & conquer, work-stealing.
  • Actor: message passing (Akka, Erlang, Pony), no shared state.
  • Data parallelism: SIMD + thread pool — Rayon par_iter(), OpenMP #pragma omp parallel for.
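
The producer-consumer item above can be sketched as a bounded blocking queue. This is a minimal sketch under stated assumptions: `BoundedQueue` and `produce_and_sum` are hypothetical names, and the queue uses a mutex and two condition variables rather than a lock-free design. Backpressure comes from `push` blocking while the queue is full.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>

template <typename T>
class BoundedQueue {
  std::queue<T> items_;
  std::size_t capacity_;
  bool closed_ = false;
  std::mutex mtx_;
  std::condition_variable not_full_, not_empty_;
public:
  explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

  // Blocks while the queue is full; returns false if the queue was closed.
  bool push(T v) {
    std::unique_lock lk(mtx_);
    not_full_.wait(lk, [&] { return closed_ || items_.size() < capacity_; });
    if (closed_) return false;
    items_.push(std::move(v));
    lk.unlock();
    not_empty_.notify_one();
    return true;
  }

  // Blocks while the queue is empty; returns nullopt once closed and drained.
  std::optional<T> pop() {
    std::unique_lock lk(mtx_);
    not_empty_.wait(lk, [&] { return closed_ || !items_.empty(); });
    if (items_.empty()) return std::nullopt;
    T v = std::move(items_.front());
    items_.pop();
    lk.unlock();
    not_full_.notify_one();
    return v;
  }

  void close() {
    { std::lock_guard lk(mtx_); closed_ = true; }
    not_full_.notify_all();
    not_empty_.notify_all();
  }
};

// One producer (the calling thread) feeds 1..n to one consumer thread.
inline long produce_and_sum(int n, std::size_t capacity) {
  BoundedQueue<int> q(capacity);
  long sum = 0;
  std::thread consumer([&] {
    while (auto v = q.pop()) sum += *v;
  });
  for (int i = 1; i <= n; ++i) q.push(i);  // blocks whenever the consumer falls behind
  q.close();
  consumer.join();  // sum is safe to read only after the join
  return sum;
}
```

A small capacity keeps memory bounded and slows the producer to the consumer's pace; a larger one absorbs bursts at the cost of latency.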

Applications

  1. Game engine — render thread + game thread + audio thread + IO thread.
  2. Browser — process-per-tab + GPU process + utility processes.
  3. Database — connection pool + worker threads + background flush.

💻 Patterns

Thread pool with work queue (C++20)

#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
  std::queue<std::function<void()>> tasks;
  std::mutex mtx;
  std::condition_variable cv;
  bool stop = false;
  // Declared last so the workers are joined before the members above are destroyed.
  std::vector<std::jthread> workers;
public:
  explicit ThreadPool(std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
      workers.emplace_back([this] {
        for (;;) {
          std::function<void()> task;
          {
            std::unique_lock lk(mtx);
            cv.wait(lk, [&]{ return stop || !tasks.empty(); });
            if (stop && tasks.empty()) return;  // drain queued tasks before exiting
            task = std::move(tasks.front());
            tasks.pop();
          }
          task();  // run outside the lock
        }
      });
  }
  ~ThreadPool() {
    { std::lock_guard lk(mtx); stop = true; }
    cv.notify_all();  // wake every worker; the jthread destructors then join
  }
  template<class F> void submit(F&& f) {
    { std::lock_guard lk(mtx); tasks.emplace(std::forward<F>(f)); }
    cv.notify_one();
  }
};

Rust Rayon data parallelism

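use rayon::prelude::*;

// Item, Output, and expensive_compute are placeholders defined elsewhere;
// Item must be Sync and Output must be Send. The return type is named Output
// here to avoid shadowing the standard Result<T, E>.
fn process_batch(items: &[Item]) -> Vec<Output> {
    items.par_iter()
        .filter(|i| i.valid())
        .map(|i| expensive_compute(i))
        .collect()
}
// Rayon's work-stealing pool spreads the iterator across all cores automatically.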

Go goroutine + channel (fan-out / fan-in)

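package fanout // package name is a placeholder

import (
    "runtime"
    "sync"
)

// Job, Result, and process are placeholders defined elsewhere in the package.

// pipeline fans input out to one worker per CPU and fans results into one channel.
func pipeline(input <-chan Job) <-chan Result {
    out := make(chan Result, 100)
    var wg sync.WaitGroup
    for i := 0; i < runtime.NumCPU(); i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for job := range input {
                out <- process(job)
            }
        }()
    }
    // Close out only after every worker has finished sending.
    go func() { wg.Wait(); close(out) }()
    return out
}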

Lock-free SPSC ring buffer

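#include <atomic>
#include <cstddef>
#include <utility>

// Single-producer/single-consumer ring buffer. One slot stays empty, so capacity is N - 1.
template<typename T, std::size_t N>
class SPSCQueue {
  alignas(64) std::atomic<std::size_t> head{0};  // advanced only by the consumer
  alignas(64) std::atomic<std::size_t> tail{0};  // advanced only by the producer
  T buffer[N];
public:
  bool push(T v) {
    auto t = tail.load(std::memory_order_relaxed);
    auto next = (t + 1) % N;
    if (next == head.load(std::memory_order_acquire)) return false;  // full
    buffer[t] = std::move(v);
    tail.store(next, std::memory_order_release);  // publish the slot to the consumer
    return true;
  }
  bool pop(T& out) {
    auto h = head.load(std::memory_order_relaxed);
    if (h == tail.load(std::memory_order_acquire)) return false;  // empty
    out = std::move(buffer[h]);
    head.store((h + 1) % N, std::memory_order_release);  // hand the slot back to the producer
    return true;
  }
};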

Game engine 3-thread architecture

#include <atomic>
#include <cstdint>
#include <semaphore>

// Main thread: input + game logic
// Render thread: GPU command buffer
// IO thread: asset streaming
struct FrameSync {
  std::atomic<std::uint64_t> game_frame{0};
  std::atomic<std::uint64_t> render_frame{0};
  std::counting_semaphore<2> render_ready{0};  // counts frames ready to render (at most 2)
};
// Double-buffer the scene state so game tick N+1 can run in parallel with render N.
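
The double-buffering idea in the closing comment can be sketched as follows. This is an illustrative sketch, not the wiki's FrameSync design: `SceneState`, `FramePipeline`, and `run_frames` are hypothetical names, and a mutex plus condition variable replace the semaphore so the block also builds as C++17. The game thread may run at most one frame ahead of the renderer, so the two never touch the same buffer slot at once.

```cpp
#include <array>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

struct SceneState {  // hypothetical per-frame snapshot
  std::uint64_t frame = 0;
  float player_x = 0.0f;
};

class FramePipeline {
  std::array<SceneState, 2> buffers_{};
  std::uint64_t published_ = 0;  // frames fully written by the game thread
  std::uint64_t consumed_ = 0;   // frames fully read by the render thread
  std::mutex mtx_;
  std::condition_variable cv_;
public:
  // Game thread: waits until the slot for frame f is no longer being rendered.
  void write_frame(std::uint64_t f, float x) {
    { std::unique_lock lk(mtx_); cv_.wait(lk, [&] { return f < consumed_ + 2; }); }
    buffers_[f % 2] = SceneState{f, x};  // frame f-2, the slot's last user, is consumed
    { std::lock_guard lk(mtx_); published_ = f + 1; }
    cv_.notify_all();
  }
  // Render thread: reads frame f once the game thread has published it.
  SceneState read_frame(std::uint64_t f) {
    { std::unique_lock lk(mtx_); cv_.wait(lk, [&] { return published_ > f; }); }
    SceneState s = buffers_[f % 2];
    { std::lock_guard lk(mtx_); consumed_ = f + 1; }
    cv_.notify_all();
    return s;
  }
};

// Runs `frames` ticks on a game thread while the calling thread acts as the renderer.
inline std::vector<std::uint64_t> run_frames(std::uint64_t frames) {
  FramePipeline p;
  std::vector<std::uint64_t> rendered;
  std::thread game([&] {
    for (std::uint64_t f = 0; f < frames; ++f) p.write_frame(f, static_cast<float>(f));
  });
  for (std::uint64_t f = 0; f < frames; ++f) rendered.push_back(p.read_frame(f).frame);
  game.join();
  return rendered;
}
```

Copying the snapshot out in `read_frame` keeps the sketch short; a real renderer would more likely read the slot in place and mark it consumed when the frame's command buffer is built.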

Decision criteria

| Situation | Approach |
|---|---|
| CPU-bound, divisible work | Rayon / OpenMP / TBB |
| IO-heavy (10k+ connections) | async/await (Tokio, asyncio, Node) |
| Real-time game loop | dedicated threads + lock-free queue |
| Mixed workload | task-based (TBB, GCD) |
| Simple parallel-for | thread pool + work queue |
| Distributed across machines | actor (Akka) or message queue |

Default: a task-based scheduler (TBB/Rayon/Tokio), which avoids manual thread management.
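
For the "CPU-bound, divisible work" case, the fork-join shape that such schedulers run can be sketched with the standard library alone. This is a minimal sketch: `parallel_sum` and its `grain` parameter are hypothetical, and `std::async` typically starts one thread per fork, whereas TBB or Rayon would reuse a work-stealing pool.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Recursively splits [first, last) until a chunk holds at most `grain` elements.
inline long long parallel_sum(const int* first, const int* last,
                              std::size_t grain = 1 << 14) {
  const std::size_t n = static_cast<std::size_t>(last - first);
  if (n <= grain) return std::accumulate(first, last, 0LL);  // small enough: run serially
  const int* mid = first + n / 2;
  // Fork: the left half runs as a separate task.
  auto left = std::async(std::launch::async, parallel_sum, first, mid, grain);
  const long long right = parallel_sum(mid, last, grain);  // this thread takes the right half
  return left.get() + right;  // join
}
```

The grain size bounds how many tasks are created; too small a grain spends more time forking than summing.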

🔗 Graph

🤖 LLM usage

When to use: throughput-critical workloads, multi-core utilization, real-time games and servers, ML inference batching. When not to use: simple sequential scripts, IO-light short-lived tasks, single-core embedded targets; there the threading overhead outweighs the benefit.

Anti-patterns

  • Shared mutable state without sync: data race · UB.
  • Coarse global lock: can end up slower than single-threaded code (lock contention).
  • Thread per request (10k+): stack memory blows up; use async or a thread pool.
  • Busy-wait spin: burns 100% CPU; use a condition variable or semaphore.
  • False sharing: different atomics on the same cache line; pad them with alignas(64).
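
The false-sharing fix in the last item can be sketched like this. `PaddedCounter` and `count_in_parallel` are hypothetical names, and 64 bytes is an assumed cache-line size that holds on most x86-64 parts but not everywhere. Each counter gets its own line, so the two threads do not invalidate each other's cache on every increment.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>

// Assumption: 64-byte cache lines. alignas pads each counter out to a full line.
struct alignas(64) PaddedCounter {
  std::atomic<std::size_t> value{0};
};
static_assert(alignof(PaddedCounter) == 64, "each counter starts on its own line");

// Two threads each increment their own counter `per_thread` times.
inline std::size_t count_in_parallel(std::size_t per_thread) {
  PaddedCounter counters[2];
  auto work = [&](int idx) {
    for (std::size_t i = 0; i < per_thread; ++i)
      counters[idx].value.fetch_add(1, std::memory_order_relaxed);
  };
  std::thread a(work, 0), b(work, 1);
  a.join();
  b.join();
  return counters[0].value.load() + counters[1].value.load();
}
```

Removing `alignas(64)` leaves the result unchanged but places both atomics on one line, which is the slowdown the anti-pattern describes.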

🧪 Verification / duplicates

  • Verified (Herb Sutter "The Free Lunch Is Over" 2005, Intel TBB docs, Rust async book 2026).
  • Trust level: A.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full content (thread models, patterns, decision matrix) |