---
id: [[P-Reinforce|P-Reinforce]]-AUTO-MOES-001
category: Unified
confidence_score: 1.00
tags: [auto-reinforced, moe, mixture-of-experts, sparse-architecture, routing, compute-efficiency]
last_reinforced: 2026-05-04
---

# [[Mixture of Experts (MoE) & Sparse Architectures|Mixture of Experts (MoE) & Sparse Architectures]]

## 📌 One-Line Insight (The Karpathy Summary)

> "Division of labor for intelligence: place many experts holding vast knowledge inside the model and activate only the few that are needed at each moment, so the model can grow in size while its compute cost stays low. An economical design for intelligence."

## 📖 Structured Knowledge (Synthesized Content)

MoE (Mixture of Experts) is a sparse model design in which only a subset of the model's total parameters takes part in the computation for any given token.

1. **Core principles**:
    * **Experts**: The FFN layer inside the model is split into several independent "expert" networks.
    * **Router**: For each input token, the router selects the best-suited experts (usually the top 1 or 2) and dispatches the computation to them.
    * **Shared Experts**: Some models (e.g., DeepSeek) add "shared experts" that every token passes through, which builds a common foundation of knowledge.
2. **Key advantages**:
    * **Compute efficiency**: Even if the model has 1 trillion (1T) total parameters, only billions of them are used per token at inference time, so inference is fast relative to a dense model of the same total size.
    * **Scalability**: With the same compute budget, you can build a model that holds far more knowledge.
3. **Representative models**:
    * GPT-4 (reportedly an MoE architecture), Mixtral 8x7B, DeepSeek-V3.

## ⚖️ Trade-offs & Caveats

* **VRAM footprint**: Inference uses little compute, but the weights of every expert must be resident in memory, so the required VRAM is as large as the model's full size.
* **Expert Collapse**: The router may funnel work to only a few experts, leaving the remaining experts untrained. Load-balancing techniques are essential to prevent this.
* **Deployment complexity**: Distributing experts across multiple GPUs and keeping them synchronized is considerably harder than deploying a dense model.

## 🔗 Knowledge Links (Graph)

* **Base architecture**: [[Transformer Architecture|Transformer Architecture]]
* **Related techniques**: [[Routing Mechanism|Routing Mechanism]], [[Sparse Attention|Sparse Attention]]
* **Competing architecture**: Dense Models (Llama 3, etc.)

---

*Last updated: 2026-05-04*
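
The routing and expert-collapse points in this note can be made concrete with a small sketch. The code below is a toy illustration only, not the implementation used by any of the models named here: `ToyMoELayer` and `expert_load` are hypothetical names, the experts are tiny ReLU FFNs with random weights, and renormalizing the gate weights over the selected experts is an assumption (real routers differ in this detail). It shows top-k selection per token and measures how evenly tokens are spread across experts, which is the quantity load balancing tries to keep uniform.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class ToyMoELayer:
    """Hypothetical toy MoE layer: a linear router plus small ReLU FFN experts."""

    def __init__(self, d_model, d_hidden, n_experts, top_k, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.n_experts = n_experts
        self.top_k = top_k
        self.router = rand(n_experts, d_model)  # one score row per expert
        self.experts = [
            (rand(d_hidden, d_model), rand(d_model, d_hidden))
            for _ in range(n_experts)
        ]

    def route(self, x):
        """Return [(expert_index, gate_weight)] for the top-k experts of token x."""
        probs = softmax(matvec(self.router, x))
        chosen = sorted(range(self.n_experts), key=lambda i: probs[i], reverse=True)
        chosen = chosen[: self.top_k]
        norm = sum(probs[i] for i in chosen)  # assumption: renormalize over top-k
        return [(i, probs[i] / norm) for i in chosen]

    def forward(self, x):
        """Run only the selected experts and mix their outputs by gate weight."""
        routes = self.route(x)
        out = [0.0] * len(x)
        for i, gate in routes:
            w_in, w_out = self.experts[i]
            hidden = [max(0.0, h) for h in matvec(w_in, x)]  # ReLU
            y = matvec(w_out, hidden)
            out = [o + gate * yi for o, yi in zip(out, y)]
        return out, [i for i, _ in routes]


def expert_load(layer, tokens):
    """Fraction of routing slots each expert receives over a batch of tokens."""
    counts = [0] * layer.n_experts
    for x in tokens:
        _, chosen = layer.forward(x)
        for i in chosen:
            counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts]


if __name__ == "__main__":
    rng = random.Random(1)
    layer = ToyMoELayer(d_model=8, d_hidden=16, n_experts=4, top_k=2)
    tokens = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(100)]
    print(expert_load(layer, tokens))  # a perfectly balanced router gives 0.25 each
```

Note that every expert's weights are allocated in `__init__` even though `forward` touches only `top_k` of them per token, which mirrors the VRAM caveat: compute scales with the active experts, memory with all of them. A load vector that drifts far from uniform is the symptom that a load-balancing term in training is meant to counteract.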