---
id: [[P-Reinforce|P-Reinforce]]-AUTO-SATT-001
category: Unified
confidence_score: 1.00
tags: [auto-reinforced, sparse-attention, dsa, attention-complexity, efficiency, deepseek]
last_reinforced: 2026-05-04
---

# [[Sparse Attention|Sparse Attention]]

## 📌 One-Line Insight (The Karpathy Summary)

> "Selection and focus in intelligence: drop the waste of comparing every token with every other token and, through 'sparse connections' that pick out only the tokens that matter most in context, bring computational complexity down from $O(n^2)$ to roughly $O(n)$. A model case of efficient intelligence."

## 📖 Structured Knowledge (Synthesized Content)

Sparse Attention is a technique that sharply reduces compute and memory cost. Instead of computing the relationship between every pair of tokens, each token selectively attends to a subset of tokens chosen by a fixed pattern or by importance.

1. **Basic patterns**:
    * **Sliding Window**: Attends only to adjacent tokens (the local context).
    * **Global Tokens**: Tokens at important positions (such as the start of a sentence) are visible to every token and provide a global view.
    * **Random/Fixed Patterns**: Predefined rules or random connections compensate for missing long-range dependencies.
2. **DSA (DeepSeek Sparse Attention)**:
    * **Indexer-Selector mechanism**: Rather than looking at fixed positions, an "indexer" first finds the relevant tokens, and a "selector" then performs attention over that subset only.
    * **Significance**: Makes it possible to scale to ultra-long contexts of more than 1 million tokens while minimizing accuracy loss.
3. **Advantages**:
    * Keeps the growth in compute with sequence length roughly linear ($O(n)$, assuming each token attends to a bounded number of others), which makes large-scale inputs tractable.
    * Relieves memory pressure on the KV cache, improving inference efficiency.
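
The patterns in items 1 and 2 above can be illustrated with a minimal pure-Python sketch. This is not DeepSeek's implementation. The function names, the `index_q` / `index_keys` projections, and the plain top-`k` rule are assumptions made for illustration, and the toy indexer still scores every key before selecting; only the full attention step is restricted to `k` tokens.

```python
import math


def sliding_window_global_mask(n, window, global_idx=(0,)):
    """Causal mask: mask[i][j] is True when query i may attend to key j.

    A key is visible if it lies within the last `window` positions
    (sliding window) or if it is one of the global tokens.
    """
    g = set(global_idx)
    return [[j <= i and (i - j < window or j in g) for j in range(n)]
            for i in range(n)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def topk_sparse_attention(q, keys, values, index_q, index_keys, k):
    """Indexer-then-selector attention for a single query (hypothetical).

    index_q / index_keys stand in for cheap, low-dimensional projections
    used only for relevance scoring.
    """
    # Indexer: cheap relevance score for every candidate key.
    scores = [dot(index_q, ik) for ik in index_keys]
    # Selector: keep the k best-scoring positions, in original order.
    ranked = sorted(range(len(keys)), key=lambda j: -scores[j])
    chosen = sorted(ranked[:k])
    # Scaled dot-product attention restricted to the selected subset.
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, keys[j]) * scale for j in chosen])
    dim = len(values[0])
    out = [sum(w * values[j][d] for w, j in zip(weights, chosen))
           for d in range(dim)]
    return chosen, out
```

With `window` and `k` held fixed, each query touches a bounded number of keys in the attention step, which is where the near-linear scaling comes from. Calling `sliding_window_global_mask(5, 2)` gives a last row that sees position 0 (global) plus positions 3 and 4 (local window).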

## ⚖️ Trade-offs & Caveats

* **Risk of information loss**: If important tokens are missed, the model's reasoning ability can degrade (for example, the "lost in the middle" phenomenon). Preventing this calls for carefully designed hybrid architectures (e.g., the interleaved Local-Global scheme in Gemma 4).
* **Implementation complexity**: Compared with standard Dense Attention, the architecture is more complex because of the indexing and selection logic, so system integration and optimization demand a high level of engineering skill.

## 🔗 Knowledge Links (Graph)

* **Parent concepts**: [[Attention Mechanisms|Attention Mechanisms]], [[LLM Inference Optimization|LLM Inference Optimization]]
* **Compared with**: [[Flash Attention|Flash Attention]] (I/O optimization vs. reducing the number of operations)
* **Related techniques**: [[Sliding Window Attention|Sliding Window Attention]], [[Mixture of Experts (MoE)|Mixture of Experts (MoE)]], [[KV Cache|KV Cache]]

---

*Last updated: 2026-05-04*