---
id: wiki-2026-0508-sparse-attention
title: Sparse Attention
category: 10_Wiki/Topics
status: needs_review
canonical_id: self
aliases: [P-Reinforce-AUTO-SATT-001]
duplicate_of: none
source_trust_level: A
confidence_score: 1.0
tags: [auto-reinforced, sparse-attention, dsa, attention-complexity, efficiency, deepseek]
raw_sources: []
last_reinforced: 2026-05-04
github_commit: pending
inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08)
tech_stack:
  language: unspecified
  framework: unspecified
---

# [[Sparse Attention|Sparse Attention]]

## 📌 One-Line Insight (The Karpathy Summary)
> "Selection and focus in intelligence: instead of wastefully comparing every token with every other token, pick out only the tokens that matter most in context. These 'sparse connections' bring compute complexity down from $O(n^2)$ to roughly $O(n)$, a model case of efficient intelligence."

## 📖 Synthesized Content

Sparse Attention is a technique that, instead of computing the relationship between every pair of tokens, lets each token attend selectively to a subset chosen by a fixed pattern or by importance. This sharply reduces compute and memory cost.

1. **Basic patterns**:
    * **Sliding Window**: attends only to neighboring tokens (local context).
    * **Global Tokens**: tokens at important positions (such as the start of a sentence) are shared by the whole sequence and provide a global view.
    * **Random/Fixed Patterns**: predefined rules or random links compensate for missing long-range dependencies.

2. **DSA (DeepSeek Sparse Attention)**:
    * **Indexer-Selector mechanism**: rather than looking at fixed positions, an "indexer" first finds the relevant tokens, and a "selector" then runs attention over that subset only.
    * **Significance**: aims to scale to ultra-long contexts of more than 1 million tokens while keeping accuracy loss to a minimum.

3. **Advantages**:
    * Keeps the growth of compute with sequence length close to linear ($O(n)$, or more precisely $O(n \cdot k)$ when each query attends to $k$ tokens), which makes very long inputs tractable.
    * Eases memory pressure on the KV cache, which improves inference efficiency.

## ⚠️ Contradictions & Updates

* **Risk of information loss**: if an important token is not selected, the model's reasoning ability can degrade (compare the "lost in the middle" phenomenon). Preventing this calls for carefully designed hybrid architectures (for example, the interleaved Local-Global scheme of Gemma 4).
* **Implementation complexity**: compared with standard Dense Attention, the indexing and selection logic makes the architecture more complex, so system integration and optimization demand considerable engineering skill.

## 🔗 Graph

* **Parent concepts**: [[Attention Mechanisms|Attention Mechanisms]], [[LLM Inference Optimization|LLM Inference Optimization]]
* **Compared techniques**: [[Flash Attention|Flash Attention]] (I/O optimization vs. reducing the number of operations)
* **Related techniques**: [[Sliding Window Attention|Sliding Window Attention]], [[Mixture of Experts (MoE)|Mixture of Experts (MoE)]], [[KV Cache|KV Cache]]

---
*Last updated: 2026-05-04*

## 🤖 How to Use This Knowledge (LLM Hints)

**When to use this knowledge:**
- *(TODO)*

**When not to use it:**
- *(TODO)*

## 🧪 Validation

- **Information status:** needs_review
- **Source trust level:** A
- **Reason for review:** *(P-Reinforce Phase 1 auto-normalization. Body content needs verification.)*

## 🧬 Duplicate Check

- **Existing similar documents:** *(TODO: see the indexer cluster report)*
- **Handling:** UPDATE (auto-normalization)
- **Reason:** Phase 1 normalization: filled in the old template and missing fields.
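
As a rough illustration of the basic patterns listed under Synthesized Content, the sketch below builds a boolean attention mask that combines a sliding window with global tokens and counts how many query-key pairs survive. The names `build_sparse_mask` and `count_pairs`, the symmetric (non-causal) window, and the choice of position 0 as the only global token are assumptions made for this example; random links are left out for brevity.

```python
def build_sparse_mask(n, window, global_positions=()):
    """Return an n x n boolean mask; mask[i][j] is True when query i may attend to key j."""
    global_set = set(global_positions)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            local = abs(i - j) <= window                    # sliding-window band
            is_global = i in global_set or j in global_set  # global rows and columns
            mask[i][j] = local or is_global
    return mask


def count_pairs(mask):
    """Number of query-key pairs the mask keeps."""
    return sum(sum(row) for row in mask)


n = 16
mask = build_sparse_mask(n, window=2, global_positions=[0])
dense_pairs = n * n
sparse_pairs = count_pairs(mask)
print(sparse_pairs, "of", dense_pairs, "pairs kept")  # 100 of 256 pairs kept
```

With a fixed window and a fixed number of global tokens, the number of kept pairs grows roughly in proportion to `n` rather than `n * n`, which is the source of the near-linear cost described above.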

## 🕓 Changelog

| Date | Change | Handling | Trust level |
|------|--------|----------|-------------|
| 2026-05-08 | P-Reinforce Phase 1 normalization (frontmatter + header standardization) | UPDATE | A |

## 💻 Code Patterns

**Pattern 1:** *(TODO: structural skeleton that reflects this project's conventions)*

```text
# TODO
```

## 🤔 Decision Criteria

**When to choose option A:**
- *(TODO)*

**When to choose option B:**
- *(TODO)*

**Default:**
> *(TODO)*

## ❌ Anti-Patterns

- **[Anti-pattern]:** *(TODO: what not to do + why + what to do instead)*
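
The indexer-selector idea described for DSA under Synthesized Content can be sketched for a single query as follows. This is a minimal illustration, not DeepSeek's implementation: the function names (`select_top_k`, `sparse_attention`), the use of plain dot products as index scores, and the separate low-dimensional `index_query` / `index_keys` vectors are all assumptions chosen to keep the example small.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def select_top_k(index_query, index_keys, k):
    """Indexer step (hypothetical): score every key cheaply, keep the k best positions."""
    scores = [dot(index_query, key) for key in index_keys]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def sparse_attention(query, keys, values, selected):
    """Selector step: ordinary softmax attention restricted to the selected positions."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[j]) * scale for j in selected])
    dim = len(values[0])
    return [sum(w * values[j][d] for w, j in zip(weights, selected)) for d in range(dim)]


# Toy data: four cached tokens, one query.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [9.0, 9.0]]
query = [1.0, 0.0]

# Assumed cheap 1-dimensional index vectors used only for selection.
index_keys = [[1.0], [0.0], [1.0], [-1.0]]
index_query = [1.0]

selected = select_top_k(index_query, index_keys, k=2)
output = sparse_attention(query, keys, values, selected)
print(selected, output)
```

In this sketch the indexer still touches every key, but with much smaller vectors, while the full attention computation runs over only `k` positions. Setting `k` to the number of keys recovers dense attention, which is a convenient way to check a sparse implementation against a dense reference.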