---
id: DL-SELF-ATT-001
category: "10_Wiki/💡 Topics/AI"
confidence_score: 1.0
tags: [ai, deep-learning, transformer, self-attention, attention-mechanism, nlp, neural-networks]
last_reinforced: 2026-04-26
---

# Self-Attention Mechanisms

## 📌 One-Line Insight (The Karpathy Summary)

> "Let every element in the data explore the context of every other element in parallel, and focus attention on the counterpart that contributes most to completing the current meaning."

Self-attention is the core mechanism of the Transformer architecture: each element of the input sequence interacts with every other element of the sequence and updates its own representation accordingly.

## 📖 Structured Knowledge (Synthesized Content)

- **Extracted pattern:** "Dynamic Contextual Weighting and Parallel Interaction". Each word is projected into a Query, a Key, and a Value vector. The similarity between a query and each key (a dot product) is turned into a score, and the scores are used to take a weighted average of the values.
- **Core formula concepts:**
  - **Formula:** $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(QK^\top / \sqrt{d_k}\right)V$
  - **Scaled dot-product:** The scores are divided by the square root of the key dimension $d_k$. Without this scaling, dot products grow in magnitude with $d_k$ and push the softmax into saturated regions where gradients become very small.
  - **Multi-head attention:** Several attention heads run in parallel, so context is captured from different perspectives (syntactic, semantic, and so on).
- **Significance:** Unlike an RNN, self-attention does not process the sequence step by step, so it can be computed in parallel. Relationships between distant words (long-range dependencies) are also captured in a single step rather than through many recurrent steps.

## ⚠️ Contradictions and Updates (Contradictions & RL Update)

- **Conflict with earlier data:** Compute and memory grow quadratically ($O(n^2)$) with sequence length $n$, which is a serious limitation for long inputs. Recent variants address this in different ways: Flash Attention still computes exact attention but avoids materializing the full $n \times n$ score matrix in slow memory, while Sparse Attention scores only a subset of query-key pairs.
- **Policy change:** When building large-scale knowledge graphs, the Antigravity project internally uses an embedding analysis engine based on multi-head self-attention to compute semantic distances between documents.
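
The formula from the Synthesized Content section, softmax(QK^T / sqrt(d_k))V, and the multi-head idea can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a production implementation: the function names (`softmax`, `attention`, `multi_head_self_attention`) and the toy `tokens` data are hypothetical, and each head simply reads a slice of the feature vector instead of applying the learned Query/Key/Value projections a real Transformer would use.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    queries: rows of length d_k, keys: rows of length d_k,
    values: one row per key. Returns (outputs, weights).
    """
    d_k = len(keys[0])
    scale = math.sqrt(d_k)
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


def multi_head_self_attention(x, num_heads):
    """Toy multi-head self-attention over token vectors x.

    Assumption: each head uses a contiguous slice of the features as its
    queries, keys, and values (no learned projections, no output projection).
    """
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    head_outputs = []
    for h in range(num_heads):
        part = [row[h * d_head:(h + 1) * d_head] for row in x]
        out, _ = attention(part, part, part)
        head_outputs.append(out)
    # Concatenate the per-head outputs back into d_model features per token.
    return [[v for head in head_outputs for v in head[i]]
            for i in range(len(x))]


# Hypothetical toy input: three tokens with four features each.
tokens = [[1.0, 0.0, 1.0, 0.0],
          [0.0, 1.0, 0.0, 1.0],
          [1.0, 1.0, 0.0, 0.0]]
outputs, weights = attention(tokens, tokens, tokens)
mixed = multi_head_self_attention(tokens, num_heads=2)
```

Each row of `weights` sums to 1 and has one entry per token, so `weights` is the $n \times n$ matrix behind the $O(n^2)$ cost noted above. In a full Transformer layer, learned linear maps produce Q, K, and V for each head, and the concatenated head outputs pass through a further learned output projection; both are omitted here to keep the sketch short.
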
## 🔗 Knowledge Links (Graph)

- [[Natural-Language-Processing-NLP|Natural-Language-Processing-NLP]], Deep-Learning-Foundations, [[Scalability-in-AI-Systems|Scalability-in-AI-Systems]], [[Modern-Website-Architecture|Modern-Website-Architecture]]
- **Raw Source:** 10_Wiki/Topics/AI/Self-Attention-Mechanisms.md