---
id: [[P-Reinforce|P-Reinforce]]-AUTO-ROPE-001
category: Unified
confidence_score: 1.00
tags: [auto-reinforced, rope, positional-embedding, yarn, longrope, context-extension]
last_reinforced: 2026-05-04
---

# [[Positional Embeddings (RoPE & Variants)|Positional Embeddings (RoPE & Variants)]]

## 📌 One-Line Insight (The Karpathy Summary)

> "The compass of intelligence: a revolution in positional information that expresses the relative distance between words as a rotation, so the model can still track word order and relationships in sequences far longer than those it was trained on."

## 📖 Structured Knowledge (Synthesized Content)

Positional encoding is the technique that gives token position information to the Transformer, which otherwise has no notion of order.

1. **RoPE (Rotary Position Embedding)**:
    * **Principle**: Each token's position is converted into a rotation angle in the complex plane, and that rotation is applied multiplicatively to the query and key vectors.
    * **Characteristics**: It naturally encodes relative distance rather than absolute position, and it degrades comparatively little on long contexts. It is used as the standard in most recent models, including Llama and PaLM.
2. **Context Extension Techniques (Variants)**:
    * **Linear Interpolation**: Positions beyond the trained range are linearly compressed back into the original range so the model sees only familiar position values.
    * **YaRN (Yet another RoPE extensioN method)**: Adjusts each frequency band differently instead of scaling all of them uniformly, extending the context window by tens of times with little loss of accuracy.
    * **LongRoPE**: Uses an evolutionary search to find rotation (rescaling) parameters that can handle contexts on the order of millions of tokens.
3. **iRoPE (Interleaved RoPE)**:
    * A technique for multimodal or long-context models that injects positional information differently at specific layers to optimize performance, for example by interleaving layers that apply RoPE with layers that use no positional embedding.

## ⚖️ Trade-offs & Caveats

* **Limits of extrapolation**: Accurately capturing relationships between tokens at very long distances that were never seen during training remains a mathematically hard problem.
* **Fine-tuning is required**: Simply applying a RoPE extension technique is not enough. The model reaches its proper performance only after additional training (fine-tuning) on a small amount of data at the extended context length.

## 🔗 Knowledge Links (Graph)

* **Parent concept**: [[Transformer Architecture|Transformer Architecture]]
* **Child technique**: [[Attention Mechanisms|Attention Mechanisms]]
* **Problem addressed**: [[Context Window & Long-Context LLMs|Context Window & Long-Context LLMs]]

---
*Last updated: 2026-05-04*
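
As a rough illustration of the RoPE principle and the Linear Interpolation variant described in this note, the following NumPy sketch rotates query and key vectors and checks two properties: the attention score depends only on the relative offset, and dividing positions by a scale factor maps a long position back into the trained range. The function names (`rope_angles`, `apply_rope`, `rope_score`), the `scale` parameter, and the adjacent-pair layout are assumptions made for this example; `base=10000.0` is the commonly used default, and real implementations may pair dimensions differently.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0, scale=1.0):
    """Rotation angles of shape (len(positions), dim // 2).

    scale > 1 applies linear position interpolation: positions are divided
    by `scale`, so a longer sequence maps back into the trained range.
    """
    # One rotation frequency per 2-D pair of dimensions (dim must be even).
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    pos = np.asarray(positions, dtype=np.float64) / scale
    return np.outer(pos, inv_freq)

def apply_rope(x, positions, base=10000.0, scale=1.0):
    """Rotate each adjacent pair (x[2i], x[2i+1]) by its position-dependent angle."""
    x = np.asarray(x, dtype=np.float64)
    theta = rope_angles(positions, x.shape[-1], base, scale)
    # Treat each pair as a complex number and multiply by e^{i * theta}.
    z = (x[..., 0::2] + 1j * x[..., 1::2]) * np.exp(1j * theta)
    out = np.empty_like(x)
    out[..., 0::2] = z.real
    out[..., 1::2] = z.imag
    return out

def rope_score(q, k, m, n, **kw):
    """Attention logit between query q at position m and key k at position n."""
    q_rot = apply_rope(q[None], [m], **kw)[0]
    k_rot = apply_rope(k[None], [n], **kw)[0]
    return float(q_rot @ k_rot)

rng = np.random.default_rng(0)
q, k = rng.standard_normal(8), rng.standard_normal(8)

# Same offset (3) at different absolute positions yields the same score.
s_near = rope_score(q, k, 5, 2)
s_far = rope_score(q, k, 105, 102)

# Linear interpolation with scale=4: position 8000 is treated like position 2000.
interp = apply_rope(q[None], [8000], scale=4.0)[0]
plain = apply_rope(q[None], [2000])[0]
```

Here `s_near` and `s_far` agree up to floating-point error, which is the relative-distance property in miniature, and `interp` matches `plain` because interpolation only rescales the position index. YaRN and LongRoPE would instead modify the per-frequency values in `inv_freq` non-uniformly; that is not shown here.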