---
id: LAYER-NORM-001
category: "[[10_Wiki/💡 Topics/AI]]"
confidence_score: 1.0
tags: [deep-learning, normalization, transformer, neural-networks]
last_reinforced: 2026-04-26
---

# [[Layer Normalization (레이어 정규화)]]

## 📌 One-Line Insight (The Karpathy Summary)

> "Keep the flow of data inside the model under steady control to secure training stability." Layer normalization computes the mean and variance over the features within each sample and normalizes by them, which helps training stay fast and stable even in deep neural networks.

## 📖 Synthesized Content

- **Extracted pattern:** Normalize using only the information inside each individual data sample, with no dependence on batch size, so that behavior stays consistent in complex architectures such as the Transformer and on variable-length data.
- **Details:**
  - **vs Batch Normalization:** Batch normalization normalizes across samples (each feature over the batch), whereas layer normalization normalizes within a sample (across that sample's features).
  - **Transformer Essential:** Adopted as an essential component for stable training in nearly all modern LLM architectures (GPT, BERT, etc.).
  - **Scale & Shift:** After normalization, learnable parameters ($\gamma, \beta$) rescale and shift each feature, so the model can learn the output distribution that works best instead of being locked to zero mean and unit variance.
  - **Inference Stability:** It operates the same way at inference time as at training time, because it uses no batch statistics or running averages, so results are consistent.

## ⚠️ Contradictions & RL Update

- **Conflict with past data:** After the period in which batch normalization dominated, layer normalization became the standard as sequential models (RNNs) and the Transformer architecture moved into the mainstream.
- **Policy change:** When analyzing the weights of models used in the Antigravity project, the activation values of layer normalization layers are monitored to diagnose training saturation.

## 🔗 Knowledge Links (Graph)

- [[Batch-Normalization]], [[Transformer-Architecture]], [[Optimization]], [[Deep-Learning]]
- **Raw Source:** [[10_Wiki/Topics/AI/Layer-Normalization.md]]
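
The within-sample normalization and the scale-and-shift step described in this note can be written as $y = \gamma \cdot \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$, where $\mu$ and $\sigma^2$ are the mean and variance over one sample's features. The following is a minimal pure-Python sketch of that formula, not a reference implementation. The function names `layer_norm` and `batch_norm_train`, the toy data, and the value of `eps` are assumptions made for illustration. The batch-normalization helper is included only to contrast "across samples" with "within sample".

```python
import math


def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize one sample's feature vector, then scale and shift.

    x, gamma, and beta are equal-length lists of floats. eps is a small
    constant (value assumed here) that guards against division by zero.
    """
    n = len(x)
    mean = sum(x) / n
    # Biased (population) variance over the features of this one sample.
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


def batch_norm_train(batch, eps=1e-5):
    """Training-mode batch normalization without scale/shift, for contrast.

    Statistics are taken per feature ACROSS the samples in the batch,
    so each output depends on which other samples are present.
    """
    n = len(batch)
    d = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(d)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in batch) / n for j in range(d)
    ]
    return [
        [(row[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(d)]
        for row in batch
    ]


# Hypothetical toy data: 3 samples with 4 features each.
batch = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 20.0, 30.0, 40.0],
    [-5.0, 0.0, 5.0, 10.0],
]
gamma = [1.0, 1.0, 1.0, 1.0]  # identity scale
beta = [0.0, 0.0, 0.0, 0.0]   # zero shift

# Layer norm: sample 0 gets the same result alone or inside a batch.
alone = layer_norm(batch[0], gamma, beta)
in_batch = [layer_norm(row, gamma, beta) for row in batch][0]
print(alone == in_batch)

# Batch norm: the result for sample 0 changes when the batch changes.
bn_full = batch_norm_train(batch)[0]
bn_pair = batch_norm_train(batch[:2])[0]
print(bn_full == bn_pair)
```

The first comparison holds because `layer_norm` never looks beyond the single sample it is given, which is why the same code path serves both training and inference. The second comparison fails because the batch statistics shift as soon as a sample is added or removed.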