--- id: AI-DISTILL-001 category: "10_Wiki/πŸ’‘ Topics/AI" confidence_score: 1.0 tags: [ai, deep-learning, knowledge-distillation, model-compression, inference-optimization] last_reinforced: 2026-04-26 --- # Knowledge Distillation (지식 증λ₯˜) ## πŸ“Œ ν•œ 쀄 톡찰 (The Karpathy Summary) > "거인의 λ°©λŒ€ν•œ 지식을 μš”μ•½ν•˜μ—¬ μž‘μ€ μ•„μ΄μ˜ 머릿속에 효율적으둜 μ΄μ‹ν•˜λΌ" β€” κ±°λŒ€ν•œ 사전 ν•™μŠ΅ λͺ¨λΈ(Teacher)이 κ°€μ§„ μ •κ΅ν•œ 예츑 ν™•λ₯  뢄포λ₯Ό μž‘μ€ κ²½λŸ‰ λͺ¨λΈ(Student)이 ν•™μŠ΅ν•˜κ²Œ ν•˜μ—¬, μ„±λŠ₯ 손싀을 μ΅œμ†Œν™”ν•˜λ©΄μ„œ μΆ”λ‘  속도λ₯Ό λΉ„μ•½μ μœΌλ‘œ λ†’μ΄λŠ” 기법. ## πŸ“– κ΅¬μ‘°ν™”λœ 지식 (Synthesized Content) - **μΆ”μΆœλœ νŒ¨ν„΄:** "Teacher-Student Learning" β€” μ •λ‹΅ λ ˆμ΄λΈ”(Hard Target)뿐만 μ•„λ‹ˆλΌ, ν‹°μ²˜ λͺ¨λΈμ΄ 내놓은 각 ν΄λž˜μŠ€λ³„ ν™•λ₯ κ°’(Soft Target)에 λ‹΄κΈ΄ '클래슀 κ°„ 상관관계' μ •λ³΄κΉŒμ§€ 슀튜던트 λͺ¨λΈμ΄ ν•™μŠ΅ν•˜κ²Œ ν•˜λŠ” 지식 μ „μˆ˜ νŒ¨ν„΄. - **μž‘λ™ 원리:** - **Teacher Model:** ν’λΆ€ν•œ νŒŒλΌλ―Έν„°λ₯Ό κ°€μ§„ κ³ μ„±λŠ₯ λͺ¨λΈ. - **Student Model:** μ‹€μ „ 배포λ₯Ό μœ„ν•œ κ°€λ²Όμš΄ λͺ¨λΈ. - **Temperature (T):** μ†Œν”„νŠΈλ§₯슀 결과값을 λΆ€λ“œλŸ½κ²Œ λ§Œλ“€μ–΄(Softening) 슀튜던트 λͺ¨λΈμ΄ 더 ν’λΆ€ν•œ 정보λ₯Ό 배우게 함. - **의의:** κ±°λŒ€ λͺ¨λΈμ˜ λ›°μ–΄λ‚œ μΌλ°˜ν™” λŠ₯λ ₯을 μœ μ§€ν•˜λ©΄μ„œλ„ λͺ¨λ°”μΌμ΄λ‚˜ μ—£μ§€ κΈ°κΈ°μ—μ„œ μ‹€μ‹œκ°„ ꡬ동 κ°€λŠ₯ν•œ λͺ¨λΈμ„ λ§Œλ“€ 수 있게 함 (예: BERT -> DistilBERT). ## ⚠️ λͺ¨μˆœ 및 μ—…λ°μ΄νŠΈ (Contradictions & RL Update) - **κ³Όκ±° λ°μ΄ν„°μ™€μ˜ 좩돌:** λ‹¨μˆœνžˆ λͺ¨λΈμ˜ 크기λ₯Ό μ€„μ΄λŠ”(Pruning, Quantization) ν•˜λ“œμ›¨μ–΄μ  접근을 λ„˜μ–΄, λͺ¨λΈμ˜ '사고 방식' 자체λ₯Ό μ΅œμ ν™”ν•˜μ—¬ μ „μˆ˜ν•˜λŠ” μ•Œκ³ λ¦¬μ¦˜μ  μ ‘κ·ΌμœΌλ‘œ μ§„ν™”. - **μ •μ±… λ³€ν™”:** Antigravity ν”„λ‘œμ νŠΈλŠ” 둜컬 브레인용 κ²½λŸ‰ λͺ¨λΈ μ œμž‘ μ‹œ, ν΄λΌμš°λ“œ 브레인의 κ±°λŒ€ νŒŒλΌλ―Έν„° λͺ¨λΈμ„ ν‹°μ²˜λ‘œ μ‚Όμ•„ 지식 증λ₯˜ 과정을 거침으둜써 μ†Œν˜• λͺ¨λΈμ˜ μ§€λŠ₯을 상ν–₯ 평쀀화함. ## πŸ”— 지식 μ—°κ²° (Graph) - [[Inference-Optimization|Inference-Optimization]], Transfer-Learning-Foundations, [[LLM|LLM]], [[Hardware-Acceleration-for-AI|Hardware-Acceleration-for-AI]] - **Raw Source:** 10_Wiki/Topics/AI/Knowledge-Distillation.md