---
id: [[P-Reinforce|P-Reinforce]]-AUTO-QUAN-001
category: Unified
confidence_score: 0.96
tags: [auto-reinforced, quantization, [[Deep-Learning|Deep-Learning]], performance, [[Hardware|Hardware]]-[[Optimization|Optimization]], llm-inference]
last_reinforced: 2026-04-20
---

# [[Quantization|Quantization]]

## 📌 One-Line Insight (The Karpathy Summary)

> "The art of the digital diet: compress the weights of a huge AI model, stored as 32-bit high-precision floats, into 4-bit or 8-bit integers. Accuracy is largely preserved, while model size shrinks to roughly 1/4 (8-bit) or 1/8 (4-bit) and inference gets substantially faster, which is the magic that lets AI run even on a smartphone."

## 📖 Structured Knowledge (Synthesized Content)

Quantization is a technique that improves efficiency by converting a deep learning model's parameters to a data format with fewer bits.

1. **Main approaches**:
    * **PTQ (Post-Training Quantization)**: converts a model after training has finished (fast and simple).
    * **QAT (Quantization-Aware Training)**: accounts for the error that conversion will introduce during training itself (preserves accuracy).
2. **Benefits**:
    * **Speed**: large gains in compute throughput. (linked to [[Efficiency|Efficiency]])
    * **Energy**: lower power consumption. (linked to [[Physical-Intelligence|Physical-Intelligence]])
    * **[[memory|memory]]**: a smaller model fits on low-spec hardware.
3. **Why it matters**:
    * It is a required gateway for AI to leave the server room and move into the devices in our pockets (On-device AI).

## ⚠️ Contradictions & Updates (Contradictions & RL Update)

- **Conflict with past data**: It was once believed that reducing bit width would severely degrade accuracy. Advanced compression algorithms (GPTQ, AWQ, and others) have since been developed that let a 4-bit model behave almost identically to its high-precision counterpart (RL Update).
- **Policy change (RL Update)**: Beyond simply reducing bit width, "mixed-precision quantization", which keeps important layers at higher precision and quantizes only the less important ones, has become standard practice.

## 🔗 Knowledge Links (Graph)

- [[Efficiency|Efficiency]], [[Physical-Intelligence|Physical-Intelligence]], Deep Learning (DL), [[Hardware|Hardware]], [[Optimization|Optimization]]
- **Modern Tech/Tools**: TensorRT, GGUF (LLM), bitsandbytes, INT8/FP4 arithmetic.

---
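
The PTQ idea from the Structured Knowledge section can be illustrated with a minimal sketch. It assumes the simplest possible scheme, symmetric per-tensor round-to-nearest quantization to int8. This is not how GPTQ, AWQ, or any of the listed tools work internally. The function names `quantize_int8`, `dequantize`, and `size_ratio` and the sample weights are hypothetical and exist only for this example.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats into the int8 range [-127, 127].

    Assumption: one scale for the whole tensor, chosen from the largest magnitude.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate float values."""
    return [v * scale for v in q]


def size_ratio(src_bits, dst_bits):
    """Storage ratio for the weights alone (ignores scales and other metadata)."""
    return dst_bits / src_bits


# Hypothetical sample weights, already "trained" (this is the post-training case).
weights = [0.42, -1.27, 0.05, 0.9, -0.33]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Round-to-nearest keeps every in-range value within half a step of the original.
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print("int8 codes:", q)
print("max abs error:", max_err, "half step:", scale / 2)
print("32 -> 8 bit size ratio:", size_ratio(32, 8))
print("32 -> 4 bit size ratio:", size_ratio(32, 4))
```

The size ratios show where the roughly 1/4 and 1/8 storage figures come from. Real deployments typically use per-channel or per-group scales, and mixed-precision schemes apply different bit widths to different layers, which this sketch does not attempt.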