--- id: [[P-Reinforce|P-Reinforce]]-AUTO-FITU-001 category: Dev confidence_score: 0.98 tags: [auto-reinforced, fine-tuning, llm, transfer-learning, domain-adaptation, lora] last_reinforced: 2026-04-20 --- # [[Fine-tuning|Fine-tuning]] ## ๐Ÿ“Œ ํ•œ ์ค„ ํ†ต์ฐฐ (The Karpathy Summary) > "๊ฑฐ์ธ์˜ ์ง€์‹์„ ๋‚ด ๊ฒƒ์œผ๋กœ ๋งŒ๋“ค๋‹ค: ๋น„์ „๊ณต ์ง€์‹์ด ๊ฐ€๋“ํ•œ ๊ฑฐ๋Œ€ ๋ชจ๋ธ(Pre-trained)์„ ๊ฐ€์ ธ์™€, ํŠน์ • ๋„๋ฉ”์ธ์ด๋‚˜ ๊ธฐ์—…์˜ ๊ณ ์œ  ๋ฐ์ดํ„ฐ๋ฅผ ์ถ”๊ฐ€๋กœ ํ•™์Šต์‹œํ‚ด์œผ๋กœ์จ ์ „๋ฌธ์„ฑ์„ ๋‚ ์นด๋กญ๊ฒŒ ๋ฒผ๋ฆฌ๊ณ  ๋งž์ถคํ˜• ์ „๋ฌธ๊ฐ€๋กœ ํƒˆ๋ฐ”๊ฟˆ์‹œํ‚ค๋Š” ์ตœ์ ํ™” ๊ณต์ •." ## ๐Ÿ“– ๊ตฌ์กฐํ™”๋œ ์ง€์‹ (Synthesized Content) ํŒŒ์ธํŠœ๋‹(Fine-tuning)์€ ์ด๋ฏธ ํ•™์Šต๋œ ๋ชจ๋ธ์— ์ƒˆ๋กœ์šด ๋ฐ์ดํ„ฐ์™€ ๊ฐ€์ค‘์น˜ ์—…๋ฐ์ดํŠธ๋ฅผ ์ ์šฉํ•˜์—ฌ ํŠน์ • ์ž‘์—…์— ์ตœ์ ํ™”ํ•˜๋Š” ๊ธฐ๋ฒ•์ž…๋‹ˆ๋‹ค. 1. **๋ฐฉ์‹**: * **Full Fine-tuning**: ์ „์ฒด ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์—…๋ฐ์ดํŠธ (๋น„์šฉ ๋†’์Œ). * **PEFT ([[Parameter|Parameter]]-Efficient Fine-Tuning)**: LoRA(Low-Rank Adaptation) ๋“ฑ ์ผ๋ถ€ ํ•ต์‹ฌ ํŒŒ๋ผ๋ฏธํ„ฐ๋งŒ ํ•™์Šตํ•˜์—ฌ ์ ์€ ์ž์›์œผ๋กœ ๊ณ ์„ฑ๋Šฅ ๋‹ฌ์„ฑ. * **Instruction Tuning**: "์š”์•ฝํ•ด์ค˜", "๋ฒˆ์—ญํ•ด์ค˜" ๋“ฑ์˜ ์ง€์‹œ์–ด(Instruction)๋ฅผ ๋”ฐ๋ฅด๋„๋ก ํ•™์Šต. 2. **์™œ ์ค‘์š”ํ•œ๊ฐ€?**: * ๋ฒ”์šฉ ๋ชจ๋ธ์˜ ํ•œ๊ณ„(์ผ๋ฐ˜์  ๋‹ต๋ณ€)๋ฅผ ๋„˜์–ด, ์˜๋ฃŒ, ๋ฒ•๋ฅ , ๊ธฐ์—… ๋‚ด๋ถ€ ๋งค๋‰ด์–ผ ๋“ฑ์— ํŠนํ™”๋œ '์‚ด์•„์žˆ๋Š” ์ง€๋Šฅ'์„ ๋งŒ๋“œ๋Š” ๊ฐ€์žฅ ๊ฐ•๋ ฅํ•œ ๋ฐฉ๋ฒ•์ž„. (Transfer-Learning์˜ ์—ฐ์žฅ) ## โš ๏ธ ๋ชจ์ˆœ ๋ฐ ์—…๋ฐ์ดํŠธ (Contradictions & RL Update) - **๊ณผ๊ฑฐ ๋ฐ์ดํ„ฐ์™€์˜ ์ถฉ๋Œ**: ๊ณผ๊ฑฐ์—๋Š” ํŒŒ์ธํŠœ๋‹ ์‹œ ๋ชจ๋ธ์ด ์ด์ „ ์ง€์‹์„ ์žŠ์–ด๋ฒ„๋ฆฌ๋Š” 'ํŒŒ๊ดด์  ๋ง๊ฐ(Catastrophic Forgetting) ์ •์ฑ…'์ด ํฐ ๋ฌธ์ œ์˜€์œผ๋‚˜, ํ˜„๋Œ€ ์ •์ฑ…์€ ์ง€์‹ ๋ณด์กด ์ •์ฑ…(Elastic Weight Consolidation ๋“ฑ)๊ณผ ํšจ์œจ์  ํ•™์Šต ์ •์ฑ…์œผ๋กœ ์ด๋ฅผ ์ •๋ณตํ•จ(RL Update). - **์ •์ฑ… ๋ณ€ํ™”(RL Update)**: ๋‹จ์ˆœํžˆ ๋ฐ์ดํ„ฐ๋ฅผ ๋„ฃ๋Š” ์ˆ˜์ค€์„ ๋„˜์–ด, ๋ชจ๋ธ์˜ ๊ฐ€์น˜๊ด€๊ณผ ์œค๋ฆฌ๋ฅผ ์ •๋ ฌํ•˜๋Š” RLHF/DPO ์ •์ฑ…๊ณผ ๊ฒฐํ•ฉ๋˜์–ด '์ง€๋Šฅ์˜ ๋ฐฉํ–ฅ์„ฑ ์ •์ฑ…'์„ ์„ค์ •ํ•˜๋Š” ๊ณผ์ •์œผ๋กœ ๊ณ ๋„ํ™”๋จ. (DPO์™€ ์—ฐ๊ฒฐ) ## ๐Ÿ”— ์ง€์‹ ์—ฐ๊ฒฐ (Graph) - Transfer-Learning, DPO (Direct PReference Optimization), [[Optimization|Optimization]], [[Efficiency|Efficiency]], [[Constitutional AI (แ„’แ…ฅแ†ซแ„‡แ…ฅแ†ธ AI)|Constitutional AI (ํ—Œ๋ฒ• AI)]] - **Modern Tech/Tools**: LoRA, QLoRA, Hugging Face `peft` library, Unsloth. ---