---
id: wiki-2026-0508-fine-tuning
title: Fine tuning
category: 10_Wiki/Topics
status: needs_review
canonical_id: self
aliases: [P-Reinforce-AUTO-FITU-001]
duplicate_of: none
source_trust_level: A
confidence_score: 0.98
tags: [auto-reinforced, fine-tuning, llm, transfer-learning, domain-adaptation, lora]
raw_sources: []
last_reinforced: 2026-04-20
github_commit: pending
inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08)
---

# [[Fine-tuning|Fine-tuning]]

## 📌 One-Line Insight (The Karpathy Summary)

> "Making a giant's knowledge your own: an optimization process that takes a large pre-trained model full of general-purpose knowledge, further trains it on domain-specific or company-proprietary data, and sharpens it into a tailored specialist."

## 📖 Structured Knowledge (Synthesized Content)

Fine-tuning is a technique that adapts an already-trained model to a specific task. It does this by training the model on new data and updating its weights.

1. **Approaches**:
   * **Full Fine-tuning**: updates every parameter of the model (high compute and memory cost).
   * **PEFT ([[Parameter|Parameter]]-Efficient Fine-Tuning)**: trains only a small number of parameters while the pre-trained weights stay frozen. For example, LoRA (Low-Rank Adaptation) learns small low-rank adapter matrices. This approach reaches strong performance with far fewer resources.
   * **Instruction Tuning**: trains the model to follow instructions such as "Summarize this" or "Translate this".
2. **Why it matters**:
   * General-purpose models tend to give generic answers. Fine-tuning is one of the most effective ways to go beyond that limit. It can build a "living intelligence" specialized for medicine, law, internal company manuals, and similar domains. It is an extension of transfer learning.
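The PEFT bullet can be made concrete with a minimal pure-Python sketch. It assumes the formulation from the LoRA paper:

* The frozen weight `W` (shape d_out x d_in) is augmented with a trainable low-rank product `B @ A`, scaled by `alpha / r`.
* `B` starts at zero, so training begins exactly from the pre-trained model.

The helper names (`matmul`, `lora_effective_weight`, `trainable_params`) and the toy sizes are hypothetical and chosen for illustration. This is not the Hugging Face `peft` API.

```python
import random

def matmul(x, y):
    """Multiply two matrices stored as lists of rows (x: n x k, y: k x m)."""
    return [[sum(row[k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for row in x]

def lora_effective_weight(w, a, b, alpha, r):
    """Return W + (alpha / r) * (B @ A).

    w: frozen pre-trained weight, d_out x d_in
    b: trainable adapter, d_out x r
    a: trainable adapter, r x d_in
    """
    delta = matmul(b, a)
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

def trainable_params(d_in, d_out, r=None):
    """Full fine-tuning trains all of W; LoRA with rank r trains only A and B."""
    if r is None:
        return d_in * d_out
    return r * (d_in + d_out)

# Toy layer (hypothetical sizes): a 3 x 4 frozen weight with rank-2 adapters.
random.seed(0)
d_out, d_in, r, alpha = 3, 4, 2, 4.0
w = [[random.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]
a = [[random.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]  # small random init
b = [[0.0] * r for _ in range(d_out)]  # zero init, so B @ A == 0 before training

# Before any training step the adapted layer equals the pre-trained layer.
print(lora_effective_weight(w, a, b, alpha, r) == w)  # True
print(trainable_params(4096, 4096), trainable_params(4096, 4096, r=8))  # 16777216 65536
```

Consider a hypothetical 4096 x 4096 projection with rank 8. Full fine-tuning trains 16,777,216 parameters in that layer, while LoRA trains 65,536, which is 1/256 of the layer. The LoRA paper also notes that `B @ A` can be merged into `W` after training, so inference adds no extra latency.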
## ⚠️ Contradictions & Updates

- **Conflict with past data**: Catastrophic forgetting used to be a major problem. In this failure mode, a fine-tuned model loses knowledge it learned earlier. Modern practice mitigates it with knowledge-preservation methods (Elastic Weight Consolidation, etc.) and efficient training methods (RL Update).
- **Policy shift (RL Update)**: Fine-tuning has moved beyond simply feeding in data. It is now combined with RLHF/DPO, which align the model's values and ethics. Together they set the direction of the model's behavior. (Linked to DPO)

## 🔗 Knowledge Links (Graph)

- Transfer-Learning, DPO (Direct Preference Optimization), [[Optimization|Optimization]], [[Efficiency|Efficiency]], [[Constitutional AI (헌법 AI)|Constitutional AI]]
- **Modern Tech/Tools**: LoRA, QLoRA, Hugging Face `peft` library, Unsloth.

---

## 🤖 LLM Usage Hints (How to Use This Knowledge)

**When to use this knowledge:**
- *(TODO)*

**When not to use it:**
- *(TODO)*

## 🧪 Validation

- **Information status:** needs_review
- **Source trust level:** A
- **Review reason:** *(P-Reinforce Phase 1 auto-normalization. The body text needs verification.)*

## 🧬 Duplicate Check

- **Existing similar documents:** *(TODO: see the indexer cluster report)*
- **Handling:** UPDATE (auto-normalization)
- **Reason:** Phase 1 normalization: brought the old template up to date and filled in missing fields.

## 🕓 Changelog

| Date | Change | Handling | Trust level |
|------|--------|----------|-------------|
| 2026-05-08 | P-Reinforce Phase 1 normalization (frontmatter + header standardization) | UPDATE | A |
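
The Contradictions & Updates section above names two techniques:

* **EWC**, a way to limit catastrophic forgetting. It adds a quadratic penalty, weighted by a diagonal Fisher-information estimate, that pulls important parameters back toward their old values.
* **DPO**, a preference-alignment step. It turns log-probabilities from the trained policy and a frozen reference model into a logistic loss.

The minimal sketch below writes both objectives in plain Python. It assumes the standard formulations from the EWC and DPO papers. The function names, the toy numbers, and the `beta=0.1` default are illustrative assumptions, not the API of any particular library.

```python
import math

def ewc_penalty(params, old_params, fisher, lam):
    """Elastic Weight Consolidation: (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2.

    `fisher` is a diagonal Fisher-information estimate from the previous task,
    so parameters that mattered more for that task are pulled back harder.
    """
    return 0.5 * lam * sum(f * (p - p_old) ** 2
                           for p, p_old, f in zip(params, old_params, fisher))

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))

def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given summed sequence log-probabilities.

    The margin measures how much the policy raised the chosen answer relative to
    the rejected one, compared with the frozen reference model.
    """
    margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
    return softplus(-beta * margin)  # -log(sigmoid(z)) == softplus(-z)

# Policy identical to the reference: margin is 0, so the loss is log 2 (about 0.693).
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
# Policy raised the chosen answer and lowered the rejected one: loss falls below log 2.
print(dpo_loss(-10.0, -17.0, -12.0, -15.0))
# Only the second parameter moved, weighted by its Fisher value 2.0: penalty is 1.0.
print(ewc_penalty([1.0, 2.0], [1.0, 1.0], [0.5, 2.0], lam=1.0))
```

In an actual run, the EWC penalty would be added to the new task's loss. The four DPO inputs would come from scoring each chosen and rejected answer under both the policy and the reference model.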