---
id: wiki-2026-0508-fine-tuning
title: Fine-tuning
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [fine-tuning, FT, LoRA, QLoRA, full fine-tune, instruction tuning, continual]
duplicate_of: none
source_trust_level: A
confidence_score: 0.98
verification_status: applied
tags: [machine-learning, fine-tuning, lora, qlora, transfer-learning, peft, instruction-tuning]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: HuggingFace transformers / peft / TRL / Unsloth / Axolotl
---

# Fine-tuning

## One-liner

> **"Adapt a pretrained model to a specific task or domain."** Modern default: LoRA / QLoRA (PEFT), which update only a small fraction of the parameters. Related techniques: instruction tuning, RLHF, DPO. Alternatives: prompt engineering, RAG. Costs: GPU time, data, and evaluation.

## Core

### Spectrum

- **Full fine-tune**: update every weight.
- **PEFT** (parameter-efficient fine-tuning):
  - LoRA: learn a low-rank decomposition of the weight update.
  - QLoRA: 4-bit quantized base model + LoRA.
  - Adapter: small trainable layers inserted into the network.
  - IA³, Prefix-tuning, Prompt-tuning.
- **Instruction tuning**: supervised training on (prompt, response) pairs.
- **DPO / SimPO**: optimize directly on preference pairs.
- **RLHF**: PPO against a learned reward model.

### vs alternatives

- **Prompt engineering**: cheapest, but limited.
- **Few-shot**: in-context learning (ICL); pays the example tokens on every call.
- **RAG**: fresh or frequently changing knowledge.
- **Fine-tune**: style, format, and domain skills.

### Applications

1. **Domain adaptation**: medical, legal.
2. **Style**: brand voice.
3. **Format**: JSON and other structured output.
4. **Tool use**: function calling.
5. **Multilingual**: low-resource languages.
6. **Safety**: refusing harmful requests.

### Modern tooling (2024+)

- **Unsloth**: roughly 2x faster training with lower VRAM.
- **Axolotl**: training driven by a YAML config.
- **TRL**: SFT and DPO trainers.
- **MLX-LM** (Apple): on-device fine-tuning on Apple silicon.
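To make "low-rank decomposition" concrete: LoRA (Hu et al., 2021) freezes the pretrained weight `W0` and learns an update `ΔW = B @ A` of rank `r`, scaled by `lora_alpha / r`, so the layer computes `W0 x + (alpha / r) B A x`. Below is a minimal NumPy sketch of the forward pass only (no training loop), assuming the paper's initialization (random Gaussian `A`, zero `B`). The class name `LoRALinear` and the init std of 0.01 are illustrative assumptions, not the peft implementation.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W0 plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, w0: np.ndarray, r: int = 16, alpha: float = 32.0, seed: int = 0):
        d_out, d_in = w0.shape
        rng = np.random.default_rng(seed)
        self.w0 = w0                                     # frozen pretrained weight, (d_out, d_in)
        self.a = rng.normal(0.0, 0.01, size=(r, d_in))   # trainable, small random init
        self.b = np.zeros((d_out, r))                    # trainable, zero init -> update starts at 0
        self.scale = alpha / r

    def forward(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out); the low-rank path never builds a d_out x d_in matrix
        return x @ self.w0.T + self.scale * (x @ self.a.T) @ self.b.T

    def merged_weight(self) -> np.ndarray:
        # Folding the update into W0 removes the extra inference cost after training
        return self.w0 + self.scale * self.b @ self.a

    def trainable_params(self) -> int:
        return self.a.size + self.b.size


# Parameter count for one 4096 x 4096 projection (a Llama-3-8B-sized q_proj, used as a stand-in)
w0 = np.zeros((4096, 4096))
layer = LoRALinear(w0, r=16)
print(layer.trainable_params(), w0.size)  # 131072 16777216
```

Only `A` and `B` would receive gradients. Because `B` starts at zero, the adapted layer initially behaves exactly like the base layer, and `merged_weight()` shows why merging an adapter back into the base model adds no inference latency.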
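## 💻 Patterns

### Full fine-tune (HF Trainer)

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_id = 'meta-llama/Meta-Llama-3-8B'
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama 3 ships without a pad token

args = TrainingArguments(
    output_dir='out',
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch = 16 per device
    learning_rate=1e-5,             # low LR because every weight is updated
    warmup_steps=100,
    bf16=True,                      # bf16 on Ampere+; fp16 is prone to overflow with Llama models
    save_steps=500,
    eval_strategy='steps',          # older transformers: evaluation_strategy
    eval_steps=500,
)
# ds / eval_ds: tokenized datasets with an 'input_ids' column, prepared elsewhere
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal LM: labels = input_ids
Trainer(model=model, args=args, train_dataset=ds, eval_dataset=eval_ds,
        data_collator=collator).train()
```

### LoRA (peft)

```python
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16,              # rank of the update matrices
    lora_alpha=32,     # update is scaled by lora_alpha / r
    lora_dropout=0.1,
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],  # attention projections (Llama naming)
    bias='none',
    task_type='CAUSAL_LM',
)
model = get_peft_model(model, config)  # base weights frozen; only LoRA A/B matrices train
model.print_trainable_parameters()     # roughly 0.2% of an 8B model with this config
```

### QLoRA (4-bit)

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import get_peft_model, prepare_model_for_kbit_training

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',              # NormalFloat4 from the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bf16
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)
model = AutoModelForCausalLM.from_pretrained('meta-llama/Meta-Llama-3-8B', quantization_config=bnb)
model = prepare_model_for_kbit_training(model)  # gradient checkpointing + input grads for k-bit training
model = get_peft_model(model, config)           # LoraConfig from the LoRA pattern
```

### TRL SFT

```python
from trl import SFTConfig, SFTTrainer

def format_instruction(ex):
    # Alpaca-style template; recent TRL calls this once per example and expects a string
    return f"### Instruction\n{ex['instruction']}\n### Response\n{ex['response']}"

sft_args = SFTConfig(
    output_dir='sft-out',
    max_length=2048,  # older TRL: max_seq_length
)
trainer = SFTTrainer(
    model=model,
    args=sft_args,
    train_dataset=ds,  # raw dataset with 'instruction' / 'response' columns
    formatting_func=format_instruction,
)
trainer.train()
```

### DPO (preference)

```python
from trl import DPOConfig, DPOTrainer

# preference_ds columns: prompt, chosen, rejected
dpo_args = DPOConfig(output_dir='dpo-out', beta=0.1)  # higher beta keeps the policy closer to the reference
dpo = DPOTrainer(
    model=model,
    ref_model=None,              # None: TRL builds a frozen reference (or disables the adapter under PEFT)
    args=dpo_args,
    train_dataset=preference_ds,
    processing_class=tokenizer,  # older TRL: tokenizer=
)
dpo.train()
```

### Unsloth (faster)

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    'unsloth/llama-3-8b-Instruct-bnb-4bit',  # pre-quantized 4-bit checkpoint
    max_seq_length=4096,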
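    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(model, r=16, lora_alpha=32)
# Unsloth reports ~2x faster training and lower memory than a plain transformers + peft setup
```

### Axolotl YAML

```yaml
base_model: meta-llama/Meta-Llama-3-8B
load_in_4bit: true
adapter: qlora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true     # apply LoRA to all linear layers
datasets:
  - path: my_data.jsonl
    type: alpaca             # instruction / input / output fields
val_set_size: 0.05
sequence_len: 2048
gradient_accumulation_steps: 4
micro_batch_size: 4
num_epochs: 3
learning_rate: 2e-4          # LoRA tolerates a much higher LR than full FT
optimizer: adamw_bnb_8bit
output_dir: ./outputs/qlora-llama3
```

### Merge LoRA back

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Merge into a full-precision base, not the 4-bit model used for QLoRA training
base = AutoModelForCausalLM.from_pretrained('meta-llama/Meta-Llama-3-8B', torch_dtype=torch.bfloat16)
lora = PeftModel.from_pretrained(base, 'lora_checkpoint')  # adapter directory saved during training
merged = lora.merge_and_unload()  # W = W0 + (alpha / r) * B @ A; adapter modules removed
merged.save_pretrained('llama3-finetuned')
AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3-8B').save_pretrained('llama3-finetuned')
```

### Eval (held-out)

```python
import torch

@torch.no_grad()
def eval_finetuned(model, tokenizer, eval_ds, grade):
    """Accuracy on a held-out set; grade(pred, answer) -> bool is a task-specific checker."""
    model.eval()
    correct = 0
    for ex in eval_ds:
        inputs = tokenizer(ex['prompt'], return_tensors='pt').to(model.device)
        out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
        new_tokens = out[0][inputs['input_ids'].shape[1]:]  # drop the echoed prompt
        pred = tokenizer.decode(new_tokens, skip_special_tokens=True)
        if grade(pred, ex['answer']):
            correct += 1
    return correct / len(eval_ds)
```

### Prevent catastrophic forgetting

```python
import random

# Mix general / original-task data back in so the model keeps its prior abilities
def mixed_training(specific_data, general_data, ratio=0.2, seed=0):
    rng = random.Random(seed)
    general = list(general_data)
    k = min(len(general), int(len(specific_data) * ratio))  # sample() fails if k > population
    mixed = list(specific_data) + rng.sample(general, k)
    rng.shuffle(mixed)
    return mixed
```

### Function calling fine-tune

```python
import json

def format_function_call(ex):
    # The target is the bare JSON call, so the model learns to emit parseable output
    return {
        'prompt': f"User: {ex['user']}\n",
        'response': f"{json.dumps(ex['call'])}\n",
    }
```

### LR schedule (cosine + warmup)

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir='out',
    learning_rate=2e-4,          # typical for LoRA (the full fine-tune example uses 1e-5)
    warmup_ratio=0.03,           # linear warmup over the first 3% of steps
    lr_scheduler_type='cosine',  # then cosine decay
    num_train_epochs=3,
)
```

### Data quality (LIMA-style)

```python
# LIMA (2023): ~1,000 curated examples outperformed far larger, noisier sets (e.g. Alpaca's 52k)
def filter_quality(dataset, criteria):
    # criteria: predicates such as length bounds, dedup, or toxicity checks; keep rows passing all
    return [d for d in dataset if all(c(d) for c in criteria)]
```

### MLX-LM (Apple)

```bash
pip install mlx-lm
# --data is a directory containing train.jsonl and valid.jsonl
python -m mlx_lm.lora --model meta-llama/Meta-Llama-3-8B --train --data ./data \
  --batch-size 1 --num-layers 16  # older mlx-lm releases: --lora-layers
```

### Post-training quantization (GPTQ / AWQ)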
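GPTQ and AWQ are post-training methods: they compress an already-trained model (here, the merged checkpoint) using a small calibration set, with no further training. To show what `bits=4, group_size=128` means, here is a minimal NumPy sketch of plain round-to-nearest (RTN) symmetric group-wise quantization. This is an illustrative simplification, not GPTQ itself: GPTQ additionally updates not-yet-quantized weights to compensate for each rounding error, and AWQ first rescales activation-salient channels. The function names are hypothetical.

```python
import numpy as np


def quantize_groupwise(w: np.ndarray, bits: int = 4, group_size: int = 128):
    """Symmetric RTN quantization with one scale per `group_size` consecutive weights in each row."""
    d_out, d_in = w.shape
    assert d_in % group_size == 0, "sketch assumes d_in is divisible by group_size"
    qmax = 2 ** (bits - 1) - 1                               # 7 for 4-bit; int range is [-8, 7]
    groups = w.reshape(d_out, d_in // group_size, group_size)
    scale = np.abs(groups).max(axis=-1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)                 # avoid division by zero for all-zero groups
    q = np.clip(np.round(groups / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale


def dequantize_groupwise(q: np.ndarray, scale: np.ndarray, shape) -> np.ndarray:
    # Inference kernels do this on the fly: integer code times its group's scale
    return (q.astype(np.float32) * scale).reshape(shape)


rng = np.random.default_rng(0)
w = rng.normal(size=(64, 256)).astype(np.float32)
q, scale = quantize_groupwise(w)
w_hat = dequantize_groupwise(q, scale, w.shape)
print(q.dtype, scale.shape, float(np.abs(w - w_hat).max()))
```

Each group of 128 weights shares one scale, so with a 16-bit scale the storage cost is about 4 + 16/128 = 4.125 bits per weight, versus 16 for bf16. Smaller groups track outliers better at the cost of more scales.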
```python
# GPTQ quantizes the merged model for cheaper inference (AWQ is an alternative)
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

tokenizer = AutoTokenizer.from_pretrained('llama3-finetuned')
quantized = AutoGPTQForCausalLM.from_pretrained(
    'llama3-finetuned',  # directory written by merged.save_pretrained
    quantize_config=BaseQuantizeConfig(bits=4, group_size=128),
)
# Placeholder calibration data: use a few hundred representative task samples in practice
calib = [tokenizer('Representative text from the fine-tuning task goes here.')]
quantized.quantize(calib)
quantized.save_quantized('llama3-finetuned-gptq')
```

## Decision criteria

| Situation | Approach |
|---|---|
| Style / format | Prompt engineering (try first) |
| Domain knowledge | RAG (try first) |
| Custom skill | LoRA fine-tune |
| Limited GPU | QLoRA (4-bit) |
| Fast training on limited hardware | Unsloth |
| Preference alignment | DPO |
| Best quality, cost-aware | LoRA r=64 + DPO |

**Default**: prompt → RAG → LoRA. Short on GPU memory: QLoRA. Alignment: DPO. Quality: LIMA-style curated data.

## 🔗 Graph

- Parent: [[Machine-Learning]]
- Variants: [[LoRA]] · [[QLoRA]] · [[DPO]] · [[RLHF]] · [[Fine-tuning|Instruction-Tuning]]
- Applications: [[PEFT]] · [[Axolotl]]
- Adjacent: [[Catastrophic-Forgetting]] · [[Foundation-Models]] · [[RAG]] · [[Prompt_Engineering|Prompt-Engineering]]

## 🤖 LLM usage

**When**: a specific style or skill, domain-expert behavior, alignment.

**When not**: quick fixes (use prompting), fresh data (use RAG).

## ❌ Anti-patterns

- **Fine-tuning for facts**: use RAG instead.
- **Full FT on a tiny dataset**: invites overfitting and catastrophic forgetting.
- **No eval baseline**: regressions go unnoticed.
- **Jumping straight to full FT (skipping LoRA)**: usually overkill.
- **No quantization for inference**: higher serving cost.

## 🧪 Verification / duplicates

- Verified against Hu et al. LoRA 2021, Dettmers et al. QLoRA 2023, Rafailov et al. DPO 2023, Zhou et al. LIMA 2023.
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-04-20 | Auto-reinforced |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full / LoRA / QLoRA / DPO / Unsloth / Axolotl / merge code |