---
id: wiki-2026-0508-fine-tuning
title: Fine-tuning
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [fine-tuning, FT, LoRA, QLoRA, full fine-tune, instruction tuning, continual]
duplicate_of: none
source_trust_level: A
confidence_score: 0.98
verification_status: applied
tags: [machine-learning, fine-tuning, lora, qlora, transfer-learning, peft, instruction-tuning]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: HuggingFace transformers / peft / TRL / Unsloth / Axolotl
---
# Fine-tuning

## In one line

> **"Adapting a pretrained model to a specific task or domain."** Modern default: LoRA / QLoRA (PEFT), which update only a small fraction of the parameters. Related methods: instruction tuning, RLHF, DPO. Alternatives: prompt engineering, RAG. Costs: GPU time, data, and evaluation.

## Core concepts

### Spectrum

- **Full fine-tune**: updates every weight.
- **PEFT** (parameter-efficient fine-tuning):
  - LoRA: trains a low-rank decomposition of the weight update.
  - QLoRA: LoRA on top of a 4-bit quantized base model.
  - Adapter: small trainable layers inserted into the network.
  - IA³, Prefix-tuning, Prompt-tuning.
- **Instruction tuning**: supervised training on (prompt, response) pairs.
- **DPO / SimPO**: optimize directly on preference pairs.
- **RLHF**: PPO against a learned reward model.
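Here is the LoRA bullet as a minimal pure-Python sketch. It assumes plain list-of-lists matrices rather than any framework. The frozen weight `W0` (d×k) is never modified. A trainable pair `B` (d×r) and `A` (r×k) adds a scaled low-rank update, h = W0·x + (alpha/r)·B·(A·x). As in the LoRA paper, `B` starts at zero, so training begins exactly at the pretrained model. `matvec` and `lora_forward` are illustrative helpers, not peft functions.

```python
import random

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W0, A, B, x, alpha, r):
    """h = W0 x + (alpha / r) * B (A x); only A and B would receive gradients."""
    base = matvec(W0, x)
    update = matvec(B, matvec(A, x))  # k -> r -> d: the low-rank path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

d, k, r, alpha = 8, 8, 2, 4
rng = random.Random(0)
W0 = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(d)]    # frozen, d x k
A = [[rng.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]  # trainable, r x k, random init
B = [[0.0] * r for _ in range(d)]                               # trainable, d x r, zero init
x = [rng.gauss(0, 1) for _ in range(k)]

h = lora_forward(W0, A, B, x, alpha, r)
print(h == matvec(W0, x))                         # True: zero-initialized B leaves the base output unchanged
print(f"trainable {r * (d + k)} vs full {d * k}")  # trainable 32 vs full 64
```

The gap between r·(d+k) and d·k grows quickly with real layer sizes (thousands of dimensions, r of 8 to 64). That gap is where the parameter savings come from.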
### vs alternatives

- **Prompt engineering**: cheapest, but limited.
- **Few-shot**: in-context learning (ICL); adds token cost to every request.
- **RAG**: brings in fresh knowledge at query time.
- **Fine-tune**: style, format, and domain skills.

### Applications

1. **Domain adaptation**: medical, legal.
2. **Style**: brand voice.
3. **Format**: JSON and other structured output.
4. **Tool use**: function calling.
5. **Multilingual**: low-resource languages.
6. **Safety**: refusing harmful requests.
### Modern tooling (2024+)

- **Unsloth**: roughly 2x faster training with lower VRAM, per the project's own benchmarks.
- **Axolotl**: training runs defined in a YAML config.
- **TRL**: trainers for SFT and DPO.
- **MLX-LM** (Apple): local fine-tuning on Apple silicon.

## 💻 Patterns
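### Full fine-tune (HF Trainer)

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_id = 'meta-llama/Meta-Llama-3-8B'
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama 3 ships without a pad token

args = TrainingArguments(
    output_dir='out', num_train_epochs=3,
    per_device_train_batch_size=4, gradient_accumulation_steps=4,  # effective batch 16 per device
    learning_rate=1e-5, warmup_steps=100, bf16=True,               # bf16 avoids fp16 overflow on Llama 3
    save_steps=500, eval_strategy='steps', eval_steps=500,         # `evaluation_strategy` in older transformers
)
# ds / eval_ds: tokenized datasets with an `input_ids` column (not defined here)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal LM: labels copied from input_ids
Trainer(model=model, args=args, train_dataset=ds, eval_dataset=eval_ds,
        data_collator=collator).train()
```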
### LoRA (peft)

```python
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.1,                    # update scale = lora_alpha / r = 2
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],  # attention projections only
    bias='none', task_type='CAUSAL_LM',
)
model = get_peft_model(model, config)  # `model`: the base model loaded in the previous example
model.print_trainable_parameters()     # about 0.2% of all parameters for Llama-3-8B with this config
```
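The small trainable fraction comes straight from the shapes. LoRA adds r·(d_in + d_out) parameters per adapted d_in×d_out matrix. The sketch below counts them for the config above. It assumes Llama-3-8B shapes: hidden size 4096 and 32 layers, with grouped-query attention using 8 KV heads of dimension 128, so `k_proj`/`v_proj` map 4096→1024. It also assumes a total of roughly 8.03B base parameters.

```python
# Assumed Llama-3-8B attention projection shapes: (in_features, out_features)
SHAPES = {
    'q_proj': (4096, 4096),
    'k_proj': (4096, 1024),  # 8 KV heads x 128 dims (grouped-query attention)
    'v_proj': (4096, 1024),
    'o_proj': (4096, 4096),
}
NUM_LAYERS = 32
BASE_PARAMS = 8_030_000_000  # approximate total parameter count of Llama-3-8B

def lora_params(shapes, r, num_layers):
    """Each adapted matrix gets A (r x in) and B (out x r): r * (in + out) parameters."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers

trainable = lora_params(SHAPES, r=16, num_layers=NUM_LAYERS)
print(f"LoRA params: {trainable:,} ({100 * trainable / (BASE_PARAMS + trainable):.2f}% of all)")
# LoRA params: 13,631,488 (0.17% of all)
```

Doubling `r` doubles the count. Adding the MLP projections (`gate_proj`, `up_proj`, `down_proj`) to `target_modules` raises it further.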
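### QLoRA (4-bit)

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import get_peft_model, prepare_model_for_kbit_training

bnb = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type='nf4',                           # NF4 data type from the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_use_double_quant=True,  # also quantize the quant constants
)
model = AutoModelForCausalLM.from_pretrained('meta-llama/Meta-Llama-3-8B', quantization_config=bnb)
model = prepare_model_for_kbit_training(model)  # prepares the 4-bit model for stable adapter training
model = get_peft_model(model, config)           # `config`: the LoraConfig from the LoRA example
```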
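Some back-of-the-envelope arithmetic shows why QLoRA fits on smaller GPUs. This sketch counts weight storage only and assumes about 8.03B parameters. It ignores activations, the LoRA weights and their optimizer state, quantization constants (which double quantization shrinks further), and framework overhead, so real peak memory is higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

N = 8_030_000_000  # approximate Llama-3-8B parameter count
for label, bits in [('fp32', 32), ('bf16', 16), ('int8', 8), ('nf4', 4)]:
    print(f"{label:>5}: {weight_memory_gb(N, bits):5.1f} GB")
```

At 4 bits the frozen base needs about a quarter of its bf16 footprint, which leaves room for activations and adapter training.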
### TRL SFT

```python
from trl import SFTConfig, SFTTrainer

def format_instruction(ex):
    # One training string per example: instruction header, then the target response
    return f"### Instruction\n{ex['instruction']}\n### Response\n{ex['response']}"

sft_args = SFTConfig(output_dir='sft-out', max_length=2048)  # `max_seq_length` in older TRL releases
trainer = SFTTrainer(
    model=model, args=sft_args, train_dataset=ds,  # ds: dataset with `instruction` / `response` columns
    formatting_func=format_instruction,
)
trainer.train()
```
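With a single formatted text field, the loss covers the whole string, instruction included. A common variant supervises only the response. It does this by setting the labels at prompt positions to -100, which is the default `ignore_index` of PyTorch's cross-entropy loss and is also skipped by the transformers causal-LM loss. The sketch below uses made-up token ids and no tokenizer; `build_example` is a hypothetical helper. TRL ships its own option for this, but its name has changed across releases, so check the docs for your version.

```python
IGNORE_INDEX = -100  # label value skipped by torch.nn.CrossEntropyLoss (its default ignore_index)

def build_example(prompt_ids, response_ids, eos_id, max_len):
    """Concatenate prompt + response + EOS; supervise only the response and EOS."""
    input_ids = prompt_ids + response_ids + [eos_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids + [eos_id]
    return {'input_ids': input_ids[:max_len], 'labels': labels[:max_len]}

# Hypothetical token ids, for illustration only
ex = build_example(prompt_ids=[101, 7, 8, 9], response_ids=[42, 43], eos_id=2, max_len=2048)
print(ex['labels'])  # [-100, -100, -100, -100, 42, 43, 2]
```

Labels stay aligned with `input_ids`. The model shifts them by one position internally when computing the next-token loss.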
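### DPO (preference)

```python
from trl import DPOConfig, DPOTrainer

# preference_ds columns: prompt, chosen, rejected
dpo_args = DPOConfig(output_dir='dpo-out', beta=0.1)  # beta: how strongly the policy is tied to ref_model
dpo = DPOTrainer(
    model=model, ref_model=ref_model,  # with a PEFT model, ref_model=None reuses the base with adapters disabled
    args=dpo_args,
    train_dataset=preference_ds,
    processing_class=tokenizer,        # named `tokenizer=` before TRL 0.12
)
dpo.train()
```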
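To see what `beta` controls, here is a pure-Python sketch of the per-pair DPO objective from Rafailov et al. (2023): -log σ(β · [(log π(y_w) - log π_ref(y_w)) - (log π(y_l) - log π_ref(y_l))]). It assumes you already have the summed log-probabilities of the chosen (y_w) and rejected (y_l) responses under both the policy and the frozen reference model. TRL computes these internally; this is an illustration, not its code.

```python
import math

def neg_log_sigmoid(z):
    """Numerically stable -log(sigmoid(z)) = log(1 + exp(-z))."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-pair DPO loss from summed response log-probs under policy (pi_*) and reference (ref_*)."""
    chosen_reward = beta * (pi_chosen - ref_chosen)        # implicit reward of the preferred response
    rejected_reward = beta * (pi_rejected - ref_rejected)  # implicit reward of the dispreferred one
    return neg_log_sigmoid(chosen_reward - rejected_reward)

print(dpo_loss(-40.0, -55.0, -40.0, -55.0))  # policy == reference: log(2) ~ 0.693
print(dpo_loss(-35.0, -60.0, -40.0, -55.0))  # policy prefers chosen more than the reference does: lower loss
```

A larger `beta` makes the loss react more sharply to the same log-prob margin. In effect, this keeps the policy closer to the reference.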
### Unsloth (faster)

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    'unsloth/llama-3-8b-Instruct-bnb-4bit',  # pre-quantized 4-bit checkpoint
    max_seq_length=4096, load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(model, r=16, lora_alpha=32)  # attach LoRA adapters
# Unsloth reports ~2x faster training and lower memory than a stock HF + peft setup
```
### Axolotl YAML

```yaml
base_model: meta-llama/Meta-Llama-3-8B
load_in_4bit: true
adapter: qlora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true      # attach LoRA to all linear layers
datasets:
  - path: my_data.jsonl
    type: alpaca              # instruction / input / output fields
val_set_size: 0.05
sequence_len: 2048
gradient_accumulation_steps: 4
micro_batch_size: 4
num_epochs: 3
learning_rate: 0.0002         # 2e-4; written with a decimal point so YAML parses it as a float
optimizer: adamw_bnb_8bit
output_dir: ./outputs/qlora-out
```
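### Merge LoRA back

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = 'meta-llama/Meta-Llama-3-8B'
# Load the base in bf16, not 4-bit, so the adapter is folded into full-precision weights
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
lora = PeftModel.from_pretrained(base, 'lora_checkpoint')  # adapter dir saved from the PEFT model
merged = lora.merge_and_unload()  # adds the scaled B @ A into each target weight, removes adapter modules
merged.save_pretrained('llama3-finetuned')
AutoTokenizer.from_pretrained(base_id).save_pretrained('llama3-finetuned')  # keep the tokenizer with the weights
```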
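The equation below shows what the merge computes, using the standard LoRA parameterization. The frozen weight is $W_0 \in \mathbb{R}^{d \times k}$, the trainable factors are $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$, and the scaling is $\alpha / r$, taken from `lora_alpha` and `r` (PEFT's default, non-rsLoRA setting). Because the low-rank product folds into one dense matrix, the merged model has the same shape and inference cost as the base model:

$$
h = W_0 x + \frac{\alpha}{r} B A x = \Big( W_0 + \frac{\alpha}{r} B A \Big) x = W_{\text{merged}}\, x
$$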
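### Eval (held-out)

```python
import torch

@torch.no_grad()
def eval_finetuned(model, tokenizer, eval_ds, grade):
    """Accuracy on a held-out set; `grade(pred, answer) -> bool` is supplied by the caller."""
    model.eval()
    correct = 0
    for ex in eval_ds:
        inputs = tokenizer(ex['prompt'], return_tensors='pt').to(model.device)
        out = model.generate(**inputs, max_new_tokens=128, do_sample=False)  # greedy for reproducibility
        new_tokens = out[0, inputs['input_ids'].shape[1]:]  # drop the echoed prompt tokens
        pred = tokenizer.decode(new_tokens, skip_special_tokens=True)
        correct += bool(grade(pred, ex['answer']))
    return correct / len(eval_ds)
```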
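The evaluation loop expects a `grade(pred, answer)` callable that this page does not define. For short-answer tasks, one common choice is exact match after normalization, similar in spirit to the SQuAD evaluation script. The sketch below follows that approach; the normalization rules are an assumption to adapt per task.

```python
import re
import string

_PUNCT = set(string.punctuation)

def normalize(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = ''.join(ch for ch in text.lower() if ch not in _PUNCT)
    text = re.sub(r'\b(a|an|the)\b', ' ', text)
    return ' '.join(text.split())

def grade(pred, answer):
    """Normalized exact match; replace with a task-specific checker for code, math, JSON, etc."""
    return normalize(pred) == normalize(answer)

print(grade('The Eiffel Tower.', 'eiffel tower'))  # True
print(grade('Paris', 'London'))                     # False
```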
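### Preventing catastrophic forgetting

```python
import random

# Replay: mix a slice of general / original-task data into the task-specific set
def mixed_training(specific_data, general_data, ratio=0.2, seed=0):
    rng = random.Random(seed)  # seeded so the mix is reproducible
    general = list(general_data)
    k = min(len(general), int(len(specific_data) * ratio))  # random.sample raises if k > population
    mixed = list(specific_data) + rng.sample(general, k)
    rng.shuffle(mixed)
    return mixed
```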
### Function calling fine-tune

```python
import json

def format_function_call(ex):
    # ex['user']: the user message; ex['call']: a dict, e.g. {"name": ..., "arguments": {...}}
    return {
        'prompt': f"User: {ex['user']}\n",
        'response': f"<tool_call>{json.dumps(ex['call'])}</tool_call>\n",
    }
```
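After training on the `<tool_call>` format above, the same tags let you check generations automatically, for example by measuring the share of outputs that contain a parseable call. The sketch below assumes each call is a single JSON object between the tags. The `get_weather` call is a made-up example.

```python
import json
import re

TOOL_CALL_RE = re.compile(r'<tool_call>(.*?)</tool_call>', re.DOTALL)

def parse_tool_calls(text):
    """Extract JSON payloads between <tool_call> tags; returns (calls, errors)."""
    calls, errors = [], []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            calls.append(json.loads(raw))
        except json.JSONDecodeError as exc:
            errors.append(f"{raw!r}: {exc}")
    return calls, errors

out = 'Sure.<tool_call>{"name": "get_weather", "arguments": {"city": "Seoul"}}</tool_call>'
print(parse_tool_calls(out))
```

Counting non-empty `errors` across a held-out set gives a simple format-compliance metric to track alongside task accuracy.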
### LR schedule (cosine + warmup)

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir='out',
    learning_rate=2e-4, warmup_ratio=0.03, lr_scheduler_type='cosine',  # 3% linear warmup, then cosine decay
    num_train_epochs=3,
)
```
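This pure-Python sketch shows what `warmup_ratio=0.03` with `lr_scheduler_type='cosine'` does over a run: linear warmup, then a half-period cosine decay to zero. It is written to follow the shape of the transformers cosine-with-warmup schedule, taking warmup steps as ceil(total_steps × warmup_ratio). Treat it as an illustration rather than the library's code.

```python
import math

def lr_at(step, total_steps, peak_lr=2e-4, warmup_ratio=0.03):
    """Linear warmup to peak_lr, then cosine decay to 0 (half a cosine period)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 1000
for s in (0, 15, 30, 500, 1000):
    print(s, f"{lr_at(s, total):.2e}")
```

Warmup avoids large early updates while optimizer statistics are still noisy. The cosine tail spends the last part of training at small learning rates.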
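### Data quality (LIMA-style)

```python
# LIMA (Zhou et al., 2023): ~1,000 carefully curated examples gave strong instruction following;
# curation mattered more than sheer volume
def filter_quality(dataset, criteria):
    # Keep an example only if every criterion (a predicate on the example) passes
    return [d for d in dataset if all(c(d) for c in criteria)]
```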
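The criteria can be simple predicates. The sketch below pairs two of them with prompt-level deduplication. The criteria and thresholds are hypothetical and chosen for illustration only; they are not LIMA's procedure. The checks run on dicts with `prompt` and `response` keys.

```python
# Hypothetical criteria and thresholds, for illustration only (not LIMA's procedure)
def long_enough(d, min_chars=20):
    return len(d['response'].strip()) >= min_chars

def ends_cleanly(d):
    # A crude truncation check: the response ends with terminal punctuation
    return d['response'].rstrip().endswith(('.', '!', '?'))

def dedupe(dataset):
    """Keep the first example for each whitespace/case-normalized prompt."""
    seen, kept = set(), []
    for d in dataset:
        key = ' '.join(d['prompt'].lower().split())
        if key not in seen:
            seen.add(key)
            kept.append(d)
    return kept

CRITERIA = [long_enough, ends_cleanly]
data = [
    {'prompt': 'Explain LoRA.', 'response': 'LoRA trains a low-rank update on top of frozen weights.'},
    {'prompt': 'explain   LoRA.', 'response': 'A second answer to the same prompt.'},  # duplicate prompt
    {'prompt': 'Hi', 'response': 'Hello!'},                                           # too short
    {'prompt': 'List three PEFT methods.', 'response': 'LoRA, prefix-tuning, and'},  # truncated
]
curated = [d for d in dedupe(data) if all(c(d) for c in CRITERIA)]
print(len(curated))  # 1
```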
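### MLX-LM (Apple)

```bash
pip install mlx-lm
# --data points to a directory containing train.jsonl (and valid.jsonl)
# --num-layers was called --lora-layers in older mlx-lm releases
python -m mlx_lm.lora --model meta-llama/Meta-Llama-3-8B --train --data ./data \
  --batch-size 1 --num-layers 16
```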
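### Post-training quantization for inference (GPTQ)

```python
# GPTQ (and AWQ) are post-training methods: they compress an already-trained model for cheaper inference
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

merged_dir = 'llama3-finetuned'  # output of the merge step
tokenizer = AutoTokenizer.from_pretrained(merged_dir)
quantized = AutoGPTQForCausalLM.from_pretrained(
    merged_dir, BaseQuantizeConfig(bits=4, group_size=128),
)
examples = [tokenizer('A representative sentence from the target domain.')]  # calibration set; use many more samples in practice
quantized.quantize(examples)
quantized.save_quantized('llama3-finetuned-gptq')
```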
## Decision criteria

| Situation | Approach |
|---|---|
| Style / format | Prompt engineering (try first) |
| Domain knowledge | RAG (try first) |
| Custom skill | LoRA fine-tune |
| Limited GPU | QLoRA (4-bit) |
| Faster training, less VRAM | Unsloth |
| Preference alignment | DPO |
| Best quality, cost-aware | LoRA r=64 + DPO |

**Default**: prompt → RAG → LoRA. Short on GPU: QLoRA. Alignment: DPO. Quality: LIMA-style curated data.
## 🔗 Graph

- Parent: [[Machine-Learning]]
- Variants: [[LoRA]] · [[QLoRA]] · [[DPO]] · [[RLHF]] · [[Fine-tuning|Instruction-Tuning]]
- Applications: [[PEFT]] · [[Axolotl]]
- Adjacent: [[Catastrophic-Forgetting]] · [[Foundation-Models]] · [[RAG]] · [[Prompt_Engineering|Prompt-Engineering]]
## 🤖 LLM usage

**When**: a specific style or skill; domain expertise; alignment.

**When not**: quick fixes (use prompting); fresh data (use RAG).

## ❌ Anti-patterns

- **Fine-tuning for facts**: use RAG instead.
- **Full FT on a tiny dataset**: invites catastrophic forgetting.
- **No eval baseline**: regressions go unnoticed.
- **Jumping to full FT without trying LoRA**: usually overkill.
- **No quantized inference**: pays unnecessary serving cost.
## 🧪 Verification / duplicates

- Verified against Hu et al. (LoRA, 2021), Dettmers et al. (QLoRA, 2023), Rafailov et al. (DPO, 2023), and Zhou et al. (LIMA, 2023).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-04-20 | Auto-reinforced |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full / LoRA / QLoRA / DPO / Unsloth / Axolotl / merge code |