[G1-Sync] Manual knowledge update
@@ -0,0 +1,355 @@
---
id: ai-fine-tune-practical
title: Fine-tune Practical — LoRA / QLoRA / OpenAI API
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [ai, fine-tune, vibe-coding]
tech_stack: { language: "Python", applicable_to: ["AI"] }
applied_in: []
aliases: [LoRA, QLoRA, fine-tune, OpenAI fine-tuning, Anthropic, Together, Axolotl]
---

# Fine-tune Practical

> When prompting plus RAG is not enough, fine-tune. **OpenAI / Anthropic API (managed) or LoRA self-host (cheap)**.
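
## 📖 Core concepts
- Prompting is enough for most cases.
- Fine-tuning is for style, format, and domain adaptation.
- LoRA is parameter-efficient: it trains small low-rank adapter matrices while the base weights stay frozen.
- The sweet spot is roughly 100 to 10,000 examples; narrow format or style tasks often need only 50 to 1,000.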

## 💻 Code patterns

### When to fine-tune?
```
✓ Specific format (always JSON, a specific style).
✓ Domain knowledge (legal, medical).
✓ Latency / cost (a fine-tuned small model can match a large model on a narrow task).
✓ Brand voice.

✗ Generic "better quality".
✗ Facts (RAG works better).
✗ Recent information (training cutoff).

→ Try prompting first, then RAG. Fine-tune only if both fall short.
```

### OpenAI fine-tune
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the training data (chat-format JSONL)
with open('data.jsonl', 'rb') as f:
    file = client.files.create(file=f, purpose='fine-tune')

# Create the fine-tuning job
job = client.fine_tuning.jobs.create(
    training_file=file.id,
    model='gpt-4o-mini-2024-07-18',
)

# Check status (repeat until the job reaches a terminal state)
job = client.fine_tuning.jobs.retrieve(job.id)
print(job.status)            # e.g. 'running', then 'succeeded'
print(job.fine_tuned_model)  # None until success, then 'ft:gpt-4o-mini:...'
```
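
A single status check only returns a snapshot, and jobs usually run for a while. The loop below is a minimal polling sketch. `wait_for_job` and its `retrieve` parameter are hypothetical names introduced here; the terminal statuses are the ones the OpenAI fine-tuning API reports.

```python
import time
from typing import Callable

# Terminal job states reported by the OpenAI fine-tuning API.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(retrieve: Callable[[], object], poll_seconds: float = 30.0,
                 max_polls: int = 240):
    """Call `retrieve()` until the returned job is in a terminal state.

    `retrieve` is any zero-argument callable returning an object with a
    `.status` attribute, e.g.
    `lambda: client.fine_tuning.jobs.retrieve(job_id)`.
    """
    for _ in range(max_polls):
        job = retrieve()
        if job.status in TERMINAL_STATUSES:
            return job
        time.sleep(poll_seconds)
    raise TimeoutError("fine-tuning job did not finish in time")
```

Passing a callable rather than a client keeps the helper testable without network access.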

### Data format (chat)
```jsonl
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
{"messages": [...]}
```

→ 50 to 1,000 examples is typical.
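
Malformed lines are a common reason an upload gets rejected, so it can help to check the file locally first. The sketch below is one way to do that for plain text chat data. `validate_chat_jsonl` is a hypothetical helper, and the role set and the "last message must come from the assistant" rule are simplifying assumptions (tool-calling examples would need more cases).

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_jsonl(lines):
    """Return a list of (line_number, problem) tuples; empty means all good."""
    problems = []
    for n, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((n, "not valid JSON"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((n, "missing or empty 'messages' list"))
            continue
        for m in messages:
            if not isinstance(m, dict) or m.get("role") not in ALLOWED_ROLES:
                problems.append((n, "message with unknown role"))
                break
            if not isinstance(m.get("content"), str):
                problems.append((n, "message content is not a string"))
                break
        else:
            # Reached only when every message passed the checks above.
            if messages[-1]["role"] != "assistant":
                problems.append((n, "last message is not from the assistant"))
    return problems
```

Typical use would be `with open('data.jsonl') as f: print(validate_chat_jsonl(f))`.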
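
### Use
```python
from openai import OpenAI

client = OpenAI()

# Call the fine-tuned model exactly like a base model
r = client.chat.completions.create(
    model='ft:gpt-4o-mini:my-org::abc',
    messages=[{'role': 'user', 'content': '...'}],
)
```

→ Drop-in replacement. Prompts get shorter because the instructions learned during training no longer need to be sent (the system prompt becomes implicit).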

### Anthropic fine-tune
```
Anthropic's own fine-tuning offering is limited.
- Claude Opus / Sonnet respond strongly to prompting.
- To cut cost, use Haiku.
- Custom fine-tuning is enterprise-only.
```

→ For most cases, prompting plus RAG is enough.

### LoRA self-host (HuggingFace)
```python
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model_id = 'meta-llama/Meta-Llama-3-8B-Instruct'  # Hugging Face repo id
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Assumption: a JSONL file whose records have a 'text' field
dataset = load_dataset('json', data_files='data.jsonl', split='train')

config = LoraConfig(
    r=16, lora_alpha=32,
    target_modules=['q_proj', 'v_proj'],
    lora_dropout=0.05,
    task_type='CAUSAL_LM',
)
model = get_peft_model(model, config)  # freezes the base weights, adds adapters

# Recent TRL releases take dataset_text_field via SFTConfig;
# older ones accepted it on SFTTrainer together with TrainingArguments.
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    args=SFTConfig(output_dir='./lora', num_train_epochs=3, dataset_text_field='text'),
)
trainer.train()
```

→ A single A100 is enough for an 8B model.
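
To make the `r` and `lora_alpha` settings concrete, here is a toy sketch of the LoRA forward pass in plain Python. It assumes the standard formulation, where the frozen weight `W` is combined with a low-rank update `B A` scaled by `lora_alpha / r`. The helper names and the 3 x 3 toy matrices are illustrative only.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, r, lora_alpha):
    """Compute W x + (lora_alpha / r) * B (A x).

    W is the frozen d_out x d_in weight; A (r x d_in) and B (d_out x r)
    are the trainable low-rank factors.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = lora_alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Toy example: d_in = d_out = 3, rank r = 1.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[1.0, 1.0, 1.0]]           # r x d_in
B_zero = [[0.0], [0.0], [0.0]]  # d_out x r, zero at initialisation
x = [1.0, 2.0, 3.0]

# With B = 0 the adapter is a no-op, so training starts from the base model.
print(lora_forward(W, A, B_zero, x, r=1, lora_alpha=2))  # [1.0, 2.0, 3.0]
```

Only `A` and `B` receive gradients, which is why the trainable parameter count stays small.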
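
### QLoRA (4-bit)
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = 'meta-llama/Meta-Llama-3-8B-Instruct'  # same base model as the LoRA example

# 4-bit NF4 quantization for the frozen base weights; compute runs in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
```

→ A 70B model fits on a single 80 GB A100. The 4-bit weights alone take roughly 35 GB, so a 40 GB card leaves very little room for activations and adapter state.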
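
The GPU sizing follows from simple arithmetic on the weights. The sketch below estimates weight memory only; activations, optimizer state for the adapters, and quantization overhead come on top, so treat the results as lower bounds. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


for name, n_params in [("8B", 8e9), ("70B", 70e9)]:
    fp16 = weight_memory_gb(n_params, 16)
    nf4 = weight_memory_gb(n_params, 4)
    print(f"{name}: 16-bit {fp16:.0f} GB, 4-bit {nf4:.0f} GB")
```

For a 70B model this gives about 140 GB in 16-bit and about 35 GB in 4-bit, before any training overhead.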

### Axolotl (config-based)
```yaml
# config.yml
base_model: meta-llama/Meta-Llama-3-8B-Instruct
adapter: lora
lora_r: 16
lora_alpha: 32

datasets:
  - path: my_data.jsonl
    type: chat_template

sequence_len: 2048
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 2e-4
```

```bash
axolotl train config.yml
```

→ Best practices are already baked in.

### Together AI / Replicate (managed)
```python
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

job = client.fine_tuning.create(
    training_file='file-...',
    model='meta-llama/Llama-3-8B-Instruct',
    n_epochs=3,
)
```

→ Managed training without the hassle of self-hosting.

### Data quality > quantity
```
50 high-quality examples beat 5,000 noisy ones.

→ Curate manually.
- Diverse examples.
- Consistent format.
- Clean (no typos, no errors).
- Edge cases included.
```
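
Part of the curation can be automated before the manual pass. The sketch below drops incomplete and duplicate examples. `clean_examples` and the `input` / `output` field names are assumptions chosen for illustration, not a fixed schema.

```python
def clean_examples(examples):
    """Drop empty and duplicate examples, keeping first occurrences in order."""
    seen = set()
    kept = []
    for ex in examples:
        prompt = ex.get("input", "").strip()
        answer = ex.get("output", "").strip()
        if not prompt or not answer:
            continue  # incomplete example
        # Normalise case and whitespace so near-identical prompts collide.
        key = (" ".join(prompt.lower().split()), answer)
        if key in seen:
            continue  # duplicate
        seen.add(key)
        kept.append({"input": prompt, "output": answer})
    return kept
```

This only catches mechanical problems; diversity and edge-case coverage still need a human review.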
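
### Eval
```python
# Held-out test set (never seen during training)
test = [...]  # list of {'input': ..., 'expected': ...}

scores = []
for ex in test:
    pred = model.generate(ex['input'])          # model: your inference wrapper
    scores.append(match(pred, ex['expected']))  # match: task-specific metric

print(sum(scores) / len(scores))  # mean score over the test set
```

→ Benchmark against the base model.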
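
A comparison against the base model needs both models scored on the same held-out set with the same metric. The sketch below assumes each model is wrapped as a plain function from input text to output text; `exact_match`, `accuracy`, and `compare` are hypothetical helpers, and exact match is only one possible metric.

```python
def exact_match(pred: str, expected: str) -> bool:
    """Strict metric: equal after trimming surrounding whitespace."""
    return pred.strip() == expected.strip()


def accuracy(generate, test_set, metric=exact_match) -> float:
    """Fraction of test examples where `generate(input)` satisfies `metric`."""
    hits = sum(metric(generate(ex["input"]), ex["expected"]) for ex in test_set)
    return hits / len(test_set)


def compare(base_generate, tuned_generate, test_set):
    """Score both models on the same held-out set."""
    return {
        "base": accuracy(base_generate, test_set),
        "fine_tuned": accuracy(tuned_generate, test_set),
    }
```

If the fine-tuned score does not clearly exceed the base score, the fine-tune is not paying for itself.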

### Cost
```
OpenAI fine-tune (gpt-4o-mini):
- $25 / 1M training tokens (illustrative; published rates differ by model and change over time).
- 1k examples × 500 tokens = 500k tokens ≈ $13 per epoch.

Self-host LoRA (8B):
- 1 A100 hour = $1-3.
- 1k examples × 3 epochs = 1-2 hours = $1-6.
- Cheap compared to API generation costs.

Inference:
- Fine-tuned API: priced at a premium over the base model (quoted here as about 12%; the actual premium varies by model).
- Self-host: GPU rental.
```
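
The arithmetic behind these figures can be sketched as follows. The prices are the illustrative ones from the block above, not current list prices, and the function names are hypothetical. Managed training is typically billed on dataset tokens multiplied by the number of epochs.

```python
def training_cost_usd(n_examples, tokens_per_example, price_per_million, epochs=1):
    """Managed fine-tuning cost: billed tokens = dataset tokens x epochs."""
    billed_tokens = n_examples * tokens_per_example * epochs
    return billed_tokens / 1_000_000 * price_per_million


def gpu_cost_usd(hours, price_per_hour):
    """Self-hosted cost: rented GPU hours x hourly rate."""
    return hours * price_per_hour


print(training_cost_usd(1_000, 500, 25.0))            # 12.5
print(training_cost_usd(1_000, 500, 25.0, epochs=3))  # 37.5
print(gpu_cost_usd(1, 1.0), gpu_cost_usd(2, 3.0))     # 1.0 6.0
```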

### When small model fine-tune > big prompt
```
Big model + complex prompt:
- $10 / 1M tokens.
- 5 sec latency.

Small model fine-tuned:
- $0.30 / 1M tokens.
- 1 sec latency.

→ Comparable quality on the target task, about 33x cheaper and 5x faster.

But: only for that specific task. Generic work still needs the big model.
```
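
The ratios follow directly from the figures in the block above. The sketch also adds a hypothetical `break_even_tokens` helper to show how quickly a one-off training cost is recovered; the $13 training cost used in the usage note is the illustrative figure from the cost section.

```python
big = {"usd_per_million_tokens": 10.00, "latency_s": 5.0}
small = {"usd_per_million_tokens": 0.30, "latency_s": 1.0}

cost_ratio = big["usd_per_million_tokens"] / small["usd_per_million_tokens"]
speedup = big["latency_s"] / small["latency_s"]
print(f"{cost_ratio:.0f}x cheaper, {speedup:.0f}x faster")  # 33x cheaper, 5x faster


def break_even_tokens(one_off_training_usd, big_price, small_price):
    """Inference tokens after which the training cost has paid for itself."""
    saving_per_token = (big_price - small_price) / 1_000_000
    return one_off_training_usd / saving_per_token
```

With these numbers, `break_even_tokens(13, 10.0, 0.30)` comes out at roughly 1.3 million tokens, so the saving only matters for workloads with real volume.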

### DPO (alignment)
```python
from trl import DPOConfig, DPOTrainer

# sft_model: the supervised fine-tuned policy; sft_frozen: a frozen copy used as reference
trainer = DPOTrainer(
    model=sft_model,
    ref_model=sft_frozen,
    train_dataset=preferences,  # records with 'prompt', 'chosen', 'rejected'
    args=DPOConfig(output_dir='./dpo', beta=0.1),  # older TRL took beta= on the trainer
)
trainer.train()
```

→ Preference learning.
→ [[AI_RLHF_DPO_Basics]].
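
What `beta` controls becomes clearer from the loss itself. The sketch below computes the standard DPO loss for a single preference pair from summed sequence log-probabilities. It is a plain-Python illustration with a hypothetical function name, not the TRL implementation.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, from sequence log-probabilities.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp)
    # -log(sigmoid(z)) == log(1 + exp(-z))
    return math.log1p(math.exp(-beta * margin))
```

When the policy equals the reference, the margin is zero and the loss is log 2. The loss falls as the policy shifts probability toward the chosen answer relative to the reference, and a larger `beta` penalises drifting from the reference more sharply.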

### Production deploy
```
LoRA adapter:
- Small: from roughly 10 MB to 1 GB, depending on rank and target modules.
- Each user or tenant can have its own adapter (multi-tenant).
- vLLM can serve N adapters on top of one base model.

Full fine-tune:
- Large checkpoint (16-140 GB).
- Needs its own dedicated instance.
```

→ LoRA is the answer to serving cost.
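
Adapter size depends strongly on the rank and on how many modules are adapted, and it can be estimated up front. The sketch below counts trainable LoRA parameters for the `r=16`, `q_proj` + `v_proj` setup used earlier. The layer shapes are assumptions based on the published Llama 3 8B configuration, and `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(rank, module_shapes, n_layers):
    """Trainable parameters: each adapted d_in x d_out layer adds r * (d_in + d_out)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in module_shapes)
    return per_layer * n_layers


# Assumed Llama 3 8B shapes: hidden size 4096, 32 layers,
# q_proj 4096 -> 4096, v_proj 4096 -> 1024 (grouped-query attention).
params = lora_param_count(rank=16, module_shapes=[(4096, 4096), (4096, 1024)], n_layers=32)
print(params)                           # 6815744
print(f"{params * 2 / 1e6:.1f} MB")     # 13.6 MB at 2 bytes per parameter
```

Adapting every linear layer or raising the rank increases this by one to two orders of magnitude, which is still far below a full checkpoint.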

### Multi-LoRA serving
```bash
vllm serve meta-llama/Meta-Llama-3-8B \
  --enable-lora \
  --lora-modules customer1=path1 customer2=path2
```

→ N customers × 1 base model.
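
On the client side, vLLM's OpenAI-compatible server selects an adapter through the `model` field of the request. The sketch below builds such a payload per customer. The customer ids, the `ADAPTERS` mapping, and `build_chat_request` are hypothetical; the adapter names mirror the `--lora-modules` flags above.

```python
def build_chat_request(customer_id, user_message, adapters):
    """Build an OpenAI-style chat payload that selects a LoRA adapter by name."""
    if customer_id not in adapters:
        raise KeyError(f"no adapter registered for {customer_id!r}")
    return {
        "model": adapters[customer_id],  # adapter name given to --lora-modules
        "messages": [{"role": "user", "content": user_message}],
    }


# Hypothetical mapping from tenant id to adapter name.
ADAPTERS = {"acme": "customer1", "globex": "customer2"}
payload = build_chat_request("acme", "Summarise my last invoice.", ADAPTERS)
# POST `payload` as JSON to the server's /v1/chat/completions endpoint.
```

Failing loudly on an unknown tenant avoids silently falling back to another customer's adapter.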

### When NOT to fine-tune?
```
- Generic task: prompting is enough.
- Small dataset (< 50 examples): use few-shot prompting.
- Frequently changing requirements: retraining cost adds up.
- Simple format: use structured output.
```

### Model selection
```
Open:
- Llama 3 (Meta).
- Mistral (Mistral AI).
- Gemma (Google).
- Qwen (Alibaba).

→ Balance license, size, and quality.
```

### Synthetic data
```python
# Generate training data with a stronger "teacher" model (e.g. GPT-4).
prompts = [...]
training_data = [
    {'input': p, 'output': gpt4.complete(p)}  # gpt4.complete: placeholder for your teacher-model call
    for p in prompts
]

# A smaller "student" model is then fine-tuned to mimic these outputs.
```

→ A form of distillation.
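
To feed teacher outputs into a fine-tuning job, they have to end up in the chat JSONL format shown earlier. The sketch below does that with an injectable `teacher` callable, so any client can be plugged in. `build_distillation_set` is a hypothetical helper, and skipping empty answers is a design choice made here, not a requirement.

```python
import json


def build_distillation_set(prompts, teacher, system_prompt=None):
    """Return chat-format JSONL lines with the teacher's answers as targets."""
    lines = []
    for prompt in prompts:
        answer = teacher(prompt)
        if not answer or not answer.strip():
            continue  # skip empty teacher outputs rather than train on them
        messages = []
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        messages.append({"role": "user", "content": prompt})
        messages.append({"role": "assistant", "content": answer.strip()})
        lines.append(json.dumps({"messages": messages}, ensure_ascii=False))
    return lines
```

Writing the result with `open('data.jsonl', 'w').write('\n'.join(lines))` would produce a file in the upload format.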

### Fine-tune for code
```
Code-specific:
- DeepSeek Coder.
- CodeLlama.
- StarCoder 2.

→ A domain-specific base model is the better starting point.
```

### Continuous fine-tune
```
Production:
- Feed new data into the model every day or week.
- The serving model is always the most recent fine-tune.

→ Adapts to drift.
```

### Pitfalls
```
- Overfitting (small dataset).
- Catastrophic forgetting (heavy fine-tuning).
- Eval set leaked into the training data.
- Weaker general ability after fine-tuning (the model becomes specialized).
- Cost ends up higher than the prompt-based approach.
```
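
The eval-leak pitfall is easy to check mechanically. The sketch below flags eval examples whose input also appears in the training set after light normalisation. `find_leaks` and the `input` field name are assumptions, and near-duplicates that differ by more than case or whitespace would need fuzzier matching.

```python
def normalise(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def find_leaks(train_set, eval_set, key="input"):
    """Return eval examples whose normalised `key` also appears in the training set."""
    train_keys = {normalise(ex[key]) for ex in train_set}
    return [ex for ex in eval_set if normalise(ex[key]) in train_keys]
```

Running this before every training run and removing the flagged examples from one of the two sets keeps the reported score honest.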

## 🤔 Decision criteria
| Situation | Recommendation |
|---|---|
| Generic | Prompt + RAG |
| Specific format | OpenAI fine-tune |
| Cost / latency | Fine-tune small open model |
| Domain knowledge | Fine-tune + RAG |
| Open / self-host | LoRA + Axolotl |
| Managed | Together / Replicate / OpenAI |
| Privacy | Self-host |

## ❌ Anti-patterns
- **Fine-tuning as the first attempt**: try prompting and RAG first.
- **Small dataset (< 50)**: use few-shot prompting instead.
- **Eval leak**: the scores are inflated and meaningless.
- **Catastrophic forgetting**: use a gentle (low) learning rate.
- **No version control of fine-tunes**: you lose track of which data and settings produced which model.
- **Monolithic model**: prefer modular LoRA adapters.

## 🤖 LLM usage hints
- LoRA / QLoRA is the answer to cost.
- Axolotl encodes best practices.
- 50 to 1,000 examples is the sweet spot.
- Multi-LoRA serving (vLLM).

## 🔗 Related documents
- [[AI_Fine_Tuning_vs_Prompting]]
- [[AI_RLHF_DPO_Basics]]
- [[AI_Production_Deploy]]