---
id: ai-fine-tune-practical
title: Fine-tune Practical — LoRA / QLoRA / OpenAI API
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [ai, fine-tune, vibe-coding]
tech_stack: { language: "Python", applicable_to: ["AI"] }
applied_in: []
aliases: [LoRA, QLoRA, fine-tune, OpenAI fine-tuning, Anthropic, Together, Axolotl]
---

# Fine-tune Practical

> When prompt + RAG is not enough, fine-tune. **Use a managed API (OpenAI / Anthropic) or self-host LoRA (cheap)**.

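## 📖 Key Concepts
- Prompting is enough for most cases.
- Fine-tuning targets style, format, or domain behavior.
- LoRA is parameter-efficient: it trains small low-rank adapter matrices and leaves the base weights frozen.
- Roughly 100-10,000 examples is the sweet spot (50-1,000 is typical for managed API fine-tuning).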

## 💻 코드 패턴

### When fine-tune?
```
✓ Specific format (always JSON, specific style).
✓ Domain knowledge (legal, medical).
✓ Latency / cost (a fine-tuned small model can match a large model on a narrow task).
✓ Brand voice.

✗ "Better quality" generic.
✗ Facts (RAG works better).
✗ Recent info (cutoff).

→ Try prompting. Try RAG. Fine-tune only if both fall short.
```

### OpenAI fine-tune
```python
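import openai  # module-level client; reads OPENAI_API_KEY from the environment

# Upload the training data (JSONL, one chat example per line)
with open('data.jsonl', 'rb') as f:
    file = openai.files.create(file=f, purpose='fine-tune')

# Create the fine-tuning job
job = openai.fine_tuning.jobs.create(
    training_file=file.id,
    model='gpt-4o-mini-2024-07-18'
)

# Poll the status; a job takes a while to finish
job = openai.fine_tuning.jobs.retrieve(job.id)
print(job.status)            # e.g. 'running', then 'succeeded'
print(job.fine_tuned_model)  # None until the job succeeds, then 'ft:gpt-4o-mini:...'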
```

### Data format (chat)
```jsonl
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
{"messages": [...]}
```
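
It can help to check the file locally before uploading. The sketch below is a hypothetical helper (`validate_chat_line` is not part of any SDK). It checks only the plain chat structure shown above, and the allowed-role set is an assumption limited to the roles used in this note.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}  # assumption: plain chat data only

def validate_chat_line(line: str) -> list[str]:
    """Return the problems found in one JSONL line (empty list means OK)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]
    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i}: not an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i}: content must be a string")
    last = messages[-1]
    if not isinstance(last, dict) or last.get("role") != "assistant":
        problems.append("last message is not an assistant reply")
    return problems

good = json.dumps({"messages": [
    {"role": "system", "content": "Answer tersely."},
    {"role": "user", "content": "Status?"},
    {"role": "assistant", "content": "OK"},
]})
bad = json.dumps({"messages": [{"role": "user", "content": "Status?"}]})

print(validate_chat_line(good))
print(validate_chat_line(bad))
```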

→ 50-1,000 examples is typical.

### Use
```python
r = openai.chat.completions.create(
    model='ft:gpt-4o-mini:my-org::abc',
    messages=[{'role': 'user', 'content': '...'}]
)
```

→ Drop-in replacement. Prompts can shrink because the behavior the system prompt used to spell out is now learned by the model.

### Anthropic fine-tune
```
Anthropic's own fine-tuning offering is limited.
- Claude Opus / Sonnet respond well to prompting alone.
- To cut cost, use Haiku.
- Custom fine-tuning is enterprise-only.
```

→ In most cases, prompt + RAG is enough.

### LoRA self-host (HuggingFace)
```python
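import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model
from trl import SFTConfig, SFTTrainer

model_id = 'meta-llama/Meta-Llama-3-8B-Instruct'  # gated repo: accept the license on the Hub first
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Assumed data layout: a JSONL file whose records each carry a 'text' field
dataset = load_dataset('json', data_files='train.jsonl', split='train')

config = LoraConfig(
    r=16, lora_alpha=32,
    target_modules=['q_proj', 'v_proj'],
    lora_dropout=0.05,
    task_type='CAUSAL_LM',
)
model = get_peft_model(model, config)  # only the adapter weights stay trainable

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    # Recent TRL takes dataset_text_field via SFTConfig; older releases took it on SFTTrainer
    args=SFTConfig(output_dir='./lora', num_train_epochs=3, dataset_text_field='text'),
)
trainer.train()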
```

→ A single A100 is enough for an 8B model.

### QLoRA (4-bit)
```python
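import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization of the frozen base weights (requires the bitsandbytes package)
config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# model_id as in the LoRA example; the LoRA adapter is then attached on top as before
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=config)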
```

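→ A 70B model becomes trainable on a single 80 GB A100. The 4-bit weights alone take about 35 GB, so a 40 GB card leaves too little room for activations and adapter state.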

### Axolotl (config-based)
```yaml
# config.yml
base_model: meta-llama/Meta-Llama-3-8B-Instruct
adapter: lora
lora_r: 16
lora_alpha: 32

datasets:
  - path: my_data.jsonl
    type: chat_template

sequence_len: 2048
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 2e-4
```

```bash
axolotl train config.yml
```

→ Best practices are already baked in.

### Together AI / Replicate (managed)
```python
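from together import Together

# Named `client` so the instance does not shadow the `together` package
client = Together()  # reads TOGETHER_API_KEY from the environment

job = client.fine_tuning.create(
    training_file='file-...',
    model='meta-llama/Llama-3-8B-Instruct',
    n_epochs=3,
)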
```

→ No self-hosting hassle.

### Data quality > quantity
```
50 high-quality > 5000 noisy.

→ Manual curation.
- Diverse examples.
- Consistent format.
- Clean (no typo, error).
- Edge cases.
```
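
Part of the curation above can be automated before the manual pass. A minimal sketch, assuming examples are dicts with `input` and `output` keys; the `curate` helper and the `max_chars` threshold are hypothetical choices, not a standard.

```python
def curate(examples, max_chars=4000):
    """Drop empty, oversized, and duplicate examples while preserving order."""
    seen = set()
    kept = []
    for ex in examples:
        text_in = ex["input"].strip()
        text_out = ex["output"].strip()
        if not text_in or not text_out:
            continue  # incomplete example
        if len(text_in) + len(text_out) > max_chars:
            continue  # likely to be truncated during training
        # Case- and whitespace-insensitive key for duplicate detection
        key = (" ".join(text_in.lower().split()), " ".join(text_out.lower().split()))
        if key in seen:
            continue
        seen.add(key)
        kept.append({"input": text_in, "output": text_out})
    return kept

raw = [
    {"input": "Refund policy?", "output": "30 days."},
    {"input": "refund  policy?", "output": "30 days."},   # duplicate after normalization
    {"input": "Shipping time?", "output": ""},            # empty output
    {"input": "Shipping time?", "output": "3-5 business days."},
]
clean = curate(raw)
print(len(raw), "->", len(clean))
```

Diversity and edge-case coverage still need a human review; this only removes the obvious noise.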

### Eval
```python
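# Held-out test set: these examples must never appear in the training data
test = [...]

scores = []
for ex in test:
    pred = generate(ex['input'])  # `generate`: hypothetical wrapper around your inference call
    scores.append(float(pred.strip() == ex['expected'].strip()))  # exact match

print(f'accuracy: {sum(scores) / len(scores):.2%}')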
```

→ Benchmark vs base model.
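
The base-vs-tuned comparison can be sketched as one harness run against both models. The two lookup-table "models" here are stand-ins for real inference calls, and exact match is an assumed metric; a task with free-form output would need a different one.

```python
from typing import Callable

def accuracy(generate: Callable[[str], str], test_set: list[dict]) -> float:
    """Fraction of held-out examples whose output matches exactly."""
    hits = sum(
        generate(ex["input"]).strip() == ex["expected"].strip()
        for ex in test_set
    )
    return hits / len(test_set)

test_set = [
    {"input": "2+2", "expected": "4"},
    {"input": "3+3", "expected": "6"},
]

# Stand-ins for real models: canned outputs keyed by prompt
BASE_OUTPUTS = {"2+2": "The answer is 4", "3+3": "6"}
TUNED_OUTPUTS = {"2+2": "4", "3+3": "6"}

base_acc = accuracy(BASE_OUTPUTS.__getitem__, test_set)
tuned_acc = accuracy(TUNED_OUTPUTS.__getitem__, test_set)
print(f"base={base_acc:.2f} tuned={tuned_acc:.2f}")
```

If the tuned model does not beat the base model on the held-out set, the fine-tune is not paying for itself.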

### Cost
```
OpenAI fine-tune (gpt-4o-mini):
- About $3 / 1M training tokens (gpt-4o is about $25; check current pricing).
- 1k examples × 500 tokens = 500k tokens ≈ $1.50 per epoch.

Self-host LoRA (8B):
- 1 A100 hour = $1-3.
- 1k example × 3 epoch = 1-2 hour = $2-6.
- Compared to API generation: cheap.

Inference:
- Fine-tuned API: priced above the base model per token (roughly 1.5-2x for OpenAI models; check current pricing).
- Self-host: GPU rental.
```
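
The arithmetic above can be wrapped in two small functions. This is a sketch only: the rates below are placeholders rather than real prices, and it assumes training is billed on tokens × epochs.

```python
def training_cost_usd(n_examples, tokens_per_example, epochs, usd_per_million_tokens):
    """Managed fine-tuning cost, assuming billing on tokens seen across all epochs."""
    billed_tokens = n_examples * tokens_per_example * epochs
    return billed_tokens / 1_000_000 * usd_per_million_tokens

def gpu_cost_usd(hours, usd_per_hour):
    """Self-hosted training cost from rented GPU time."""
    return hours * usd_per_hour

PLACEHOLDER_RATE = 10.0  # USD per 1M training tokens; substitute the current price

managed = training_cost_usd(1_000, 500, epochs=3, usd_per_million_tokens=PLACEHOLDER_RATE)
self_hosted = gpu_cost_usd(hours=1.5, usd_per_hour=2.0)
print(f"managed=${managed:.2f} self-hosted=${self_hosted:.2f}")
```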

### When small model fine-tune > big prompt
```
Big model + complex prompt:
- $10 / 1M token.
- 5 sec latency.

Small model fine-tuned:
- $0.30 / 1M token.
- 1 sec latency.

→ Comparable quality on the target task, ~30x cheaper, ~5x faster.

But: only specific task. Generic = big model.
```
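
Whether the switch pays off depends on volume, since fine-tuning carries a one-time cost. A rough break-even sketch using the per-token prices above; the one-time cost figure is a made-up placeholder covering training plus engineering time.

```python
def breakeven_million_tokens(one_time_cost_usd, big_usd_per_m, small_usd_per_m):
    """Millions of inference tokens needed before the fine-tune pays for itself."""
    saving_per_m = big_usd_per_m - small_usd_per_m
    if saving_per_m <= 0:
        raise ValueError("the small model must be cheaper per token")
    return one_time_cost_usd / saving_per_m

BIG, SMALL = 10.0, 0.30      # USD per 1M tokens, from the comparison above
ONE_TIME = 500.0             # hypothetical: training run plus engineering time

print(f"price ratio: {BIG / SMALL:.0f}x")
print(f"break-even: {breakeven_million_tokens(ONE_TIME, BIG, SMALL):.1f}M tokens")
```

Below that volume, the big model with a complex prompt remains the cheaper option overall.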

### DPO (alignment)
```python
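from trl import DPOConfig, DPOTrainer

trainer = DPOTrainer(
    model=sft_model,             # policy being trained, initialized from the SFT model
    ref_model=sft_frozen,        # frozen copy used as the reference
    args=DPOConfig(output_dir='./dpo', beta=0.1),  # recent TRL: beta lives in DPOConfig
    train_dataset=preferences,   # columns: prompt, chosen, rejected
    processing_class=tokenizer,  # older TRL releases name this argument `tokenizer`
)
trainer.train()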
```

→ Preference learning.
→ [[AI_RLHF_DPO_Basics]].
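
For intuition, the per-pair DPO objective can be written out in plain Python. This sketch assumes the summed log-probabilities of each answer under the policy and the reference model are already computed; the function name and example values are illustrative only.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """-log(sigmoid(beta * (chosen log-ratio - rejected log-ratio))) for one pair."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # Numerically stable form of log(1 + exp(-margin))
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Policy identical to the reference: no preference learned yet
untrained = dpo_loss(-3.0, -3.0, -3.0, -3.0)
# Policy now favors the chosen answer relative to the reference
improved = dpo_loss(-1.0, -5.0, -3.0, -3.0)
print(untrained, improved)
```

A larger `beta` penalizes drifting from the reference model more strongly.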

### Production deploy
```
LoRA adapter:
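- Typically tens of MB up to ~1 GB, depending on rank and target modules (small).
- Each user can have their own adapter (multi-tenant).
- vLLM can serve N adapters on top of one base model.

Full fine-tune:
- Large model (16-140 GB).
- Needs its own instance.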
```

→ LoRA is the answer to serving cost.
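
Adapter size follows directly from the LoRA configuration: each adapted weight of shape `d_in × d_out` gains `r × (d_in + d_out)` parameters. A sketch for the `r=16`, `q_proj` / `v_proj` setup from the LoRA section, assuming the published Llama 3 8B dimensions (hidden size 4096, 8 KV heads of dimension 128, 32 layers).

```python
def lora_param_count(rank, module_shapes, n_layers):
    """Trainable parameters added by LoRA: rank * (d_in + d_out) per adapted matrix."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in module_shapes)
    return per_layer * n_layers

# Assumed Llama 3 8B shapes: q_proj is 4096 -> 4096, v_proj is 4096 -> 1024 (grouped-query attention)
LLAMA3_8B_QV = [(4096, 4096), (4096, 1024)]

params = lora_param_count(rank=16, module_shapes=LLAMA3_8B_QV, n_layers=32)
size_mb = params * 2 / 1e6  # 2 bytes per parameter in bf16
print(f"{params:,} params, about {size_mb:.1f} MB")
```

Targeting every linear layer or raising the rank grows the adapter, but it stays small next to a full copy of the weights.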

### Multi-LoRA serving
```bash
vllm serve meta-llama/Meta-Llama-3-8B \
     --enable-lora \
     --lora-modules customer1=path1 customer2=path2
```

→ N customer × 1 base model.
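
On the client side, vLLM's OpenAI-compatible server selects an adapter by the name passed in the request's `model` field. A minimal routing sketch; the tenant names and the `build_completion_request` helper are hypothetical, and the adapter names match the `--lora-modules` flags above.

```python
import json

# Hypothetical tenant -> adapter mapping
TENANT_ADAPTERS = {"acme": "customer1", "globex": "customer2"}

def build_completion_request(tenant: str, prompt: str, max_tokens: int = 128) -> dict:
    """Build a /v1/completions payload that targets the tenant's LoRA adapter."""
    adapter = TENANT_ADAPTERS[tenant]  # KeyError for unknown tenants is intentional
    return {"model": adapter, "prompt": prompt, "max_tokens": max_tokens}

payload = build_completion_request("acme", "Summarize the ticket:")
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed to the server's `/v1/completions` endpoint (port 8000 by default).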

### When NOT?
```
- Generic task: prompting is enough.
- Small dataset (< 50): use few-shot.
- Frequent change: re-train cost.
- Simple format: structured output.
```

### Model selection
```
Open:
- Llama 3 (Meta).
- Mistral (Mistral AI).
- Gemma (Google).
- Qwen (Alibaba).

→ License + size + quality balance.
```

### Synthetic data
```python
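import openai

# A stronger "teacher" model writes the training outputs
prompts = [...]
training_data = []
for p in prompts:
    r = openai.chat.completions.create(
        model='gpt-4o',  # any strong teacher model
        messages=[{'role': 'user', 'content': p}],
    )
    training_data.append({'input': p, 'output': r.choices[0].message.content})

# The smaller model is then fine-tuned to mimic these outputs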
```

→ A form of "distillation".

### Fine-tune for code
```
Code-specific:
- DeepSeek Coder.
- CodeLlama.
- StarCoder 2.

→ A domain-specific base model is the better starting point.
```

### Continuous fine-tune
```
Production:
- Fold each day's / week's new data into the model.
- The most recent fine-tune is the one served.

→ Drift adaptation.
```

### Pitfalls
```
- Overfitting (small dataset).
- Catastrophic forgetting (heavy fine-tuning).
- Eval set leaked into the training data.
- General ability weakens after fine-tuning (the model becomes specialized).
- Cost > prompt approach.
```
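
The eval-leak pitfall is cheap to check mechanically. A minimal sketch, assuming leakage means an eval input that also appears in the training inputs after case and whitespace normalization; near-duplicates and paraphrases would need fuzzier matching.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations still match."""
    return " ".join(text.lower().split())

def find_leaks(train_inputs, eval_inputs):
    """Return the eval inputs that also appear in the training inputs."""
    train_keys = {normalize(t) for t in train_inputs}
    return [e for e in eval_inputs if normalize(e) in train_keys]

train_inputs = ["Reset my password", "Cancel my order"]
eval_inputs = ["reset  my PASSWORD", "Where is my invoice?"]

leaks = find_leaks(train_inputs, eval_inputs)
print(leaks)
```

Any non-empty result means the eval score is inflated and the split should be redone.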

## 🤔 Decision Criteria
| Situation | Recommendation |
|---|---|
| Generic | Prompt + RAG |
| Specific format | OpenAI fine-tune |
| Cost / latency | Fine-tune small open model |
| Domain knowledge | Fine-tune + RAG |
| Open / self-host | LoRA + Axolotl |
| Managed | Together / Replicate / OpenAI |
| Privacy | Self-host |

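## ❌ Anti-patterns
- **Fine-tuning as the first attempt**: try prompt + RAG first.
- **Small dataset (< 50)**: use few-shot prompting instead.
- **Eval leak**: produces inflated, meaningless scores.
- **Catastrophic forgetting**: use a gentle (low) learning rate.
- **No version control of fine-tunes**: models and the data behind them get lost.
- **Monolithic model**: LoRA adapters keep things modular.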

## 🤖 LLM Usage Hints
- LoRA / QLoRA is the answer to cost.
- Axolotl encodes best practices.
- 50-1,000 examples is the sweet spot.
- Multi-LoRA serving (vLLM).

## 🔗 Related Documents
- [[AI_Fine_Tuning_vs_Prompting]]
- [[AI_RLHF_DPO_Basics]]
- [[AI_Production_Deploy]]