[G1-Sync] Manual knowledge update

Antigravity Agent
2026-05-10 22:08:15 +09:00
parent 21ac3ed255
commit 504fd5fb42
3011 changed files with 380280 additions and 206977 deletions
@@ -0,0 +1,355 @@
---
id: ai-fine-tune-practical
title: Fine-tune Practical — LoRA / QLoRA / OpenAI API
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [ai, fine-tune, vibe-coding]
tech_stack: { language: "Python", applicable_to: ["AI"] }
applied_in: []
aliases: [LoRA, QLoRA, fine-tune, OpenAI fine-tuning, Anthropic, Together, Axolotl]
---
# Fine-tune Practical
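> When prompt + RAG is not enough, fine-tune. **OpenAI / Anthropic API (managed) or self-hosted LoRA (cheap)**.
## 📖 Key Concepts
- Prompting is enough for most cases.
- Fine-tuning targets style, format, and domain behavior.
- LoRA is parameter-efficient: it trains small low-rank adapters while the base weights stay frozen.
- Sweet spot: 100-10,000 examples.
## 💻 Code Patterns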
### When fine-tune?
```
✓ Specific format (always JSON, specific style).
✓ Domain knowledge (legal, medical).
✓ Latency / cost (a fine-tuned small model can match a large model on a narrow task).
✓ Brand voice.
✗ Generic "better quality".
✗ Facts (RAG works better).
✗ Recent information (training cutoff).
→ Try prompting. Try RAG. Fine-tune only if both fall short.
```
### OpenAI fine-tune
```python
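from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the training data (JSONL, chat format)
with open('data.jsonl', 'rb') as f:
    file = client.files.create(file=f, purpose='fine-tune')

# Create the fine-tuning job
job = client.fine_tuning.jobs.create(
    training_file=file.id,
    model='gpt-4o-mini-2024-07-18',
)

# Check status; poll until the job reaches a terminal state
job = client.fine_tuning.jobs.retrieve(job.id)
print(job.status)            # e.g. 'queued', 'running', eventually 'succeeded'
print(job.fine_tuned_model)  # None until the job succeeds, then 'ft:gpt-4o-mini-...'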
```
### Data format (chat)
```jsonl
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
{"messages": [...]}
```
→ 50-1000 examples is typical.
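Malformed rows are a common reason a training file gets rejected, so it can help to check the file locally before uploading. The sketch below assumes the chat format shown above. `validate_chat_jsonl` is a hypothetical helper, not part of any SDK, and its rules (known roles, string content, final assistant turn) are only a conservative approximation of what a provider checks.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}

def validate_chat_jsonl(lines):
    """Return a list of (line_number, problem) for rows that look malformed."""
    problems = []
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            problems.append((n, "not valid JSON"))
            continue
        messages = row.get("messages") if isinstance(row, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((n, "missing or empty 'messages' list"))
            continue
        for m in messages:
            ok = (
                isinstance(m, dict)
                and m.get("role") in ALLOWED_ROLES
                and isinstance(m.get("content"), str)
            )
            if not ok:
                problems.append((n, "message needs a known role and string content"))
                break
        else:
            # Only reached when every message passed the check above
            if messages[-1]["role"] != "assistant":
                problems.append((n, "last message should be the assistant reply"))
    return problems

# Illustrative rows: one valid, one without a reply, one that is not JSON
sample = [
    '{"messages": [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!"}]}',
    '{"messages": [{"role": "user", "content": "No reply"}]}',
    "not json",
]
print(validate_chat_jsonl(sample))
```

For a real file, pass the open file handle itself, since iterating over it yields one line at a time.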
### Use
```python
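from openai import OpenAI

client = OpenAI()
r = client.chat.completions.create(
    model='ft:gpt-4o-mini:my-org::abc',  # placeholder; use job.fine_tuned_model
    messages=[{'role': 'user', 'content': '...'}],
)
print(r.choices[0].message.content)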
```
→ Drop-in replacement. Prompts get shorter because the system-prompt behavior is baked into the model.
### Anthropic fine-tune
```
Anthropic's own fine-tuning service is limited.
- Claude Opus / Sonnet respond well to prompting alone.
- To cut cost, use Haiku.
- Custom fine-tuning is available to enterprise customers only.
```
→ For most cases, prompt + RAG is enough.
### LoRA self-host (HuggingFace)
```python
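from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model_id = 'meta-llama/Meta-Llama-3-8B-Instruct'
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Train low-rank adapters on the attention query/value projections only
config = LoraConfig(
    r=16, lora_alpha=32,
    target_modules=['q_proj', 'v_proj'],
    lora_dropout=0.05,
    task_type='CAUSAL_LM',
)
model = get_peft_model(model, config)

# Assumption: train.jsonl holds one {"text": "..."} record per line
dataset = load_dataset('json', data_files='train.jsonl', split='train')

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    # Recent TRL releases take dataset_text_field through SFTConfig
    args=SFTConfig(output_dir='./lora', num_train_epochs=3, dataset_text_field='text'),
)
trainer.train()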
```
→ A single A100 is enough for an 8B model.
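To see why LoRA is cheap to train, it helps to count the parameters it adds. Each adapted weight of shape (d_out, d_in) gains two small matrices, so the count is r × (d_in + d_out) per module. The sketch below is a back-of-the-envelope estimate. The layer shapes are assumptions about Llama-3-8B (hidden size 4096, 32 layers, grouped-query attention giving `v_proj` a 1024-wide output), and `lora_params` is a hypothetical helper.

```python
def lora_params(r, shapes, n_layers):
    """Trainable parameters added by LoRA.

    Each adapted weight of shape (d_out, d_in) gains two matrices,
    A (r x d_in) and B (d_out x r), i.e. r * (d_in + d_out) parameters.
    """
    per_layer = sum(r * (d_in + d_out) for d_out, d_in in shapes)
    return per_layer * n_layers

# Assumed Llama-3-8B shapes: q_proj 4096 -> 4096, v_proj 4096 -> 1024
shapes = [(4096, 4096), (1024, 4096)]
trainable = lora_params(r=16, shapes=shapes, n_layers=32)

print(f"{trainable:,} trainable parameters")
print(f"{trainable / 8.03e9:.3%} of an ~8.03B-parameter base model")
```

Under these assumptions the adapters are well under 1% of the base model, which is why the optimizer state stays small even though the full model still has to fit in memory for the forward pass.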
### QLoRA (4-bit)
```python
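import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = 'meta-llama/Meta-Llama-3-8B-Instruct'

# Load the frozen base weights in 4-bit NF4; matmuls run in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)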
```
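→ A 70B model needs about 35 GB for its 4-bit weights alone, so plan on a single 80 GB A100. 48 GB is a practical minimum; 40 GB is too tight once activations and adapter state are counted.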
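Quantization mainly shrinks the base weights, so a rough estimate shows how much headroom a given GPU leaves. The sketch below counts weights only. It ignores activations, adapter gradients and optimizer state, quantization constants, and framework overhead, all of which add to the real footprint. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

for name, n in [("8B", 8e9), ("70B", 70e9)]:
    fp16 = weight_memory_gb(n, 16)
    nf4 = weight_memory_gb(n, 4)
    print(f"{name}: fp16 {fp16:.0f} GB, 4-bit {nf4:.0f} GB (weights only)")
```

Whatever remains after the weights has to cover everything the estimate leaves out, so treat the result as a lower bound when picking hardware.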
### Axolotl (config-based)
```yaml
# config.yml
base_model: meta-llama/Meta-Llama-3-8B-Instruct
adapter: lora
lora_r: 16
lora_alpha: 32
datasets:
  - path: my_data.jsonl
    type: chat_template
sequence_len: 2048
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 0.0002
```
```bash
axolotl train config.yml
```
→ Best practices come baked in.
### Together AI / Replicate (managed)
```python
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

job = client.fine_tuning.create(
    training_file='file-...',  # ID returned by a prior file upload
    model='meta-llama/Meta-Llama-3-8B-Instruct',
    n_epochs=3,
)
```
→ No self-hosting hassle.
### Data quality > quantity
```
50 high-quality > 5000 noisy.
→ Manual curation.
- Diverse examples.
- Consistent format.
- Clean (no typos or errors).
- Edge cases.
```
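Part of manual curation can be automated before a human pass. The sketch below assumes examples are dicts with `input` and `output` keys (a hypothetical schema) and only handles the mechanical checks: whitespace normalization, empty fields, and duplicates. Judging diversity and edge-case coverage still needs a person.

```python
def clean_examples(examples):
    """Drop incomplete and duplicate examples, normalizing whitespace first."""
    seen = set()
    cleaned = []
    for ex in examples:
        inp = " ".join(ex.get("input", "").split())
        out = " ".join(ex.get("output", "").split())
        if not inp or not out:
            continue  # incomplete pair
        key = (inp.lower(), out.lower())
        if key in seen:
            continue  # duplicate up to case and spacing
        seen.add(key)
        cleaned.append({"input": inp, "output": out})
    return cleaned

# Illustrative data only
raw = [
    {"input": "Refund  policy?", "output": "30 days."},
    {"input": "refund policy?", "output": "30 days."},
    {"input": "Shipping time?", "output": ""},
    {"input": "Shipping time?", "output": "3-5 business days."},
]
cleaned = clean_examples(raw)
print(len(raw), "->", len(cleaned))
```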
### Eval
```python
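# Held-out test set: these examples must never appear in training
test = [...]  # each item: {'input': ..., 'expected': ...}

def match(pred, expected):
    # Exact match after trimming; swap in a task-specific metric if needed
    return pred.strip() == expected.strip()

# model.generate stands in for your inference call (API or local)
scores = [match(model.generate(ex['input']), ex['expected']) for ex in test]
print(f'accuracy: {sum(scores) / len(scores):.1%}')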
```
→ Benchmark vs base model.
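The comparison against the base model can be sketched as one scoring function applied to two predictors. Everything below is illustrative: the test set and the canned outputs are made-up data standing in for real model calls, and exact match is only one possible metric.

```python
def accuracy(predict, test_set):
    """Fraction of held-out examples where the prediction matches exactly."""
    hits = sum(predict(ex["input"]).strip() == ex["expected"].strip() for ex in test_set)
    return hits / len(test_set)

test_set = [
    {"input": "order status", "expected": '{"intent": "status"}'},
    {"input": "cancel my order", "expected": '{"intent": "cancel"}'},
    {"input": "where is my refund", "expected": '{"intent": "refund"}'},
    {"input": "talk to a human", "expected": '{"intent": "handoff"}'},
]

# Canned outputs standing in for real model calls (hypothetical data)
base_outputs = {
    "order status": '{"intent": "status"}',
    "cancel my order": "Sure, I can help you cancel.",
    "where is my refund": '{"intent": "refund"}',
    "talk to a human": "Connecting you now.",
}
tuned_outputs = {ex["input"]: ex["expected"] for ex in test_set}

base_acc = accuracy(base_outputs.get, test_set)
tuned_acc = accuracy(tuned_outputs.get, test_set)
print(f"base {base_acc:.0%}, fine-tuned {tuned_acc:.0%}, delta {tuned_acc - base_acc:+.0%}")
```

Running both models over the same held-out set keeps the comparison fair; a fine-tune that does not beat the base model on this number is not worth deploying.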
### Cost
```
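OpenAI fine-tune (gpt-4o-mini):
- About $3 / 1M training tokens (gpt-4o is about $25 / 1M); check current pricing.
- Billed tokens = examples × tokens per example × epochs.
- 1k examples × 500 tokens × 3 epochs = 1.5M tokens ≈ $4.50.
Self-host LoRA (8B):
- 1 A100 hour = $1-3.
- 1k examples × 3 epochs = 1-2 hours = $1-6.
- Cheap compared to paying for API generation.
Inference:
- Fine-tuned API: more expensive per token than the base model (roughly 1.5-2x on OpenAI).
- Self-host: GPU rental.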
```
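The arithmetic behind these estimates is simple enough to script, which makes it easy to rerun when prices change. In the sketch below the rates are illustrative inputs rather than quoted prices, and both function names are hypothetical. Note that managed services typically bill training tokens once per epoch.

```python
def api_training_cost_usd(n_examples, tokens_per_example, epochs, usd_per_million_tokens):
    """Managed fine-tune: billed tokens = examples x tokens x epochs."""
    billed_tokens = n_examples * tokens_per_example * epochs
    return billed_tokens / 1_000_000 * usd_per_million_tokens

def gpu_training_cost_usd(hours, usd_per_hour):
    """Self-hosted fine-tune: pay for GPU time."""
    return hours * usd_per_hour

# Illustrative rates; substitute the provider's current price
for rate in (3.0, 25.0):
    cost = api_training_cost_usd(1000, 500, epochs=3, usd_per_million_tokens=rate)
    print(f"API at ${rate}/1M tokens: ${cost:.2f}")

low = gpu_training_cost_usd(hours=1, usd_per_hour=1.0)
high = gpu_training_cost_usd(hours=2, usd_per_hour=3.0)
print(f"Self-host: ${low:.2f}-${high:.2f}")
```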
### When small model fine-tune > big prompt
```
Big model + complex prompt:
- $10 / 1M token.
- 5 sec latency.
Small model fine-tuned:
- $0.30 / 1M token.
- 1 sec latency.
→ Same quality on the target task, about 30x cheaper, 5x faster.
But only for that specific task. For generic work, stay with the big model.
```
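Whether the switch pays off depends on volume, because the fine-tune has a one-time cost that the per-token saving must recover. The sketch below uses the per-token prices from the comparison above; the one-time cost is an assumed figure and `break_even_million_tokens` is a hypothetical helper.

```python
def break_even_million_tokens(one_time_cost_usd, big_usd_per_m, small_usd_per_m):
    """Millions of tokens after which the fine-tuned small model is cheaper overall."""
    saving_per_m = big_usd_per_m - small_usd_per_m
    if saving_per_m <= 0:
        raise ValueError("the small model must be cheaper per token")
    return one_time_cost_usd / saving_per_m

big, small = 10.0, 0.30   # USD per 1M tokens, from the comparison above
one_time = 500.0          # assumed: training runs plus data preparation

ratio = big / small
volume = break_even_million_tokens(one_time, big, small)
print(f"price ratio: {ratio:.0f}x")
print(f"break-even after about {volume:.1f}M tokens")
```

Data curation and evaluation time usually dominate the one-time cost, so it is worth including them in `one_time` rather than the training bill alone.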
### DPO (alignment)
```python
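from trl import DPOConfig, DPOTrainer

# sft_model: the SFT checkpoint to align; sft_frozen: a frozen copy used as reference
# preferences: dataset with 'prompt', 'chosen', 'rejected' columns
trainer = DPOTrainer(
    model=sft_model,
    ref_model=sft_frozen,
    train_dataset=preferences,
    # Recent TRL releases take beta through DPOConfig
    args=DPOConfig(output_dir='./dpo', beta=0.1),
)
trainer.train()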
```
→ Preference learning.
→ [[AI_RLHF_DPO_Basics]].
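To make the role of `beta` and the reference model concrete, the per-pair DPO objective can be computed by hand: the loss is the negative log-sigmoid of beta times the difference between how much the policy has shifted toward the chosen answer and how much it has shifted toward the rejected one, each measured against the reference. The sketch below takes sequence log-probabilities as plain numbers (illustrative values, not real model outputs); `dpo_loss` is a hypothetical helper, not the TRL implementation.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probabilities."""
    # How much more the policy favors chosen over rejected, relative to the reference
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    x = beta * margin
    # -log(sigmoid(x)), written in a numerically stable form
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))

# Policy identical to the reference: margin is 0
untrained = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Policy moved toward the chosen answer and away from the rejected one
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
print(f"untrained {untrained:.3f}, improved {improved:.3f}")
```

A larger `beta` penalizes deviation from the reference more sharply, which is why small values such as 0.1 are a common starting point.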
### Production deploy
```
LoRA adapter:
- 100 MB-1 GB (small).
- Each user can have their own adapter (multi-tenant).
- vLLM can serve N adapters from 1 base model.
Full fine-tune:
- Large model (16-140 GB).
- Needs its own instance.
```
→ LoRA is the answer on cost.
### Multi-LoRA serving
```bash
# The base model must be the one the adapters were trained on
vllm serve meta-llama/Meta-Llama-3-8B-Instruct \
    --enable-lora \
    --lora-modules customer1=path1 customer2=path2
```
→ N customers on 1 base model.
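On the client side, vLLM's OpenAI-compatible server exposes each registered adapter as a model name, so routing a tenant to its adapter comes down to setting the `model` field of the request. The sketch below only builds the request payload and sends nothing. `BASE_MODEL` is an assumption and should match whatever model name the server was started with, and `build_request` is a hypothetical helper.

```python
import json

# Assumption: must match the model name the server was started with
BASE_MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"
# Adapter names as registered through --lora-modules
ADAPTERS = {"customer1", "customer2"}

def build_request(tenant, user_message):
    """Build a chat-completions payload, routing known tenants to their adapter."""
    model = tenant if tenant in ADAPTERS else BASE_MODEL
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

print(json.dumps(build_request("customer1", "Hello"), indent=2))
```

Falling back to the base model for unknown tenants is one possible design choice; rejecting the request may be safer when adapters carry tenant-specific behavior.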
### When NOT?
```
- Generic task: prompting is enough.
- Small dataset (< 50 examples): use few-shot prompting.
- Frequently changing requirements: retraining cost adds up.
- Simple format: use structured output.
```
### Model selection
```
Open:
- Llama 3 (Meta).
- Mistral (Mistral AI).
- Gemma (Google).
- Qwen (Alibaba).
→ License + size + quality balance.
```
### Synthetic data
```python
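from openai import OpenAI

client = OpenAI()

def teacher(prompt):
    """Label one prompt with a strong teacher model (GPT-4 class)."""
    r = client.chat.completions.create(
        model='gpt-4o', messages=[{'role': 'user', 'content': prompt}])
    return r.choices[0].message.content

prompts = [...]  # task inputs to label
training_data = [{'input': p, 'output': teacher(p)} for p in prompts]
# The smaller model is then fine-tuned to mimic these outputs.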
```
→ A form of distillation.
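Teacher outputs collected as input/output pairs still need to be converted into the chat format before training. A minimal sketch, assuming pairs are dicts with `input` and `output` keys; `to_chat_jsonl` is a hypothetical helper and the system prompt is optional.

```python
import json

def to_chat_jsonl(pairs, system_prompt=None):
    """Convert input/output pairs into chat-format JSONL text, one example per line."""
    lines = []
    for pair in pairs:
        messages = []
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        messages.append({"role": "user", "content": pair["input"]})
        messages.append({"role": "assistant", "content": pair["output"]})
        # ensure_ascii=False keeps non-English text readable in the file
        lines.append(json.dumps({"messages": messages}, ensure_ascii=False))
    return "\n".join(lines)

# Illustrative pairs only
pairs = [
    {"input": "Summarize: the cat sat on the mat.", "output": "A cat sat on a mat."},
    {"input": "Summarize: it rained all day.", "output": "It rained all day."},
]
jsonl_text = to_chat_jsonl(pairs, system_prompt="You are a concise summarizer.")
print(jsonl_text)
```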
### Fine-tune for code
```
Code-specific:
- DeepSeek Coder.
- CodeLlama.
- StarCoder 2.
→ Start from a domain-specific base model.
```
### Continuous fine-tune
```
Production:
- Feed new data into the model every day / week.
- Serve the most recent fine-tune.
→ Drift adaptation.
```
### Pitfalls
```
- Overfitting (small dataset).
- Catastrophic forgetting (heavy fine-tuning).
- Eval set leaked into the training set.
- Weaker general ability after fine-tuning (the model specializes).
- Cost exceeds the prompt-only approach.
```
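Eval leakage is the easiest of these pitfalls to check mechanically. The sketch below flags evaluation inputs that also appear in the training set after normalizing case and whitespace. It catches exact and near-exact copies only; paraphrased leaks need fuzzier matching. Both function names are hypothetical.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial variations still match."""
    return " ".join(text.lower().split())

def find_leaks(train_inputs, eval_inputs):
    """Return eval inputs that also occur in the training inputs."""
    train_set = {normalize(t) for t in train_inputs}
    return [e for e in eval_inputs if normalize(e) in train_set]

# Illustrative data only
train_inputs = ["What is LoRA?", "Explain DPO"]
eval_inputs = ["what is  lora?", "What is QLoRA?"]
leaks = find_leaks(train_inputs, eval_inputs)
print(leaks)
```

Running this check before every training job is cheap and guards against a test set that silently drifts into the training data as datasets are merged.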
## 🤔 Decision Criteria
| Situation | Recommendation |
|---|---|
| Generic | Prompt + RAG |
| Specific format | OpenAI fine-tune |
| Cost / latency | Fine-tune small open model |
| Domain knowledge | Fine-tune + RAG |
| Open / self-host | LoRA + Axolotl |
| Managed | Together / Replicate / OpenAI |
| Privacy | Self-host |
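## ❌ Anti-patterns
- **Fine-tuning as the first attempt**: try prompt + RAG first.
- **Small dataset (< 50 examples)**: use few-shot prompting instead.
- **Eval leak**: produces inflated, meaningless scores.
- **Catastrophic forgetting**: use a gentle learning rate.
- **No version control for fine-tunes**: models and datasets get lost.
- **Monolithic model**: LoRA adapters keep things modular.
## 🤖 Hints for LLM Use
- LoRA / QLoRA is the answer on cost.
- Axolotl encodes best practices.
- Sweet spot: 50-1000 examples.
- Multi-LoRA serving (vLLM).
## 🔗 Related Documents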
- [[AI_Fine_Tuning_vs_Prompting]]
- [[AI_RLHF_DPO_Basics]]
- [[AI_Production_Deploy]]