| id | title | category | status | source_trust_level | verification_status | created_at | updated_at | tags | tech_stack | applied_in | aliases |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ai-fine-tune-practical | Fine-tune Practical - LoRA / QLoRA / OpenAI API | Coding | draft | B | conceptual | 2026-05-09 | 2026-05-09 | | | | |
Fine-tune Practical
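When prompting and RAG are not enough, fine-tune. Use a managed API (OpenAI / Anthropic) or self-host with LoRA (cheap).
📖 Key concepts
- Prompting is enough for most tasks.
- Fine-tuning teaches style, format, and domain behavior.
- LoRA is parameter-efficient: it trains small adapter matrices and leaves the base weights frozen.
- 100-10,000 examples is the sweet spot.
💻 Code patterns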
When fine-tune?
✓ Specific format (always JSON, a fixed style).
✓ Domain knowledge (legal, medical).
✓ Latency / cost (a fine-tuned small model can match a large model on a narrow task).
✓ Brand voice.
✗ Generic "better quality".
✗ Facts (RAG works better).
✗ Recent information (the model is still bound by its training cutoff).
→ Try prompting first, then RAG. Fine-tune only if both fall short.
OpenAI fine-tune
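import openai  # uses OPENAI_API_KEY from the environment

# Upload the training data (JSONL, one chat example per line)
with open('data.jsonl', 'rb') as f:
    file = openai.files.create(file=f, purpose='fine-tune')

# Create the fine-tuning job
job = openai.fine_tuning.jobs.create(
    training_file=file.id,
    model='gpt-4o-mini-2024-07-18',
)

# Check status (the job runs asynchronously, so this is a snapshot)
job = openai.fine_tuning.jobs.retrieve(job.id)
print(job.status)            # 'succeeded' once training has finished
print(job.fine_tuned_model)  # None until then, afterwards 'ft:gpt-4o-mini:...'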
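The status check above returns immediately, while a job typically needs minutes to hours. A minimal polling sketch follows. `wait_for_job` and its parameters are hypothetical names, the set of terminal statuses is an assumption, and the retrieve function is passed in so the loop can be exercised without an API call.

```python
import time
from typing import Any, Callable

# Assumption: a job no longer changes once it reaches one of these statuses.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def wait_for_job(
    retrieve: Callable[[str], Any],
    job_id: str,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 6 * 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call retrieve(job_id) repeatedly until the job reaches a terminal status."""
    waited = 0.0
    while True:
        job = retrieve(job_id)
        if job.status in TERMINAL_STATES:
            return job
        if waited >= timeout_seconds:
            raise TimeoutError(f"job {job_id} still {job.status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

With the snippet above this could be called as `wait_for_job(openai.fine_tuning.jobs.retrieve, job.id)`. Check `job.status` before reading `job.fine_tuned_model`, since a failed job has no model.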
Data format (chat)
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
{"messages": [...]}
→ 50-1,000 examples is typical.
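A malformed line usually only surfaces after upload, so a local check of the JSONL file is worth a few lines. This is a sketch under simplifying assumptions: only the `system`, `user`, and `assistant` roles are expected, and every example should end with the assistant reply. `validate_chat_jsonl` is a hypothetical helper, not part of any SDK.

```python
import json

# Assumption: only these three roles appear; tool/function messages are out of scope.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_jsonl(lines):
    """Return (line_number, problem) pairs for lines that do not match the chat format."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((number, "missing or empty 'messages' list"))
            continue
        for message in messages:
            if (
                not isinstance(message, dict)
                or message.get("role") not in ALLOWED_ROLES
                or not isinstance(message.get("content"), str)
            ):
                problems.append((number, "message needs a known role and string content"))
                break
        else:
            # Every message was well-formed; the training target must come last.
            if messages[-1]["role"] != "assistant":
                problems.append((number, "last message should be the assistant reply"))
    return problems
```

Typical use: `with open('data.jsonl') as f: print(validate_chat_jsonl(f))`. An empty list means every line passed these checks.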
Use
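r = openai.chat.completions.create(
    model='ft:gpt-4o-mini:my-org::abc',
    messages=[{'role': 'user', 'content': '...'}],
)
→ Drop-in replacement for the base model. Prompts get shorter because the trained behavior no longer has to be spelled out in the system message.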
Anthropic fine-tune
Anthropic's own fine-tuning offering is limited.
- Claude Opus / Sonnet respond well to prompting alone.
- To cut cost, drop to Haiku.
- Custom fine-tuning is available to enterprise customers only.
→ For most cases prompt + RAG is enough.
LoRA self-host (HuggingFace)
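from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

model_id = 'meta-llama/Meta-Llama-3-8B-Instruct'
model = AutoModelForCausalLM.from_pretrained(model_id)

# Assumption: train.jsonl holds one {"text": "..."} record per line
dataset = load_dataset('json', data_files='train.jsonl', split='train')

config = LoraConfig(
    r=16, lora_alpha=32,                  # adapter rank and scaling
    target_modules=['q_proj', 'v_proj'],  # attention projections to adapt
    lora_dropout=0.05,
    task_type='CAUSAL_LM',
)
model = get_peft_model(model, config)     # base weights frozen, adapters trainable

# SFTTrainer loads the tokenizer for model_id itself when none is passed.
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    # Recent trl takes dataset_text_field on SFTConfig; older releases took it on SFTTrainer
    args=SFTConfig(output_dir='./lora', num_train_epochs=3, dataset_text_field='text'),
)
trainer.train()
→ A single A100 is enough for an 8B model.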
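To see why LoRA counts as parameter-efficient, it helps to count what `r=16` on `q_proj` and `v_proj` actually trains. The arithmetic below is a sketch that assumes the Llama 3 8B shapes (hidden size 4096, 32 layers, 8 key/value heads of dimension 128); `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """Parameters in one LoRA pair: A is (r x in_features), B is (out_features x r)."""
    return r * (in_features + out_features)


# Assumed Llama 3 8B shapes: hidden size 4096, 32 layers,
# 8 key/value heads of dimension 128 (so v_proj maps 4096 -> 1024).
HIDDEN, LAYERS, KV_DIM, RANK = 4096, 32, 8 * 128, 16

per_layer = (
    lora_param_count(HIDDEN, HIDDEN, RANK)    # q_proj
    + lora_param_count(HIDDEN, KV_DIM, RANK)  # v_proj
)
trainable = per_layer * LAYERS
adapter_mb_fp16 = trainable * 2 / 1e6  # 2 bytes per parameter

print(f"trainable parameters: {trainable:,}")
print(f"adapter size in fp16: {adapter_mb_fp16:.1f} MB")
print(f"share of an 8B base:  {trainable / 8e9:.4%}")
```

Under these assumptions the adapter has 6,815,744 trainable parameters, about 13.6 MB in fp16, which is well under 0.1% of the base model. Higher ranks or more target modules scale this up linearly.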
QLoRA (4-bit)
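import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = 'meta-llama/Meta-Llama-3-8B-Instruct'
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',              # 4-bit NormalFloat quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bf16
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
# Then attach the LoRA adapter to this quantized model, as in the LoRA section.
→ A 70B model becomes trainable on a single 80 GB A100. The 4-bit weights alone take about 35 GB, so a 40 GB card leaves very little headroom for activations and adapter state.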
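A back-of-the-envelope estimate shows what 4-bit loading buys. The sketch below counts weights only; quantization constants, activations, adapter weights, and optimizer state come on top, so real usage is higher. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


for name, n_params in [("8B", 8e9), ("70B", 70e9)]:
    fp16 = weight_memory_gb(n_params, 16)
    nf4 = weight_memory_gb(n_params, 4)
    print(f"{name}: {fp16:.0f} GB in 16-bit, {nf4:.0f} GB in 4-bit")
```

For a 70B model this gives 140 GB in 16-bit versus 35 GB in 4-bit, which is the difference between needing several GPUs and fitting on one.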
Axolotl (config-based)
# config.yml
base_model: meta-llama/Meta-Llama-3-8B-Instruct
adapter: lora
lora_r: 16
lora_alpha: 32
datasets:
  - path: my_data.jsonl
    type: chat_template
sequence_len: 2048
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 2e-4
axolotl train config.yml
→ Best practices come baked in.
Together AI / Replicate (managed)
from together import Together

client = Together()  # uses TOGETHER_API_KEY from the environment
job = client.fine_tuning.create(
    training_file='file-...',
    model='meta-llama/Meta-Llama-3-8B-Instruct',
    n_epochs=3,
)
→ Fine-tuning without the self-hosting hassle.
Data quality > quantity
50 high-quality examples beat 5,000 noisy ones.
→ Curate manually:
- Diverse examples.
- Consistent format.
- Clean (no typos, no errors).
- Edge cases included.
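Part of the curation checklist above can be automated before the manual pass. The sketch below normalizes whitespace, drops empty answers, and removes duplicates that differ only in case or spacing. It assumes each record has `input` and `output` string fields; `clean_examples` is a hypothetical helper and does not replace reading the data.

```python
def clean_examples(examples, min_chars=1):
    """Normalize whitespace, drop empty answers, and remove near-verbatim duplicates."""
    seen = set()
    kept = []
    for ex in examples:
        prompt = " ".join(ex["input"].split())  # collapse runs of whitespace
        answer = ex["output"].strip()
        if len(answer) < min_chars:
            continue  # an example without a usable answer teaches nothing
        key = (prompt.lower(), answer.lower())
        if key in seen:
            continue  # same pair already kept, ignoring case and spacing
        seen.add(key)
        kept.append({"input": prompt, "output": answer})
    return kept
```

Diversity and edge-case coverage still need a human look; this only removes the mechanical noise.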
Eval
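# Held-out test set: examples the model never saw during training
test = [...]  # [{'input': ..., 'expected': ...}, ...]
scores = []
for ex in test:
    pred = model.generate(ex['input'])          # model: your inference wrapper
    scores.append(match(pred, ex['expected']))  # match: task metric, e.g. exact match
print(sum(scores) / len(scores))
→ Benchmark against the base model on the same test set.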
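The comparison against the base model can be wrapped in a small harness. This is a sketch that assumes each model is exposed as a plain `generate(prompt) -> str` callable and that exact match is a meaningful metric for the task; `exact_match`, `evaluate`, and `compare` are hypothetical names.

```python
from typing import Callable, Dict, List


def exact_match(pred: str, expected: str) -> bool:
    """Case- and whitespace-insensitive equality."""
    return pred.strip().lower() == expected.strip().lower()


def evaluate(
    generate: Callable[[str], str],
    test_set: List[Dict[str, str]],
    metric: Callable[[str, str], bool] = exact_match,
) -> float:
    """Fraction of held-out examples the model gets right."""
    if not test_set:
        raise ValueError("test_set is empty")
    hits = sum(metric(generate(ex["input"]), ex["expected"]) for ex in test_set)
    return hits / len(test_set)


def compare(base_generate, tuned_generate, test_set):
    """Score both models on the same examples and report the difference."""
    base = evaluate(base_generate, test_set)
    tuned = evaluate(tuned_generate, test_set)
    return {"base": base, "tuned": tuned, "delta": tuned - base}
```

A positive `delta` on a held-out set is the minimum evidence that the fine-tune was worth it. For free-form output, swap `exact_match` for a task-specific metric.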
Cost
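OpenAI fine-tune (gpt-4o-mini):
- About $3 / 1M training tokens at published list price ($25 / 1M is the gpt-4o rate).
- 1k examples × 500 tokens = 500k tokens, so about $1.50 per epoch. Billing counts tokens × epochs.
Self-host LoRA (8B):
- 1 A100 hour = $1-3.
- 1k examples × 3 epochs = 1-2 hours = $1-6.
- Cheap compared with paying for API generation.
Inference:
- Fine-tuned API: pricier per token than the base model (about 2x for gpt-4o-mini at list price).
- Self-host: GPU rental.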
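The estimates above follow from two small formulas, sketched here so the numbers can be rerun when prices change. Both function names are hypothetical, and the rates are inputs: `RATE` in particular is an assumed placeholder, to be replaced with the provider's current list price.

```python
def api_training_cost(examples, tokens_per_example, epochs, usd_per_million_tokens):
    """Managed fine-tuning: billed tokens = examples x tokens x epochs."""
    billed_tokens = examples * tokens_per_example * epochs
    return billed_tokens / 1_000_000 * usd_per_million_tokens


def gpu_training_cost(hours, usd_per_hour):
    """Self-hosted fine-tuning: pay for wall-clock GPU time."""
    return hours * usd_per_hour


RATE = 3.0  # assumption: USD per 1M training tokens, substitute the current list price
print(api_training_cost(1_000, 500, epochs=3, usd_per_million_tokens=RATE))
print(gpu_training_cost(hours=2, usd_per_hour=3.0))  # upper end of the A100 estimate
```

Either way the training run is a one-time cost of a few dollars at this scale. Inference pricing decides the long-run bill.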
When small model fine-tune > big prompt
Big model + complex prompt:
- $10 / 1M tokens.
- 5 s latency.
Small fine-tuned model:
- $0.30 / 1M tokens.
- 1 s latency.
→ Comparable quality on the target task, about 30x cheaper and 5x faster.
But only on that specific task. For generic work, use the big model.
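Whether the switch pays off depends on volume: the one-time fine-tuning cost has to be recovered through cheaper tokens. A sketch of that break-even calculation, using the per-token prices above and an assumed one-time cost of $6 (both helper names are hypothetical):

```python
def cost_ratio(big_usd_per_million, small_usd_per_million):
    """How many times cheaper the small model is per token."""
    return big_usd_per_million / small_usd_per_million


def break_even_tokens(one_time_cost_usd, big_usd_per_million, small_usd_per_million):
    """Tokens that must be served before the fine-tune has paid for itself."""
    saving_per_million = big_usd_per_million - small_usd_per_million
    if saving_per_million <= 0:
        raise ValueError("the small model must be cheaper per token")
    return one_time_cost_usd / saving_per_million * 1_000_000


print(f"{cost_ratio(10.0, 0.30):.1f}x cheaper per token")
# Assumption: $6 one-time training cost, engineering time not included.
print(f"{break_even_tokens(6.0, 10.0, 0.30):,.0f} tokens to break even")
```

With these inputs the training cost is recovered after well under a million tokens. In practice the data curation and evaluation effort dominates, so include it in `one_time_cost_usd` for a realistic answer.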
DPO (alignment)
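from trl import DPOConfig, DPOTrainer

trainer = DPOTrainer(
    model=sft_model,            # policy, initialized from the SFT checkpoint
    ref_model=sft_frozen,       # frozen copy used as the reference
    train_dataset=preferences,  # columns: prompt, chosen, rejected
    args=DPOConfig(output_dir='./dpo', beta=0.1),  # beta: how strongly to stay near ref_model
    # Also pass the tokenizer: processing_class=... in recent trl, tokenizer=... in older releases
)
trainer.train()
→ Preference learning. See AI_RLHF_DPO_Basics.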
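What `beta` does is easiest to see in the per-pair loss. The sketch below computes the standard DPO objective for one preference pair from summed log-probabilities; it is plain Python for illustration, not how `trl` implements it, and `dpo_loss` is a hypothetical name.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """-log(sigmoid(beta * margin)) for one (chosen, rejected) pair."""
    # How much more the policy prefers chosen over rejected than the reference does.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    x = beta * margin
    # -log(sigmoid(x)) equals softplus(-x); this form avoids overflow for large |x|.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: margin is 0, loss is log(2).
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
# Policy has shifted probability toward the chosen answer: loss drops below log(2).
print(dpo_loss(-10.0, -20.0, -12.0, -15.0))
```

A larger `beta` makes the loss react more sharply to the same margin, which in training keeps the policy closer to the reference model.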
Production deploy
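LoRA adapter:
- Small: tens of MB up to about 1 GB, depending on rank and target modules.
- Each user can have their own adapter (multi-tenant).
- vLLM serves N adapters on top of one base model.
Full fine-tune:
- Large checkpoint (16-140 GB).
- Needs its own instance.
→ LoRA is the answer to serving cost.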
Multi-LoRA serving
vllm serve meta-llama/Meta-Llama-3-8B \
    --enable-lora \
    --lora-modules customer1=path1 customer2=path2
→ N customers on one base model.
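On the client side, vLLM's OpenAI-compatible server selects an adapter through the `model` field of the request, using the name given to `--lora-modules`. A sketch of per-tenant routing under that assumption; `BASE_MODEL`, `ADAPTERS`, and `build_completion_request` are hypothetical names, and `BASE_MODEL` must match the model the server was started with.

```python
# Assumption: must equal the model name the vLLM server was launched with.
BASE_MODEL = "meta-llama/Meta-Llama-3-8B"
# Adapter names registered via --lora-modules.
ADAPTERS = {"customer1", "customer2"}


def build_completion_request(tenant: str, prompt: str, max_tokens: int = 256) -> dict:
    """Body for POST /v1/completions; unknown tenants fall back to the base model."""
    model = tenant if tenant in ADAPTERS else BASE_MODEL
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}


print(build_completion_request("customer1", "Summarize this ticket: ..."))
print(build_completion_request("unknown-tenant", "Summarize this ticket: ..."))
```

The fallback to the base model is a design choice for this sketch. Rejecting unknown tenants may be the safer default when adapters encode customer-specific behavior.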
When NOT?
- Generic task: prompting is enough.
- Small dataset (< 50 examples): use few-shot prompting.
- Frequently changing requirements: every change costs a re-train.
- Simple format: use structured output.
Model selection
Open:
- Llama 3 (Meta).
- Mistral (Mistral AI).
- Gemma (Google).
- Qwen (Alibaba).
→ License + size + quality balance.
Synthetic data
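import openai

# A strong teacher model (GPT-4 class) generates the training data.
prompts = [...]
training_data = []
for p in prompts:
    r = openai.chat.completions.create(
        model='gpt-4o',  # assumption: any strong teacher model works here
        messages=[{'role': 'user', 'content': p}],
    )
    training_data.append({'input': p, 'output': r.choices[0].message.content})
# The smaller model is then fine-tuned to mimic these outputs.
→ A form of distillation.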
Fine-tune for code
Code-specific:
- DeepSeek Coder.
- CodeLlama.
- StarCoder 2.
→ Starting from a domain-specific base model works better.
Continuous fine-tune
Production:
- Fold new data into the model daily or weekly.
- Serve the most recent fine-tune as the latest version.
→ Adapts to drift.
Pitfalls
- Overfitting (small dataset).
- Catastrophic forgetting (heavy fine-tuning).
- Eval set leaked into the training data.
- Weaker general ability after fine-tuning (the model specializes).
- Total cost higher than the prompt-only approach.
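Of these pitfalls, the eval leak is the easiest to test for mechanically. A minimal sketch that flags eval examples whose input also appears in the training set after case and whitespace normalization; it assumes records with an `input` field, and `find_leaks` is a hypothetical helper that will not catch paraphrased duplicates.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def find_leaks(train, evalset, key="input"):
    """Return the eval examples whose normalized input also occurs in train."""
    train_keys = {normalize(ex[key]) for ex in train}
    return [ex for ex in evalset if normalize(ex[key]) in train_keys]
```

Run it before every training job and drop (or move) anything it returns. A non-empty result means the reported score is inflated.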
🤔 Decision criteria
| Situation | Recommendation |
|---|---|
| Generic | Prompt + RAG |
| Specific format | OpenAI fine-tune |
| Cost / latency | Fine-tune small open model |
| Domain knowledge | Fine-tune + RAG |
| Open / self-host | LoRA + Axolotl |
| Managed | Together / Replicate / OpenAI |
| Privacy | Self-host |
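❌ Anti-patterns
- Fine-tuning as the first attempt: try prompt + RAG first.
- Small dataset (< 50): use few-shot instead.
- Eval leak: produces fake scores.
- Catastrophic forgetting: use a gentle learning rate.
- No version control for fine-tunes: models and their lineage get lost.
- Monolithic model: LoRA keeps things modular.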
🤖 Hints for LLM use
- LoRA / QLoRA is the answer to cost.
- Axolotl packages the best practices.
- 50-1,000 examples is a typical starting range.
- Multi-LoRA serving (vLLM).