2nd/10_Wiki/Topics/Coding/AI_Fine_Tune_Practical.md
2026-05-10 22:08:15 +09:00


id: ai-fine-tune-practical
title: Fine-tune Practical — LoRA / QLoRA / OpenAI API
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: ai, fine-tune, vibe-coding
tech_stack: language: Python; applicable_to: AI
applied_in:
aliases: LoRA, QLoRA, fine-tune, OpenAI fine-tuning, Anthropic, Together, Axolotl

Fine-tune Practical

If prompting + RAG can't do the job, fine-tune. Use either a managed API (OpenAI / Anthropic) or a self-hosted LoRA (cheap).

📖 Core concepts

  • In most cases, prompting is enough.
  • Fine-tuning is for style / format / domain.
  • LoRA = parameter-efficient fine-tuning.
  • 100-10,000 examples is the sweet spot.

💻 Code patterns

When to fine-tune?

✓ Specific format (always JSON, specific style).
✓ Domain knowledge (legal, medical).
✓ Latency / cost (a fine-tuned small model can match a large model on a narrow task).
✓ Brand voice.

✗ Generic "better quality".
✗ Facts (RAG is better).
✗ Recent information (knowledge cutoff).

→ Try prompting first, then RAG. Fine-tune only if neither works.

OpenAI fine-tune

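import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload training data (chat-format JSONL)
with open('data.jsonl', 'rb') as f:
    training_file = client.files.create(file=f, purpose='fine-tune')

# Create the fine-tuning job
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model='gpt-4o-mini-2024-07-18',
)

# Poll until the job reaches a terminal state
while job.status not in ('succeeded', 'failed', 'cancelled'):
    time.sleep(30)
    job = client.fine_tuning.jobs.retrieve(job.id)

print(job.status)            # 'succeeded' on success
print(job.fine_tuned_model)  # 'ft:gpt-4o-mini-...'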

Data format (chat)

{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
{"messages": [...]}

→ 50-1,000 examples is typical.
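Here is a minimal sketch for producing and sanity-checking chat-format JSONL like the lines above. The helper names `to_record`, `validate_record`, and `write_jsonl` are hypothetical. The checks are assumptions (a non-empty `messages` list, known roles, non-empty content, and a final assistant turn as the training target), not OpenAI's full validation rules.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}

def to_record(system: str, user: str, assistant: str) -> dict:
    """Build one chat-format training example (hypothetical helper)."""
    return {"messages": [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant},
    ]}

def validate_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record looks usable."""
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]
    problems = []
    for i, m in enumerate(messages):
        if m.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i}: unknown role {m.get('role')!r}")
        if not isinstance(m.get("content"), str) or not m["content"].strip():
            problems.append(f"message {i}: empty content")
    if messages[-1].get("role") != "assistant":
        problems.append("last message should be the assistant's target answer")
    return problems

def write_jsonl(records: list, path: str) -> None:
    """Write one JSON object per line, skipping records that fail validation."""
    with open(path, "w", encoding="utf-8") as f:
        for r in records:
            if not validate_record(r):
                f.write(json.dumps(r, ensure_ascii=False) + "\n")
```

Invalid records are skipped silently here. In practice you would probably log `validate_record`'s output so the dataset can be fixed at the source.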

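Usage

from openai import OpenAI

client = OpenAI()
r = client.chat.completions.create(
    model='ft:gpt-4o-mini:my-org::abc',  # the job's fine_tuned_model value
    messages=[{'role': 'user', 'content': '...'}],
)
print(r.choices[0].message.content)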

→ Drop-in replacement. Prompts get shorter, because behavior that the system prompt used to spell out is now learned.

Anthropic fine-tune

Anthropic's own fine-tuning offering is limited.
- Claude Opus / Sonnet already do well with good prompts.
- For cost savings, use Haiku.
- Custom fine-tuning is enterprise-only.

→ For most cases, prompt + RAG is enough.

LoRA self-host (HuggingFace)

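import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model_id = 'meta-llama/Meta-Llama-3-8B-Instruct'  # gated: accept the license on Hugging Face first
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama 3 ships without a pad token

# JSONL with a pre-formatted 'text' field per row
dataset = load_dataset('json', data_files='data.jsonl', split='train')

config = LoraConfig(
    r=16, lora_alpha=32,
    target_modules=['q_proj', 'v_proj'],  # attention query/value projections only
    lora_dropout=0.05,
    task_type='CAUSAL_LM',
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # a tiny fraction of the 8B base

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    processing_class=tokenizer,  # older trl versions: tokenizer=tokenizer
    args=SFTConfig(output_dir='./lora', num_train_epochs=3, dataset_text_field='text'),
)
trainer.train()
trainer.save_model('./lora')  # saves only the adapter weights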

→ A single A100 is enough for an 8B model.

QLoRA (4-bit)

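import torch
from peft import prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',              # NormalFloat4, as in the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bf16
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)

# model_id as in the LoRA example; then apply the same LoraConfig + SFTTrainer
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)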

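→ A 70B model fits on a single 80 GB A100. Its 4-bit weights alone are about 35 GB, so a 40 GB card is too tight once activations are added. The QLoRA paper fine-tuned a 65B model on a single 48 GB GPU.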
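The 70B figure is easiest to check with back-of-envelope arithmetic. The sketch below counts frozen weight memory only (parameters × bits / 8). Activations, LoRA optimizer state, and quantization constants come on top, so treat the results as lower bounds. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(n_params: float, bits: int) -> float:
    """Memory for the frozen base weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits / 8 / 1e9

for n_params, label in [(8e9, "8B"), (70e9, "70B")]:
    for bits in (16, 4):
        print(f"{label} @ {bits}-bit: {weight_memory_gb(n_params, bits):.1f} GB")
# 8B @ 16-bit: 16.0 GB, 8B @ 4-bit: 4.0 GB, 70B @ 16-bit: 140.0 GB, 70B @ 4-bit: 35.0 GB
```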

Axolotl (config-based)

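# config.yml
base_model: meta-llama/Meta-Llama-3-8B-Instruct
adapter: lora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj

datasets:
  - path: my_data.jsonl
    type: chat_template

sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 0.0002
output_dir: ./lora-out

# Run (older Axolotl versions: accelerate launch -m axolotl.cli.train config.yml)
axolotl train config.yml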

→ Best practices are already baked in.

Together AI / Replicate (managed)

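from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

job = client.fine_tuning.create(
    training_file='file-...',                # ID of an already-uploaded JSONL file
    model='meta-llama/Llama-3-8B-Instruct',  # check Together's list of fine-tunable models
    n_epochs=3,
)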

→ No self-hosting hassle.

Data quality > quantity

50 high-quality examples beat 5,000 noisy ones.

→ Curate manually:
- Diverse examples.
- Consistent format.
- Clean (no typos or errors).
- Edge cases.
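Part of that checklist can be enforced mechanically before the manual pass. This sketch normalizes whitespace, drops empty or overlong examples, and removes case-insensitive exact duplicates. It assumes `{'input', 'output'}` records. The `curate` name and the `max_chars` threshold are illustrative assumptions. Diversity and edge-case coverage still need a human.

```python
def curate(examples: list, max_chars: int = 4000) -> list:
    """Normalize, filter, and deduplicate {'input', 'output'} examples."""
    seen = set()
    kept = []
    for ex in examples:
        inp = " ".join(ex.get("input", "").split())   # collapse runs of whitespace
        out = " ".join(ex.get("output", "").split())
        if not inp or not out:
            continue                                   # drop empty examples
        if len(inp) + len(out) > max_chars:
            continue                                   # drop overlong examples
        key = (inp.lower(), out.lower())
        if key in seen:
            continue                                   # drop duplicates (case-insensitive)
        seen.add(key)
        kept.append({"input": inp, "output": out})
    return kept
```

Collapsing whitespace is wrong for code or layout-sensitive outputs, so skip that normalization for those datasets.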

Eval

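def exact_match_accuracy(generate, test_set):
    """`generate` is any callable mapping an input string to the model's output string."""
    hits = sum(generate(ex['input']).strip() == ex['expected'].strip() for ex in test_set)
    return hits / len(test_set)

# Held-out test set: never seen during training
test = [...]

base_score = exact_match_accuracy(base_generate, test)    # hypothetical wrappers around
tuned_score = exact_match_accuracy(tuned_generate, test)  # the base and fine-tuned models

→ Always benchmark against the base model on the same test set. Exact match suits format tasks; other tasks need task-specific scoring.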
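The held-out test set should be split off once, deterministically, before any training. This minimal sketch uses a seeded shuffle from the standard library. The `train_test_split` name, the 10% ratio, and the seed are arbitrary choices, not recommendations from this note.

```python
import random

def train_test_split(examples: list, test_ratio: float = 0.1, seed: int = 42):
    """Deterministic split: the same seed always yields the same test set."""
    shuffled = examples[:]                       # don't mutate the caller's list
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)
```

Fixing the seed means reruns and later fine-tunes are scored against exactly the same examples, which keeps scores comparable across versions.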

Cost

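OpenAI fine-tune (billed on training tokens = tokens in the file × epochs):
- gpt-4o-mini: $3 / 1M training tokens (gpt-4o: $25 / 1M) per OpenAI's published pricing; check current rates.
- 1k examples × 500 tokens = 500k tokens per epoch; × 3 epochs = 1.5M tokens ≈ $4.50 on gpt-4o-mini (≈ $37.50 on gpt-4o).

Self-host LoRA (8B):
- 1 A100 hour = $1-3.
- 1k examples × 3 epochs = 1-2 hours = $2-6.
- Cheap compared with the managed API route.

Inference:
- Fine-tuned API models cost more per token than their base model (fine-tuned gpt-4o-mini is listed at 2x the base price).
- Self-host: you pay for GPU rental.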
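Here is the managed training-cost arithmetic as a function. Billed training tokens are the tokens in the file times the number of epochs. The per-token prices are parameters, not current list prices, so plug in whatever your provider charges. The function name is hypothetical.

```python
def training_cost_usd(n_examples: int, tokens_per_example: int,
                      epochs: int, usd_per_1m_tokens: float) -> float:
    """Billed training tokens = examples × tokens per example × epochs."""
    billed_tokens = n_examples * tokens_per_example * epochs
    return billed_tokens / 1_000_000 * usd_per_1m_tokens

# 1k examples × 500 tokens, 3 epochs = 1.5M billed tokens
print(training_cost_usd(1000, 500, 3, 3.0))   # 4.5  (at $3 / 1M)
print(training_cost_usd(1000, 500, 3, 25.0))  # 37.5 (at $25 / 1M)
```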

When a fine-tuned small model beats a big prompted model

Big model + complex prompt:
- $10 / 1M token.
- 5 sec latency.

Small model fine-tuned:
- $0.30 / 1M token.
- 1 sec latency.

→ Comparable quality on that task, ~30x cheaper, 5x faster.

Caveat: this only holds for the specific task. For generic tasks, use the big model.
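Whether the switch pays off depends on volume. This break-even sketch divides the one-off training cost by the per-token saving. The prices are the illustrative figures from the comparison above. The $5 training cost and the `break_even_tokens` name are assumptions.

```python
def break_even_tokens(training_cost_usd: float,
                      big_usd_per_1m: float, small_usd_per_1m: float) -> float:
    """Inference tokens after which the fine-tuned small model has paid for itself."""
    saving_per_token = (big_usd_per_1m - small_usd_per_1m) / 1_000_000
    if saving_per_token <= 0:
        return float("inf")  # the small model is not cheaper, so it never breaks even
    return training_cost_usd / saving_per_token

tokens = break_even_tokens(training_cost_usd=5.0, big_usd_per_1m=10.0, small_usd_per_1m=0.30)
print(f"{tokens:,.0f} tokens")  # about 515,464 tokens
```

This ignores hosting costs for a self-hosted small model. If you run your own GPUs, fold them into `small_usd_per_1m`.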

DPO (alignment)

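from trl import DPOConfig, DPOTrainer

trainer = DPOTrainer(
    model=sft_model,            # the SFT model to align
    ref_model=sft_frozen,       # frozen copy; may be None when sft_model is a PEFT model
    args=DPOConfig(output_dir='./dpo', beta=0.1),  # beta: how far the policy may drift from ref
    train_dataset=preferences,  # rows: {prompt, chosen, rejected}
    processing_class=tokenizer, # older trl versions: tokenizer=tokenizer
)
trainer.train()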

→ Preference learning. → AI_RLHF_DPO_Basics.
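DPO optimizes the following for each preference pair. The policy is pushed to raise the log-probability margin of the chosen answer over the rejected one, measured relative to the frozen reference model and scaled by `beta`. Below is a minimal pure-Python sketch of that per-example loss on precomputed sequence log-probabilities. The function name is hypothetical; trl computes the batched version internally.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """-log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)]) for one preference pair."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) = log(1 + exp(-m)); branch so exp never overflows
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference model, the margin is 0 and the loss is log 2. Training drives the margin up, which pushes the loss toward 0.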

Production deploy

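LoRA adapter:
- Small: tens of MB for a low-rank adapter on an 8B model, up to ~1 GB for high ranks or large models.
- Each user / customer can have their own adapter (multi-tenant).
- vLLM can serve N adapters on top of 1 base model.

Full fine-tune:
- A full copy of the model (16-140 GB, i.e. 8B-70B parameters at 16-bit).
- Needs its own instance.

→ LoRA is the answer for cost.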
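Adapters are small because of how LoRA factors the update. LoRA replaces the update to a `d_out × d_in` weight with two factors, `B (d_out × r)` and `A (r × d_in)`. Each targeted matrix therefore costs `r × (d_in + d_out)` trainable parameters. The sketch applies this to the `r=16`, `q_proj` / `v_proj` config from the LoRA example. It assumes Llama-3-8B shapes (32 layers, hidden size 4096, 8 KV heads × 128 dims so `v_proj` outputs 1024) and 2 bytes per parameter. The `lora_params` name is hypothetical.

```python
def lora_params(r: int, shapes: list, n_layers: int) -> int:
    """Trainable LoRA parameters: r * (d_in + d_out) per targeted matrix per layer."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes)
    return per_layer * n_layers

# Assumed Llama-3-8B projection shapes as (d_in, d_out)
q_proj = (4096, 4096)
v_proj = (4096, 1024)

n = lora_params(r=16, shapes=[q_proj, v_proj], n_layers=32)
print(n)                   # 6815744 trainable parameters
print(n * 2 / 1e6, "MB")   # about 13.6 MB at 2 bytes per parameter
```

Size grows linearly with `r` and with the number of targeted matrices, which is how adapters reach hundreds of MB. The full 8B model at 16-bit is about 16 GB by the same arithmetic.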

Multi-LoRA serving

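vllm serve meta-llama/Meta-Llama-3-8B-Instruct \
     --enable-lora \
     --lora-modules customer1=path1 customer2=path2

→ N customers × 1 base model. Clients pick an adapter by name (e.g. model="customer1") through vLLM's OpenAI-compatible API. Each adapter must be trained on this same base model, and ranks above 16 need --max-lora-rank.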

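When NOT to fine-tune?

- Generic task: prompting is enough.
- Small dataset (< 50 examples): use few-shot prompting.
- Frequently changing requirements: every change means paying to retrain.
- Simple format: use structured output instead.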
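For the small-dataset case, the same examples can go straight into the prompt as few-shot turns instead of into a training job. This minimal sketch builds an OpenAI-style `messages` list. The function name and the `{'input', 'output'}` record shape are assumptions.

```python
def few_shot_messages(system: str, examples: list, query: str) -> list:
    """Turn (input, output) pairs into alternating user/assistant turns before the query."""
    messages = [{"role": "system", "content": system}]
    for ex in examples:
        messages.append({"role": "user", "content": ex["input"]})
        messages.append({"role": "assistant", "content": ex["output"]})
    messages.append({"role": "user", "content": query})
    return messages
```

The resulting list can be passed as `messages=` to a chat completion call on the base model. Swapping examples in or out then costs nothing, unlike retraining.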

Model selection

Open:
- Llama 3 (Meta).
- Mistral (Mistral AI).
- Gemma (Google).
- Qwen (Alibaba).

→ Balance license, size, and quality.

Synthetic data

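from openai import OpenAI

client = OpenAI()

def teacher_complete(prompt: str) -> str:
    # A large "teacher" model (GPT-4-class) produces the target answer
    r = client.chat.completions.create(model='gpt-4o', messages=[{'role': 'user', 'content': prompt}])
    return r.choices[0].message.content

prompts = [...]
training_data = [{'messages': [
    {'role': 'user', 'content': p},
    {'role': 'assistant', 'content': teacher_complete(p)},
]} for p in prompts]
# A smaller "student" model is then fine-tuned on training_data to mimic the teacher.

→ A form of "distillation".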

Fine-tune for code

Code-specific:
- DeepSeek Coder.
- CodeLlama.
- StarCoder 2.

→ A domain-specific base model works better.

Continuous fine-tune

Production:
- New data from each day / week is folded into the model.
- The latest model = the most recent fine-tune.

→ Adapts to data drift.
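Continuous fine-tuning needs a guard so that each new model replaces the serving one only when it actually evaluates better, with every version still recorded. This is a minimal in-memory sketch. The `ModelVersion` / `ModelRegistry` classes, their fields, and the promote-if-better rule are illustrative assumptions. A real setup would persist this, for example next to the adapter files.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ModelVersion:
    name: str           # adapter path or fine-tuned model ID
    eval_score: float   # score on the fixed held-out test set
    data_snapshot: str  # which data cut this version was trained on

@dataclass
class ModelRegistry:
    versions: list = field(default_factory=list)  # every version ever trained
    current: Optional[ModelVersion] = None        # the version serving traffic

    def register(self, version: ModelVersion, min_gain: float = 0.0) -> bool:
        """Record the version; promote it only if it beats the current one by more than min_gain."""
        self.versions.append(version)
        if self.current is None or version.eval_score > self.current.eval_score + min_gain:
            self.current = version
            return True
        return False
```

Keeping rejected versions in `versions` preserves history for debugging regressions, while `current` only moves forward on a measured improvement.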

Pitfalls

- Overfitting (small datasets).
- Catastrophic forgetting (heavy fine-tuning).
- Eval set leaked into the training data.
- Weaker general ability after fine-tuning (the model becomes specialized).
- Total cost ends up higher than a prompt-based approach.
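The eval-leak pitfall can be caught with a cheap overlap check before trusting any score. This minimal sketch flags test examples whose normalized input also appears in the training set. Exact matching after lowercasing and whitespace collapsing is an assumption. Paraphrased near-duplicates need fuzzier matching, such as n-gram overlap. The function names are hypothetical.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())

def leaked_examples(train: list, test: list, key: str = "input") -> list:
    """Return test examples whose normalized `key` field also occurs in train."""
    train_keys = {normalize(ex[key]) for ex in train}
    return [ex for ex in test if normalize(ex[key]) in train_keys]
```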

🤔 Decision criteria

| Situation | Recommendation |
|---|---|
| Generic | Prompt + RAG |
| Specific format | OpenAI fine-tune |
| Cost / latency | Fine-tune a small open model |
| Domain knowledge | Fine-tune + RAG |
| Open / self-host | LoRA + Axolotl |
| Managed | Together / Replicate / OpenAI |
| Privacy | Self-host |

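Anti-patterns

  • Fine-tuning as the first attempt: try prompt + RAG first.
  • Small dataset (< 50): use few-shot prompting instead.
  • Eval leak: inflated, meaningless scores.
  • Catastrophic forgetting: use a gentle (low) learning rate.
  • No version control for fine-tuned models: you lose track of them.
  • One monolithic model: LoRA keeps things modular.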

🤖 LLM usage hints

  • LoRA / QLoRA is the answer for cost.
  • Axolotl bakes in best practices.
  • 50-1,000 examples is the sweet spot.
  • Multi-LoRA serving (vLLM).

🔗 Related documents