[G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00
parent 21ac3ed255
commit 504fd5fb42
3011 changed files with 380280 additions and 206977 deletions
@@ -0,0 +1,355 @@
+---
+id: ai-fine-tune-practical
+title: Fine-tune Practical — LoRA / QLoRA / OpenAI API
+category: Coding
+status: draft
+source_trust_level: B
+verification_status: conceptual
+created_at: 2026-05-09
+updated_at: 2026-05-09
+tags: [ai, fine-tune, vibe-coding]
+tech_stack: { language: "Python", applicable_to: ["AI"] }
+applied_in: []
+aliases: [LoRA, QLoRA, fine-tune, OpenAI fine-tuning, Anthropic, Together, Axolotl]
+---
+
+# Fine-tune Practical
+
+> When prompt + RAG is not enough, fine-tune. **Use a managed API (OpenAI / Anthropic) or self-hosted LoRA (cheap)**.
+
+## 📖 Key Concepts
+- Prompting is enough in most cases.
+- Fine-tuning is for style / format / domain.
+- LoRA = parameter-efficient fine-tuning.
+- 100-10,000 examples is the sweet spot.
+
+## 💻 Code Patterns
+
+### When fine-tune?
+```
+✓ Specific format (always JSON, specific style).
+✓ Domain knowledge (legal, medical).
+✓ Latency / cost (a fine-tuned small model can match a large model on a narrow task).
+✓ Brand voice.
+
+✗ Generic "better quality".
+✗ Facts (RAG is better).
+✗ Recent info (training cutoff).
+
+→ Try prompting. Try RAG. Fine-tune only if both fall short.
+```
+
+### OpenAI fine-tune
+```python
+import openai
+
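+# Upload the training data (the client reads OPENAI_API_KEY from the environment)
+with open('data.jsonl', 'rb') as f:
+    file = openai.files.create(file=f, purpose='fine-tune')
+
+# Create the fine-tuning job
+job = openai.fine_tuning.jobs.create(
+    training_file=file.id,
+    model='gpt-4o-mini-2024-07-18'
+)
+
+# Check status later; jobs run asynchronously
+job = openai.fine_tuning.jobs.retrieve(job.id)
+print(job.status)            # ends at 'succeeded' (or 'failed' / 'cancelled')
+print(job.fine_tuned_model)  # None until the job succeeds, then 'ft:gpt-4o-mini:...'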
+```
+
+### Data format (chat)
+```jsonl
+{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
+{"messages": [...]}
+```
+
+→ 50-1000 examples is typical.
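
As a rough sketch of how a file in the chat format above could be checked before upload: the helper names `validate_record` and `write_jsonl` are illustrative assumptions (not part of any SDK), and the checks cover only the basics shown in the format example.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_record(record):
    """Return a list of problems found in one training record (empty if OK)."""
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    problems = []
    for i, msg in enumerate(messages):
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i}: unknown role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str) or not msg["content"].strip():
            problems.append(f"message {i}: empty content")
    if messages[-1].get("role") != "assistant":
        problems.append("last message should be the assistant reply to learn")
    return problems


def write_jsonl(records, path):
    """Write one JSON object per line, skipping records that fail validation."""
    written = 0
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            if not validate_record(record):
                f.write(json.dumps(record, ensure_ascii=False) + "\n")
                written += 1
    return written
```

Running every record through a check like this before upload catches format drift early, which is cheaper than a failed or degraded training job.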
+
+### Use
+```python
+r = openai.chat.completions.create(
+    model='ft:gpt-4o-mini:my-org::abc',
+    messages=[{'role': 'user', 'content': '...'}]
+)
+```
+
+→ Drop-in replacement. Prompts get shorter (the system instructions are learned implicitly).
+
+### Anthropic fine-tune
+```
+Anthropic offers only limited fine-tuning of its own.
+- Claude Opus / Sonnet respond well to prompting alone.
+- To cut cost, use Haiku.
+- Custom fine-tuning is enterprise-only.
+```
+
+→ In most cases prompt + RAG is enough.
+
+### LoRA self-host (HuggingFace)
+```python
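+import torch
+from datasets import load_dataset
+from peft import LoraConfig
+from transformers import AutoModelForCausalLM
+from trl import SFTConfig, SFTTrainer
+
+model_id = 'meta-llama/Meta-Llama-3-8B-Instruct'  # gated repo: accept the license first
+model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
+
+# Assumed dataset: a JSONL file whose rows have a 'text' field (the path is a placeholder)
+dataset = load_dataset('json', data_files='train.jsonl', split='train')
+
+config = LoraConfig(
+    r=16, lora_alpha=32,
+    target_modules=['q_proj', 'v_proj'],
+    lora_dropout=0.05,
+    task_type='CAUSAL_LM',
+)
+
+# Recent TRL releases take dataset_text_field via SFTConfig, wrap the model with
+# the LoRA adapter when peft_config is passed, and load the tokenizer from the model.
+trainer = SFTTrainer(
+    model=model,
+    train_dataset=dataset,
+    peft_config=config,
+    args=SFTConfig(output_dir='./lora', num_train_epochs=3, dataset_text_field='text'),
+)
+trainer.train()
+trainer.save_model('./lora')  # saves only the adapter weights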
+```
+
+→ A single A100 is enough (8B model).
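
To make "parameter-efficient" concrete, here is a small back-of-the-envelope sketch of how many weights the `r=16`, `q_proj` + `v_proj` configuration above would train. The layer shapes are assumptions about Llama-3-8B (hidden size 4096, 32 layers, grouped-query attention with a 1024-wide value projection), so treat the result as an estimate.

```python
def lora_params(d_in, d_out, r):
    """LoRA adds A (r x d_in) and B (d_out x r) beside a frozen d_out x d_in weight."""
    return r * (d_in + d_out)


# Assumed Llama-3-8B shapes (not taken from the note above).
HIDDEN, KV_DIM, LAYERS = 4096, 1024, 32
R, ALPHA = 16, 32

per_layer = lora_params(HIDDEN, HIDDEN, R) + lora_params(HIDDEN, KV_DIM, R)  # q_proj + v_proj
trainable = per_layer * LAYERS
scaling = ALPHA / R  # the adapter update is multiplied by alpha / r

print(f"trainable LoRA params: {trainable:,}")
print(f"share of an ~8B model: {trainable / 8e9:.4%}")
print(f"update scaling (alpha / r): {scaling}")
```

Under these assumptions only a few million weights receive gradients, which is why optimizer state stays small compared with a full fine-tune.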
+
+### QLoRA (4-bit)
+```python
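+import torch
+from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+
+model_id = 'meta-llama/Meta-Llama-3-8B-Instruct'
+
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_quant_type='nf4',
+    bnb_4bit_compute_dtype=torch.bfloat16,
+)
+
+# Load the frozen base model in 4-bit; the LoRA adapter is then trained on top of it
+model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)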
+```
+
+→ A 70B model becomes trainable on a single 80GB A100 (the 4-bit weights alone are ~35GB, so a 40GB card leaves too little headroom).
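
A quick way to sanity-check GPU sizing is to estimate weight memory from parameter count and bit width. The sketch below counts weights only; quantization constants, adapter weights, activations, and optimizer state come on top, so real usage is higher. The function name is an illustrative assumption.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


for name, n_params in [("8B", 8e9), ("70B", 70e9)]:
    bf16 = weight_memory_gb(n_params, 16)  # full or LoRA fine-tune in bf16
    nf4 = weight_memory_gb(n_params, 4)    # QLoRA 4-bit base weights
    print(f"{name}: bf16 = {bf16:.0f} GB, 4-bit = {nf4:.0f} GB")
```

The 4x reduction from 16-bit to 4-bit is what moves the largest open models from multi-GPU territory toward a single high-memory card.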
+
+### Axolotl (config-based)
+```yaml
+# config.yml
+base_model: meta-llama/Meta-Llama-3-8B-Instruct
+adapter: lora
+lora_r: 16
+lora_alpha: 32
+
+datasets:
+  - path: my_data.jsonl
+    type: chat_template
+
+sequence_len: 2048
+gradient_accumulation_steps: 4
+num_epochs: 3
+learning_rate: 2e-4
+```
+
+```bash
+axolotl train config.yml
+```
+
+→ Best practices are already baked in.
+
+### Together AI / Replicate (managed)
+```python
+from together import Together
+
+client = Together()  # reads TOGETHER_API_KEY from the environment
+
+job = client.fine_tuning.create(
+    training_file='file-...',  # ID returned by the file upload
+    model='meta-llama/Llama-3-8B-Instruct',
+    n_epochs=3,
+)
+```
+
+→ No self-hosting hassle.
+
+### Data quality > quantity
+```
+50 high-quality > 5000 noisy.
+
+→ Manual curation.
+- Diverse examples.
+- Consistent format.
+- Clean (no typos or errors).
+- Edge cases.
+```
+
+### Eval
+```python
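+# Held-out test set (never seen during training)
+test = [...]
+
+# `generate` is a placeholder for your model call: prompt str -> completion str
+hits = sum(generate(ex['input']).strip() == ex['expected'].strip() for ex in test)
+print(f'exact-match accuracy: {hits / len(test):.2%}')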
+
+→ Benchmark against the base model.
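
One possible shape for that benchmark is sketched below: both models are scored on the same held-out examples, and cases the base model handled but the fine-tune broke are listed separately. The `compare` helper and the toy lambda "models" are stand-ins so the sketch runs without a real model.

```python
def compare(base, tuned, test_set):
    """Score two prompt -> text callables on the same held-out examples."""
    base_hits, tuned_hits, regressions = 0, 0, []
    for ex in test_set:
        base_ok = base(ex["input"]).strip() == ex["expected"].strip()
        tuned_ok = tuned(ex["input"]).strip() == ex["expected"].strip()
        base_hits += base_ok
        tuned_hits += tuned_ok
        if base_ok and not tuned_ok:
            regressions.append(ex["input"])  # fine-tuning broke this case
    n = len(test_set)
    return {"base_acc": base_hits / n, "tuned_acc": tuned_hits / n, "regressions": regressions}


# Toy stand-ins (assumptions for illustration only).
test_set = [
    {"input": "capital of France", "expected": "Paris"},
    {"input": "capital of Japan", "expected": "Tokyo"},
]
base = lambda prompt: "Paris"
tuned = lambda prompt: {"capital of France": "Paris", "capital of Japan": "Tokyo"}[prompt]

report = compare(base, tuned, test_set)
print(report)
```

Tracking regressions alongside accuracy helps spot forgetting that an aggregate score can hide.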
+
+### Cost
+```
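+OpenAI fine-tune (gpt-4o-mini):
+- ~$3 / 1M training tokens at late-2024 list prices ($25 / 1M is the gpt-4o rate); check current pricing.
+- 1k examples × 500 tokens = 500k tokens per epoch ≈ $1.50 per epoch.
+
+Self-host LoRA (8B):
+- 1 A100 hour = $1-3.
+- 1k examples × 3 epochs ≈ 1-2 hours ≈ $2-6.
+- Cheap relative to ongoing API generation costs.
+
+Inference:
+- Fine-tuned API: priced above the base model (roughly 1.5-2x at late-2024 list prices).
+- Self-host: GPU rental.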
+```
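
The arithmetic behind these estimates can be sketched as two small functions. All rates passed in below are placeholders; substitute the current price for your model and GPU provider. Managed APIs generally bill training tokens per epoch, which the `epochs` parameter is meant to capture.

```python
def api_training_cost(n_examples, tokens_per_example, epochs, usd_per_million_tokens):
    """Managed fine-tuning: billed on total tokens processed across all epochs."""
    tokens = n_examples * tokens_per_example * epochs
    return tokens / 1_000_000 * usd_per_million_tokens


def gpu_training_cost(hours, usd_per_hour):
    """Self-hosted fine-tuning: billed on GPU rental time."""
    return hours * usd_per_hour


# Placeholder inputs: 1k examples of 500 tokens, one epoch, $25 per 1M tokens.
print(api_training_cost(1000, 500, epochs=1, usd_per_million_tokens=25.0))
# Placeholder inputs: 2 hours on a GPU rented at $3 per hour.
print(gpu_training_cost(hours=2, usd_per_hour=3.0))
```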
+
+### When small model fine-tune > big prompt
+```
+Big model + complex prompt:
+- $10 / 1M token.
+- 5 sec latency.
+
+Small model fine-tuned:
+- $0.30 / 1M token.
+- 1 sec latency.
+
+→ Same quality on that task, ~30x cheaper, 5x faster.
+
+But: only for the specific task. Generic work = big model.
+```
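
To put the comparison above in numbers, the sketch below computes the price ratio and the token volume at which the one-off training cost pays for itself. The $6 training cost is a placeholder, and the calculation ignores idle GPU time for self-hosted serving, so it is a lower bound on the real break-even point.

```python
def break_even_million_tokens(training_cost_usd, big_price, small_price):
    """Millions of tokens after which per-token savings cover the training cost."""
    saving_per_million = big_price - small_price
    if saving_per_million <= 0:
        raise ValueError("the small model must be cheaper per token")
    return training_cost_usd / saving_per_million


big, small = 10.0, 0.30  # USD per 1M tokens, from the comparison above
price_ratio = big / small
volume = break_even_million_tokens(6.0, big, small)  # $6 is a placeholder training cost

print(f"price ratio: {price_ratio:.1f}x")
print(f"break-even volume: {volume:.2f}M tokens")
```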
+
+### DPO (alignment)
+```python
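+from trl import DPOConfig, DPOTrainer
+
+# sft_model: the SFT checkpoint to align; sft_frozen: a frozen copy used as reference
+# preferences: dataset with 'prompt', 'chosen', 'rejected' columns
+trainer = DPOTrainer(
+    model=sft_model,
+    ref_model=sft_frozen,
+    args=DPOConfig(output_dir='./dpo', beta=0.1),  # recent TRL takes beta via DPOConfig
+    train_dataset=preferences,
+    processing_class=tokenizer,  # named `tokenizer=` in older TRL releases
+)
+trainer.train()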
+```
+
+→ Preference learning.
+→ [[AI_RLHF_DPO_Basics]].
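
For intuition about what `beta` controls, here is a minimal pure-Python sketch of the standard per-pair DPO loss, computed from sequence log-probabilities under the policy and the frozen reference. It illustrates the published objective and is not TRL's internal code; the function and argument names are assumptions.

```python
import math


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log(sigmoid(beta * margin)) for one (chosen, rejected) pair of log-probs."""
    # How much more the policy favors chosen over rejected than the reference does.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    x = beta * margin
    # Numerically stable form of log(1 + exp(-x)).
    if x > 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Policy identical to the reference: loss is log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy shifted toward the chosen answer: loss drops below log(2).
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

A larger `beta` penalizes drifting from the reference more sharply, which is why small values such as 0.1 are a common starting point.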
+
+### Production deploy
+```
+LoRA adapter:
+- 100 MB-1 GB (small).
+- One adapter per user (multi-tenant).
+- vLLM serves N adapters from 1 base model.
+
+Full fine-tune:
+- Large model (16-140 GB).
+- Needs its own instance.
+```
+
+→ LoRA is the answer to cost.
+
+### Multi-LoRA serving
+```bash
+vllm serve meta-llama/Meta-Llama-3-8B \
+     --enable-lora \
+     --lora-modules customer1=path1 customer2=path2
+```
+
+→ N customer × 1 base model.
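
With the server running, each request selects an adapter by passing the registered adapter name as `model` to vLLM's OpenAI-compatible endpoint. The sketch below only builds the request; the `localhost:8000` address is an assumption (vLLM's default port), and `build_completion_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_completion_request(adapter_name, prompt, base_url="http://localhost:8000"):
    """Build a /v1/completions request routed to one LoRA adapter."""
    payload = {"model": adapter_name, "prompt": prompt, "max_tokens": 64}
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_completion_request("customer1", "Hello")
print(req.full_url, json.loads(req.data))

# To send it (requires the server to be running):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```

Routing by name keeps tenant selection in the request, so one deployment can serve every customer.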
+
+### When NOT?
+```
+- Generic task: prompting is enough.
+- Small dataset (< 50): use few-shot.
+- Frequent change: re-training cost adds up.
+- Simple format: use structured output.
+```
+
+### Model selection
+```
+Open:
+- Llama 3 (Meta).
+- Mistral (Mistral AI).
+- Gemma (Google).
+- Qwen (Alibaba).
+
+→ License + size + quality balance.
+```
+
+### Synthetic data
+```python
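+from openai import OpenAI
+
+client = OpenAI()  # reads OPENAI_API_KEY from the environment
+
+def teacher(prompt):
+    # A strong "teacher" model generates the target output (any capable model works)
+    r = client.chat.completions.create(
+        model='gpt-4o', messages=[{'role': 'user', 'content': prompt}]
+    )
+    return r.choices[0].message.content
+
+prompts = [...]
+training_data = [{'input': p, 'output': teacher(p)} for p in prompts]
+
+# The smaller model is then fine-tuned to mimic these outputs.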
+
+→ A form of distillation.
+
+### Fine-tune for code
+```
+Code-specific:
+- DeepSeek Coder.
+- CodeLlama.
+- StarCoder 2.
+
+→ A domain-specific base model works better.
+```
+
+### Continuous fine-tune
+```
+Production:
+- Feed new data into the model every day / week.
+- Latest = the most recent fine-tune.
+
+→ Drift adaptation.
+```
+
+### Pitfalls
+```
+- Overfitting (small dataset).
+- Catastrophic forgetting (heavy fine-tuning).
+- Eval set leaked into the training set.
+- Weaker general ability after fine-tuning (specialized).
+- Cost exceeds the prompt-only approach.
+```
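
The eval-leak pitfall in particular can be guarded against mechanically. The sketch below deduplicates examples by a normalized fingerprint of the input before splitting, then verifies that no test input also appears in the training set. The `input` field name and the normalization rule (lowercase, collapsed whitespace) are assumptions; near-duplicates with different wording would need fuzzier matching.

```python
import hashlib
import random


def fingerprint(text):
    """Hash of the lowercased, whitespace-collapsed text."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def split_without_leak(examples, test_fraction=0.2, seed=0):
    """Deduplicate by input fingerprint, then split into (train, test)."""
    unique = {}
    for ex in examples:
        unique.setdefault(fingerprint(ex["input"]), ex)  # keep the first copy
    items = list(unique.values())
    random.Random(seed).shuffle(items)
    n_test = max(1, int(len(items) * test_fraction))
    return items[n_test:], items[:n_test]


def leaked(train, test):
    """Test examples whose input also occurs in the training set."""
    train_keys = {fingerprint(ex["input"]) for ex in train}
    return [ex for ex in test if fingerprint(ex["input"]) in train_keys]
```

Running `leaked(train, test)` again right before every evaluation is a cheap check that later data additions have not reintroduced overlap.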
+
+## 🤔 Decision Criteria
+| Situation | Recommendation |
+|---|---|
+| Generic | Prompt + RAG |
+| Specific format | OpenAI fine-tune |
+| Cost / latency | Fine-tune small open model |
+| Domain knowledge | Fine-tune + RAG |
+| Open / self-host | LoRA + Axolotl |
+| Managed | Together / Replicate / OpenAI |
+| Privacy | Self-host |
+
+## ❌ Anti-patterns
+- **Fine-tuning as the first try**: try prompt + RAG first.
+- **Small dataset (< 50)**: use few-shot instead.
+- **Eval leak**: inflated scores.
+- **Catastrophic forgetting**: use a gentle learning rate.
+- **No version control for fine-tunes**: you lose track of them.
+- **Monolithic model**: LoRA keeps it modular.
+
+## 🤖 LLM Usage Hints
+- LoRA / QLoRA is the answer to cost.
+- Axolotl encodes best practices.
+- 50-1000 examples is the sweet spot.
+- Multi-LoRA serving (vLLM).
+
+## 🔗 Related Documents
+- [[AI_Fine_Tuning_vs_Prompting]]
+- [[AI_RLHF_DPO_Basics]]
+- [[AI_Production_Deploy]]