---
id: ai-fine-tune-practical
title: Fine-tune Practical - LoRA / QLoRA / OpenAI API
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [ai, fine-tune, vibe-coding]
tech_stack: { language: "Python", applicable_to: ["AI"] }
applied_in: []
aliases: [LoRA, QLoRA, fine-tune, OpenAI fine-tuning, Anthropic, Together, Axolotl]
---

# Fine-tune Practical

> When prompting and RAG are not enough, fine-tune. **Use a managed API (OpenAI / Anthropic) or self-host with LoRA (cheap).**

## 📖 Core concepts

- Prompting is enough for most cases.
- Fine-tuning targets style, format, and domain.
- LoRA is parameter-efficient.
- 100-10,000 examples is the sweet spot.

## 💻 Code patterns

### When fine-tune?

```
✓ Specific format (always JSON, specific style).
✓ Domain knowledge (legal, medical).
✓ Latency / cost (a fine-tuned small model can match a large one).
✓ Brand voice.
✗ Generic "better quality".
✗ Facts (RAG is better).
✗ Recent info (training cutoff).
→ Try prompting. Try RAG. Fine-tune only if both fall short.
```

### OpenAI fine-tune

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the training data
with open('data.jsonl', 'rb') as f:
    file = client.files.create(file=f, purpose='fine-tune')

# Create the fine-tuning job
job = client.fine_tuning.jobs.create(
    training_file=file.id,
    model='gpt-4o-mini-2024-07-18',
)

# Check status (poll this until the job finishes)
job = client.fine_tuning.jobs.retrieve(job.id)
print(job.status)            # eventually 'succeeded'
print(job.fine_tuned_model)  # 'ft:gpt-4o-mini:...' once succeeded, None before that
```

### Data format (chat)

```jsonl
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
{"messages": [...]}
```

→ 50-1000 examples is typical.

### Use

```python
# `client` is the OpenAI() client created in the fine-tune step
r = client.chat.completions.create(
    model='ft:gpt-4o-mini:my-org::abc',
    messages=[{'role': 'user', 'content': '...'}]
)
```

→ Drop-in replacement for the base model. The prompt gets shorter because the system instructions become implicit in the weights.

### Anthropic fine-tune

```
Anthropic's own fine-tuning service is limited.
- Claude Opus / Sonnet are strong with prompting alone.
- To cut cost, use Haiku.
- Custom fine-tuning is enterprise-only.
```

→ Prompt + RAG is enough in most cases.
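
### Validate data before upload

A malformed line in the chat-format JSONL shown above can make the upload or the job fail, so a quick local check can be worthwhile. The following is a minimal sketch, assuming only the three roles used in this note's example. `validate_chat_example` and `validate_chat_jsonl` are hypothetical helper names, not part of the OpenAI SDK, and the checks are simpler than OpenAI's own validation (tool messages, for instance, are not handled).

```python
import json

# Roles used in this note's data format example (an assumption, not the full spec)
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_example(line: str) -> list[str]:
    """Return the problems found in one JSONL line (empty list means OK)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        role = msg.get("role")
        if not isinstance(role, str) or role not in ALLOWED_ROLES:
            problems.append(f"message {i} has unexpected role {role!r}")
        content = msg.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i} has empty content")

    # A training example should end with the reply the model is meant to learn
    last = messages[-1]
    if isinstance(last, dict) and last.get("role") != "assistant":
        problems.append("last message should be the assistant reply")
    return problems


def validate_chat_jsonl(lines) -> dict[int, list[str]]:
    """Map 1-based line numbers to their problems; blank lines are skipped."""
    report = {}
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        problems = validate_chat_example(line)
        if problems:
            report[lineno] = problems
    return report
```

→ Passing an open file handle for `data.jsonl` to `validate_chat_jsonl` before the upload step gives an empty dict when every line passes these checks.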

### LoRA self-host (HuggingFace)

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_id = 'meta-llama/Meta-Llama-3-8B-Instruct'
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Assumption: a local JSONL file with one 'text' field per example
dataset = load_dataset('json', data_files='my_data.jsonl', split='train')

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=['q_proj', 'v_proj'],
    lora_dropout=0.05,
    task_type='CAUSAL_LM',
)
model = get_peft_model(model, config)

# Note: newer TRL releases expect dataset_text_field on SFTConfig instead
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field='text',
    args=TrainingArguments(output_dir='./lora', num_train_epochs=3),
)
trainer.train()
```

→ A single A100 is enough for an 8B model.

### QLoRA (4-bit)

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with bfloat16 compute
config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.bfloat16,
)
# model_id as defined in the LoRA example
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=config)
```

→ A 70B model becomes feasible on a single 48-80 GB GPU. The 4-bit weights alone take about 35 GB, so a 40 GB A100 leaves too little headroom.

### Axolotl (config-based)

```yaml
# config.yml
base_model: meta-llama/Meta-Llama-3-8B-Instruct
adapter: lora
lora_r: 16
lora_alpha: 32
datasets:
  - path: my_data.jsonl
    type: chat_template
sequence_len: 2048
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 2e-4
```

```bash
axolotl train config.yml
```

→ Best practices are already baked in.

### Together AI / Replicate (managed)

```python
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment
job = client.fine_tuning.create(
    training_file='file-...',
    # Check Together's supported-model list for the exact name
    model='meta-llama/Meta-Llama-3-8B-Instruct',
    n_epochs=3,
)
```

→ No self-hosting hassle.

### Data quality > quantity

```
50 high-quality > 5000 noisy.
→ Manual curation.
- Diverse examples.
- Consistent format.
- Clean (no typos or errors).
- Edge cases.
```

### Eval

```python
# Test set (held out, never seen during training)
test = [...]

def match(pred, expected):
    # Exact match after trimming whitespace; swap in a task-specific metric
    return pred.strip() == expected.strip()

# model.generate stands in for your inference call (API or local model)
scores = [match(model.generate(ex['input']), ex['expected']) for ex in test]
accuracy = sum(scores) / len(scores)
```

→ Benchmark against the base model on the same test set.

### Cost

```
OpenAI fine-tune (gpt-4o-mini, list prices as of late 2024; verify current rates):
- $3 / 1M training tokens (the $25 / 1M rate applies to gpt-4o).
- 1k examples × 500 tokens = 500k tokens per epoch.
- Billed per epoch: 3 epochs = 1.5M tokens ≈ $4.50.
Self-host LoRA (8B):
- 1 A100 hour = $1-3.
- 1k examples × 3 epochs = 1-2 hours = $1-6.
- Compared to API generation: cheap.
Inference:
- Fine-tuned API: more expensive than the base model (about 2x for gpt-4o-mini).
- Self-host: GPU rental.
```

### When small model fine-tune > big prompt

```
Big model + complex prompt:
- $10 / 1M tokens.
- 5 s latency.
Small fine-tuned model:
- $0.30 / 1M tokens.
- 1 s latency.
→ Comparable quality on the target task, ~30x cheaper, 5x faster.
But: only for a specific task. For generic work, use the big model.
```

### DPO (alignment)

```python
from trl import DPOTrainer

# sft_model: the model after supervised fine-tuning
# sft_frozen: a frozen copy used as the reference policy
# Note: newer TRL releases take beta via DPOConfig instead
trainer = DPOTrainer(
    model=sft_model,
    ref_model=sft_frozen,
    train_dataset=preferences,  # {chosen, rejected}
    beta=0.1,
)
trainer.train()
```

→ Preference learning. → [[AI_RLHF_DPO_Basics]].

### Production deploy

```
LoRA adapter:
- 10 MB-1 GB (small; depends on rank and target modules).
- Each user gets their own adapter (multi-tenant).
- vLLM serves N adapters from 1 base model.
Full fine-tune:
- Large model (16-140 GB).
- Dedicated instance.
```

→ LoRA is the answer to cost.

### Multi-LoRA serving

```bash
# The base model must match the one the adapters were trained on
vllm serve meta-llama/Meta-Llama-3-8B \
  --enable-lora \
  --lora-modules customer1=path1 customer2=path2
```

→ N customers × 1 base model.

### When NOT?

```
- Generic task: prompting is enough.
- Small dataset (< 50): use few-shot.
- Frequent changes: re-training cost.
- Simple format: use structured output.
```

### Model selection

```
Open:
- Llama 3 (Meta).
- Mistral (Mistral AI).
- Gemma (Google).
- Qwen (Alibaba).
→ Balance license, size, and quality.
```

### Synthetic data

```python
from openai import OpenAI

client = OpenAI()

def complete(prompt, model='gpt-4o'):
    # Teacher call; the model name is an assumption, any strong model works
    r = client.chat.completions.create(
        model=model, messages=[{'role': 'user', 'content': prompt}]
    )
    return r.choices[0].message.content

# The large model generates the training data
prompts = [...]
training_data = [{'input': p, 'output': complete(p)} for p in prompts]
# The smaller model is then fine-tuned to mimic these outputs
```

→ A form of "distillation".

### Fine-tune for code

```
Code-specific:
- DeepSeek Coder.
- CodeLlama.
- StarCoder 2.
→ A domain-specific base model works better.
```

### Continuous fine-tune

```
Production:
- Feed new data into the model every day / week.
- Latest = most recent fine-tune.
→ Drift adaptation.
```

### Pitfalls

```
- Overfitting (small dataset).
- Catastrophic forgetting (heavy fine-tuning).
- Eval set leaked into the training set.
- Weaker general ability after fine-tuning (specialized).
- Cost higher than the prompt approach.
```

## 🤔 Decision criteria

| Situation | Recommendation |
|---|---|
| Generic | Prompt + RAG |
| Specific format | OpenAI fine-tune |
| Cost / latency | Fine-tune small open model |
| Domain knowledge | Fine-tune + RAG |
| Open / self-host | LoRA + Axolotl |
| Managed | Together / Replicate / OpenAI |
| Privacy | Self-host |

## ❌ Anti-patterns

- **Fine-tuning as the first try**: try prompt + RAG first.
- **Small dataset (< 50)**: use few-shot instead.
- **Eval leak**: the scores are fake.
- **Catastrophic forgetting**: use a gentle learning rate.
- **No version control of fine-tunes**: you lose them.
- **Monolithic model**: LoRA is modular.

## 🤖 LLM usage hints

- LoRA / QLoRA is the answer to cost.
- Axolotl bakes in best practices.
- 50-1000 examples is the sweet spot.
- Multi-LoRA serving (vLLM).

## 🔗 Related documents

- [[AI_Fine_Tuning_vs_Prompting]]
- [[AI_RLHF_DPO_Basics]]
- [[AI_Production_Deploy]]