---
id: ai-fine-tuning-vs-prompting
title: "Fine-tuning vs Prompting: Decision Criteria"
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [ai, llm, fine-tuning, lora, vibe-coding]
tech_stack: { language: "TS / Python", applicable_to: ["Backend"] }
applied_in: []
aliases: [fine-tuning, LoRA, RAG vs FT, distillation, prompt engineering]
---

# Fine-tuning vs Prompting

> **Almost always start with prompting (+ RAG).** Fine-tuning is for narrow domains, consistent style, and latency / cost optimization. LoRA makes it cheap. **New knowledge = RAG; new style / format = fine-tune.**

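## 📖 Core Concepts
- Prompt: zero-shot / few-shot.
- RAG: injects external knowledge at inference time.
- Fine-tune (full): updates all weights; expensive.
- LoRA / QLoRA: trains only a small set of added parameters; cheap. QLoRA additionally keeps the frozen base model quantized.
- Distillation: a small model learns to imitate a large one.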

## 💻 Code Patterns

### Decision tree
```
Need new knowledge (facts)?
  YES → RAG
  NO → next

Need consistent style / format / tone?
  YES → fine-tune (LoRA)
  NO → next

Need lower latency / cost?
  YES → fine-tune a small model + distillation
  NO → prompt only
```
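The tree can also be written as a small function, which makes the "first match wins" ordering explicit. This is only a sketch: the `Needs` fields and the `recommend` name are illustrative, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    """Answers to the three questions in the tree (field names are illustrative)."""
    new_knowledge: bool = False
    consistent_style: bool = False
    lower_latency_or_cost: bool = False


def recommend(needs: Needs) -> str:
    """Walk the tree top to bottom and return the first matching recommendation."""
    if needs.new_knowledge:
        return "RAG"
    if needs.consistent_style:
        return "fine-tune (LoRA)"
    if needs.lower_latency_or_cost:
        return "fine-tune small model + distillation"
    return "prompt only"


print(recommend(Needs(consistent_style=True)))  # fine-tune (LoRA)
```

Because the first match wins, a task that needs both new facts and a fixed tone resolves to RAG here; in practice the two can be combined.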

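### Prompt → verify whether it is enough
```ts
// Run the prompt-only baseline on ~100 test cases.
// loadEvalSet, evaluate and promptModel are project-specific placeholders.
const dataset = loadEvalSet();
const score = await evaluate(promptModel, dataset); // pass rate in percent
console.log(`Pass: ${score}%`); // below 80% → fine-tune candidate
```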
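A minimal sketch of what such an `evaluate` step might compute, assuming exact-match scoring and a model exposed as a plain `str -> str` callable. Both are simplifications; real evals often need fuzzy or rubric-based scoring. The names `pass_rate` and `is_fine_tune_candidate` are made up for this note.

```python
from typing import Callable, Iterable, Tuple

FINE_TUNE_THRESHOLD = 80.0  # percent; the 80% cut-off used in this note


def pass_rate(model: Callable[[str], str], cases: Iterable[Tuple[str, str]]) -> float:
    """Percentage of cases where the model output equals the expected answer."""
    cases = list(cases)
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(1 for prompt, expected in cases if model(prompt).strip() == expected.strip())
    return 100.0 * passed / len(cases)


def is_fine_tune_candidate(score: float, threshold: float = FINE_TUNE_THRESHOLD) -> bool:
    """True when the prompt-only baseline falls below the threshold."""
    return score < threshold


# Stub "model" for illustration only: it upper-cases its input.
cases = [("a", "A"), ("b", "B"), ("c", "x"), ("d", "D")]
score = pass_rate(lambda p: p.upper(), cases)
print(score, is_fine_tune_candidate(score))  # 75.0 True
```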

### LoRA fine-tune (Hugging Face PEFT)
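```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments
from trl import SFTTrainer

# 8B Instruct checkpoint (gated on the Hub; the license must be accepted first).
base = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-3.1-8B-Instruct')

# Assumption: chat-format training data in train.jsonl, one JSON object per line.
dataset = load_dataset('json', data_files='train.jsonl', split='train')

lora = LoraConfig(
    r=16, lora_alpha=32, target_modules=['q_proj', 'v_proj'],
    lora_dropout=0.05, bias='none', task_type='CAUSAL_LM',
)
model = get_peft_model(base, lora)  # base weights frozen; only the adapter trains

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    args=TrainingArguments(output_dir='./out', num_train_epochs=3, learning_rate=2e-4, per_device_train_batch_size=4),
    # The sequence-length argument has moved between TRL releases
    # (from SFTTrainer to SFTConfig); check the version you have installed.
    max_seq_length=2048,
)
trainer.train()
trainer.save_model('./lora-out')  # writes the LoRA adapter, not the full base model
```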

→ 1,000 to 10,000 examples is usually enough. One GPU and a few hours.
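A back-of-the-envelope count shows why LoRA is cheap: each adapted weight matrix of shape `d_out x d_in` gains two small matrices, `A` (`r x d_in`) and `B` (`d_out x r`). The dimensions below are assumptions for an 8B Llama-style model (hidden size 4096, 32 layers, grouped-query attention so `v_proj` maps 4096 to 1024); check the model's config before relying on them.

```python
from typing import List, Tuple


def lora_params(rank: int, shapes: List[Tuple[int, int]], layers: int) -> int:
    """Trainable LoRA parameters: rank * (d_in + d_out) per target module, per layer."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes)
    return per_layer * layers


# Assumed (d_in, d_out) for q_proj and v_proj; see the lead-in.
shapes = [(4096, 4096), (4096, 1024)]
trainable = lora_params(rank=16, shapes=shapes, layers=32)
print(f"{trainable:,}")            # 6,815,744
print(f"{trainable / 8.0e9:.3%}")  # 0.085% of roughly 8B base parameters
```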

### OpenAI fine-tune (managed)
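```ts
import fs from 'node:fs';
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// 1. Format JSONL (one JSON object per line)
// {"messages":[{"role":"system","content":"..."},{"role":"user","content":"..."},{"role":"assistant","content":"..."}]}

// 2. Upload the training file
const file = await openai.files.create({
  file: fs.createReadStream('train.jsonl'),
  purpose: 'fine-tune',
});

// 3. Create the fine-tuning job
const job = await openai.fineTuning.jobs.create({
  training_file: file.id,
  model: 'gpt-4o-mini-2024-07-18',
  hyperparameters: { n_epochs: 3 },
});

// 4. Wait until the job finishes.
// waitForJob is a placeholder, not an SDK function: poll
// openai.fineTuning.jobs.retrieve(job.id) until the status is 'succeeded'.
const completed = await waitForJob(job.id);
const model = completed.fine_tuned_model;
if (!model) throw new Error(`Fine-tuning job ended with status ${completed.status}`);

// 5. Use the fine-tuned model (messages is the chat history to send)
await openai.chat.completions.create({ model, messages });
```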
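`waitForJob` is not defined in this note. A Python counterpart might look like the sketch below; it takes the retrieve call as a parameter so it can be tried without network access. The function name, the polling interval, and the timeout are assumptions.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable

TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(retrieve: Callable[[str], Any], job_id: str,
                 poll_seconds: float = 30.0, timeout_seconds: float = 6 * 3600,
                 sleep: Callable[[float], None] = time.sleep) -> Any:
    """Poll retrieve(job_id) until the job reaches a terminal status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            if job.status != "succeeded":
                raise RuntimeError(f"job {job_id} ended with status {job.status}")
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.status} after {timeout_seconds}s")
        sleep(poll_seconds)


# Stubbed retrieve for illustration: queued → running → succeeded.
statuses = iter(["queued", "running", "succeeded"])
fake_retrieve = lambda job_id: SimpleNamespace(status=next(statuses), fine_tuned_model="ft:stub")
done = wait_for_job(fake_retrieve, "job-123", sleep=lambda s: None)
print(done.fine_tuned_model)  # ft:stub
```

With the OpenAI Python SDK, the real call would be `wait_for_job(client.fine_tuning.jobs.retrieve, job.id)`.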

### Data (the most important part)
```jsonl
{"messages":[{"role":"system","content":"You are a customer support bot for Acme."},{"role":"user","content":"How do I reset my password?"},{"role":"assistant","content":"To reset: 1. Go to /forgot-password. 2. Enter your email. 3. Check inbox. We never email plain passwords."}]}
```
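JSONL requires each record on a single line, so a quick pre-upload check is worthwhile. The validator below is a minimal sketch: it only knows the three roles used in this note and assumes every record should end with an assistant reply. `validate_jsonl` is a made-up helper name.

```python
import json
from typing import List

ALLOWED_ROLES = {"system", "user", "assistant"}  # only the roles used in this note


def validate_jsonl(text: str) -> List[str]:
    """Return human-readable problems; an empty list means the file looks well formed."""
    problems = []
    for n, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {n}: invalid JSON ({exc.msg})")
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append(f"line {n}: missing 'messages' list")
            continue
        bad = [m for m in messages
               if not isinstance(m, dict)
               or m.get("role") not in ALLOWED_ROLES
               or not isinstance(m.get("content"), str)]
        if bad:
            problems.append(f"line {n}: {len(bad)} message(s) with bad role or content")
        elif messages[-1]["role"] != "assistant":
            problems.append(f"line {n}: last message should be the assistant reply")
    return problems


good = json.dumps({"messages": [{"role": "user", "content": "hi"},
                                {"role": "assistant", "content": "hello"}]})
broken = '{"messages":[{"role":"user","content":"hi"},'  # record split across lines
print(validate_jsonl(good + "\n" + broken))
```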

```
Scale:
- 50-100 examples = a starting point (small tasks)
- 500-1000 = good results
- 10000+ = large tasks (classification, etc.)
```

Quality > quantity. Consistency is critical.

### Evaluation (compare before and after fine-tuning)
```ts
const before = await evaluate(baseModel, evalSet);
const after = await evaluate(fineTunedModel, evalSet);
console.log('Before:', before, 'After:', after);
```

→ If the score does not improve, don't adopt the fine-tuned model.
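A single aggregate score can hide regressions, so comparing per-example results on the same eval set is one way to make the adopt / reject call. A sketch, assuming each run yields a list of pass flags in the same order; the `min_gain_pct` margin of 2 points is an arbitrary placeholder.

```python
from typing import Dict, List


def compare_runs(before: List[bool], after: List[bool], min_gain_pct: float = 2.0) -> Dict[str, object]:
    """Compare per-example pass flags from the same eval set, in the same order."""
    if not before or len(before) != len(after):
        raise ValueError("runs must be non-empty and the same length")
    fixed = sum(1 for b, a in zip(before, after) if a and not b)      # newly passing
    regressed = sum(1 for b, a in zip(before, after) if b and not a)  # newly failing
    gain_pct = 100.0 * (sum(after) - sum(before)) / len(before)
    return {"gain_pct": gain_pct, "fixed": fixed, "regressed": regressed,
            "adopt": gain_pct >= min_gain_pct}


before = [True, False, False, True, False]
after = [True, True, True, False, True]
print(compare_runs(before, after))
# {'gain_pct': 40.0, 'fixed': 3, 'regressed': 1, 'adopt': True}
```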

### Distillation (large → small)
```
GPT-4o (large) generates the answers → fine-tune GPT-4o-mini (small) on that data
→ the small model can reach similar accuracy on that task, roughly 10x cheaper / faster
```
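The data-generation half of distillation can be sketched as follows: a teacher callable answers each prompt and the pairs are written as chat-format JSONL for the student. The `teacher` argument stands in for a call to the large model; it is stubbed here, and `build_distillation_jsonl` is an illustrative name.

```python
import json
from typing import Callable, Iterable


def build_distillation_jsonl(prompts: Iterable[str], teacher: Callable[[str], str],
                             system_prompt: str) -> str:
    """Turn teacher answers into chat-format JSONL, one training record per line."""
    lines = []
    for prompt in prompts:
        answer = teacher(prompt)
        if not answer.strip():
            continue  # skip empty teacher outputs rather than train on them
        record = {"messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": answer},
        ]}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


# Stub teacher for illustration only; a real one would call the large model.
stub_teacher = lambda p: "" if p == "skip me" else f"Answer to: {p}"
jsonl = build_distillation_jsonl(["q1", "skip me", "q2"], stub_teacher, "You are a support bot.")
print(len(jsonl.splitlines()))  # 2
```

Use the same system prompt here as in production, since a mismatch changes behavior after deployment.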

### When NOT to fine-tune
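- You need to add facts / knowledge → RAG.
- Requirements change often → editing a prompt is faster.
- Few-shot prompting is already enough.
- Too little data (<50 examples).
- The eval does not improve.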

### Cost comparison (rough)
```
Prompt:             $0 dev cost, $$ per token (large prompt = expensive)
RAG:                $$ infra (vector DB) + $ inference
Fine-tune:          $$$ one-time training (~$50-500) + $ inference (cheaper than a large model)
LoRA (self-hosted): $ GPU (~$10-50)
```
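Whether the one-time training cost pays off depends on request volume. A break-even sketch, with all dollar figures hypothetical (costs are expressed per 1,000 requests to keep the arithmetic exact):

```python
import math


def break_even_requests(training_cost: float, cost_per_1k_before: float,
                        cost_per_1k_after: float) -> int:
    """Number of requests after which the fine-tune has paid for itself."""
    saving_per_1k = cost_per_1k_before - cost_per_1k_after
    if saving_per_1k <= 0:
        raise ValueError("fine-tuned setup is not cheaper per request; it never breaks even")
    return math.ceil(1000 * training_cost / saving_per_1k)


# Hypothetical: $200 training; $10 per 1k requests with a large model and a long
# prompt versus $2 per 1k requests with a fine-tuned small model.
print(break_even_requests(200.0, 10.0, 2.0))  # 25000
```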

## 🤔 Decision Criteria
| Goal | Recommendation |
|---|---|
| New facts / knowledge | RAG |
| Consistent style / tone | Fine-tune |
| Specific format (JSON) | Prompt + structured output |
| Lower latency | Fine-tune small + distill |
| Lower cost | Distill or run locally |
| Fast prototype | Prompt only |

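## ❌ Anti-patterns
- **Trying fine-tuning first**: an expensive detour when prompt + RAG would be enough.
- **Training on bad data**: garbage in, garbage out.
- **Launching without an eval**: you don't know how well it performs.
- **Too little data (10 examples)**: overfits.
- **Same data for train and test**: misleadingly high scores.
- **System prompt differs from the training data**: behavior differs in production.
- **Cloud + provider lock-in**: hard to switch.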
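One way to avoid the train / test overlap is a deterministic, hash-based split that also drops exact duplicates. This is a sketch under simple assumptions: records are JSON-serializable and "duplicate" means byte-identical after key sorting, so near-duplicates still slip through. `split_train_eval` is an illustrative name.

```python
import hashlib
import json
from typing import Any, List, Tuple


def split_train_eval(records: List[Any], eval_fraction: float = 0.1) -> Tuple[List[Any], List[Any]]:
    """Deterministic split: a record's hash decides its side, so reruns agree."""
    train, held_out = [], []
    seen = set()
    for record in records:
        key = hashlib.sha256(json.dumps(record, sort_keys=True).encode("utf-8")).hexdigest()
        if key in seen:
            continue  # drop exact duplicates so they cannot land on both sides
        seen.add(key)
        bucket = int(key[:8], 16) / 0x100000000  # uniform value in [0, 1)
        (held_out if bucket < eval_fraction else train).append(record)
    return train, held_out


# 50 records with only 40 distinct prompts, to exercise the de-duplication.
records = [{"messages": [{"role": "user", "content": f"q{i % 40}"}]} for i in range(50)]
train, held_out = split_train_eval(records, eval_fraction=0.2)
print(len(train) + len(held_out))  # 40
```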

## 🤖 LLM Usage Hints
- Prompt + RAG solves about 80% of cases.
- Fine-tuning is the last card to play, once data and an eval are in place.
- LoRA is cheap, so it is worth trying.

## 🔗 Related Documents
- [[AI_Prompt_Engineering_Patterns]]
- [[AI_RAG_Pattern_Basics]]
- [[AI_LLM_Eval_Patterns]]
- [[AI_Local_LLM_Inference]]