Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
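Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture / incomplete stub documents
- Merged 43 clusters of cross-folder duplicates (63 files → redirect)
- Link-name normalization: ~2,400 fixes (broken links repaired, redirects linked directly, concepts mapped)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections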

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00


| Field | Value |
|---|---|
| id | wiki-2026-0508-fine-tuning |
| title | Fine-tuning |
| category | 10_Wiki/Topics |
| status | verified |
| canonical_id | self |
| aliases | fine-tuning, FT, LoRA, QLoRA, full fine-tune, instruction tuning, continual |
| duplicate_of | none |
| source_trust_level | A |
| confidence_score | 0.98 |
| verification_status | applied |
| tags | machine-learning, fine-tuning, lora, qlora, transfer-learning, peft, instruction-tuning |
| raw_sources | (empty) |
| last_reinforced | 2026-05-10 |
| github_commit | pending |
| tech_stack | language: Python; framework: HuggingFace transformers / peft / TRL / Unsloth / Axolotl |

Fine-tuning

One-liner

Adapting a pretrained model to a specific task or domain. Modern practice: LoRA / QLoRA (PEFT), which update only a small fraction of the parameters, plus instruction tuning, RLHF, and DPO. Alternatives: prompt engineering, RAG. Costs: GPU time, training data, and evaluation.

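Core concepts

Spectrum of methods

  • Full fine-tune: updates every weight.
  • PEFT (parameter-efficient fine-tuning):
    • LoRA: trains a low-rank decomposition of the weight update.
    • QLoRA: LoRA on top of a 4-bit quantized base model.
    • Adapter: trains small layers inserted into the network.
    • IA³, Prefix-tuning, Prompt-tuning.
  • Instruction tuning: supervised training on (prompt, response) pairs.
  • DPO / SimPO: optimize directly on preference pairs (chosen vs. rejected).
  • RLHF: PPO against a learned reward model.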
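The LoRA update from Hu et al. (2021), written out. The pretrained weight $W_0 \in \mathbb{R}^{d \times k}$ stays frozen, and only the low-rank factors $A$ and $B$ are trained. Here $r$ and $\alpha$ correspond to peft's `r` and `lora_alpha`.

$$
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x, \qquad
B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
$$

Each adapted matrix therefore trains $r(d + k)$ parameters instead of $dk$. $B$ starts at zero, so $\Delta W = 0$ at the beginning of training. QLoRA keeps the same update but stores $W_0$ in 4-bit NF4.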

Compared with alternatives

  • Prompt engineering: cheapest, but limited.
  • Few-shot: in-context learning (ICL); costs extra tokens on every call.
  • RAG: supplies fresh knowledge at query time.
  • Fine-tune: teaches style, format, and domain skills.

Applications

  1. Domain adaptation: medical, legal.
  2. Style: brand voice.
  3. Format: JSON and other structured output.
  4. Tool use: function calling.
  5. Multilingual: low-resource languages.
  6. Safety: refusing harmful requests.

Modern tooling (2024+)

  • Unsloth: ~2x faster training with lower VRAM use.
  • Axolotl: training driven by a YAML config.
  • TRL: trainers for SFT and DPO.
  • MLX-LM (Apple): on-device fine-tuning on Apple silicon.

💻 Patterns

Full fine-tune (HF Trainer)

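from transformers import (AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_id = 'meta-llama/Meta-Llama-3-8B'
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama 3 defines no pad token

# ds / eval_ds: tokenized datasets with an 'input_ids' column, prepared beforehand
args = TrainingArguments(
    output_dir='out', num_train_epochs=3,
    per_device_train_batch_size=4, gradient_accumulation_steps=4,
    learning_rate=1e-5, warmup_steps=100, bf16=True,  # bf16 is generally preferred over fp16 for Llama
    save_steps=500, eval_strategy='steps', eval_steps=500,  # `evaluation_strategy` in older releases
)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal LM: labels copied from input_ids
Trainer(model=model, args=args, train_dataset=ds, eval_dataset=eval_ds,
        data_collator=collator).train()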
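To sanity-check save_steps, eval_steps and warmup_steps, it helps to know how many optimizer steps a run takes. Below is a rough sketch for a hypothetical 10,000-example training set on one GPU. The step count in transformers can differ slightly, depending on how it rounds partial gradient-accumulation windows.

```python
import math

def effective_batch_size(per_device_batch, grad_accum, num_devices=1):
    # examples that contribute to one optimizer step
    return per_device_batch * grad_accum * num_devices

def approx_total_steps(num_examples, per_device_batch, grad_accum, epochs, num_devices=1):
    # batches per epoch, then optimizer steps = batches // grad_accum (leftover batches dropped)
    batches = math.ceil(num_examples / (per_device_batch * num_devices))
    steps_per_epoch = max(batches // grad_accum, 1)
    return steps_per_epoch * epochs

print(effective_batch_size(4, 4))           # 16
print(approx_total_steps(10_000, 4, 4, 3))  # 1875
```

With those numbers, warmup_steps=100 covers about 5% of training. save_steps=500 produces checkpoints at steps 500, 1000 and 1500.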

LoRA (peft)

from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.1,
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],
    bias='none', task_type='CAUSAL_LM',
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # well under 1% of the parameters are trainable
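print_trainable_parameters() reports the exact count. The same number can be estimated by hand, because each adapted matrix of shape in × out gains r·(in + out) weights. The Llama-3-8B shapes below are my assumptions about that model's config, not values read from it: hidden size 4096, 32 layers, 8 KV heads of dimension 128, and ~8.03B total parameters.

```python
# Assumed Llama-3-8B shapes (not read from the real config): hidden 4096, 32 layers,
# grouped-query attention with 8 KV heads x 128 dims, so k_proj / v_proj output 1024.
HIDDEN, KV_DIM, LAYERS = 4096, 1024, 32
TARGETS = {  # (in_features, out_features) of each module in target_modules
    'q_proj': (HIDDEN, HIDDEN),
    'k_proj': (HIDDEN, KV_DIM),
    'v_proj': (HIDDEN, KV_DIM),
    'o_proj': (HIDDEN, HIDDEN),
}

def lora_param_count(r, targets=TARGETS, layers=LAYERS):
    # A is r x in, B is out x r, so each adapted matrix adds r * (in + out) weights
    return layers * sum(r * (d_in + d_out) for d_in, d_out in targets.values())

n = lora_param_count(16)
print(n)                    # 13631488
print(f"{n / 8.03e9:.2%}")  # 0.17% of an assumed ~8.03B base
```

The count is linear in r. So the r=64 setting in the decision table trains four times as many adapter weights as r=16.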

QLoRA (4-bit)

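import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import get_peft_model, prepare_model_for_kbit_training

bnb = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained('meta-llama/Meta-Llama-3-8B', quantization_config=bnb)
model = prepare_model_for_kbit_training(model)  # freeze base, upcast non-quantized layers, enable grad checkpointing
model = get_peft_model(model, config)  # config: the LoraConfig from the LoRA block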
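QLoRA fits on a single GPU because weight memory scales with bits per parameter. The sketch below is a back-of-the-envelope estimate. It assumes ~8.03B parameters and counts weights only: no activations, optimizer state, adapter weights, or quantization constants.

```python
def weight_memory_gb(num_params, bits_per_param):
    # weights only; real usage is higher (activations, optimizer state, LoRA weights,
    # and the per-block quantization constants that double quantization shrinks)
    return num_params * bits_per_param / 8 / 1e9

for label, bits in [('fp32', 32), ('bf16', 16), ('nf4', 4)]:
    print(label, round(weight_memory_gb(8.03e9, bits), 1), 'GB')
```

This prints about 32.1 GB for fp32, 16.1 GB for bf16 and 4.0 GB for NF4.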

TRL SFT

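from trl import SFTConfig, SFTTrainer

def format_instruction(ex):
    # one training string per example: instruction block followed by the target response
    return f"### Instruction\n{ex['instruction']}\n### Response\n{ex['response']}"

sft_args = SFTConfig(output_dir='sft-out', max_length=2048)  # `max_seq_length` in older TRL releases
trainer = SFTTrainer(
    model=model, args=sft_args, train_dataset=ds,
    formatting_func=format_instruction,
)
trainer.train()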

DPO (preference)

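from trl import DPOConfig, DPOTrainer

# dataset columns: prompt, chosen, rejected
dpo_args = DPOConfig(output_dir='dpo-out', beta=0.1)  # beta lives in the config, not the trainer
dpo = DPOTrainer(
    model=model, ref_model=ref_model,  # ref_model: frozen reference policy, typically the SFT checkpoint
    args=dpo_args,
    train_dataset=preference_ds,
    processing_class=tokenizer,  # `tokenizer=` in older TRL releases
)
dpo.train()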
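The DPO paper (Rafailov et al., 2023) defines the objective that DPOTrainer optimizes. It is a logistic loss on the preference margin: how much more the policy prefers the chosen response than the reference model does, scaled by beta. Below is a minimal pure-Python sketch for one pair. It assumes the four inputs are summed per-sequence log-probabilities computed elsewhere.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log sigmoid(beta * [(log pi(y_w|x) - log ref(y_w|x)) - (log pi(y_l|x) - log ref(y_l|x))])"""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    z = beta * margin
    # -log(sigmoid(z)) = softplus(-z), written to avoid overflow for large |z|
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

print(dpo_loss(-10.0, -10.0, -10.0, -10.0))  # zero margin: log(2), about 0.693
print(dpo_loss(-8.0, -12.0, -10.0, -10.0))   # policy already favors the chosen answer: lower loss
```

In the DPO derivation, beta is the strength of the implicit KL penalty toward the reference model. A higher beta keeps the policy closer to the reference.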

Unsloth (faster)

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    'unsloth/llama-3-8b-Instruct-bnb-4bit',
    max_seq_length=4096, load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(model, r=16, lora_alpha=32)
# Unsloth advertises ~2x faster training and lower memory use

Axolotl YAML

base_model: meta-llama/Meta-Llama-3-8B
load_in_4bit: true
adapter: qlora
lora_r: 16
lora_alpha: 32
lora_target_linear: true  # apply LoRA to every linear projection
datasets:
  - path: my_data.jsonl
    type: alpaca
val_set_size: 0.05
sequence_len: 2048
gradient_accumulation_steps: 4
micro_batch_size: 4
num_epochs: 3
learning_rate: 2e-4
optimizer: adamw_bnb_8bit

Merge LoRA back

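import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# reload the base in bf16 (not 4-bit) so the adapter merges into full-precision weights
base = AutoModelForCausalLM.from_pretrained('meta-llama/Meta-Llama-3-8B', torch_dtype=torch.bfloat16)
lora = PeftModel.from_pretrained(base, 'lora_checkpoint')  # adapter directory saved after training
merged = lora.merge_and_unload()
merged.save_pretrained('llama3-finetuned')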
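merge_and_unload() folds each adapter into its base weight: W = W_0 + (alpha/r)·B·A. The merged model therefore produces the same outputs without the extra adapter matmuls at inference. Below is a toy check with plain Python lists; the shapes and numbers are arbitrary.

```python
def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y, scale=1.0):
    # elementwise X + scale * Y
    return [[x + scale * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

# arbitrary toy shapes: W0 is d x k = 2 x 3, rank r = 1
W0 = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0]]
A = [[0.2, 0.1, -0.3]]     # r x k
B = [[0.5], [-1.0]]        # d x r
alpha, r = 2.0, 1
x = [[1.0], [2.0], [3.0]]  # input as a k x 1 column

# adapter path: W0 x + (alpha / r) * B (A x)
adapter_out = add(matmul(W0, x), matmul(B, matmul(A, x)), alpha / r)
# merged path: (W0 + (alpha / r) * B A) x
W_merged = add(W0, matmul(B, A), alpha / r)
merged_out = matmul(W_merged, x)
print(adapter_out, merged_out)  # both approximately [[6.5], [-0.5]]
```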

Eval (held-out)

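import torch

def eval_finetuned(model, tokenizer, eval_ds, grade):
    """Accuracy on held-out data; grade(pred, answer) -> bool is task-specific."""
    model.eval()
    correct = 0
    for ex in eval_ds:
        inputs = tokenizer(ex['prompt'], return_tensors='pt').to(model.device)
        with torch.no_grad():
            out = model.generate(**inputs, max_new_tokens=128)
        # decode only the new tokens; out[0] also contains the prompt
        pred = tokenizer.decode(out[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
        correct += bool(grade(pred, ex['answer']))
    return correct / len(eval_ds)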
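The evaluation loop relies on a task-specific grade(pred, answer) check. For short free-text answers, one common choice is exact match after SQuAD-style normalization: lowercase, then strip punctuation and the articles a/an/the. The sketch below assumes that fits the task.

```python
import re
import string

def normalize(text):
    # lowercase, drop punctuation and articles, collapse whitespace
    text = text.lower()
    text = ''.join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r'\b(a|an|the)\b', ' ', text)
    return ' '.join(text.split())

def grade(pred, answer):
    # exact match after normalization; swap in a task-specific check where needed
    return normalize(pred) == normalize(answer)

print(grade("The Eiffel Tower.", "eiffel tower"))  # True
```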

Prevent catastrophic forgetting

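import random

# mix in original-task (general) data so the model keeps earlier skills
def mixed_training(specific_data, general_data, ratio=0.2, seed=0):
    rng = random.Random(seed)
    general = list(general_data)
    k = min(len(general), int(len(specific_data) * ratio))  # general examples = ratio x specific count
    mixed = list(specific_data) + rng.sample(general, k)
    rng.shuffle(mixed)
    return mixed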

Function calling fine-tune

import json

def format_function_call(ex):
    # target: the model emits the call as JSON wrapped in <tool_call> tags
    return {
        'prompt': f"User: {ex['user']}\n",
        'response': f"<tool_call>{json.dumps(ex['call'])}</tool_call>\n",
    }
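The training target wraps each call as JSON inside <tool_call> tags. At inference time, the inverse step extracts that JSON and parses it. Below is a minimal sketch; the tool name get_weather and its arguments are hypothetical.

```python
import json
import re

TOOL_CALL = re.compile(r'<tool_call>(.*?)</tool_call>', re.DOTALL)

def parse_tool_call(generated):
    """Return the first tool call's JSON payload as a dict, or None if absent or invalid."""
    match = TOOL_CALL.search(generated)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None

out = '<tool_call>{"name": "get_weather", "arguments": {"city": "Seoul"}}</tool_call>\n'
print(parse_tool_call(out))  # {'name': 'get_weather', 'arguments': {'city': 'Seoul'}}
```

Returning None on malformed JSON lets the caller re-prompt or fall back to a plain text reply, instead of crashing on a bad generation.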

LR schedule (cosine + warmup)

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir='out',
    learning_rate=2e-4, warmup_ratio=0.03, lr_scheduler_type='cosine',
    num_train_epochs=3,
)
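lr_scheduler_type='cosine' with warmup_ratio produces this shape: a linear warmup from 0 to the peak over the first ~3% of steps, then a half-cosine decay to 0. The pure-Python sketch below follows my understanding of the transformers cosine-with-warmup schedule: warmup steps = ceil(ratio × total), minimum LR 0. Check get_cosine_schedule_with_warmup for the exact behavior in your version.

```python
import math

def lr_at(step, total_steps, peak_lr=2e-4, warmup_ratio=0.03):
    """Linear warmup to peak_lr, then half-cosine decay to 0 (minimum LR assumed 0)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

for s in (0, 15, 30, 515, 1000):  # hypothetical 1,000-step run: 30 warmup steps
    print(s, f"{lr_at(s, 1000):.2e}")
```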

Data quality (LIMA-style)

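# LIMA (2023): ~1,000 carefully curated examples were enough for strong instruction following;
# data quality mattered more than quantity
def filter_quality(dataset, criteria):
    # keep records that pass every predicate in `criteria`
    return [d for d in dataset if all(c(d) for c in criteria)]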
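filter_quality takes a list of predicates. The ones below are illustrative only: their thresholds and field names are assumptions, not criteria from the LIMA paper. The filter is applied inline so the block runs on its own.

```python
# Hypothetical criteria for records shaped {'instruction': ..., 'response': ...}
def long_enough(d, min_chars=20):
    return len(d['response'].strip()) >= min_chars

def not_truncated(d):
    # crude check that the response ends like a finished sentence
    return d['response'].rstrip().endswith(('.', '!', '?'))

def make_dedup():
    seen = set()
    def unique(d):
        # near-duplicate prompts: compare lowercased, whitespace-collapsed instructions
        key = ' '.join(d['instruction'].lower().split())
        if key in seen:
            return False
        seen.add(key)
        return True
    return unique

data = [
    {'instruction': 'Explain LoRA.', 'response': 'LoRA trains a low-rank update to frozen weights.'},
    {'instruction': 'explain  LoRA.', 'response': 'It adds small trainable adapters to a frozen model.'},
    {'instruction': 'What is DPO?', 'response': 'ok'},
]
# dedup goes last: all() short-circuits, so only records passing the other checks are remembered
criteria = [long_enough, not_truncated, make_dedup()]
kept = [d for d in data if all(c(d) for c in criteria)]
print(len(kept))  # 1
```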

MLX-LM (Apple)

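pip install mlx-lm
# --data: a directory containing train.jsonl and valid.jsonl
# --num-layers: how many transformer layers get LoRA adapters (`--lora-layers` in older mlx-lm)
python -m mlx_lm.lora --model meta-llama/Meta-Llama-3-8B --train --data ./data --batch-size 1 --num-layers 16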
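As I understand the mlx-lm documentation, `--data` expects a directory containing train.jsonl and valid.jsonl. Among other formats, it accepts one JSON object per line with a single "text" field. The sketch below writes such a split. It assumes `records` is a list of already-formatted training strings; the function name is hypothetical.

```python
import json
import random
from pathlib import Path

def write_mlx_splits(records, out_dir, valid_frac=0.1, seed=0):
    """Write train.jsonl / valid.jsonl with one {"text": ...} object per line."""
    rows = [{'text': r} for r in records]
    random.Random(seed).shuffle(rows)
    n_valid = max(1, int(len(rows) * valid_frac))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, split in (('valid.jsonl', rows[:n_valid]), ('train.jsonl', rows[n_valid:])):
        with open(out / name, 'w', encoding='utf-8') as f:
            for row in split:
                f.write(json.dumps(row, ensure_ascii=False) + '\n')
    return len(rows) - n_valid, n_valid  # (train count, valid count)
```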

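Quantize for inference (post-training GPTQ)

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

# GPTQ / AWQ are post-training quantization, applied to the merged checkpoint on disk
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3-8B')
quantized = AutoGPTQForCausalLM.from_pretrained(
    'llama3-finetuned', quantize_config=BaseQuantizeConfig(bits=4, group_size=128),
)
quantized.quantize([tokenizer(t) for t in calib_texts])  # calib_texts: representative in-domain samples (assumed)
quantized.save_quantized('llama3-finetuned-gptq')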
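In this config, bits=4 means each weight becomes a 4-bit integer code, and group_size=128 means each run of 128 weights shares one scale. The sketch below illustrates that storage layout with plain round-to-nearest (RTN) group-wise quantization. It is not GPTQ: GPTQ starts from the same layout but uses calibration data to compensate rounding error.

```python
def quantize_groupwise(weights, bits=4, group_size=128):
    """Symmetric round-to-nearest (RTN) quantization with one scale per group.

    Unlike GPTQ, nothing here compensates rounding error with calibration data.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit signed integers in [-8, 7]
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = max(abs(w) for w in group) / qmax or 1.0  # all-zero group: any scale works
        scales.append(scale)
        codes.extend(max(-qmax - 1, min(qmax, round(w / scale))) for w in group)
    return codes, scales

def dequantize_groupwise(codes, scales, group_size=128):
    # each integer code times its group's scale
    return [q * scales[i // group_size] for i, q in enumerate(codes)]

weights = [((i * 37) % 101 - 50) / 100 for i in range(300)]  # arbitrary values in [-0.5, 0.5]
codes, scales = quantize_groupwise(weights)
restored = dequantize_groupwise(codes, scales)
print(len(scales), max(abs(a - b) for a, b in zip(weights, restored)))
```

Smaller groups track local weight ranges more closely and reduce rounding error. The cost is storing more scales.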

Decision criteria

| Situation | Approach |
|---|---|
| Style / format | Prompt engineering (try first) |
| Domain knowledge | RAG (try first) |
| Custom skill | LoRA fine-tune |
| Limited GPU | QLoRA (4-bit) |
| Training speed | Unsloth |
| Preference alignment | DPO |
| Best quality, cost-aware | LoRA r=64 + DPO |

Default: prompt → RAG → LoRA. Short on GPU memory: QLoRA. Alignment: DPO. Quality: LIMA-style curated data.

🔗 Graph

🤖 LLM usage

When to use: a specific style or skill, domain expertise, alignment. When not to: a quick fix (use prompting) or fresh data (use RAG).

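Anti-patterns

  • Fine-tuning to inject facts: use RAG instead.
  • Full fine-tune on a tiny dataset: risks catastrophic forgetting.
  • No evaluation baseline: regressions go unnoticed.
  • Jumping to full FT without trying LoRA: usually overkill.
  • Serving without quantization: higher inference cost.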

🧪 Verification / duplicates

  • Verified against Hu et al. LoRA (2021), Dettmers et al. QLoRA (2023), Rafailov et al. DPO (2023), Zhou et al. LIMA (2023).
  • Confidence: A.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-04-20 | Auto-reinforced |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full / LoRA / QLoRA / DPO / Unsloth / Axolotl / merge code |