---
id: wiki-2026-0508-sft-supervised-fine-tuning
title: SFT (Supervised Fine-Tuning)
category: 10_Wiki/Topics
status: duplicate
canonical_id: fine-tuning
duplicate_of: "[[Fine-tuning]]"
aliases: [Supervised Fine-Tuning, Instruction Tuning]
source_trust_level: A
confidence_score: 0.9
verification_status: redirected
tags: [duplicate, sft, fine-tuning, llm]
last_reinforced: 2026-05-10
github_commit: pending
---
# SFT (Supervised Fine-Tuning)

> **This document is a duplicate of [[Fine-tuning]].** It redirects to the canonical document.

## Key Summary (SFT-specific aspects)

- SFT = supervised stage of LLM post-training (prompt → response pairs).
- First stage of a typical RLHF/DPO pipeline: base model → instruction-following model.
- Common datasets: ShareGPT, OpenAssistant, Alpaca-style data, custom domain Q&A.
- Typical 2026 recipe: LoRA/QLoRA on Llama 3.x / Qwen 2.5, 1-3 epochs, LR ~2e-5 (a full fine-tuning range; LoRA/QLoRA runs commonly use higher rates, ~1e-4 to 2e-4).
- Follow-up stages: preference optimization (DPO or KTO) → online RL (GRPO).
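A minimal sketch of how a prompt → response pair typically becomes an SFT training example, assuming the common convention that only response tokens are supervised (prompt positions are masked out of the loss). `build_sft_example`, `masked_nll`, and the token IDs are hypothetical illustrations, not any library's API; `IGNORE_INDEX = -100` mirrors the default `ignore_index` of PyTorch's `torch.nn.CrossEntropyLoss`. A real pipeline would tokenize with the model's chat template and get log-probabilities from the model itself.

```python
import math
from typing import List, Tuple

# Label value for positions excluded from the loss; mirrors the default
# ignore_index of torch.nn.CrossEntropyLoss (an assumed convention here).
IGNORE_INDEX = -100


def build_sft_example(
    prompt_ids: List[int], response_ids: List[int], eos_id: int
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: concatenate prompt + response + EOS.

    Labels copy the input IDs but mask every prompt position, so the
    model is trained only to produce the response (and to stop).
    """
    input_ids = prompt_ids + response_ids + [eos_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids + [eos_id]
    return input_ids, labels


def masked_nll(logprobs: List[List[float]], labels: List[int]) -> float:
    """Hypothetical helper: mean negative log-likelihood over supervised tokens.

    logprobs[t][v] is the causal LM's log-probability that the token at
    position t + 1 is v, given tokens 0..t. The prediction made at position
    t is therefore scored against labels[t + 1] (the usual one-step shift).
    """
    total, count = 0.0, 0
    for t in range(len(labels) - 1):
        target = labels[t + 1]
        if target == IGNORE_INDEX:
            continue  # masked prompt position: contributes no loss
        total -= logprobs[t][target]
        count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    ids, labels = build_sft_example([1, 2, 3], [4, 5], eos_id=7)
    print(ids)     # [1, 2, 3, 4, 5, 7]
    print(labels)  # [-100, -100, -100, 4, 5, 7]

    # Toy "model" that is uniform over an 8-token vocabulary: loss = ln 8.
    vocab_size = 8
    uniform = [[math.log(1 / vocab_size)] * vocab_size for _ in ids]
    print(round(masked_nll(uniform, labels), 4))  # 2.0794
```

Masking the prompt is a widely used choice, not a universal rule; some setups compute the loss over the full sequence instead.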
## 🔗 Graph

- Parent: [[Fine-tuning]] (canonical)
- Adjacent: [[RLHF]] · [[DPO]] · [[LoRA]] · [[Instruction-Tuning]]

## 🕓 Change History

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Marked as duplicate; redirected to the canonical document |