---
id: wiki-2026-0508-sft-supervised-fine-tuning
title: SFT (Supervised Fine-Tuning)
category: 10_Wiki/Topics
status: duplicate
canonical_id: fine-tuning
duplicate_of: "[[Fine-tuning]]"
aliases: [Supervised Fine-Tuning, Instruction Tuning]
source_trust_level: A
confidence_score: 0.9
verification_status: redirected
tags: [duplicate, sft, fine-tuning, llm]
last_reinforced: 2026-05-10
github_commit: pending
---

# SFT (Supervised Fine-Tuning)

> **This document is a duplicate of [[Fine-tuning]].** It redirects to the canonical document.

## Key Summary (SFT-specific aspects)

- SFT = the supervised stage of LLM post-training (prompt → response pairs).
- The first stage of every RLHF/DPO pipeline: base model → instruction-following model.
- Common datasets: ShareGPT, OpenAssistant, Alpaca-style, custom domain Q&A.
- Typical 2026 recipe: LoRA/QLoRA on Llama 3.x / Qwen 2.5, 1-3 epochs at a learning rate of ~2e-5.
- Follow-up stages: DPO → KTO → online RL (GRPO).

## 🔗 Graph

- Parent: [[Fine-tuning]] (canonical)
- Adjacent: [[RLHF]] · [[DPO]] · [[LoRA]] · [[Instruction-Tuning]]

## 🕓 Change History

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Marked as duplicate; redirected to the canonical document |
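
## Example (SFT data formatting sketch)

The summary describes SFT as supervised training on prompt → response pairs. A minimal sketch of how one such pair might become a training example is shown here. It assumes one common convention that this note does not state: the loss is computed only on response tokens, with prompt positions masked by `-100` (the default `ignore_index` of PyTorch's `CrossEntropyLoss`). The function names `build_sft_example` and `masked_mean_nll` are hypothetical helpers for illustration, and the token ids and per-token losses are made-up toy values.

```python
from typing import List, Tuple

# Assumption: -100 marks positions excluded from the loss, matching the
# default ignore_index of torch.nn.CrossEntropyLoss.
IGNORE_INDEX = -100


def build_sft_example(
    prompt_ids: List[int], response_ids: List[int]
) -> Tuple[List[int], List[int]]:
    """Concatenate prompt and response; supervise only the response tokens."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


def masked_mean_nll(token_nll: List[float], labels: List[int]) -> float:
    """Average per-token negative log-likelihood over unmasked positions.

    token_nll[i] is assumed to already be aligned with labels[i]; the
    next-token shift of a causal LM is left out to keep the sketch short.
    """
    kept = [nll for nll, lab in zip(token_nll, labels) if lab != IGNORE_INDEX]
    if not kept:
        raise ValueError("no supervised tokens in this example")
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # Toy token ids: three prompt tokens followed by two response tokens.
    input_ids, labels = build_sft_example([1, 2, 3], [4, 5])
    print(input_ids)
    print(labels)
    # Toy per-token losses; the three prompt positions are ignored.
    print(masked_mean_nll([9.0, 9.0, 9.0, 1.0, 3.0], labels))
```

Masking the prompt keeps the gradient focused on producing the response rather than on reproducing the instruction. Training on all tokens is also used in practice, so this should be read as one possible setup rather than the recipe referenced above.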