| id | title | category | status | canonical_id | duplicate_of | aliases | source_trust_level | confidence_score | verification_status | tags | last_reinforced | github_commit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| wiki-2026-0508-sft-supervised-fine-tuning | SFT (Supervised Fine-Tuning) | 10_Wiki/Topics | duplicate | fine-tuning | Fine-tuning | | A | 0.9 | redirected | | 2026-05-10 | pending |
SFT (Supervised Fine-Tuning)
This document is a duplicate of Fine-tuning. It redirects to the canonical document.
Key summary (SFT-specific aspects)
- SFT is the supervised stage of LLM post-training, trained on prompt → response pairs.
- It is the first stage of every RLHF/DPO pipeline: base model → instruction-following model.
- Common datasets: ShareGPT, OpenAssistant, Alpaca-style, custom domain Q&A.
- Typical 2026 recipe: LoRA/QLoRA on Llama 3.x / Qwen 2.5, 1-3 epochs at ~2e-5 LR.
- Subsequent stages: DPO → KTO → online RL (GRPO).
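A minimal sketch of how one prompt → response pair can become a supervised training example. It assumes the common convention of masking prompt tokens out of the loss (this note does not state that detail) and uses a hypothetical whitespace tokenizer, `toy_tokenize`, in place of a real model tokenizer. The helper name `build_sft_example` is also illustrative.

```python
# Label value that PyTorch's CrossEntropyLoss ignores by default (ignore_index=-100).
IGNORE_INDEX = -100


def toy_tokenize(text, vocab):
    """Hypothetical whitespace tokenizer; real pipelines use the model's own tokenizer."""
    return [vocab.setdefault(token, len(vocab)) for token in text.split()]


def build_sft_example(prompt, response, vocab):
    """Concatenate prompt and response; supervise only the response tokens (assumed convention)."""
    prompt_ids = toy_tokenize(prompt, vocab)
    response_ids = toy_tokenize(response, vocab)
    input_ids = prompt_ids + response_ids
    # Prompt positions get IGNORE_INDEX so they contribute nothing to the loss.
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return {"input_ids": input_ids, "labels": labels}


vocab = {}
example = build_sft_example("Translate to French: cat", "chat", vocab)
print(example)
```

With this masking, the loss is computed only on the response tokens, so the model learns to produce the answer rather than to reproduce the prompt. Some recipes train on the full sequence instead; which variant a given pipeline uses is a design choice.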
🔗 Graph
- Parent: Fine-tuning (canonical)
- Adjacent: RLHF · DPO · LoRA · Instruction-Tuning
🕓 Change history
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Marked as duplicate; redirects to the canonical document |