
Antigravity Agent 504fd5fb42 [G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00

1.2 KiB


id, title, category, status, canonical_id, duplicate_of, aliases, source_trust_level, confidence_score, verification_status, tags, last_reinforced, github_commit


SFT (Supervised Fine-Tuning)

This document is a duplicate of Fine-tuning; it redirects to the canonical document.

Key summary (SFT-specific aspects)

SFT is the supervised stage of LLM post-training: the model is trained on prompt → response pairs.
It is the first stage of a typical RLHF/DPO pipeline, turning a base model into an instruction-following model.
Common datasets: ShareGPT, OpenAssistant, Alpaca-style data, and custom domain Q&A.
Typical 2026 recipe: LoRA/QLoRA on Llama 3.x / Qwen 2.5, trained for 1-3 epochs at a learning rate of ~2e-5.
Subsequent stages: DPO → KTO → online RL (GRPO).
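The sketch below shows one common way to turn a prompt → response pair into an SFT training example. It is minimal and rests on several assumptions. Token IDs are assumed to come from some tokenizer, and logits from a causal LM. Only response tokens contribute to the loss, which is a widespread choice but not stated in this note. The helper names `build_sft_example` and `masked_next_token_loss` are hypothetical. The `-100` ignore value mirrors the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
import math
from typing import List, Tuple

# Assumed convention: label value that is excluded from the loss
# (matches PyTorch CrossEntropyLoss's default ignore_index).
IGNORE_INDEX = -100


def build_sft_example(prompt_ids: List[int], response_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: concatenate prompt and response token IDs and
    mask the prompt positions in the labels so only the response is learned."""
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


def masked_next_token_loss(logits: List[List[float]], labels: List[int]) -> float:
    """Hypothetical helper: mean cross-entropy over non-masked targets.

    Causal-LM shift: logits[t] predicts the token at position t + 1,
    so the target for logits[t] is labels[t + 1].
    """
    total, count = 0.0, 0
    for t in range(len(labels) - 1):
        target = labels[t + 1]
        if target == IGNORE_INDEX:
            continue  # prompt token: no gradient signal in this sketch
        row = logits[t]
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[target]  # -log softmax(row)[target]
        count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    ids, labels = build_sft_example([0, 1], [2, 3])
    uniform = [[0.0] * 4 for _ in ids]  # toy vocab of size 4, uniform predictions
    print(ids, labels, masked_next_token_loss(uniform, labels))
```

With uniform logits over a 4-token vocabulary, only the two response tokens are scored, and the loss equals ln 4. A real run would delegate this to a training framework; the point of the sketch is only the label masking.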

🔗 Graph

Parent: Fine-tuning (canonical)
Adjacent: RLHF · DPO · LoRA · Instruction-Tuning

🕓 Change history

Date	Change
2026-05-08	Phase 1
2026-05-10	Duplicate handling: redirected to the canonical document
