Wiki cleanup: error-doc removal, dedup merge, link normalization
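Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture and unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirects)
- Normalized link names: fixed broken links, pointed links directly at redirect targets, and mapped concepts (~2,400 changes)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>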
@@ -135,10 +135,9 @@ def ppo_update(policy, ref, rm, prompts):
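**Default**: use continuous reinforcement (CRF) during the learning phase and a variable-ratio (VR) schedule during the maintenance phase. Always prefer reinforcement over punishment.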
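
The default above can be sketched as a small scheduler that reinforces every target response (CRF) for a fixed number of acquisition trials and then switches to a variable-ratio schedule. This is a minimal sketch under stated assumptions: the names `VariableRatio` and `PhaseScheduler` are hypothetical, switching after a fixed trial count stands in for the mastery criterion real protocols usually use, and drawing each VR requirement uniformly from 1..2n-1 (mean n) is just one simple way to get a VR-n schedule.

```python
import random


class VariableRatio:
    """VR-n schedule: reinforce after a random number of responses averaging n."""

    def __init__(self, mean_ratio: int, rng: random.Random):
        if mean_ratio < 1:
            raise ValueError("mean_ratio must be >= 1")
        self.mean_ratio = mean_ratio
        self.rng = rng
        self._next_requirement()

    def _next_requirement(self) -> None:
        # Assumption: uniform draw on 1..2n-1 (inclusive), whose mean is exactly n.
        self.required = self.rng.randint(1, 2 * self.mean_ratio - 1)
        self.count = 0

    def respond(self) -> bool:
        """Register one target response; return True if it is reinforced."""
        self.count += 1
        if self.count >= self.required:
            self._next_requirement()
            return True
        return False


class PhaseScheduler:
    """Hypothetical helper: CRF during acquisition, then VR for maintenance."""

    def __init__(self, acquisition_trials: int, vr_mean: int, seed: int = 0):
        self.acquisition_trials = acquisition_trials
        self.trials = 0
        self.vr = VariableRatio(vr_mean, random.Random(seed))

    def on_target_behavior(self) -> bool:
        self.trials += 1
        if self.trials <= self.acquisition_trials:
            return True  # CRF: every response is reinforced
        return self.vr.respond()  # VR: intermittent, unpredictable reinforcement


if __name__ == "__main__":
    sched = PhaseScheduler(acquisition_trials=20, vr_mean=4, seed=42)
    outcomes = [sched.on_target_behavior() for _ in range(1020)]
    print("acquisition rate:", sum(outcomes[:20]) / 20)
    print("maintenance rate:", sum(outcomes[20:]) / 1000)
```

With `vr_mean=4` the maintenance phase reinforces roughly one response in four. The learner cannot predict which response pays off, which is why VR-maintained behavior is typically more resistant to extinction than CRF-maintained behavior.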
## 🔗 Graph
- Parent: [[Behaviorism]] · [[Operant_Conditioning]] · [[Learning_Theory]]
- Variants: [[Negative_Reinforcement]] · [[Positive_Punishment]] · [[Reinforcement_Schedules]]
- Applications: [[Reinforcement_Learning]] · [[RLHF]] · [[ABA_Therapy]] · [[Gamification]]
- Adjacent: [[Reward_Function]] · [[Policy_Gradient]] · [[Variable_Reward]] · [[Skinner_Box]]
- Parent: [[Operant_Conditioning]]
- Applications: [[Reinforcement_Learning]] · [[RLHF]] · [[Gamification]]
- Adjacent: [[Policy_Gradient]]
## 🤖 LLM Usage
**When**: RL agent reward design, LLM RLHF/DPO pipeline design, gamification UX, and behavior-change apps.
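
For the RLHF/DPO case, the DPO objective loosely mirrors positive reinforcement: the preferred (chosen) response is pushed up relative to a frozen reference model while the rejected one is pushed down. Below is a minimal per-pair sketch in plain Python. It assumes you already have summed sequence log-probabilities from the policy and the reference model. The function names and the `beta=0.1` default are illustrative choices, not taken from any specific library.

```python
import math
from typing import Sequence, Tuple


def dpo_pair_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """-log sigmoid(beta * margin): the DPO loss for one preference pair."""
    # Implicit reward margin: how much more the policy (relative to the
    # reference) favors the chosen response over the rejected one.
    margin = (policy_logp_chosen - ref_logp_chosen) - (
        policy_logp_rejected - ref_logp_rejected
    )
    z = beta * margin
    # Numerically stable softplus(-z), which equals -log(sigmoid(z)).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_batch_loss(
    pairs: Sequence[Tuple[float, float, float, float]], beta: float = 0.1
) -> float:
    """Mean DPO loss over (pol_chosen, pol_rejected, ref_chosen, ref_rejected) tuples."""
    if not pairs:
        raise ValueError("need at least one preference pair")
    return sum(dpo_pair_loss(*p, beta=beta) for p in pairs) / len(pairs)
```

In a real pipeline the four log-probabilities come from summing token log-probs of each response under the trainable policy and the frozen reference. `beta` controls how strongly the policy is kept close to the reference.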