docs: finalized wiki integrity maintenance (v3.0 standard) - pruned 1400+ stubs and fixed 11k+ ghost links

2026-05-02 09:18:34 +09:00
parent c84dcb8371
commit 6445fcc05b
13150 changed files with 55394 additions and 100862 deletions
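The link edits in this commit follow a visible pattern. Links whose target page exists gain an explicit alias (`[[X]]` → `[[X|X]]`). Links such as `[[Positive-Reinforcement]]` lose their brackets, presumably because they were ghost links. The following is a minimal Python sketch of that rule. It is not the script actually used for this commit. The function name `normalize_links` and the `existing_pages` set are hypothetical, and the sketch assumes page names match link targets exactly.

```python
import re

# [[Target]] or [[Target|Alias]]: group 1 = target, group 2 = optional alias.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def normalize_links(text: str, existing_pages: set[str]) -> str:
    """Hypothetical reconstruction of the link rule visible in this diff.

    - Target page exists: make the alias explicit, [[X]] -> [[X|X]].
    - Target page missing (ghost link): drop the brackets, keep the label.
    Already-normalized links are left unchanged, so the rule is idempotent.
    """

    def fix(match: re.Match) -> str:
        target = match.group(1).strip()
        label = match.group(2) if match.group(2) is not None else target
        if target in existing_pages:
            return f"[[{target}|{label}]]"
        return label

    return WIKILINK.sub(fix, text)


if __name__ == "__main__":
    pages = {"Reinforcement-Learning", "Exploration-vs-Exploitation"}  # assumed
    line = ("- [[Reinforcement-Learning]], [[Positive-Reinforcement]], "
            "[[Markov-Decision-Process-MDP]], [[Exploration-vs-Exploitation]]")
    print(normalize_links(line, pages))
```

Because already-aliased links pass through unchanged, a rule like this can be rerun across the whole wiki without producing double aliases.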
@@ -2,7 +2,7 @@
 id: RL-REWARD-SHAPE-001
 category: "10_Wiki/💡 Topics/AI"
 confidence_score: 1.0
-tags: [ai, [[Reinforcement-Learning]], reward-shaping, reward-design, sparse-rewards, [[Behavior]]-steering]
+tags: [ai, [[Reinforcement-Learning|Reinforcement-Learning]], reward-shaping, reward-design, sparse-rewards, [[Behavior|Behavior]]-steering]
 last_reinforced: 2026-04-26
 ---

@@ -12,7 +12,7 @@ last_reinforced: 2026-04-26
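 > "To reach the large reward of the final goal, design small milestones (sub-rewards) that point the agent in the 'right direction' at every step": a technique that adds extra guidance to the reward function to speed up learning in sparse-reward environments.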

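 ## 📖 Structured Knowledge (Synthesized Content)
-- **Extracted pattern:** "Intermediate Incentivization and [[Alignment]] Steering": instead of rewarding the agent only on final success, reward every state transition that moves it closer to the goal, so the agent quickly learns what counts as good behavior.
+- **Extracted pattern:** "Intermediate Incentivization and [[Alignment|Alignment]] Steering": instead of rewarding the agent only on final success, reward every state transition that moves it closer to the goal, so the agent quickly learns what counts as good behavior.
 - **Key considerations:**
    - **Potential-based Reward Shaping:** A mathematical technique for adding reward without compromising policy optimality. The added term is the discounted change in a state potential, F(s, s') = γΦ(s') − Φ(s), which leaves the optimal policy unchanged.
    - **Reward Hacking Risk:** Watch for the side effect in which the agent exploits loopholes to maximize reward in ways the developer did not intend.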
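The potential-based shaping rule in the bullet above can be sketched in a few lines of Python. The sketch rests on stated assumptions. It uses a hypothetical 1-D corridor with the goal at cell 5 and a sparse reward of 1 on reaching the goal. The potential Φ(s) is the negative distance to the goal, and Φ is treated as 0 at terminal states. The names `shaped_reward`, `phi`, and `rollout` are illustrative and do not come from any library.

```python
from collections.abc import Callable, Hashable, Sequence


def shaped_reward(
    reward: float,
    state: Hashable,
    next_state: Hashable,
    potential: Callable[[Hashable], float],
    gamma: float,
    done: bool = False,
) -> float:
    """Potential-based shaping: r' = r + gamma * phi(s') - phi(s).

    On a terminal transition phi(s') is taken as 0, the usual convention
    that makes the shaping terms telescope over an episode.
    """
    next_phi = 0.0 if done else potential(next_state)
    return reward + gamma * next_phi - potential(state)


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**t * r_t over one episode."""
    return sum(r * gamma**t for t, r in enumerate(rewards))


# --- Hypothetical toy environment: a 1-D corridor, goal at cell 5. ---
GOAL = 5
GAMMA = 0.9


def phi(s: int) -> float:
    """Potential: negative distance to the goal (closer means higher)."""
    return -float(abs(GOAL - s))


def rollout(path: Sequence[int]) -> tuple[list[float], list[float]]:
    """Return (sparse, shaped) reward lists for a state path ending at GOAL."""
    sparse, shaped = [], []
    for s, s_next in zip(path, path[1:]):
        done = s_next == GOAL
        r = 1.0 if done else 0.0  # sparse: reward only on reaching the goal
        sparse.append(r)
        shaped.append(shaped_reward(r, s, s_next, phi, GAMMA, done))
    return sparse, shaped


if __name__ == "__main__":
    for path in ([0, 1, 2, 3, 4, 5], [0, 1, 0, 1, 2, 3, 4, 5]):
        sparse, shaped = rollout(path)
        print(path, discounted_return(sparse, GAMMA), discounted_return(shaped, GAMMA))
```

Consider any episode that ends in a terminal state. Its discounted shaping terms telescope to −Φ(s₀), so the direct path and the detour both gain exactly 5 in return. The ranking of paths stays the same, and so does the optimal policy. Meanwhile, every step toward the goal now earns immediate positive feedback, and stepping away earns a negative one.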
@@ -24,5 +24,5 @@ last_reinforced: 2026-04-26
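 - **Policy change:** When scoring how completely an agent finished a task, the Antigravity project applies a reward scheme that weights not only the final outcome but also step-level "good habits", such as efficient tool use and avoiding unnecessary computation.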

 ## 🔗 Knowledge Links (Graph)
-- [[Reinforcement-Learning]], [[Positive-Reinforcement]], [[Markov-Decision-Process-MDP]], [[Exploration-vs-Exploitation]]
+- [[Reinforcement-Learning|Reinforcement-Learning]], Positive-Reinforcement, Markov-Decision-Process-MDP, [[Exploration-vs-Exploitation|Exploration-vs-Exploitation]]
 - **Raw Source:** 10_Wiki/Topics/AI/Reward-Shaping-in-RL.md
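The step-weighted scheme described under **Policy change** could look roughly like the sketch below. The sketch is hypothetical because the page does not say how Antigravity measures tool efficiency or wasted computation. The fields `redundant_calls` and `wasted_tokens`, the `episode_score` function, and all of the weights are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepLog:
    redundant_calls: int  # tool calls that repeated earlier work (hypothetical metric)
    wasted_tokens: int    # computation judged unnecessary (hypothetical metric)


@dataclass(frozen=True)
class Weights:
    final: float = 1.0                  # weight on task success
    per_redundant_call: float = 0.05    # penalty per redundant tool call
    per_1k_wasted_tokens: float = 0.01  # penalty per 1,000 wasted tokens


def episode_score(succeeded: bool, steps: list[StepLog], w: Weights = Weights()) -> float:
    """Final-outcome reward plus small per-step 'good habit' penalties."""
    score = w.final * (1.0 if succeeded else 0.0)
    for step in steps:
        score -= w.per_redundant_call * step.redundant_calls
        score -= w.per_1k_wasted_tokens * step.wasted_tokens / 1000
    return score
```

Unlike potential-based shaping, these per-step penalties can change which behavior is optimal. Keeping them small relative to the final-outcome term limits the reward-hacking risk noted above. An example of that risk is an agent that gives up early because a short failed episode would outscore a successful but wasteful one.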