2nd/10_Wiki/Topics/AI_and_ML/Ps-Reinforce Policy Framework.md
2026-05-10 22:08:15 +09:00

id: wiki-2026-0508-ps-reinforce-policy-framework
title: Ps-Reinforce Policy Framework
category: 10_Wiki/Topics
status: duplicate
canonical_id: p-reinforce
duplicate_of: P-Reinforce
aliases:
source_trust_level: A
confidence_score: 0.9
verification_status: redirected
tags: duplicate, reinforcement-learning, policy-gradient
last_reinforced: 2026-05-10
github_commit: pending

Ps-Reinforce Policy Framework

This document is a duplicate of P-Reinforce. It redirects to the canonical document.

Key summary

  • "Ps-Reinforce" is a typo / variant name. The canonical name is P-Reinforce (policy-gradient REINFORCE family).
  • Core technique: the score-function estimator combined with the log-prob trick.
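
As a rough illustration of the score-function estimator and the log-prob trick named above, the sketch below estimates the policy gradient for a softmax policy on a toy multi-armed bandit, using the identity ∇θ E[R] = E[R · ∇θ log πθ(a)]. The bandit setting, the function names (`softmax`, `grad_log_prob`, `score_function_gradient`, `exact_gradient`), and the fixed per-arm rewards are assumptions made for this example; they are not taken from the canonical P-Reinforce document.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(logits, action):
    """Gradient of log pi(action) w.r.t. the logits of a softmax policy.

    For softmax, d/d logit_k log pi(action) = 1[k == action] - pi(k).
    """
    probs = softmax(logits)
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]


def score_function_gradient(logits, rewards, num_samples, rng):
    """Monte Carlo estimate of grad E[R] = E[R * grad log pi(a)].

    `rewards[a]` is a hypothetical deterministic reward for arm `a`.
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for _ in range(num_samples):
        # Sample an action from the current policy.
        action = rng.choices(range(len(logits)), weights=probs)[0]
        score = grad_log_prob(logits, action)
        # Weight the score by the observed reward and average over samples.
        for k in range(len(logits)):
            grad[k] += rewards[action] * score[k] / num_samples
    return grad


def exact_gradient(logits, rewards):
    """Closed-form gradient for comparison: dJ/d logit_k = pi(k) * (r_k - J)."""
    probs = softmax(logits)
    expected = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - expected) for p, r in zip(probs, rewards)]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.0, 0.5, -0.5]
    rewards = [1.0, 0.0, 2.0]
    print("estimate:", score_function_gradient(logits, rewards, 20000, rng))
    print("exact:   ", exact_gradient(logits, rewards))
```

With enough samples the Monte Carlo estimate should approach the closed-form gradient. In practice a baseline is usually subtracted from the reward to reduce variance; it is omitted here to keep the sketch short.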

🔗 Graph

🕓 Change history

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Marked as duplicate; redirected to the canonical document |