---
id: wiki-2026-0508-axiology
title: Axiology
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Value Theory, Theory of Value, Philosophy of Value]
duplicate_of: none
source_trust_level: A
confidence_score: 0.86
verification_status: applied
tags: [philosophy, ethics, value-theory, ai-alignment, decision-theory]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: RL/Reward-Modeling
---

# Axiology

## In one line

> **"The study of value: the question of what has worth."**

Axiology is the unifying framework for ethics and aesthetics, organized around distinctions such as intrinsic vs. instrumental value and monism vs. pluralism. As of 2026, its core relevance to AI alignment is that reward modeling, Constitutional AI, and preference elicitation each carry axiological commitments.

## Core concepts

### Subdomains

- **Ethics**: moral value (good / right).
- **Aesthetics**: aesthetic value (beautiful / sublime).
- **Epistemology of value**: the value of truth and knowledge.

### Distinctions

- **Intrinsic** (good in itself, e.g., happiness for a hedonist) vs **instrumental** (good as a means to something else).
- **Subjective** (depends on attitudes) vs **objective** (mind-independent).
- **Monism** (one value, e.g., utility) vs **pluralism** (many incommensurable values).
- **Realist** vs **anti-realist**.

### Major Frames

- **Hedonism** (Bentham, Mill): pleasure / absence of pain.
- **Eudaimonism** (Aristotle): flourishing.
- **Perfectionism**: excellence, capability (Sen, Nussbaum).
- **Consequentialism**: outcomes.
- **Deontology**: duty (Kant).
- **Virtue ethics**: character.
- **Pluralist value (Berlin)**: incommensurable goods.

### AI Alignment Connection (2026)

- **Reward model = axiological model**: a reward model encodes an implicit value commitment.
- **Constitutional AI** (Anthropic): explicit principles → critique → revise.
- **Preference learning (RLHF, DPO, IPO)**: learns from aggregated human preferences.
- **Pluralism challenge**: whose values? → community / democratic AI.
- **Goodhart's law**: once a measure becomes a target, it stops being a good measure (the instrumental proxy is not the intrinsic value).

### Applications

1. AI alignment / reward design.
2. Cost-benefit analysis (policy).
3. Aesthetic scoring (image generation).
4. Healthcare QALY/DALY weighting.

## 💻 Patterns

### Pattern 1: Multi-objective reward (pluralism)

```python
# progress, comfort, safety, energy are assumed task-specific scorers
# (trajectory -> float) defined elsewhere.
def reward(traj):
    return (
        1.0 * progress(traj)   # instrumental
        + 0.5 * comfort(traj)  # intrinsic-ish
        + 2.0 * safety(traj)   # constraint priority
        - 0.3 * energy(traj)   # cost
    )
```

### Pattern 2: Constitutional critique (Anthropic-style)

```python
# `llm` is assumed: any client object exposing complete(prompt: str) -> str.
CONSTITUTION = [
    "Avoid harm.",
    "Be honest.",
    "Respect autonomy.",
    "Promote well-being equitably.",
]

def critique(response, principles=CONSTITUTION):
    # Render the principles as a bullet list rather than a Python list repr.
    rules = "\n".join(f"- {p}" for p in principles)
    return llm.complete(f"Critique against:\n{rules}\nResponse: {response}")

def revise(response, critique_text):
    return llm.complete(f"Revise: {response}\nIn light of: {critique_text}")
```

### Pattern 3: Preference elicitation

```python
# Binary preference dataset → DPO / IPO.
# `comparisons` is assumed: an iterable of (prompt, preferred, dispreferred) triples.
pairs = [{"prompt": p, "chosen": a, "rejected": b} for p, a, b in comparisons]
# Train the policy to raise the likelihood ratio of `chosen` over `rejected`
# relative to a frozen reference model.
```

### Pattern 4: Pareto frontier (incommensurable values)

```python
def is_pareto(point, all_points):
    # Higher is better; points are tuples. True if no other point dominates `point`.
    return not any(
        o != point and all(oi >= pi for oi, pi in zip(o, point)) for o in all_points)
```

## Decision criteria

| Situation | Approach |
|---|---|
| Single clear metric | Scalar reward (monism) |
| Multiple comparable values | Weighted sum (pluralism reduced to a scalar) |
| Incommensurable values | Pareto / lexicographic |
| Norm uncertainty | Constitutional + critique loop |
| Democratic setting | Preference aggregation + transparency |

**Default**: pluralism with transparent weights and constitutional guardrails.

## 🔗 Graph

- Parent: [[Philosophy]]
- Applications: [[AI_Safety_and_Alignment|AI-Alignment]]
- Adjacent: [[Aesthetic-Value]] · [[Decision-Theory]] · [[AI_Safety_and_Alignment|Constitutional-AI]]

## 🤖 LLM usage

**When to use**: alignment policy drafting, principle articulation, value-laden decision review, ethical critique generation.
**When not to use**: purely technical optimization with no value tradeoff; narrow single-stakeholder domains.

## ❌ Anti-patterns

- **Hidden monism**: a single metric dressed up as several values; vulnerable to Goodhart's law.
- **False precision**: spurious numeric weights assigned to incommensurable values.
- **No stakeholder mapping**: it is unclear whose values are being encoded.
- **Reward hacking**: confusing an instrumental proxy with the intrinsic value it stands in for.

## 🧪 Verification / duplicates

- Verified against the Stanford Encyclopedia of Philosophy entry "Value Theory" and the Anthropic Constitutional AI paper.
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full content (frames + AI alignment patterns) |
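
## 💻 Supplementary pattern: lexicographic ordering

The decision table lists a lexicographic ordering as an option for incommensurable values, but the patterns only cover the Pareto case. The following is a minimal sketch of the lexicographic alternative, not a definitive implementation. It assumes higher scores are better; the priority order (safety, then progress, then comfort), the option names, and the scores are hypothetical values chosen here for illustration.

```python
from __future__ import annotations

from typing import Sequence

# Hypothetical priority order: earlier axes strictly outrank later ones.
PRIORITY = ("safety", "progress", "comfort")


def lexicographic_key(
    scores: dict[str, float], priority: Sequence[str] = PRIORITY
) -> tuple[float, ...]:
    """Turn a score dict into a tuple, which Python compares axis by axis."""
    return tuple(scores[axis] for axis in priority)


def best_option(options: dict[str, dict[str, float]]) -> str:
    """Pick the winner on the top-priority axis; lower axes only break ties."""
    return max(options, key=lambda name: lexicographic_key(options[name]))


# Illustrative scores only, not measured data.
options = {
    "plan_a": {"safety": 0.9, "progress": 0.2, "comfort": 0.5},
    "plan_b": {"safety": 0.9, "progress": 0.7, "comfort": 0.1},
    "plan_c": {"safety": 0.6, "progress": 1.0, "comfort": 1.0},
}
chosen = best_option(options)  # plan_a and plan_b tie on safety; progress decides
```

Unlike the weighted sum in Pattern 1, no amount of progress or comfort can offset a lower safety score here, so no numeric exchange rate between values has to be chosen. Exact ties on continuous scores are rare, so in practice the higher-priority axes may need to be bucketed or thresholded before lower axes get any say.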