id: wiki-2026-0508-axiology
title: Axiology
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: Value Theory, Theory of Value, Philosophy of Value
duplicate_of: none
source_trust_level: A
confidence_score: 0.86
verification_status: applied
tags: philosophy, ethics, value-theory, ai-alignment, decision-theory
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: Python (language), RL/Reward-Modeling (framework)

Axiology

One-line summary

"The study of value: what is valuable, and what it means for something to have worth." Axiology serves as a unifying framework for ethics and aesthetics, organized around distinctions such as intrinsic vs instrumental value and monism vs pluralism. Its core relevance to AI alignment in 2026: reward modeling, Constitutional AI, and preference elicitation each carry axiological commitments.

Core concepts

Subdomains

  • Ethics: moral value (good / right).
  • Aesthetics: aesthetic value (beautiful / sublime).
  • Epistemic value: the value of truth and knowledge.

Distinctions

  • Intrinsic (good in itself, e.g., pleasure for a hedonist) vs instrumental (good as a means to something else).
  • Subjective (depends on attitude) vs objective (mind-independent).
  • Monism (one value, e.g., utility) vs pluralism (many incommensurable values).
  • Realist vs anti-realist.

Major Frames

  • Hedonism (Bentham, Mill): pleasure / absence of pain.
  • Eudaimonism (Aristotle): flourishing.
  • Perfectionism: excellence; the related capability approach (Sen, Nussbaum) focuses on what people are able to do and be.
  • Consequentialism: outcomes.
  • Deontology: duty (Kant).
  • Virtue ethics: character.
  • Pluralist value (Berlin): incommensurable goods.

AI Alignment Connection (2026)

  • Reward model = axiological model: implicit value commitment.
  • Constitutional AI (Anthropic): explicit principles → critique → revise.
  • Preference learning (RLHF, DPO, IPO): aggregate human preferences.
  • Pluralism challenge: whose values? → community / democratic AI.
  • Goodhart's law: when a measure becomes a target, it stops being a good measure. The proxy is instrumental and should not be mistaken for the intrinsic value it tracks.
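
The Goodhart's law point can be made concrete with a toy selection problem. This is a minimal sketch with made-up numbers: the candidate names, `true_value`, and `proxy_score` are all hypothetical, and the proxy is assumed to be something easy to measure (for example, answer length) that one candidate has learned to game.

```python
# Hypothetical candidates: (name, true_value, proxy_score).
# The proxy tracks true value for honest candidates but can be gamed.
candidates = [
    ("careful answer", 0.9, 0.70),
    ("decent answer", 0.6, 0.60),
    ("padded answer", 0.2, 0.95),  # inflates the proxy without adding value
]

def best_by(items, key_index):
    """Return the name of the item with the highest value in the given column."""
    return max(items, key=lambda item: item[key_index])[0]

chosen_by_proxy = best_by(candidates, 2)  # optimizing the measure
chosen_by_value = best_by(candidates, 1)  # optimizing what we actually care about

print(chosen_by_proxy, "|", chosen_by_value)
```

Selecting on the proxy picks the candidate that games it, while selecting on the (usually unobservable) true value picks a different one.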

Applications

  1. AI alignment / reward design.
  2. Cost-benefit analysis (policy).
  3. Aesthetic scoring (image gen).
  4. Healthcare QALY/DALY weighting.

💻 Patterns

Pattern 1 — Multi-objective reward (pluralism)

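# progress, comfort, safety, and energy are assumed to be scoring functions
# defined elsewhere, each mapping a trajectory to a float.
def reward(traj):
    """Weighted sum of several value terms (pluralism reduced to one scalar)."""
    return (
        1.0 * progress(traj)        # instrumental: task advancement
      + 0.5 * comfort(traj)         # closer to an intrinsic good
      + 2.0 * safety(traj)          # heaviest weight, but still tradeable, not a hard constraint
      - 0.3 * energy(traj)          # cost term
    )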
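
A self-contained variant of Pattern 1 is sketched below. It assumes a trajectory is a plain dict of features; the feature names (`distance_covered`, `jerk`, `min_clearance`, `energy_used`) and the extractor lambdas are hypothetical stand-ins, not part of any library. Returning a per-term breakdown keeps the weights visible rather than buried in one number.

```python
from typing import Callable, Dict

Trajectory = Dict[str, float]

# Hypothetical feature extractors; replace with domain-specific scoring functions.
OBJECTIVES: Dict[str, Callable[[Trajectory], float]] = {
    "progress": lambda t: t["distance_covered"],
    "comfort": lambda t: 1.0 - t["jerk"],
    "safety": lambda t: t["min_clearance"],
    "energy": lambda t: t["energy_used"],
}

# Same weights as Pattern 1; the negative weight marks energy as a cost.
WEIGHTS: Dict[str, float] = {"progress": 1.0, "comfort": 0.5, "safety": 2.0, "energy": -0.3}

def reward_breakdown(traj: Trajectory) -> Dict[str, float]:
    """Weighted contribution of each objective, kept separate for inspection."""
    return {name: WEIGHTS[name] * fn(traj) for name, fn in OBJECTIVES.items()}

def reward(traj: Trajectory) -> float:
    """Scalar reward: the sum of the weighted contributions."""
    return sum(reward_breakdown(traj).values())

example = {"distance_covered": 1.0, "jerk": 0.2, "min_clearance": 0.5, "energy_used": 1.0}
print(reward_breakdown(example))
print(reward(example))
```

Logging the breakdown alongside the scalar makes it easier to see which value term is driving a decision.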

Pattern 2 — Constitutional critique (Anthropic-style)

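CONSTITUTION = [
    "Avoid harm.",
    "Be honest.",
    "Respect autonomy.",
    "Promote well-being equitably.",
]

# `llm` is assumed to be a client object exposing complete(prompt) -> str.
def critique(response, principles=CONSTITUTION):
    """Ask the model to critique a response against each principle."""
    listed = "\n".join(f"- {p}" for p in principles)
    return llm.complete(f"Critique against these principles:\n{listed}\nResponse: {response}")

def revise(response, critique_text):
    """Ask the model to rewrite the response in light of the critique."""
    return llm.complete(f"Revise: {response}\nIn light of: {critique_text}")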
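
The critique and revise steps can be chained into a loop. The sketch below is an illustration only: `constitutional_revise` and `echo_model` are hypothetical names, and the model is passed in as a plain callable (`complete(prompt) -> str`) so the loop can be exercised without any real LLM client.

```python
from typing import Callable, List, Sequence

def constitutional_revise(
    response: str,
    principles: Sequence[str],
    complete: Callable[[str], str],
    rounds: int = 1,
) -> str:
    """Run `rounds` critique-then-revise passes and return the final response."""
    numbered = "\n".join(f"{i}. {p}" for i, p in enumerate(principles, 1))
    for _ in range(rounds):
        critique_text = complete(
            f"Critique the response against these principles:\n{numbered}\n\nResponse: {response}"
        )
        response = complete(
            f"Revise the response.\n\nResponse: {response}\n\nCritique: {critique_text}"
        )
    return response

# Stand-in model that records prompts and returns a numbered placeholder string.
calls: List[str] = []

def echo_model(prompt: str) -> str:
    calls.append(prompt)
    return f"output-{len(calls)}"

final = constitutional_revise("draft", ["Avoid harm.", "Be honest."], echo_model, rounds=2)
print(final, len(calls))
```

Each round costs two model calls, so the number of rounds is a direct cost and latency trade-off.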

Pattern 3 — Preference elicitation

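# Binary preference dataset in the prompt / chosen / rejected layout commonly used for DPO / IPO.
# `raw_comparisons` is a hypothetical list of (prompt, preferred, dispreferred) tuples.
pairs = [{"prompt": p, "chosen": a, "rejected": b} for p, a, b in raw_comparisons]
# Train the policy to raise the likelihood of "chosen" relative to "rejected",
# measured against a frozen reference model.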
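
For a single preference pair, the DPO objective can be written out directly. The sketch below assumes the four sequence log-probabilities have already been computed by the policy and a frozen reference model; the function name, argument names, and the default `beta` are illustrative choices.

```python
import math

def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))."""
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(m)) equals softplus(-m); written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# With no preference margin the loss is log(2); it falls as the policy favors "chosen".
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
print(dpo_loss(-0.5, -2.0, -1.0, -1.0))
```

The reference-model terms keep the policy from drifting arbitrarily far just to widen the margin.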

Pattern 4 — Pareto frontier (incommensurable values)

def is_pareto(point, all_points):
    """True if no other point dominates `point` (points are tuples; higher is better)."""
    return not any(
        o != point and all(oi >= pi for oi, pi in zip(o, point))
        for o in all_points
    )
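
To go from a single-point check to the whole frontier, one option is to filter the candidate set. This is a small sketch assuming every objective is to be maximized and points are tuples of equal length; `dominates` and `pareto_front` are illustrative helper names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is at least as good everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Keep the points that no other point dominates (O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Two objectives, e.g. (safety, progress); values are made up for illustration.
options = [(3, 1), (2, 2), (1, 3), (1, 1), (2, 1)]
front = pareto_front(options)
print(front)
```

The frontier leaves the final choice among non-dominated options open, which is the point when the values are treated as incommensurable.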

Decision criteria

| Situation | Approach |
| --- | --- |
| Single clear metric | Scalar reward (monism) |
| Multiple comparable values | Weighted sum (pluralism reduced) |
| Incommensurable values | Pareto / lexicographic |
| Norm uncertainty | Constitutional + critique loop |
| Democratic setting | Preference aggregation + transparency |

Default: pluralism + transparent weights + constitutional guardrails.
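
The lexicographic approach listed for incommensurable values can be sketched with tuple comparison, which Python already performs left to right. The option records and the priority order (`safety` before `progress`) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def lexicographic_key(option: Dict[str, float], priorities: Sequence[str]) -> Tuple[float, ...]:
    """Order values by priority; later values only break ties in earlier ones."""
    return tuple(option[name] for name in priorities)

# Made-up options scored on two values.
options: List[dict] = [
    {"name": "A", "safety": 0.9, "progress": 0.2},
    {"name": "B", "safety": 0.9, "progress": 0.7},
    {"name": "C", "safety": 0.5, "progress": 1.0},
]

best = max(options, key=lambda o: lexicographic_key(o, ["safety", "progress"]))
print(best["name"])
```

Unlike a weighted sum, no amount of progress compensates for lower safety here. With real-valued scores exact ties are rare, so in practice the higher-priority value is often bucketed or thresholded first.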

🔗 Graph

🤖 LLM usage

When to use: alignment policy drafting, principle articulation, value-laden decision review, ethical critique generation. When not to use: pure technical optimization with no value tradeoff, or a narrow single-stakeholder domain.

Anti-patterns

  • Hidden monism: a single metric dressed up as several values; vulnerable to Goodhart's law.
  • False precision: spurious numeric weights assigned to incommensurable values.
  • No stakeholder mapping: it is unclear whose values are being represented.
  • Reward hacking: confusing an instrumental proxy with the intrinsic value it stands for.
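
One way to probe for false precision is to check whether a ranking survives small changes to the weights. The sketch below is an assumption-laden illustration: the option data, the helper names, and the choice of a ±20% perturbation are all hypothetical.

```python
from itertools import product
from typing import Dict, List

def ranking(options: List[dict], weights: Dict[str, float]) -> List[str]:
    """Names of the options sorted by weighted-sum score, best first."""
    def score(option: dict) -> float:
        return sum(w * option[key] for key, w in weights.items())
    return [o["name"] for o in sorted(options, key=score, reverse=True)]

def is_ranking_stable(options: List[dict], weights: Dict[str, float], rel_change: float = 0.2) -> bool:
    """True if scaling each weight up or down by rel_change never changes the ranking."""
    baseline = ranking(options, weights)
    keys = list(weights)
    for factors in product((1 - rel_change, 1 + rel_change), repeat=len(keys)):
        perturbed = {k: weights[k] * f for k, f in zip(keys, factors)}
        if ranking(options, perturbed) != baseline:
            return False
    return True

weights = {"safety": 1.0, "progress": 1.0}
close_call = [
    {"name": "X", "safety": 0.8, "progress": 0.5},
    {"name": "Y", "safety": 0.5, "progress": 0.85},
]
clear_call = [
    {"name": "Z", "safety": 0.9, "progress": 0.9},
    {"name": "W", "safety": 0.1, "progress": 0.1},
]
print(is_ranking_stable(close_call, weights), is_ranking_stable(clear_call, weights))
```

If the ranking flips under modest perturbation, the weights are carrying more authority than their precision supports, and a Pareto or lexicographic treatment may be more honest.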

🧪 Verification / Duplicates

  • Verified (Stanford Encyclopedia of Philosophy "Value Theory", Anthropic Constitutional AI paper).
  • Trust level: A.

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: FULL content (frames + AI alignment patterns) |