

Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization

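Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture and unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirects)
- Normalized link names: about 2,400 fixes (broken links repaired, redirects linked directly, concepts mapped)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections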

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-20 23:52:15 +09:00

4.7 KiB



Axiology

One-line summary

"The study of value: what is good, and what is worth pursuing." Axiology is the unifying framework behind ethics and aesthetics, built on distinctions such as intrinsic vs instrumental value and monism vs pluralism. Its core relevance to AI alignment in 2026 is that reward modeling, Constitutional AI, and preference elicitation each carry axiological commitments.

Core concepts

Subdomains

Ethics: moral value (good / right).
Aesthetics: aesthetic value (beautiful / sublime).
Epistemic value: the value of truth and knowledge.

Distinctions

Intrinsic (good in itself, e.g., happiness for a hedonist) vs instrumental (good as a means to something else).
Subjective (depends on an agent's attitudes) vs objective (mind-independent).
Monism (a single fundamental value, e.g., utility) vs pluralism (several values that may be incommensurable).
Realism vs anti-realism (whether there are objective facts about value).
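A minimal sketch of the monism/pluralism distinction, under one simplified reading. It assumes that each option has already been scored on each value. Both function names are hypothetical. Under monism, any two options can be compared. Under a pluralist view that refuses to collapse scores into one number, two options can be incomparable.

```python
from typing import Optional, Sequence


def compare_monist(a: float, b: float) -> int:
    """Monism: one scalar value, so any two options are comparable (1, 0, or -1)."""
    return (a > b) - (a < b)


def compare_pluralist(a: Sequence[float], b: Sequence[float]) -> Optional[int]:
    """Pluralism (simplified): compare per-value scores without collapsing them.

    Returns 1 if `a` is at least as good on every value and better on one,
    -1 for the reverse, 0 if tied everywhere, and None if the options are
    incomparable (each wins on some value).
    """
    a_ge = all(x >= y for x, y in zip(a, b))
    b_ge = all(y >= x for x, y in zip(a, b))
    if a_ge and b_ge:
        return 0
    if a_ge:
        return 1
    if b_ge:
        return -1
    return None  # no verdict without extra commitments such as weights


print(compare_pluralist((2, 1), (1, 2)))  # None: each option wins on one value
```

Returning `None` instead of forcing a sign is the design point. Any tie-breaking rule, such as weights or a priority order, is an additional axiological commitment.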

Major frameworks

Hedonism (Bentham, Mill): pleasure / absence of pain.
Eudaimonism (Aristotle): flourishing.
Perfectionism: the development of human excellence; closely related is the capability approach (Sen, Nussbaum).
Consequentialism: outcomes.
Deontology: duty (Kant).
Virtue ethics: character.
Value pluralism (Berlin): many genuine goods that cannot be reduced to a common measure.

AI Alignment Connection (2026)

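Reward model = axiological model: every reward model makes implicit value commitments.
Constitutional AI (Anthropic): explicit written principles → the model critiques its own output → the model revises it.
Preference learning (RLHF, DPO, IPO): learns from aggregated human preference judgments.
Pluralism challenge: whose values count? → community-driven / democratic approaches to AI.
Goodhart's law: once a measure becomes a target, it stops being a good measure (an instrumental proxy is not the intrinsic goal).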
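Reward models in RLHF are commonly trained with a Bradley-Terry preference model, P(a preferred over b) = sigmoid(r(a) - r(b)). The sketch below fits one scalar reward per item, where a real reward model would use a neural network. The item names, learning rate, and L2 penalty are illustrative assumptions. Only differences between rewards are meaningful.

```python
import math
from typing import Dict, Iterable, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fit_bradley_terry(
    prefs: Iterable[Tuple[str, str]], steps: int = 500, lr: float = 0.1, l2: float = 0.01
) -> Dict[str, float]:
    """Fit one scalar reward per item from (winner, loser) comparison pairs.

    Model: P(winner preferred over loser) = sigmoid(r[winner] - r[loser]).
    Plain gradient ascent on the log-likelihood, with a small L2 penalty so
    the rewards stay finite when the preferences are perfectly consistent.
    """
    prefs = list(prefs)
    items = {i for pair in prefs for i in pair}
    r = {i: 0.0 for i in items}
    for _ in range(steps):
        grad = {i: -l2 * r[i] for i in items}  # gradient of the L2 penalty
        for w, l in prefs:
            p = sigmoid(r[w] - r[l])
            grad[w] += 1.0 - p  # d/dr[w] of log p
            grad[l] -= 1.0 - p  # d/dr[l] of log p
        for i in items:
            r[i] += lr * grad[i]
    return r


rewards = fit_bradley_terry([("a", "b"), ("b", "c"), ("a", "c")])
print(sorted(rewards, key=rewards.get, reverse=True))  # ['a', 'b', 'c']
```

The axiological commitment sits in the data. Pooling everyone's comparisons into one reward function treats the aggregate preference as the value to optimize.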

Applications

AI alignment / reward design.
Cost-benefit analysis (policy).
Aesthetic scoring (image gen).
Healthcare QALY/DALY weighting.
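For QALY weighting, quality-adjusted life years are the sum of time spent in each health state multiplied by a utility weight, from 0 (death) to 1 (full health). The incremental cost-effectiveness ratio (ICER) is the extra cost per extra QALY. This minimal sketch omits discounting. The numbers in the usage line are purely illustrative.

```python
from typing import Iterable, Tuple


def qalys(states: Iterable[Tuple[float, float]]) -> float:
    """Sum of duration (years) x utility weight; no discounting applied."""
    return sum(years * weight for years, weight in states)


def icer(cost_new: float, cost_old: float, qaly_new: float, qaly_old: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra QALY gained."""
    if qaly_new == qaly_old:
        raise ValueError("no QALY difference; ICER is undefined")
    return (cost_new - cost_old) / (qaly_new - qaly_old)


# Illustrative only: 2 years in full health, then 3 years at weight 0.5.
print(qalys([(2, 1.0), (3, 0.5)]))  # 3.5
```

The value judgment lives in the utility weights. Choosing how much a year at weight 0.5 counts is an axiological decision, not a technical one.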

💻 Patterns

Pattern 1 — Multi-objective reward (pluralism)

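# progress, comfort, safety, energy: assumed per-trajectory scoring functions defined elsewhere
def reward(traj):
    return (
        1.0 * progress(traj)    # instrumental
        + 0.5 * comfort(traj)   # closer to intrinsic
        + 2.0 * safety(traj)    # highest weight: a soft priority, not a hard constraint
        - 0.3 * energy(traj)    # cost
    )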
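A variant of Pattern 1 that keeps the weights explicit and inspectable, and treats safety as a hard constraint instead of one more tradeable term. It assumes the per-term scores are computed beforehand and passed in as a dict. The term names and weights mirror Pattern 1. The `safety_floor` threshold is a hypothetical parameter.

```python
import math
from typing import Mapping

# Hypothetical value terms; the sign encodes benefit (+) or cost (-).
WEIGHTS = {"progress": 1.0, "comfort": 0.5, "safety": 2.0, "energy": -0.3}


def weighted_reward(scores: Mapping[str, float], weights: Mapping[str, float] = WEIGHTS) -> float:
    """Scalarize per-term scores with explicit weights; fail loudly on missing terms."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing value terms: {sorted(missing)}")
    return sum(w * scores[k] for k, w in weights.items())


def constrained_reward(
    scores: Mapping[str, float], safety_floor: float = 0.9, weights: Mapping[str, float] = WEIGHTS
) -> float:
    """Safety below the floor makes a trajectory inadmissible, whatever its other gains."""
    if scores["safety"] < safety_floor:
        return -math.inf
    return weighted_reward(scores, weights)
```

Returning `-math.inf` means that `max()` over candidate trajectories never selects an unsafe one. In the plain weighted sum, a large enough progress score could still outweigh a safety shortfall.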

Pattern 2 — Constitutional critique (Anthropic-style)

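# `llm` is a placeholder for any text-completion client exposing
# complete(prompt: str) -> str (hypothetical interface)
CONSTITUTION = [
    "Avoid harm.",
    "Be honest.",
    "Respect autonomy.",
    "Promote well-being equitably.",
]

def critique(response, principles=CONSTITUTION):
    rules = "\n".join(f"- {p}" for p in principles)  # one principle per line, not a list repr
    return llm.complete(
        f"Critique the response against these principles:\n{rules}\n\nResponse:\n{response}"
    )

def revise(response, critique_text):
    return llm.complete(
        f"Revise the response to address the critique.\n\n"
        f"Response:\n{response}\n\nCritique:\n{critique_text}"
    )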
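The critique and revise steps can be chained into a loop. This sketch injects the model call as a plain function so it runs offline. `constitutional_loop`, `fake_complete`, and the prompt wording are hypothetical, not Anthropic's implementation. In the Constitutional AI paper, revised outputs serve as fine-tuning data. This sketch shows only the loop.

```python
from typing import Callable, List

Complete = Callable[[str], str]  # assumed interface: prompt in, text out


def constitutional_loop(response: str, complete: Complete, principles: List[str], rounds: int = 2) -> str:
    """Alternate critique and revision for a fixed number of rounds."""
    rules = "\n".join(f"- {p}" for p in principles)
    for _ in range(rounds):
        critique_text = complete(
            f"Critique the response against these principles:\n{rules}\n\nResponse:\n{response}"
        )
        response = complete(
            f"Revise the response to address the critique.\n\n"
            f"Response:\n{response}\n\nCritique:\n{critique_text}"
        )
    return response


def fake_complete(prompt: str) -> str:
    # Stand-in for a real model call: canned replies so the loop runs without an API.
    return "Too blunt." if prompt.startswith("Critique") else "Revised answer."


print(constitutional_loop("Draft answer.", fake_complete, ["Be honest."], rounds=1))  # Revised answer.
```

Passing the model call in as a parameter keeps the loop testable and makes the principle list the only value-laden input.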

Pattern 3 — Preference elicitation

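# binary preference dataset → DPO / IPO
# each record: a prompt plus a preferred ("chosen") and a dispreferred ("rejected") response;
# `comparisons` is a hypothetical iterable of (prompt, chosen, rejected) triples
pairs = [{"prompt": p, "chosen": a, "rejected": b} for p, a, b in comparisons]
# DPO: train the policy to raise its log-probability ratio (relative to a frozen
# reference model) of chosen over rejected responses; IPO regresses that margin to a fixed target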
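The per-pair losses behind DPO and IPO can be written in a few lines. This sketch assumes the sequence log-probabilities of the chosen and rejected responses, under both the policy and the frozen reference model, are already computed. Real training code would use a tensor library and average over batches. The `beta` and `tau` defaults are illustrative.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def pref_margin(pi_chosen: float, pi_rejected: float, ref_chosen: float, ref_rejected: float) -> float:
    """h = [log pi(y_w) - log ref(y_w)] - [log pi(y_l) - log ref(y_l)]."""
    return (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta: float = 0.1) -> float:
    """DPO per-pair loss: -log sigmoid(beta * h); keeps pushing h upward."""
    return -log_sigmoid(beta * pref_margin(pi_chosen, pi_rejected, ref_chosen, ref_rejected))


def ipo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, tau: float = 0.1) -> float:
    """IPO per-pair loss: (h - 1/(2*tau))**2; pulls h toward a fixed target."""
    h = pref_margin(pi_chosen, pi_rejected, ref_chosen, ref_rejected)
    return (h - 1.0 / (2.0 * tau)) ** 2
```

When the policy equals the reference, h = 0 and the DPO loss is log 2. The squared target in IPO is what stops the margin from growing without bound on deterministic preferences.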

Pattern 4 — Pareto frontier (incommensurable values)

def is_pareto(point, all_points):
    """True if no other point dominates `point` (all objectives maximized)."""
    return not any(
        o != point and all(a >= b for a, b in zip(o, point))
        for o in all_points
    )
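Filtering a whole candidate set down to its Pareto front follows from the same dominance test. This is a minimal O(n²) sketch that assumes every objective is maximized and that points are equal-length tuples. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no worse on every objective and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points no other point dominates (quadratic, fine for small sets)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


print(pareto_front([(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]))  # [(1, 5), (2, 4), (3, 3)]
```

The front does not pick a winner. Choosing among (1, 5), (2, 4) and (3, 3) still needs a further value judgment, which is the point of keeping incommensurable values apart.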

Decision criteria

Situation	Approach
Single clear metric	Scalar reward (monism)
Multiple commensurable values	Weighted sum (pluralism reduced to one scale)
Incommensurable values	Pareto / lexicographic
Normative uncertainty	Constitutional principles + critique loop
Democratic / multi-stakeholder	Preference aggregation + transparency

Default: pluralism + transparent weights + constitutional guardrails.
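For the lexicographic option in the table, values are ranked in strict priority order. A criterion is consulted only to break ties on the ones above it. This sketch adds a `slack` tolerance so near-ties on a higher priority still let lower priorities matter. The slack rule, the criterion names, and the scores are hypothetical.

```python
from typing import Dict, List, Sequence


def lexicographic_select(
    options: Dict[str, Dict[str, float]], order: Sequence[str], slack: float = 0.0
) -> List[str]:
    """Filter options criterion by criterion, keeping those within `slack` of the best."""
    remaining = list(options)
    for criterion in order:
        best = max(options[name][criterion] for name in remaining)
        remaining = [n for n in remaining if options[n][criterion] >= best - slack]
    return remaining


options = {
    "A": {"safety": 1.0, "helpfulness": 0.6},
    "B": {"safety": 1.0, "helpfulness": 0.9},
    "C": {"safety": 0.8, "helpfulness": 1.0},
}
print(lexicographic_select(options, ["safety", "helpfulness"]))  # ['B']
```

With `slack=0.0` this is a strict lexicographic order, so C's extra helpfulness can never compensate for lower safety. A larger slack softens that priority and is itself a value judgment to state openly.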

🔗 Graph

Parent: Philosophy
Applications: AI_Safety_and_Alignment
Adjacent: Aesthetic-Value · Decision-Theory · AI_Safety_and_Alignment

🤖 Using LLMs

When to use: drafting alignment policies, articulating principles, reviewing value-laden decisions, generating ethical critiques. When not to use: purely technical optimization with no value trade-offs, or a narrow domain with a single stakeholder.

❌ Anti-patterns

Hidden monism: a single metric dressed up as pluralism, which leaves it vulnerable to Goodhart's law.
False precision: assigning spurious numeric weights to incommensurable values.
No stakeholder mapping: it is unclear whose values are being encoded.
Reward hacking: confusing an instrumental proxy with the intrinsic goal.
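A toy illustration of the Goodhart and reward-hacking failure: a proxy that tracks the true goal for small inputs keeps rising after the true goal has turned down. Optimizing the proxy hard therefore lands far from what was wanted. Both functions and the candidate grid are hypothetical and chosen only to make the divergence visible.

```python
def true_value(x: float) -> float:
    # Hypothetical intrinsic goal: peaks at x = 1, then declines.
    return x - 0.5 * x * x


def proxy(x: float) -> float:
    # Hypothetical measurable proxy: close to true_value near 0, but never turns down.
    return x


candidates = [i / 10 for i in range(31)]  # x from 0.0 to 3.0 in steps of 0.1
best_by_proxy = max(candidates, key=proxy)
best_by_truth = max(candidates, key=true_value)
print(best_by_proxy, true_value(best_by_proxy))  # 3.0 -1.5
print(best_by_truth, true_value(best_by_truth))  # 1.0 0.5
```

Optimizing the proxy alone drives x to the edge of the search space, where the intrinsic value is negative. This is why the table above pairs metrics with explicit constraints and critique.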

🧪 Verification / duplicates

Verified (Stanford Encyclopedia of Philosophy "Value Theory", Anthropic Constitutional AI paper).
Confidence: A.

🕓 Changelog

Date	Change
2026-05-08	Phase 1
2026-05-10	Manual cleanup — FULL content (frames + AI alignment patterns)
