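koriweb d8a80f6272 chore(wiki): canonical normalization of dangling links (768 files / 1,200 links)

Replaced [[wikilinks]] that differed only in spelling (notation variants) with the
canonical title of the target document, reconnecting 1,200 broken links. Only title/filename
normalization matches were applied; alias matching was excluded because of over-merge risk (ambiguity guard). Originals are backed up in _link_reconcile_backup/.
Tool: Datacollect/scripts/link_reconcile_apply.mjs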

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-08 12:24:15 +09:00

4.7 KiB

Raw Blame History

id, title, category, status, canonical_id, aliases, duplicate_of, source_trust_level, confidence_score, verification_status, tags, raw_sources, last_reinforced, github_commit, tech_stack

title

Axiology

In one line

"The study of value: the question of what has worth." Axiology is a unifying framework for ethics and aesthetics, organized around intrinsic vs instrumental value and monism vs pluralism. Its core relevance to AI alignment in 2026: reward modeling, Constitutional AI, and preference elicitation each carry axiological commitments.

Core

Subdomains

Ethics: moral value (good / right).
Aesthetics: aesthetic value (beautiful / sublime).
Epistemology of value: truth, knowledge value.

Distinctions

Intrinsic (good in itself, e.g., happiness for hedonist) vs instrumental (good for X).
Subjective (depends on attitude) vs objective (mind-independent).
Monism (one value, e.g., utility) vs pluralism (many incommensurable values).
Realist vs anti-realist.

Major Frames

Hedonism (Bentham, Mill): pleasure / absence of pain.
Eudaimonism (Aristotle): flourishing.
Perfectionism: excellence, capability (Sen, Nussbaum).
Consequentialism: outcomes.
Deontology: duty (Kant).
Virtue ethics: character.
Pluralist value (Berlin): incommensurable goods.

AI Alignment Connection (2026)

Reward model = axiological model: implicit value commitment.
Constitutional AI (Anthropic): explicit principles → critique → revise.
Preference learning (RLHF, DPO, IPO): aggregate human preferences.
Pluralism challenge: whose values? → community / democratic AI.
Goodhart's law: once a measure becomes a target, it gets corrupted (instrumental ≠ intrinsic).
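The Goodhart point can be made concrete with a toy selection problem. This is a minimal sketch, assuming hypothetical candidates that each have a measurable proxy score and a true value the optimizer cannot see; every name and number is illustrative.

```python
# Toy illustration of Goodhart's law: optimizing a proxy picks a different
# winner than optimizing the value the proxy was meant to track.
# All candidates and scores are hypothetical.

candidates = [
    # (name, proxy_score, true_value)
    ("honest_summary", 0.70, 0.90),
    ("padded_summary", 0.85, 0.60),
    ("keyword_stuffing", 0.95, 0.10),
]

def best_by(column):
    """Return the name of the candidate with the highest score in a column."""
    return max(candidates, key=lambda c: c[column])[0]

proxy_choice = best_by(1)  # what a proxy optimizer selects
true_choice = best_by(2)   # what was actually wanted

print(proxy_choice, true_choice)
```

The proxy is instrumentally useful only while it tracks the intrinsic target; under optimization pressure the two selections come apart.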

Applications

AI alignment / reward design.
Cost-benefit analysis (policy).
Aesthetic scoring (image gen).
Healthcare QALY/DALY weighting.
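For the QALY item, the standard calculation multiplies a health-state utility weight (0 = dead, 1 = full health) by the years spent in that state. The sketch below assumes hypothetical weights and durations; choosing the weights is itself the axiological commitment.

```python
def qalys(health_states):
    """Sum utility weight times years over a sequence of (weight, years) pairs."""
    return sum(weight * years for weight, years in health_states)

# Hypothetical scenarios, for illustration only.
treatment = [(0.8, 5.0), (0.6, 3.0)]  # 4.0 + 1.8 QALYs
no_treatment = [(0.5, 4.0)]           # 2.0 QALYs

gain = qalys(treatment) - qalys(no_treatment)
print(round(gain, 2))
```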

💻 Patterns

Pattern 1 — Multi-objective reward (pluralism)

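def reward(traj):
    # progress, comfort, safety, and energy are scoring functions assumed to be
    # defined elsewhere; each maps a trajectory to a float.
    return (
        1.0 * progress(traj)        # instrumental
      + 0.5 * comfort(traj)         # intrinsic-ish
      + 2.0 * safety(traj)          # constraint priority
      - 0.3 * energy(traj)          # cost
    )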
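One way to keep the weights transparent is to name them and expose the per-term breakdown alongside the scalar. This is a sketch under assumptions: the trajectory is taken to be a plain dict of already-computed features, which stands in for the scoring functions the pattern leaves undefined.

```python
# Hypothetical setup: features are read from a dict instead of being computed
# by progress(), comfort(), safety(), and energy().
WEIGHTS = {"progress": 1.0, "comfort": 0.5, "safety": 2.0, "energy": -0.3}

def reward_breakdown(traj):
    """Return each weighted term so the value tradeoff stays inspectable."""
    return {name: weight * traj[name] for name, weight in WEIGHTS.items()}

def reward(traj):
    """Scalar reward: the sum of the weighted terms."""
    return sum(reward_breakdown(traj).values())

traj = {"progress": 0.8, "comfort": 0.5, "safety": 1.0, "energy": 0.4}
terms = reward_breakdown(traj)
total = reward(traj)
print(terms, round(total, 2))
```

Logging the breakdown next to the total makes it easier to see which value is driving a decision, and to notice when one term silently dominates.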

Pattern 2 — Constitutional critique (Anthropic-style)

CONSTITUTION = [
  "Avoid harm.",
  "Be honest.",
  "Respect autonomy.",
  "Promote well-being equitably.",
]

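# `llm` is assumed to be a client object exposing complete(prompt) -> str.
def critique(response, principles=CONSTITUTION):
    rules = "\n".join(principles)  # one principle per line, not a list repr
    return llm.complete(f"Critique against:\n{rules}\nResponse: {response}")

def revise(response, critique_text):
    return llm.complete(f"Revise: {response}\nIn light of: {critique_text}")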
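The two steps are usually chained into a loop. The sketch below is one possible arrangement, not Anthropic's implementation: `critique_and_revise`, the `complete` callable, and the fixed round count are all assumptions, and a stub stands in for the model so the control flow runs without any client.

```python
from typing import Callable, Sequence

def critique_and_revise(
    response: str,
    principles: Sequence[str],
    complete: Callable[[str], str],
    rounds: int = 2,
) -> str:
    """Run a fixed number of critique -> revise passes over a response."""
    numbered = "\n".join(f"{i}. {p}" for i, p in enumerate(principles, 1))
    for _ in range(rounds):
        critique_text = complete(
            f"Critique the response against these principles:\n{numbered}\n"
            f"Response: {response}"
        )
        response = complete(
            f"Revise the response.\nResponse: {response}\n"
            f"In light of: {critique_text}"
        )
    return response

# Stub completion function (hypothetical) that records every prompt it sees.
calls = []

def fake_complete(prompt: str) -> str:
    calls.append(prompt)
    return f"draft-{len(calls)}"

final = critique_and_revise("draft-0", ["Avoid harm.", "Be honest."], fake_complete)
print(final, len(calls))
```

Passing the completion function in as a parameter keeps the loop testable and makes the principles an explicit, inspectable input.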

Pattern 3 — Preference elicitation

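# binary preference dataset → DPO / IPO
# p = prompt, a = preferred response, b = dispreferred response
pairs = [{"prompt": p, "chosen": a, "rejected": b}, ...]
# train the policy to raise the likelihood of "chosen" over "rejected",
# measured relative to a frozen reference model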
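The per-pair objectives can be written out directly. This is a sketch of the DPO and IPO losses for a single pair, assuming the four inputs are sequence log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model; the function names and the default `beta` and `tau` values are illustrative.

```python
import math

def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def preference_margin(pi_chosen, pi_rejected, ref_chosen, ref_rejected):
    """Policy-vs-reference log-ratio of chosen minus that of rejected."""
    return (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO: negative log-sigmoid of the margin scaled by beta."""
    margin = preference_margin(pi_chosen, pi_rejected, ref_chosen, ref_rejected)
    return -log_sigmoid(beta * margin)

def ipo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, tau=0.1):
    """IPO: squared distance of the margin from the target 1 / (2 * tau)."""
    margin = preference_margin(pi_chosen, pi_rejected, ref_chosen, ref_rejected)
    return (margin - 1.0 / (2.0 * tau)) ** 2

# Policy identical to the reference: margin 0, so the DPO loss is log(2).
neutral = dpo_loss(-1.0, -2.0, -1.0, -2.0)
# Policy has shifted toward the chosen response: margin 1.5, lower loss.
improved = dpo_loss(-0.5, -3.0, -1.0, -2.0)
print(neutral, improved)
```

Both losses encode a value commitment of their own: `beta` and `tau` set how far the policy may drift from the reference in order to satisfy the stated preferences.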

Pattern 4 — Pareto frontier (incommensurable values)

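def is_pareto(point, all_points):
    # Assumes higher is better on every axis and points are tuples or lists.
    # A point is Pareto-optimal when no other point is at least as good
    # everywhere and different somewhere.
    return not any(all(o[i] >= point[i] for i in range(len(point))) and o != point
                   for o in all_points)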
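The check above tests one point at a time; collecting the whole frontier is a small step further. A minimal sketch, assuming every axis is "higher is better" and the options are hypothetical two-value tuples:

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_frontier(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(o, p) for o in points)]

# Hypothetical (value_1, value_2) scores for five options.
options = [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5), (0.1, 0.1)]
frontier = pareto_frontier(options)
print(frontier)
```

The frontier does not pick a winner. It only removes options that are worse on every value, which leaves the remaining tradeoff to explicit human judgment.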

Decision criteria

Situation	Approach
Single clear metric	Scalar reward (monism)
Multiple comparable	Weighted sum (pluralism reduced)
Incommensurable	Pareto / lexicographic
Norm uncertainty	Constitutional + critique loop
Democratic	Preference aggregation + transparency

Default: pluralism + transparent weights + constitutional guardrails.
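For the incommensurable row, a lexicographic ordering ranks options by the highest-priority value first and consults lower-priority values only to break ties. The sketch below assumes a hypothetical priority order (safety, then well-being, then cost) and made-up scores.

```python
# Hypothetical options; higher safety and well-being are better, lower cost is better.
options = {
    "plan_a": {"safety": 0.9, "well_being": 0.4, "cost": 3.0},
    "plan_b": {"safety": 0.9, "well_being": 0.7, "cost": 5.0},
    "plan_c": {"safety": 0.6, "well_being": 0.9, "cost": 1.0},
}

def lexicographic_key(scores):
    """Priority order: safety first, then well-being, then lower cost."""
    return (scores["safety"], scores["well_being"], -scores["cost"])

ranked = sorted(
    options,
    key=lambda name: lexicographic_key(options[name]),
    reverse=True,
)
print(ranked)
```

Unlike a weighted sum, no amount of well-being or cost saving can compensate for lower safety here, which is the point of refusing to put the values on one scale.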

🔗 Graph

Parent: Philosophy
Applications: AI_Safety_and_Alignment
Adjacent: Aesthetic-Value · Decision Theory · AI_Safety_and_Alignment

🤖 LLM usage

When to use: alignment policy drafting, principle articulation, value-laden decision review, ethical critique generation. When not to use: pure technical optimization with no value tradeoff, or a narrow single-stakeholder domain.

❌ Anti-patterns

Hidden monism: a single metric dressed up as pluralism; vulnerable to Goodhart's law.
False precision: spurious numeric weights assigned to incommensurable values.
No stakeholder mapping: it is unclear whose values are being encoded.
Reward hacking: confusing an instrumental proxy with the intrinsic value it stands for.

🧪 Verification / duplicates

Verified (Stanford Encyclopedia of Philosophy "Value Theory", Anthropic Constitutional AI paper).
Confidence: A.

🕓 Changelog

Date	Change
2026-05-08	Phase 1
2026-05-10	Manual cleanup — FULL content (frames + AI alignment patterns)
