2nd/10_Wiki/Topics/AI_and_ML/Cognitive Biases.md
2026-05-10 22:08:15 +09:00


id: wiki-2026-0508-cognitive-biases
title: Cognitive Biases
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: 인지 편향, cognitive biases, heuristics, Tversky-Kahneman, Thinking Fast and Slow, debiasing, nudge
duplicate_of: none
source_trust_level: A
confidence_score: 0.93
verification_status: applied
tags: psychology, cognitive-bias, kahneman, behavioral-economics, debiasing, nudge, decision-making, ml-bias
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: psychology / decision theory
  applicable_to: Decision Systems, Product Design, ML Bias Mitigation

Cognitive Biases

In one line

"Shortcuts in thinking that turn into traps." Kahneman distinguishes System 1 (fast, heuristic) from System 2 (slow, logical). These shortcuts were useful in evolutionary terms but misfire in modern contexts, and they are a source of bias in modern AI. In design they can be leveraged (nudges) or mitigated (debiasing).

Core concepts

Major biases

Cognitive

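  • Confirmation bias: seeking only evidence that supports existing beliefs.
  • Availability heuristic: overweighting recent or vivid examples.
  • Anchoring: over-relying on the first number encountered.
  • Representativeness: judging by resemblance to a stereotype.
  • Hindsight: "I knew it all along."
  • Survivorship: analyzing only the winners.
  • Sunk cost: sticking with something because of what has already been invested.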

Social

  • In-group bias: favoring one's own group.
  • Authority bias: over-trusting experts.
  • Bandwagon: following the majority.
  • Halo effect: one positive trait colors the judgment of everything else.

Self

  • Dunning-Kruger: the less competent tend to be overconfident.
  • Fundamental attribution error: others' behavior is attributed to character, one's own to circumstances.
  • Self-serving bias: success is credited to oneself, failure to the environment.
  • Optimism bias: an overly rosy view of the future.

Loss

  • Loss aversion: a loss weighs more than an equal gain (roughly 2×).
  • Endowment effect: overvaluing what one already owns.
  • Status quo bias: sticking with the default.

Kahneman: System 1 vs System 2

| System 1 | System 2 |
| --- | --- |
| Fast | Slow |
| Automatic | Deliberate |
| Pattern-based | Logic-based |
| Cheap | Expensive |
| Bias-prone | Can correct bias |

→ Neither system solves everything; both are needed.

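History

  • Tversky and Kahneman (1974), "Judgment under Uncertainty: Heuristics and Biases".
  • Prospect Theory (Kahneman and Tversky, 1979); Kahneman received the Nobel Prize in 2002.
  • Kahneman, "Thinking, Fast and Slow" (2011).
  • Cialdini, "Influence" (1984).
  • Thaler and Sunstein, "Nudge" (2008); Thaler received the Nobel Prize in 2017.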

Applications in modern AI

Sources of bias in ML

  • Training data reflects human biases.
  • Models amplify biases that already exist.
  • Representation skew: some groups are over- or under-represented in the data.
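
Representation skew is the easiest of these to measure. The sketch below compares each group's share of a dataset with a reference share; the function name, the group labels, and the 50/50 reference are all hypothetical, and every reference share is assumed to be greater than zero.

```python
from collections import Counter


def representation_skew(samples, reference_shares):
    """Return {group: dataset_share / reference_share}.

    A ratio above 1 means the group is over-represented, below 1
    under-represented. Assumes every reference share is > 0.
    """
    counts = Counter(samples)
    total = len(samples)
    skew = {}
    for group, ref_share in reference_shares.items():
        dataset_share = counts.get(group, 0) / total if total else 0.0
        skew[group] = dataset_share / ref_share
    return skew


# Hypothetical dataset: 80 examples from group "a", 20 from group "b".
labels = ["a"] * 80 + ["b"] * 20
skew = representation_skew(labels, {"a": 0.5, "b": 0.5})
print(skew)  # {'a': 1.6, 'b': 0.4}
```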

Debiasing

LLM-specific bias

  • Sycophancy: agreeing with whatever the user says.
  • Position bias: preferring the first or last item.
  • Recency: giving more weight to the most recent tokens.
  • Anchoring: over-weighting the examples given in the prompt.
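
Position bias can be probed by asking the same question under every ordering of the options and checking whether the pick changes. This is a minimal sketch: `model` is a hypothetical callable that takes a prompt string and returns the chosen option text, and the stub below stands in for a real LLM call.

```python
from itertools import permutations


def position_bias_check(model, question, options):
    """Ask the same question under every option ordering and collect the picks."""
    picks = set()
    for order in permutations(options):
        listing = "\n".join(f"{i + 1}. {opt}" for i, opt in enumerate(order))
        picks.add(model(f"{question}\n{listing}\nAnswer with the option text only."))
    return {"consistent": len(picks) == 1, "picks": sorted(picks)}


def always_first(prompt):
    """Stub model that always picks whatever is listed first."""
    first_line = prompt.splitlines()[1]
    return first_line.split(". ", 1)[1]


result = position_bias_check(always_first, "Which is larger?", ["7", "9"])
print(result)  # {'consistent': False, 'picks': ['7', '9']}
```

The number of orderings grows factorially, so for more than a handful of options a random sample of orderings is the more practical choice.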

Mitigation through prompt engineering

  • Chain-of-thought.
  • Self-critique.
  • Multiple perspectives.
  • An explicit "consider the opposite" instruction.
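
Self-critique and "consider the opposite" can be combined in a draft, critique, revise loop. The sketch assumes `model` is a hypothetical callable from prompt string to response text; the prompt wording is illustrative, and the stub exists only to show the call sequence.

```python
def self_critique(model, question, rounds=1):
    """Draft an answer, then critique and revise it `rounds` times."""
    answer = model(f"Question: {question}\nGive your best answer.")
    for _ in range(rounds):
        critique = model(
            f"Question: {question}\nDraft answer: {answer}\n"
            "Consider the opposite: list the strongest reasons this draft could be wrong."
        )
        answer = model(
            f"Question: {question}\nDraft answer: {answer}\n"
            f"Critique: {critique}\nWrite a revised final answer."
        )
    return answer


calls = []


def echo_model(prompt):
    """Stub that records each prompt and returns a numbered placeholder reply."""
    calls.append(prompt)
    return f"reply-{len(calls)}"


final = self_critique(echo_model, "Is the launch date realistic?")
print(final, len(calls))  # reply-3 3
```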

Nudge (Thaler-Sunstein)

  • The power of defaults.
  • Choice architecture.
  • Controlling friction.
  • Loss framing vs gain framing.

Dark patterns (anti-nudge)

  • Hidden costs.
  • Confirm-shaming.
  • Forced continuity.
  • Misdirection.
  • See Addiction-Neuroscience.

Debiasing techniques

  1. Premortem (Klein): imagine the plan has already failed.
  2. Red team / devil's advocate.
  3. Anonymous voting.
  4. Decision journal (Thaler).
  5. Outside view (base rate).
  6. Multi-perspective analysis (10 frameworks).
  7. Fermi estimation.
  8. Evidence-based reasoning.
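
The outside view (item 5) amounts to starting from the base rate and only then updating on case-specific evidence. A minimal sketch using Bayes' rule follows; the scenario and all three numbers are hypothetical, chosen only to show how strongly a low base rate limits the posterior.

```python
def posterior(base_rate: float, hit_rate: float, false_alarm_rate: float) -> float:
    """P(hypothesis | evidence) by Bayes' rule.

    base_rate:        P(hypothesis) in the reference class
    hit_rate:         P(evidence | hypothesis)
    false_alarm_rate: P(evidence | not hypothesis)
    """
    evidence = hit_rate * base_rate + false_alarm_rate * (1 - base_rate)
    return hit_rate * base_rate / evidence


# Hypothetical: 10% of comparable projects succeed; a convincing pitch is
# seen in 80% of the successes but also in 30% of the failures.
p = posterior(base_rate=0.10, hit_rate=0.80, false_alarm_rate=0.30)
print(round(p, 3))  # 0.229
```

Even with a convincing pitch the probability of success stays well under one in four, far below what the inside view alone tends to suggest.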

💻 Patterns

Decision journal (Bayesian)

import collections
from datetime import datetime

import numpy as np


class DecisionJournal:
    def __init__(self):
        self.entries = []

    def log(self, decision, alternatives, expected_outcome, confidence, reasoning):
        self.entries.append({
            'date': datetime.now(),
            'decision': decision,
            'alternatives': alternatives,
            'expected_outcome': expected_outcome,
            'confidence': confidence,  # 0-1
            'reasoning': reasoning,
            'actual_outcome': None,
            'review_date': None,
        })

    def review(self, idx, actual):
        e = self.entries[idx]
        e['actual_outcome'] = actual
        e['review_date'] = datetime.now()
        # Return what is needed to track calibration over time.
        return {
            'predicted': e['expected_outcome'],
            'actual': actual,
            'match': actual == e['expected_outcome'],
            'confidence_was': e['confidence'],
        }

    def calibration(self):
        """Compare stated confidence with the observed hit rate, in 0.1-wide bins."""
        bins = collections.defaultdict(list)
        for e in self.entries:
            if e['actual_outcome'] is None:
                continue  # not reviewed yet
            bucket = int(e['confidence'] * 10) / 10  # avoid shadowing built-in bin()
            bins[bucket].append(e['actual_outcome'] == e['expected_outcome'])
        return {b: float(np.mean(hits)) for b, hits in bins.items()}
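
Binned calibration needs many entries per bin before it says much. A single-number summary such as the Brier score can complement it. The sketch below is standalone and assumes reviewed forecasts are available as (confidence, came_true) pairs; `brier_score` is an illustrative name, not part of the journal class above.

```python
def brier_score(forecasts):
    """Mean squared gap between stated confidence and what happened.

    forecasts: list of (confidence, came_true) pairs, confidence in [0, 1].
    0.0 is perfect; always saying 0.5 scores 0.25.
    """
    if not forecasts:
        raise ValueError("need at least one reviewed forecast")
    return sum((c - (1.0 if hit else 0.0)) ** 2 for c, hit in forecasts) / len(forecasts)


# Hypothetical reviewed forecasts.
reviewed = [(0.9, True), (0.9, False), (0.6, True), (0.6, True)]
print(round(brier_score(reviewed), 3))  # 0.285
```

Here the score is worse than the 0.25 of a constant 50% forecast, driven almost entirely by the one confident miss.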

Premortem

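def premortem(plan, brainstorm=None):
    """Imagine it is one year from now and the plan has failed; ask why.

    `brainstorm` is an optional caller-supplied callable that turns the
    prompts into concrete failure modes. Without it, the prompts are
    returned as open questions.
    """
    prompts = [
        'biggest reason',
        'early warning signs',
        'binding constraint',
        'wrong assumption',
    ]
    return {
        'plan': plan,
        'imagine_state': 'plan failed catastrophically',
        'failure_modes': brainstorm(prompts) if brainstorm else prompts,
        'mitigations': [],  # one mitigation plan per failure mode
    }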

Anchoring counter

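def negotiate_without_anchor(target, your_estimate):
    """Avoid being anchored by the first number on the table.

    get_initial_offer, ask_for_their_offer, reset_with_data and
    negotiate_around are placeholders for application-specific helpers.
    """
    initial = get_initial_offer()
    if initial is None:
        # Don't go first: ask for their number instead.
        ask_for_their_offer()
        initial = get_initial_offer()

    # Naming the anchor explicitly weakens its pull.
    print(f'Their anchor: {initial}, my estimate: {your_estimate}')

    if initial is not None and abs(initial - your_estimate) > your_estimate * 0.3:
        # Wide gap: reset the discussion with data-backed reasoning.
        reset_with_data(your_estimate)

    return negotiate_around(your_estimate)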

LLM debiasing prompt

def cot_with_devils_advocate(question):
    return f"""Analyze this:

{question}

Step 1: Initial answer.
Step 2: List 3 strongest counter-arguments.
Step 3: Re-evaluate considering counter-arguments.
Step 4: Final answer with confidence (0-1).

Format: JSON only."""

Sycophancy detection (LLM)

def sycophancy_check(model, prompt, assesses_X_correct):
    """Does the user's stated opinion sway the model's answer?

    `assesses_X_correct` is a caller-supplied callable that returns True
    when a response judges X to be correct.
    """
    a = model(f"{prompt}\nWhat do you think?")
    b = model(f"I strongly believe X is correct. {prompt}\nWhat do you think?")
    c = model(f"I strongly believe X is wrong. {prompt}\nWhat do you think?")

    # The verdict should be the same regardless of what the user claims.
    verdicts = {assesses_X_correct(r) for r in (a, b, c)}
    if len(verdicts) > 1:
        return 'WARN: sycophantic'
    return 'OK'

Choice architecture (nudge)

// The power of defaults: organ-donor consent runs at roughly 95% under opt-out vs roughly 15% under opt-in
function NewsletterSignup() {
  return (
    <form>
      <label>
        <input type="checkbox" defaultChecked />
         Subscribe to the newsletter (opt-out)
      </label>
    </form>
  );
}

// ❌ Dark pattern (avoid): confirm-shaming
function CancelSubscription() {
  return (
    <button>
      Yes, cancel and lose all my benefits forever 😢
    </button>
  );
}

Anti-confirmation (red team)

def red_team_review(decision):
    return [
        ('What evidence would change your mind?', None),
        ('What did you NOT consider?', None),
        ('Who would disagree, and why?', None),
        ('What is the strongest argument against?', None),
        ('If you fail, what is the most likely cause?', None),
    ]

Survivorship bias check

def survivorship_audit(success_set, full_set, traits):
    """Flag traits whose prevalence is overstated when only successes are examined.

    `traits` is a caller-supplied callable mapping a collection to
    {trait: fraction of members with that trait}.
    """
    success_traits = traits(success_set)
    base_rate_traits = traits(full_set)  # includes the failures

    biased_traits = []
    for trait, success_rate in success_traits.items():
        base = base_rate_traits.get(trait, 0)
        if success_rate > base * 1.5:
            biased_traits.append({
                'trait': trait,
                'success_rate': success_rate,
                'base_rate': base,
                'inflation': success_rate / base if base else float('inf'),
            })
    return biased_traits

🤔 Decision criteria

| Situation | Counter-bias |
| --- | --- |
| Big decision | Decision journal + premortem |
| Negotiation | Don't go first + reset |
| LLM use | CoT + multiple perspectives |
| Hiring | Structured interview + scorecard |
| Investing | Outside view + base rate |
| Group meeting | Anonymous voting |
| Strategy | Red team |
| Daily | Mindfulness + slow down |

Default: deliberately slow down, engage System 2, and rely on evidence.

🔗 Graph

🤖 LLM usage

When to use: decision design, product UX, negotiation prep, LLM bias mitigation, hiring.
When not to use: dark patterns (manipulation); specific medical or mental-health questions.

Anti-patterns

  • Expecting to eliminate bias: it is always present.
  • Relying on awareness alone: it does little to reduce actual bias.
  • Fighting every bias: some heuristics are useful.
  • Exploiting dark patterns: short-term gain, long-term loss.
  • No calibration: stated confidence ends up wrong.
  • Trusting a sycophantic LLM: false validation.

🧪 Verification / duplicates

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: bias catalog, Kahneman, LLM-specific biases, and decision journal / premortem / CoT code |