--- id: P-REINFORCE-SCI-CONDITIONING category: "10_Wiki/πŸ’‘ Topics/Science" confidence_score: 0.97 tags: [Conditioning, Behavioral Science, Learning, Psychology] last_reinforced: 2026-04-20 --- # Conditioning-and-Learning (쑰건 ν˜•μ„±κ³Ό ν•™μŠ΅) ## πŸ“Œ ν•œ 쀄 톡찰 (The Karpathy Summary) > "행동은 λ³΄μƒμ˜ 결과물이닀." 자극과 λ°˜μ‘μ΄ κ²°ν•©ν•˜μ—¬ μŠ΅κ΄€μ΄ 되고, λ³΄μƒμ˜ 타이밍에 따라 행동이 κ°•ν™”λ˜κ±°λ‚˜ μ‚¬λΌμ§€λŠ” λ©”μ»€λ‹ˆμ¦˜μ΄λ‹€. ## πŸ“– κ΅¬μ‘°ν™”λœ 지식 (Synthesized Content) - **Classical Conditioning (고전적 쑰건 ν˜•μ„±)**: - λΉ„μžλ°œμ  λ°˜μ‚¬ λ°˜μ‘ ν•™μŠ΅. νŒŒλΈ”λ‘œν”„μ˜ 개 μ‹€ν—˜μ²˜λŸΌ 쀑립 자극이 무쑰건 자극과 κ²°ν•©ν•˜μ—¬ λ°˜μ‘μ„ μ΄λŒμ–΄λ‚΄λŠ” 방식. - **Operant Conditioning (μ‘°μž‘μ  쑰건 ν˜•μ„±)**: - 자발적 행동 ν•™μŠ΅. ν–‰λ™μ˜ κ²°κ³Όκ°€ 보상(κ°•ν™”)이면 λ°˜λ³΅ν•˜κ³ , 처벌이면 λ©ˆμΆ”λŠ” 방식. μŠ€ν‚€λ„ˆμ˜ μ‹€ν—˜μ΄ λŒ€ν‘œμ μ΄λ‹€. - **Variable Reward Schedule**: - 보상을 가끔씩 예츑 λΆˆκ°€λŠ₯ν•˜κ²Œ 쀄 λ•Œ 행동이 κ°€μž₯ κ°•λ ₯ν•˜κ²Œ μœ μ§€λœλ‹€(도박, κ°€μ±  κ²Œμž„μ˜ 원리). ## ⚠️ λͺ¨μˆœ 및 μ—…λ°μ΄νŠΈ (RL Update) - 인간은 λ‹¨μˆœνžˆ λ³΄μƒμ—λ§Œ 따라 μ›€μ§μ΄λŠ” μ‘΄μž¬κ°€ μ•„λ‹ˆλ‹€(ν–‰λ™μ£Όμ˜μ˜ ν•œκ³„). μ‚¬νšŒμ  ν•™μŠ΅(κ΄€μ°° ν•™μŠ΅)κ³Ό λ‚΄λ©΄μ˜ 필터링이 μž‘μš©ν•œλ‹€. AI λΆ„μ•Όμ˜ κ°•ν™”ν•™μŠ΅(RL)은 이 μ‘°μž‘μ  쑰건 ν˜•μ„±μ„ μˆ˜ν•™μ μœΌλ‘œ λͺ¨λΈλ§ν•˜μ—¬ 기계가 슀슀둜 μ „λž΅μ„ 찾게 λ§Œλ“ λ‹€. ## πŸ”— 지식 μ—°κ²° (Graph) - Related: [[Behavioral-Economics|Behavioral-Economics]] , Cognitive Evaluation Theory - Foundation: Reinforcement Learning