--- id: DL-LR-SCHED-001 category: "10_Wiki/πŸ’‘ Topics/AI" confidence_score: 1.0 tags: [ai, deep-learning, optimization, learning-rate, scheduler, training-stability] last_reinforced: 2026-04-26 --- # Learning Rate Schedules (ν•™μŠ΅λ₯  μŠ€μΌ€μ€„) ## πŸ“Œ ν•œ 쀄 톡찰 (The Karpathy Summary) > "μ΄ˆλ°˜μ—λŠ” κ³Όκ°ν•˜κ²Œ νƒμƒ‰ν•˜κ³ , 정닡에 κ°€κΉŒμ›Œμ§ˆμˆ˜λ‘ μ‹ μ€‘ν•˜κ²Œ λ°œμ„ 내딛어라" β€” ν•™μŠ΅ κ³Όμ •μ—μ„œ ν•™μŠ΅λ₯ μ„ κ³ μ •ν•˜μ§€ μ•Šκ³  사전에 μ •μ˜λœ κ·œμΉ™μ— 따라 μ μ§„μ μœΌλ‘œ λ³€ν™”μ‹œν‚΄μœΌλ‘œμ¨, μ΅œμ ν•΄μ— 더 λΉ λ₯΄κ³  μ •λ°€ν•˜κ²Œ λ„λ‹¬ν•˜κ²Œ ν•˜λŠ” μ΅œμ ν™” μ „λž΅. ## πŸ“– κ΅¬μ‘°ν™”λœ 지식 (Synthesized Content) - **μΆ”μΆœλœ νŒ¨ν„΄:** "Adaptive Speed Control" β€” ν•™μŠ΅ μ΄ˆκΈ°μ—λŠ” 큰 ν•™μŠ΅λ₯ λ‘œ μ§€μ—­ μ΅œμ ν•΄(Local Optima)λ₯Ό λΉ λ₯΄κ²Œ νƒˆμΆœν•˜κ³ , ν›„κΈ°μ—λŠ” ν•™μŠ΅λ₯ μ„ 쀄여(Decay) 손싀 ν•¨μˆ˜μ˜ κ³‘λ©΄μ—μ„œ λ―Έμ„Έν•˜κ²Œ μ§„λ™ν•˜λ©° μ •κ΅ν•œ μ΅œμ μ μ„ μ°ΎλŠ” μŠ€μΌ€μ€„λ§ νŒ¨ν„΄. - **μ£Όμš” μŠ€μΌ€μ€„λ§ 기법:** - **Step Decay:** νŠΉμ • 에포크(Epoch)λ§ˆλ‹€ κ³ μ •λœ λΉ„μœ¨λ‘œ ν•™μŠ΅λ₯  κ°μ†Œ. - **Exponential Decay:** μ§€μˆ˜ ν•¨μˆ˜λ₯Ό 따라 λ§€ λ‹¨κ³„λ§ˆλ‹€ λΆ€λ“œλŸ½κ²Œ κ°μ†Œ. - **Cosine Annealing:** 코사인 곑선을 그리며 ν•™μŠ΅λ₯ μ„ 쑰절. νƒˆμΆœκ³Ό μ•ˆμ°©μ˜ κ· ν˜•μ΄ μ’‹μ•„ 졜근 μ„ ν˜Έλ¨. - **Warm-up:** ν•™μŠ΅ κ·Ήμ΄ˆλ°˜μ— μ•„μ£Ό μž‘μ€ ν•™μŠ΅λ₯ μ—μ„œ μ‹œμž‘ν•˜μ—¬ 점차 λ†’μž„μœΌλ‘œμ¨, 초기 κ°€μ€‘μΉ˜ λΆˆμ•ˆμ •μ„±μ„ 극볡. - **의의:** λͺ¨λΈμ˜ 수렴 속도λ₯Ό 높일 뿐만 μ•„λ‹ˆλΌ, μ΅œμ’…μ μΈ λͺ¨λΈμ˜ μΌλ°˜ν™” μ„±λŠ₯(Test Accuracy)을 κ²°μ •μ§“λŠ” 핡심 ν•˜μ΄νΌνŒŒλΌλ―Έν„° 관리 기술. ## ⚠️ λͺ¨μˆœ 및 μ—…λ°μ΄νŠΈ (Contradictions & RL Update) - **κ³Όκ±° λ°μ΄ν„°μ™€μ˜ 좩돌:** κ³ μ •λœ ν•™μŠ΅λ₯ μ΄ μ•ˆμ „ν•˜λ‹€λŠ” κ³ μ •κ΄€λ…μ—μ„œ λ²—μ–΄λ‚˜, μ΄μ œλŠ” 동적인 μŠ€μΌ€μ€„λ§μ΄ λŒ€κ·œλͺ¨ λͺ¨λΈ ν•™μŠ΅μ˜ ν•„μˆ˜ 성곡 μš”κ±΄μœΌλ‘œ 정착됨. - **μ •μ±… λ³€ν™”:** Antigravity ν”„λ‘œμ νŠΈλŠ” κ±°λŒ€ μ–Έμ–΄ λͺ¨λΈ λ―Έμ„Έ μ‘°μ • μ‹œ, 초기 λΆˆμ•ˆμ •μ„±μ„ μ œμ–΄ν•˜κΈ° μœ„ν•œ Linear Warm-upκ³Ό 졜적의 μ•ˆμ°©μ„ μœ„ν•œ Cosine Decay μŠ€μΌ€μ€„λŸ¬λ₯Ό κ²°ν•©ν•˜μ—¬ μ‚¬μš©ν•¨. ## πŸ”— 지식 μ—°κ²° (Graph) - [[Global-vs-Local-Optima]], Gradient-Descent-Foundations, [[Hyperparameter-Optimization]], Weight-Initialization-Strategies - **Raw Source:** 10_Wiki/Topics/AI/Learning-Rate-Schedules.md