--- id: DL-SCHED-001 category: "10_Wiki/๐Ÿ’ก Topics/AI" confidence_score: 1.0 tags: [ai, [[Deep-Learning]], [[Optimization]], scheduler, learning-rate, hyper[[Parameter]]-tuning, training-[[Efficiency]]] last_reinforced: 2026-04-26 --- # Scheduler Design in ML (ML์—์„œ์˜ ์Šค์ผ€์ค„๋Ÿฌ ์„ค๊ณ„) ## ๐Ÿ“Œ ํ•œ ์ค„ ํ†ต์ฐฐ (The Karpathy Summary) > "ํ•™์Šต ์ดˆ๊ธฐ์—๋Š” ๋Œ€๋‹ดํ•œ ํƒ์ƒ‰(High LR)์„ ์žฅ๋ คํ•˜๊ณ , ์ข…๋‹จ์—๋Š” ์ •๋ฐ€ํ•œ ์ˆ˜๋ ด(Low LR)์„ ์œ ๋„ํ•˜์—ฌ ๋ชจ๋ธ์˜ ์ž ์žฌ๋ ฅ์„ ๋งˆ์ง€๋ง‰ ํ•œ ๋ฐฉ์šธ๊นŒ์ง€ ์ฅ์–ด์งœ๋ผ" โ€” ํ•™์Šต ๊ณผ์ • ์ค‘์— ํ•™์Šต๋ฅ (Learning Rate)์ด๋‚˜ ์ž์› ๋ฐฐ๋ถ„์„ ๋™์ ์œผ๋กœ ๋ณ€๊ฒฝํ•˜์—ฌ ํ•™์Šต์˜ ์•ˆ์ •์„ฑ๊ณผ ์ตœ์ข… ์„ฑ๋Šฅ์„ ์ตœ์ ํ™”ํ•˜๋Š” ์ „๋žต์  ์„ค๊ณ„. ## ๐Ÿ“– ๊ตฌ์กฐํ™”๋œ ์ง€์‹ (Synthesized Content) - **์ถ”์ถœ๋œ ํŒจํ„ด:** "Decaying Learning Rate and Convergence Optimization" โ€” ํ•™์Šต์ด ์ง„ํ–‰๋จ์— ๋”ฐ๋ผ ์˜ค์ฐจ๊ฐ€ ์ค„์–ด๋“œ๋Š” ์†๋„๋ฅผ ๊ฐ์‹œํ•˜๊ณ , ์‚ฌ์ „์— ์ •์˜๋œ ์ •์ฑ…(Schedule)์— ๋”ฐ๋ผ ํ•™์Šต๋ฅ ์„ ์ ์ง„์ ์œผ๋กœ ๋‚ฎ์ถค์œผ๋กœ์จ ์ง€์—ญ ์ตœ์ ํ•ด(Local Minima)๋ฅผ ํƒˆ์ถœํ•˜๊ฑฐ๋‚˜ ์ „์—ญ ์ตœ์ ํ•ด์— ๋ถ€๋“œ๋Ÿฝ๊ฒŒ ์•ˆ์ฐฉ์‹œํ‚ค๋Š” ํŒจํ„ด. - **์ฃผ์š” ์Šค์ผ€์ค„๋Ÿฌ ๊ธฐ๋ฒ•:** - **Step Decay:** ์ •ํ•ด์ง„ ์—ํฌํฌ(Epoch)๋งˆ๋‹ค ํ•™์Šต๋ฅ ์„ ์ผ์ • ๋น„์œจ๋กœ ์ถ•์†Œ. - **Cosine Annealing:** ์ฝ”์‚ฌ์ธ ํ•จ์ˆ˜ ๊ณก์„ ์„ ๋”ฐ๋ผ ํ•™์Šต๋ฅ ์„ ๋ถ€๋“œ๋Ÿฝ๊ฒŒ ๋‚ฎ์ถค. ์ตœ๊ทผ ๊ฐ€์žฅ ๋„๋ฆฌ ์“ฐ์ž„. - **ReduceLROnPlateau:** ์„ฑ๋Šฅ ํ–ฅ์ƒ์ด ๋ฉˆ์ท„์„ ๋•Œ๋งŒ ์ง€๋Šฅ์ ์œผ๋กœ ํ•™์Šต๋ฅ  ์ธํ•˜. - **Warm-up:** ์ดˆ๊ธฐ ๋ถˆ์•ˆ์ •์„ฑ์„ ๋ง‰๊ธฐ ์œ„ํ•ด ์•„์ฃผ ์ž‘์€ ํ•™์Šต๋ฅ ์—์„œ ์‹œ์ž‘ํ•ด ์ ์ฐจ ๋†’์ด๋Š” ๊ณผ์ •. - **์˜์˜:** ๊ณ ์ •๋œ ํ•™์Šต๋ฅ (Fixed LR)์„ ์“ธ ๋•Œ๋ณด๋‹ค ํ›จ์”ฌ ๋น ๋ฅด๊ฒŒ ์ˆ˜๋ ดํ•˜๋ฉฐ, ๋ชจ๋ธ์ด ๊ฐ€์งˆ ์ˆ˜ ์žˆ๋Š” ์ตœ์ƒ์˜ ์ •ํ™•๋„์— ๋„๋‹ฌํ•˜๊ฒŒ ํ•˜๋Š” ๊ฒฐ์ •์  '๋””ํ…Œ์ผ'์˜ ์˜์—ญ. ## โš ๏ธ ๋ชจ์ˆœ ๋ฐ ์—…๋ฐ์ดํŠธ (Contradictions & RL Update) - **๊ณผ๊ฑฐ ๋ฐ์ดํ„ฐ์™€์˜ ์ถฉ๋Œ:** ๋‹จ์ˆœํžˆ ํ•™์Šต๋ฅ ์„ ๋‚ฎ์ถ”๊ธฐ๋งŒ ํ•˜๋Š” ๊ฒƒ์ด ์ •๋‹ต์ด๋ผ๋˜ ๊ณผ๊ฑฐ์™€ ๋‹ฌ๋ฆฌ, ์ด์ œ๋Š” ํ•™์Šต๋ฅ ์„ ๋‹ค์‹œ ๋†’์˜€๋‹ค๊ฐ€ ๋‚ฎ์ถ”๋Š” 'Cyclical Learning Rates' ๋ฐฉ์‹์ด ์•ˆ์žฅ์ (Saddle Point) ํƒˆ์ถœ์— ๋” ํšจ๊ณผ์ ์ž„์ด ๋ฐํ˜€์ ธ ์ ๊ทน ๋„์ž…๋˜๊ณ  ์žˆ์Œ. - **์ •์ฑ… ๋ณ€ํ™”:** Antigravity ํ”„๋กœ์ ํŠธ๋Š” ๋Œ€๊ทœ๋ชจ ๋ชจ๋ธ ๋ฏธ์„ธ ์กฐ์ • ์‹œ, ํ•™์Šต ์ดˆ๊ธฐ ๋ฐœ์‚ฐ์„ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•œ Linear Warm-up๊ณผ ์ตœ์ข… ์ˆ˜๋ ด ๊ทน๋Œ€ํ™”๋ฅผ ์œ„ํ•œ Cosine Decay ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ ํ‘œ์ค€ ์กฐํ•ฉ์œผ๋กœ ์‚ฌ์šฉํ•จ. ## ๐Ÿ”— ์ง€์‹ ์—ฐ๊ฒฐ (Graph) - [[Optimization-Algorithms]], Adam-Optimizer-Foundations, Hyperparameter-Tuning-Best-Practices, Deep-Learning-Foundations - **Raw Source:** 10_Wiki/Topics/AI/Scheduler-Design-in-ML.md