---
id: SGD-001
category: "10_Wiki/💡 Topics/AI"
confidence_score: 1.0
tags: [machine-learning, optimization, calculus, deep-learning, gradient-descent]
last_reinforced: 2026-04-26
---

# Stochastic Gradient Descent (SGD)

## 📌 One-Line Insight (The Karpathy Summary)

> "Don't look at everything; take quick steps forward, one at a time." SGD is an optimization algorithm that, instead of computing over the entire dataset at once, uses only a subset of the data (a mini-batch) to compute the gradient of the error and update the weights quickly.

## 📖 Structured Knowledge (Synthesized Content)

- **Extracted pattern:** An efficient search pattern that replaces the gradient over the full dataset with the gradient of a randomly sampled subset. This speeds up computation, and the noise it introduces can help the search escape local optima.
- **Details:**
  - **Iterative Update:** At each step, move by a small stride (scaled by the learning rate) in the direction that lowers the loss function.
  - **Efficiency:** In deep learning settings with very large datasets, it works around memory limits and makes online (real-time) learning possible.
  - **Stochasticity:** Because each step sees only part of the data, the path can look unstable (zig-zag), but this same property helps keep the search from getting trapped in narrow valleys (local optima).
  - **Variants:** Extended with Momentum to control update velocity, with AdaGrad and RMSProp to adapt the learning rate per parameter, and with Adam, the widely used modern default.

## ⚠️ Contradictions and Updates (Contradictions & RL Update)

- **Conflict with earlier knowledge:** Batch learning assumed that every data point must be seen for an accurate update. In practice, the "stochastic" approach, with a moderate amount of noise mixed in, has been shown to be far better suited to training large neural networks.
- **Policy change:** When fine-tuning local models in the Antigravity project, use an appropriate mini-batch size together with AdamW, an optimizer in the SGD family, to optimize hardware resource usage.

## 🔗 Knowledge Links (Graph)

- [[Gradient-Descent|Gradient-Descent]], [[Optimization|Optimization]], AdamW-Optimizer, [[Machine-Learning-Lifecycle|Machine-Learning-Lifecycle]]
- **Raw Source:** 10_Wiki/Topics/AI/Stochastic-Gradient-Descent-SGD.md
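
The mini-batch update described in this note (sample a subset, compute its gradient, step against it) can be illustrated with a minimal pure-Python sketch. It fits a line to toy data with plain mini-batch SGD on mean squared error. The function name `minibatch_sgd`, the toy data, and every hyperparameter value are illustrative assumptions, not taken from the Antigravity project or from any library.

```python
import random


def minibatch_sgd(xs, ys, lr=0.05, batch_size=8, epochs=200, seed=0):
    """Fit y = w * x + b (approximately) with mini-batch SGD on mean squared error.

    Hypothetical helper for illustration only; all defaults are assumptions.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        # Reshuffle every epoch so each mini-batch is a random subset.
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this mini-batch only.
            grad_w = 0.0
            grad_b = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += 2.0 * err * xs[i]
                grad_b += 2.0 * err
            grad_w /= len(batch)
            grad_b /= len(batch)
            # Step against the gradient, scaled by the learning rate.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Assumed toy data: points on the line y = 3x + 1, with x in [-1, 1].
xs = [i / 32.0 - 1.0 for i in range(65)]
ys = [3.0 * x + 1.0 for x in xs]
w, b = minibatch_sgd(xs, ys)
print(f"w={w:.4f}, b={b:.4f}")
```

On this noise-free data the estimates should approach the slope 3 and intercept 1 used to generate it. Setting `batch_size=len(xs)` turns the same loop into full-batch gradient descent, which makes the trade-off discussed above easy to compare: fewer, more expensive, less noisy steps per epoch.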