---
id: MATH-OPT-SGD-001
category: "10_Wiki/💡 Topics/AI"
confidence_score: 1.0
tags: [ai, machine-learning, optimization, sgd, stochastic-gradient-descent, deep-learning, loss-function]
last_reinforced: 2026-04-26
---

# Stochastic Gradient Descent (SGD)

## 📌 One-Line Insight (The Karpathy Summary)

> "Drop the laziness of waiting for the whole dataset: let the immediate hint from a single sample (Stochastic) keep correcting your direction as you charge toward the optimal valley."

An optimization algorithm that updates the weights using the gradient of the loss function computed on a randomly selected subset of the data instead of the entire dataset.

## 📖 Structured Knowledge (Synthesized Content)

- **Extracted pattern:** "Iterative Error Correction with Noise Injection". Each update spends little computation to make fast progress, and the stochastic noise in the gradient estimate can help the optimizer escape the trap of poor local minima (and saddle points) and settle in a good region of the loss surface. On non-convex losses this is a tendency, not a guarantee of reaching the global optimum.
- **Key characteristics:**
  - **Efficiency:** The cost of one update depends on the batch size, not the dataset size, so training can proceed (even online, as data streams in) without reading the entire massive dataset before every step.
  - **Escaping Local Optima:** The fluctuation of the optimization path caused by random sampling actually becomes a driving force that helps the optimizer climb out of narrow pits (sharp local minima).
  - **Learning Rate Decay:** Usually paired with a schedule that gradually lowers the learning rate, because with a constant rate the gradient noise keeps the iterate oscillating around the minimum instead of settling.
- **Significance:** SGD is the practical engine that sets the weights of nearly every modern deep learning architecture (CNNs, Transformers, etc.), and it is the ancestor of many more advanced optimizers such as Adam and RMSProp.
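
A minimal sketch of the update described above, assuming a toy 1-D linear regression with mean squared error as the loss. The helper names `make_data` and `sgd_linear_regression`, the inverse-time decay schedule, and every hyperparameter value are illustrative assumptions, not part of the original note. Each step estimates the gradient from a random mini-batch $B_t$ rather than the full dataset, with parameters $\theta = (w, b)$ and per-sample loss $\ell = (w x_i + b - y_i)^2$:

$$\theta_{t+1} = \theta_t - \eta_t \, \nabla_\theta \left( \frac{1}{|B_t|} \sum_{i \in B_t} \ell(\theta_t; x_i, y_i) \right), \qquad \eta_t = \frac{\eta_0}{1 + k\,t}$$

```python
# Illustrative sketch: the toy problem, decay schedule, and hyperparameters are assumptions.
import random


def make_data(n=1000, w_true=3.0, b_true=2.0, noise=0.1, seed=0):
    """Hypothetical toy data: y = w_true * x + b_true + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [w_true * x + b_true + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def sgd_linear_regression(xs, ys, batch_size=32, lr0=0.1, decay=0.01,
                          epochs=50, seed=0):
    """Mini-batch SGD on mean squared error; batch_size=1 gives 'pure' SGD.

    The learning rate follows inverse-time decay lr0 / (1 + decay * step),
    one of several possible schedules (an assumption, not a prescribed choice).
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(indices)  # random order each epoch: sampling without replacement
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of (1/|B|) * sum (w*x + b - y)^2 w.r.t. w and b,
            # estimated from this mini-batch only, not the full dataset.
            grad_w = grad_b = 0.0
            for i in batch:
                err = w * xs[i] + b - ys[i]
                grad_w += 2.0 * err * xs[i]
                grad_b += 2.0 * err
            grad_w /= len(batch)
            grad_b /= len(batch)
            lr = lr0 / (1.0 + decay * step)  # decaying step size damps late oscillation
            w -= lr * grad_w
            b -= lr * grad_b
            step += 1
    return w, b


if __name__ == "__main__":
    xs, ys = make_data()
    w, b = sgd_linear_regression(xs, ys, batch_size=32)
    print(f"mini-batch SGD: w={w:.3f}, b={b:.3f}")
    w1, b1 = sgd_linear_regression(xs, ys, batch_size=1, lr0=0.05, epochs=5)
    print(f"pure SGD (batch_size=1): w={w1:.3f}, b={b1:.3f}")
```

Setting `batch_size=1` recovers one-sample-per-update "pure" SGD. Larger batches average more per-sample gradients per step, which lowers gradient noise and maps well onto vectorized GPU kernels, at the cost of more computation per update.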
## ⚠️ Contradictions and Updates (Contradictions & RL Update)

- **Conflict with earlier data:** Practice has moved away from pure SGD, which uses one sample per update. Mini-batch SGD, which processes batches of tens to hundreds of samples to make full use of hardware acceleration (GPUs), has become the practical standard.
- **Policy change:** For the agent's local fine-tuning and knowledge-weight updates, the Antigravity project runs an optimized SGD pipeline tuned to minimize compute usage while converging quickly.

## 🔗 Knowledge Links (Graph)

- Deep-Learning-Foundations, [[Optimization-Algorithms]], Momentum-in-Optimization, Backpropagation-Fundamentals
- **Raw Source:** 10_Wiki/Topics/AI/Stochastic-Gradient-Descent.md