--- id: OPT-GRAD-001 category: "10_Wiki/๐Ÿ’ก Topics/AI" confidence_score: 1.0 tags: [ai, [[Optimization|Optimization]], mathematics, gradient-descent, machine-learning] last_reinforced: 2026-04-26 --- # Gradient Descent Foundations (๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ• ๊ธฐ์ดˆ) ## ๐Ÿ“Œ ํ•œ ์ค„ ํ†ต์ฐฐ (The Karpathy Summary) > "์–ด๋‘  ์†์—์„œ ์ง€ํ˜•์˜ ๊ธฐ์šธ๊ธฐ๋งŒ์„ ๋А๋ผ๋ฉฐ ๊ฐ€์žฅ ๋‚ฎ์€ ๊ณจ์งœ๊ธฐ๋ฅผ ํ–ฅํ•ด ๋ˆ๊ธฐ ์žˆ๊ฒŒ ๋‚ด๋ ค๊ฐ€๋ผ" โ€” ๋ชจ๋ธ์˜ ์˜ˆ์ธก๊ฐ’๊ณผ ์‹ค์ œ๊ฐ’ ์‚ฌ์ด์˜ ์˜ค์ฐจ(Loss)๋ฅผ ์ •์˜ํ•˜๊ณ , ์ด ์˜ค์ฐจ๋ฅผ ์ตœ์†Œํ™”ํ•˜๋Š” ๋ฐฉํ–ฅ์œผ๋กœ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์ ์ง„์ ์œผ๋กœ ์ˆ˜์ •ํ•ด ๋‚˜๊ฐ€๋Š” ์ธ๊ณต์ง€๋Šฅ ํ•™์Šต์˜ ๊ทผ๋ณธ ์•Œ๊ณ ๋ฆฌ์ฆ˜. ## ๐Ÿ“– ๊ตฌ์กฐํ™”๋œ ์ง€์‹ (Synthesized Content) - **์ถ”์ถœ๋œ ํŒจํ„ด:** ๋ชฉ์  ํ•จ์ˆ˜์˜ ๋ฏธ๋ถ„๊ฐ’(Gradient)์ด ๊ฐ€๋ฆฌํ‚ค๋Š” ๋ฐฉํ–ฅ์˜ ๋ฐ˜๋Œ€ ๋ฐฉํ–ฅ์œผ๋กœ ํ•™์Šต๋ฅ (Learning Rate)๋งŒํผ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์—…๋ฐ์ดํŠธํ•˜์—ฌ, ์˜ค์ฐจ๋ผ๋Š” ์‚ฐ๋งฅ์˜ ์ตœ์ €์ ์„ ์ฐพ๋Š” ๋ฐ˜๋ณต์  ์ตœ์ ํ™” ํŒจํ„ด. - **ํ•ต์‹ฌ ์š”์†Œ:** - **Learning Rate ($\eta$):** ํ•œ ๋ฒˆ์— ์–ผ๋งˆ๋‚˜ ๋ฉ€๋ฆฌ ์ด๋™ํ• ์ง€ ๊ฒฐ์ •. ๋„ˆ๋ฌด ํฌ๋ฉด ๋ฐœ์‚ฐํ•˜๊ณ , ๋„ˆ๋ฌด ์ž‘์œผ๋ฉด ํ•™์Šต์ด ๋А๋ฆผ. - **Partial Derivative:** ๊ฐ ํŒŒ๋ผ๋ฏธํ„ฐ๊ฐ€ ์˜ค์ฐจ์— ๋ฏธ์น˜๋Š” ์˜ํ–ฅ๋ ฅ์„ ๊ฐœ๋ณ„์ ์œผ๋กœ ๊ณ„์‚ฐ. - **Step:** ํ˜„์žฌ ์œ„์น˜์—์„œ ๊ธฐ์šธ๊ธฐ๊ฐ€ ๊ฐ€์žฅ ๊ฐ€ํŒŒ๋ฅธ ๋ฐฉํ–ฅ์˜ ๋ฐ˜๋Œ€๋กœ ์ด๋™ํ•˜๋Š” ํ•œ ๋‹จ๊ณ„์˜ ์—ฐ์‚ฐ. - **์ฃผ์š” ๋ณ€ํ˜•:** - **[[stochastic gradient descent|stochastic gradient descent]] (SGD):** ํ•˜๋‚˜์˜ ๋ฐ์ดํ„ฐ๋งŒ ๋ณด๊ณ  ์ฆ‰์‹œ ์—…๋ฐ์ดํŠธ. ๋น ๋ฅด์ง€๋งŒ ์š”๋™์ด ์‹ฌํ•จ. - **Mini-batch SGD:** ์ ์ ˆํ•œ ๋ฌถ์Œ ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์†๋„์™€ ์•ˆ์ •์„ฑ์˜ ๊ท ํ˜•์„ ๋งž์ถค. ํ˜„๋Œ€ ๋”ฅ๋Ÿฌ๋‹์˜ ํ‘œ์ค€. - **์˜์˜:** ๋ณต์žกํ•œ ์‹ ๊ฒฝ๋ง์˜ ์ˆ˜๋ฐฑ๋งŒ ๊ฐœ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ํ•™์Šต์‹œํ‚ฌ ์ˆ˜ ์žˆ๋Š” ์œ ์ผํ•˜๊ณ  ์‹ค์งˆ์ ์ธ ๋ฐฉ๋ฒ•. ## โš ๏ธ ๋ชจ์ˆœ ๋ฐ ์—…๋ฐ์ดํŠธ (Contradictions & RL Update) - **๊ณผ๊ฑฐ ๋ฐ์ดํ„ฐ์™€์˜ ์ถฉ๋Œ:** ๋‹จ์ˆœํžˆ ๊ธฐ์šธ๊ธฐ๋งŒ ๋”ฐ๋ผ๊ฐ€๋˜ ๋ฐฉ์‹์—์„œ, ์ด์ œ๋Š” ๊ด€์„ฑ(Momentum)๊ณผ ๊ฐ€๋ณ€ ํ•™์Šต๋ฅ (Adam, RMSProp)์„ ๋”ํ•ด ํ›จ์”ฌ ํšจ์œจ์ ์œผ๋กœ ์ตœ์ €์ ์„ ์ฐพ๋Š” ๋ฐฉ์‹์œผ๋กœ ์ง„ํ™”. - [[Global-vs-Local-Optima|Global-vs-Local-Optima]] ๋ฌธ์„œ์™€ ์—ฐ๊ณ„ํ•˜์—ฌ, ์ง€์—ญ ์ตœ์ ํ•ด์˜ ํ•จ์ •์„ ํ”ผํ•˜๋Š” ๊ฒƒ์ด ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ• ์šด์˜์˜ ํ•ต์‹ฌ ๊ธฐ์ˆ ์ž„. ## ๐Ÿ”— ์ง€์‹ ์—ฐ๊ฒฐ (Graph) - [[Backpropagation|Backpropagation]], [[Global-vs-Local-Optima|Global-vs-Local-Optima]], [[Deep-Learning|Deep-Learning]]-Foundations, Mathematics-for-AI - **Raw Source:** 10_Wiki/Topics/AI/Gradient-Descent.md