---
id: PAPER-001
category: "10_Wiki/💡 Topics/AI"
confidence_score: 1.0
tags: [ai, nlp, paper-summary, transformer, attention, google-research]
last_reinforced: 2026-04-26
---

# Attention is All You Need (Attention Paper Summary)

## 📌 One-Line Insight (The Karpathy Summary)
> "Conquer sequences with attention alone, without recurrence or convolution." The landmark paper that introduced the Transformer architecture to the world and opened the era of large-scale parallel computation and global context modeling: the "Genesis" of modern AI.

## 📖 Structured Knowledge (Synthesized Content)
- **Extracted pattern:** An architecture that overcomes the chronic problems of RNNs, namely long-term dependencies and the limits of sequential computation, by letting every position in the sequence attend directly to every other position.
- **Key contributions of the paper:**
  - **Self-Attention Mechanism:** Each token in the input sequence directly computes a weight against every other token to capture context.
  - **Multi-Head Attention:** Information is processed from several perspectives (heads) in parallel, so the model can capture different kinds of relationships at once.
  - **Elimination of Recurrence:** The whole sequence is fed in at once rather than step by step, which greatly improves GPU utilization and training speed.
  - **Positional Encoding:** To preserve order information, position signals built from sine and cosine functions are added to the input vectors.
- **Results:** Set a new state of the art in machine translation on WMT 2014, reaching 28.4 BLEU on English-to-German and 41.8 BLEU on English-to-French.

## ⚠️ Contradictions & Updates (Contradictions & RL Update)
- **Conflict with prior knowledge:** Broke the conventional wisdom that sequential (time-series) data must be processed strictly in temporal order. As a result, the Transformer spread beyond text to other domains such as images (ViT) and audio.
- **Policy change:** The Antigravity project carries this paper's philosophy forward by applying "meta graph attention" logic, which captures global relationships among knowledge items, to its wiki indexing engine.

## 🔗 Knowledge Links (Graph)
- [[Transformer-Architecture|Transformer-Architecture]], [[NLP-Attention-Mechanisms|NLP-Attention-Mechanisms]], [[LLM|LLM]], [[Parallel-Computing|Parallel-Computing]]
- **Raw Source:** 10_Wiki/Topics/AI/Attention is All You Need.md
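
## 🧪 Illustrative Sketch (Self-Attention & Positional Encoding)
The Self-Attention Mechanism and Positional Encoding items above can be made concrete with a small example. What follows is a minimal pure-Python sketch, not the paper's implementation. The function names and toy vectors are hypothetical, the learned projection matrices and multiple heads are left out, and queries, keys, and values are all set to the same input for simplicity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V; returns (outputs, weights)."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        row = softmax(scores)
        weights.append(row)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(row, values))
            for j in range(len(values[0]))
        ])
    return outputs, weights


def sinusoidal_positional_encoding(seq_len, d_model):
    """Even dimensions use sin, odd dimensions use cos of the same angle."""
    table = []
    for pos in range(seq_len):
        row = []
        for j in range(d_model):
            angle = pos / (10000 ** ((2 * (j // 2)) / d_model))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


# Hypothetical toy "embeddings" for a three-token sequence (d_model = 4).
tokens = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
]

# Inject order information by adding the positional encoding to each token.
pe = sinusoidal_positional_encoding(len(tokens), 4)
x = [[t + p for t, p in zip(tok, pos)] for tok, pos in zip(tokens, pe)]

# Self-attention: every token attends to every token, with no recurrence.
outputs, weights = scaled_dot_product_attention(x, x, x)

for i, row in enumerate(weights):
    print(f"token {i} attends with weights {[round(w, 3) for w in row]}")
```

Each row of `weights` sums to 1 and shows how strongly one token attends to all the others. Because no row depends on a previous row's result, all rows can be computed in parallel, which is the property the Elimination of Recurrence item refers to.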