---
id: AI-XAI-001
category: "10_Wiki/💡 Topics/AI"
confidence_score: 1.0
tags: [ai, xai, interpretability, shap, lime, trustworthy-ai, explainability]
last_reinforced: 2026-04-26
---

# Model Interpretability Tools

## 📌 One-Line Insight (The Karpathy Summary)

> "Turn the stubborn AI that only hands down results into a collaborator that explains its reasons, so that humans can trust and control machine intelligence."

Tools and techniques that visualize or numerically quantify the basis for a complex machine learning model's decisions. They help humans understand which variables decisively influenced a given outcome.

## 📖 Structured Knowledge (Synthesized Content)

- **Extracted pattern:** "Post-hoc Explanation and Feature Attribution". After a model has been trained, this interpretation pattern computes each feature's contribution to the output. It does so either by tracing how small changes in the inputs change the result, or by applying game theory and similar frameworks.
- **Key tools and techniques:**
  - **SHAP (SHapley Additive exPlanations):** Uses Shapley values from cooperative game theory to divide a prediction fairly among the features, so the attributions add up to the gap between the prediction and a baseline (expected) output. Its attributions are highly consistent.
  - **LIME (Local Interpretable Model-agnostic Explanations):** Approximates the model with a linear surrogate in the neighborhood of one specific data point to explain the local basis for that decision.
  - **Feature Importance:** In tree-based models and similar learners, measures how often a variable is used for splits (gain-based variants instead measure how much each split on that variable reduces impurity).
  - **Attention Map:** Visualizes which parts of an input sentence a Transformer model attended to.
- **Significance:** Makes AI adoption possible in high-accountability domains such as healthcare, finance, and law. It also serves as a key debugging tool for uncovering model bias and improving performance.
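
The "fair allocation" behind SHAP can be made concrete with a brute-force Shapley computation. This is a minimal illustrative sketch, not the `shap` library's API or algorithm. It enumerates every feature coalition exactly, so its cost grows as 2^n, whereas SHAP's explainers approximate this sum or exploit model structure. It also assumes that a feature absent from a coalition is replaced by its value in a single baseline vector, whereas SHAP typically averages over a background dataset. The names `shapley_values`, `model`, and `baseline` are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence


def shapley_values(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley attribution of f(x) - f(baseline) to each feature of x.

    Assumption: a feature missing from a coalition takes its baseline value.
    Enumerates all 2^(n-1) coalitions per feature, so keep n small.
    """
    n = len(x)

    def value(coalition: FrozenSet[int]) -> float:
        # Present features come from x; absent ones come from the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions S of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when it joins coalition S.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Toy "model": a linear term plus an interaction between features 1 and 2.
def model(z: Sequence[float]) -> float:
    return 3.0 * z[0] + 2.0 * z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print([round(p, 6) for p in phi])                      # [3.0, 6.0, 6.0]
print(round(sum(phi), 6), model(x) - model(baseline))  # 15.0 15.0
```

Features 1 and 2 share the interaction term's 12.0 equally (symmetry). The three attributions add up to f(x) - f(baseline) (efficiency), which is the additive property that gives SHAP its name.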
## โš ๏ธ ๋ชจ์ˆœ ๋ฐ ์—…๋ฐ์ดํŠธ (Contradictions & RL Update) - **๊ณผ๊ฑฐ ๋ฐ์ดํ„ฐ์™€์˜ ์ถฉ๋Œ:** ์„ฑ๋Šฅ๊ณผ ํ•ด์„ ๊ฐ€๋Šฅ์„ฑ์€ ํŠธ๋ ˆ์ด๋“œ์˜คํ”„(Trade-off) ๊ด€๊ณ„๋ผ๋Š” ํ†ต๋…์ด ์žˆ์—ˆ์œผ๋‚˜, ์ตœ๊ทผ์—๋Š” ๋ณต์žกํ•œ ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ์œ ์ง€ํ•˜๋ฉด์„œ๋„ ์‚ฌํ›„ ํ•ด์„ ๋„๊ตฌ๋ฅผ ํ†ตํ•ด ์ถฉ๋ถ„ํ•œ ํˆฌ๋ช…์„ฑ์„ ํ™•๋ณดํ•  ์ˆ˜ ์žˆ์Œ์ด ์ฆ๋ช…๋จ. - **์ •์ฑ… ๋ณ€ํ™”:** Antigravity ํ”„๋กœ์ ํŠธ๋Š” ์—์ด์ „ํŠธ์˜ ์ง€์‹ ์‚ญ์ œ๋‚˜ ์ค‘์š” ๊ฐ€์ด๋“œ๋ผ์ธ ์œ„๋ฐ˜ ํŒ๋‹จ ์‹œ, SHAP ๊ฐ’์„ ํ•จ๊ป˜ ๊ธฐ๋กํ•˜์—ฌ ๊ด€๋ฆฌ์ž๊ฐ€ AI์˜ ํŒ๋‹จ ๊ทผ๊ฑฐ๋ฅผ ์ƒ์‹œ ๋ชจ๋‹ˆํ„ฐ๋งํ•  ์ˆ˜ ์žˆ๋Š” ์ฒด๊ณ„๋ฅผ ๊ตฌ์ถ•ํ•จ. ## ๐Ÿ”— ์ง€์‹ ์—ฐ๊ฒฐ (Graph) - [[Explainable-AI-XAI|Explainable-AI-XAI]], [[Trustworthy-AI|Trustworthy-AI]], AI-Ethics, Transformer-Architecture-Foundations - **Raw Source:** 10_Wiki/Topics/AI/Model-Interpretability-Tools.md