--- category: Unified tags: [auto-consolidated, technical-documentation] title: [[Explainable-AI (XAI)|Explainable-AI (XAI)]] last_updated: 2026-05-02 --- # [[Explainable-AI (XAI)|Explainable-AI (XAI)]] ## ๐Ÿ“Œ Brief Summary > "๋ธ”๋ž™๋ฐ•์Šค์˜ ๋šœ๊ป‘์„ ์—ด๋‹ค: AI๊ฐ€ ๋ณต์žกํ•œ ์‹ ๊ฒฝ๋ง ์†์—์„œ ๋‚ด๋ฆฐ ๊ฒฐ๋ก ์˜ ๊ทผ๊ฑฐ๋ฅผ ์ธ๊ฐ„์ด ์ดํ•ดํ•  ์ˆ˜ ์žˆ๋Š” ์–ธ์–ด์™€ ์‹œ๊ฐ ์ž๋ฃŒ๋กœ ์„ค๋ช…ํ•จ์œผ๋กœ์จ, ๊ธฐ๊ณ„์— ๋Œ€ํ•œ ์‹ ๋ขฐ๋ฅผ ๊ตฌ์ถ•ํ•˜๊ณ  ์˜ค๋ฅ˜๋ฅผ ๊ฒ€์ฆ ๊ฐ€๋Šฅํ•˜๊ฒŒ ๋งŒ๋“œ๋Š” ํˆฌ๋ช…์„ฑ์˜ ๊ธฐ์ˆ ." --- > "๋ชจ๋ธ์ด '์™œ' ๊ทธ๋Ÿฐ ํŒ๋‹จ์„ ๋‚ด๋ ธ๋Š”์ง€ ์ธ๊ฐ„์˜ ์–ธ์–ด๋กœ ์ฆ๋ช…ํ•˜๋ผ" โ€” ๊ฒฐ๊ณผ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ๊ทธ ๊ฒฐ๊ณผ์— ๋„์ถœ๋œ ๊ณผ์ •๊ณผ ๊ทผ๊ฑฐ๋ฅผ ์ธ๊ฐ„์ด ์ดํ•ดํ•  ์ˆ˜ ์žˆ๋„๋ก ์ œ๊ณตํ•˜์—ฌ, AI์˜ ๋ธ”๋ž™๋ฐ•์Šค ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ณ  ์‹ ๋ขฐ์„ฑ์„ ํ™•๋ณดํ•˜๋Š” ๊ธฐ์ˆ . ## ๐Ÿ“– Core Content ์„ค๋ช… ๊ฐ€๋Šฅํ•œ AI(XAI, Explainable-AI)๋Š” AI ๋ชจ๋ธ์˜ ๊ฒฐ๊ณผ๋ฌผ์— ๋Œ€ํ•ด ์ธ๊ฐ„์ด ์ดํ•ดํ•  ์ˆ˜ ์žˆ๋Š” ์„ค๋ช…์„ ์ œ๊ณตํ•˜๋Š” ๊ฒƒ์„ ๋ชฉํ‘œ๋กœ ํ•ฉ๋‹ˆ๋‹ค. 1. **์™œ ํ•„์š”ํ•œ๊ฐ€?**: * **Trust**: ์˜๋ฃŒ, ๊ธˆ์œต ๋“ฑ ์ƒ๋ช…/์ž์‚ฐ๊ณผ ์ง๊ฒฐ๋œ ๋ถ„์•ผ์—์„œ๋Š” "์™œ"๋ผ๋Š” ์งˆ๋ฌธ์— ๋‹ตํ•  ์ˆ˜ ์žˆ์–ด์•ผ ํ•จ. ([[Ethics & AI|Ethics & AI]]์™€ ์—ฐ๊ฒฐ) * **Debugging**: ๋ชจ๋ธ์ด ์—‰๋šฑํ•œ ๊ณณ์„ ๋ณด๊ณ  ํ•™์Šตํ•˜๋Š”์ง€(์˜ˆ: ๋ฐฐ๊ฒฝ์„ ๋ณด๊ณ  ๋Š‘๋Œ€๋ฅผ ๋ถ„๋ฅ˜) ํ™•์ธ. * **Regulatory Compliance**: AI์˜ ๊ฒฐ์ •์— ๋Œ€ํ•ด ์‚ฌ์šฉ์ž๊ฐ€ '์„ค๋ช…๋ฐ›์„ ๊ถŒ๋ฆฌ'๋ฅผ ๋ฒ•์ ์œผ๋กœ ๋ณด์žฅ๋ฐ›๋Š” ์ถ”์„ธ. 2. **์ฃผ์š” ๊ธฐ๋ฒ•**: * **LIME/SHAP**: ์ž…๋ ฅ๊ฐ’์˜ ๋ณ€ํ™”๊ฐ€ ๊ฒฐ๊ณผ์— ๋ฏธ์น˜๋Š” ์˜ํ–ฅ์„ ์ธก์ •ํ•˜์—ฌ ์ค‘์š”๋„ ํ‘œ์‹œ. * **Attention Maps**: ๋ชจ๋ธ์ด ์ด๋ฏธ์ง€์˜ ์–ด๋А ๋ถ€๋ถ„์ด๋‚˜ ํ…์ŠคํŠธ์˜ ์–ด๋А ๋‹จ์–ด์— ์ง‘์ค‘ํ–ˆ๋Š”์ง€ ๊ฐ€์‹œํ™”. --- - **์ถ”์ถœ๋œ ํŒจํ„ด:** ๋ชจ๋ธ ๋‚ด๋ถ€์˜ ๋ณต์žกํ•œ ์—ฐ์‚ฐ ๊ณผ์ •์„ ์ค‘์š”๋„ ๋งต(Heatmap), ํŠน์ง• ๊ธฐ์—ฌ๋„(Feature Attribution), ํ˜น์€ ์ž์—ฐ์–ด ์„ค๋ช…์œผ๋กœ ๋ณ€ํ™˜ํ•˜์—ฌ ์‚ฌ์šฉ์ž์—๊ฒŒ ํˆฌ๋ช…์„ฑ์„ ์ œ๊ณตํ•˜๋Š” ํ•ด์„ ํŒจํ„ด. - **์ฃผ์š” ๊ธฐ๋ฒ•:** - **LIME / SHAP:** ๋ชจ๋ธ์˜ ์ข…๋ฅ˜์™€ ์ƒ๊ด€์—†์ด ํŠน์ • ์ž…๋ ฅ์— ๋Œ€ํ•œ ์˜ˆ์ธก ๊ทผ๊ฑฐ๋ฅผ ๋ถ„์„ (Post-hoc). - **Attention Visualization:** ํŠธ๋žœ์Šคํฌ๋จธ ๋ชจ๋ธ์ด ์–ด๋–ค ๋‹จ์–ด๋‚˜ ์ด๋ฏธ์ง€ ์˜์—ญ์— ์ง‘์ค‘ํ–ˆ๋Š”์ง€ ์‹œ๊ฐํ™”. - **CAM (Class Activation Map):** ์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜ ์‹œ ์–ด๋–ค ํ”ฝ์…€ ์˜์—ญ์ด ๊ฒฐ์ •์— ๊ฒฐ์ •์ ์ด์—ˆ๋Š”์ง€ ๋…ธ์ถœ. - **Rule-based Surro[[Gates|Gates]]:** ๋ณต์žกํ•œ ๋ชจ๋ธ์„ ๋‹จ์ˆœํ•œ ์˜์‚ฌ๊ฒฐ์ • ๋‚˜๋ฌด ๋“ฑ์œผ๋กœ ๊ทผ์‚ฌํ•˜์—ฌ ์„ค๋ช…. - **์˜์˜:** ์˜๋ฃŒ, ๊ธˆ์œต, ๋ฒ•๋ฅ  ๋“ฑ ๊ณ ์œ„ํ—˜ ์˜์‚ฌ๊ฒฐ์ • ๋ถ„์•ผ์—์„œ AI ๋„์ž…์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•˜๋Š” ํ•„์ˆ˜ ์กฐ๊ฑด. ## โš–๏ธ Trade-offs & Caveats - **๊ณผ๊ฑฐ ๋ฐ์ดํ„ฐ์™€์˜ ์ถฉ๋Œ**: ๊ณผ๊ฑฐ์—๋Š” ์„ฑ๋Šฅ(Accuracy)๊ณผ ์„ค๋ช…๋ ฅ(Interpretability)์ด ๋ฐ˜๋น„๋ก€ ๊ด€๊ณ„๋ผ๋Š” ์ •์ฑ…์ด ์ฃผ๋ฅ˜์˜€์œผ๋‚˜, ํ˜„๋Œ€ ์ •์ฑ…์€ ์ง€๋Šฅ์ด ๋†’์œผ๋ฉด์„œ๋„ ์Šค์Šค๋กœ์˜ ๋…ผ๋ฆฌ ๊ตฌ์กฐ๋ฅผ ๋ธŒ๋ฆฌํ•‘ํ•˜๋Š” '๋‚ด์žฌ์  ์„ค๋ช… ์ •์ฑ…'์„ ์ถ”๊ตฌํ•จ(RL Update). - **์ •์ฑ… ๋ณ€ํ™”(RL Update)**: ๋‹จ์ˆœ ๊ฐ€์‹œํ™”๋ฅผ ๋„˜์–ด, AI๊ฐ€ ์ž์‹ ์˜ ์‚ฌ๊ณ  ๊ณผ์ •์„ ๋‹จ๊ณ„๋ณ„๋กœ ํ’€์–ด์„œ ์„ค๋ช…ํ•˜๋Š” CoT(Chain-of-Thought) ์ •์ฑ…์ด LLM ์‹œ๋Œ€์˜ ํ•ต์‹ฌ XAI ๋ฐฉ๋ฒ•๋ก ์œผ๋กœ ๋ถ€์ƒํ•จ. (Chain-of-Thought์™€ ์—ฐ๊ฒฐ) --- - **๊ณผ๊ฑฐ ๋ฐ์ดํ„ฐ์™€์˜ ์ถฉ๋Œ:** ๋‹จ์ˆœํžˆ ์„ฑ๋Šฅ(Accuracy)๋งŒ ๋†’์œผ๋ฉด ๋œ๋‹ค๋Š” ๊ด€์ ์—์„œ, ์„ฑ๋Šฅ์„ ์กฐ๊ธˆ ํฌ์ƒํ•˜๋”๋ผ๋„ '์„ค๋ช… ๊ฐ€๋Šฅ์„ฑ(Interpretability)'์ด ๋‹ด๋ณด๋˜์–ด์•ผ ํ•œ๋‹ค๋Š” ์‹ ๋ขฐ ์ค‘์‹ฌ ๊ด€์ ์œผ๋กœ ์ „ํ™˜. - **์ •์ฑ… ๋ณ€ํ™”:** Antigravity ํ”„๋กœ์ ํŠธ๋Š” ์—์ด์ „ํŠธ๊ฐ€ ์ œ์•ˆํ•œ ์ฝ”๋“œ๋‚˜ ์ง€์‹ ๋ณด๊ฐ• ๋‚ด์šฉ์— ๋Œ€ํ•ด, ์ฐธ๊ณ ํ•œ ์†Œ์Šค ๋ฌธ์„œ์™€ ์ถ”๋ก  ๊ณผ์ •์„ 'Rationale' ์„ธ์…˜์œผ๋กœ ๋ช…์‹œํ•˜์—ฌ ์‚ฌ์šฉ์ž ๊ฒ€์ฆ์„ ๋•๋„๋ก ์„ค๊ณ„ํ•จ. ## ๐Ÿ”— Knowledge Connections - [[Ethics & AI|Ethics & AI]], [[Chain-of-Thought (CoT แ„‰แ…กแ„€แ…ฉ แ„‰แ…กแ„‰แ…ณแ†ฏ)|Chain-of-Thought (CoT ์‚ฌ๊ณ  ์‚ฌ์Šฌ)]], Trust and Perspective, Transparency, Bias-Variance Tradeoff - **Modern Tech/Tools**: SHAP, LIME, Captum (PyTorch), Integrated Gradients. --- --- - [[Trustworthy-AI|Trustworthy-AI]], AI-Ethics, Decision-Making, [[Feature-Engineering|Feature-Engineering]] - **Raw Source:** 10_Wiki/Topics/AI/Explainable-AI-XAI.md