---
id: ML-IMBAL-001
category: "10_Wiki/💡 Topics/AI"
confidence_score: 1.0
tags: [machine-learning, imbalanced-data, resampling, smote, focal-loss]
last_reinforced: 2026-04-26
---

# Imbalanced Data Handling

## 📌 One-Line Insight (The Karpathy Summary)

> "Don't be overwhelmed by the quantity of data; focus on the value hidden in the overlooked minority (Minority Class)." Imbalanced data handling covers techniques that rebalance learning at the data level or the algorithm level when the training set's class distribution is skewed. Without them, the model tends to default to predicting the majority class.

## 📖 Structured Knowledge (Synthesized Content)

- **Extracted pattern:** Adjust sampling and weighting so that the majority class has less influence, or the minority class has more. This lets the model adequately learn the features of rare but important samples.
- **Core strategies:**
  - **Data-level (Resampling):**
    - **Undersampling:** Remove majority-class samples. This risks discarding useful information.
    - **Oversampling:** Duplicate minority-class samples or synthesize new ones (e.g., SMOTE, which creates plausible synthetic samples by interpolating between a minority sample and one of its nearest minority-class neighbors).
  - **Algorithm-level:**
    - **Cost-sensitive Learning:** Assign a larger penalty to errors on minority-class samples.
    - **Focal Loss:** Down-weight easy, already well-classified samples so that training concentrates on hard ones.
- **Shift in evaluation metrics:** On imbalanced data, measure real performance with Precision, Recall, and F1-Score rather than Accuracy. A model that always predicts the majority class can still score a high accuracy.
- **Significance:** Enables detection of the rare cases that matter most in practice, such as anomaly detection, disease diagnosis, and fraud detection.
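The two resampling strategies can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation. The function names `random_undersample` and `smote_like` are hypothetical, the toy data is synthetic, and the nearest-neighbor search is brute force, which only suits small minority sets. Each synthetic point is built SMOTE-style: it lies on the segment between a random minority sample and one of its `k` nearest minority neighbors.

```python
import numpy as np


def random_undersample(X, y, majority_label, rng):
    """Randomly drop majority-class rows until they match the minority count (binary case)."""
    maj_idx = np.flatnonzero(y == majority_label)
    min_idx = np.flatnonzero(y != majority_label)
    keep_maj = rng.choice(maj_idx, size=len(min_idx), replace=False)
    keep = np.concatenate([keep_maj, min_idx])
    return X[keep], y[keep]


def smote_like(X_min, n_synthetic, k, rng):
    """Hypothetical SMOTE-style oversampler: each synthetic point lies on the segment
    between a random minority sample and one of its k nearest minority neighbours."""
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))      # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)                # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]  # indices of the k nearest neighbours
    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for i in range(n_synthetic):
        a = rng.integers(len(X_min))              # random minority seed point
        b = neighbours[a, rng.integers(k)]        # one of its nearest neighbours
        lam = rng.random()                        # interpolation factor in [0, 1)
        synthetic[i] = X_min[a] + lam * (X_min[b] - X_min[a])
    return synthetic


# Toy data (assumption): 95 majority points around 0, 5 minority points around 3.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (95, 2)), rng.normal(3.0, 0.5, (5, 2))])
y = np.array([0] * 95 + [1] * 5)

X_under, y_under = random_undersample(X, y, majority_label=0, rng=rng)
print(np.bincount(y_under))   # [5 5]

X_syn = smote_like(X[y == 1], n_synthetic=90, k=3, rng=rng)
X_over = np.vstack([X, X_syn])
y_over = np.concatenate([y, np.ones(len(X_syn), dtype=int)])
print(np.bincount(y_over))    # [95 95]
```

In practice the imbalanced-learn package provides this ready-made (`imblearn.over_sampling.SMOTE(...).fit_resample(X, y)`). Whichever tool you use, resample only the training split, so that the evaluation data keeps its real class distribution.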
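Cost-sensitive learning often amounts to scaling each sample's loss by a weight for its class. The sketch below assumes the common "balanced" heuristic `n_samples / (n_classes * class_count)`, the formula scikit-learn documents for `class_weight="balanced"`, applied to a plain binary cross-entropy. The helper names are hypothetical. The usage shows that a lazy model which always leans toward the majority class gets a much larger weighted loss, because its minority-class errors now dominate.

```python
import numpy as np


def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


def weighted_bce(y_true, p_pred, class_weight, eps=1e-12):
    """Mean binary cross-entropy, each sample scaled by the weight of its true class."""
    p = np.clip(p_pred, eps, 1 - eps)
    per_sample = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    w = np.where(y_true == 1, class_weight[1], class_weight[0])
    return float(np.mean(w * per_sample))


y = np.array([0] * 90 + [1] * 10)
weights = balanced_class_weights(y)    # class 1 gets weight 5.0, class 0 about 0.556
p_const = np.full(len(y), 0.1)         # a lazy model that always says "probably class 0"

plain = weighted_bce(y, p_const, {0: 1.0, 1: 1.0})
costly = weighted_bce(y, p_const, weights)
print(plain, costly)   # the weighted loss is larger: missed positives now dominate
```

Many scikit-learn estimators (for example `LogisticRegression`) accept `class_weight="balanced"` directly, so hand-rolling the loss is mainly useful for custom training loops.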
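Focal Loss reshapes cross-entropy as `FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t)`. Here `p_t` is the probability the model assigns to the true class, and `alpha_t` is a per-class balancing factor. The modulating term `(1 - p_t)**gamma` shrinks the loss of confident, easy samples. A minimal binary sketch follows. The defaults `gamma=2.0` and `alpha=0.25` are commonly cited starting values and should be treated as tunable assumptions. With `gamma=0`, the loss reduces to an `alpha`-weighted cross-entropy.

```python
import numpy as np


def focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-12):
    """Per-sample binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p = np.clip(p_pred, eps, 1 - eps)
    p_t = np.where(y_true == 1, p, 1 - p)               # probability given to the true class
    alpha_t = np.where(y_true == 1, alpha, 1 - alpha)   # class-balancing factor
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)


y = np.array([1, 1])
p = np.array([0.95, 0.30])                    # an easy positive and a hard positive
ce = -np.log(p)                               # ordinary cross-entropy for positives
fl = focal_loss(y, p, gamma=2.0, alpha=1.0)   # alpha=1.0 isolates the effect of gamma
print(fl / ce)                                # approximately [0.0025, 0.49], i.e. (1 - p)**2
```

The easy sample keeps only about 0.25% of its cross-entropy, while the hard sample keeps about half. This is how the loss steers gradient signal toward hard examples.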
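The case against plain accuracy can be made concrete with a small, hypothetical example: 10 attacks hidden among 990 normal events. A model that never raises an alert scores 99% accuracy but catches nothing. A detector that catches 8 of the 10 attacks, at the cost of 12 false alarms, has slightly lower accuracy but far better recall and F1. The helper below is a plain-Python sketch. Defining precision as 0 when nothing is flagged is a convention chosen here.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = minority/positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0   # 0 by convention when nothing is flagged
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical log set: 10 attacks (label 1) hidden among 990 normal events (label 0).
y_true = [1] * 10 + [0] * 990

always_normal = [0] * 1000                                # never raises an alert
detector = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 978       # catches 8 attacks, 12 false alarms

print(binary_metrics(y_true, always_normal))  # accuracy 0.99, recall 0.0, f1 0.0
print(binary_metrics(y_true, detector))       # accuracy 0.986, precision 0.4, recall 0.8
```

scikit-learn's `precision_score`, `recall_score` and `f1_score` compute the same quantities. Their `zero_division` parameter controls what is returned when a ratio is undefined.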
## ⚠️ Contradictions and Updates (Contradictions & RL Update)

- **Conflict with earlier data:** Earlier approaches simply grew or shrank the dataset. More recent practice is dominated by loss-function optimization, such as [[Focal-Loss|Focal-Loss]], and by treating the problem from an Anomaly Detection perspective.
- **Policy change:** For security log analysis, the Antigravity project uses SMOTE and cost-sensitive ensemble models as its standard. The goal is to avoid missing the handful of "attack indicators" among the overwhelming volume of "normal access" events.

## 🔗 Knowledge Links (Graph)

- [[Focal-Loss|Focal-Loss]], [[Supervised-Learning-Foundations|Supervised-Learning-Foundations]], Precision-Recall-and-F1-Score, Deep-Learning-Foundations
- **Raw Source:** 10_Wiki/Topics/AI/Imbalanced-Data-Handling.md