---
id: EDA-001
category: "10_Wiki/💡 Topics/AI"
confidence_score: 1.0
tags: [data-science, statistics, eda, visualization, machine-learning]
last_reinforced: 2026-04-26
---

# Exploratory Data Analysis (EDA)

## 📌 One-Line Insight (The Karpathy Summary)

> "Before building a model, listen to the raw story the data tells." EDA is the essential groundwork step in which collected data is observed and visualized from multiple angles to understand its distributions, outliers, and correlations between variables, and to form hypotheses.

## 📖 Structured Knowledge (Synthesized Content)

- **Extracted pattern:** Rather than searching for a fixed answer, grasp the overall shape of the data and build the statistical intuition needed to decide the direction of preprocessing (Feature Engineering).
- **Main tasks:**
  - **Summary Statistics:** Check the mean, median, and standard deviation.
  - **Distribution Analysis:** Use histograms or box plots to look for skew and outliers.
  - **Correlation Analysis:** Use scatter plots or heatmaps to understand relationships between variables.
  - **Missing Value Check:** Analyze the proportion and pattern of missing values.
- **Significance:** Guards against garbage in, garbage out (GIGO) and surfaces domain knowledge hidden in the data.

## ⚠️ Contradictions and Updates (Contradictions & RL Update)

- **Conflict with past data:** Moved away from the haste of writing model training code straight away, and settled on evidence-based analysis to choose an algorithm suited to the characteristics of the data.
- **Policy change:** The Antigravity project generates an automated EDA report whenever new wiki source data is acquired, checking the density and bias of the knowledge in advance.

## 🔗 Knowledge Links (Graph)

- Machine-Learning, [[Feature-Engineering|Feature-Engineering]], [[Dimensionality-Reduction|Dimensionality-Reduction]], Principal-Component-Analysis-PCA
- **Raw Source:** 10_Wiki/Topics/AI/Exploratory-Data-Analysis.md
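
## 🧪 Worked Sketch (Illustrative Example)

The four main tasks listed under Synthesized Content (summary statistics, distribution analysis, correlation analysis, missing value check) can be sketched in a few lines of Python. This is a minimal illustration, not the method used by this note or by the Antigravity project. It uses only the standard library, the function names are made up for this example, and the two toy columns are invented data. Plots are left out; the 1.5 × IQR rule stands in for what a box plot would show visually.

```python
import math
import statistics


def summary_stats(values):
    """Mean, median, and sample standard deviation of the non-missing values."""
    xs = [v for v in values if v is not None]
    return {
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "stdev": statistics.stdev(xs),
    }


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR], the usual box-plot whisker rule."""
    xs = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x < low or x > high]


def pearson(xs, ys):
    """Pearson correlation over the rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    mx = statistics.mean(p[0] for p in pairs)
    my = statistics.mean(p[1] for p in pairs)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pairs))
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pairs))
    return cov / (sx * sy)


def missing_ratio(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


# Hypothetical toy columns, invented for illustration only.
height_cm = [160, 165, 170, None, 175, 180, 250]
weight_kg = [55, 60, 65, 70, None, 80, 82]

stats = summary_stats(height_cm)        # summary statistics
outliers = iqr_outliers(height_cm)      # distribution / outlier check
r = pearson(height_cm, weight_kg)       # correlation between two variables
missing = missing_ratio(height_cm)      # share of missing entries

print(stats, outliers, round(r, 3), round(missing, 3))
```

On real tabular data the same checks are usually done with pandas, for example `DataFrame.describe()`, `DataFrame.isna().mean()`, and `DataFrame.corr()`, with histograms, box plots, and heatmaps drawn by a plotting library. The standard-library version is used here only so the sketch runs without extra dependencies.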