---
id: MATH-SAMPLING-001
category: "10_Wiki/💡 Topics/AI"
confidence_score: 1.0
tags: [math, statistics, sampling, data-science, bootstrap, stratified-sampling, monte-carlo]
last_reinforced: 2026-04-26
---

# Sampling Techniques

## 📌 One-Line Insight (The Karpathy Summary)

> "Do not be overwhelmed by the sheer size of the whole. Carve out a representative piece (a sample) with care, and infer as much truth as possible from as few resources as possible."
>
> Sampling is the statistical methodology of drawing a subset of a population, instead of surveying all of it, in order to learn the characteristics of the whole and maximize analytical efficiency.

## 📖 Structured Knowledge (Synthesized Content)

- **Extracted pattern:** "Representative Subset Extraction and Bias Mitigation": start from randomness, but account for the strata or structure of the data so that the sample is not skewed toward any particular group, thereby minimizing the error of the inference (sampling error).
- **Key techniques:**
  - **Simple Random Sampling:** every element has the same chance of being selected.
  - **Stratified Sampling:** split the population into groups with different characteristics and draw from each group in proportion to its size. This keeps small groups represented when the data is imbalanced.
  - **Systematic Sampling:** select elements at a fixed interval.
  - **Importance Sampling:** draw from a proposal distribution that puts more mass on rare regions of the target distribution, then reweight each draw by the ratio of target to proposal density. This makes sampling of rare regions more efficient (used in reinforcement learning).
  - **Bootstrap:** sampling with replacement (the basis of ensemble learning).
- **Significance:** even in the big-data era, a full census is often infeasible in cost and time. Sampling is a core step that determines both the speed and the validity of data analysis and machine-learning training.

## ⚠️ Contradictions and Updates (Contradictions & RL Update)

- **Conflict with earlier data:** the earlier view was that simply drawing more is better. What matters now is less the volume of data than how unbiased the draw was. The idea has also been extended to elaborate sampling strategies for selecting output images from generative models (GAN, Diffusion).
- **Policy change:** when inspecting the quality of its 1,174 knowledge assets, the Antigravity project runs a sampling-based quality assurance (QA) protocol: to save time, it draws a stratified sample of 5% of the assets and reviews that sample in detail.

## 🔗 Knowledge Links (Graph)

- [[Pre-processing-Data-for-AI|Pre-processing-Data-for-AI]], [[Prioritized-Experience-Replay|Prioritized-Experience-Replay]], [[Random-Forest-Classifiers|Random-Forest-Classifiers]], [[Probability-Theory-Foundations|Probability-Theory-Foundations]]
- **Raw Source:** 10_Wiki/Topics/AI/Sampling-Techniques.md
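
The five techniques listed in this note, and the 5% stratified QA draw over 1,174 assets, can be sketched in plain Python. This is a minimal illustration using only the standard library, not the implementation used by any project mentioned here. All function names, the asset records, and the four category labels are hypothetical. The importance-sampling demo assumes a standard normal target and a normal proposal centered at 4.

```python
import math
import random
from collections import defaultdict


def simple_random_sample(items, k, rng):
    """Draw k distinct items; every item has the same chance of selection."""
    return rng.sample(items, k)


def systematic_sample(items, step, rng):
    """Take every step-th item, starting at a random offset in [0, step)."""
    start = rng.randrange(step)
    return items[start::step]


def stratified_sample(items, key, fraction, rng):
    """Draw about the same fraction from every stratum (at least one item each)."""
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    picked = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        picked.extend(rng.sample(members, k))
    return picked


def bootstrap_sample(items, rng):
    """Resample with replacement, keeping the original size."""
    return rng.choices(items, k=len(items))


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def importance_estimate(f, target_pdf, proposal_pdf, proposal_draw, n):
    """Estimate E_target[f(X)] from proposal draws weighted by target/proposal."""
    total = 0.0
    for _ in range(n):
        x = proposal_draw()
        total += f(x) * target_pdf(x) / proposal_pdf(x)
    return total / n


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical catalogue: 1,174 assets spread over four made-up categories.
CATEGORIES = ["math", "ai", "systems", "misc"]
assets = [{"id": i, "category": CATEGORIES[i % 4]} for i in range(1174)]

# Stratified 5% draw: each category contributes in proportion to its size.
qa_batch = stratified_sample(assets, key=lambda a: a["category"], fraction=0.05, rng=rng)
srs = simple_random_sample(assets, 10, rng)
every_20th = systematic_sample(assets, 20, rng)
boot = bootstrap_sample(assets, rng)

# Importance sampling: estimate P(X > 4) for X ~ N(0, 1) using draws from N(4, 1).
tail_prob = importance_estimate(
    f=lambda x: 1.0 if x > 4.0 else 0.0,
    target_pdf=normal_pdf,
    proposal_pdf=lambda x: normal_pdf(x, mu=4.0),
    proposal_draw=lambda: rng.gauss(4.0, 1.0),
    n=20_000,
)

print(len(qa_batch), len(srs), len(every_20th), len(boot), tail_prob)
```

The tail-probability case shows why the reweighting matters: plain Monte Carlo from the standard normal would see an event with x > 4 only about three times per 100,000 draws, while the proposal centered at 4 lands in that region about half the time. In the stratified draw, per-stratum rounding means the batch size is close to, but not necessarily exactly, 5% of the total.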