---
id: wiki-2026-0508-decision-tree-xgboost
title: "Decision Tree & XGBoost"
category: AI_and_ML
status: needs_review
canonical_id: self
aliases: [P-Reinforce-AUTO-DTX-001]
duplicate_of: none
source_trust_level: A
confidence_score: 1.0
tags: [auto-reinforced, decision-tree, xgboost, gradient-boosting, learning-to-rank, machine-learning]
raw_sources: []
last_reinforced: 2026-05-04
github_commit: pending
inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08)
tech_stack:
  language: unspecified
  framework: unspecified
---

# [[Decision Tree & XGBoost|Decision Tree & XGBoost]]

## 📌 One-Line Insight (The Karpathy Summary)

> "A decision map for data: an algorithm that breaks complex conditions down into a tree of yes/no questions to predict outcomes or rank items. Boosting, which combines many weak models, lets it get the most ranking quality out of modern search systems."

## 📖 Structured Knowledge (Synthesized Content)

Decision trees, and XGBoost, which builds on them, are among the strongest-performing machine learning models for structured (tabular) data analysis and for Learning to Rank (LTR).

1. **Decision Tree**:
    * **Principle**: The data is split repeatedly on a threshold of a single feature. At each node the model keeps the split that best separates the targets (for example, the lowest Gini impurity or squared error), until it reaches a leaf that holds the prediction.
    * **Strength**: The reasoning behind each prediction is intuitive and easy to inspect visually.
    * **Limitation**: A single tree has high variance. A small change in the data can produce a very different tree, and a fully grown tree easily overfits.
2. **XGBoost (Extreme Gradient Boosting)**:
    * **Principle**: An ensemble method that builds many shallow decision trees sequentially. Each new tree is fitted to correct the errors (formally, the gradient of the loss) left by the trees before it.
    * **Features**: Training is fast because split finding within each tree is parallelized across features, although the trees themselves are still built one after another. Regularization against overfitting (a penalty on the number of leaves and L1/L2 penalties on leaf weights) is built into the training objective.
3. **Use in Search Systems (LTR)**:
    * **[[LambdaMART|LambdaMART]]**: A standard algorithm that combines decision trees with boosting to optimize the order of search results. XGBoost is one of the representative libraries that implement it, for example through ranking objectives such as `rank:ndcg`.
    * It combines dozens of features, such as a user's past click patterns, document freshness, and text similarity, to decide the best ordering of search results.

## ⚠️ Contradictions & Updates

* **Compute resources**: Training time grows roughly linearly with the number of features, because split finding has to evaluate candidate thresholds for every feature. It also grows with the number of trees and their depth, which are set by hyperparameters such as `n_estimators` and `max_depth` rather than by the feature count. Introducing features incrementally is recommended.
* **External training architecture**: Search engines such as Elasticsearch can run inference with tree-based models, but the model has to be trained in a separate compute environment. This is an architectural constraint.
* **Data dependency**: If the training data (the judgment list) is of low quality, the model produces biased rankings.

## 💻 Practical Implementation Code (Boilerplate)

A basic example that uses `XGBoost` for regression, which can also produce scores for ranking. This is a pointwise approach: each document's relevance label is regressed independently, unlike LambdaMART's pairwise/listwise objective. The four-row dataset is only a toy illustration.

```python
import pandas as pd
import xgboost as xgb

# 1. Prepare data (features: document similarity, click count, freshness / target: relevance score)
data = {
    'sim_score': [0.9, 0.5, 0.8, 0.2],
    'clicks': [100, 20, 80, 5],
    'freshness': [0.95, 0.3, 0.88, 0.1],
    'relevance': [4, 1, 3, 0],  # Ground truth (graded relevance label)
}
df = pd.DataFrame(data)
X = df.drop('relevance', axis=1)
y = df['relevance']

# 2. Create and train the model (squared-error regression on the relevance labels)
model = xgb.XGBRegressor(objective='reg:squarederror', n_estimators=50)
model.fit(X, y)

# 3. Predict a score for a new result (columns must match the training features)
new_docs = pd.DataFrame({'sim_score': [0.85], 'clicks': [50], 'freshness': [0.9]})
predicted_relevance = model.predict(new_docs)
print(f"Predicted Relevance Score: {predicted_relevance[0]:.4f}")
```

## 🔗 Knowledge Links (Graph)

* **Parent concepts**: [[Machine Learning (Machine Learning)|Machine Learning]], [[Learning to Rank (LTR)|Learning to Rank]]
* **Core algorithms**: [[LambdaMART|LambdaMART]], [[Random Forest|Random Forest]] (Bagging vs Boosting)
* **Evaluation**: [[nDCG|nDCG]], [[MAP|MAP]]

---
*Last updated: 2026-05-04*

## 🤖 LLM Usage Hints (How to Use This Knowledge)

**When to use this knowledge:**
- *(TODO)*

**When not to use it:**
- *(TODO)*

## 🧪 Validation Status

- **Information status:** needs_review
- **Source trust level:** A
- **Review reason:** *(P-Reinforce Phase 1 automatic normalization. The body needs verification.)*

## 🧬 Duplicate Check

- **Existing similar documents:** *(TODO: see the indexer cluster report)*
- **Handling:** UPDATE (automatic normalization)
- **Reason:** Phase 1 normalization: filled in the old template and missing fields.

## 🕓 Changelog

| Date | Change | Handling | Trust |
|------|--------|----------|-------|
| 2026-05-08 | P-Reinforce Phase 1 normalization (frontmatter + header standardization) | UPDATE | A |

## 💻 Code Patterns

**Pattern 1:** *(TODO: structural skeleton reflecting this project's conventions)*

```text
# TODO
```

## 🤔 Decision Criteria

**When to use option A:**
- *(TODO)*

**When to use option B:**
- *(TODO)*

**Default:**
> *(TODO)*

## ❌ Anti-Patterns

- **[Anti-pattern]:** *(TODO: what not to do + why + what to do instead)*
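
The Decision Tree principle under Synthesized Content (split on one feature at a time and keep the split that best separates the targets) can be sketched in plain Python. This is a minimal illustration under stated assumptions. Gini impurity is the criterion, and the candidate thresholds are the observed feature values. Real libraries use other criteria and much faster split search (for example histogram bins). The function names `gini` and `best_split` are hypothetical, and so are the binary labels in the usage lines.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0.0 = pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Exhaustively search (feature, threshold) pairs for the lowest weighted Gini impurity.

    Rows with row[feature] <= threshold go left, the rest go right.
    Returns (feature_index, threshold, weighted_impurity); feature_index is None
    if no split improves on the unsplit parent node.
    """
    n = len(rows)
    best = (None, None, gini(labels))  # baseline: do not split
    for feature in range(len(rows[0])):
        # Assumption: candidate thresholds are the observed values (libraries often use midpoints or bins).
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if impurity < best[2]:
                best = (feature, threshold, impurity)
    return best


rows = [[0.9, 100], [0.5, 20], [0.8, 80], [0.2, 5]]  # [sim_score, clicks] from the toy data
labels = ["relevant", "not", "relevant", "not"]      # hypothetical binary labels
print(best_split(rows, labels))  # (0, 0.5, 0.0): split on sim_score <= 0.5
```

A full tree applies `best_split` recursively to each side until the nodes are pure or a depth limit is reached. That depth limit is the knob that controls the overfitting noted under Limitation.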
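
The XGBoost principle under Synthesized Content (shallow trees built sequentially, each correcting the errors of the previous ones) is gradient boosting. Below is a stripped-down sketch that assumes squared-error loss, a single feature, and depth-1 trees (stumps). It leaves out XGBoost's second-order step and regularization, so it shows gradient boosting in its plainest form, not XGBoost itself. `fit_stump` and `gradient_boost` are hypothetical names.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree (a stump) on one feature by least squares.

    Each leaf predicts the mean residual on its side of the threshold.
    """
    best = None
    # Candidate thresholds: every distinct value except the largest, so both sides are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        sse = sum((r - left_value) ** 2 for r in left) + sum((r - right_value) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    if best is None:  # all feature values identical: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Plain gradient boosting for squared error (not XGBoost's regularized, second-order version)."""
    base = sum(ys) / len(ys)  # initial model: the mean target
    preds = [base] * len(ys)
    trees = []
    for _ in range(n_rounds):
        # For squared error the negative gradient is the residual y - prediction,
        # so each new stump is fitted to what the current ensemble still gets wrong.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        # Shrink each tree's contribution by the learning rate (called `eta` in XGBoost).
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)


# Toy data from the Boilerplate section: sim_score -> relevance.
xs = [0.9, 0.5, 0.8, 0.2]
ys = [4, 1, 3, 0]
model = gradient_boost(xs, ys)
print([round(model(x), 2) for x in xs])
```

The small learning rate keeps any single stump from dominating. That is why many shallow trees added in sequence generalize better than one deep tree.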
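
The regularization mentioned under XGBoost is part of the training objective itself. The formulation below follows the one in XGBoost's "Introduction to Boosted Trees" documentation. Here $T$ is the number of leaves in a tree, $w_j$ are the leaf weights, $g_i$ and $h_i$ are the first and second derivatives of the loss with respect to the current prediction, and $G_j, H_j$ sum them over the samples $I_j$ in leaf $j$:

$$
\text{obj} = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2
$$

$$
G_j = \sum_{i \in I_j} g_i, \quad H_j = \sum_{i \in I_j} h_i,
\qquad
w_j^{*} = -\frac{G_j}{H_j + \lambda}
$$

$$
\text{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma
$$

Here is a worked example for squared error $l = \tfrac{1}{2}(y - \hat{y})^2$, which gives $g_i = \hat{y}_i - y_i$ and $h_i = 1$. Take a leaf holding two samples with targets 2 and 1 and current prediction 0. Then $G = -3$ and $H = 2$, so with $\lambda = 1$ the optimal weight is $w^{*} = 3/3 = 1$ rather than the unregularized mean of 1.5. A split is kept only when its Gain is positive, so $\gamma$ acts as the minimum loss reduction required to add a leaf.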
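
nDCG, listed as an evaluation metric under Graph, takes only a few lines to compute. This sketch assumes the common exponential gain $2^{rel} - 1$ with a $\log_2(\text{position} + 1)$ discount. Some tools use the linear gain $rel$ instead, which gives different values. The function names are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """DCG@k: sum of (2**rel - 1) / log2(position + 1) over the top-k positions (1-based)."""
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """nDCG@k: DCG of the given order divided by the DCG of the ideal (sorted) order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Graded labels (0-4, as in the Boilerplate toy data) listed in the order a model ranked the documents.
print(ndcg_at_k([4, 3, 1, 0], k=4))  # 1.0 (ideal order)
print(ndcg_at_k([3, 4, 1, 0], k=4))  # top two swapped: below 1.0
```

Because the discount shrinks with position, a mistake at the top of the list costs far more than the same mistake further down. This is the property that LambdaMART-style objectives try to optimize.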
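
LambdaMART, named under Use in Search Systems, is gradient-boosted trees (MART) trained on "lambda" gradients instead of plain residuals. The sketch below shows how those lambdas could be computed for one query under the LambdaRank formulation. Each pair where one document is more relevant than the other contributes a pairwise logistic gradient, and that gradient is scaled by the |ΔnDCG| of swapping the two documents in the current ranking. A full LambdaMART loop would fit one regression tree per round to these per-document values. That loop is not shown, and neither is the exact scaling and normalization a library such as XGBoost applies. `lambda_forces` is a hypothetical helper.

```python
import math


def lambda_forces(scores, labels, sigma=1.0):
    """LambdaRank-style per-document forces for one query (positive = should move up).

    For every pair with labels[i] > labels[j], the pairwise logistic gradient is
    scaled by |delta nDCG| of swapping documents i and j in the current ranking.
    """
    n = len(scores)
    # Current 1-based rank of each document when sorted by score, highest first.
    order = sorted(range(n), key=lambda d: -scores[d])
    rank = {d: r + 1 for r, d in enumerate(order)}
    ideal = sorted(labels, reverse=True)
    idcg = sum((2 ** y - 1) / math.log2(r + 2) for r, y in enumerate(ideal))
    forces = [0.0] * n
    if idcg == 0:
        return forces  # no relevant documents: nothing to reorder
    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue
            # |change in nDCG| if documents i and j swapped positions
            delta = abs((2 ** labels[i] - 2 ** labels[j])
                        * (1 / math.log2(rank[i] + 1) - 1 / math.log2(rank[j] + 1))) / idcg
            # Logistic term: close to 1 when the model scores the more relevant document i too low
            rho = 1.0 / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
            forces[i] += sigma * rho * delta  # push the more relevant document up
            forces[j] -= sigma * rho * delta  # push the less relevant document down
    return forces


# Document 0 is relevant (label 3) but currently scored below document 1 (label 0).
print(lambda_forces([0.1, 0.9], [3, 0]))
```

Weighting each pair by |ΔnDCG| is what makes the method rank-aware: swaps near the top of the list produce large forces, while swaps deep in the list barely matter.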