Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization

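Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture and unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirects)
- Normalized link names: fixed broken links, pointed links directly at redirect targets, and mapped concepts (~2,400 changes)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections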

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-20 23:52:15 +09:00


id, title, category, status, canonical_id, aliases, duplicate_of, source_trust_level, confidence_score, verification_status, tags, raw_sources, last_reinforced, github_commit, tech_stack

title

Predictive Analytics

In one line

"Predict future outcomes from historical data: the union of regression, classification, and time series." The field started in 1990s statistics and went mainstream with ML in the 2010s. As of 2026, transformer-based forecasting models (TimesFM, Chronos) coexist with established tabular and sequence methods. It is the backbone of business intelligence, supply chain planning, churn prediction, and fraud detection.

Core concepts

Problem types

Regression: continuous target (revenue, demand, price).
Classification: discrete label (churn yes/no, fraud type).
Time series: temporally indexed (sales, sensor, stock).
Survival: time-to-event (customer lifetime, equipment failure).
Ranking: ordering items (recommendation, search).

Modern stack (2026)

Tabular: XGBoost / LightGBM / CatBoost, still state of the art on most tabular tasks.
Deep tabular: TabPFN v2 (a pretrained tabular foundation model that predicts in-context, with no per-dataset training) and FT-Transformer (a transformer trained per dataset).
Time series: Chronos (Amazon), TimesFM (Google), Moirai, all pretrained time-series foundation models.
Classic TS: Prophet, statsforecast (AutoARIMA, ETS), used as baselines.

Workflow

EDA + feature engineering.
Train/val/test split (temporal for TS).
Model selection + CV.
Hyperparameter tuning (Optuna).
Calibration + interpretability (SHAP).
Deployment + monitoring (drift detection).
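
The temporal split in step 2 can be sketched as an expanding-window (walk-forward) splitter. This is a minimal illustration in plain Python, assuming rows are already sorted oldest first; the function name `walk_forward_splits` and its parameters are hypothetical, not taken from any library. In practice scikit-learn's `TimeSeriesSplit` covers the same idea.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, n_folds: int, val_size: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) pairs where validation always follows training.

    Assumes rows are sorted by timestamp, oldest first (hypothetical helper).
    """
    if n_folds < 1 or val_size < 1:
        raise ValueError("n_folds and val_size must be positive")
    first_val_start = n_samples - n_folds * val_size
    if first_val_start < 1:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        val_start = first_val_start + k * val_size
        train_idx = list(range(0, val_start))  # expanding training window
        val_idx = list(range(val_start, val_start + val_size))
        yield train_idx, val_idx


splits = list(walk_forward_splits(n_samples=10, n_folds=3, val_size=2))
for train_idx, val_idx in splits:
    print(f"train 0..{train_idx[-1]} -> validate {val_idx}")
```

Every validation index is later than every training index, which is the property a shuffled K-fold split breaks.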

💻 Patterns

XGBoost regression baseline

import numpy as np
import xgboost as xgb
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold

X, y = load_data()  # placeholder: expected to return a pandas DataFrame X and Series y
kf = KFold(n_splits=5, shuffle=True, random_state=42)  # shuffled folds: not for time series
scores = []
for tr, va in kf.split(X):
    model = xgb.XGBRegressor(
        n_estimators=2000, learning_rate=0.03, max_depth=6,
        subsample=0.8, colsample_bytree=0.8,
        early_stopping_rounds=50, eval_metric="mae",
    )
    # The validation fold drives both early stopping and scoring,
    # so the reported CV MAE is slightly optimistic.
    model.fit(X.iloc[tr], y.iloc[tr], eval_set=[(X.iloc[va], y.iloc[va])], verbose=False)
    scores.append(mean_absolute_error(y.iloc[va], model.predict(X.iloc[va])))
print(f"CV MAE: {np.mean(scores):.4f} +/- {np.std(scores):.4f}")

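LightGBM classification with early stopping

import lightgbm as lgb

# X_tr/y_tr, X_va/y_va, X_te are assumed to be pre-made train/validation/test splits.
model = lgb.LGBMClassifier(
    n_estimators=5000, learning_rate=0.02, num_leaves=63,
    min_child_samples=20, reg_alpha=0.1, reg_lambda=0.1,
)
model.fit(
    X_tr, y_tr,
    eval_set=[(X_va, y_va)],
    # Stop after 100 rounds without improvement; log_evaluation(0) silences per-round logs.
    callbacks=[lgb.early_stopping(100), lgb.log_evaluation(0)],
)
proba = model.predict_proba(X_te)[:, 1]  # probability of the positive class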

Time series with TimesFM (foundation model, zero-shot)

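import timesfm

# Assumes the timesfm 1.x Python API and the PyTorch 2.0-500m checkpoint; per the
# project README this checkpoint needs num_layers=50 and use_positional_embedding=False.
tfm = timesfm.TimesFm(
    hparams=timesfm.TimesFmHparams(
        backend="gpu",
        per_core_batch_size=32,
        horizon_len=96,                  # forecast length is a model hparam, not a forecast() argument
        num_layers=50,
        use_positional_embedding=False,
        context_len=2048,
    ),
    checkpoint=timesfm.TimesFmCheckpoint(huggingface_repo_id="google/timesfm-2.0-500m-pytorch"),
)
point_forecast, quantile_forecast = tfm.forecast(
    inputs=[history_series],     # list of 1-D np.array, one per series
    freq=[0],                    # 0=high freq, 1=med, 2=low
)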

Prophet (interpretable seasonality)

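from prophet import Prophet

# A lower changepoint_prior_scale gives a stiffer trend; 0.05 is the library default.
m = Prophet(yearly_seasonality=True, weekly_seasonality=True, changepoint_prior_scale=0.05)
m.add_country_holidays(country_name="US")
m.fit(df)  # df needs a datetime column "ds" and a target column "y"
future = m.make_future_dataframe(periods=90)  # history plus 90 future days
fcst = m.predict(future)  # columns include yhat, yhat_lower, yhat_upper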

SHAP explainability

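import shap

# Assumes a single-output tree model (e.g. the XGBRegressor above); for some
# classifiers shap_values and expected_value carry one entry per class.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)
shap.summary_plot(shap_values, X_te)  # global view: feature impact across the test set
# Local view: contribution of each feature to the first test row's prediction
shap.waterfall_plot(shap.Explanation(values=shap_values[0], base_values=explainer.expected_value, data=X_te.iloc[0].values, feature_names=list(X_te.columns)))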

Probability calibration

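from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import brier_score_loss
# base_model is already fitted; calibrate on held-out validation data, not the training set.
# cv="prefit" is deprecated since scikit-learn 1.6; newer code wraps the model in sklearn.frozen.FrozenEstimator.
calibrated = CalibratedClassifierCV(base_model, method="isotonic", cv="prefit")
calibrated.fit(X_va, y_va)
# Evaluate on the test set: Brier score (lower is better) plus a reliability diagram
brier = brier_score_loss(y_te, calibrated.predict_proba(X_te)[:, 1])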
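
To make the calibration check concrete, here is a minimal sketch of the two quantities named above, the Brier score and the data behind a reliability diagram, in plain Python. The helper names (`brier_score`, `reliability_bins`) and the toy labels and probabilities are illustrative assumptions, not part of scikit-learn.

```python
from typing import List, Sequence, Tuple


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def reliability_bins(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Return (mean predicted prob, observed positive rate, count) per non-empty bin."""
    buckets: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        buckets[idx].append((y, p))
    rows = []
    for bucket in buckets:
        if bucket:
            mean_p = sum(p for _, p in bucket) / len(bucket)
            rate = sum(y for y, _ in bucket) / len(bucket)
            rows.append((mean_p, rate, len(bucket)))
    return rows


# Toy data (made up for illustration)
y_true = [0, 1, 1, 1]
y_prob = [0.1, 0.3, 0.7, 0.9]

score = brier_score(y_true, y_prob)
bins = reliability_bins(y_true, y_prob, n_bins=2)
for mean_p, rate, count in bins:
    print(f"predicted {mean_p:.2f} vs observed {rate:.2f} (n={count})")
```

A well-calibrated model has observed rates close to the mean predicted probability in every bin; large gaps are what isotonic or sigmoid calibration is meant to reduce.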

Drift monitoring (Evidently)

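# Uses the legacy Evidently import paths; newer releases reorganized these modules.
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset
report = Report(metrics=[DataDriftPreset()])
# reference_data: training-time features; current_data: recent production features
report.run(reference_data=X_train, current_data=X_prod)
report.save_html("drift.html")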
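
For intuition about what a per-feature drift score measures, the sketch below computes the Population Stability Index (PSI) in plain Python. This is a swapped-in, simpler technique for illustration and is not a description of how Evidently computes drift; the function name `psi`, the equal-width binning, and the toy data are assumptions.

```python
import math
from typing import List, Sequence


def psi(
    reference: Sequence[float],
    current: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Population Stability Index: sum over bins of (cur - ref) * ln(cur / ref).

    Bins are equal-width over the reference range; values outside that range
    are clamped into the edge bins. Empty bins are floored at eps to avoid log(0).
    """
    lo, hi = min(reference), max(reference)
    width = ((hi - lo) / n_bins) or 1.0  # guard against a constant reference

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[max(0, min(idx, n_bins - 1))] += 1
        return [max(c / len(values), eps) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Toy data: production values shifted upward by 50 (made up for illustration)
reference = [float(v) for v in range(100)]
shifted = [v + 50.0 for v in reference]

psi_same = psi(reference, reference)
psi_shifted = psi(reference, shifted)
print(f"no shift: {psi_same:.3f}, shifted: {psi_shifted:.3f}")
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, though thresholds should be tuned per feature.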

Decision criteria

Situation	Approach
Tabular, <1M rows	XGBoost / LightGBM
Tabular, mixed features	CatBoost
Zero-shot tabular	TabPFN v2
Time series, single series	Prophet / AutoARIMA
Time series, many series, zero-shot	Chronos / TimesFM
Need interpretability	Linear / GAM + SHAP
Very large (>10M) tabular	LightGBM w/ histogram

Default: LightGBM + Optuna + SHAP for tabular; Chronos zero-shot for new TS problems.

🔗 Graph

Parents: Machine-Learning · Statistics
Variants: Time-Series-Analysis · Regression
Adjacent: XGBoost · LightGBM · SHAP · Prophet

🤖 LLM usage

When to use: supervised learning for business outcome forecasting, risk scoring, demand planning, and anomaly detection. When not to use: causal inference (use Causal-Inference), prescriptive optimization (use Optimization).

❌ Anti-patterns

Data leakage: future information in training features invalidates evaluation.
Random split on time series: must use temporal split.
Ignoring calibration: probabilities used for decisions but never calibrated.
No drift monitoring: model decays silently in production.
Over-engineering deep nets for small tabular: GBDT usually wins under 100k rows.
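
The data-leakage item above is easiest to see with lag features. The following sketch, in plain Python, builds features that only look backward in time; the helper names `lag_feature` and `trailing_mean` and the toy sales series are hypothetical. With pandas, the same effect comes from calling `shift(1)` before any rolling aggregate.

```python
from typing import List, Optional


def lag_feature(series: List[float], lag: int) -> List[Optional[float]]:
    """Value observed `lag` steps earlier; None where no history exists."""
    if lag < 1:
        raise ValueError("lag must be >= 1, otherwise the feature is the target itself")
    n = len(series)
    keep = max(n - lag, 0)
    return [None] * (n - keep) + list(series[:keep])


def trailing_mean(series: List[float], window: int) -> List[Optional[float]]:
    """Mean of the `window` values strictly before each position.

    The current value is excluded on purpose: including it would leak the
    target into its own feature.
    """
    out: List[Optional[float]] = []
    for t in range(len(series)):
        past = series[max(0, t - window):t]
        out.append(sum(past) / window if len(past) == window else None)
    return out


# Toy daily sales (made up for illustration)
sales = [10.0, 12.0, 11.0, 15.0, 14.0]

lag1 = lag_feature(sales, lag=1)
mean2 = trailing_mean(sales, window=2)
for target, f1, f2 in zip(sales, lag1, mean2):
    print(f"y={target}  lag1={f1}  trailing_mean2={f2}")
```

Rows with `None` features have no usable history and would normally be dropped or imputed before training.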

🧪 Verification / duplicates

Verified (Kaggle Grandmaster patterns, Amazon Chronos paper 2024, Google TimesFM 2024).
Confidence: A.

🕓 Changelog

Date	Change
2026-05-08	Phase 1
2026-05-10	Manual cleanup: full predictive analytics workflow with 2026 foundation models
