Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization

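10_Wiki/Topics large-scale cleanup:
- Removed 227 error-capture and unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Normalized link names: fixed broken links, pointed links directly at redirect targets, and mapped concepts (~2,400 changes)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections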

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-20 23:52:15 +09:00


Multivariate Analysis

One-line summary

"Analyzing multiple correlated variables simultaneously." MVA builds on the covariance and correlation matrices to unify PCA, FA, CCA, MANOVA, and discriminant analysis. Even in the ML era of 2026 it remains an indispensable foundation for EDA, feature engineering, biostatistics, and marketing research.

Core concepts

Covariance matrix Σ

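Σᵢⱼ = E[(Xᵢ - μᵢ)(Xⱼ - μⱼ)].
The eigendecomposition Σ = QΛQᵀ (orthonormal eigenvectors in Q, eigenvalues on the diagonal of Λ) is the backbone of most multivariate techniques.
Sample estimate: S = (1/(n-1)) XᶜᵀXᶜ, where Xᶜ is the column-centered data matrix.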
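
The definitions above can be checked numerically. The following is a minimal sketch on synthetic data (the data and variable names are illustrative assumptions, not part of the original note): it builds S from the centered matrix, decomposes it, and rebuilds it from QΛQᵀ.

```python
import numpy as np

rng = np.random.default_rng(0)           # synthetic data, for illustration only
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.5]])

Xc = X - X.mean(axis=0)                  # column-centered data matrix
S = Xc.T @ Xc / (X.shape[0] - 1)         # sample covariance, same as np.cov(X, rowvar=False)

eigvals, Q = np.linalg.eigh(S)           # eigh: S is symmetric; eigenvalues come back ascending
order = np.argsort(eigvals)[::-1]        # reorder to descending variance
eigvals, Q = eigvals[order], Q[:, order]

S_rebuilt = Q @ np.diag(eigvals) @ Q.T   # Q Λ Qᵀ should reproduce S
```

The columns of Q, ordered this way, are the principal axes that PCA reports, and the eigenvalues are the variances along them.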

Method family

PCA: max-variance projection (leading eigenvectors of Σ).
FA (Factor Analysis): latent factors + idiosyncratic noise (X = ΛF + ε, where Λ here denotes the loading matrix, not the eigenvalue matrix).
CCA: max correlation between two variable sets.
LDA: discriminant axes (between-class vs within-class scatter).
MANOVA: multivariate generalization of ANOVA (Wilks' Λ, Pillai's trace).
MDS: distance-preserving embedding.
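
MDS is the one family member without a pattern in the section below, so here is a minimal sketch of classical (Torgerson) MDS in plain numpy. The function name `classical_mds` and the synthetic points are assumptions for illustration; iterative variants such as the one in scikit-learn work differently.

```python
import numpy as np

def classical_mds(D, k):
    """Embed n points in k dimensions from an (n, n) Euclidean distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    eigvals, Q = np.linalg.eigh(B)             # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:k]        # indices of the k largest eigenvalues
    lam = np.clip(eigvals[idx], 0.0, None)     # guard against tiny negative values
    return Q[:, idx] * np.sqrt(lam)            # coordinates, shape (n, k)

rng = np.random.default_rng(1)
P = rng.normal(size=(30, 2))                                    # hidden 2-D points
D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)      # pairwise distances
Z = classical_mds(D, k=2)
D_hat = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)  # distances in the embedding
```

Because the input distances here come from genuinely 2-D points, the embedding recovers them up to rotation and reflection, so D_hat matches D. With non-Euclidean dissimilarities the match is only approximate.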

Applications

EDA on tabular data (correlation heatmap, biplot).
Feature engineering before tree models or MLP.
Genomics (gene expression PCA / FA).
Marketing segmentation (cluster + biplot).
Psychometrics (factor structure of survey).

💻 Patterns

PCA — full pipeline

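import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# X: (n_samples, n_features) array; feature_names: list of column names (both assumed defined)
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.95)  # keep enough components to explain 95% of the variance
Z = pca.fit_transform(X_std)
print(pca.explained_variance_ratio_.cumsum())

# Biplot: scores on PC1/PC2 plus loading arrows (needs at least 2 retained components)
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
plt.scatter(Z[:, 0], Z[:, 1], alpha=0.3)
for i, name in enumerate(feature_names):
    plt.arrow(0, 0, loadings[i, 0] * 3, loadings[i, 1] * 3, color='r')
    plt.text(loadings[i, 0] * 3.2, loadings[i, 1] * 3.2, name)
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.show()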

Factor Analysis with rotation

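from sklearn.decomposition import FactorAnalysis

# The rotation argument requires scikit-learn >= 0.24; X_std is the standardized data
fa = FactorAnalysis(n_components=3, rotation='varimax')
fa.fit(X_std)
print(fa.components_)  # loadings, shape (n_factors, n_features)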

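Canonical Correlation Analysis

import numpy as np
from sklearn.cross_decomposition import CCA
# X_view1, X_view2: two arrays with the same number of rows (assumed defined)
cca = CCA(n_components=2)
cca.fit(X_view1, X_view2)
U, V = cca.transform(X_view1, X_view2)
# Canonical correlations: correlation of each paired score column
canon_corr = [np.corrcoef(U[:, k], V[:, k])[0, 1] for k in range(U.shape[1])]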

Linear Discriminant Analysis

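from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# n_components must be <= min(n_classes - 1, n_features), so 2 needs at least 3 classes
lda = LinearDiscriminantAnalysis(n_components=2)
Z = lda.fit_transform(X_std, y)  # supervised projection using class labels y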

MANOVA via statsmodels

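from statsmodels.multivariate.manova import MANOVA

# df: DataFrame with response columns y1, y2, y3 and a categorical column group (assumed defined)
maov = MANOVA.from_formula('y1 + y2 + y3 ~ group', data=df)
print(maov.mv_test())  # Wilks' lambda, Pillai's trace, Hotelling-Lawley trace, Roy's greatest root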
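
To make the Wilks statistic less of a black box, the sketch below computes Wilks' Λ by hand as det(W) / det(T), where W is the within-group and T the total sum-of-squares-and-cross-products matrix. The helper name `wilks_lambda` and the synthetic groups are assumptions for illustration; it gives only the statistic, not the F approximation or p-value that statsmodels reports.

```python
import numpy as np

def wilks_lambda(Y, groups):
    """Wilks' lambda = det(W) / det(T) for responses Y (n, p) and group labels."""
    Y = np.asarray(Y, dtype=float)
    groups = np.asarray(groups)
    centered_total = Y - Y.mean(axis=0)
    T = centered_total.T @ centered_total          # total SSCP matrix
    W = np.zeros_like(T)
    for g in np.unique(groups):
        Yg = Y[groups == g]
        centered_g = Yg - Yg.mean(axis=0)
        W += centered_g.T @ centered_g             # within-group SSCP matrix
    return np.linalg.det(W) / np.linalg.det(T)

rng = np.random.default_rng(2)
groups = np.repeat(["a", "b", "c"], 40)
Y_null = rng.normal(size=(120, 3))                 # no group effect
shift = np.repeat(np.array([[0.0] * 3, [3.0] * 3, [-3.0] * 3]), 40, axis=0)
Y_shift = Y_null + shift                           # strong group-mean differences

lam_null = wilks_lambda(Y_null, groups)
lam_shift = wilks_lambda(Y_shift, groups)
```

Values near 1 mean the group means explain almost none of the multivariate variation; values near 0 mean they explain most of it, so lam_shift should come out far smaller than lam_null.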

Mahalanobis distance (multivariate outliers)

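import numpy as np
from scipy.stats import chi2

mu = X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))

def mahal(x):
    """Mahalanobis distance of one observation x from the sample mean."""
    d = x - mu
    return np.sqrt(d @ S_inv @ d)

# Under multivariate normality the squared distance is approximately chi-square with p df,
# so compare mahal(x)**2 (not mahal(x)) against the cutoff.
cutoff = chi2.ppf(0.975, df=X.shape[1])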
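
For a whole dataset, a vectorized version avoids calling the function row by row. This is a minimal sketch assuming scipy is available; the helper name `mahalanobis_outliers`, the 0.975 quantile default, and the planted outliers are illustrative assumptions.

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_outliers(X, quantile=0.975):
    """Return squared Mahalanobis distances and a boolean outlier mask."""
    X = np.asarray(X, dtype=float)
    d = X - X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    # Solve S z = d instead of forming an explicit inverse; d2[i] = d[i] @ S^-1 @ d[i]
    d2 = np.einsum("ij,ij->i", d, np.linalg.solve(S, d.T).T)
    cutoff = chi2.ppf(quantile, df=X.shape[1])     # cutoff applies to squared distances
    return d2, d2 > cutoff

rng = np.random.default_rng(3)
X_demo = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=300)
X_demo[:3] = [[4.0, -4.0], [-5.0, 5.0], [6.0, -6.0]]  # planted points that go against the correlation
d2, flags = mahalanobis_outliers(X_demo)
```

The planted points are not extreme in either coordinate alone, but they contradict the positive correlation, which is the kind of outlier a per-variable check misses. Since the mean and covariance are themselves estimated from contaminated data, heavy contamination can mask outliers; a robust estimate such as Minimum Covariance Determinant is the usual remedy.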

Decision criteria

Situation	Approach
Unsupervised variance compression	PCA
Latent structure interpretation	Factor Analysis (with rotation)
Two correlated groups of variables	CCA
Supervised projection	LDA
Group-mean comparison (multivariate)	MANOVA
Distance-only data	MDS
Multivariate outlier detection	Mahalanobis / Minimum Covariance Determinant (MCD)

Defaults: PCA plus a correlation heatmap for EDA, LDA for supervised projection, FA with varimax rotation for latent factors.

🔗 Graph

Parent: Statistics · Linear-Algebra-Foundations
Variants: PCA · Factor-Analysis · LDA
Applications: EDA · Feature Engineering
Adjacent: Dimensionality-Reduction · t-SNE · UMAP

🤖 LLM usage

When to use: EDA narrative generation (interpreting a PCA biplot), factor labeling, MANOVA result write-ups. When not to use: computing the actual decomposition (use numpy/sklearn for that).

❌ Anti-patterns

No standardization: running PCA before scaling lets large-magnitude variables dominate the components.
PCA on nonlinear structure: PCA cannot unfold a manifold such as a swiss roll; use t-SNE, UMAP, or Isomap instead.
FA without rotation: unrotated factors are hard to interpret; apply varimax or promax.
Unchecked MANOVA assumptions: skipping the checks for multivariate normality and equal covariance matrices → unreliable p-values.
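
The first anti-pattern above is easy to reproduce. The sketch below uses two synthetic variables that carry the same signal on scales differing by a factor of 1000 (all names and numbers are illustrative assumptions) and compares the leading principal axis before and after standardization.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
latent = rng.normal(size=n)
# Two equally informative variables on very different scales (synthetic).
x_small = latent + 0.5 * rng.normal(size=n)
x_large = 1000.0 * (latent + 0.5 * rng.normal(size=n))
X = np.column_stack([x_small, x_large])

def first_pc(M):
    """Leading eigenvector of the sample covariance of M."""
    eigvals, Q = np.linalg.eigh(np.cov(M, rowvar=False))
    return Q[:, np.argmax(eigvals)]

pc_raw = first_pc(X)                                   # dominated by x_large
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # z-score each column
pc_std = first_pc(X_std)                               # both variables weighted equally
```

On the raw data the first axis points almost entirely along the large-scale variable; after standardization both variables get loadings of equal magnitude, which reflects the shared signal rather than the measurement units.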

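🧪 Verification / Duplicates

Verified against Johnson & Wichern, "Applied Multivariate Statistical Analysis", and Härdle & Simar, "Applied Multivariate Statistical Analysis".
Confidence: A.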

🕓 Changelog

Date	Change
2026-05-08	Phase 1
2026-05-10	Manual cleanup — full MVA toolkit (PCA/FA/CCA/LDA/MANOVA)
