id: wiki-2026-0508-multivariate-analysis
title: Multivariate Analysis
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: MVA, Multivariate Statistics
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: statistics, dimensionality-reduction, multivariate
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language: Python; framework: scikit-learn/statsmodels
Multivariate Analysis
One-line summary
"Multiple correlated variables, analyzed simultaneously." MVA builds on the covariance and correlation matrices to unify PCA, FA, CCA, MANOVA, and discriminant analysis. Even in the ML era of 2026 it remains an indispensable foundation for EDA, feature engineering, biostatistics, and marketing research.
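Core concepts
Covariance matrix Σ
Σᵢⱼ = E[(Xᵢ - μᵢ)(Xⱼ - μⱼ)].
The eigendecomposition Σ = QΛQᵀ (Q orthogonal, Λ the diagonal matrix of eigenvalues) is the backbone of most multivariate techniques.
Sample estimate: S = (1/(n-1)) XᶜᵀXᶜ, where Xᶜ is the column-centered n × p data matrix.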
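The definitions above can be checked numerically. The following is a minimal NumPy sketch, assuming synthetic data; the mixing matrix `A` and the sample size are illustrative choices, not part of the original note.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples of 3 correlated variables
A = np.array([[2.0, 0.0, 0.0],
              [0.8, 1.0, 0.0],
              [0.3, 0.5, 0.5]])
X = rng.standard_normal((200, 3)) @ A.T

# Sample covariance S = Xc^T Xc / (n - 1) on column-centered data
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / (X.shape[0] - 1)

# Symmetric eigendecomposition S = Q diag(lam) Q^T
# (eigh returns eigenvalues in ascending order, so reorder to descending)
lam, Q = np.linalg.eigh(S)
order = np.argsort(lam)[::-1]
lam, Q = lam[order], Q[:, order]

# Rebuild S from its eigenpairs to confirm the factorization
S_rebuilt = Q @ np.diag(lam) @ Q.T
```

`np.linalg.eigh` is used rather than `np.linalg.eig` because S is symmetric, which keeps the eigenvalues real and the eigenvectors orthonormal.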
Method family
PCA: projection onto the directions of maximum variance (eigenvectors of Σ).
FA (Factor Analysis): latent factors plus idiosyncratic noise (X = ΛF + ε, where Λ is the loading matrix, not the eigenvalue matrix above).
CCA: maximizes the correlation between two sets of variables.
LDA: discriminant axes that maximize between-class scatter relative to within-class scatter.
MANOVA: multivariate generalization of ANOVA (Wilks' Λ, Pillai's trace).
MDS: distance-preserving embedding.
Applications
EDA on tabular data (correlation heatmap, biplot).
Feature engineering before tree models or MLP.
Genomics (gene expression PCA / FA).
Marketing segmentation (cluster + biplot).
Psychometrics (factor structure of survey).
💻 Patterns
PCA — full pipeline
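A minimal sketch of a standardize-then-project pipeline with scikit-learn. The iris dataset and the choice of two components are illustrative assumptions, not something the original note specifies.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = load_iris().data  # 150 samples x 4 features (illustrative dataset)

# Standardize first so no variable dominates purely through its units
pipe = make_pipeline(StandardScaler(), PCA(n_components=2))
scores = pipe.fit_transform(X)              # component scores, shape (150, 2)

pca = pipe.named_steps["pca"]
explained = pca.explained_variance_ratio_   # share of variance per component
directions = pca.components_.T              # unit-length axes, shape (4, 2)
```

`scores` feeds a scatter plot or biplot, while `explained` indicates how many components are worth keeping.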
Factor Analysis with rotation
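A minimal sketch using scikit-learn's `FactorAnalysis` with varimax rotation. The data are simulated from a hypothetical two-factor model (`L_true` is an assumed loading matrix, introduced only for illustration).

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500

# Hypothetical generative model: X = F L^T + noise, with simple structure
F = rng.standard_normal((n, 2))
L_true = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                   [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
X = F @ L_true.T + 0.3 * rng.standard_normal((n, 6))

Z = StandardScaler().fit_transform(X)
fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0)
factor_scores = fa.fit_transform(Z)   # estimated factor scores, shape (500, 2)

loadings = fa.components_.T           # rotated loadings, shape (6, 2)
uniqueness = fa.noise_variance_       # idiosyncratic variance per variable

# Which factor each variable loads on most strongly
dominant = np.abs(loadings).argmax(axis=1)
```

scikit-learn's `rotation` option accepts `"varimax"` and `"quartimax"`. Promax is an oblique rotation and is not among them, so it would need a different package.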
CCA (cross-modal)
Linear Discriminant Analysis
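A minimal sketch of LDA used both as a supervised projection and as a classifier, assuming the iris dataset as a stand-in for labeled tabular data.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)  # 3 classes, 4 features (illustrative)

# LDA yields at most (n_classes - 1) discriminant axes, here 2
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)    # supervised 2-D projection

# Share of between-class variance captured by each axis
ratios = lda.explained_variance_ratio_

# The same model doubles as a classifier; 5-fold cross-validated accuracy
cv_acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
```

Unlike PCA, the projection uses the labels, so the axes separate classes rather than maximize total variance.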
MANOVA via statsmodels
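A minimal sketch with `statsmodels.multivariate.manova.MANOVA`, assuming simulated data with three groups and three response variables. The column names `y1`, `y2`, `y3`, `group` and the group shifts are hypothetical.

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(0)

# Hypothetical design: 3 groups of 40, with mean shifts in y1 and y2 only
groups = np.repeat(["a", "b", "c"], 40)
shift = {"a": 0.0, "b": 1.0, "c": 2.0}
mu = np.array([shift[g] for g in groups])
df = pd.DataFrame({
    "group": groups,
    "y1": mu + rng.standard_normal(120),
    "y2": 0.5 * mu + rng.standard_normal(120),
    "y3": rng.standard_normal(120),
})

# Responses on the left of the formula, the grouping factor on the right
fit = MANOVA.from_formula("y1 + y2 + y3 ~ group", data=df)
res = fit.mv_test()

# Table of Wilks' lambda, Pillai's trace, Hotelling-Lawley trace, Roy's root
stats = res.results["group"]["stat"]
wilks_value = float(stats.loc["Wilks' lambda", "Value"])
wilks_p = float(stats.loc["Wilks' lambda", "Pr > F"])
```

`print(res)` shows the full test table. The p-values are only trustworthy when multivariate normality and equal covariance matrices are plausible.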
Mahalanobis distance (multivariate outliers)
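A minimal sketch of Mahalanobis-based outlier flagging with NumPy and SciPy. The data, the planted outlier, and the 0.999 chi-square cutoff are illustrative assumptions.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], cov, size=300)

# Planted point: unremarkable on each axis alone, but it breaks the correlation
X = np.vstack([X, [[3.0, -3.0]]])

mu = X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mu

# Squared Mahalanobis distance d2_i = diff_i^T S^{-1} diff_i for every row
d2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)

# Under multivariate normality, d2 is approximately chi-square with p d.o.f.
threshold = chi2.ppf(0.999, df=X.shape[1])
outliers = np.flatnonzero(d2 > threshold)
```

Because the mean and covariance here are estimated from the contaminated sample, heavy contamination can mask outliers. A robust estimate such as scikit-learn's `MinCovDet` is a common substitute in that case.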
Decision criteria
| Situation | Approach |
| --- | --- |
| Unsupervised variance compression | PCA |
| Interpreting latent structure | Factor Analysis (with rotation) |
| Two correlated groups of variables | CCA |
| Supervised projection | LDA |
| Group-mean comparison (multivariate) | MANOVA |
| Distance-only data | MDS |
| Multivariate outlier detection | Mahalanobis distance / Minimum Covariance Determinant |
Defaults: PCA plus a correlation heatmap for EDA, LDA for supervised projection, and FA with varimax rotation for latent factors.
🔗 Graph
🤖 LLM usage
When to use: EDA narrative generation (interpreting a PCA biplot), factor labeling, and writing up MANOVA results.
When not to use: computing the decomposition itself; use numpy or scikit-learn for that.
❌ Anti-patterns
No standardization: running PCA before scaling lets large-magnitude variables dominate the components.
PCA on nonlinear structure: PCA cannot unroll a manifold such as the swiss roll; use t-SNE, UMAP, or Isomap instead.
FA without rotation: unrotated factors are hard to interpret; apply varimax or promax.
Unchecked MANOVA assumptions: skipping checks for multivariate normality and equal covariance matrices can produce wrong p-values.
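The first anti-pattern in this list is easy to reproduce. The sketch below assumes three independent hypothetical variables (`income`, `age`, `score`) on very different scales and compares the leading principal direction before and after standardization.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical variables measured in very different units
income = 50_000 + 15_000 * rng.standard_normal(n)
age = 40 + 10 * rng.standard_normal(n)
score = 0.5 + 0.1 * rng.standard_normal(n)
X = np.column_stack([income, age, score])

def first_pc_weights(M):
    """Absolute weights of the leading eigenvector of the covariance of M."""
    vals, vecs = np.linalg.eigh(np.cov(M, rowvar=False))
    return np.abs(vecs[:, -1])   # eigh sorts ascending; last column is largest

# Raw data: the first component is almost entirely the income axis
raw_w = first_pc_weights(X)

# Standardized data: each variable has unit variance, so none wins by units
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
std_w = first_pc_weights(Z)
```

On the raw data the first component simply restates the variable with the largest variance. After standardization the weights reflect correlation structure instead of measurement units.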
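🧪 Verification / Duplicates
Verified against Johnson & Wichern, "Applied Multivariate Statistical Analysis", and Härdle & Simar, "Applied Multivariate Statistical Analysis".
Trust level: A.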
🕓 Changelog
| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full MVA toolkit (PCA/FA/CCA/LDA/MANOVA) |