"Orthogonal axes of maximum variance: eigendecomposition of the covariance matrix, equivalent to SVD of the centered data". The statistical foundation comes from Pearson (1901) and Hotelling (1933); as of 2026 PCA is still the default linear dimensionality-reduction baseline, even though t-SNE and UMAP are common for visualization. Note: the spelling is "Principal" (not "Principle"); the misspelled alias is kept for findability.
Key points
Mathematical definition
Center data: X_c = X - mean(X).
Covariance: C = X_c^T X_c / (n-1).
Eigendecompose C = V Λ V^T; columns of V are principal axes.
Project: Z = X_c V_k (top k components).
Equivalent: the SVD X_c = U Σ V^T yields the same V (up to sign); the singular values satisfy σ_i = sqrt((n-1) λ_i).
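The steps above can be sketched in NumPy to check that both routes give the same axes. This is a minimal illustration on synthetic data; the array sizes, the seed, and the variable names are assumptions made for the example, not part of the definition.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 5, 2
# Synthetic correlated data (hypothetical; any (n, d) array works).
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))

# Route 1: eigendecomposition of the covariance matrix.
X_c = X - X.mean(axis=0)
C = X_c.T @ X_c / (n - 1)
eigvals, eigvecs = np.linalg.eigh(C)          # eigh returns ascending order
order = np.argsort(eigvals)[::-1]             # re-sort to descending variance
eigvals, V_eig = eigvals[order], eigvecs[:, order]

# Route 2: thin SVD of the centered data.
U, S, Vt = np.linalg.svd(X_c, full_matrices=False)
V_svd = Vt.T

# Singular values and eigenvalues are linked by sigma_i = sqrt((n - 1) * lambda_i).
sigma_from_eig = np.sqrt((n - 1) * eigvals)

# Axes agree only up to sign, so compare absolute column-wise dot products.
axis_alignment = np.abs(np.sum(V_eig * V_svd, axis=0))

# Projection onto the top-k principal axes.
Z = X_c @ V_eig[:, :k]
```

In practice the SVD route is usually preferred because it avoids forming X_c^T X_c explicitly, which tends to be better conditioned numerically.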
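Properties
Orthogonal: the principal axes are orthonormal, and the projected scores are mutually uncorrelated.
Variance-ordered: components are sorted by decreasing explained variance, so the first PC captures the largest share.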
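Both properties can be verified numerically. The sketch below uses synthetic data (shapes and seed are arbitrary assumptions) and checks that the axes are orthonormal, that the covariance of the scores is diagonal, and that the variance shares are non-increasing.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))
X_c = X - X.mean(axis=0)

# Principal axes from the SVD of the centered data (rows of Vt).
_, S, Vt = np.linalg.svd(X_c, full_matrices=False)
Z = X_c @ Vt.T                                # scores on all components

# Covariance of the scores: off-diagonal entries should vanish.
score_cov = np.cov(Z, rowvar=False)
off_diag = score_cov - np.diag(np.diag(score_cov))

# Share of total variance captured by each component, in component order.
explained_ratio = S**2 / np.sum(S**2)
```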
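    from sklearn.decomposition import PCA

    # Whitening: fit on the training split only, then reuse the fitted transform.
    # X_train and X_test are assumed to be defined elsewhere.
    pca = PCA(whiten=True).fit(X_train)
    X_train_w = pca.transform(X_train)
    X_test_w = pca.transform(X_test)
    # On the training data, features now have unit variance and zero correlation.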
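To make the whitening step less of a black box, here is a NumPy-only sketch of the same idea: rotate onto the principal axes, then divide each score by the square root of its eigenvalue. It is an illustration under that assumption, not scikit-learn's internal code; the helper `whiten` and the synthetic splits are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3))                   # shared mixing matrix
X_train = rng.normal(size=(500, 3)) @ A
X_test = rng.normal(size=(100, 3)) @ A

# Statistics are estimated on the training split only.
mean = X_train.mean(axis=0)
X_c = X_train - mean
eigvals, V = np.linalg.eigh(X_c.T @ X_c / (len(X_train) - 1))

def whiten(X):
    """Rotate onto the principal axes, then rescale each axis to unit variance."""
    return (X - mean) @ V / np.sqrt(eigvals)

train_cov = np.cov(whiten(X_train), rowvar=False)   # identity up to rounding
test_cov = np.cov(whiten(X_test), rowvar=False)     # only approximately identity
```

The test split is whitened with training statistics, so its covariance is close to the identity but not exact; that gap is expected and is not a bug.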
PCA for interpreting transformer hidden states
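    import torch
    from sklearn.decomposition import PCA

    # `model` and `prompts` are assumed to be defined elsewhere.
    with torch.no_grad():
        hidden = model.encode(prompts)  # (B, D=4096)
    pca = PCA(n_components=8)
    Z = pca.fit_transform(hidden.cpu().numpy())
    # Top component often correlates with sentiment / topic / refusal.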
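One way to test whether a component tracks an attribute is to correlate its scores with a label. The sketch below replaces real hidden states with synthetic vectors that contain a planted direction, so every name and number in it (the binary `labels`, the shift strength 6.0, the sizes B and D) is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
B, D = 400, 64
labels = rng.integers(0, 2, size=B)           # hypothetical binary attribute
direction = rng.normal(size=D)
direction /= np.linalg.norm(direction)

# Stand-in for hidden states: isotropic noise plus a shift along one direction.
hidden = rng.normal(size=(B, D)) + 6.0 * np.outer(labels - 0.5, direction)

# PCA via SVD of the centered states; keep the top 8 component scores.
H_c = hidden - hidden.mean(axis=0)
_, S, Vt = np.linalg.svd(H_c, full_matrices=False)
Z = H_c @ Vt[:8].T

# Correlation of each component with the attribute; the sign is arbitrary.
corr = np.array([np.corrcoef(Z[:, j], labels)[0, 1] for j in range(8)])
best = int(np.argmax(np.abs(corr)))           # index of the most aligned component
```

With real model activations the attribute is rarely this cleanly separated, so a high correlation is a hint worth following up rather than proof that the component encodes the attribute.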
When to use: linear dimensionality reduction, whitening, denoising, hidden-state analysis, a baseline before fitting an ML model.
When not to use: nonlinear manifolds (use UMAP or an autoencoder), categorical-only data (use MCA), cases where the original features must stay interpretable (use feature selection).
❌ Anti-patterns
No standardization: features with large scale dominate components.
PCA fit on data that includes the label column: leaks the target into the features of a supervised pipeline.
Reading PC1 as "the cause": components are statistical, not causal.
PCA → tree models: GBDT rarely benefits from the rotation, and it costs interpretability of the original features.
Forgetting sign ambiguity: V and -V both valid; component direction is arbitrary.
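Two of the anti-patterns above, missing standardization and sign ambiguity, are easy to demonstrate. In this sketch the features `income` and `age` are hypothetical, and the sign convention in `first_axis` (largest-magnitude loading made positive) is one arbitrary choice among several.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
# Two independent features on very different scales (hypothetical units).
income = rng.normal(50_000, 15_000, size=n)
age = rng.normal(40, 10, size=n)
X = np.column_stack([income, age])

def first_axis(X):
    """Return the first principal axis with a deterministic sign."""
    X_c = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_c, full_matrices=False)
    v = Vt[0]
    # Both v and -v are valid; flip so the largest-magnitude loading is positive.
    return v * np.sign(v[np.argmax(np.abs(v))])

# Without standardization the large-scale feature dominates the first axis.
raw_axis = first_axis(X)

# After z-scoring, both features carry comparable weight.
std_axis = first_axis((X - X.mean(axis=0)) / X.std(axis=0, ddof=1))
```

Fixing the sign this way only makes results reproducible across runs and libraries; it does not give the direction any meaning of its own.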