---
id: wiki-2026-0508-factor-analysis
title: Factor Analysis
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [FA, EFA, CFA, PCA-vs-FA, latent factor, Spearman g, Big Five]
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: [statistics, factor-analysis, latent-variable, dimensionality-reduction, psychometrics, sem]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python / R
  framework: factor_analyzer / lavaan / scikit-learn
---

# Factor Analysis

## One-liner

> **"Explain many observed variables with a few latent factors."**

EFA (exploratory) discovers the factor structure. CFA (confirmatory) tests a hypothesized structure. Unlike PCA, FA decomposes each variable into common (latent) factors plus unique error. Classic examples: Spearman's g, the Big Five.

## Core

### Model

```
x = Λf + ε      (one observation: x ∈ ℝ^p, f ∈ ℝ^k, ε ∈ ℝ^p)
X = FΛᵀ + E     (data matrix form)
Σ = ΛΛᵀ + Ψ     (implied covariance with orthogonal, unit-variance factors)
```

- X: observed data (n×p).
- F: latent factor scores (n×k).
- Λ: loadings (p×k).
- E (ε per row): unique errors (n×p), uncorrelated with F; their covariance Ψ is diagonal.

### PCA vs FA

- **PCA**: maximizes explained variance; each component is a linear combination of the observed variables, with no error term.
- **FA**: explains the covariance (shared variance) among variables through latent factors; each variable also keeps its own unique error.

### EFA vs CFA

- **EFA**: the number and structure of factors are unknown.
- **CFA**: confirms a hypothesized structure (fitted within SEM).

### Steps (EFA)

1. **KMO + Bartlett**: check factorability.
2. **Number of factors**: scree plot, parallel analysis, MAP (Velicer's minimum average partial).
3. **Extract**: PAF (principal axis factoring), ML (maximum likelihood).
4. **Rotate**: varimax (orthogonal), oblimin (oblique).
5. **Interpret**.

### Applications

1. **Psychometrics**: the Big Five.
2. **Marketing**: brand perception.
3. **Finance**: risk factors.
4. **Bioinformatics**: gene expression.
5. **NLP**: word factors.
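A minimal numpy-only sketch of what the factor model implies. Written per observation as x = Λf + ε with orthogonal, unit-variance factors, the model implies the covariance Σ = ΛΛᵀ + Ψ, where Ψ is the diagonal matrix of unique variances. The helpers `implied_covariance` and `simulate_factor_model` are hypothetical names, and the loadings are made-up illustration values, not taken from any dataset. The block simulates data from the model, checks the sample covariance against Σ, and contrasts the eigenvalues PCA works with (the full Σ) with those of the rank-k common part ΛΛᵀ that FA models.

```python
import numpy as np


def implied_covariance(loadings, uniquenesses):
    """Model-implied covariance Sigma = Lambda Lambda^T + Psi (orthogonal, unit-variance factors)."""
    return loadings @ loadings.T + np.diag(uniquenesses)


def simulate_factor_model(loadings, uniquenesses, n, seed=0):
    """Draw n rows of X = F Lambda^T + E with F ~ N(0, I_k) and E ~ N(0, diag(psi))."""
    rng = np.random.default_rng(seed)
    p, k = loadings.shape
    F = rng.standard_normal((n, k))                          # latent factor scores (n x k)
    E = rng.standard_normal((n, p)) * np.sqrt(uniquenesses)  # unique errors (n x p)
    return F @ loadings.T + E                                # observed data (n x p)


# Hypothetical 2-factor, 6-item example: items 1-3 load on F1, items 4-6 on F2.
LAMBDA = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
communality = (LAMBDA ** 2).sum(axis=1)  # h_i^2: variance each item shares with the factors
PSI = 1.0 - communality                  # uniqueness, chosen so every item has unit variance

sigma = implied_covariance(LAMBDA, PSI)
X = simulate_factor_model(LAMBDA, PSI, n=20_000)
sample_cov = np.cov(X, rowvar=False)
print(np.round(sigma, 2))
print("max |sample - implied|:", np.abs(sample_cov - sigma).max())

# PCA decomposes the full matrix (unique variance included);
# FA's common part is the reduced matrix Lambda Lambda^T, which has rank k.
pca_eigs = np.linalg.eigvalsh(sigma)[::-1]
reduced_eigs = np.linalg.eigvalsh(sigma - np.diag(PSI))[::-1]
print("PCA eigenvalues:    ", np.round(pca_eigs, 3))
print("Reduced eigenvalues:", np.round(reduced_eigs, 3))
```

The largest PCA eigenvalue exceeds the largest eigenvalue of ΛΛᵀ because PCA also absorbs the unique variance Ψ into its components; this is the practical content of "PCA maximizes variance, FA explains covariance".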
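## 💻 Patterns

### Factorability check (Python)

```python
from factor_analyzer.factor_analyzer import calculate_kmo, calculate_bartlett_sphericity

# df: pandas DataFrame of numeric item responses (rows = respondents, columns = items)

# Bartlett's test of sphericity: H0 = the correlation matrix is an identity matrix
chi_sq, p = calculate_bartlett_sphericity(df)
print(f'Bartlett: chi2={chi_sq:.2f}, p={p:.4f}')  # p < 0.05: items are correlated enough to factor

# Kaiser-Meyer-Olkin measure of sampling adequacy
kmo_all, kmo_model = calculate_kmo(df)  # per-item KMO, overall KMO
print(f'KMO: {kmo_model:.2f}')  # > 0.6 acceptable, > 0.8 great
```

### Scree plot (Kaiser criterion)

```python
import matplotlib.pyplot as plt
from factor_analyzer import FactorAnalyzer

fa = FactorAnalyzer(rotation=None)
fa.fit(df)
ev, _ = fa.get_eigenvalues()  # eigenvalues of the full correlation matrix
plt.plot(range(1, len(ev) + 1), ev, 'o-')
plt.axhline(1, color='red', ls='--')  # Kaiser criterion: keep eigenvalues > 1
plt.xlabel('Factor')
plt.ylabel('Eigenvalue')
plt.title('Scree')
plt.show()
```

### EFA (varimax rotation)

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

N_FACTORS = 5
fa = FactorAnalyzer(n_factors=N_FACTORS, rotation='varimax')
fa.fit(df)
loadings = pd.DataFrame(fa.loadings_, index=df.columns,
                        columns=[f'F{i+1}' for i in range(N_FACTORS)])
print(loadings.round(2))
```

### Interpretation (high-loading items)

```python
def interpret_factors(loadings, threshold=0.4):
    """Print the items whose absolute loading on each factor exceeds `threshold`."""
    for col in loadings.columns:
        items = loadings[loadings[col].abs() > threshold].index.tolist()
        print(f'{col}: {items}')
```

### CFA (lavaan-style syntax in semopy)

```python
from semopy import Model

# =~ defines a latent factor by its indicators; ~~ a (co)variance
desc = """
Conscientiousness =~ orderly + reliable + careful
Openness =~ creative + curious + imaginative
Extraversion =~ sociable + assertive + energetic
Conscientiousness ~~ Openness
"""
model = Model(desc)
model.fit(df)  # df columns must match the item names above
print(model.inspect())  # parameter estimates, standard errors, p-values
```

### Solution quality (loadings, cross-loadings, communality)

```python
def factor_quality(loadings, threshold=0.4):
    """Quick diagnostics for an orthogonally rotated loading matrix."""
    return {
        # mean absolute loading per factor
        'avg_loading': loadings.abs().mean(),
        # number of items loading above `threshold` on more than one factor
        'cross_loadings': (loadings.abs() > threshold).sum(axis=1).gt(1).sum(),
        # communality = sum of squared loadings (valid for orthogonal rotations only)
        'low_communality': (loadings.pow(2).sum(axis=1) < 0.3).sum(),
    }
```

### Reliability (Cronbach's α)

```python
def cronbach_alpha(items):
    """Internal consistency of the items (DataFrame columns) that belong to one factor.

    Reverse-keyed items should be recoded before calling this.
    """
    k = items.shape[1]
    item_var_sum = items.var(ddof=1).sum()     # sum of individual item variances
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale score
    return k / (k - 1) * (1 - item_var_sum / total_var)
```

### Big Five inventory

```python
# Illustrative adjective markers per trait (assumed to be column names in df)
BIG_FIVE_ITEMS = {
    'Openness':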
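        ['imaginative', 'curious', 'creative', 'broad_interest'],
    'Conscientiousness': ['organized', 'thorough', 'reliable', 'efficient'],
    'Extraversion': ['outgoing', 'energetic', 'assertive', 'talkative'],
    'Agreeableness': ['kind', 'trusting', 'cooperative', 'forgiving'],
    'Neuroticism': ['anxious', 'moody', 'stress', 'worry'],
}
```

### Number of factors (parallel analysis)

```python
import numpy as np

def parallel_analysis(df, n_iter=100, seed=0):
    """Horn's parallel analysis: compare eigenvalues with those of random normal data of the same shape."""
    rng = np.random.default_rng(seed)
    n, p = df.shape
    rand_eigs = []
    for _ in range(n_iter):
        rand = rng.standard_normal((n, p))
        rand_eigs.append(np.linalg.eigvalsh(np.corrcoef(rand, rowvar=False))[::-1])
    threshold = np.percentile(rand_eigs, 95, axis=0)  # 95th percentile per eigenvalue rank
    actual = np.linalg.eigvalsh(np.corrcoef(np.asarray(df, dtype=float), rowvar=False))[::-1]
    n_factors = 0
    for a, t in zip(actual, threshold):  # keep leading eigenvalues above the random threshold
        if a <= t:
            break
        n_factors += 1
    return n_factors
```

### MIMIC / SEM

```python
from semopy import Model

# Measurement part: Latent is indicated by x1, x2, x3.
# Structural part: Latent is predicted by observed covariates (sex coded numerically, e.g. 0/1).
desc = """
Latent =~ x1 + x2 + x3
Latent ~ age + sex
"""
model = Model(desc)
model.fit(df)
```

### Factor scores (after fit)

```python
# fa: the FactorAnalyzer fitted in the EFA step
factor_scores = fa.transform(df)  # n x n_factors array
df['factor_1'] = factor_scores[:, 0]
```

### Bayesian FA (PyMC)

```python
import pymc as pm

X = df.to_numpy(dtype=float)  # n x p observed data
n, p = X.shape
k = 2  # number of factors (illustrative choice)

with pm.Model() as bfa:
    L = pm.Normal('L', 0, 1, shape=(p, k))      # loadings
    F = pm.Normal('F', 0, 1, shape=(n, k))      # factor scores
    sigma = pm.HalfNormal('sigma', 1, shape=p)  # unique standard deviations
    pm.Normal('x', mu=F @ L.T, sigma=sigma, observed=X)
    # L and F are only identified up to rotation and sign; constraining L
    # (e.g. lower-triangular with a positive diagonal) helps sampling.
    trace = pm.sample()
```

## Decision criteria

| Situation | Approach |
|---|---|
| Discover structure | EFA + parallel analysis |
| Test a hypothesis | CFA (semopy / lavaan) |
| Pure dimensionality reduction | PCA |
| Latent construct + measurement error | FA |
| Psychometrics | EFA → CFA |
| Latent variable with causes (covariates) | SEM (MIMIC) |

**Default**: EFA → number of factors via parallel analysis → oblimin rotation → confirm the structure with CFA (ideally on a separate sample) → reliability check.

## 🔗 Graph

- Parent: [[Statistics]]
- Variants: [[EFA]] · [[CFA]] · [[SEM]]
- Applications: [[Big Five]]
- Adjacent: [[PCA]]

## 🤖 LLM usage

**When**: questionnaires; measuring latent constructs.

**When not**: pure dimensionality reduction (use PCA).

## ❌ Anti-patterns

- **Confusing PCA with FA**: they are different models; PCA has no unique-error term.
- **No factorability check**: garbage in, garbage out.
- **Extracting too many factors**: the extra factors fit noise.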
- **Interpreting without rotation**: unrotated loadings are usually uninterpretable.
- **No reliability check**: without it, the factor scale cannot be trusted.

## 🧪 Verification / duplicates

- Verified (Spearman 1904; Thurstone; Costa & McCrae, Big Five).
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-04-26 | STAT-FACTOR auto |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: EFA / CFA + KMO / scree / varimax / Cronbach code |