---
id: wiki-2026-0508-factor-analysis
title: Factor Analysis
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [FA, EFA, CFA, PCA-vs-FA, latent factor, Spearman g, Big Five]
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: [statistics, factor-analysis, latent-variable, dimensionality-reduction, psychometrics, sem]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python / R
  framework: factor_analyzer / lavaan / scikit-learn
---

# Factor Analysis

## One-liner
> **"Latent factors explain the observed variables."** EFA (exploratory) discovers the structure; CFA (confirmatory) tests a hypothesized structure. Unlike PCA, FA decomposes each variable into a latent common part plus unique error. Famous examples: Spearman's g, the Big Five.

## Core concepts

### Model
```
X = FΛᵀ + ε
```
- X: observed data (n×p).
- F: latent factor scores (n×k).
- Λ: loadings (p×k).
- ε: unique error (n×p).
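
If the factors are standardized and uncorrelated with each other and with the errors (an assumption added here, not stated above), the model implies the covariance structure Σ = ΛΛᵀ + Ψ, where Ψ is the diagonal matrix of unique variances. A minimal simulation sketch with hypothetical loadings can check this numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 20_000, 6, 2

# Hypothetical loadings: items 0-2 load on factor 1, items 3-5 on factor 2.
Lambda = np.array([
    [0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
    [0.0, 0.8], [0.0, 0.7], [0.0, 0.6],
])
psi = 1.0 - (Lambda ** 2).sum(axis=1)   # unique variances, chosen so each item has variance 1

F = rng.standard_normal((n, k))                    # latent factor scores (n x k)
eps = rng.standard_normal((n, p)) * np.sqrt(psi)   # unique errors (n x p)
X = F @ Lambda.T + eps                             # observed data (n x p)

implied_cov = Lambda @ Lambda.T + np.diag(psi)     # model-implied covariance
sample_cov = np.cov(X, rowvar=False)
max_abs_diff = np.abs(sample_cov - implied_cov).max()
print(f"max |sample - implied| = {max_abs_diff:.3f}")
```

The difference shrinks as n grows; the off-diagonal entries of Σ come entirely from the shared factors, which is why FA is described as a model of covariance.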

### PCA vs FA
- **PCA**: maximizes explained total variance; each component is a linear combination of the observed variables.
- **FA**: explains the covariance among variables through latent factors plus variable-specific error.
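
The contrast can be sketched with scikit-learn on synthetic data in which one item is mostly noise. The data-generating values (`loadings`, `noise_sd`) are hypothetical, chosen only to make the difference visible:

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)
n = 5_000
# One hypothetical latent factor drives four items; noise differs per item.
loadings = np.array([1.0, 1.0, 1.0, 1.0])
noise_sd = np.array([0.3, 0.3, 0.3, 3.0])   # the last item is mostly noise
f = rng.standard_normal((n, 1))
X = f * loadings + rng.standard_normal((n, 4)) * noise_sd

pca = PCA(n_components=1).fit(X)
fa = FactorAnalysis(n_components=1, random_state=0).fit(X)

# PCA targets total variance, so the high-variance noisy item dominates.
pca_weights = np.abs(pca.components_[0])
# FA estimates a separate noise variance per item and keeps it out of the factor.
fa_loadings = np.abs(fa.components_[0])
fa_noise = fa.noise_variance_

print("PCA weights:       ", pca_weights.round(2))
print("FA loadings:       ", fa_loadings.round(2))
print("FA noise variances:", fa_noise.round(2))
```

On data like this, the first principal component is pulled toward the noisy item, while FA assigns that item a large unique variance and leaves its loading comparable to the others.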

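### EFA vs CFA
- **EFA**: the number of factors and the loading pattern are unknown and estimated from the data.
- **CFA**: a pre-specified factor structure is tested, usually within an SEM framework.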

### Steps (EFA)
1. **KMO + Bartlett**: check factorability.
2. **Number of factors**: scree plot, parallel analysis, MAP.
3. **Extract**: PAF (principal axis factoring) or ML (maximum likelihood).
4. **Rotate**: varimax (orthogonal) or oblimin (oblique).
5. **Interpret**.
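
To make the rotation step concrete, here is a minimal NumPy sketch of the commonly used SVD-based varimax iteration. It is an illustration, not the implementation used by `factor_analyzer`; the function name `varimax` and the toy loading matrix are assumptions made for this example.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate a (p x k) loading matrix toward simple structure."""
    p, k = loadings.shape
    R = np.eye(k)          # current rotation matrix
    d_old = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # Gradient of the varimax criterion, mapped back to a rotation via SVD.
        u, s, vt = np.linalg.svd(
            loadings.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p)
        )
        R = u @ vt
        d_new = s.sum()
        if d_old != 0 and d_new / d_old < 1 + tol:
            break          # criterion has stopped improving
        d_old = d_new
    return loadings @ R, R

# Toy example: a clean two-factor pattern, deliberately mixed by a 30-degree rotation.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
theta = np.deg2rad(30)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
unrotated = simple @ mix              # every item now loads on both factors
rotated, R = varimax(unrotated)
print(np.round(rotated, 2))
```

Rotation changes only how variance is split across factors: each item's communality (row sum of squared loadings) is the same before and after, and the recovered pattern may differ from `simple` by column order or sign.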

### Applications
1. **Psychometrics**: Big Five personality traits.
2. **Marketing**: brand perception.
3. **Finance**: risk factors.
4. **Bioinformatics**: gene expression.
5. **NLP**: latent factors over words.

## 💻 Patterns

### Factorability check (Python)
```python
from factor_analyzer.factor_analyzer import calculate_kmo, calculate_bartlett_sphericity

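# df: DataFrame of numeric item responses (rows = respondents, columns = items)
chi_sq, p = calculate_bartlett_sphericity(df)
print(f'Bartlett: chi2={chi_sq:.2f}, p={p:.4f}')  # p < 0.05: correlation matrix is not identity, OK to factor

kmo_all, kmo_model = calculate_kmo(df)  # per-item KMO values, overall KMO
print(f'KMO: {kmo_model:.2f}')  # > 0.6 acceptable, > 0.8 great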
```
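
For intuition about what the KMO statistic measures, the overall value can be computed directly from the correlation matrix: it compares squared correlations with squared partial correlations. The sketch below is a from-scratch illustration on synthetic one-factor data; `kmo_overall` is a hypothetical helper, not part of `factor_analyzer`.

```python
import numpy as np

def kmo_overall(X):
    """Overall KMO: squared correlations vs. squared partial correlations."""
    R = np.corrcoef(X, rowvar=False)
    R_inv = np.linalg.inv(R)
    # Partial correlations are the sign-flipped, rescaled entries of the inverse.
    d = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(d, d)
    off = ~np.eye(R.shape[0], dtype=bool)     # mask selecting off-diagonal entries
    r2 = (R[off] ** 2).sum()
    q2 = (partial[off] ** 2).sum()
    return r2 / (r2 + q2)

rng = np.random.default_rng(0)
f = rng.standard_normal((500, 1))                    # one hypothetical common factor
X = 0.8 * f + 0.6 * rng.standard_normal((500, 6))    # six items, each loading 0.8
kmo = kmo_overall(X)
print(f"KMO = {kmo:.2f}")
```

When a common factor drives the items, partial correlations are small relative to raw correlations and KMO approaches 1; for unrelated items the two are of similar size and KMO sits near 0.5.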

### Scree plot (Kaiser criterion)
```python
import matplotlib.pyplot as plt
from factor_analyzer import FactorAnalyzer

fa = FactorAnalyzer(rotation=None)
fa.fit(df)
ev, v = fa.get_eigenvalues()  # ev: original eigenvalues, v: common-factor eigenvalues
plt.plot(range(1, len(ev) + 1), ev, 'o-')
plt.axhline(1, color='red', ls='--')  # Kaiser criterion: eigenvalue > 1
plt.title('Scree')
plt.show()
```

### EFA (varimax rotation)
```python
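import pandas as pd
from factor_analyzer import FactorAnalyzer
# n_factors=5 assumes a Big Five-style questionnaire; choose it from scree / parallel analysis
fa = FactorAnalyzer(n_factors=5, rotation='varimax').fit(df)
loadings = pd.DataFrame(fa.loadings_, index=df.columns, columns=[f'F{i+1}' for i in range(5)])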
print(loadings.round(2))
```

### Interpretation (high-loading items)
```python
def interpret_factors(loadings, threshold=0.4):
    """Print, for each factor, the items whose |loading| exceeds the threshold."""
    for col in loadings.columns:
        items = loadings[loadings[col].abs() > threshold].index.tolist()
        print(f'{col}: {items}')
```

### CFA (lavaan-style in semopy)
```python
from semopy import Model
desc = """
Conscientiousness =~ orderly + reliable + careful
Openness =~ creative + curious + imaginative
Extraversion =~ sociable + assertive + energetic
Conscientiousness ~~ Openness
"""
model = Model(desc)
model.fit(df)
print(model.inspect())
```

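### Factor quality (loading diagnostics)
```python
def factor_quality(loadings):
    """Summarize a loadings DataFrame (items x factors)."""
    return {
        'avg_loading': loadings.abs().mean(),  # mean |loading| per factor (Series)
        'cross_loadings': (loadings.abs() > 0.4).sum(axis=1).gt(1).sum(),  # items loading > 0.4 on 2+ factors
        'low_communality': (loadings.abs().pow(2).sum(axis=1) < 0.3).sum(),  # items with communality < 0.3
    }
```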

### Reliability (Cronbach α)
```python
def cronbach_alpha(items):
    """Internal consistency of the items (DataFrame columns) belonging to one factor."""
    k = items.shape[1]  # number of items
    return k / (k - 1) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))
```

### Big Five inventory
```python
BIG_FIVE_ITEMS = {
    'Openness': ['imaginative', 'curious', 'creative', 'broad_interest'],
    'Conscientiousness': ['organized', 'thorough', 'reliable', 'efficient'],
    'Extraversion': ['outgoing', 'energetic', 'assertive', 'talkative'],
    'Agreeableness': ['kind', 'trusting', 'cooperative', 'forgiving'],
    'Neuroticism': ['anxious', 'moody', 'stress', 'worry'],
}
```

### Number of factors (parallel analysis)
```python
import numpy as np

def parallel_analysis(df, n_iter=100):
    """Count eigenvalues above the 95th percentile of eigenvalues from random normal data."""
    n, p = df.shape
    rand_eigs = []
    for _ in range(n_iter):
        rand = np.random.normal(0, 1, (n, p))  # uncorrelated data of the same shape
        ev = np.linalg.eigvalsh(np.corrcoef(rand.T))[::-1]  # eigenvalues, descending
        rand_eigs.append(ev)
    threshold = np.percentile(rand_eigs, 95, axis=0)
    actual = np.linalg.eigvalsh(np.corrcoef(df.to_numpy().T))[::-1]
    return np.sum(actual > threshold)
```
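
A common variant builds the reference distribution by permuting each column of the observed data instead of drawing normal noise, which keeps each item's marginal distribution intact. The sketch below is an illustrative variant, not the function above; `parallel_analysis_permutation` and the synthetic two-factor data are assumptions for this example.

```python
import numpy as np

def parallel_analysis_permutation(X, n_iter=200, percentile=95, seed=0):
    """Count leading eigenvalues that exceed those of column-permuted data."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    perm_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        # Shuffling each column independently keeps marginals, breaks correlations.
        shuffled = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
        perm_eigs[i] = np.linalg.eigvalsh(np.corrcoef(shuffled, rowvar=False))[::-1]
    threshold = np.percentile(perm_eigs, percentile, axis=0)
    actual = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    exceeds = actual > threshold
    # Retain factors only up to the first eigenvalue that fails the threshold.
    return p if exceeds.all() else int(np.argmin(exceeds))

# Synthetic data with two hypothetical factors, three items each.
rng = np.random.default_rng(1)
F = rng.standard_normal((500, 2))
Lambda = np.array([[0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
                   [0.0, 0.8], [0.0, 0.8], [0.0, 0.8]])
X = F @ Lambda.T + 0.6 * rng.standard_normal((500, 6))
n_factors = parallel_analysis_permutation(X)
print(n_factors)
```

Stopping at the first eigenvalue that fails the threshold avoids counting a late eigenvalue that happens to exceed its reference value by chance.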

### MIMIC / SEM
```python
desc = """
# measurement model
Latent =~ x1 + x2 + x3
# structural model
Latent ~ age + sex
"""
```

### Factor scores (after fit)
```python
factor_scores = fa.transform(df)
df['factor_1'] = factor_scores[:, 0]
```

### Bayesian FA (PyMC)
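```python
import pymc as pm

# X: standardized data as a NumPy array (n x p); k is an assumed number of factors
n, p = X.shape
k = 2
with pm.Model() as bfa:
    L = pm.Normal('L', 0, 1, shape=(p, k))       # loadings
    F = pm.Normal('F', 0, 1, shape=(n, k))       # latent factor scores
    sigma = pm.HalfNormal('sigma', 1, shape=p)   # unique error SD per variable
    pm.Normal('x', mu=F @ L.T, sigma=sigma, observed=X)
    # Loadings are identified only up to rotation and sign, so chains may disagree.
    trace = pm.sample()
```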

## Decision criteria
| Situation | Approach |
|---|---|
| Discover structure | EFA + parallel analysis |
| Test hypothesis | CFA (semopy / lavaan) |
| Pure dim reduction | PCA |
| Latent + measurement error | FA |
| Psychometrics | EFA → CFA |
| Causal latent | SEM (MIMIC) |

**Default**: EFA → choose the number of factors (parallel analysis) → oblimin rotation → confirm the hypothesized structure with CFA → check reliability.

## 🔗 Graph
- Parent: [[Statistics]]
- Variants: [[EFA]] · [[CFA]] · [[SEM]]
- Applications: [[Big Five]]
- Adjacent: [[PCA]]

## 🤖 LLM usage
**When**: questionnaire data; measuring latent constructs.
**When not**: pure dimensionality reduction (use PCA).

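## ❌ Anti-patterns
- **Confusing PCA with FA**: they are different models.
- **No factorability check**: garbage in, garbage out.
- **Extracting too many factors**: the extra factors fit noise.
- **Interpreting without rotation**: unrotated loadings are hard to interpret.
- **No reliability check**: the factor cannot be trusted.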

## 🧪 Verification / duplicates
- Verified (Spearman 1904, Thurstone, Costa & McCrae Big Five).
- Trust level: A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-04-26 | STAT-FACTOR auto |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: EFA / CFA + KMO / scree / varimax / Cronbach code |