2nd/10_Wiki/Topics/AI_and_ML/Factor-Analysis.md
Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
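Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture and unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Link-name normalization: fixed broken links, pointed links directly at redirect targets, mapped concepts (~2,400 changes)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections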

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00


id: wiki-2026-0508-factor-analysis
title: Factor Analysis
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: FA, EFA, CFA, PCA-vs-FA, latent factor, Spearman g, Big Five
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: statistics, factor-analysis, latent-variable, dimensionality-reduction, psychometrics, sem
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python / R
  framework: factor_analyzer / lavaan / scikit-learn

Factor Analysis

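In one line

Factor analysis explains a set of observed variables through a small number of latent factors. EFA (exploratory) discovers the factor structure; CFA (confirmatory) tests a hypothesized one. Unlike PCA, FA decomposes each variable into a latent (common) part and a unique error part. Famous examples: Spearman's g and the Big Five.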

Key points

Model

X = FΛᵀ + ε   (per observation: x = Λf + ε)
  • X: observed data (n×p).
  • F: latent factors (n×k).
  • Λ: loadings (p×k).
  • ε: unique errors (n×p), uncorrelated across variables.
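
To make the dimensions concrete, here is a minimal NumPy sketch that simulates data from the model above and checks the covariance it implies, Σ = ΛΛᵀ + Ψ, where Ψ is the diagonal matrix of unique variances. The sizes, loadings, and seed are illustrative assumptions, not values from this note.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, illustrative only
n, p, k = 20_000, 6, 2          # assumed sample size, variables, factors

# Hypothetical loading matrix Λ (p×k) with a simple two-factor structure.
Lambda = np.array([
    [0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
    [0.0, 0.8], [0.0, 0.7], [0.0, 0.6],
])
# Unique variances chosen so that every observed variable has variance 1.
psi = 1.0 - (Lambda ** 2).sum(axis=1)

F = rng.standard_normal((n, k))                    # latent factors, n×k
eps = rng.standard_normal((n, p)) * np.sqrt(psi)   # unique errors, n×p
X = F @ Lambda.T + eps                             # observed data, n×p

implied_cov = Lambda @ Lambda.T + np.diag(psi)     # Σ = ΛΛᵀ + Ψ
sample_cov = np.cov(X, rowvar=False)
max_abs_diff = np.abs(sample_cov - implied_cov).max()
print(f"max |sample - implied| = {max_abs_diff:.3f}")
```

The off-diagonal entries of the implied covariance come only from ΛΛᵀ, which is why FA is described as a model of covariance rather than of total variance.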

PCA vs FA

  • PCA: maximizes explained variance; each component is a linear combination of the observed variables.
  • FA: explains the covariance among variables through latent factors plus a unique error per variable.
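
The difference shows up numerically even in the simplest case. The sketch below builds the population correlation matrix of a one-factor model (an assumed loading of 0.7 on six hypothetical variables) and compares PCA loadings with a hand-rolled, one-factor principal axis factoring loop. It is an illustration of the idea, not the routine any particular library uses.

```python
import numpy as np

lam = 0.7   # assumed true loading of every variable on the single factor
p = 6       # assumed number of variables

# Population correlation matrix implied by the one-factor model:
# lam**2 off the diagonal, 1 on the diagonal.
R = np.full((p, p), lam ** 2)
np.fill_diagonal(R, 1.0)

def first_loadings(M):
    """Loadings on the leading eigenvector of a symmetric matrix M."""
    vals, vecs = np.linalg.eigh(M)        # eigenvalues in ascending order
    v = vecs[:, -1] * np.sqrt(vals[-1])
    return v if v.sum() >= 0 else -v      # fix the arbitrary sign

# PCA: decompose R as is, so unique variance on the diagonal leaks into the component.
pca_loadings = first_loadings(R)

# Principal axis factoring: put communalities on the diagonal and iterate.
h2 = np.full(p, 0.5)                      # arbitrary starting communalities
for _ in range(50):
    R_reduced = R.copy()
    np.fill_diagonal(R_reduced, h2)
    paf_loadings = first_loadings(R_reduced)
    h2 = paf_loadings ** 2                # updated communalities

print(pca_loadings.round(3))
print(paf_loadings.round(3))
```

In this setup the PCA loadings come out near 0.76 while the factoring loop settles at the true 0.70: PCA absorbs part of each variable's unique variance, whereas FA models it separately.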

EFA vs CFA

  • EFA: the number of factors and the loading pattern are not known in advance.
  • CFA: confirms a hypothesized structure (a special case of SEM).

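EFA steps

  1. KMO + Bartlett: check factorability.
  2. Number of factors: scree plot, parallel analysis, MAP (minimum average partial).
  3. Extract: PAF (principal axis factoring) or ML (maximum likelihood).
  4. Rotate: varimax (orthogonal) or oblimin (oblique).
  5. Interpret the rotated loadings.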
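
Step 4 is the least intuitive, so here is a small sketch of what an orthogonal varimax rotation is after. For two factors every orthogonal rotation is a single angle, so the sketch simply grid-searches the angle that maximizes the raw varimax criterion (the variance of squared loadings, summed over factors). Libraries use an iterative algorithm and usually Kaiser normalization instead; the function names and the loading matrix below are hypothetical.

```python
import numpy as np

def varimax_criterion(loadings):
    """Raw varimax criterion: sum over factors of the variance of squared loadings."""
    return (loadings ** 2).var(axis=0).sum()

def rotation_matrix(theta):
    """2×2 orthogonal rotation by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def varimax_2d(loadings, n_grid=901):
    """Grid-search the angle in [0°, 90°] that maximizes the criterion (two factors only)."""
    angles = np.linspace(0.0, np.pi / 2, n_grid)
    best = max(angles, key=lambda t: varimax_criterion(loadings @ rotation_matrix(t)))
    return loadings @ rotation_matrix(best), best

# Hypothetical simple structure: items 1-3 load on F1, items 4-6 on F2.
simple = np.array([[0.8, 0.0]] * 3 + [[0.0, 0.8]] * 3)
unrotated = simple @ rotation_matrix(np.deg2rad(30))   # blur it with a 30° rotation
rotated, best_angle = varimax_2d(unrotated)

print(np.abs(unrotated).round(2))   # every item loads on both factors
print(np.abs(rotated).round(2))     # each item loads on a single factor again
```

Rotation changes neither the communalities nor the fit of the model; it only picks the orientation of the factor axes that is easiest to read.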

Applications

  1. Psychometrics: Big Five personality traits.
  2. Marketing: brand perception.
  3. Finance: risk factors.
  4. Bioinformatics: gene expression.
  5. NLP: word factors.

💻 Patterns

Factorability check (Python)

from factor_analyzer.factor_analyzer import calculate_kmo, calculate_bartlett_sphericity

# df: pandas DataFrame of item responses (rows = respondents, columns = items)
chi_sq, p = calculate_bartlett_sphericity(df)
print(f'Bartlett: chi2={chi_sq:.2f}, p={p:.4f}')  # p < 0.05: correlations are not all zero

kmo_all, kmo_model = calculate_kmo(df)
print(f'KMO: {kmo_model:.2f}')  # > 0.6 acceptable, > 0.8 great
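
For readers who want to see what these two statistics measure, the following NumPy sketch implements the textbook formulas directly from a correlation matrix. It is not factor_analyzer's implementation, and the function names are hypothetical.

```python
import numpy as np

def kmo_overall(R):
    """Overall KMO: squared correlations relative to squared correlations plus squared partials."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)           # partial correlations given all other variables
    off = ~np.eye(R.shape[0], dtype=bool)     # mask selecting off-diagonal entries
    r2 = (R[off] ** 2).sum()
    q2 = (partial[off] ** 2).sum()
    return r2 / (r2 + q2)

def bartlett_statistic(R, n):
    """Bartlett's sphericity chi-square statistic and its degrees of freedom."""
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return chi2, dof

# Hypothetical correlation matrix: six variables, all pairwise correlations 0.49.
R_example = np.full((6, 6), 0.49)
np.fill_diagonal(R_example, 1.0)

kmo_value = kmo_overall(R_example)
chi2_value, dof_value = bartlett_statistic(R_example, n=300)
print(f"KMO = {kmo_value:.3f}, Bartlett chi2 = {chi2_value:.1f} on {dof_value} df")
```

KMO approaches 1 when the partial correlations are small relative to the raw correlations, which is what a shared latent factor produces. Bartlett's statistic is zero for an identity correlation matrix and grows as the determinant shrinks.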

Scree plot (Kaiser line)

import numpy as np
import matplotlib.pyplot as plt
from factor_analyzer import FactorAnalyzer

fa = FactorAnalyzer(rotation=None)
fa.fit(df)
ev, v = fa.get_eigenvalues()  # ev: eigenvalues of the correlation matrix; v: common-factor eigenvalues
plt.plot(range(1, len(ev) + 1), ev, 'o-')
plt.axhline(1, color='red', ls='--')  # Kaiser criterion: keep eigenvalues > 1
plt.xlabel('Factor')
plt.ylabel('Eigenvalue')
plt.title('Scree')
plt.show()

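EFA (varimax rotation)

import pandas as pd
from factor_analyzer import FactorAnalyzer

# n_factors=5 is an example; choose it from the scree plot or parallel analysis.
fa = FactorAnalyzer(n_factors=5, rotation='varimax').fit(df)
loadings = pd.DataFrame(fa.loadings_, index=df.columns, columns=[f'F{i+1}' for i in range(5)])
print(loadings.round(2))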

Interpretation (high-loading items)

def interpret_factors(loadings, threshold=0.4):
    for col in loadings.columns:
        items = loadings[loadings[col].abs() > threshold].index.tolist()
        print(f'{col}: {items}')

CFA (lavaan-style in semopy)

from semopy import Model
desc = """
Conscientiousness =~ orderly + reliable + careful
Openness =~ creative + curious + imaginative
Extraversion =~ sociable + assertive + energetic
Conscientiousness ~~ Openness
"""
model = Model(desc)
model.fit(df)
print(model.inspect())

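Loading quality (cross-loadings, communality)

def factor_quality(loadings):
    """Summary diagnostics for a loadings DataFrame (items × factors)."""
    return {
        # mean absolute loading per factor
        'avg_loading': loadings.abs().mean(),
        # number of items loading above 0.4 on more than one factor
        'cross_loadings': (loadings.abs() > 0.4).sum(axis=1).gt(1).sum(),
        # number of items whose communality (row sum of squared loadings) is below 0.3
        'low_communality': (loadings.abs().pow(2).sum(axis=1) < 0.3).sum(),
    }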

Reliability (Cronbach α)

def cronbach_alpha(items):
    """Internal consistency of the items (DataFrame columns) assigned to one factor."""
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))
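
A quick sanity check of the formula on synthetic data helps calibrate what α values mean. The sketch below re-implements the same formula on NumPy arrays and applies it to three made-up item sets; the data and seed are assumptions for illustration.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a 2-D array (rows = respondents, columns = items)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()   # sum of the item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of the sum score
    return k / (k - 1) * (1 - item_var_sum / total_var)

rng = np.random.default_rng(0)
trait = rng.standard_normal(1000)

identical = np.column_stack([trait, trait, trait])                               # same item three times
noisy = np.column_stack([trait + rng.standard_normal(1000) for _ in range(4)])   # trait plus noise
unrelated = rng.standard_normal((1000, 4))                                       # no shared trait

alpha_identical = cronbach_alpha(identical)
alpha_noisy = cronbach_alpha(noisy)
alpha_unrelated = cronbach_alpha(unrelated)
print(round(alpha_identical, 3), round(alpha_noisy, 3), round(alpha_unrelated, 3))
```

Identical items give α = 1, four items that each correlate about 0.5 with one another give roughly 0.8, and unrelated items give a value near 0.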

Big Five inventory

BIG_FIVE_ITEMS = {
    'Openness': ['imaginative', 'curious', 'creative', 'broad_interest'],
    'Conscientiousness': ['organized', 'thorough', 'reliable', 'efficient'],
    'Extraversion': ['outgoing', 'energetic', 'assertive', 'talkative'],
    'Agreeableness': ['kind', 'trusting', 'cooperative', 'forgiving'],
    'Neuroticism': ['anxious', 'moody', 'stress', 'worry'],
}

Number of factors (parallel analysis)

import numpy as np

def parallel_analysis(df, n_iter=100):
    """Count data eigenvalues above the 95th percentile of eigenvalues from random normal data of the same shape."""
    n, p = df.shape
    rand_eigs = []
    for _ in range(n_iter):
        rand = np.random.normal(0, 1, (n, p))
        ev = np.linalg.eigvalsh(np.corrcoef(rand.T))[::-1]  # descending order
        rand_eigs.append(ev)
    threshold = np.percentile(rand_eigs, 95, axis=0)
    actual = np.linalg.eigvalsh(np.corrcoef(np.asarray(df).T))[::-1]
    return int(np.sum(actual > threshold))
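
A common variant draws the reference eigenvalues from column-wise permutations of the real data instead of from normal noise, which keeps each variable's marginal distribution intact. A minimal sketch under that assumption follows; the function name, the seed handling, and the synthetic data are hypothetical.

```python
import numpy as np

def parallel_analysis_permutation(data, n_iter=100, percentile=95, seed=0):
    """Number of leading eigenvalues that exceed their permutation-based threshold."""
    rng = np.random.default_rng(seed)
    X = np.asarray(data, dtype=float)
    p = X.shape[1]
    actual = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    perm_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        # Shuffle each column independently: marginals are kept, correlations destroyed.
        shuffled = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
        perm_eigs[i] = np.linalg.eigvalsh(np.corrcoef(shuffled, rowvar=False))[::-1]
    threshold = np.percentile(perm_eigs, percentile, axis=0)
    exceeds = actual > threshold
    # Count leading eigenvalues only: stop at the first one that fails.
    return p if exceeds.all() else int(np.argmin(exceeds))

# Synthetic check: six items driven by two independent factors.
rng = np.random.default_rng(1)
Lambda = np.array([[0.8, 0.0]] * 3 + [[0.0, 0.8]] * 3)
F = rng.standard_normal((500, 2))
X = F @ Lambda.T + 0.6 * rng.standard_normal((500, 6))

n_factors = parallel_analysis_permutation(X)
print(n_factors)
```

Counting only the leading run of eigenvalues avoids retaining a late eigenvalue that happens to clear its threshold by chance.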

MIMIC / SEM

desc = """
# measurement model
Latent =~ x1 + x2 + x3
# structural model
Latent ~ age + sex
"""

Score factor (after fit)

factor_scores = fa.transform(df)
df['factor_1'] = factor_scores[:, 0]
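
One standard way to compute factor scores is the regression (Thurstone) method, which weights standardized items by R⁻¹Λ. The NumPy sketch below illustrates that method for orthogonal factors. It is not necessarily what `fa.transform` computes internally, and the loadings and data are hypothetical.

```python
import numpy as np

def regression_scores(X, loadings):
    """Regression-method factor scores Z R^-1 Λ, assuming orthogonal factors."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each item
    R = np.corrcoef(Z, rowvar=False)                   # item correlation matrix
    weights = np.linalg.solve(R, loadings)             # R^-1 Λ, shape p×k
    return Z @ weights

# Synthetic check: five items, one factor, assumed loading 0.8 on every item.
rng = np.random.default_rng(0)
f_true = rng.standard_normal(2000)
X = 0.8 * f_true[:, None] + 0.6 * rng.standard_normal((2000, 5))
loadings = np.full((5, 1), 0.8)

scores = regression_scores(X, loadings)
validity = np.corrcoef(scores[:, 0], f_true)[0, 1]
print(f"correlation between scores and true factor: {validity:.2f}")
```

Factor scores are estimates, not the latent values themselves; the correlation printed here is only observable because the data are simulated.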

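Bayesian FA (PyMC)

import pymc as pm

# X: observed data as an (n, p) NumPy array, assumed standardized; k is chosen by the analyst.
n, p = X.shape
k = 2
with pm.Model() as bfa:
    L = pm.Normal('L', 0, 1, shape=(p, k))       # loadings Λ
    F = pm.Normal('F', 0, 1, shape=(n, k))       # latent factor scores
    sigma = pm.HalfNormal('sigma', 1, shape=p)   # unique standard deviations
    pm.Normal('x', mu=F @ L.T, sigma=sigma, observed=X)
    # Loadings are identified only up to rotation and sign, so compare chains with care.
    trace = pm.sample()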

Decision criteria

| Situation | Approach |
|---|---|
| Discover structure | EFA + parallel analysis |
| Test a hypothesis | CFA (semopy / lavaan) |
| Pure dimensionality reduction | PCA |
| Latent variables + measurement error | FA |
| Psychometrics | EFA → CFA |
| Causal latent variables | SEM (MIMIC) |

Default: EFA → number of factors (parallel analysis) → oblimin rotation → CFA to confirm the hypothesis, plus a reliability check.

🔗 Graph

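🤖 LLM usage

When to use: questionnaire data; latent constructs.
When not to use: pure dimensionality reduction (use PCA).

Anti-patterns

  • Confusing PCA with FA: they are different models.
  • No factorability check: garbage in, garbage out.
  • Extracting too many factors: the extra factors fit noise.
  • Interpreting without rotation: the loadings are uninterpretable.
  • No reliability check: the factors cannot be trusted.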

🧪 Verification / duplicates

  • Verified (Spearman 1904; Thurstone; Costa & McCrae, Big Five).
  • Trust level: A.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-04-26 | STAT-FACTOR auto |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: EFA / CFA + KMO / scree / varimax / Cronbach code |