
id: wiki-2026-0508-factor-analysis
title: Factor Analysis
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: FA, EFA, CFA, PCA-vs-FA, latent factor, Spearman g, Big Five
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: statistics, factor-analysis, latent-variable, dimensionality-reduction, psychometrics, sem
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python / R
  framework: factor_analyzer / lavaan / scikit-learn

Factor Analysis

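In one line

Factor analysis explains a set of observed variables through a smaller number of latent factors. Exploratory factor analysis (EFA) discovers the structure; confirmatory factor analysis (CFA) tests a hypothesized structure. Unlike PCA, FA decomposes each observed variable into a latent part plus an error term. Famous examples: Spearman's g and the Big Five.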

Key points

Model

X = FΛᵀ + ε
  • X: observed data (n×p), one respondent per row.
  • F: latent factor scores (n×k).
  • Λ: loadings (p×k).
  • ε: unique error (n×p), specific to each observed variable.
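
To make the dimensions concrete, here is a minimal NumPy sketch that simulates data from the model in row form (X = FΛᵀ + ε, one observation per row) and checks that the sample covariance approaches ΛΛᵀ + Ψ. The loading values, sample size, and seed are illustrative assumptions, not taken from any dataset.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed (assumption) for reproducibility
n, p, k = 5000, 6, 2                    # observations, observed variables, factors

# Hypothetical loading matrix (p x k): items 1-3 load on factor 1, items 4-6 on factor 2
Lambda = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
psi = 1.0 - (Lambda ** 2).sum(axis=1)   # unique variances, chosen so each variable has variance 1

F = rng.standard_normal((n, k))                     # latent factor scores (n x k)
eps = rng.standard_normal((n, p)) * np.sqrt(psi)    # unique errors (n x p)
X = F @ Lambda.T + eps                              # observed data (n x p)

implied_cov = Lambda @ Lambda.T + np.diag(psi)      # model-implied covariance
sample_cov = np.cov(X, rowvar=False)
max_gap = np.abs(sample_cov - implied_cov).max()
print(f"largest gap between sample and implied covariance: {max_gap:.3f}")
```

The gap shrinks as n grows, which is the sense in which FA "explains the covariance" of the observed variables.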

PCA vs FA

  • PCA: maximizes explained variance; each component is a linear combination of the observed variables.
  • FA: explains the covariance among variables with latent factors plus variable-specific error.
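
The contrast can be written in terms of the covariance matrix Σ of the observed variables. This is a standard textbook formulation, assuming uncorrelated unit-variance factors and a diagonal matrix Ψ of unique variances; the symbols V, D, and Ψ are introduced here for illustration.

```latex
\begin{aligned}
\text{PCA:}\quad & \Sigma = V D V^{\top}, \qquad \text{components } Z = X V \\
\text{FA:}\quad  & \Sigma = \Lambda \Lambda^{\top} + \Psi, \qquad \Psi = \operatorname{diag}(\psi_1, \dots, \psi_p)
\end{aligned}
```

PCA redistributes all of the variance, including the diagonal, across components. FA attributes only the shared part ΛΛᵀ to the factors and leaves each ψ_j as variable-specific error.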

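EFA vs CFA

  • EFA: the number of factors and the loading pattern are unknown and estimated from the data.
  • CFA: a hypothesized factor structure is specified in advance and tested (a special case of SEM).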

Steps (EFA)

  1. KMO + Bartlett: check that the data are factorable.
  2. Number of factors: scree plot, parallel analysis, MAP.
  3. Extract: principal axis factoring (PAF) or maximum likelihood (ML).
  4. Rotate: varimax (orthogonal) or oblimin (oblique).
  5. Interpret.
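
The extraction step can be sketched in plain NumPy. The block below is a minimal iterated principal axis factoring routine, assuming a well-conditioned correlation matrix and no Heywood cases; the function name, defaults, and simulated data are illustrative and are not the API of factor_analyzer or any other library.

```python
import numpy as np

def principal_axis_factoring(R, n_factors, n_iter=200, tol=1e-6):
    """Iterated principal axis factoring on a correlation matrix R (p x p)."""
    R = np.asarray(R, dtype=float)
    # Initial communalities: squared multiple correlations, 1 - 1/diag(R^-1)
    communalities = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, communalities)     # replace the 1s on the diagonal
        eigvals, eigvecs = np.linalg.eigh(reduced)   # eigenvalues in ascending order
        top = np.argsort(eigvals)[::-1][:n_factors]  # indices of the largest eigenvalues
        loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
        updated = (loadings ** 2).sum(axis=1)
        converged = np.abs(updated - communalities).max() < tol
        communalities = updated
        if converged:
            break
    return loadings, communalities

# Hypothetical one-factor data: six items, each with a true loading of 0.8
rng = np.random.default_rng(1)
f = rng.standard_normal((2000, 1))
X = 0.8 * f + 0.6 * rng.standard_normal((2000, 6))
loadings, communalities = principal_axis_factoring(np.corrcoef(X, rowvar=False), n_factors=1)
print(np.abs(loadings).round(2).ravel())   # sign of a factor is arbitrary, so compare magnitudes
```

Replacing the diagonal of the correlation matrix with communalities is what separates PAF from PCA: only shared variance is factored.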

Applications

  1. Psychometrics: the Big Five personality traits.
  2. Marketing: brand perception.
  3. Finance: risk factors.
  4. Bioinformatics: gene expression.
  5. NLP: latent word factors.

💻 Patterns

Factorability check (Python)

from factor_analyzer.factor_analyzer import calculate_kmo, calculate_bartlett_sphericity

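# df: pandas DataFrame of numeric item responses (rows = respondents)
chi_sq, p = calculate_bartlett_sphericity(df)
print(f'Bartlett: chi2={chi_sq:.2f}, p={p:.4f}')  # p < 0.05: correlations differ from identity, OK to factor

kmo_all, kmo_model = calculate_kmo(df)
print(f'KMO: {kmo_model:.2f}')  # > 0.6 acceptable, > 0.8 great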
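
For intuition about what these two checks measure, here is a NumPy-only sketch of the textbook formulas: Bartlett's statistic from the log-determinant of the correlation matrix, and the overall KMO from correlations versus partial correlations. The function names and simulated data are assumptions for illustration; this is not the factor_analyzer implementation, and the p-value is omitted to avoid extra dependencies.

```python
import numpy as np

def bartlett_statistic(X):
    """Bartlett's sphericity chi-square statistic and its degrees of freedom."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)               # log|R|, numerically stable
    chi_sq = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return chi_sq, dof

def kmo_overall(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                         # partial correlations given all other variables
    off = ~np.eye(R.shape[0], dtype=bool)          # mask selecting off-diagonal entries
    r2 = (R[off] ** 2).sum()
    q2 = (partial[off] ** 2).sum()
    return r2 / (r2 + q2)

# Hypothetical data: six items driven by one common factor
rng = np.random.default_rng(3)
f = rng.standard_normal((500, 1))
X = 0.8 * f + 0.6 * rng.standard_normal((500, 6))
chi_sq, dof = bartlett_statistic(X)
kmo = kmo_overall(X)
print(f"Bartlett chi2={chi_sq:.1f} (df={dof}), KMO={kmo:.2f}")
```

KMO is high when partial correlations are small relative to raw correlations, meaning the variables share common variance rather than pairwise quirks.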

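Scree plot (Kaiser criterion)

import numpy as np
import matplotlib.pyplot as plt
from factor_analyzer import FactorAnalyzer

# Fit without rotation; only the eigenvalues are needed for the scree plot
fa = FactorAnalyzer(rotation=None)
fa.fit(df)
ev, v = fa.get_eigenvalues()  # ev: eigenvalues of the correlation matrix; v: common-factor eigenvalues
plt.plot(range(1, len(ev) + 1), ev, 'o-')
plt.axhline(1, color='red', ls='--')  # Kaiser criterion: eigenvalue > 1
plt.title('Scree')
plt.show()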

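EFA (varimax rotation)

import pandas as pd
from factor_analyzer import FactorAnalyzer

# n_factors=5 is an example value; choose it from the scree plot or parallel analysis
fa = FactorAnalyzer(n_factors=5, rotation='varimax').fit(df)
loadings = pd.DataFrame(fa.loadings_, index=df.columns, columns=[f'F{i+1}' for i in range(5)])
print(loadings.round(2))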
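
As a rough illustration of what an orthogonal varimax rotation does, the sketch below implements the commonly used SVD-based iteration in NumPy. The function name, stopping rule, and example loadings are assumptions for illustration and may differ from what factor_analyzer does internally.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a (p x k) loading matrix."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)                                   # start from no rotation
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ R
        # Gradient of the varimax criterion, mapped back to an orthogonal matrix via SVD
        grad = L.T @ (rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):   # stop when the improvement is negligible
            break
        criterion = new_criterion
    return L @ R, R

# Hypothetical example: a clean two-factor pattern mixed by a 30 degree rotation
theta = np.deg2rad(30)
mix = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0], [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
unrotated = simple @ mix
rotated, R = varimax(unrotated)
print(rotated.round(2))
```

Because R is orthogonal, the rotation changes how variance is split between factors but leaves each item's communality unchanged.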

Interpretation (high-loading items)

def interpret_factors(loadings, threshold=0.4):
    """Print, for each factor, the items whose absolute loading exceeds the threshold."""
    for col in loadings.columns:
        items = loadings[loadings[col].abs() > threshold].index.tolist()
        print(f'{col}: {items}')

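CFA (lavaan-style syntax in semopy)

from semopy import Model

# =~ defines a latent factor by its indicators; ~~ frees a covariance between factors.
# The indicator names must match columns of df.
desc = """
Conscientiousness =~ orderly + reliable + careful
Openness =~ creative + curious + imaginative
Extraversion =~ sociable + assertive + energetic
Conscientiousness ~~ Openness
"""
model = Model(desc)
model.fit(df)
print(model.inspect())  # parameter estimates with standard errors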

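Factor quality (loading magnitude)

def factor_quality(loadings):
    """Summarize a loadings DataFrame (items x factors)."""
    return {
        # mean absolute loading per factor
        'avg_loading': loadings.abs().mean(),
        # number of items loading above 0.4 on more than one factor
        'cross_loadings': (loadings.abs() > 0.4).sum(axis=1).gt(1).sum(),
        # number of items whose communality (sum of squared loadings) is below 0.3
        'low_communality': (loadings.abs().pow(2).sum(axis=1) < 0.3).sum(),
    }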

Reliability (Cronbach α)

def cronbach_alpha(items):
    """Internal consistency of the items belonging to one factor (DataFrame: respondents x items)."""
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))
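
A dependency-free check of the same formula, α = k/(k−1) · (1 − Σ var(item) / var(total)), can help when verifying results by hand. The helper name and the response data below are made up for illustration.

```python
from statistics import variance

def cronbach_alpha_rows(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]  # sample variance per item
    total_var = variance([sum(row) for row in rows])                    # variance of the sum score
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Made-up responses: 5 respondents x 3 items on a 1-5 scale
responses = [[4, 5, 4], [2, 2, 3], [5, 5, 5], [3, 2, 2], [1, 2, 1]]
alpha = cronbach_alpha_rows(responses)

# Sanity check: items that are exact copies of each other give alpha = 1
identical = cronbach_alpha_rows([[s, s, s] for s in (1, 2, 4, 5)])
print(round(alpha, 3), round(identical, 3))
```

Alpha rises with the number of items and with their average intercorrelation, so a high value alone does not show that a scale is unidimensional.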

Big Five inventory

BIG_FIVE_ITEMS = {
    'Openness': ['imaginative', 'curious', 'creative', 'broad_interest'],
    'Conscientiousness': ['organized', 'thorough', 'reliable', 'efficient'],
    'Extraversion': ['outgoing', 'energetic', 'assertive', 'talkative'],
    'Agreeableness': ['kind', 'trusting', 'cooperative', 'forgiving'],
    'Neuroticism': ['anxious', 'moody', 'stress', 'worry'],
}

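Number of factors (parallel analysis)

import numpy as np

def parallel_analysis(df, n_iter=100):
    """Count leading eigenvalues above the 95th percentile of eigenvalues from random normal data."""
    n, p = df.shape
    rand_eigs = []
    for _ in range(n_iter):
        rand = np.random.normal(0, 1, (n, p))
        ev = np.linalg.eigvalsh(np.corrcoef(rand.T))[::-1]  # descending order
        rand_eigs.append(ev)
    threshold = np.percentile(rand_eigs, 95, axis=0)
    actual = np.linalg.eigvalsh(np.corrcoef(np.asarray(df).T))[::-1]
    exceeds = actual > threshold
    # Retain factors only up to the first eigenvalue that fails to beat the random threshold
    return int(p if exceeds.all() else np.argmin(exceeds))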

MIMIC / SEM

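desc = """
# measurement model: the latent variable is defined by its indicators
Latent =~ x1 + x2 + x3
# structural model: observed covariates predict the latent variable
Latent ~ age + sex
"""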

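Factor scores (after fit)

# fa: a fitted FactorAnalyzer; df must have the same columns used for fitting
factor_scores = fa.transform(df)
df['factor_1'] = factor_scores[:, 0]  # score on the first factor for each respondent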
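
One common way to compute such scores is Thomson's regression method, sketched below in NumPy under the assumption of orthogonal factors and known loadings. The function name and simulated data are illustrative; the snippet does not claim to reproduce what `fa.transform` does internally.

```python
import numpy as np

def regression_factor_scores(X, loadings):
    """Regression-method factor scores: Z @ R^{-1} @ Lambda (orthogonal factors assumed)."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each variable
    R = np.corrcoef(X, rowvar=False)
    weights = np.linalg.solve(R, loadings)              # (p x k) scoring weights
    return Z @ weights

# Hypothetical data: one factor, six items with a known loading of 0.8
rng = np.random.default_rng(2)
true_f = rng.standard_normal((1000, 1))
Lambda = np.full((6, 1), 0.8)
X = true_f @ Lambda.T + 0.6 * rng.standard_normal((1000, 6))

scores = regression_factor_scores(X, Lambda)
r = np.corrcoef(scores[:, 0], true_f[:, 0])[0, 1]
print(f"correlation between estimated and true factor: {r:.2f}")
```

The scores are estimates, not the latent values themselves; their correlation with the true factor stays below 1 because of the unique error in each item.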

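Bayesian FA (PyMC)

import pymc as pm

# X: observed (n x p) array, assumed standardized; k: chosen number of factors
n, p = X.shape
with pm.Model() as bfa:
    L = pm.Normal('L', 0, 1, shape=(p, k))        # loadings
    F = pm.Normal('F', 0, 1, shape=(n, k))        # latent factor scores
    sigma = pm.HalfNormal('sigma', 1, shape=p)    # unique error scale per variable
    pm.Normal('x', mu=F @ L.T, sigma=sigma, observed=X)
    # Loadings are identified only up to rotation and sign, so inspect the chains with care
    trace = pm.sample()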

Decision criteria

| Situation | Approach |
|---|---|
| Discover structure | EFA + parallel analysis |
| Test hypothesis | CFA (semopy / lavaan) |
| Pure dimensionality reduction | PCA |
| Latent + measurement error | FA |
| Psychometrics | EFA → CFA |
| Causal latent | SEM (MIMIC) |

Default: EFA → number of factors (parallel analysis) → oblimin rotation → confirm the hypothesis with CFA + reliability check.

🔗 Graph

🤖 LLM usage

When to use: questionnaires and latent constructs. When not to use: pure dimensionality reduction (use PCA instead).

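Anti-patterns

  • Confusing PCA with FA: they are different models.
  • Skipping the factorability check: garbage in, garbage out.
  • Extracting too many factors: the extra factors fit noise.
  • Interpreting without rotation: the loadings are hard to interpret.
  • Skipping the reliability check: there is no basis for trusting the factor.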

🧪 Verification / duplicates

  • Verified against Spearman (1904), Thurstone, and Costa & McCrae (Big Five).
  • Trust level: A.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-04-26 | STAT-FACTOR auto |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: EFA / CFA + KMO / scree / varimax / Cronbach code |