


Factor Analysis

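In one line

Factor analysis explains observed variables through a smaller set of latent factors. Exploratory factor analysis (EFA) discovers the factor structure; confirmatory factor analysis (CFA) tests a hypothesized one. It differs from PCA in that FA decomposes each observed variable into a latent part plus a unique error. Famous examples: Spearman's g and the Big Five.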

Core concepts

Model

X = FΛᵀ + ε

X: observed data (n×p).
F: latent factor scores (n×k).
Λ: loadings (p×k).
ε: unique error (n×p).
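
The model can be made concrete with a small simulation. The sketch below assumes orthogonal standard-normal factors and an illustrative loading matrix (the values 0.8 and 0.7 are made up for the example). With observations in rows, data are generated as factor scores times transposed loadings plus unique noise, and the sample covariance is compared with the model-implied covariance ΛΛᵀ + Ψ, where Ψ is the diagonal matrix of unique variances.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 20000, 6, 2

# Hypothetical simple-structure loadings: items 1-3 on factor 1, items 4-6 on factor 2.
Lambda = np.array([[0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
                   [0.0, 0.7], [0.0, 0.7], [0.0, 0.7]])
# Unique variances chosen so that every observed variable has unit variance.
psi = 1.0 - (Lambda ** 2).sum(axis=1)

F = rng.standard_normal((n, k))                    # latent factor scores (n x k)
eps = rng.standard_normal((n, p)) * np.sqrt(psi)   # unique errors (n x p)
X = F @ Lambda.T + eps                             # observed data (n x p)

implied = Lambda @ Lambda.T + np.diag(psi)         # model-implied covariance
sample = np.cov(X, rowvar=False)                   # sample covariance
max_abs_gap = np.abs(sample - implied).max()
print(f'largest gap between sample and implied covariance: {max_abs_gap:.3f}')
```

The gap shrinks as n grows, which is the sense in which FA "explains the covariance" of the observed variables.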

PCA vs FA

PCA: maximizes explained variance; each component is a linear combination of the observed variables.
FA: explains the covariance among variables through latent factors plus a unique error per variable.
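
One way to see the difference is through eigenvalues. In the sketch below (illustrative loadings, not taken from any dataset), PCA works on the full covariance matrix, unique variances included, so all p eigenvalues are positive. The common part that FA models, the covariance with the unique variances removed, has only k nonzero eigenvalues.

```python
import numpy as np

# Hypothetical loadings for 6 variables on 2 orthogonal factors.
Lambda = np.array([[0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
                   [0.0, 0.7], [0.0, 0.7], [0.0, 0.7]])
psi = 1.0 - (Lambda ** 2).sum(axis=1)        # unique variances
Sigma = Lambda @ Lambda.T + np.diag(psi)     # full covariance of the observed variables

# PCA view: eigenvalues of the full covariance (shared + unique variance).
pca_eigvals = np.linalg.eigvalsh(Sigma)[::-1]
# FA view: eigenvalues of the common part only (unique variances removed).
common_eigvals = np.linalg.eigvalsh(Sigma - np.diag(psi))[::-1]

n_common = int(np.sum(common_eigvals > 1e-10))
print('PCA eigenvalues:   ', pca_eigvals.round(2))
print('common eigenvalues:', common_eigvals.round(2))
print('nonzero common eigenvalues:', n_common)
```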

EFA vs CFA

EFA: the number of factors is unknown in advance.
CFA: confirms a hypothesized structure (a special case of SEM).

Steps (EFA)

KMO + Bartlett: check factorability.
Number of factors: scree plot, parallel analysis, MAP (minimum average partial).
Extract: PAF (principal axis factoring), ML (maximum likelihood).
Rotate: varimax (orthogonal), oblimin (oblique).
Interpret.
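
The extraction step can be sketched by hand. Below is a minimal iterated principal axis factoring routine in NumPy, written for illustration only: the function name, the squared-multiple-correlation starting values, and the stopping rule are choices made here, not a description of how any particular library implements PAF. It is checked on a correlation matrix built from known (hypothetical) loadings.

```python
import numpy as np

def principal_axis_factoring(R, k, max_iter=200, tol=1e-8):
    """Iterated principal axis factoring on a correlation matrix R (p x p)."""
    R = np.asarray(R, dtype=float)
    # Initial communalities: squared multiple correlations.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)            # put communalities on the diagonal
        vals, vecs = np.linalg.eigh(reduced)     # eigenvalues in ascending order
        vals, vecs = vals[::-1][:k], vecs[:, ::-1][:, :k]
        loadings = vecs * np.sqrt(np.clip(vals, 0.0, None))
        new_h2 = (loadings ** 2).sum(axis=1)     # updated communalities
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2

# Population correlation matrix implied by hypothetical loadings.
true_loadings = np.array([[0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
                          [0.0, 0.7], [0.0, 0.7], [0.0, 0.7]])
R = true_loadings @ true_loadings.T
np.fill_diagonal(R, 1.0)

loadings, communalities = principal_axis_factoring(R, k=2)
print(np.abs(loadings).round(3))   # column signs are arbitrary
print(communalities.round(3))
```

The extracted loadings are unrotated and determined only up to sign (and, in general, rotation), which is why the rotation step follows.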

Applications

Psychometrics: Big Five.
Marketing: brand perception.
Finance: risk factors.
Bioinformatics: gene expression.
NLP: word factors.

💻 Patterns

Factorability check (Python)

from factor_analyzer.factor_analyzer import calculate_kmo, calculate_bartlett_sphericity

chi_sq, p = calculate_bartlett_sphericity(df)
print(f'Bartlett: chi2={chi_sq:.2f}, p={p:.4f}')  # p < 0.05: correlation matrix differs from identity, OK to factor

kmo_all, kmo_model = calculate_kmo(df)
print(f'KMO: {kmo_model:.2f}')  # > 0.6 acceptable, > 0.8 great
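
For intuition about what the two checks measure, here is a hand-rolled sketch using the textbook formulas: KMO compares squared correlations with squared partial correlations, and Bartlett's statistic is a scaled log-determinant of the correlation matrix. The helper names are hypothetical, and it is an assumption (not verified here) that the library functions above compute exactly the same quantities.

```python
import math
import numpy as np

def kmo_from_corr(R):
    """Overall KMO: sum r^2 / (sum r^2 + sum partial r^2), off-diagonal entries only."""
    R = np.asarray(R, dtype=float)
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    partial = -P / np.outer(d, d)               # partial correlations given all other variables
    off = ~np.eye(R.shape[0], dtype=bool)       # mask selecting off-diagonal entries
    r2 = (R[off] ** 2).sum()
    q2 = (partial[off] ** 2).sum()
    return r2 / (r2 + q2)

def bartlett_chi2(R, n):
    """Bartlett's sphericity statistic and its degrees of freedom for sample size n."""
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return chi2, dof

# Toy correlation matrix: three variables, all pairwise correlations 0.5.
R = np.full((3, 3), 0.5)
np.fill_diagonal(R, 1.0)

kmo = kmo_from_corr(R)                  # equals 9/13 for this matrix
chi2, dof = bartlett_chi2(R, n=100)
print(f'KMO={kmo:.3f}, chi2={chi2:.2f}, dof={dof}')
```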

Scree plot (with Kaiser line)

import numpy as np
import matplotlib.pyplot as plt
from factor_analyzer import FactorAnalyzer

fa = FactorAnalyzer(rotation=None)
fa.fit(df)
ev, v = fa.get_eigenvalues()
plt.plot(range(1, len(ev) + 1), ev, 'o-')
plt.axhline(1, color='red', ls='--')  # Kaiser criterion: keep eigenvalues > 1
plt.title('Scree')
plt.show()

EFA (varimax rotation)

import pandas as pd
from factor_analyzer import FactorAnalyzer

fa = FactorAnalyzer(n_factors=5, rotation='varimax').fit(df)
loadings = pd.DataFrame(fa.loadings_, index=df.columns, columns=[f'F{i+1}' for i in range(5)])
print(loadings.round(2))

Interpretation (high-loading items)

def interpret_factors(loadings, threshold=0.4):
    for col in loadings.columns:
        items = loadings[loadings[col].abs() > threshold].index.tolist()
        print(f'{col}: {items}')

CFA (lavaan-style in semopy)

from semopy import Model
desc = """
Conscientiousness =~ orderly + reliable + careful
Openness =~ creative + curious + imaginative
Extraversion =~ sociable + assertive + energetic
Conscientiousness ~~ Openness
"""
model = Model(desc)
model.fit(df)
print(model.inspect())
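
What "fitting" a CFA means can be sketched with the classical maximum-likelihood discrepancy between a sample covariance S and the model-implied covariance Σ = ΛΦΛᵀ + Θ. The example below uses a hypothetical two-factor pattern and made-up parameter values; it does not claim to reproduce the objective or optimizer that semopy uses internally.

```python
import numpy as np

def implied_cov(Lambda, Phi, Theta):
    """Model-implied covariance: loadings, factor covariance, unique variances."""
    return Lambda @ Phi @ Lambda.T + Theta

def ml_discrepancy(S, Sigma):
    """F_ML = ln|Sigma| - ln|S| + tr(S Sigma^-1) - p; zero when Sigma equals S."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma - logdet_s + np.trace(S @ np.linalg.inv(Sigma)) - p

# Hypothetical CFA pattern: each item loads on exactly one factor.
Lambda = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
Phi = np.array([[1.0, 0.3], [0.3, 1.0]])             # correlated factors
Theta = np.diag(1.0 - (Lambda ** 2).sum(axis=1))     # unique variances

S = implied_cov(Lambda, Phi, Theta)                  # stand-in for a sample covariance
fit_true = ml_discrepancy(S, S)                      # correct model
fit_wrong = ml_discrepancy(S, implied_cov(Lambda, np.eye(2), Theta))  # factors forced uncorrelated
print(f'correct model: {fit_true:.6f}, misspecified model: {fit_wrong:.6f}')
```

A misspecified model (here, dropping the factor correlation) yields a strictly positive discrepancy, which is what fit statistics built on this quantity pick up.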

Factor quality (loading magnitude)

def factor_quality(loadings):
    return {
        'avg_loading': loadings.abs().mean(),
        'cross_loadings': (loadings.abs() > 0.4).sum(axis=1).gt(1).sum(),
        'low_communality': (loadings.abs().pow(2).sum(axis=1) < 0.3).sum(),
    }

Reliability (Cronbach α)

def cronbach_alpha(items):
    """Internal consistency of the items that belong to one factor."""
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))
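
The formula behind the function is α = k/(k−1) · (1 − Σ item variances / variance of the total score). A small worked check, using a NumPy re-statement of the same formula (the helper name and the toy scores are illustrative): when every item is a shifted copy of the same scores, the items are perfectly consistent and α comes out as 1.

```python
import numpy as np

def cronbach_alpha_np(items):
    """Cronbach's alpha for an (n_respondents x k_items) array."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of the total score
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Four respondents, three items; each item is the same score shifted by a constant.
scores = np.array([[1, 2, 3],
                   [2, 3, 4],
                   [3, 4, 5],
                   [4, 5, 6]])
# Item variances: 5/3 each (sum 5). Total scores 6, 9, 12, 15 have variance 15.
# alpha = 3/2 * (1 - 5/15) = 1.0
alpha = cronbach_alpha_np(scores)
print(f'alpha = {alpha:.3f}')
```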

Big Five inventory

BIG_FIVE_ITEMS = {
    'Openness': ['imaginative', 'curious', 'creative', 'broad_interest'],
    'Conscientiousness': ['organized', 'thorough', 'reliable', 'efficient'],
    'Extraversion': ['outgoing', 'energetic', 'assertive', 'talkative'],
    'Agreeableness': ['kind', 'trusting', 'cooperative', 'forgiving'],
    'Neuroticism': ['anxious', 'moody', 'stress', 'worry'],
}

Number of factors (parallel analysis)

def parallel_analysis(df, n_iter=100):
    """Count eigenvalues above the 95th percentile of eigenvalues from random normal data of the same shape."""
    n, p = df.shape
    rand_eigs = []
    for _ in range(n_iter):
        rand = np.random.normal(0, 1, (n, p))
        ev = np.linalg.eigvalsh(np.corrcoef(rand.T))[::-1]
        rand_eigs.append(ev)
    threshold = np.percentile(rand_eigs, 95, axis=0)
    actual = np.linalg.eigvalsh(np.corrcoef(df.T))[::-1]
    return np.sum(actual > threshold)
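
A variant worth considering, sketched here under assumptions: it uses a seeded generator for reproducibility and counts only the leading run of eigenvalues that beat the random threshold, stopping at the first one that does not. The function name and the simulated two-factor data (loadings 0.8 and 0.7) are illustrative.

```python
import numpy as np

def parallel_analysis_sequential(X, n_iter=200, percentile=95, seed=0):
    """Number of leading eigenvalues exceeding the random-data percentile threshold."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    rand_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        Z = rng.standard_normal((n, p))              # uncorrelated reference data
        rand_eigs[i] = np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False))[::-1]
    threshold = np.percentile(rand_eigs, percentile, axis=0)
    actual = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    exceeds = actual > threshold
    # Index of the first eigenvalue that fails; p if none fails.
    return p if exceeds.all() else int(np.argmin(exceeds))

# Simulated data with two clear factors.
rng = np.random.default_rng(42)
Lambda = np.array([[0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
                   [0.0, 0.7], [0.0, 0.7], [0.0, 0.7]])
psi = 1.0 - (Lambda ** 2).sum(axis=1)
F = rng.standard_normal((500, 2))
X = F @ Lambda.T + rng.standard_normal((500, 6)) * np.sqrt(psi)

n_factors = parallel_analysis_sequential(X)
print('suggested number of factors:', n_factors)
```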

MIMIC / SEM

desc = """
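# measurement model: the latent variable is measured by x1..x3
Latent =~ x1 + x2 + x3
# structural model: the latent variable is regressed on observed covariates
Latent ~ age + sex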
"""

Score factor (after fit)

factor_scores = fa.transform(df)
df['factor_1'] = factor_scores[:, 0]
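
Factor scores are estimates, not observed values. One common estimator is the regression (Thurstone) method, which weights standardized variables by R⁻¹Λ. The sketch below applies it to simulated data, using the true loadings in place of fitted ones to keep the example short; whether `fa.transform` uses this exact estimator is not assumed here.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
# Hypothetical loadings; in practice these would come from a fitted model.
Lambda = np.array([[0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
                   [0.0, 0.7], [0.0, 0.7], [0.0, 0.7]])
psi = 1.0 - (Lambda ** 2).sum(axis=1)
F = rng.standard_normal((n, 2))                       # true (normally unobservable) factors
X = F @ Lambda.T + rng.standard_normal((n, 6)) * np.sqrt(psi)

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)      # standardize the observed variables
R = np.corrcoef(Z, rowvar=False)                      # sample correlation matrix
weights = np.linalg.solve(R, Lambda)                  # R^{-1} Lambda, score coefficients
scores = Z @ weights                                  # estimated factor scores (n x k)

corr_f1 = np.corrcoef(scores[:, 0], F[:, 0])[0, 1]
print(f'correlation between estimated and true factor 1: {corr_f1:.2f}')
```

The correlation is high but below 1, a reminder that downstream analyses on factor scores inherit estimation error.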

Bayesian FA (PyMC)

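import pymc as pm
# Assumes X is an (n, p) array of standardized data and k is the chosen number of factors.
n, p = X.shape
with pm.Model() as bfa: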
    L = pm.Normal('L', 0, 1, shape=(p, k))
    F = pm.Normal('F', 0, 1, shape=(n, k))
    sigma = pm.HalfNormal('sigma', 1, shape=p)
    pm.Normal('x', mu=F @ L.T, sigma=sigma, observed=X)
    trace = pm.sample()
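
A caveat for a model of this form: the mean F Lᵀ is unchanged when factors and loadings are rotated by the same orthogonal matrix, and isotropic normal priors do not prefer any orientation either, so sampled loadings may need constraints or post-hoc alignment before they are interpreted. The NumPy check below illustrates the invariance with random matrices; the shapes and the rotation angle are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 50, 6, 2
F = rng.standard_normal((n, k))     # factor scores
L = rng.standard_normal((p, k))     # loadings

theta = 0.7                         # arbitrary rotation angle in radians
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # orthogonal 2x2 rotation

mu_original = F @ L.T
mu_rotated = (F @ Q) @ (L @ Q).T    # equals F Q Q^T L^T = F L^T
same_mean = bool(np.allclose(mu_original, mu_rotated))
print('rotated parameters give the same mean:', same_mean)
```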

Decision criteria

Situation	Approach
Discover structure	EFA + parallel analysis
Test hypothesis	CFA (semopy / lavaan)
Pure dim reduction	PCA
Latent + measurement error	FA
Psychometrics	EFA → CFA
Causal latent	SEM (MIMIC)

Default: EFA → number of factors via parallel analysis → oblimin rotation → CFA to confirm the hypothesis, plus a reliability check.

🔗 Graph

Parent: Statistics
Variants: EFA · CFA · SEM
Applications: Big Five
Adjacent: PCA

🤖 LLM usage

When to use: questionnaires; latent constructs. When not to use: pure dimensionality reduction (use PCA).

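❌ Anti-patterns

Confusing PCA with FA: they are different models.
No factorability check: garbage in, garbage out.
Extracting too many factors: the extra factors fit noise.
Interpreting without rotation: the loadings are hard to interpret.
No reliability check: the factors cannot be trusted.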

🧪 Verification / duplicates

Verified (Spearman 1904, Thurstone, Costa & McCrae Big Five).
Trust level: A.

🕓 Changelog

Date	Change
2026-04-26	STAT-FACTOR auto
2026-05-08	Phase 1
2026-05-10	Manual cleanup: EFA / CFA + KMO / scree / varimax / Cronbach code
