---
id: wiki-2026-0508-ordinal-data-analysis
title: Ordinal Data Analysis
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Ordinal Analysis, Ordinal Regression, Ordered Categorical Analysis, Likert Analysis]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [statistics, ordinal, regression, survey, likert]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: { language: python, framework: scipy/statsmodels/sklearn }
---

# Ordinal Data Analysis

## One-liner
Statistical and ML techniques for data whose values have a meaningful order but no defined spacing between them (Spearman, Kendall, ordinal regression).

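## Key Points
- **Ordinal scale**: order is meaningful, distances are not (e.g., 1=Poor … 5=Excellent).
- Do not summarize with the mean; use the median, quartiles, or mode.
- **Correlation**: Spearman ρ, Kendall τ (rank-based, nonparametric).
- **Regression**: ordinal logit (proportional odds), ordinal probit, ordinal forest.
- **Distance**: Manhattan distance on ranks, Earth Mover's Distance.
- Evaluation: MAE on ranks, QWK (Quadratic Weighted Kappa).
- When modeling, `OrdinalEncoder` maps levels to plain integers (0, 1, 2, ...), which implicitly assumes equal spacing between levels; use it with care.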
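The summary and distance bullets above can be made concrete with a minimal sketch on hypothetical 1-5 ratings (the arrays are made up). It passes `method="lower"` to `np.percentile` (NumPy 1.22 or later) so that the median and quartiles stay on actual levels instead of being interpolated between them. It uses `scipy.stats.wasserstein_distance` for the Earth Mover's Distance. That function treats the levels as positions 1..5, so unlike the order-based summaries it does assume unit spacing.

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Hypothetical 1-5 ratings from two survey waves
wave1 = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 5])
wave2 = np.array([2, 3, 3, 4, 4, 4, 5, 5, 5, 5])

# Order-based summaries: "lower" picks an observed level instead of interpolating
q1, med, q3 = np.percentile(wave1, [25, 50, 75], method="lower")  # 2, 3, 4
values, counts = np.unique(wave1, return_counts=True)
mode = values[np.argmax(counts)]                                  # 3

# Earth Mover's Distance between the two rating distributions
# (levels are treated as positions 1..5, so this step assumes unit spacing)
emd = wasserstein_distance(wave1, wave2)                          # about 0.8

# MAE on ranks for paired true vs. predicted levels
y_true = np.array([1, 2, 3, 4, 5])
y_pred = np.array([1, 3, 3, 5, 5])
mae = np.mean(np.abs(y_true - y_pred))                            # 0.4

print(q1, med, q3, mode, emd, mae)
```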

## 💻 Patterns

```python
# 1. Spearman / Kendall correlation
from scipy.stats import spearmanr, kendalltau
import numpy as np

x = np.array([1, 2, 3, 4, 5, 5, 4, 3])      # satisfaction (1-5)
y = np.array([10, 25, 40, 70, 95, 90, 60, 30])  # usage time (minutes)

rho, p1 = spearmanr(x, y)
tau, p2 = kendalltau(x, y)
print(f"Spearman ρ={rho:.3f} (p={p1:.4f})")
print(f"Kendall  τ={tau:.3f} (p={p2:.4f})")
```
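Spearman's ρ is Pearson's r computed on ranks, with tied values sharing their average rank. The sketch below checks this on the same data. It assumes SciPy's default average-rank tie handling in `rankdata`.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

x = np.array([1, 2, 3, 4, 5, 5, 4, 3])
y = np.array([10, 25, 40, 70, 95, 90, 60, 30])

# Rank both variables; tied values share their average rank
rx, ry = rankdata(x), rankdata(y)
rho_manual, _ = pearsonr(rx, ry)     # Pearson's r on the ranks
rho_scipy, _ = spearmanr(x, y)
print(f"manual={rho_manual:.6f}  scipy={rho_scipy:.6f}")
```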

```python
# 2. Ordinal Logistic Regression (statsmodels)
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

df = pd.DataFrame({
    "satisfaction": [1, 2, 3, 4, 5, 4, 3, 2, 5, 4],  # ordinal target
    "age":          [22, 35, 41, 28, 50, 33, 45, 24, 60, 39],
    "premium":      [0, 0, 1, 0, 1, 1, 0, 0, 1, 1],
})

model = OrderedModel(
    df["satisfaction"],
    df[["age", "premium"]],  # no constant column: the thresholds act as intercepts
    distr="logit",  # or "probit"
)
res = model.fit(method="bfgs", disp=False)
print(res.summary())
```
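Under the proportional-odds (cumulative logit) model, P(Y <= j | x) = sigmoid(theta_j - x·beta). There is one shared coefficient vector beta and K-1 increasing thresholds theta_j, and each category probability is the difference of two consecutive cumulative probabilities. As far as I know, this matches the parameterization documented for statsmodels' `OrderedModel`, which is also why the exog above carries no constant. The NumPy sketch below uses hypothetical coefficients and thresholds, not fitted values, and `ordinal_logit_probs` is a made-up helper name.

```python
import numpy as np

def ordinal_logit_probs(x, beta, thresholds):
    """Category probabilities for one observation under a cumulative logit model.

    P(Y <= j | x) = sigmoid(theta_j - x @ beta) for the K-1 increasing thresholds.
    """
    eta = x @ beta                                   # linear predictor, no intercept
    theta = np.asarray(thresholds, dtype=float)
    cum = 1.0 / (1.0 + np.exp(-(theta - eta)))       # P(Y <= j), j = 1..K-1
    cum = np.concatenate([[0.0], cum, [1.0]])        # pad with P(Y <= 0)=0 and P(Y <= K)=1
    return np.diff(cum)                              # P(Y = j), j = 1..K

# Hypothetical values for features (age, premium) and a 5-level target
beta = np.array([0.03, 1.2])
thresholds = [-1.0, 0.5, 2.0, 3.5]                   # must be strictly increasing
probs = ordinal_logit_probs(np.array([40.0, 1.0]), beta, thresholds)
print(probs.round(3), probs.sum())
```

A larger linear predictor lowers every cumulative probability at once and shifts mass toward higher levels. That single shared shift across all thresholds is the proportional-odds assumption.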

```python
# 3. sklearn-style: mord (ordinal regression)
# pip install mord
from mord import LogisticAT
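import numpy as np
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = rng.integers(1, 6, size=200)  # levels 1..5; random labels, so there is no real signal

clf = LogisticAT(alpha=1.0)  # all-threshold ordinal logistic model
clf.fit(X, y)
# mord's LogisticAT.score returns negative MAE rather than accuracy, so report MAE explicitly
print("Train MAE:", mean_absolute_error(y, clf.predict(X)))
print("Pred:", clf.predict(X[:5]))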
```
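If mord is unavailable, one alternative is the cumulative binary decomposition of Frank & Hall (2001). It trains one binary classifier per threshold for P(y > k) and takes differences to get per-level probabilities. This is a swapped-in technique, not what mord implements. `CumulativeBinaryOrdinal` is a hypothetical class, and the data below is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class CumulativeBinaryOrdinal:
    """Frank & Hall-style ordinal classifier: one binary model per threshold, P(y > k)."""

    def fit(self, X, y):
        y = np.asarray(y)
        self.classes_ = np.unique(y)                 # sorted ordinal levels
        self.models_ = [
            LogisticRegression().fit(X, (y > k).astype(int))
            for k in self.classes_[:-1]              # K-1 thresholds
        ]
        return self

    def predict_proba(self, X):
        n = X.shape[0]
        gt = np.column_stack([m.predict_proba(X)[:, 1] for m in self.models_])  # P(y > k)
        upper = np.hstack([np.ones((n, 1)), gt])     # P(y > previous level)
        lower = np.hstack([gt, np.zeros((n, 1))])    # P(y > this level)
        probs = np.clip(upper - lower, 0.0, None)    # separate models can cross; clip negatives
        return probs / probs.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]

# Synthetic data with a real ordinal signal: a noisy latent score cut into 5 levels
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
latent = X @ np.array([1.5, -1.0, 0.5]) + rng.logistic(size=300)
y = np.digitize(latent, [-2.0, -0.5, 0.5, 2.0]) + 1  # levels 1..5

model = CumulativeBinaryOrdinal().fit(X, y)
pred = model.predict(X)
print("Train MAE:", np.mean(np.abs(pred - y)))
```

Because the binary models are fit independently, their P(y > k) curves are not guaranteed to be monotone in k. The clip-and-renormalize step is a pragmatic patch for that, unlike a single proportional-odds model.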

```python
# 4. Quadratic Weighted Kappa (a standard ordinal evaluation metric)
from sklearn.metrics import cohen_kappa_score

y_true = [1, 2, 3, 4, 5, 3, 2, 4]
y_pred = [1, 2, 4, 4, 5, 2, 2, 5]
qwk = cohen_kappa_score(y_true, y_pred, weights="quadratic")
print(f"QWK = {qwk:.3f}")  # 1.0 = perfect agreement, 0 = chance level, < 0 = worse than chance
```
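QWK compares the observed confusion matrix O with the matrix E expected from the marginals alone, weighting each disagreement by the squared level distance (i - j)². The formula is kappa = 1 - sum(W·O) / sum(W·E). The (K-1)² normalization sometimes written into W cancels out of the ratio. The from-scratch sketch below (`quadratic_weighted_kappa` is a hypothetical helper) should agree with scikit-learn when the labels are consecutive integers.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

def quadratic_weighted_kappa(y_true, y_pred, labels):
    labels = list(labels)
    k = len(labels)
    pos = {lab: i for i, lab in enumerate(labels)}
    O = np.zeros((k, k))                                  # observed counts: O[true, pred]
    for t, p in zip(y_true, y_pred):
        O[pos[t], pos[p]] += 1
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()  # expected under independent marginals
    i, j = np.indices((k, k))
    W = (i - j) ** 2                                      # quadratic penalty by level distance
    return 1.0 - (W * O).sum() / (W * E).sum()

y_true = [1, 2, 3, 4, 5, 3, 2, 4]
y_pred = [1, 2, 4, 4, 5, 2, 2, 5]
qwk_manual = quadratic_weighted_kappa(y_true, y_pred, labels=[1, 2, 3, 4, 5])
qwk_sklearn = cohen_kappa_score(y_true, y_pred, weights="quadratic")
print(f"manual={qwk_manual:.6f}  sklearn={qwk_sklearn:.6f}")
```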

```python
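# 5. Likert 5-point distribution: diverging stacked bar (centered on Neutral)
import pandas as pd
import matplotlib.pyplot as plt

levels = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]
likert = pd.DataFrame({
    "Q1": [10, 20, 15, 30, 25],
    "Q2": [5, 15, 25, 35, 20],
    "Q3": [40, 25, 10, 15, 10],
}, index=levels)  # each column is % of respondents (sums to 100)

pct = likert.T  # rows = questions, columns = levels
# shift each bar left so the middle of "Neutral" sits at x = 0
left = -(pct["Strongly Disagree"] + pct["Disagree"] + pct["Neutral"] / 2)
colors = plt.get_cmap("RdYlGn")([0.1, 0.3, 0.5, 0.7, 0.9])

fig, ax = plt.subplots(figsize=(8, 3))
for level, color in zip(levels, colors):
    ax.barh(pct.index, pct[level], left=left, color=color, label=level)
    left = left + pct[level]
ax.axvline(0, color="black", linewidth=0.8)
ax.set_xlabel("% respondents (left of 0 = disagree side)")
ax.legend(ncol=5, fontsize=7, loc="upper center", bbox_to_anchor=(0.5, -0.35))
plt.tight_layout()
plt.show()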
```
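Since the mean is off-limits, each question's median level can be read straight from the percentage table: it is the first level whose cumulative share reaches 50%. The sketch below applies this to the same hypothetical percentages. `median_level` is a made-up helper. When the cumulative share hits exactly 50%, the median lies between two levels and this helper picks the lower one.

```python
import pandas as pd

levels = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]
likert = pd.DataFrame({
    "Q1": [10, 20, 15, 30, 25],
    "Q2": [5, 15, 25, 35, 20],
    "Q3": [40, 25, 10, 15, 10],
}, index=levels)  # % of respondents per level

def median_level(pct: pd.Series) -> str:
    """First level (in scale order) whose cumulative share reaches 50%."""
    cum = pct.cumsum() / pct.sum()
    return cum[cum >= 0.5].index[0]

medians = likert.apply(median_level)   # one median level per question (column)
print(medians)
```

With these numbers, Q1 and Q2 land on Agree and Q3 on Disagree.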

```python
# 6. Ordinal Encoder (caution: the integer codes imply equal spacing)
from sklearn.preprocessing import OrdinalEncoder

cats = [["Low"], ["Medium"], ["High"], ["High"], ["Low"]]
enc = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
codes = enc.fit_transform(cats)
print(codes.ravel())  # [0. 1. 2. 2. 0.]
```
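The `categories=` argument matters. With the default `categories="auto"`, scikit-learn sorts string categories lexicographically, so the encoded order no longer matches Low < Medium < High. A quick check:

```python
from sklearn.preprocessing import OrdinalEncoder

cats = [["Low"], ["Medium"], ["High"], ["High"], ["Low"]]

# Default: categories are sorted alphabetically -> High=0, Low=1, Medium=2
auto = OrdinalEncoder().fit(cats)
print(auto.categories_)
print(auto.transform(cats).ravel())      # [1. 2. 0. 0. 1.]  (order is wrong)

# Explicit order restores Low < Medium < High
explicit = OrdinalEncoder(categories=[["Low", "Medium", "High"]]).fit(cats)
print(explicit.transform(cats).ravel())  # [0. 1. 2. 2. 0.]
```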

```python
# 7. Mann-Whitney U (comparing two independent groups on an ordinal scale)
from scipy.stats import mannwhitneyu

group_a = [3, 4, 5, 4, 5, 5]   # new product ratings
group_b = [2, 3, 3, 4, 2, 3]   # old product ratings

stat, p = mannwhitneyu(group_a, group_b, alternative="greater")
print(f"U={stat}, p={p:.4f}")
```
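A p-value alone says little about magnitude. One common effect size for Mann-Whitney U is the rank-biserial correlation r = 2U / (n1·n2) - 1, which ranges from -1 to 1. This sketch assumes SciPy 1.7 or later, where `mannwhitneyu` reports U for the first sample.

```python
from scipy.stats import mannwhitneyu

group_a = [3, 4, 5, 4, 5, 5]
group_b = [2, 3, 3, 4, 2, 3]

res = mannwhitneyu(group_a, group_b, alternative="greater")
n1, n2 = len(group_a), len(group_b)
# U = number of (a, b) pairs with a > b, plus 0.5 for each tied pair
r_rb = 2 * res.statistic / (n1 * n2) - 1
print(f"U={res.statistic}, rank-biserial r={r_rb:.3f}")
```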

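## Decision Guide

| Situation | Recommended technique |
|---|---|
| Correlation between two ordinal variables | Spearman ρ |
| Many tied ranks | Kendall τ-b |
| Regression with an ordinal target | OrderedModel(logit) / mord LogisticAT |
| Comparing three or more groups | Kruskal-Wallis |
| Comparing two independent groups | Mann-Whitney U |
| Evaluation metric | QWK (competition standard) |
| Simple feature encoding for ML | OrdinalEncoder + tree model |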

## 🔗 Graph

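## 🤖 LLM Use Cases
- Free-text survey answers → an LLM normalizes them onto a 5-point satisfaction scale → combine with the statistical analysis.
- In model evaluation, label LLM response quality on a 1-5 ordinal scale, then measure agreement between raters (e.g., human labels vs. an LLM judge) with QWK.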
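Normalizing LLM output onto an ordinal scale is safer with strict parsing. The idea is to accept only replies that are a single in-range integer and drop the rest rather than guessing. This minimal sketch assumes the LLM was prompted to answer with one digit from 1 to 5. `parse_rating` and the sample replies are hypothetical. It uses `median_low` so that the summary stays on an actual level.

```python
import re
from statistics import median_low
from typing import Optional

def parse_rating(reply: str, low: int = 1, high: int = 5) -> Optional[int]:
    """Return the rating if the reply is a single in-range integer, else None."""
    m = re.fullmatch(r"\s*(\d+)\s*\.?\s*", reply)   # e.g. "4", " 5\n", "3."
    if m is None:
        return None                                  # free text: reject, do not guess
    value = int(m.group(1))
    return value if low <= value <= high else None   # out-of-range: reject

# Hypothetical raw replies from an LLM asked to answer with one digit 1-5
replies = ["4", " 5\n", "3.", "about 4", "7", "2"]
ratings = [r for r in map(parse_rating, replies) if r is not None]
mid = median_low(ratings)                            # stays on a real level
print(ratings, "median =", mid)
```

Track the rejection rate alongside the ratings. A high share of `None` results means the prompt, not the parser, needs fixing.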

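## ❌ Anti-patterns
- Using the mean on ordinal data (an average such as 4.2 has no clear interpretation).
- Plain regression (linear, MSE loss) on an ordinal target: ignores that distances between levels are unknown and possibly unequal.
- One-hot encoding, which discards the order information.
- Comparing ordinal variables with Pearson correlation (assumes interval-scaled data and a linear relationship).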
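Two of these anti-patterns are easy to demonstrate on made-up data. A perfectly monotonic but nonlinear relationship gets Spearman ρ = 1 while Pearson r stays well below 1. Two very different response patterns can also share the same mean.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Monotonic but strongly nonlinear relationship
x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 2, 4, 8, 100])
r, _ = pearsonr(x, y)
rho, _ = spearmanr(x, y)
print(f"Pearson r={r:.3f}, Spearman rho={rho:.3f}")  # Pearson r=0.748, Spearman rho=1.000

# Same mean, very different response patterns on a 1-5 scale
polarized = np.array([1, 1, 1, 5, 5, 5])
neutral = np.array([3, 3, 3, 3, 3, 3])
print(polarized.mean(), neutral.mean())               # 3.0 3.0
print(np.bincount(polarized, minlength=6)[1:])        # [3 0 0 0 3]
print(np.bincount(neutral, minlength=6)[1:])          # [0 0 6 0 0]
```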

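## 🧪 Validation
- Use Spearman/Kendall to confirm that two related ordinal columns show a plausible positive correlation.
- After fitting OrderedModel, the confusion matrix of predicted vs. actual levels should be concentrated on (and near) the diagonal.
- QWK above about 0.6 is commonly read as substantial ordinal agreement (a rule of thumb borrowed from kappa benchmarks; thresholds vary by field).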
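The diagonal check can be quantified from the confusion matrix. The sketch below reuses the QWK example labels. The "within one level" share is an extra diagnostic that is often reported for ordinal predictions; the checks above do not require it.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [1, 2, 3, 4, 5, 3, 2, 4]
y_pred = [1, 2, 4, 4, 5, 2, 2, 5]
labels = [1, 2, 3, 4, 5]

cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows = actual, columns = predicted
exact = np.trace(cm) / cm.sum()                       # share exactly on the diagonal
i, j = np.indices(cm.shape)
within_one = cm[np.abs(i - j) <= 1].sum() / cm.sum()  # share at most one level off
print(cm)
print(f"exact={exact:.3f}, within one level={within_one:.3f}")
```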

## 🕓 Changelog
- 2026-05-08 Phase 1: initial draft.
- 2026-05-10 Manual cleanup: 7 code patterns; added QWK and diverging bar.