---
id: wiki-2026-0508-pareto-principle
title: Pareto Principle
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [80/20 Rule, Pareto Distribution, Power Law]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [pareto, 80-20, prioritization, decision-making]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: pandas, numpy
---

# Pareto Principle

## One-liner

> **"80% of effects come from 20% of causes."** Vilfredo Pareto (1896) observed this pattern in Italian land ownership. Modern applications: bug triage (the top 20% of bugs cause roughly 80% of crashes), customer revenue (the top 20% of customers pay roughly 80%), feature importance (the top 20% of features carry roughly 80% of model signal). It is the default prioritization heuristic.

## Core

### Origin

- Pareto 1896: about 80% of Italian land was owned by 20% of the population.
- Juran 1940s: applied the idea to quality control as "vital few vs trivial many".
- Member of the power-law family: the distribution is linear on a log-log plot.
- 80/20 is only a mnemonic; actual ratios vary (90/10, 70/30, etc.).

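
To make the link between the mnemonic and the power-law family concrete, here is a minimal sketch assuming the data follow an idealized Pareto distribution with shape parameter `alpha` (an assumption; real data rarely fit exactly). Under that model the top fraction `p` of items holds a share `p ** (1 - 1/alpha)` of the total. The function names `top_share` and `alpha_for` are illustrative, not from the source.

```python
import math

def top_share(p: float, alpha: float) -> float:
    """Share of the total held by the top fraction p under an idealized
    Pareto distribution with shape alpha (alpha > 1 so the mean is finite)."""
    if not 0 < p <= 1 or alpha <= 1:
        raise ValueError("need 0 < p <= 1 and alpha > 1")
    return p ** (1 - 1 / alpha)

def alpha_for(p: float, share: float) -> float:
    """Shape alpha at which the top fraction p holds exactly `share`."""
    if not (0 < p < share < 1):
        raise ValueError("need 0 < p < share < 1")
    return 1 / (1 - math.log(share) / math.log(p))

alpha_80_20 = alpha_for(0.2, 0.8)
print(round(alpha_80_20, 3))          # 1.161
print(top_share(0.1, alpha_80_20))    # share held by the top 10% at the same alpha
```

A different `alpha` gives a different split, which is one way to see why 90/10 or 70/30 show up in practice.
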
### Key insight

- Effects are NOT uniformly distributed across causes.
- Sorting by impact exposes the long tail.
- ROI: fixing the top 20% of causes can resolve about 80% of the problem, often for a small share of the total effort.
- Caveat: the remaining 20% of effects can still be important (safety, compliance).

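
The non-uniformity is easy to check on real numbers. Below is a small sketch that measures how much of the total the largest items actually contribute; `share_of_top` and the crash counts are made up for illustration.

```python
def share_of_top(values, fraction=0.2):
    """Fraction of the total contributed by the largest `fraction` of items."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values, reverse=True)
    k = max(1, round(len(ordered) * fraction))  # always look at least at one item
    total = sum(ordered)
    if total == 0:
        return 0.0
    return sum(ordered[:k]) / total

crashes_per_bug = [500, 300, 100, 50, 30, 20, 10, 5, 3, 2]  # made-up counts
print(f"{share_of_top(crashes_per_bug):.0%}")  # 78%
```

A perfectly uniform input would return the same value as `fraction`; the further the result sits above it, the stronger the concentration.
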
### Software / ML context

- **Bug triage**: a small set of bugs causes most crashes.
- **Performance hotspots**: a small fraction of the code (around 5%) often accounts for most CPU time (around 95%).
- **Feature importance**: the top features dominate model signal.
- **Customer revenue**: a tiny number of enterprise accounts generates most of the revenue.
- **Test coverage**: roughly 80% of bugs sit in 20% of code paths.

### Applications

1. Backlog prioritization (impact × ease).
2. Performance profiling (optimize hot path first).
3. Feature engineering (drop low-importance features).
4. Customer success (focus on high-value accounts).
5. Bug fixing (top crash signatures first).

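
As one possible reading of "impact × ease" from the first item, the sketch below scores each backlog entry by the product of two ratings and sorts by it. The 1 to 5 scales, the `BacklogItem` class, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BacklogItem:
    name: str
    impact: int  # 1 (low) .. 5 (high), hypothetical scale
    ease: int    # 1 (hard) .. 5 (easy), hypothetical scale

    @property
    def score(self) -> int:
        return self.impact * self.ease

backlog = [
    BacklogItem("fix login crash", impact=5, ease=4),
    BacklogItem("redesign settings page", impact=2, ease=2),
    BacklogItem("cache search results", impact=4, ease=3),
    BacklogItem("rename internal module", impact=1, ease=5),
]

# Highest score first: high impact and low effort float to the top
ranked = sorted(backlog, key=lambda item: item.score, reverse=True)
for item in ranked:
    print(f"{item.score:>2}  {item.name}")
```

A product penalizes items that are weak on either axis, while a sum would let an easy but low-impact item rank higher; which fits better depends on the team.
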
## 💻 Patterns

### Pareto chart for bug triage

```python
import pandas as pd
import matplotlib.pyplot as plt

# Example data: crash occurrences per bug
df = pd.DataFrame({"bug": ["A", "B", "C", "D", "E", "F"],
                   "occurrences": [500, 300, 100, 50, 30, 20]})
df = df.sort_values("occurrences", ascending=False)
df["cum_pct"] = df["occurrences"].cumsum() / df["occurrences"].sum() * 100

fig, ax1 = plt.subplots()
ax1.bar(df["bug"], df["occurrences"])          # bars: count per bug, largest first
ax1.set_ylabel("occurrences")
ax2 = ax1.twinx()                              # second y-axis for the cumulative line
ax2.plot(df["bug"], df["cum_pct"], "r-o")
ax2.set_ylim(0, 105)
ax2.set_ylabel("cumulative %")
ax2.axhline(80, color="gray", linestyle="--")  # 80% reference line
plt.show()
```

### Find the "vital 20%"

```python
def vital_few(values, threshold=0.8):
    """Return (count, items) for the largest items whose cumulative share reaches threshold."""
    sorted_vals = sorted(values, reverse=True)
    total = sum(sorted_vals)
    if total == 0:  # empty or all-zero input: nothing dominates
        return 0, []
    cumsum = 0
    for i, v in enumerate(sorted_vals, 1):
        cumsum += v
        if cumsum / total >= threshold:
            return i, sorted_vals[:i]
    return len(sorted_vals), sorted_vals
```

### Feature importance pruning

```python
import pandas as pd
import xgboost as xgb

# X (DataFrame of features) and y (labels) are assumed to be defined already
model = xgb.XGBClassifier().fit(X, y)
importance = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
cumulative = importance.cumsum() / importance.sum()
# Keep features up to and including the one that crosses 80%;
# a plain `cumulative <= 0.8` drops that feature and can return nothing
top_features = importance[cumulative.shift(fill_value=0) < 0.8].index  # vital few
print(f"{len(top_features)} of {len(X.columns)} features carry at least 80% of importance")
```

### Revenue concentration analysis

```python
import pandas as pd

# Assumes customers.csv has a numeric "revenue" column
customers = pd.read_csv("customers.csv").sort_values("revenue", ascending=False)
customers["cum_revenue_pct"] = customers["revenue"].cumsum() / customers["revenue"].sum()
top_20 = customers.head(max(1, int(len(customers) * 0.2)))  # at least one customer
print(f"Top 20% generate {top_20['revenue'].sum() / customers['revenue'].sum():.0%}")
```

### Profiling hot path (Python)

```python
import cProfile, pstats

profiler = cProfile.Profile()
profiler.enable()
run_workload()  # placeholder for the code under test
profiler.disable()
# "cumulative" includes time spent in callees; use "tottime" to rank by a function's own time
stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats(20)  # top 20 entries; often they cover most of the runtime
```

### LLM cost: top tokens

```python
# Track prompt token spend per template
from collections import Counter

# logs: iterable of dicts with "prompt_template", "tokens", "cost_per_token" (assumed defined)
spend = Counter()
for log in logs:
    spend[log["prompt_template"]] += log["tokens"] * log["cost_per_token"]
total = sum(spend.values())
running = 0
for template, cost in spend.most_common():
    running += cost
    print(f"{template}: ${cost:.2f}, cumulative {running/total:.0%}")
    if running / total >= 0.8:  # stop once the top templates cover 80% of spend
        break
```

## Decision criteria

| Situation | Approach |
|---|---|
| Backlog overload | Pareto-rank by impact, ship top 20% |
| Slow application | Profile, fix the hot path first |
| Too many features | Importance-based pruning |
| Customer support | Tier by revenue, allocate AE coverage |
| Long bug list | Triage by frequency × severity |
| Compliance / safety | Pareto NOT applicable (100% coverage required) |

**Default**: sort by impact, take top items until cumulative share ≥ 80%.

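
Two rows of the table can be combined in one sketch: rank bugs by frequency × severity up to the 80% default, while always keeping anything marked as safety or compliance related. The dictionary keys (`id`, `frequency`, `severity`, `mandatory`) and the sample data are hypothetical.

```python
def triage(bugs, threshold=0.8):
    """Return bug ids to fix first: every mandatory bug, then the
    highest-weight optional bugs until their cumulative weight share
    reaches the threshold."""
    mandatory = [b["id"] for b in bugs if b["mandatory"]]
    optional = sorted(
        (b for b in bugs if not b["mandatory"]),
        key=lambda b: b["frequency"] * b["severity"],
        reverse=True,
    )
    total = sum(b["frequency"] * b["severity"] for b in optional)
    picked, running = [], 0
    for b in optional:
        if total == 0 or running / total >= threshold:
            break
        picked.append(b["id"])
        running += b["frequency"] * b["severity"]
    return mandatory + picked

bugs = [
    {"id": "B1", "frequency": 500, "severity": 3, "mandatory": False},
    {"id": "B2", "frequency": 40, "severity": 5, "mandatory": False},
    {"id": "B3", "frequency": 2, "severity": 5, "mandatory": True},   # e.g. a compliance issue
    {"id": "B4", "frequency": 300, "severity": 1, "mandatory": False},
    {"id": "B5", "frequency": 10, "severity": 2, "mandatory": False},
]
print(triage(bugs))  # ['B3', 'B1', 'B4']
```

Here B3 is rare, so a pure frequency ranking would bury it; the `mandatory` flag keeps it on the list regardless of its weight.
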
## 🔗 Graph

- Parent: [[Power-Law]]
- Variants: [[80/20 Rule]] · [[Long-Tail]]
- Applications: [[Feature-Importance]]

## 🤖 LLM usage

**When to use**: backlog prioritization, optimization scope, feature selection, customer segmentation.
**When not to use**: safety-critical or compliance work, where the long tail cannot be ignored.

## ❌ Anti-patterns

- **Treating 80/20 literally**: the actual ratio varies; measure, don't assume.
- **Ignoring the long tail entirely**: some long-tail items are high-leverage (a zero-day, a churn-risk customer).
- **Cause/effect confusion**: "20% of features account for 80% of accuracy" does not mean you can keep only those; feature interactions matter.
- **Static analysis**: the Pareto ranking shifts over time; recompute it regularly (for example, weekly).
- **Pareto in safety domains**: medical, finance, and security work requires 100% coverage.

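
The point about rankings shifting over time can be checked by recomputing the vital-few set per period and comparing the results. The following is a sketch with made-up weekly crash counts; `vital_set` is an illustrative helper, not part of any library.

```python
def vital_set(counts, threshold=0.8):
    """Keys of the largest entries whose cumulative share reaches the threshold."""
    total = sum(counts.values())
    chosen, running = set(), 0
    for key, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        if total == 0 or running / total >= threshold:
            break
        chosen.add(key)
        running += n
    return chosen

# Made-up crash counts per signature for two consecutive weeks
last_week = {"null-deref": 400, "timeout": 300, "oom": 200, "parse-error": 60, "misc": 40}
this_week = {"null-deref": 50, "timeout": 350, "oom": 150, "parse-error": 400, "misc": 50}

before, after = vital_set(last_week), vital_set(this_week)
print("entered:", sorted(after - before))  # entered: ['parse-error']
print("dropped:", sorted(before - after))  # dropped: ['null-deref']
```

A non-empty "entered" set is a signal that last period's priorities no longer match where the impact is.
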
## 🧪 Verification / duplicates

- Verified (Pareto 1896, *Cours d'économie politique*; Juran 1951, *Quality Control Handbook*).
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: Pareto applications, charts, anti-patterns |