[G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00
parent 21ac3ed255
commit 504fd5fb42
3011 changed files with 380280 additions and 206977 deletions
@@ -2,65 +2,167 @@
 id: wiki-2026-0508-pareto-principle
 title: Pareto Principle
 category: 10_Wiki/Topics
-status: needs_review
+status: verified
 canonical_id: self
-aliases: [P-Reinforce-AUTO-PARE-001]
+aliases: [80/20 Rule, Pareto Distribution, Power Law]
 duplicate_of: none
 source_trust_level: A
-confidence_score: 0.96
-tags: [auto-reinforced, pareto-principle, 80-20-rule, Efficiency, power-law, distribution, productivity]
+confidence_score: 0.9
+verification_status: applied
+tags: [pareto, 80-20, prioritization, decision-making]
 raw_sources: []
-last_reinforced: 2026-04-20
+last_reinforced: 2026-05-10
 github_commit: pending
-inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08)
+tech_stack:
+  language: python
+  framework: pandas, numpy
 ---

-# [[Pareto-Principle|Pareto-Principle]]
+# Pareto Principle

-## 📌 한 줄 통찰 (The Karpathy Summary)
-> "핵심 20%의 지배: 전체 결과의 80%는 단 20%의 원인으로부터 발생한다는 우주의 불평등한 질서이자, 수만 가지 일 중 '단 한두 개의 승부처'를 찾아내어 노력을 집중시키라는 효율성 최고의 지각판."
+## One-line summary
+> **"Roughly 80% of effects come from 20% of causes."** The principle traces back to Vilfredo Pareto's 1896 observation about land ownership in Italy. Modern applications include bug triage (the top 20% of bugs often cause about 80% of crashes), customer revenue (the top 20% of customers often pay about 80%), and feature importance (the top 20% of features often carry about 80% of the model signal). It is the default heuristic for prioritization.

-## 📖 구조화된 지식 (Synthesized Content)
-파레토 법칙(Pareto-Principle) 혹은 80/20 법칙은 투입과 결과의 불균형을 설명하는 통계적 법칙입니다. (빌프레도 파레토 발견)
+## Key points

-1.  **현실적 사례**:
-    *   20%의 고객이 매출의 80%를 차지.
-    *   20%의 버그가 전체 시스템 장애의 80%를 유발.
-    *   20%의 공부량이 시험 성적의 80%를 결정.
-2.  **왜 중요한가?**:
-    *   우리의 자원(시간, 돈, 에너지)은 유한하므로, 모든 곳에 똑같이 에너지를 쏟는 대신 '레버리지'가 큰 소수에 집중하게 하여 성과를 극대화하기 때문임. (Efficiency와 연결)
+### Origin
+- Pareto (1896): about 80% of the land in Italy was owned by about 20% of the population.
+- Juran (1940s): applied the idea to quality control as "vital few vs trivial many".
+- Member of the power-law family: the distribution is roughly linear on a log-log plot.
+- 80/20 is only a mnemonic; actual ratios vary (90/10, 70/30, and so on).
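
The point that 80/20 is just one ratio among many can be made concrete with the Pareto distribution. The following is a minimal sketch, assuming effects follow an ideal Pareto distribution with shape parameter `alpha > 1` (real data rarely fits exactly). Under that assumption the share of the total held by the top fraction `p` is `p ** (1 - 1/alpha)`. The names `top_share` and `ALPHA_80_20` are illustrative and do not come from the original note.

```python
import math


def top_share(p: float, alpha: float) -> float:
    """Share of the total held by the top fraction p under a Pareto(alpha) model.

    Assumes an ideal Pareto distribution with shape alpha > 1.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    if alpha <= 1:
        raise ValueError("alpha must be > 1 so that the mean is finite")
    return p ** (1 - 1 / alpha)


# Shape that yields exactly 80/20: alpha = log(5) / log(4), about 1.16.
ALPHA_80_20 = math.log(5) / math.log(4)

for alpha in (ALPHA_80_20, 1.5, 2.0, 3.0):
    print(f"alpha={alpha:.2f}: top 20% hold {top_share(0.2, alpha):.0%}")
```

A larger `alpha` means a thinner tail and less concentration, which is one reason to measure the ratio in your own data rather than assume 80/20.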

-## ⚠️ 모순 및 업데이트 (Contradictions & Updates)
- **과거 데이터와의 충돌**: 과거에는 소외된 80%를 무시하는 정책(Tail trim)이 주류였으나, 현대 정책은 꼬리 부분의 틈새 수요들을 모아 거대한 시장을 만드는 '롱테일 정책'으로 파레토 법칙의 전략적 보완을 꾀함(RL Update). ([[Long-Tail|Long-Tail]]와 연결)
- **정책 변화(RL Update)**: AI 지식 관리 정책에서도, 대표님이 가장 자주 쓰고 중요하게 생각하는 '상위 20%의 핵심 지식 모델'을 먼저 탄탄히 구축(Antigravity Core)하는 것이 전체 프로젝트의 가치를 결정짓는 핵심 정책임.
+### Key insight
+- Effects are NOT uniformly distributed across causes.
+- Sorting by impact reveals the long tail.
+- ROI: fixing the top 20% of causes can resolve roughly 80% of the problem for roughly 20% of the effort.
+- Caveat: the remaining 20% of effects can still be important (safety, compliance).

-## 🔗 지식 연결 (Graph)
- [[Efficiency|Efficiency]], [[Long-Tail|Long-Tail]], [[Management|Management]], [[Decision Theory|Decision Theory]], [[Economic-Analysis|Economic-Analysis]], [[Knowledge synthesis|Knowledge synthesis]]
- **Modern Tech/Tools**: Pareto ch[[Arts|Arts]], Priority matrices (Eisenhower), Resource allocation [[Strategy|Strategy]].
---
+### Software / ML context
+- **Bug triage**: a small set of bugs causes most crashes.
+- **Performance hotspots**: roughly 5% of the code can account for about 95% of CPU time.
+- **Feature importance**: the top features dominate the model signal.
+- **Customer revenue**: a small number of enterprise accounts generates most of the revenue.
+- **Test coverage**: roughly 80% of bugs sit in about 20% of code paths.

-## 🤖 LLM 활용 힌트 (How to Use This Knowledge)
+### Applications
+1. Backlog prioritization (impact × ease).
+2. Performance profiling (optimize hot path first).
+3. Feature engineering (drop low-importance features).
+4. Customer success (focus on high-value accounts).
+5. Bug fixing (top crash signatures first).
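
The "impact × ease" ranking in item 1 can be sketched as below. This is only an illustration: the 1 to 5 scales, the `BacklogItem` class, and the sample backlog entries are assumptions, not part of the original note.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    name: str
    impact: int  # 1 (low) to 5 (high): estimated benefit
    ease: int    # 1 (hard) to 5 (easy): inverse of effort

    @property
    def score(self) -> int:
        return self.impact * self.ease


def rank_backlog(items):
    """Return items sorted by impact × ease, highest score first."""
    return sorted(items, key=lambda item: item.score, reverse=True)


# Hypothetical backlog used only to show the ranking.
backlog = [
    BacklogItem("Fix checkout crash", impact=5, ease=4),
    BacklogItem("Redesign settings page", impact=2, ease=2),
    BacklogItem("Cache product queries", impact=4, ease=3),
    BacklogItem("Rename internal module", impact=1, ease=5),
]

ranked = rank_backlog(backlog)
for item in ranked:
    print(f"{item.score:>2}  {item.name}")
```

Multiplying the two scores favors items that are both valuable and cheap; an easy but low-impact task (such as the rename here) does not float to the top.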

-**언제 이 지식을 쓰는가:**
- *(TODO)*
+## 💻 Patterns

-**언제 쓰면 안 되는가:**
- *(TODO)*
+### Pareto chart for bug triage
+```python
+import pandas as pd
+import matplotlib.pyplot as plt

-## 🧪 검증 상태 (Validation)
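+# Example data: occurrence counts per bug (illustrative values).
+df = pd.DataFrame({"bug": ["A", "B", "C", "D", "E", "F"],
+                   "occurrences": [500, 300, 100, 50, 30, 20]})
+# Sort descending so the cumulative curve rises fastest on the left.
+df = df.sort_values("occurrences", ascending=False)
+df["cum_pct"] = df["occurrences"].cumsum() / df["occurrences"].sum() * 100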

- **정보 상태:** needs_review
- **출처 신뢰도:** A
- **검토 이유:** *(P-Reinforce Phase 1 자동 정규화. 본문 검증 필요.)*
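+fig, ax1 = plt.subplots()
+ax1.bar(df["bug"], df["occurrences"])  # bars: raw counts
+ax1.set_ylabel("Occurrences")
+ax2 = ax1.twinx()  # second y-axis for the cumulative line
+ax2.plot(df["bug"], df["cum_pct"], "r-o")
+ax2.set_ylabel("Cumulative %")
+ax2.set_ylim(0, 105)
+ax2.axhline(80, color="gray", linestyle="--")  # 80% reference line
+plt.show()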
+```

-## 🧬 중복 검사 (Duplicate Check)
+### Find the "vital 20%"
+```python
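+def vital_few(values, threshold=0.8):
+    """Return (count, items): the fewest largest values that reach `threshold` of the total."""
+    sorted_vals = sorted(values, reverse=True)
+    total = sum(sorted_vals)
+    if total <= 0:
+        return 0, []  # nothing to rank; also avoids division by zero
+    cumsum = 0
+    for i, v in enumerate(sorted_vals, 1):
+        cumsum += v
+        if cumsum / total >= threshold:
+            return i, sorted_vals[:i]
+    return len(sorted_vals), sorted_vals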
+```

- **기존 유사 문서:** *(TODO: 인덱서 클러스터 리포트 참조)*
- **처리 방식:** UPDATE (자동 정규화)
- **처리 이유:** Phase 1 정규화 — 옛 템플릿/누락 필드 보강.
+### Feature importance pruning
+```python
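+import pandas as pd
+import xgboost as xgb
+# Assumes X is a pandas DataFrame of features and y holds the matching labels.
+model = xgb.XGBClassifier().fit(X, y)
+importance = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
+cumulative = importance.cumsum() / importance.sum()
+# Keep every feature before the 80% mark plus the one that crosses it;
+# `cumulative <= 0.8` alone drops the crossing feature and can select nothing.
+n_keep = min(int((cumulative < 0.8).sum()) + 1, len(importance))
+top_features = importance.index[:n_keep]  # vital few
+print(f"{len(top_features)} of {len(X.columns)} features carry at least 80% of importance")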
+```

-## 🕓 변경 이력 (Changelog)
+### Revenue concentration analysis
+```python
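+import pandas as pd
+# "customers.csv" is a placeholder; it needs a numeric "revenue" column.
+customers = pd.read_csv("customers.csv").sort_values("revenue", ascending=False)
+customers["cum_revenue_pct"] = customers["revenue"].cumsum() / customers["revenue"].sum()
+n_top = max(1, int(len(customers) * 0.2))  # at least one customer, even for tiny files
+top_20 = customers.head(n_top)
+print(f"Top 20% generate {top_20['revenue'].sum() / customers['revenue'].sum():.0%}")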
+```

-| 날짜 | 변경 내용 | 처리 방식 | 신뢰도 |
-|------|-----------|-----------|--------|
-| 2026-05-08 | P-Reinforce Phase 1 정규화 (frontmatter + 헤더 표준화) | UPDATE | A |
+### Profiling hot path (Python)
+```python
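+import cProfile
+import pstats
+# run_workload() is a placeholder for the code being profiled.
+profiler = cProfile.Profile()
+profiler.enable()
+run_workload()
+profiler.disable()
+# "cumulative" includes time spent in callees, so entries overlap;
+# sort by "tottime" instead to rank functions by their own time.
+stats = pstats.Stats(profiler).sort_stats("cumulative")
+stats.print_stats(20)  # the top 20 entries usually cover the hot path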
+```
+
+### LLM cost: top prompt templates
+```python
+# Track spend per prompt template and list the templates that make up 80% of cost.
+# Assumes `logs` is an iterable of dicts with "prompt_template", "tokens", "cost_per_token".
+from collections import Counter
+spend = Counter()
+for log in logs:
+    spend[log["prompt_template"]] += log["tokens"] * log["cost_per_token"]
+total = sum(spend.values())
+running = 0
+for template, cost in spend.most_common():
+    running += cost
+    print(f"{template}: ${cost:.2f}, cumulative {running/total:.0%}")
+    if running / total >= 0.8:
+        break
+```
+
+## Decision criteria
+| Situation | Approach |
+|---|---|
+| Backlog overload | Pareto-rank by impact, ship top 20% |
+| Slow application | Profile, fix the hot path first |
+| Too many features | Importance-based pruning |
+| Customer support | Tier by revenue, allocate AE coverage |
+| Long bug list | Triage by frequency × severity |
+| Compliance / safety | Pareto NOT applicable (100% coverage required) |
+
+**Default**: sort by impact, then take items from the top until the cumulative share is ≥ 80%.
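
The "frequency × severity" row combined with the default cutoff can be sketched as follows. The `triage` function, the tuple layout, and the sample bugs are hypothetical, chosen only to illustrate the rule.

```python
def triage(bugs, threshold=0.8):
    """Return the top-ranked bug names whose combined weight reaches `threshold`.

    bugs: iterable of (name, frequency, severity); weight = frequency * severity.
    """
    weighted = sorted(
        ((name, freq * sev) for name, freq, sev in bugs),
        key=lambda pair: pair[1],
        reverse=True,
    )
    total = sum(weight for _, weight in weighted)
    if total <= 0:
        return []
    selected, running = [], 0
    for name, weight in weighted:
        selected.append(name)
        running += weight
        if running / total >= threshold:
            break
    return selected


# Hypothetical bug list: (name, weekly frequency, severity on a 1-5 scale).
bugs = [
    ("login-timeout", 120, 3),
    ("payment-crash", 40, 5),
    ("slow-search", 60, 2),
    ("typo-in-footer", 15, 1),
    ("avatar-glitch", 10, 1),
]

print(triage(bugs))
```

Safety or compliance bugs should be routed around this cutoff and handled in full, in line with the last row of the table.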
+
+## 🔗 Graph
+- Parent: [[Power-Law]] · [[Prioritization]]
+- Variants: [[80-20-Rule]] · [[Long-Tail]] · [[Zipf-Law]]
+- Applications: [[Bug-Triage]] · [[Feature-Importance]] · [[Performance-Profiling]]
+- Adjacent: [[ICE-Scoring]] · [[RICE-Scoring]] · [[Eisenhower-Matrix]]
+
+## 🤖 LLM usage
+**When to use**: backlog prioritization, scoping optimization work, feature selection, customer segmentation.
+**When not to use**: safety-critical or compliance work, where the long tail cannot be ignored.
+
+## ❌ Anti-patterns
+- **Treating 80/20 literally**: the actual ratio varies, so measure it instead of assuming it.
+- **Ignoring the long tail entirely**: some long-tail items are high-leverage (a zero-day, a churn-risk customer).
+- **Cause/effect confusion**: 20% of features explaining 80% of accuracy does not mean you should keep only those; feature interactions matter.
+- **Static analysis**: the Pareto ranking shifts over time, so recompute it regularly (for example, weekly).
+- **Pareto in safety domains**: medical, finance, and security work require 100% coverage.
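
To illustrate the "static analysis" anti-pattern, the sketch below compares the vital-few set of two periods and reports what changed. The function names and the weekly counts are assumptions made for the example.

```python
def vital_set(counts, threshold=0.8):
    """Keys whose counts, taken largest first, reach `threshold` of the total."""
    total = sum(counts.values())
    if total <= 0:
        return set()
    selected, running = set(), 0
    for key, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        selected.add(key)
        running += count
        if running / total >= threshold:
            break
    return selected


def ranking_drift(previous, current, threshold=0.8):
    """Compare the vital-few sets of two periods."""
    before = vital_set(previous, threshold)
    after = vital_set(current, threshold)
    return {"entered": after - before, "left": before - after, "stable": before & after}


# Hypothetical occurrence counts per bug for two consecutive weeks.
last_week = {"A": 500, "B": 300, "C": 100, "D": 50, "E": 30, "F": 20}
this_week = {"A": 120, "B": 310, "C": 90, "D": 400, "E": 25, "F": 15}

drift = ranking_drift(last_week, this_week)
print(drift)
```

Anything that appears under `entered` is a new priority that a one-off analysis from the previous week would have missed.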
+
+## 🧪 Verification / duplicates
+- Verified (Pareto 1896, *Cours d'économie politique*; Juran 1951, *Quality Control Handbook*).
+- Trust level A.
+
+## 🕓 Changelog
+| Date | Change |
+|---|---|
+| 2026-05-08 | Phase 1 |
+| 2026-05-10 | Manual cleanup: Pareto applications, charts, anti-patterns |