"80% of effects come from 20% of causes." Vilfredo Pareto (1896), from his observation of land ownership in Italy. Modern applications: bug triage (the top 20% of bugs cause 80% of crashes), customer revenue (the top 20% of customers pay 80%), feature importance (the top 20% of features carry 80% of model signal). It is the default prioritization heuristic.
Key points
Origin
Pareto 1896: 80% of Italian land was owned by 20% of the population.
Juran 1940s: applied it to quality control as "the vital few vs. the trivial many".
Power-law family: the distribution is approximately linear on a log-log plot.
80/20 is only a mnemonic; actual ratios vary (90/10, 70/30, etc.).
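To make the "ratios vary" point concrete, here is a minimal sketch that assumes an idealized Pareto distribution with shape parameter alpha > 1, for which the top fraction p holds a share of p^(1 - 1/alpha) of the total. Real data rarely fits this model exactly, and the function name `top_share` is purely illustrative.

```python
import math


def top_share(p: float, alpha: float) -> float:
    """Share of the total held by the top fraction p under an ideal Pareto(alpha)."""
    if not 0 < p <= 1 or alpha <= 1:
        raise ValueError("need 0 < p <= 1 and alpha > 1")
    return p ** (1 - 1 / alpha)


# Shape parameter at which the top 20% holds exactly 80% (about 1.16).
ALPHA_80_20 = math.log(5) / math.log(4)

# Heavier tails (smaller alpha) concentrate more of the total in the top 20%.
for alpha in (ALPHA_80_20, 1.5, 2.0, 3.0):
    print(f"alpha={alpha:.2f}: top 20% holds {top_share(0.2, alpha):.0%}")
```

Under this model the classic 80/20 split is a single point on a continuum, which is why the ratio should be measured rather than assumed.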
Key insight
Effects are NOT uniformly distributed across causes.
Sorting by impact reveals the long tail.
ROI: fixing the top 20% of causes solves roughly 80% of the problem for about 20% of the effort.
Caveat: the remaining 20% of effects can still be important (safety, compliance).
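One way to check how concentrated a dataset actually is: sort by impact and compute the share contributed by the top fraction of items. The sketch below uses made-up crash counts, and `measured_top_share` is a hypothetical helper name.

```python
def measured_top_share(values, fraction=0.2):
    """Share of the total contributed by the largest `fraction` of items."""
    ranked = sorted(values, reverse=True)
    total = sum(ranked)
    if not ranked or total == 0:
        raise ValueError("need at least one non-zero value")
    k = max(1, round(len(ranked) * fraction))  # always look at one item or more
    return sum(ranked[:k]) / total


# Hypothetical crash counts per bug (illustrative data only).
crashes = [500, 300, 60, 40, 30, 25, 20, 15, 7, 3]
print(f"top 20% of bugs cause {measured_top_share(crashes):.0%} of crashes")

# A uniform distribution shows no concentration at all.
print(f"uniform case: {measured_top_share([1] * 10):.0%}")
```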
Software / ML context
Bug triage: a small set of bugs causes most crashes.
Performance hotspots: roughly 5% of the code accounts for 95% of CPU time.
Feature importance: the top features dominate model signal.
Customer revenue: a tiny number of enterprise customers accounts for most of the revenue.
Test coverage: 80% of bugs sit in 20% of code paths.
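```python
import pandas as pd
import xgboost as xgb

# X: feature DataFrame, y: labels (both assumed to be defined already)
model = xgb.XGBClassifier().fit(X, y)
importance = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
cumulative = importance.cumsum() / importance.sum()
# Vital few: the smallest prefix of features whose cumulative share reaches 80%
# (shifting by one keeps the feature that crosses the threshold)
top_features = importance[cumulative.shift(fill_value=0) < 0.8].index
print(f"{len(top_features)} of {len(X.columns)} features carry 80% of importance")
```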
```python
import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
run_workload()
profiler.disable()

stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats(20)  # top 20 functions usually = 80%+ time
```
LLM cost: top prompt templates by token spend
```python
# Prompt token spend tracking
from collections import Counter

# logs: iterable of dicts with "prompt_template", "tokens", "cost_per_token" (assumed)
spend = Counter()
for log in logs:
    spend[log["prompt_template"]] += log["tokens"] * log["cost_per_token"]

total = sum(spend.values())
running = 0
for template, cost in spend.most_common():
    running += cost
    print(f"{template}: ${cost:.2f}, cumulative {running / total:.0%}")
    if running / total > 0.8:
        break
```
Decision criteria
| Situation | Approach |
| --- | --- |
| Backlog overload | Pareto-rank by impact, ship the top 20% |
| Slow application | Profile, fix the hot path first |
| Too many features | Importance-based pruning |
| Customer support | Tier by revenue, allocate AE coverage |
| Long bug list | Triage by frequency × severity |
| Compliance / safety | Pareto NOT applicable (100% coverage required) |
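The "frequency × severity" triage row can be sketched as a simple scoring pass. The `Bug` class, the 1 to 5 severity scale, and the sample data below are all hypothetical, chosen only to illustrate the ranking.

```python
from dataclasses import dataclass


@dataclass
class Bug:
    bug_id: str
    weekly_reports: int  # frequency: how often the bug is hit
    severity: int        # assumed scale: 1 (cosmetic) to 5 (data loss)

    @property
    def score(self) -> int:
        return self.weekly_reports * self.severity


# Illustrative bug list (not real data).
bugs = [
    Bug("BUG-1", weekly_reports=120, severity=2),
    Bug("BUG-2", weekly_reports=15, severity=5),
    Bug("BUG-3", weekly_reports=300, severity=1),
    Bug("BUG-4", weekly_reports=4, severity=3),
]

# Highest combined score first.
triaged = sorted(bugs, key=lambda b: b.score, reverse=True)
for bug in triaged:
    print(f"{bug.bug_id}: score={bug.score}")
```

A plain product lets a very frequent low-severity bug outrank a rare severe one, so a team may prefer to weight severity more heavily; that is a judgment call this sketch does not settle.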
Default: sort by impact, take the top items until the cumulative share is ≥ 80%.
When to use: backlog prioritization, optimization scope, feature selection, customer segmentation.
When not to use: safety-critical or compliance work, where the long tail cannot be ignored.
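The default rule and its exception can be combined in one small helper: take items in descending impact until the cumulative share reaches the threshold, then add anything that must be done regardless of rank. This is a sketch; `vital_few`, its `mandatory` parameter, and the sample backlog are illustrative assumptions.

```python
def vital_few(impact, threshold=0.8, mandatory=()):
    """Keys covering `threshold` of total impact, plus every mandatory key."""
    total = sum(impact.values())
    selected, running = [], 0.0
    if total > 0:
        ranked = sorted(impact.items(), key=lambda kv: kv[1], reverse=True)
        for key, value in ranked:
            if running / total >= threshold:
                break
            selected.append(key)
            running += value
    # Safety / compliance items are kept no matter how small their impact score.
    selected += [key for key in mandatory if key not in selected]
    return selected


# Hypothetical backlog with impact scores (illustrative data only).
backlog = {
    "checkout crash": 50,
    "slow search": 30,
    "dark mode": 10,
    "typo": 5,
    "audit log gap": 5,
}
print(vital_few(backlog, mandatory=["audit log gap"]))
```

Here two of the five items cover 80% of the impact, and the compliance item is appended even though it sits at the bottom of the ranking.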
❌ Anti-patterns
Treating 80/20 literally: the actual ratio varies, so measure it instead of assuming it.
Ignoring the long tail entirely: some long-tail items are high-leverage (a zero-day, a churn-risk customer).
Cause/effect confusion: 20% of features accounting for 80% of accuracy does not mean you should keep only those (interactions matter).
Static analysis: Pareto rankings shift over time, so recompute them weekly.
Pareto in safety domains: medical, finance, and security work requires 100% coverage.
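The point about rankings shifting over time can be checked by comparing the top-k set between two periods. The sketch below uses made-up weekly impact numbers; `top_k` and `overlap` are hypothetical helper names.

```python
def top_k(impact, k):
    """Set of the k highest-impact keys."""
    return set(sorted(impact, key=impact.get, reverse=True)[:k])


def overlap(a, b):
    """Jaccard similarity between two sets (1.0 means identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Illustrative impact scores for the same four causes in consecutive weeks.
last_week = {"A": 90, "B": 60, "C": 10, "D": 5}
this_week = {"A": 20, "B": 70, "C": 15, "D": 80}

prev, curr = top_k(last_week, 2), top_k(this_week, 2)
print(f"overlap: {overlap(prev, curr):.2f}")
print(f"new entrants: {sorted(curr - prev)}")
print(f"dropped out: {sorted(prev - curr)}")
```

A low overlap is a signal that last week's priority list is stale and should be recomputed before acting on it.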
🧪 Verification / duplicates
Verified (Pareto 1896, Cours d'économie politique; Juran 1951, Quality Control Handbook).