"매 institutionalized regret-into-knowledge transformer". 매 1940s US Army After-Action Review (AAR) 의 origin → 매 2003 Google SRE 의 blameless postmortem 의 modern form. 매 each incident 매 paid-for data; throwing it 매 paying twice.
매 핵심
매 mechanism
Incident / project ends.
Timeline 매 reconstructed.
Root causes (plural) 매 identified.
Action items 매 owned + scheduled.
Doc 매 published, indexed, re-read.
매 modern best practices (Google SRE)
Blameless — 매 systems 매 fail, not people.
Concrete action items with owners + due dates.
5 Whys or Causal Analysis using STAMP (no single root cause).
Public within org (searchable).
매 응용
Production incidents (PagerDuty integration).
Project retros (sprint, quarter).
Security incidents (legal-friendly variant).
💻 패턴
Postmortem template (Google SRE-style)
# Incident YYYY-MM-DD: <short title>
## Summary
1-2 sentences.
## Impact
- Users affected: ...
- Duration: ...
- Revenue: ...
## Root causes (plural)
1. ...
2. ...
## Trigger
What event started the incident.
## Resolution
What stopped it.
## Detection
How we knew (and how late).
## Timeline (UTC)
| Time | Event |
|---|---|
| 14:32 | Deploy started |
| 14:34 | Error rate spike |
| ... | ... |
## What went well
- ...
## What went poorly
- ...
## Where we got lucky
- ...
## Action items
| ID | Action | Owner | Due | Type |
|---|---|---|---|---|
| AI-1 | Add canary deploy | @alice | 2026-05-20 | prevent |
| AI-2 | Improve alert | @bob | 2026-05-15 | detect |
Why did the site go down? Server OOM.
Why OOM? Cache grew unbounded.
Why unbounded? No eviction policy.
Why no policy? PR review missed it.
Why missed? Checklist had no cache item.
→ Root: missing checklist item (process), not the engineer.