Every incident is a learning opportunity. But only if you actually learn from it. Post-mortems are how you turn "something broke" into "we made sure it won't break that way again."
Here's how to run post-mortems that work.
What is a Blameless Post-Mortem?
A blameless post-mortem is an incident review focused on systems and processes, not individuals. The goal is to understand what happened and improve, not to find someone to punish.
Why "Blameless" Matters
- Honest reporting: People hide mistakes when they fear blame
- Faster recovery: Issues get reported sooner
- Better fixes: You address root causes, not symptoms
- Team health: Psychological safety improves performance
When to Run a Post-Mortem
Always Run One
- Customer-facing outages
- Data loss or corruption
- Security incidents
- Major revenue impact
Usually Run One
- Internal tool outages
- Near-misses that could have been worse
- Repeated smaller incidents
- Anything that woke someone up at 3 AM
Skip It
- Expected maintenance that went as planned
- Minor issues caught before any impact (serious near-misses still merit a review)
- Trivial incidents with obvious fixes
Post-Mortem Timeline
| When | What |
|---|---|
| During incident | Document timeline, actions taken, decisions made |
| Within 24-48 hours | Hold post-mortem meeting while details are fresh |
| Within 1 week | Publish post-mortem document |
| Ongoing | Track action items to completion |
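The deadlines in the table can be computed rather than remembered. The following is a minimal sketch, assuming the clock starts when the incident is resolved (the table does not say which moment to count from) and using the upper bound of the 24-48 hour window. The function name `postmortem_deadlines` and the example timestamp are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def postmortem_deadlines(resolved_at: datetime) -> dict[str, datetime]:
    """Return the latest acceptable time for each follow-up step.

    Assumption: deadlines are measured from the moment of resolution.
    """
    return {
        "meeting_by": resolved_at + timedelta(hours=48),
        "document_by": resolved_at + timedelta(weeks=1),
    }


# Hypothetical incident resolved on 2024-03-04 at 14:30 UTC.
resolved = datetime(2024, 3, 4, 14, 30, tzinfo=timezone.utc)
deadlines = postmortem_deadlines(resolved)

for step, due in deadlines.items():
    print(f"{step}: {due:%Y-%m-%d %H:%M} UTC")
```

Keeping all timestamps timezone-aware and in UTC matches the "Timeline (UTC)" convention used in the template.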
Post-Mortem Template
Incident Post-Mortem
# [Incident Title]
**Date:** [Date of incident]
**Duration:** [Start time - End time]
**Impact:** [Who was affected, how severely]
**Severity:** [P1/P2/P3]
## Timeline (UTC)
- [HH:MM] - [What happened]
- [HH:MM] - [What happened]
- ...
## Root Cause
[What caused the incident - focus on systems, not people]
## Contributing Factors
- [Factor 1]
- [Factor 2]
## Detection
[How was the incident detected? How long from start to detection?]
## Resolution
[How was the incident fixed?]
## Action Items
- [ ] [Action 1] - Owner: [Name] - Due: [Date]
- [ ] [Action 2] - Owner: [Name] - Due: [Date]
## Lessons Learned
### What went well
- [Thing that worked]
### What could be improved
- [Thing that didn't work]
## Appendix
- Links to logs, dashboards, PRs, etc.
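The Duration and Detection fields in the template are simple arithmetic on three timestamps from the timeline. Here is a small sketch of that calculation. The timestamps are made up for illustration, and `minutes_between` is a hypothetical helper, not part of any tool.

```python
from datetime import datetime, timezone


def minutes_between(start: datetime, end: datetime) -> int:
    """Whole minutes elapsed from start to end."""
    return int((end - start).total_seconds() // 60)


# Hypothetical timeline entries, all in UTC.
started = datetime(2024, 3, 4, 13, 5, tzinfo=timezone.utc)
detected = datetime(2024, 3, 4, 13, 17, tzinfo=timezone.utc)
resolved = datetime(2024, 3, 4, 14, 30, tzinfo=timezone.utc)

time_to_detect = minutes_between(started, detected)
duration = minutes_between(started, resolved)

print(f"Time to detect: {time_to_detect} min")
print(f"Total duration: {duration} min")
```

Recording these two numbers for every incident makes it possible to see whether detection is getting faster over time.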
Running the Post-Mortem Meeting
Who Should Attend
- Incident responders — Anyone who worked on the incident
- Stakeholders — People affected by the incident
- Subject matter experts — If specialized knowledge is relevant
- Optional: Leadership (for visibility, not for blame)
Meeting Agenda
- Set the stage (2 min) — "This is blameless, we're here to learn"
- Review timeline (10 min) — What happened, when
- Discuss root cause (15 min) — Why did it happen?
- Identify improvements (15 min) — What can we do better?
- Assign action items (5 min) — Who does what, by when?
- What went well (3 min) — Acknowledge good responses
Facilitation Tips
- Start with the timeline — Facts before analysis
- Ask "why" repeatedly — Get to root causes
- Redirect blame — "What allowed that to happen?" not "Why did you do that?"
- Keep it focused — Don't let it become a general complaint session
- End with action items — Every post-mortem should produce concrete next steps
Root Cause Analysis
The "Five Whys"
Keep asking "why" until you reach something actionable:
Why was the site down? → Database ran out of connections
Why did it run out of connections? → Connection leak in the code
Why was there a connection leak? → Missing error handling
Why was error handling missing? → Not caught in code review
Why wasn't it caught in review? → No linting rule for connection cleanup
Action: Add linting rule to catch missing connection cleanup
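To make the third "why" concrete, here is one way a connection leak from missing error handling can look, and one way to close it. This is only a sketch: it uses Python's built-in `sqlite3` module as a stand-in for whatever database the incident involved, and the `users` table and function names are hypothetical.

```python
import sqlite3
from contextlib import closing


def count_users_leaky(db_path: str) -> int:
    conn = sqlite3.connect(db_path)
    # If execute() raises, close() below is never reached
    # and the connection stays open.
    row = conn.execute("SELECT COUNT(*) FROM users").fetchone()
    conn.close()
    return row[0]


def count_users_safe(db_path: str) -> int:
    # closing() calls conn.close() on exit, even when the query raises.
    with closing(sqlite3.connect(db_path)) as conn:
        row = conn.execute("SELECT COUNT(*) FROM users").fetchone()
        return row[0]
```

A lint rule like the one in the action item would aim to flag the first pattern, where a connection is opened outside a `with` block or `try`/`finally`.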
Multiple Causes
Most incidents have multiple contributing factors:
- Immediate cause: What directly triggered the incident
- Contributing causes: What made it worse or harder to fix
- Root causes: Underlying systemic issues
Common Post-Mortem Anti-Patterns
Anti-Pattern 1: Blame Assignment
"John pushed the bad code." → "What allowed bad code to reach production?"
Anti-Pattern 2: Shallow Analysis
"We'll be more careful next time." → How? What specific changes will you make?
Anti-Pattern 3: Action Item Overload
Creating 20 action items all but guarantees that none get done. Focus on the 2-3 highest-impact fixes.
Anti-Pattern 4: No Follow-Up
Action items without owners and deadlines are wishes. Track them like any other work.
Anti-Pattern 5: Post-Mortem by Email
Written docs are good, but a meeting ensures shared understanding. Do both.
Making Action Items Stick
Characteristics of Good Action Items
- Specific: "Add alert for connection pool >80%" not "Improve monitoring"
- Owned: One person responsible for completion
- Time-bound: Clear due date
- Tracked: In your issue tracker, not lost in a doc
- Prioritized: Ranked against other work, not added on top of it
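The checklist above can be enforced mechanically before an action item is accepted. The following is a rough sketch, assuming action items are held as simple records. The `ActionItem` class, its fields, and the ticket ID `OPS-123` are all hypothetical and not tied to any particular issue tracker.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    title: str
    owner: str   # the one person responsible
    due: date    # clear due date
    ticket: str  # ID in the issue tracker
    done: bool = False

    def problems(self) -> list[str]:
        """List which checklist characteristics are missing."""
        issues = []
        if not self.owner.strip():
            issues.append("no owner")
        if not self.ticket.strip():
            issues.append("no ticket")
        return issues

    def is_overdue(self, today: date) -> bool:
        """True if the item is still open after its due date."""
        return not self.done and today > self.due


item = ActionItem(
    title="Add alert for connection pool >80%",
    owner="",
    due=date(2024, 3, 18),
    ticket="OPS-123",
)
print(item.problems())
print(item.is_overdue(date(2024, 3, 19)))
```

Whether an item is specific enough still needs human judgment; a check like this only catches the missing owner, ticket, or deadline.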
Action Item Types
- Immediate fixes: Patch the specific issue
- Process changes: Update runbooks, checklists
- Tool improvements: Better monitoring, automated checks
- Architecture changes: Remove single points of failure
Sharing Post-Mortems
Internal Sharing
- Post to shared wiki/documentation
- Link from incident tracking system
- Share in team channel
- Reference in on-call handoffs
External Sharing (for Customer-Impacting Incidents)
- Simplified version on status page
- Direct communication to affected customers
- Optional: Public blog post for major incidents
Post-Mortem Checklist
Before the Meeting
- ☐ Gather timeline and logs
- ☐ Schedule meeting within 48 hours
- ☐ Invite all relevant participants
- ☐ Prepare initial timeline draft
During the Meeting
- ☐ Start with blameless framing
- ☐ Review timeline together
- ☐ Identify root causes
- ☐ Generate specific action items
- ☐ Acknowledge what went well
After the Meeting
- ☐ Publish post-mortem document
- ☐ Create tickets for action items
- ☐ Track completion
- ☐ Review in future sprints
Prevent Incidents Before They Happen
Post-mortems help you learn from incidents. OpsPulse helps you catch them earlier. Smart monitoring reduces both frequency and impact.
Summary
Effective post-mortems:
- Are blameless — Focus on systems, not people
- Have clear timelines — Facts before analysis
- Find root causes — Ask "why" repeatedly
- Produce action items — Specific, owned, tracked
- Share learnings — Inside and outside the team
Every incident is expensive. Make sure you get your money's worth in learnings.