Pre-Mortem Analysis for Technical Projects
Introduction
Most engineering teams conduct post-mortems after failures—analyzing what went wrong after the damage is done. But what if you could identify critical failure points before starting a project? The pre-mortem technique, developed by psychologist Gary Klein, flips traditional planning on its head by assuming failure has already occurred and working backward to identify why.
For principal engineers and technical leaders, pre-mortems are one of the most powerful tools for de-risking complex technical initiatives, uncovering blind spots, and building more resilient systems.
What is a Pre-Mortem?
A pre-mortem is a structured exercise performed before starting a significant project or making a major technical decision. The team imagines that the project has failed spectacularly—shipped late, had catastrophic bugs, or was cancelled—and then works backward to identify the most likely causes of that failure.
Unlike traditional risk assessment (which asks “What could go wrong?”), pre-mortems start from the assumption that things have gone wrong, which psychologically liberates team members to voice concerns they might otherwise suppress.
Why Pre-Mortems Work: The Psychology
Overcoming Optimism Bias
Technical teams, especially experienced ones, tend toward optimism about their projects. This is reinforced by:
- Planning fallacy: We consistently underestimate how long tasks take
- Illusion of control: Belief that our skills will overcome obstacles
- Groupthink: Pressure to align with team consensus
Pre-mortems counteract these biases by legitimizing skepticism. Instead of being the “negative person” pointing out risks, you’re participating in a structured exercise everyone expects to surface concerns.
Prospective Hindsight
Research published in 1989 by Deborah Mitchell, J. Edward Russo, and Nancy Pennington found that imagining an event has already occurred increases the ability to correctly identify reasons for that outcome by about 30%. This “prospective hindsight” helps uncover failure modes that traditional risk analysis can miss.
How to Conduct a Pre-Mortem for Technical Projects
Phase 1: Setup (15 minutes)
Participants: Core engineering team, architects, product stakeholders, and at least one “outsider” (someone not deeply invested in the project)
Materials: Shared document or whiteboard
Framing: “It’s [6 months/1 year] from now. Our [project/system/migration] has failed catastrophically. It was a complete disaster. Take 10 minutes individually to write down all the reasons why it failed.”
Key Point: Be specific about the timeframe and definition of failure. For example:
- “The microservices migration was abandoned after 9 months”
- “The ML recommendation system launched but was rolled back within a week”
- “The performance optimization project delivered no measurable improvements”
Phase 2: Individual Brainstorming (10 minutes)
Each person silently writes down potential failure causes. Encourage specificity:
❌ Vague: “Poor communication”
✅ Specific: “Backend and frontend teams had different assumptions about API contract versioning, leading to incompatible deployments”
❌ Vague: “Technical challenges”
✅ Specific: “Our assumption that PostgreSQL could handle a 10x traffic increase was wrong; we hit connection pool limits at 3x”
Phase 3: Share and Consolidate (20-30 minutes)
Go around the room, having each person share one failure cause at a time until all are captured. Group similar items and look for patterns.
Watch for:
- Repeated concerns: If multiple people identify the same risk, it’s probably significant
- Surprises: Novel failure modes no one else thought of
- Organizational/political factors: Often surfaced more readily in pre-mortems than traditional planning
Phase 4: Prioritize and Mitigate (30-45 minutes)
For the top 5-10 failure causes, discuss:
- Likelihood: How probable is this failure mode?
- Impact: How damaging would it be?
- Mitigation: What specific actions reduce the risk?
- Detection: How would we know if this is happening?
Document concrete action items with owners and deadlines.
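One way to make the likelihood and impact discussion concrete is to score each failure cause and sort the list. The sketch below is only an illustration: the `Risk` class, the `prioritize` function, and the 1-3 numeric mapping for H/M/L ratings are assumptions made for this example, and the ratings on the sample risks are invented. A team may prefer a different scale or weighting.

```python
from dataclasses import dataclass

# Hypothetical numeric mapping for H/M/L ratings; adjust to your own scale.
LEVELS = {"L": 1, "M": 2, "H": 3}


@dataclass
class Risk:
    description: str
    likelihood: str  # "H", "M", or "L"
    impact: str      # "H", "M", or "L"
    mitigation: str
    owner: str

    @property
    def score(self) -> int:
        # Simple likelihood x impact product, ranging from 1 to 9.
        return LEVELS[self.likelihood] * LEVELS[self.impact]


def prioritize(risks: list[Risk], top_n: int = 5) -> list[Risk]:
    """Return the top_n risks, highest score first (ties keep input order)."""
    return sorted(risks, key=lambda r: r.score, reverse=True)[:top_n]


# Illustrative ratings only; real values come out of the team discussion.
risks = [
    Risk("Connection pool thrashes under production load", "M", "H",
         "Load test at 2x peak traffic", "TBD"),
    Risk("Silent data loss in migration script", "L", "H",
         "Bidirectional validation on 10% of data", "TBD"),
    Risk("Dual writes create inconsistencies", "H", "H",
         "Log-based replication plus reconciliation job", "TBD"),
]

for risk in prioritize(risks):
    print(risk.score, risk.description, "->", risk.mitigation)
```

The score is a conversation aid, not a verdict: a low-likelihood, high-impact risk such as silent data loss may still deserve a mitigation even if it ranks last.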
Real-World Example: Database Migration Pre-Mortem
Project: Migrating from MongoDB to PostgreSQL for a core service handling 100M requests/day
Pre-Mortem Failure Scenario: “It’s 8 months from now. The PostgreSQL migration was abandoned. We’re still on MongoDB, having wasted 6 engineer-months and damaged team morale.”
Identified Failure Causes:
“We underestimated query pattern differences”
- MongoDB’s flexible schema let us write inefficient queries that worked fine
- PostgreSQL’s query planner exposed N+1 query patterns we didn’t know existed
- Mitigation: Run PostgreSQL in shadow mode for 2 weeks, analyzing query patterns before cutover
“The data migration script had a silent data loss bug”
- Subtle schema differences between document and relational model
- Discovered only after migration, when a customer reported missing data
- Mitigation: Write bidirectional validation comparing MongoDB and PostgreSQL records; run on 10% of production data before full migration
“Connection pool tuning was wrong for our traffic pattern”
- Default connection pool settings worked in staging but thrashed under production load
- Mitigation: Load test at 2x peak traffic; monitor connection pool metrics in shadow mode
“Team didn’t have PostgreSQL expertise for production issues”
- When query performance degraded, no one knew how to interpret EXPLAIN ANALYZE output
- Mitigation: Two engineers complete PostgreSQL DBA training; engage consultant for first month post-migration
“Dual-write period created data inconsistencies”
- Writing to both MongoDB and PostgreSQL during migration led to race conditions
- Mitigation: Use transaction log-based replication instead of dual writes; implement reconciliation job
Result: The pre-mortem surfaced the connection pool and dual-write issues, either of which would probably have caused production incidents. The team added 3 weeks to the timeline but avoided a likely rollback.
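The bidirectional validation mitigation from the example can be sketched in a few lines. This is a minimal illustration, not the team's actual script: it assumes records from both stores have already been fetched and normalized into plain dictionaries keyed by a shared record ID, and the `compare_stores` function and sample data are hypothetical. Fetching from MongoDB and PostgreSQL, and normalizing types such as IDs and timestamps, is deliberately left out.

```python
from typing import Any, Mapping

Record = Mapping[str, Any]


def compare_stores(
    source: Mapping[str, Record],
    target: Mapping[str, Record],
) -> dict[str, list[str]]:
    """Compare two snapshots keyed by record ID, checking both directions."""
    # IDs present in the source but absent from the target (possible data loss).
    missing_in_target = sorted(source.keys() - target.keys())
    # IDs present in the target but absent from the source (unexpected extras).
    missing_in_source = sorted(target.keys() - source.keys())
    # IDs present in both stores whose field values differ.
    mismatched = sorted(
        key for key in source.keys() & target.keys() if source[key] != target[key]
    )
    return {
        "missing_in_target": missing_in_target,
        "missing_in_source": missing_in_source,
        "mismatched": mismatched,
    }


# Hypothetical sample data standing in for normalized records.
mongo_docs = {
    "a1": {"email": "x@example.com", "plan": "pro"},
    "a2": {"email": "y@example.com", "plan": None},
}
pg_rows = {
    "a1": {"email": "x@example.com", "plan": "pro"},
    "a2": {"email": "y@example.com", "plan": "free"},
    "a3": {"email": "z@example.com", "plan": "free"},
}

report = compare_stores(mongo_docs, pg_rows)
for category, ids in report.items():
    print(category, ids)
```

Checking both directions matters: a one-way comparison from MongoDB to PostgreSQL would flag the changed `plan` on `a2` but would never notice the extra `a3` row.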
Pre-Mortem Template for Technical Projects
# Pre-Mortem: [Project Name]
## Context
- **Project:** [Brief description]
- **Timeline:** [Expected duration]
- **Key Stakeholders:** [List]
- **Success Criteria:** [What success looks like]
## Failure Scenario
It's [DATE]. [PROJECT] has failed. [SPECIFIC DESCRIPTION OF FAILURE].
## Potential Failure Causes
### Technical Failures
- [ ] Failure cause 1
- [ ] Failure cause 2
...
### Process/Organizational Failures
- [ ] Failure cause 1
- [ ] Failure cause 2
...
### External/Dependency Failures
- [ ] Failure cause 1
- [ ] Failure cause 2
...
## Top 5 Risks & Mitigations
| Risk | Likelihood | Impact | Mitigation | Owner | Due Date |
|------|-----------|---------|-----------|-------|----------|
| [Risk description] | H/M/L | H/M/L | [Specific action] | [Name] | [Date] |
## Detection Mechanisms
[How we'll know if we're heading toward failure]
## Pre-Mortem Participants
- [Names and roles]
## Date Conducted
[Date]
Common Failure Patterns in Technical Projects
Based on hundreds of pre-mortems, certain failure modes recur:
Architecture & Design
- Underestimated complexity of legacy system integrations
- Over-engineered solution for actual requirements
- Assumed third-party library/service capabilities that didn’t exist
- Didn’t account for data migration complexity
Team & Organization
- Key engineer left mid-project; knowledge not documented
- Team lacked expertise in critical technology
- Cross-team dependencies not identified or managed
- Scope creep from stakeholder requests
Operations & Deployment
- Testing environment didn’t reflect production characteristics
- Deployment process not tested end-to-end before launch
- Monitoring/observability gaps prevented debugging production issues
- Rollback plan rested on assumptions that were never tested
Timing & Scheduling
- External dependency (vendor, platform, team) delayed longer than expected
- Underestimated time for code review and iteration
- Didn’t account for holiday/vacation schedules
- Concurrent projects created resource conflicts
When to Conduct Pre-Mortems
Always:
- Major architectural changes (migrations, rewrites)
- New system launches serving production traffic
- Cross-team initiatives with many dependencies
- Projects with significant uncertainty or technical risk
Consider:
- Large refactoring efforts
- Performance optimization initiatives
- Security hardening projects
- Introducing new technologies/frameworks
Skip:
- Well-understood, low-risk maintenance work
- Individual engineer tasks under a week
- Projects with easy rollback and low blast radius
Pitfalls to Avoid
- Turning it into traditional brainstorming: The “assume failure” framing is critical; don’t dilute it
- Letting senior people speak first: They anchor others’ thinking; use round-robin or silent writing
- Stopping at identification: Pre-mortems are worthless without concrete mitigations
- Doing it too late: Once the team is committed and code is written, confirmation bias sets in
- Using it as a blame exercise: Focus on systemic risks, not individual performance
Integration with Other Practices
Pre-mortems complement other technical practices:
- Design reviews: Pre-mortem after design doc, before implementation
- RFC process: Include pre-mortem section in architecture RFCs
- Sprint planning: Mini pre-mortems for complex stories
- Incident response: Post-mortem findings feed into future pre-mortems
Measuring Effectiveness
Track these metrics over time:
- Risks identified in pre-mortem vs. actual issues: As a rough benchmark, good pre-mortems catch 40-60% of eventual problems
- Project success rate: Should improve as team gets better at pre-mortems
- Time to delivery: Better risk management often reduces delays
- Team confidence: Survey team comfort level before and after pre-mortem
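The first metric can be computed once a project wraps up. The sketch below assumes that pre-mortem risks and actual issues are tagged with the same short labels so they can be matched; the `catch_rate` function and the labels are hypothetical, and in practice the matching usually needs human judgment.

```python
from typing import Optional


def catch_rate(predicted: set[str], actual: set[str]) -> Optional[float]:
    """Fraction of actual issues that the pre-mortem had already identified.

    Returns None when no issues occurred, since the rate is undefined.
    """
    if not actual:
        return None
    return len(predicted & actual) / len(actual)


# Hypothetical labels for illustration only.
predicted = {"connection-pool", "dual-write", "data-loss",
             "expertise-gap", "query-patterns"}
actual = {"connection-pool", "dual-write", "vendor-delay", "scope-creep"}

rate = catch_rate(predicted, actual)
print(f"Catch rate: {rate:.0%}")
print("Missed by the pre-mortem:", sorted(actual - predicted))
```

The list of missed issues is often more useful than the percentage itself, because it shows which categories of risk the team tends to overlook.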
Conclusion
Pre-mortems are a forcing function for intellectual honesty. They create psychological safety for surfacing uncomfortable truths and challenge the optimism bias endemic to engineering teams. For principal engineers, they’re a lightweight, high-leverage tool that can prevent catastrophic failures and build organizational muscle around risk assessment.
The best time to conduct a pre-mortem is when you’re most confident in your plan—that’s precisely when you need it most.
Action Item: Schedule a pre-mortem for your next significant technical initiative. Block 90 minutes, invite diverse perspectives, and genuinely imagine failure. Your future self will thank you.