Pre-Mortem Analysis for Technical Projects

Introduction

Most engineering teams conduct post-mortems after failures—analyzing what went wrong after the damage is done. But what if you could identify critical failure points before starting a project? The pre-mortem technique, developed by psychologist Gary Klein, flips traditional planning on its head by assuming failure has already occurred and working backward to identify why.

For principal engineers and technical leaders, pre-mortems are one of the most powerful tools for de-risking complex technical initiatives, uncovering blind spots, and building more resilient systems.

What is a Pre-Mortem?

A pre-mortem is a structured exercise performed before starting a significant project or making a major technical decision. The team imagines that the project has failed spectacularly—shipped late, had catastrophic bugs, or was cancelled—and then works backward to identify the most likely causes of that failure.

Unlike traditional risk assessment (which asks “What could go wrong?”), pre-mortems start from the assumption that things have gone wrong, which psychologically liberates team members to voice concerns they might otherwise suppress.

Why Pre-Mortems Work: The Psychology

Overcoming Optimism Bias

Technical teams, especially experienced ones, tend toward optimism about their projects. This is reinforced by:

Planning fallacy: We consistently underestimate how long tasks take
Illusion of control: Belief that our skills will overcome obstacles
Groupthink: Pressure to align with team consensus

Pre-mortems counteract these biases by legitimizing skepticism. Instead of being the “negative person” pointing out risks, you’re participating in a structured exercise everyone expects to surface concerns.

Prospective Hindsight

Research by Deborah Mitchell and colleagues found that imagining an event has already occurred increases the ability to identify reasons for that outcome by 30%. This “prospective hindsight” helps uncover failure modes that traditional risk analysis misses.

How to Conduct a Pre-Mortem for Technical Projects

Phase 1: Setup (15 minutes)

Participants: Core engineering team, architects, product stakeholders, and at least one “outsider” (someone not deeply invested in the project)

Materials: Shared document or whiteboard

Framing: “It’s [6 months/1 year] from now. Our [project/system/migration] has failed catastrophically. It was a complete disaster. Take 5 minutes individually to write down all the reasons why it failed.”

Key Point: Be specific about the timeframe and definition of failure. For example:

“The microservices migration was abandoned after 9 months”
“The ML recommendation system launched but was rolled back within a week”
“The performance optimization project delivered no measurable improvements”

Phase 2: Individual Brainstorming (10 minutes)

Each person silently writes down potential failure causes. Encourage specificity:

❌ Vague: “Poor communication”
✅ Specific: “Backend and frontend teams had different assumptions about API contract versioning, leading to incompatible deployments”

❌ Vague: “Technical challenges”
✅ Specific: “Our assumption that PostgreSQL could handle 10x traffic increase was wrong; we hit connection pool limits at 3x”

Phase 3: Share and Consolidate (20-30 minutes)

Go around the room, having each person share one failure cause at a time until all are captured. Group similar items and look for patterns.

Watch for:

Repeated concerns: If multiple people identify the same risk, it’s probably significant
Surprises: Novel failure modes no one else thought of
Organizational/political factors: Often surfaced more readily in pre-mortems than traditional planning
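The round-robin sharing and grouping steps can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. Each participant's silent-writing list is collected into a dict, and dict order sets the speaking order. The hypothetical `theme_of` function stands in for the facilitator's judgment about which items are "similar"; real grouping is a human call, not a keyword match.

```python
from collections import defaultdict
from itertools import zip_longest


def round_robin(submissions):
    """Share one cause per person per round until every list is exhausted.

    `submissions` maps participant name -> causes written silently in Phase 2.
    Dict order decides speaking order, so shuffle it (or put junior people
    first) to keep senior voices from anchoring the room.
    """
    names = list(submissions)
    shared = []
    for round_ in zip_longest(*(submissions[n] for n in names)):
        for name, cause in zip(names, round_):
            if cause is not None:  # this person has run out of causes
                shared.append((name, cause))
    return shared


def group_causes(shared, theme_of):
    """Group causes by theme and rank themes by how many people raised them.

    `theme_of` is a hypothetical stand-in for the facilitator's judgment:
    any function from cause text to a theme label.
    """
    groups = defaultdict(lambda: {"people": set(), "causes": []})
    for name, cause in shared:
        group = groups[theme_of(cause)]
        group["people"].add(name)
        group["causes"].append(cause)
    # Repeated concerns (raised by several people) sort first.
    return sorted(groups.items(), key=lambda kv: len(kv[1]["people"]), reverse=True)


# Hypothetical session data and a naive keyword-based theme labeler.
submissions = {
    "ana": ["Connection pool exhausted at 3x traffic", "Vendor API deprecated"],
    "raj": ["Pool limits hit under load"],
    "kim": ["No one could read EXPLAIN ANALYZE", "Pool too small for peak"],
}


def theme_of(cause):
    return "connection-pool" if "pool" in cause.lower() else cause


shared = round_robin(submissions)
ranked = group_causes(shared, theme_of)
for theme, group in ranked:
    print(len(group["people"]), theme)
```

Three people independently raised the connection-pool theme, so it ranks first. That matches the "repeated concerns" signal described above.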

Phase 4: Prioritize and Mitigate (30-45 minutes)

For the top 5-10 failure causes, discuss:

Likelihood: How probable is this failure mode?
Impact: How damaging would it be?
Mitigation: What specific actions reduce the risk?
Detection: How would we know if this is happening?

Document concrete action items with owners and deadlines.
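One way to turn the Phase 4 discussion into a ranked, checkable list is sketched below. It assumes the H/M/L ratings used in the template later in this article. These are mapped to an assumed 1-3 scale and multiplied into a single score. The `Risk` class, the `prioritize` function, and the sample data are all hypothetical. An item counts as "actionable" only when it has a mitigation, a detection signal, an owner, and a deadline.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumed ordinal scale for the H/M/L ratings; score = likelihood x impact.
LEVEL = {"L": 1, "M": 2, "H": 3}


@dataclass
class Risk:
    cause: str
    likelihood: str  # "H", "M", or "L"
    impact: str      # "H", "M", or "L"
    mitigation: str = ""
    detection: str = ""
    owner: str = ""
    due: Optional[date] = None

    @property
    def score(self) -> int:
        return LEVEL[self.likelihood] * LEVEL[self.impact]

    def is_actionable(self) -> bool:
        # Needs a mitigation, a detection signal, an owner, and a deadline.
        return bool(self.mitigation and self.detection and self.owner and self.due)


def prioritize(risks, top_n=10):
    """Highest likelihood x impact first; ties keep their discussion order."""
    return sorted(risks, key=lambda r: r.score, reverse=True)[:top_n]


risks = [
    Risk("Dual writes cause race conditions", "H", "H",
         "Log-based replication plus reconciliation job",
         "Reconciliation mismatch count", "ana", date(2026, 1, 15)),
    Risk("Connection pool thrashes under load", "M", "H",
         "Load test at 2x peak", "Pool wait time in shadow mode", "raj"),
    Risk("No PostgreSQL expertise on call", "M", "M"),
    Risk("Unknown N+1 query patterns", "H", "M", "Shadow mode for 2 weeks"),
]

top = prioritize(risks, top_n=3)
print([(r.cause, r.score) for r in top])
print("Not yet actionable:", [r.cause for r in top if not r.is_actionable()])
```

The multiplicative score is only a tiebreaker for discussion, not a substitute for it. A low-likelihood risk with catastrophic impact may still deserve a slot in the top list.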

Real-World Example: Database Migration Pre-Mortem

Project: Migrating from MongoDB to PostgreSQL for a core service handling 100M requests/day

Pre-Mortem Failure Scenario: “It’s 8 months from now. The PostgreSQL migration was abandoned. We’re still on MongoDB, having wasted 6 engineer-months and damaged team morale.”

Identified Failure Causes:

1. “We underestimated query pattern differences”
   - MongoDB’s flexible schema let us write inefficient queries that worked fine
   - After embedded documents became normalized tables, N+1 query patterns we didn’t know existed surfaced as slow queries in PostgreSQL
   - Mitigation: Run PostgreSQL in shadow mode for 2 weeks, analyzing query patterns before cutover
2. “The data migration script had a silent data loss bug”
   - Subtle schema differences between the document and relational models
   - Discovered only after migration, when a customer reported missing data
   - Mitigation: Write bidirectional validation comparing MongoDB and PostgreSQL records; run it on 10% of production data before the full migration
3. “Connection pool tuning was wrong for our traffic pattern”
   - Default connection pool settings worked in staging but thrashed under production load
   - Mitigation: Load test at 2x peak traffic; monitor connection pool metrics in shadow mode
4. “Team didn’t have PostgreSQL expertise for production issues”
   - When query performance degraded, no one knew how to interpret EXPLAIN ANALYZE
   - Mitigation: Two engineers complete PostgreSQL DBA training; engage a consultant for the first month post-migration
5. “Dual-write period created data inconsistencies”
   - Writing to both MongoDB and PostgreSQL during migration led to race conditions
   - Mitigation: Use transaction log-based replication instead of dual writes; implement a reconciliation job
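The bidirectional validation mitigation might look something like the sketch below, which is not the team's actual tooling. It assumes records have already been fetched from both databases into dicts keyed by a shared record id; no driver code is shown. Hashing the id keeps the 10% sample stable, so every run checks the same records. The `normalize` hook and the `strip_mongo_id` helper are hypothetical placeholders for the real document-to-relational mapping.

```python
import hashlib


def in_sample(record_id: str, percent: int = 10) -> bool:
    """Deterministic sampling: hashing the id picks the same records every run."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100 < percent


def validate(mongo_rows, pg_rows, percent=10, normalize=lambda row: row):
    """Compare both stores in both directions over a sampled subset of ids.

    mongo_rows / pg_rows map record id -> record as a plain dict. `normalize`
    converts either shape to a common form; that is where document-vs-relational
    schema differences have to be handled.
    """
    ids = {rid for rid in mongo_rows.keys() | pg_rows.keys() if in_sample(rid, percent)}
    report = {"missing_in_pg": [], "missing_in_mongo": [], "mismatched": []}
    for rid in sorted(ids):
        if rid not in pg_rows:
            report["missing_in_pg"].append(rid)     # potential silent data loss
        elif rid not in mongo_rows:
            report["missing_in_mongo"].append(rid)  # row exists only in PostgreSQL
        elif normalize(mongo_rows[rid]) != normalize(pg_rows[rid]):
            report["mismatched"].append(rid)
    return report


def strip_mongo_id(row):
    # Hypothetical normalizer: ignore MongoDB's internal "_id" field.
    return {k: v for k, v in row.items() if k != "_id"}


mongo = {"a": {"_id": "x1", "email": "a@example.com"},
         "b": {"_id": "x2", "email": "b@example.com"},
         "c": {"_id": "x3", "email": "c@example.com"}}
pg = {"a": {"email": "a@example.com"},
      "b": {"email": "B@example.com"},
      "d": {"email": "d@example.com"}}

# percent=100 checks every record, which makes the small demo deterministic.
print(validate(mongo, pg, percent=100, normalize=strip_mongo_id))
```

The same comparison, run on a schedule, can double as the reconciliation job from the dual-write item. Mismatched ids become the work queue for repair.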

Result: The pre-mortem uncovered the connection pool and dual-write issues that likely would have caused production incidents. The team added 3 weeks to the timeline but avoided a likely rollback scenario.
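The connection pool failure cause lends itself to back-of-envelope arithmetic before any load test. Little's law says the number of connections in use at once equals the query arrival rate times how long each query holds a connection. Only the 100M requests/day figure comes from the example. The peak factor, queries per request, query latency, instance count, and headroom below are assumptions chosen for illustration.

```python
import math

SECONDS_PER_DAY = 86_400


def busy_connections(requests_per_day, peak_factor, load_multiplier,
                     queries_per_request, seconds_per_query):
    """Little's law: concurrent connections in use = arrival rate x hold time."""
    avg_rps = requests_per_day / SECONDS_PER_DAY
    test_rps = avg_rps * peak_factor * load_multiplier
    return test_rps * queries_per_request * seconds_per_query


def pool_size_per_instance(busy, instances, headroom=1.5):
    """Spread busy connections over app instances, with assumed burst headroom."""
    return math.ceil(busy * headroom / instances)


# 100M requests/day is from the example; every other number is an assumption.
busy = busy_connections(
    requests_per_day=100_000_000,
    peak_factor=3,            # assumed: peak traffic is 3x the daily average
    load_multiplier=2,        # the "2x peak" load test from the mitigation
    queries_per_request=2,    # assumed
    seconds_per_query=0.005,  # assumed 5 ms per query, connection held throughout
)
print(round(busy, 1))  # 69.4
print(pool_size_per_instance(busy, instances=10))
```

The total across all instances should be compared against the server's `max_connections` setting (100 by default in PostgreSQL). The hold time is not constant, which is why staging numbers can mislead. If queries slow down under production load, each connection is held longer, more connections are needed, and the pool can start to thrash.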

Pre-Mortem Template for Technical Projects

```markdown
# Pre-Mortem: [Project Name]

## Context
- **Project:** [Brief description]
- **Timeline:** [Expected duration]
- **Key Stakeholders:** [List]
- **Success Criteria:** [What success looks like]

## Failure Scenario
It's [DATE]. [PROJECT] has failed. [SPECIFIC DESCRIPTION OF FAILURE].

## Potential Failure Causes

### Technical Failures
- [ ] Failure cause 1
- [ ] Failure cause 2
...

### Process/Organizational Failures
- [ ] Failure cause 1
- [ ] Failure cause 2
...

### External/Dependency Failures
- [ ] Failure cause 1
- [ ] Failure cause 2
...

## Top 5 Risks & Mitigations

| Risk | Likelihood | Impact | Mitigation | Owner | Due Date |
|------|------------|--------|------------|-------|----------|
| [Risk description] | H/M/L | H/M/L | [Specific action] | [Name] | [Date] |

## Detection Mechanisms
[How we'll know if we're heading toward failure]

## Pre-Mortem Participants
- [Names and roles]

## Date Conducted
[Date]
```
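The Detection Mechanisms section, like the "Detection" question in Phase 4, is easier to act on when each risk has a concrete signal and a threshold agreed in advance. Below is a minimal sketch of such tripwires. The `Tripwire` class, the metric names, and the thresholds are hypothetical. The metric values are assumed to come from whatever monitoring system the team already uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tripwire:
    risk: str
    metric: str
    threshold: float
    breach_when_above: bool = True  # False for "too low is bad" metrics

    def breached(self, value: float) -> bool:
        if self.breach_when_above:
            return value > self.threshold
        return value < self.threshold


def check(tripwires, metrics):
    """Return (risk, metric, value) for every tripwire that fires.

    A metric with no data also fires: an observability gap is itself a
    warning sign.
    """
    alerts = []
    for t in tripwires:
        value = metrics.get(t.metric)
        if value is None:
            alerts.append((t.risk, t.metric, "no data"))
        elif t.breached(value):
            alerts.append((t.risk, t.metric, value))
    return alerts


# Hypothetical metric names and thresholds for the migration example.
tripwires = [
    Tripwire("Connection pool thrash", "pool_wait_ms_p99", 50),
    Tripwire("Silent data loss", "validation_mismatches", 0),
    Tripwire("Schedule slip", "milestones_on_time_ratio", 0.8, breach_when_above=False),
]
metrics = {"pool_wait_ms_p99": 120, "milestones_on_time_ratio": 0.9}

for alert in check(tripwires, metrics):
    print(alert)
```

Treating a missing metric as a breach is a deliberate choice. It turns the "monitoring/observability gaps" failure pattern into something the team notices before launch rather than during an incident.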

Common Failure Patterns in Technical Projects

Certain failure modes recur across pre-mortems:

Architecture & Design

Underestimated complexity of legacy system integrations
Over-engineered solution for actual requirements
Assumed third-party library/service capabilities that didn’t exist
Didn’t account for data migration complexity

Team & Organization

Key engineer left mid-project; knowledge not documented
Team lacked expertise in critical technology
Cross-team dependencies not identified or managed
Scope creep from stakeholder requests

Operations & Deployment

Testing environment didn’t reflect production characteristics
Deployment process not tested end-to-end before launch
Monitoring/observability gaps prevented debugging production issues
Rollback plan assumed things that weren’t true

Timing & Scheduling

External dependency (vendor, platform, team) delayed longer than expected
Underestimated time for code review and iteration
Didn’t account for holiday/vacation schedules
Concurrent projects created resource conflicts

When to Conduct Pre-Mortems

Always:

Major architectural changes (migrations, rewrites)
New system launches serving production traffic
Cross-team initiatives with many dependencies
Projects with significant uncertainty or technical risk

Consider:

Large refactoring efforts
Performance optimization initiatives
Security hardening projects
Introducing new technologies/frameworks

Skip:

Well-understood, low-risk maintenance work
Individual engineer tasks under a week
Projects with easy rollback and low blast radius

Pitfalls to Avoid

Turning it into traditional brainstorming: The “assume failure” framing is critical; don’t dilute it
Letting senior people speak first: They anchor others’ thinking; use round-robin or silent writing
Stopping at identification: Pre-mortems are worthless without concrete mitigations
Doing it too late: Once the team is committed and code is written, confirmation bias sets in
Using it as a blame exercise: Focus on systemic risks, not individual performance

Integration with Other Practices

Pre-mortems complement other technical practices:

Design reviews: Pre-mortem after design doc, before implementation
RFC process: Include pre-mortem section in architecture RFCs
Sprint planning: Mini pre-mortems for complex stories
Incident response: Post-mortem findings feed into future pre-mortems

Measuring Effectiveness

Track these metrics over time:

Risks identified in pre-mortem vs. actual issues: The share of problems that actually occurred which the pre-mortem anticipated; good pre-mortems catch 40-60% of eventual problems
Project success rate: Should improve as the team gets better at pre-mortems
Time to delivery: Better risk management often reduces delays
Team confidence: Survey the team's comfort level before and after the pre-mortem
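The first metric can be computed with simple set arithmetic once post-mortem issues are tagged with the same labels used in the pre-mortem. That tagging discipline is an assumption, as are the example tags below. Named risks that never materialized are deliberately not counted as misses, because the mitigation may be the reason they didn't happen.

```python
from typing import Optional


def catch_rate(anticipated: set, actual: set) -> Optional[float]:
    """Share of problems that actually occurred which the pre-mortem had named."""
    if not actual:
        return None  # nothing went wrong, so there was nothing to catch
    return len(actual & anticipated) / len(actual)


def pooled_catch_rate(projects):
    """Aggregate over (anticipated, actual) pairs, weighting by issue count."""
    caught = sum(len(actual & anticipated) for anticipated, actual in projects)
    total = sum(len(actual) for _, actual in projects)
    return caught / total if total else None


# Hypothetical tags for two projects: (named in pre-mortem, actually happened).
migration = (
    {"conn-pool", "dual-write", "n-plus-one", "pg-expertise", "data-loss"},
    {"conn-pool", "dual-write", "vendor-delay"},
)
search_launch = ({"index-lag", "cost-overrun"}, {"index-lag", "holiday-freeze"})

print(catch_rate(*migration))                        # 2 of 3 actual issues anticipated
print(pooled_catch_rate([migration, search_launch]))  # 3 of 5 across both projects
```

Pooling by issue count keeps one small project from swinging the overall number. Tracking misses such as "vendor-delay" and "holiday-freeze" shows which failure categories future pre-mortems should probe harder.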

Conclusion

Pre-mortems are a forcing function for intellectual honesty. They create psychological safety for surfacing uncomfortable truths and challenge the optimism bias endemic to engineering teams. For principal engineers, they’re a lightweight, high-leverage tool that can prevent catastrophic failures and build organizational muscle around risk assessment.

The best time to conduct a pre-mortem is when you’re most confident in your plan—that’s precisely when you need it most.

Action Item: Schedule a pre-mortem for your next significant technical initiative. Block 90 minutes, invite diverse perspectives, and genuinely imagine failure. Your future self will thank you.

2025-11-17
