# BH-01: Deployment Frequency Conflict

## Context
The MangaAssist AI Chatbot has been running in production for 3 months. The team has a working CI/CD pipeline (see CD-01) with automated testing and blue/green deployments. The debate erupts during a sprint retrospective when the team reviews deployment metrics: a pipeline run takes only 45 minutes from trigger to production, yet deploys happen just 2-3 times per week because the team batches changes into "release trains," leaving merged code waiting days to ship.
## The Conflict

### Stakeholder Positions

```mermaid
graph TD
subgraph "Product Manager — Push for Daily Deploys"
PM1["Competitors ship features daily"]
PM2["Bug fixes are waiting 3-4 days for next release train"]
PM3["Customer complaints about slow UX improvements"]
PM4["Proposal: Deploy every merge to main immediately"]
end
subgraph "Architect — Push for Weekly Releases"
AR1["More deploys = more risk = more incidents"]
AR2["Integration testing needs a batch of changes together"]
AR3["Blue/green rollback is tested per-release, not per-commit"]
AR4["Proposal: Keep weekly releases, improve testing"]
end
subgraph "Team Lead — Concerned About Burnout"
TL1["Team is already exhausted from on-call"]
TL2["Each deploy requires someone watching metrics for 30 min"]
TL3["More deploys = more on-call interruptions"]
TL4["Proposal: 2x weekly releases with dedicated deploy owner"]
end
PM4 ---|"Conflict"| AR4
PM4 ---|"Conflict"| TL4
AR4 ---|"Partially agrees"| TL4
```
## STAR Analysis

### Situation
The AI chatbot processes ~50K conversations/day for manga product recommendations and customer support. The current weekly release cadence means:

- Bug fixes wait an average of 3.5 days to reach production
- Feature PRs sit in a queue, increasing merge conflicts
- Prompt engineers can't iterate on RAG response quality quickly (see CD-06)
- The team of 6 (2 backend, 1 frontend, 1 ML engineer, 1 DevOps, 1 prompt engineer) is stressed from batched integration testing
### Task
Resolve the deployment frequency conflict between the PM (wants daily), Architect (wants weekly), and Team Lead (wants to protect the team).
### Action

#### Step 1: Gather Data — What Do the Metrics Actually Say?
I proposed collecting 30 days of deployment data before making a decision:
| Metric | Current Value | Target |
|---|---|---|
| Deploys per week | 2.3 | ? |
| Mean time from merge to production | 3.5 days | ? |
| Deployment failure rate | 8% (1 in ~12 deploys) | < 5% |
| Mean time to rollback | 12 minutes | < 5 minutes |
| Hours spent on deploy-day (manual effort) | 4 hours/deploy | < 0.5 hours |
| Incidents caused by deployments | 2 in 30 days | 0 |
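As a sketch of how this collection could be scripted rather than gathered by hand (the application and group names below are hypothetical stand-ins), a few boto3 calls pull 30 days of CodeDeploy history and compute the deploy count and failure rate; merge-to-production lead time would additionally require merge timestamps from the Git host:

```python
from datetime import datetime, timedelta, timezone

import boto3

# Hypothetical names; substitute your own CodeDeploy application and group.
APP_NAME = "manga-assist-chatbot"
GROUP_NAME = "production"

codedeploy = boto3.client("codedeploy")

# All deployments created in the last 30 days, regardless of outcome.
# (Pagination and batch-size limits are omitted for brevity.)
now = datetime.now(timezone.utc)
deployment_ids = codedeploy.list_deployments(
    applicationName=APP_NAME,
    deploymentGroupName=GROUP_NAME,
    createTimeRange={"start": now - timedelta(days=30), "end": now},
)["deployments"]

if deployment_ids:
    infos = codedeploy.batch_get_deployments(
        deploymentIds=deployment_ids
    )["deploymentsInfo"]
    failed = [d for d in infos if d["status"] in ("Failed", "Stopped")]
    print(f"Deploys in 30 days: {len(infos)} (~{len(infos) / 4.3:.1f}/week)")
    print(f"Deployment failure rate: {len(failed) / len(infos):.0%}")
```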
#### Step 2: Identify the Real Problem
The data revealed the conflict wasn't about frequency — it was about manual effort per deploy:

- Each deploy required someone to watch CloudWatch dashboards for 30 minutes
- Integration testing was manual (click through the chatbot, test 5 scenarios)
- Rollback required manual CodeDeploy console access
With 4 hours of manual effort per deploy, daily deploys meant a full-time deploy engineer — which the team didn't have.
#### Step 3: Build a Framework — "Trust the Pipeline" Criteria

I proposed automated quality gates that would allow self-service deploys without human babysitting (a sketch of the post-deploy checks follows the list):
If ALL of these pass automatically, deploy proceeds without human monitoring:
1. ✅ All unit tests pass (Jest + pytest)
2. ✅ Integration tests pass (automated chatbot conversation tests)
3. ✅ Blue/green health checks pass (3/3 health checks in 60 seconds)
4. ✅ Error rate stays below baseline + 1% for 5 minutes post-deploy
5. ✅ P99 latency stays below baseline + 200ms for 5 minutes
6. ✅ No DynamoDB throttling in 5-minute window
7. ✅ LLM response quality score (LLM-as-Judge) >= baseline - 0.2
If ANY gate fails → auto-rollback, Slack notification, no human intervention needed.
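To make the post-deploy gates (4-7) concrete, here is a minimal sketch assuming boto3 access to CloudWatch and CodeDeploy. The `MangaAssist` metric namespace, table name, Slack webhook, and the stubbed judge score are hypothetical stand-ins, not the team's actual implementation; `ThrottledRequests` is the real DynamoDB metric:

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
codedeploy = boto3.client("codedeploy")

SLACK_WEBHOOK = "https://hooks.slack.com/services/..."  # hypothetical

def metric_value(namespace, name, dimensions=(), statistic="Average"):
    """Aggregate a CloudWatch metric over the 5-minute post-deploy window."""
    now = datetime.now(timezone.utc)
    points = cloudwatch.get_metric_statistics(
        Namespace=namespace, MetricName=name, Dimensions=list(dimensions),
        StartTime=now - timedelta(minutes=5), EndTime=now,
        Period=300, Statistics=[statistic],
    )["Datapoints"]
    return points[0][statistic] if points else 0.0

def llm_judge_score():
    """Stub for gate 7; the LLM-as-Judge harness lives elsewhere."""
    return 1.0

def gates_pass(baseline_error_rate, baseline_p99_ms, judge_baseline):
    """Gates 4-7, checked 5 minutes after traffic shifts to the green fleet."""
    table = ({"Name": "TableName", "Value": "manga-assist-conversations"},)
    return all([
        # Gate 4: error rate within baseline + 1 percentage point.
        metric_value("MangaAssist", "ErrorRate") <= baseline_error_rate + 0.01,
        # Gate 5: P99 latency within baseline + 200 ms.
        metric_value("MangaAssist", "P99LatencyMs") <= baseline_p99_ms + 200,
        # Gate 6: zero DynamoDB throttling in the window.
        metric_value("AWS/DynamoDB", "ThrottledRequests", table, "Sum") == 0,
        # Gate 7: response quality no more than 0.2 below baseline.
        llm_judge_score() >= judge_baseline - 0.2,
    ])

def rollback_and_alert(deployment_id, reason):
    """Failure path: auto-rollback via CodeDeploy, then a Slack ping."""
    codedeploy.stop_deployment(
        deploymentId=deployment_id, autoRollbackEnabled=True)
    payload = json.dumps({"text": f"Deploy {deployment_id} rolled back: {reason}"})
    urllib.request.urlopen(urllib.request.Request(
        SLACK_WEBHOOK, data=payload.encode(),
        headers={"Content-Type": "application/json"}))
```

A natural home for a check like this is a post-traffic deployment hook or a scheduled watcher, though the exact wiring depends on the pipeline.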
#### Step 4: Negotiate the Compromise
| Stakeholder | Concern | How Addressed |
|---|---|---|
| PM | Speed — wants daily deploys | Automated gates enable multiple deploys per day if desired |
| Architect | Safety — more deploys = more risk | 7 automated quality gates + auto-rollback reduce risk below current manual process |
| Team Lead | Burnout — human monitoring per deploy | Zero human monitoring required for green-path deploys. Only alerted on failures |
#### Step 5: Staged Rollout of the New Process
- Week 1-2: Implement automated quality gates alongside existing manual process (both run)
- Week 3-4: Automated gates only, manual monitoring optional (team can watch if they want)
- Week 5+: Full self-service deploys. Any merge to `main` deploys automatically. Team monitors via Slack alerts only
### Result
After implementing the automated pipeline over 4 weeks:
| Metric | Before | After | Change |
|---|---|---|---|
| Deploys per week | 2.3 | 8-12 | ~4x |
| Mean time merge → production | 3.5 days | 35 minutes | -99% |
| Deployment failure rate | 8% | 3% | -62% |
| Human effort per deploy | 4 hours | 5 minutes (review Slack alert) | -98% |
| Incidents from deploys (per month) | 2 | 0.5 | -75% |
| Team satisfaction (survey) | 3.2/5 | 4.4/5 | +37% |
## Resolution
The conflict was resolved by reframing the question:
**Wrong question:** "How often should we deploy?"

**Right question:** "What level of automation makes frequent deploys safe and effortless?"
The PM got more than daily deploys (8-12 per week). The Architect got safer deploys (automated gates catch more issues than manual monitoring). The Team Lead got less human effort per deploy (98% reduction).
## Lessons Learned

### 1. Data Over Opinions
The 30-day metrics collection defused the emotional debate. Instead of "I feel like weekly is safer" vs. "I feel like daily is better," we had a concrete question: with an 8% failure rate and 4 hours of manual effort per deploy, what would it take to drive the manual effort to zero?
### 2. The Real Blocker Is Rarely What People Argue About
The argument was about deployment frequency, but the actual blocker was manual effort per deploy. Once automated gates eliminated human monitoring, the frequency question answered itself.
### 3. Feature Flags Complement, Not Replace, Deploy Frequency
The Architect's concern about "integrating changes together" was valid for feature completeness. Feature flags (see CD-06) allow deploying incomplete features behind flags — code ships daily, features launch when ready.
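As a minimal illustration of the pattern (the flag name and lookup below are hypothetical stand-ins, not the CD-06 implementation), a flag check lets unfinished code ship in a daily deploy while staying dark:

```python
import os

def flag_enabled(name: str) -> bool:
    """Hypothetical flag lookup; CD-06 would read these from a config
    store, but an environment variable stands in here for simplicity."""
    return os.environ.get(f"FLAG_{name.upper()}", "off") == "on"

def build_recommendation_prompt(user_query: str) -> str:
    # The new prompt ships in a daily deploy but stays dark until the
    # flag flips; the old path keeps serving traffic in the meantime.
    if flag_enabled("rag_v2_prompt"):
        return f"[v2 retrieval-augmented prompt for] {user_query}"
    return f"[v1 prompt for] {user_query}"
```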
### 4. Staged Rollout of Process Changes
Introducing "deploy every commit" overnight would have caused panic. The 4-week staged rollout (parallel → optional → automatic) gave the team confidence to trust the pipeline.
## Related User Stories
- CD-01: Application Code Deployment Pipeline — the pipeline that was improved
- CD-06: Configuration & Prompt Pipeline — feature flags for incomplete features
- BH-04: Rollback Policy Conflict — related debate about rollback automation