# BH-01: Deployment Frequency Conflict

## Context
The MangaAssist AI Chatbot has been running in production for 3 months. The team has a working CI/CD pipeline (see CD-01) with automated testing and blue/green deployments. The debate erupts during a sprint retrospective when the team reviews deployment metrics: a pipeline run takes only 45 minutes from trigger to production, yet deploys happen just 2-3 times per week because the team batches changes into "release trains," leaving merged code waiting days to ship.
## The Conflict

### Stakeholder Positions

```mermaid
graph TD
subgraph "Product Manager — Push for Daily Deploys"
PM1["Competitors ship features daily"]
PM2["Bug fixes are waiting 3-4 days for next release train"]
PM3["Customer complaints about slow UX improvements"]
PM4["Proposal: Deploy every merge to main immediately"]
end
subgraph "Architect — Push for Weekly Releases"
AR1["More deploys = more risk = more incidents"]
AR2["Integration testing needs a batch of changes together"]
AR3["Blue/green rollback is tested per-release, not per-commit"]
AR4["Proposal: Keep weekly releases, improve testing"]
end
subgraph "Team Lead — Concerned About Burnout"
TL1["Team is already exhausted from on-call"]
TL2["Each deploy requires someone watching metrics for 30 min"]
TL3["More deploys = more on-call interruptions"]
TL4["Proposal: 2x weekly releases with dedicated deploy owner"]
end
PM4 ---|"Conflict"| AR4
PM4 ---|"Conflict"| TL4
AR4 ---|"Partially agrees"| TL4
```
## STAR Analysis

### Situation
The AI chatbot processes ~50K conversations/day for manga product recommendations and customer support. The current weekly release cadence means:

- Bug fixes wait an average of 3.5 days to reach production
- Feature PRs sit in a queue, increasing merge conflicts
- Prompt engineers can't iterate on RAG response quality quickly (see CD-06)
- The team of 6 (2 backend, 1 frontend, 1 ML engineer, 1 DevOps, 1 prompt engineer) is stressed from batched integration testing
### Task
Resolve the deployment frequency conflict between the PM (wants daily), Architect (wants weekly), and Team Lead (wants to protect the team).
### Action

#### Step 1: Gather Data — What Do the Metrics Actually Say?
I proposed collecting 30 days of deployment data before making a decision:
| Metric | Current Value | Target |
|---|---|---|
| Deploys per week | 2.3 | ? |
| Mean time from merge to production | 3.5 days | ? |
| Deployment failure rate | 8% (1 in ~12 deploys) | < 5% |
| Mean time to rollback | 12 minutes | < 5 minutes |
| Hours spent on deploy-day (manual effort) | 4 hours/deploy | < 0.5 hours |
| Incidents caused by deployments | 2 in 30 days | 0 |
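As a sketch of how this collection could be scripted rather than gathered by hand (the application and group names below are hypothetical stand-ins), a few boto3 calls pull 30 days of CodeDeploy history and compute the deploy count and failure rate; merge-to-production lead time would additionally require merge timestamps from the Git host:

```python
from datetime import datetime, timedelta, timezone

import boto3

# Hypothetical names; substitute your own CodeDeploy application and group.
APP_NAME = "manga-assist-chatbot"
GROUP_NAME = "production"

codedeploy = boto3.client("codedeploy")

# All deployments created in the last 30 days, regardless of outcome.
# (Pagination and batch-size limits are omitted for brevity.)
now = datetime.now(timezone.utc)
deployment_ids = codedeploy.list_deployments(
    applicationName=APP_NAME,
    deploymentGroupName=GROUP_NAME,
    createTimeRange={"start": now - timedelta(days=30), "end": now},
)["deployments"]

if deployment_ids:
    infos = codedeploy.batch_get_deployments(
        deploymentIds=deployment_ids
    )["deploymentsInfo"]
    failed = [d for d in infos if d["status"] in ("Failed", "Stopped")]
    print(f"Deploys in 30 days: {len(infos)} (~{len(infos) / 4.3:.1f}/week)")
    print(f"Deployment failure rate: {len(failed) / len(infos):.0%}")
```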
#### Step 2: Identify the Real Problem
The data revealed the conflict wasn't about frequency — it was about manual effort per deploy:

- Each deploy required someone to watch CloudWatch dashboards for 30 minutes
- Integration testing was manual (click through the chatbot, test 5 scenarios)
- Rollback required manual CodeDeploy console access
With 4 hours of manual effort per deploy, daily deploys meant a full-time deploy engineer — which the team didn't have.
#### Step 3: Build a Framework — "Trust the Pipeline" Criteria

I proposed automated quality gates that would allow self-service deploys without human babysitting (a sketch of the post-deploy checks follows the list):
If ALL of these pass automatically, deploy proceeds without human monitoring:
1. ✅ All unit tests pass (Jest + pytest)
2. ✅ Integration tests pass (automated chatbot conversation tests)
3. ✅ Blue/green health checks pass (3/3 health checks in 60 seconds)
4. ✅ Error rate stays below baseline + 1% for 5 minutes post-deploy
5. ✅ P99 latency stays below baseline + 200ms for 5 minutes
6. ✅ No DynamoDB throttling in 5-minute window
7. ✅ LLM response quality score (LLM-as-Judge) >= baseline - 0.2
If ANY gate fails → auto-rollback, Slack notification, no human intervention needed.
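To make the post-deploy gates (4-7) concrete, here is a minimal sketch assuming boto3 access to CloudWatch and CodeDeploy. The `MangaAssist` metric namespace, table name, Slack webhook, and the stubbed judge score are hypothetical stand-ins, not the team's actual implementation; `ThrottledRequests` is the real DynamoDB metric:

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
codedeploy = boto3.client("codedeploy")

SLACK_WEBHOOK = "https://hooks.slack.com/services/..."  # hypothetical

def metric_value(namespace, name, dimensions=(), statistic="Average"):
    """Aggregate a CloudWatch metric over the 5-minute post-deploy window."""
    now = datetime.now(timezone.utc)
    points = cloudwatch.get_metric_statistics(
        Namespace=namespace, MetricName=name, Dimensions=list(dimensions),
        StartTime=now - timedelta(minutes=5), EndTime=now,
        Period=300, Statistics=[statistic],
    )["Datapoints"]
    return points[0][statistic] if points else 0.0

def llm_judge_score():
    """Stub for gate 7; the LLM-as-Judge harness lives elsewhere."""
    return 1.0

def gates_pass(baseline_error_rate, baseline_p99_ms, judge_baseline):
    """Gates 4-7, checked 5 minutes after traffic shifts to the green fleet."""
    table = ({"Name": "TableName", "Value": "manga-assist-conversations"},)
    return all([
        # Gate 4: error rate within baseline + 1 percentage point.
        metric_value("MangaAssist", "ErrorRate") <= baseline_error_rate + 0.01,
        # Gate 5: P99 latency within baseline + 200 ms.
        metric_value("MangaAssist", "P99LatencyMs") <= baseline_p99_ms + 200,
        # Gate 6: zero DynamoDB throttling in the window.
        metric_value("AWS/DynamoDB", "ThrottledRequests", table, "Sum") == 0,
        # Gate 7: response quality no more than 0.2 below baseline.
        llm_judge_score() >= judge_baseline - 0.2,
    ])

def rollback_and_alert(deployment_id, reason):
    """Failure path: auto-rollback via CodeDeploy, then a Slack ping."""
    codedeploy.stop_deployment(
        deploymentId=deployment_id, autoRollbackEnabled=True)
    payload = json.dumps({"text": f"Deploy {deployment_id} rolled back: {reason}"})
    urllib.request.urlopen(urllib.request.Request(
        SLACK_WEBHOOK, data=payload.encode(),
        headers={"Content-Type": "application/json"}))
```

A natural home for a check like this is a post-traffic deployment hook or a scheduled watcher, though the exact wiring depends on the pipeline.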
#### Step 4: Negotiate the Compromise
| Stakeholder | Concern | How Addressed |
|---|---|---|
| PM | Speed — wants daily deploys | Automated gates enable multiple deploys per day if desired |
| Architect | Safety — more deploys = more risk | 7 automated quality gates + auto-rollback reduce risk below current manual process |
| Team Lead | Burnout — human monitoring per deploy | Zero human monitoring required for green-path deploys. Only alerted on failures |
#### Step 5: Staged Rollout of the New Process
- Week 1-2: Implement automated quality gates alongside existing manual process (both run)
- Week 3-4: Automated gates only, manual monitoring optional (team can watch if they want)
- Week 5+: Full self-service deploys. Any merge to `main` deploys automatically. Team monitors via Slack alerts only
### Result
After implementing the automated pipeline over 4 weeks:
| Metric | Before | After | Change |
|---|---|---|---|
| Deploys per week | 2.3 | 8-12 | ~4x |
| Mean time merge → production | 3.5 days | 35 minutes | -99% |
| Deployment failure rate | 8% | 3% | -62% |
| Human effort per deploy | 4 hours | 5 minutes (review Slack alert) | -98% |
| Incidents from deploys (per month) | 2 | 0.5 | -75% |
| Team satisfaction (survey) | 3.2/5 | 4.4/5 | +37% |
## Resolution
The conflict was resolved by reframing the question:
**Wrong question:** "How often should we deploy?"

**Right question:** "What level of automation makes frequent deploys safe and effortless?"
The PM got more than daily deploys (8-12 per week). The Architect got safer deploys (automated gates catch more issues than manual monitoring). The Team Lead got less human effort per deploy (98% reduction).
## Lessons Learned

### 1. Data Over Opinions
The 30-day metrics collection defused the emotional debate. Instead of "I feel like weekly is safer" vs. "I feel like daily is better," we had a concrete question: with an 8% failure rate and 4 hours of manual effort per deploy, what would it take to drive the manual effort to zero?
### 2. The Real Blocker Is Rarely What People Argue About
The argument was about deployment frequency, but the actual blocker was manual effort per deploy. Once automated gates eliminated human monitoring, the frequency question answered itself.
### 3. Feature Flags Complement, Not Replace, Deploy Frequency
The Architect's concern about "integrating changes together" was valid for feature completeness. Feature flags (see CD-06) allow deploying incomplete features behind flags — code ships daily, features launch when ready.
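As a minimal illustration of the pattern (the flag name and lookup below are hypothetical stand-ins, not the CD-06 implementation), a flag check lets unfinished code ship in a daily deploy while staying dark:

```python
import os

def flag_enabled(name: str) -> bool:
    """Hypothetical flag lookup; CD-06 would read these from a config
    store, but an environment variable stands in here for simplicity."""
    return os.environ.get(f"FLAG_{name.upper()}", "off") == "on"

def build_recommendation_prompt(user_query: str) -> str:
    # The new prompt ships in a daily deploy but stays dark until the
    # flag flips; the old path keeps serving traffic in the meantime.
    if flag_enabled("rag_v2_prompt"):
        return f"[v2 retrieval-augmented prompt for] {user_query}"
    return f"[v1 prompt for] {user_query}"
```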
### 4. Staged Rollout of Process Changes
Introducing "deploy every commit" overnight would have caused panic. The 4-week staged rollout (parallel → optional → automatic) gave the team confidence to trust the pipeline.
## Related User Stories
- CD-01: Application Code Deployment Pipeline — the pipeline that was improved
- CD-06: Configuration & Prompt Pipeline — feature flags for incomplete features
- BH-04: Rollback Policy Conflict — related debate about rollback automation