CI/CD Pipeline User Stories — Amazon AI Chatbot (MangaAssist)
Overview
This folder contains eight user stories, one for each CI/CD pipeline needed to build, test, deploy, and operate the Amazon AI Chatbot at production scale. Each story includes implementation details, an analysis of critical decisions that compares alternative approaches, and a tradeoffs section documenting stakeholder tensions.
The chatbot's production stack spans ECS Fargate + Lambda (compute), Amazon Bedrock + SageMaker (AI/ML), DynamoDB + OpenSearch (data), CloudFront + API Gateway (edge), and CDK/CloudFormation (infrastructure). Each pipeline is designed for the 1–2 person DevOps/MLOps team described in 07-team-size.md.
User Stories
| # | Pipeline | File | Key Services | Critical Decisions |
|---|---|---|---|---|
| CD-01 | Application Code Deployment | CD-01 | ECS Fargate, Lambda, ECR, API Gateway | GitHub Actions vs CodePipeline; Blue/Green vs Canary; Branching strategy |
| CD-02 | Infrastructure as Code | CD-02 | CDK, CloudFormation, multi-stack | CDK vs Terraform vs CloudFormation; Single vs multi-stack; Environment isolation |
| CD-03 | ML Model Deployment | CD-03 | SageMaker, Inferentia, Model Registry | SageMaker Pipelines vs Step Functions; Shadow vs Canary; Auto vs human approval |
| CD-04 | RAG Knowledge Base | CD-04 | OpenSearch, Titan Embeddings, S3 | In-place vs blue/green index; Batch vs incremental re-embedding |
| CD-05 | Frontend Deployment | CD-05 | React, S3, CloudFront | Cache invalidation strategy; Preview deployments; Asset versioning |
| CD-06 | Configuration & Prompt | CD-06 | AppConfig, SSM, guardrails | AppConfig vs SSM vs LaunchDarkly; Git-managed vs prompt platform |
| CD-07 | Database Migration | CD-07 | DynamoDB, DAX, GSI | Online dual-write vs offline migration; Streams backfill vs scan-write |
| CD-08 | Monitoring & Observability | CD-08 | CloudWatch, X-Ray, MLflow, Grafana | CloudWatch vs Grafana; Alarm-as-code vs console; Centralized vs per-service |
Pipeline Dependency Map
graph TB
subgraph "Foundation Layer"
CD02["CD-02: Infrastructure as Code"]
CD08["CD-08: Monitoring & Observability"]
end
subgraph "Data Layer"
CD07["CD-07: Database Migration"]
CD04["CD-04: RAG Knowledge Base"]
end
subgraph "Application Layer"
CD01["CD-01: Application Code"]
CD05["CD-05: Frontend"]
CD03["CD-03: ML Model Deployment"]
end
subgraph "Runtime Layer"
CD06["CD-06: Configuration & Prompt"]
end
CD02 -->|"Provisions compute, networking"| CD01
CD02 -->|"Provisions DynamoDB, OpenSearch"| CD07
CD02 -->|"Provisions S3, CloudFront"| CD05
CD02 -->|"Provisions SageMaker endpoints"| CD03
CD02 -->|"Provisions dashboards, alarms"| CD08
CD07 -->|"Tables ready for app"| CD01
CD04 -->|"Index available for RAG"| CD01
CD03 -->|"Model endpoints live"| CD01
CD08 -->|"Alarms validate deployments"| CD01
CD08 -->|"Canary metrics for models"| CD03
CD01 -->|"App reads config at runtime"| CD06
CD05 -->|"Widget loads chat endpoint"| CD01
style CD02 fill:#ff9900,color:#000
style CD01 fill:#146eb4,color:#fff
style CD03 fill:#8C4FFF,color:#fff
style CD04 fill:#C925D1,color:#fff
style CD05 fill:#1B660F,color:#fff
style CD06 fill:#DD344C,color:#fff
style CD07 fill:#3334B9,color:#fff
style CD08 fill:#E07941,color:#fff
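The dependency map can also be read as a rollout order. The following is a minimal sketch, assuming every arrow means "the source pipeline must be in place before the target pipeline"; that reading is an interpretation of the diagram, not something the user stories state. The `EDGES` list is copied from the arrows above, and `rollout_order` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Edges copied from the dependency map as (upstream, downstream) pairs.
# Assumption: each arrow means "upstream must be ready before downstream".
EDGES = [
    ("CD-02", "CD-01"),
    ("CD-02", "CD-07"),
    ("CD-02", "CD-05"),
    ("CD-02", "CD-03"),
    ("CD-02", "CD-08"),
    ("CD-07", "CD-01"),
    ("CD-04", "CD-01"),
    ("CD-03", "CD-01"),
    ("CD-08", "CD-01"),
    ("CD-08", "CD-03"),
    ("CD-01", "CD-06"),
    ("CD-05", "CD-01"),
]


def rollout_order(edges):
    """Return one valid order in which each upstream pipeline precedes its downstream ones."""
    sorter = TopologicalSorter()
    for upstream, downstream in edges:
        # add(node, *predecessors): the downstream node depends on the upstream node.
        sorter.add(downstream, upstream)
    # static_order() raises graphlib.CycleError if the edges contain a cycle.
    return list(sorter.static_order())


order = rollout_order(EDGES)
print(" -> ".join(order))
```

Under this reading, only CD-02 and CD-04 have no upstream pipeline, so one of them always comes first, while CD-01 and then CD-06 always come last. The relative order of the pipelines in between is not unique; any order the sorter returns satisfies the arrows.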
Deployment Frequency by Pipeline
gantt
title Pipeline Deployment Cadence
dateFormat YYYY-MM-DD
axisFormat %b %d
section App Code (CD-01)
Daily deploys :active, app1, 2026-03-01, 1d
Daily deploys :active, app2, 2026-03-02, 1d
Daily deploys :active, app3, 2026-03-03, 1d
section Infra (CD-02)
Weekly infra release :infra1, 2026-03-01, 7d
section ML Model (CD-03)
Weekly intent classifier:ml1, 2026-03-01, 7d
Monthly embeddings :ml2, 2026-03-01, 30d
section RAG KB (CD-04)
Daily incremental :rag1, 2026-03-01, 1d
Weekly full re-index :rag2, 2026-03-01, 7d
section Frontend (CD-05)
2-3x per week :fe1, 2026-03-01, 3d
section Config (CD-06)
On-demand (minutes) :cfg1, 2026-03-01, 1d
section DB Migration (CD-07)
Quarterly :db1, 2026-03-01, 90d
section Monitoring (CD-08)
Weekly dashboard updates:mon1, 2026-03-01, 7d
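To give a rough sense of the release volume these cadences imply for a small team, the sketch below converts each cadence in the chart into an approximate number of releases per 90-day quarter. The interval values are a reading of the chart labels; modelling "2-3x per week" as 2.5 releases per week and leaving out the on-demand CD-06 pipeline are assumptions made for illustration, and `releases_per_quarter` is a hypothetical helper name.

```python
QUARTER_DAYS = 90

# Days between releases, read off the cadence chart.
# Assumptions: frontend "2-3x per week" is modelled as 2.5 releases per week;
# CD-06 (config) is on-demand, has no fixed interval, and is omitted.
INTERVAL_DAYS = {
    "CD-01 app code": 1,
    "CD-02 infra": 7,
    "CD-03 intent classifier": 7,
    "CD-03 embeddings": 30,
    "CD-04 incremental": 1,
    "CD-04 full re-index": 7,
    "CD-05 frontend": 7 / 2.5,
    "CD-07 db migration": 90,
    "CD-08 dashboards": 7,
}


def releases_per_quarter(interval_days, quarter_days=QUARTER_DAYS):
    """Map each pipeline to its approximate number of releases in one quarter."""
    return {name: quarter_days / days for name, days in interval_days.items()}


per_quarter = releases_per_quarter(INTERVAL_DAYS)
total = sum(per_quarter.values())

# Print pipelines from most to least frequent, then the overall total.
for name, count in sorted(per_quarter.items(), key=lambda kv: -kv[1]):
    print(f"{name:26s} {count:6.1f}")
print(f"{'total':26s} {total:6.1f}")
```

With these assumptions, the two daily pipelines (CD-01 and the CD-04 incremental run) account for about two thirds of the scheduled releases, which suggests they are the ones where automation matters most for a 1–2 person team.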
Behavioural Scenarios
These scenarios cover realistic conflicts among team leads, architects, and product managers over CI/CD decisions. See the Behavioural/ subfolder.
| # | Scenario | Stakeholders | Core Tension |
|---|---|---|---|
| BH-01 | Deployment Frequency Conflict | PM, Architect, Team Lead | Velocity vs stability vs team wellness |
| BH-02 | ML Model Gate Disagreement | DS Lead, Architect, PM | Model improvement urgency vs production safety |
| BH-03 | CI/CD Tooling Choice Conflict | Architect, Team Lead, DevOps, PM | Consistency vs DX vs cost |
| BH-04 | Rollback Policy Conflict | Architect, PM, Team Lead | Risk aversion vs experimentation velocity |
How to Use This Folder
- Start with CD-02 (Infrastructure as Code): it is the foundation that the other pipelines depend on.
- Read CD-01 (Application Code) next: it is the highest-frequency pipeline and the primary deployment path.
- Then read CD-03 (ML Model): it is the most complex pipeline, with quality gates and canary stages.
- Read the remaining pipelines in any order, based on interest.
- Review the Behavioural scenarios: they provide interview-ready answers to CI/CD conflict questions.
Relationship to Architecture
| Document | Relevance |
|---|---|
| 04-architecture-hld.md | Overall deployment model (ECS Fargate + Lambda burst) |
| 04b-architecture-lld.md | Component design, latency budgets that pipelines must validate |
| 07-team-size.md | DevOps/MLOps team size (1–2 people); pipelines must be automatable by a small team |
| 15-tradeoffs-challenges.md | Architectural tradeoffs that influence pipeline design |
| Fine-Tuning-Foundational-Models/09-training-infrastructure-mlops.md | ML training pipeline, quality gates (shadow → canary → prod) |
| Tech-Stack/04-innovation-and-tradeoffs.md | Technology evaluation framework reused for CI/CD tooling decisions |