CI/CD Pipeline User Stories — Amazon AI Chatbot (MangaAssist)

Overview

This folder contains eight user stories (CD-01 through CD-08), one for each CI/CD pipeline needed to build, test, deploy, and operate the Amazon AI Chatbot at production scale. Each story includes implementation details, an analysis of the critical decisions that compares alternative approaches, and a tradeoffs section that documents stakeholder tensions.

The chatbot's production stack spans ECS Fargate + Lambda (compute), Amazon Bedrock + SageMaker (AI/ML), DynamoDB + OpenSearch (data), CloudFront + API Gateway (edge), and CDK/CloudFormation (infrastructure). Each pipeline is designed for the 1–2 person DevOps/MLOps team described in 07-team-size.md.

User Stories

#	Pipeline	File	Key Services	Critical Decisions
CD-01	Application Code Deployment	CD-01	ECS Fargate, Lambda, ECR, API Gateway	GitHub Actions vs CodePipeline; Blue/Green vs Canary; Branching strategy
CD-02	Infrastructure as Code	CD-02	CDK, CloudFormation, multi-stack	CDK vs Terraform vs CloudFormation; Single vs multi-stack; Environment isolation
CD-03	ML Model Deployment	CD-03	SageMaker, Inferentia, Model Registry	SageMaker Pipelines vs Step Functions; Shadow vs Canary; Auto vs human approval
CD-04	RAG Knowledge Base	CD-04	OpenSearch, Titan Embeddings, S3	In-place vs blue/green index; Batch vs incremental re-embedding
CD-05	Frontend Deployment	CD-05	React, S3, CloudFront	Cache invalidation strategy; Preview deployments; Asset versioning
CD-06	Configuration & Prompt	CD-06	AppConfig, SSM, guardrails	AppConfig vs SSM vs LaunchDarkly; Git-managed vs prompt platform
CD-07	Database Migration	CD-07	DynamoDB, DAX, GSI	Online dual-write vs offline migration; Streams backfill vs scan-write
CD-08	Monitoring & Observability	CD-08	CloudWatch, X-Ray, MLflow, Grafana	CloudWatch vs Grafana; Alarm-as-code vs console; Centralized vs per-service

Pipeline Dependency Map

graph TB
    subgraph "Foundation Layer"
        CD02["CD-02: Infrastructure as Code"]
        CD08["CD-08: Monitoring & Observability"]
    end

    subgraph "Data Layer"
        CD07["CD-07: Database Migration"]
        CD04["CD-04: RAG Knowledge Base"]
    end

    subgraph "Application Layer"
        CD01["CD-01: Application Code"]
        CD05["CD-05: Frontend"]
        CD03["CD-03: ML Model Deployment"]
    end

    subgraph "Runtime Layer"
        CD06["CD-06: Configuration & Prompt"]
    end

    CD02 -->|"Provisions compute, networking"| CD01
    CD02 -->|"Provisions DynamoDB, OpenSearch"| CD07
    CD02 -->|"Provisions S3, CloudFront"| CD05
    CD02 -->|"Provisions SageMaker endpoints"| CD03
    CD02 -->|"Provisions dashboards, alarms"| CD08

    CD07 -->|"Tables ready for app"| CD01
    CD04 -->|"Index available for RAG"| CD01
    CD03 -->|"Model endpoints live"| CD01
    CD08 -->|"Alarms validate deployments"| CD01
    CD08 -->|"Canary metrics for models"| CD03

    CD01 -->|"App reads config at runtime"| CD06
    CD05 -->|"Widget loads chat endpoint"| CD01

    style CD02 fill:#ff9900,color:#000
    style CD01 fill:#146eb4,color:#fff
    style CD03 fill:#8C4FFF,color:#fff
    style CD04 fill:#C925D1,color:#fff
    style CD05 fill:#1B660F,color:#fff
    style CD06 fill:#DD344C,color:#fff
    style CD07 fill:#3334B9,color:#fff
    style CD08 fill:#E07941,color:#fff
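
The arrows in the map can also be read as an ordering constraint. The sketch below copies the twelve edges from the diagram and groups the pipelines into rollout waves with a topological sort. It assumes every arrow means "source is rolled out before target"; the two runtime arrows (CD-01 to CD-06 and CD-05 to CD-01) describe usage rather than provisioning, so they may deserve different treatment in practice. The name `rollout_waves` is hypothetical and does not come from the user stories.

```python
from graphlib import TopologicalSorter

# Edges transcribed from the dependency map as (source, target).
# Assumption: each arrow means "source is rolled out before target".
EDGES = [
    ("CD-02", "CD-01"), ("CD-02", "CD-07"), ("CD-02", "CD-05"),
    ("CD-02", "CD-03"), ("CD-02", "CD-08"),
    ("CD-07", "CD-01"), ("CD-04", "CD-01"), ("CD-03", "CD-01"),
    ("CD-08", "CD-01"), ("CD-08", "CD-03"),
    ("CD-01", "CD-06"), ("CD-05", "CD-01"),
]


def rollout_waves(edges):
    """Group pipelines into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter()
    for source, target in edges:
        sorter.add(target, source)  # target waits for source
    sorter.prepare()  # raises graphlib.CycleError if the map contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(rollout_waves(EDGES), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Under that reading the map yields five waves: CD-02 and CD-04 first, then CD-05, CD-07, and CD-08, then CD-03, then CD-01, and finally CD-06.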

Deployment Frequency by Pipeline

gantt
    title Pipeline Deployment Cadence
    dateFormat  YYYY-MM-DD
    axisFormat  %b %d

    section App Code (CD-01)
    Daily deploys           :active, app1, 2026-03-01, 1d
    Daily deploys           :active, app2, 2026-03-02, 1d
    Daily deploys           :active, app3, 2026-03-03, 1d

    section Infra (CD-02)
    Weekly infra release    :infra1, 2026-03-01, 7d

    section ML Model (CD-03)
    Weekly intent classifier:ml1, 2026-03-01, 7d
    Monthly embeddings      :ml2, 2026-03-01, 30d

    section RAG KB (CD-04)
    Daily incremental       :rag1, 2026-03-01, 1d
    Weekly full re-index    :rag2, 2026-03-01, 7d

    section Frontend (CD-05)
    2-3x per week           :fe1, 2026-03-01, 3d

    section Config (CD-06)
    On-demand (minutes)     :cfg1, 2026-03-01, 1d

    section DB Migration (CD-07)
    Quarterly               :db1, 2026-03-01, 90d

    section Monitoring (CD-08)
    Weekly dashboard updates:mon1, 2026-03-01, 7d
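
As a rough illustration of what these cadences add up to, the sketch below converts each label in the chart into runs per year and sums them. The conversion factors (365 days, 52 weeks, 12 months, 4 quarters, deploys on every calendar day) are assumptions, not figures from the user stories. CD-06 is left out because on-demand changes have no fixed schedule.

```python
# Runs per year implied by each cadence label, as (low, high).
# Assumption: 365 days, 52 weeks, 12 months, 4 quarters per year.
RUNS_PER_YEAR = {
    "daily": (365, 365),
    "weekly": (52, 52),
    "monthly": (12, 12),
    "quarterly": (4, 4),
    "2-3x per week": (2 * 52, 3 * 52),
}

# Cadence labels taken from the chart; CD-06 (on-demand) is omitted.
SCHEDULE = {
    "CD-01": ["daily"],
    "CD-02": ["weekly"],
    "CD-03": ["weekly", "monthly"],
    "CD-04": ["daily", "weekly"],
    "CD-05": ["2-3x per week"],
    "CD-07": ["quarterly"],
    "CD-08": ["weekly"],
}


def yearly_runs(schedule):
    """Return {pipeline: (low, high)} scheduled runs per year."""
    totals = {}
    for pipeline, cadences in schedule.items():
        low = sum(RUNS_PER_YEAR[c][0] for c in cadences)
        high = sum(RUNS_PER_YEAR[c][1] for c in cadences)
        totals[pipeline] = (low, high)
    return totals


totals = yearly_runs(SCHEDULE)
total_low = sum(low for low, _ in totals.values())
total_high = sum(high for _, high in totals.values())
print(f"scheduled runs per year: {total_low} to {total_high}")
```

Under these assumptions the scheduled pipelines alone account for 1,058 to 1,110 runs per year, or roughly 20 per week, which is the load the small DevOps/MLOps team has to keep automated.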

Behavioural Scenarios

The Behavioural/ subfolder contains four real-world conflict scenarios in which team leads, architects, and product managers disagree over CI/CD decisions.

#	Scenario	Stakeholders	Core Tension
BH-01	Deployment Frequency Conflict	PM, Architect, Team Lead	Velocity vs stability vs team wellness
BH-02	ML Model Gate Disagreement	DS Lead, Architect, PM	Model improvement urgency vs production safety
BH-03	CI/CD Tooling Choice Conflict	Architect, Team Lead, DevOps, PM	Consistency vs DX vs cost
BH-04	Rollback Policy Conflict	Architect, PM, Team Lead	Risk aversion vs experimentation velocity

How to Use This Folder

1. Start with CD-02 (Infrastructure as Code): it is the foundation that all other pipelines depend on.
2. Read CD-01 (Application Code): the highest-frequency pipeline and the primary deployment path.
3. Read CD-03 (ML Model): the most complex pipeline, with quality gates and canary stages.
4. Read the remaining pipelines in any order, based on interest.
5. Review the Behavioural scenarios: they provide interview-ready answers to CI/CD conflict questions.

Relationship to Architecture

Document	Relevance
04-architecture-hld.md	Overall deployment model (ECS Fargate + Lambda burst)
04b-architecture-lld.md	Component design, latency budgets that pipelines must validate
07-team-size.md	DevOps/MLOps team size (1–2 people); pipelines must be automatable by a small team
15-tradeoffs-challenges.md	Architectural tradeoffs that influence pipeline design
Fine-Tuning-Foundational-Models/09-training-infrastructure-mlops.md	ML training pipeline, quality gates (shadow → canary → prod)
Tech-Stack/04-innovation-and-tradeoffs.md	Technology evaluation framework reused for CI/CD tooling decisions