
CI/CD Pipeline User Stories — Amazon AI Chatbot (MangaAssist)

Overview

This folder contains eight user stories, one for each CI/CD pipeline required to build, test, deploy, and operate the Amazon AI Chatbot at production scale. Each story includes implementation details, an analysis of the critical decisions that compares alternative approaches, and a tradeoffs section that documents stakeholder tensions.

The chatbot's production stack spans ECS Fargate + Lambda (compute), Amazon Bedrock + SageMaker (AI/ML), DynamoDB + OpenSearch (data), CloudFront + API Gateway (edge), and CDK/CloudFormation (infrastructure). Each pipeline is designed for the 1–2 person DevOps/MLOps team described in 07-team-size.md.


User Stories

| # | Pipeline | File | Key Services | Critical Decisions |
|---|----------|------|--------------|--------------------|
| CD-01 | Application Code Deployment | CD-01 | ECS Fargate, Lambda, ECR, API Gateway | GitHub Actions vs CodePipeline; Blue/Green vs Canary; Branching strategy |
| CD-02 | Infrastructure as Code | CD-02 | CDK, CloudFormation, multi-stack | CDK vs Terraform vs CloudFormation; Single vs multi-stack; Environment isolation |
| CD-03 | ML Model Deployment | CD-03 | SageMaker, Inferentia, Model Registry | SageMaker Pipelines vs Step Functions; Shadow vs Canary; Auto vs human approval |
| CD-04 | RAG Knowledge Base | CD-04 | OpenSearch, Titan Embeddings, S3 | In-place vs blue/green index; Batch vs incremental re-embedding |
| CD-05 | Frontend Deployment | CD-05 | React, S3, CloudFront | Cache invalidation strategy; Preview deployments; Asset versioning |
| CD-06 | Configuration & Prompt | CD-06 | AppConfig, SSM, guardrails | AppConfig vs SSM vs LaunchDarkly; Git-managed vs prompt platform |
| CD-07 | Database Migration | CD-07 | DynamoDB, DAX, GSI | Online dual-write vs offline migration; Streams backfill vs scan-write |
| CD-08 | Monitoring & Observability | CD-08 | CloudWatch, X-Ray, MLflow, Grafana | CloudWatch vs Grafana; Alarm-as-code vs console; Centralized vs per-service |

Pipeline Dependency Map

graph TB
    subgraph "Foundation Layer"
        CD02["CD-02: Infrastructure as Code"]
        CD08["CD-08: Monitoring & Observability"]
    end

    subgraph "Data Layer"
        CD07["CD-07: Database Migration"]
        CD04["CD-04: RAG Knowledge Base"]
    end

    subgraph "Application Layer"
        CD01["CD-01: Application Code"]
        CD05["CD-05: Frontend"]
        CD03["CD-03: ML Model Deployment"]
    end

    subgraph "Runtime Layer"
        CD06["CD-06: Configuration & Prompt"]
    end

    CD02 -->|"Provisions compute, networking"| CD01
    CD02 -->|"Provisions DynamoDB, OpenSearch"| CD07
    CD02 -->|"Provisions S3, CloudFront"| CD05
    CD02 -->|"Provisions SageMaker endpoints"| CD03
    CD02 -->|"Provisions dashboards, alarms"| CD08

    CD07 -->|"Tables ready for app"| CD01
    CD04 -->|"Index available for RAG"| CD01
    CD03 -->|"Model endpoints live"| CD01
    CD08 -->|"Alarms validate deployments"| CD01
    CD08 -->|"Canary metrics for models"| CD03

    CD01 -->|"App reads config at runtime"| CD06
    CD05 -->|"Widget loads chat endpoint"| CD01

    style CD02 fill:#ff9900,color:#000
    style CD01 fill:#146eb4,color:#fff
    style CD03 fill:#8C4FFF,color:#fff
    style CD04 fill:#C925D1,color:#fff
    style CD05 fill:#1B660F,color:#fff
    style CD06 fill:#DD344C,color:#fff
    style CD07 fill:#3334B9,color:#fff
    style CD08 fill:#E07941,color:#fff
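
The dependency map can also be read as a bring-up order for a fresh environment. The sketch below is one possible reading, not part of the pipelines themselves: it encodes only the edges drawn in the map and groups the pipelines into waves with Python's standard `graphlib`. The `DEPENDS_ON` table and the `rollout_waves` helper are illustrative names. It also assumes that the two runtime edges (CD-01 reading config from CD-06, and the CD-05 widget calling the CD-01 endpoint) count as "needs first" relationships.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Maps each pipeline to the pipelines it needs in place first.
# Only edges drawn in the dependency map are encoded.
# Assumption: the two runtime edges (CD-01 reads config from CD-06,
# CD-05 calls the CD-01 endpoint) are treated as "needs first" as well.
DEPENDS_ON = {
    "CD-01": {"CD-02", "CD-03", "CD-04", "CD-06", "CD-07", "CD-08"},
    "CD-03": {"CD-02", "CD-08"},
    "CD-05": {"CD-01", "CD-02"},
    "CD-07": {"CD-02"},
    "CD-08": {"CD-02"},
}


def rollout_waves(depends_on):
    """Group pipelines into waves; each wave only needs earlier waves."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the map has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


WAVES = rollout_waves(DEPENDS_ON)
for number, wave in enumerate(WAVES, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Under these assumptions the waves are CD-02, CD-04, CD-06; then CD-07, CD-08; then CD-03; then CD-01; then CD-05. CD-04 lands in the first wave only because the map draws no edge into it, even though its OpenSearch domain is provisioned by CD-02.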

Deployment Frequency by Pipeline

gantt
    title Pipeline Deployment Cadence
    dateFormat  YYYY-MM-DD
    axisFormat  %b %d

    section App Code (CD-01)
    Daily deploys           :active, app1, 2026-03-01, 1d
    Daily deploys           :active, app2, 2026-03-02, 1d
    Daily deploys           :active, app3, 2026-03-03, 1d

    section Infra (CD-02)
    Weekly infra release    :infra1, 2026-03-01, 7d

    section ML Model (CD-03)
    Weekly intent classifier:ml1, 2026-03-01, 7d
    Monthly embeddings      :ml2, 2026-03-01, 30d

    section RAG KB (CD-04)
    Daily incremental       :rag1, 2026-03-01, 1d
    Weekly full re-index    :rag2, 2026-03-01, 7d

    section Frontend (CD-05)
    2-3x per week           :fe1, 2026-03-01, 3d

    section Config (CD-06)
    On-demand (minutes)     :cfg1, 2026-03-01, 1d

    section DB Migration (CD-07)
    Quarterly               :db1, 2026-03-01, 90d

    section Monitoring (CD-08)
    Weekly dashboard updates:mon1, 2026-03-01, 7d
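
To get a rough sense of the load these cadences place on a 1-2 person team, the chart can be turned into approximate run counts per 90-day quarter. This is back-of-the-envelope arithmetic under stated assumptions: "2-3x per week" is taken as 2.5 runs per week, a month as 30 days, and CD-06 is left out because it is on-demand. The names `INTERVAL_DAYS` and `runs_per_quarter` are illustrative.

```python
QUARTER_DAYS = 90

# Days between runs, read off the cadence chart.
# Assumption: "2-3x per week" is taken as 2.5 runs per week.
# CD-06 is on-demand, so it has no fixed interval and is omitted.
INTERVAL_DAYS = {
    "CD-01 app code (daily)": 1,
    "CD-02 infra (weekly)": 7,
    "CD-03 intent classifier (weekly)": 7,
    "CD-03 embeddings (monthly)": 30,
    "CD-04 incremental (daily)": 1,
    "CD-04 full re-index (weekly)": 7,
    "CD-05 frontend (2-3x per week)": 7 / 2.5,
    "CD-07 DB migration (quarterly)": 90,
    "CD-08 dashboards (weekly)": 7,
}


def runs_per_quarter(interval_days):
    """Approximate number of runs in one quarter, rounded to a whole run."""
    return round(QUARTER_DAYS / interval_days)


RUNS = {name: runs_per_quarter(days) for name, days in INTERVAL_DAYS.items()}
TOTAL_RUNS = sum(RUNS.values())

for name, count in RUNS.items():
    print(f"{name}: ~{count} runs per quarter")
print(f"total scheduled runs: ~{TOTAL_RUNS}")
```

With these inputs the total comes to about 268 scheduled runs per quarter, before counting any on-demand config changes.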

Behavioural Scenarios

These scenarios describe real-world conflicts among team leads, architects, and product managers over CI/CD decisions. See the Behavioural/ subfolder.

| # | Scenario | Stakeholders | Core Tension |
|---|----------|--------------|--------------|
| BH-01 | Deployment Frequency Conflict | PM, Architect, Team Lead | Velocity vs stability vs team wellness |
| BH-02 | ML Model Gate Disagreement | DS Lead, Architect, PM | Model improvement urgency vs production safety |
| BH-03 | CI/CD Tooling Choice Conflict | Architect, Team Lead, DevOps, PM | Consistency vs DX vs cost |
| BH-04 | Rollback Policy Conflict | Architect, PM, Team Lead | Risk aversion vs experimentation velocity |

How to Use This Folder

  1. Start with CD-02 (Infrastructure as Code) — it's the foundation all other pipelines depend on
  2. Read CD-01 (Application Code) — the highest-frequency pipeline and primary deployment path
  3. Read CD-03 (ML Model) — the most complex pipeline with quality gates and canary stages
  4. Read the remaining pipelines in any order, based on interest
  5. Review Behavioural scenarios — they provide interview-ready answers for CI/CD conflict questions

Relationship to Architecture

| Document | Relevance |
|----------|-----------|
| 04-architecture-hld.md | Overall deployment model (ECS Fargate + Lambda burst) |
| 04b-architecture-lld.md | Component design and the latency budgets that pipelines must validate |
| 07-team-size.md | DevOps/MLOps team size (1-2 people); pipelines must be automatable by a small team |
| 15-tradeoffs-challenges.md | Architectural tradeoffs that influence pipeline design |
| Fine-Tuning-Foundational-Models/09-training-infrastructure-mlops.md | ML training pipeline and quality gates (shadow → canary → prod) |
| Tech-Stack/04-innovation-and-tradeoffs.md | Technology evaluation framework reused for CI/CD tooling decisions |
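
The shadow → canary → prod progression referenced above can be pictured as a small promotion gate. The following is a minimal sketch, not the gate defined in the linked documents: the metric names, the thresholds in `LIMITS`, and the `next_stage` helper are all hypothetical and chosen only to show the shape of the decision.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    SHADOW = 1
    CANARY = 2
    PROD = 3


# Hypothetical gate limits for illustration only; the real thresholds
# are defined in the linked architecture and MLOps documents.
LIMITS = {"error_rate": 0.01, "p95_latency_ms": 500.0}


def gate_passed(metrics: dict) -> bool:
    """A stage passes only if every gated metric is present and within its limit."""
    return all(
        metrics.get(name, float("inf")) <= limit
        for name, limit in LIMITS.items()
    )


def next_stage(current: Stage, metrics: dict) -> Optional[Stage]:
    """Return the stage to promote to, or None when the caller should roll back."""
    if not gate_passed(metrics):
        return None
    if current is Stage.PROD:
        return Stage.PROD  # already fully rolled out
    return Stage(current.value + 1)


healthy = {"error_rate": 0.002, "p95_latency_ms": 320.0}
print(next_stage(Stage.SHADOW, healthy))
```

A missing metric is treated as a failed gate, so an incomplete metrics feed cannot promote a model by accident. Whether promotion is automatic or requires human approval is one of the decisions CD-03 discusses and is not modelled here.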