Scenario 6 — Container Supply-Chain Security And Release Gating

User Story

As the platform and security engineer for MangaAssist, I wanted only trusted container artifacts to reach production, because an LLM system can fail through its build and deployment chain even when application code looks unchanged — especially when the runtime bundles prompt-handling logic, guardrails, and model-serving dependencies together in the same image.

Why This Matters Specifically For MangaAssist

A MangaAssist container image is not just a packaging format. It bundles:
- OS libraries and language runtimes
- Manga recommendation orchestration logic
- Prompt template handling and guardrail code
- Model-serving runtime components (vLLM, SageMaker dependencies)
- Transitive dependencies from dozens of packages

If you cannot prove what is inside the image and how it was built, you do not control production. An untrusted image could silently change prompt handling, bypass content guardrails for manga content moderation, or introduce a vulnerable library without any application code changing.

What We Actually Did

Used ECR as the standard registry with vulnerability scanning enabled on every push.
Treated base images, build containers, and deployment artifacts as supply-chain inputs that require evidence, not just source code.
Required a full evidence trail before any image could be promoted to production.
Used rollback-to-previous-signed-artifact as the containment path when a release was revoked or deemed untrustworthy.

Required Evidence Per Release

Before any MangaAssist container could be promoted to production, the following evidence was required:

Evidence item	Purpose
Lockfile digest	Prove exact dependency versions — no floating `latest`
Dependency diff from previous release	Show what changed in the dependency graph
SBOM (Software Bill of Materials)	Full inventory of what is inside the image
Build provenance attestation	Prove the image was built from known inputs on a trusted build system
Artifact signature	Prove the image was not tampered with after build
Security scan results	Check for known CVEs using the ECR scan
Evaluation results	Model and prompt behavior validation for the new artifact
Deployment approval metadata	Human-in-loop sign-off with audit trail
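
The evidence table above can be read as a fail-closed checklist. The following is a minimal sketch of such a gate, not the actual MangaAssist implementation: the key names in `REQUIRED_EVIDENCE`, the `GateDecision` type, and the `evaluate_gate` function are all hypothetical, and the check covers only presence of each item, not its validity (signature verification, scan thresholds, and so on would sit behind each key).

```python
from dataclasses import dataclass, field

# Hypothetical evidence keys mirroring the table above; the real gate's
# schema is not given in the source.
REQUIRED_EVIDENCE = (
    "lockfile_digest",
    "dependency_diff",
    "sbom",
    "build_provenance",
    "artifact_signature",
    "security_scan",
    "evaluation_results",
    "deployment_approval",
)


@dataclass
class GateDecision:
    allowed: bool
    missing: list = field(default_factory=list)


def evaluate_gate(evidence: dict) -> GateDecision:
    """Fail closed: any absent or empty evidence item blocks promotion."""
    missing = [key for key in REQUIRED_EVIDENCE if not evidence.get(key)]
    return GateDecision(allowed=not missing, missing=missing)


if __name__ == "__main__":
    # An image with everything except an SBOM is rejected.
    partial = {key: "present" for key in REQUIRED_EVIDENCE if key != "sbom"}
    print(evaluate_gate(partial))
```

The default is denial: an empty or partially filled evidence record never yields `allowed=True`, which matches the "no evidence = no promotion" rule.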

Deep-Dive Questions And Answers

Q1. Why is image scanning alone not enough? Scanning tells you about known CVEs; it tells you nothing about provenance. You still need exact dependency versions, SBOMs, build provenance attestations, and signatures to answer four questions: what was built, from which inputs, whether it was built by a trusted system, and whether the artifact was tampered with after build. CVE scanning is one check in a longer chain. Without the rest, you can't answer those questions under incident pressure.

Q2. What release evidence would you require for a production container? Lockfile digest, dependency diff from the previous release, SBOM, build provenance attestation, artifact signature, security scan results, evaluation results (for model-serving containers especially), and deployment approval metadata. For MangaAssist, the evaluation results were particularly important — a new vLLM container that passes security scans but degrades recommendation quality is still a bad release.

Q3. How would you respond if a critical base-image CVE appeared after release? Use the SBOM to identify which MangaAssist releases are affected — you get the answer in minutes instead of hours. Rebuild from a patched base or updated lockfile. Promote only the new signed artifact through the full evidence chain. Roll back any still-affected deployment to the previous trusted version. The SBOM-driven triage starts with facts instead of guesswork.
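
The SBOM lookup in Q3 amounts to a query over stored inventories. Here is a rough sketch under simplifying assumptions: each SBOM is reduced to a flat `{package: version}` mapping (real SPDX or CycloneDX documents are richer), and the release tags, the package name `examplelib`, and the `affected_releases` helper are made up for illustration.

```python
def affected_releases(sboms: dict, package: str, bad_versions: set) -> list:
    """Return release tags whose SBOM lists `package` at a vulnerable version.

    `sboms` maps a release tag to {package_name: version}. This flat shape is
    an assumption for illustration only.
    """
    return sorted(
        release
        for release, packages in sboms.items()
        if packages.get(package) in bad_versions
    )


if __name__ == "__main__":
    # Hypothetical releases and versions.
    sboms = {
        "release-a": {"examplelib": "1.2.0", "otherlib": "0.9.1"},
        "release-b": {"examplelib": "1.2.1", "otherlib": "0.9.1"},
        "release-c": {"otherlib": "0.9.2"},
    }
    print(affected_releases(sboms, "examplelib", {"1.2.0"}))
```

Because the inventories are recorded at build time, the query needs no access to the running containers, which is what makes the triage fast.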

Q4. How do you explain this without sounding like generic security theater? Tie it back to customer impact specific to MangaAssist. A bad artifact could affect content moderation guardrails (allowing inappropriate manga content through), change prompt handling behavior silently, or introduce a library vulnerability in the inference path. We cared about rapid containment and explainability — being able to say exactly which users were served by the affected version and exactly what changed — not compliance checkboxes.

Q5. What is the best "senior engineer" sentence here? I treat container images as governed release artifacts, not build by-products. If an image is unsigned, missing an SBOM, or missing promotion evidence, it fails closed and never reaches production. The same rigor we apply to code changes should apply to every dependency that bundles into the image.

Optimizations We Can Credibly Claim

ECR vulnerability scanning on every image push
Promotion policy enforced via evidence gate — no evidence = no promotion
SBOM-based incident triage — minutes to identify affected releases instead of hours
Fast rollback to a previously trusted signed artifact
Lockfile-pinned dependencies — no floating versions that change without a PR

Better-Than-Naive Explanation

The naive answer is "we scan Docker images." The stronger answer: we made image trust enforceable at promotion time and auditable after release. Scanning is one signal. The full picture is: what's in the image (SBOM), how was it built (provenance), was it changed after build (signature), what CVEs does it carry (scan), and does it behave correctly (evaluation). That is how you manage supply-chain risk in a production LLM chatbot where runtime behavior also depends on model and prompt artifacts that live inside the container.

Decision Table

Dimension	Details
Why ECR over self-hosted registry	Native AWS scanning integration, IAM-based access control, no registry infrastructure to operate
Why SBOM matters for MangaAssist	Image bundles prompt handling + guardrails + model serving — need full dependency inventory for incident response
Why provenance matters	Proves the image was built on trusted infra from known inputs — detects supply-chain injection
Image signing rationale	Detects tampering between build and deployment — especially critical for model-serving containers
Evaluation results in evidence	A security-clean but quality-degraded model container is still a bad release
Rollback unit	Previous signed artifact — not a branch, not a commit, the artifact itself
Tradeoff: evidence requirement vs deployment speed	More gates = slower promotion; payoff = rapid containment, explainability, auditable trail
Key outcome	Auditable, fail-closed production — unsigned/unattested image never reaches serving path

Tradeoffs Discussed

Option considered	Why rejected or scoped
Scan-only (no SBOM/provenance)	Misses provenance and tampering vectors; insufficient for incident response
Self-hosted container registry	Adds registry infrastructure burden; no product advantage for MangaAssist's team size
Manual promotion with no evidence trail	Fast, no overhead — but no auditability, no rollback precision, unacceptable for regulated content platform
Code-level rollback only (no artifact rollback)	Rebuilding from code takes longer under incident pressure; rolling back to an already-signed artifact is faster
Evaluation results optional	Security-clean but quality-degraded container could silently harm recommendation quality

Scale Planned

Scope	Enforcement
Every image push	ECR vulnerability scan triggered automatically
Every promotion to staging	SBOM + build provenance + dependency diff required
Every promotion to production	Full evidence gate: lockfile + diff + SBOM + signature + scan + eval + approval
Incident response	SBOM-driven triage: minutes to identify affected releases and which images to roll back
Rollback SLA	Rollback to previous signed artifact: minutes, not hours
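
Rolling back to the previous signed artifact implies a selection rule over the release history. One way this could look, as a sketch only: the `Release` record, its fields, and the `rollback_target` function are assumptions introduced here, and a real system would verify the signature cryptographically rather than trust a stored flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Release:
    digest: str        # image digest, the rollback unit
    deployed_at: int   # increasing deploy order (hypothetical field)
    signed: bool
    revoked: bool = False


def rollback_target(history: list, current_digest: str) -> Optional[Release]:
    """Pick the most recent signed, non-revoked release deployed before the current one."""
    current = next((r for r in history if r.digest == current_digest), None)
    if current is None:
        return None  # unknown current release: refuse to guess
    candidates = [
        r for r in history
        if r.deployed_at < current.deployed_at and r.signed and not r.revoked
    ]
    return max(candidates, key=lambda r: r.deployed_at, default=None)
```

The function returns an image digest record rather than a branch or commit, so the rollback never requires a rebuild. Returning `None` when no trusted predecessor exists leaves the decision to a human instead of deploying something unverified.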

Supply-Chain Threat Model For MangaAssist

Threat 1 — Vulnerable base image
  Detection: ECR scan flags known CVEs; SBOM identifies the affected OS packages
  Response: Rebuild from patched base, re-promote through evidence chain

Threat 2 — Compromised build system
  Detection: Build provenance attestation is missing or names an untrusted build origin
  Response: Re-build on trusted system, new signature required

Threat 3 — Post-build image tampering
  Detection: Artifact signature verification at deployment time
  Response: Unsigned/invalid-signature image fails to deploy

Threat 4 — Floating dependency introduces bad version
  Detection: Lockfile digest + dependency diff flags unexpected changes
  Response: Pin dependency, rebuild, re-promote

Threat 5 — Quality regression in model-serving container
  Detection: Evaluation results required in evidence gate
  Response: Block promotion until evaluation passes
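
For Threat 4, the lockfile digest and dependency diff can be sketched as follows. This assumes a requirements-style lockfile with `name==version` lines, which the source does not specify; `lockfile_digest`, `parse_pins`, and `dependency_diff` are hypothetical helper names.

```python
import hashlib


def lockfile_digest(lockfile_text: str) -> str:
    """SHA-256 of the lockfile contents, a stable identity for the dependency set."""
    return hashlib.sha256(lockfile_text.encode("utf-8")).hexdigest()


def parse_pins(lockfile_text: str) -> dict:
    """Parse `name==version` lines; rejects anything that is not an exact pin."""
    pins = {}
    for line in lockfile_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, version = line.partition("==")
        if not sep:
            raise ValueError(f"unpinned dependency: {line!r}")
        pins[name.strip().lower()] = version.strip()
    return pins


def dependency_diff(previous: dict, current: dict) -> dict:
    """Report packages added, removed, or changed between two pinned sets."""
    added = {n: v for n, v in current.items() if n not in previous}
    removed = {n: v for n, v in previous.items() if n not in current}
    changed = {
        n: (previous[n], current[n])
        for n in current
        if n in previous and previous[n] != current[n]
    }
    return {"added": added, "removed": removed, "changed": changed}
```

A reviewer would compare the `changed` and `added` entries against what the pull request claims to modify; any entry that nobody asked for is the "unexpected change" the detection step refers to.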

Intuition From This Scenario

A container image is a release artifact that encapsulates your entire runtime trust surface. Checking the code is not enough — the image also bundles OS libraries, transitive dependencies, prompt-handling logic, and for MangaAssist, model-serving components. Any one of those layers can be the vector for a silent regression or security incident. The discipline is treating image promotion with the same rigor you apply to a database migration: evidence-gated, reversible, auditable, and with a clear rollback path that doesn't require rebuilding from source under incident pressure. Signed artifacts are your rollback targets, not your git branches.