Scenario 6 — Container Supply-Chain Security And Release Gating
User Story
As the platform and security engineer for MangaAssist, I wanted only trusted container artifacts to reach production, because an LLM system can fail through its build and deployment chain even when application code looks unchanged — especially when the runtime bundles prompt-handling logic, guardrails, and model-serving dependencies together in the same image.
Why This Matters Specifically For MangaAssist
A MangaAssist container image is not just a packaging format. It bundles:
- OS libraries and language runtimes
- Manga recommendation orchestration logic
- Prompt template handling and guardrail code
- Model-serving runtime components (vLLM, SageMaker dependencies)
- Transitive dependencies from dozens of packages
If you cannot prove what is inside the image and how it was built, you do not control production. An untrusted image could silently change prompt handling, bypass content guardrails for manga content moderation, or introduce a vulnerable library without any application code changing.
What We Actually Did
- Used ECR as the standard registry with vulnerability scanning enabled on every push.
- Treated base images, build containers, and deployment artifacts as supply-chain inputs that require evidence, not just source code.
- Required a full evidence trail before any image could be promoted to production.
- Used rollback-to-previous-signed-artifact as the containment path when a release was revoked or deemed untrustworthy.
Required Evidence Per Release
Before any MangaAssist container could be promoted to production, the following evidence was required:
| Evidence item | Purpose |
|---|---|
| Lockfile digest | Prove exact dependency versions — no floating latest |
| Dependency diff from previous release | Show what changed in the dependency graph |
| SBOM (Software Bill of Materials) | Full inventory of what is inside the image |
| Build provenance attestation | Prove the image was built from known inputs on a trusted build system |
| Artifact signature | Prove the image was not tampered with after build |
| Security scan results | Known-CVE findings from the ECR vulnerability scan |
| Evaluation results | Model and prompt behavior validation for the new artifact |
| Deployment approval metadata | Human-in-loop sign-off with audit trail |
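The evidence table above can be read as a fail-closed check at promotion time. The following is a minimal sketch under assumptions, not the actual MangaAssist gate: the evidence key names, the `ReleaseCandidate` type, and both helper functions are hypothetical, and the check only confirms that each item is present. Verifying that a signature or attestation is actually valid would be a separate step.

```python
from dataclasses import dataclass, field

# Keys mirror the rows of the evidence table; the key names are hypothetical.
REQUIRED_EVIDENCE = (
    "lockfile_digest",
    "dependency_diff",
    "sbom",
    "build_provenance",
    "artifact_signature",
    "security_scan",
    "evaluation_results",
    "deployment_approval",
)


@dataclass
class ReleaseCandidate:
    image_digest: str
    # Maps an evidence key to a reference (report ID, URI, ...) for that item.
    evidence: dict = field(default_factory=dict)


def missing_evidence(candidate):
    """Return the required evidence items that are absent or empty."""
    return [key for key in REQUIRED_EVIDENCE if not candidate.evidence.get(key)]


def can_promote(candidate):
    """Fail closed: any missing item blocks promotion."""
    return not missing_evidence(candidate)


# Illustrative candidate with only two of the eight items attached.
candidate = ReleaseCandidate(
    "sha256:example", {"sbom": "sbom-ref", "security_scan": "scan-ref"}
)
print(can_promote(candidate))       # False
print(missing_evidence(candidate))  # the six items still missing
```

Returning the list of missing items, rather than a bare boolean, keeps the rejection explainable: the person whose promotion was blocked can see exactly which evidence to supply.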
Deep-Dive Questions And Answers
Q1. Why is image scanning alone not enough? Scanning tells you about known CVEs; it tells you nothing about provenance. You still need exact dependency versions, SBOMs, build provenance attestations, and signatures to answer: what was built, from which inputs, whether it was built by a trusted system, and whether the artifact was tampered with after build. CVE scanning is one check in a longer chain. Without the rest, you can't answer those questions under incident pressure.
Q2. What release evidence would you require for a production container? Lockfile digest, dependency diff from the previous release, SBOM, build provenance attestation, artifact signature, security scan results, evaluation results (for model-serving containers especially), and deployment approval metadata. For MangaAssist, the evaluation results were particularly important — a new vLLM container that passes security scans but degrades recommendation quality is still a bad release.
Q3. How would you respond if a critical base-image CVE appeared after release? Use the SBOM to identify which MangaAssist releases are affected — you get the answer in minutes instead of hours. Rebuild from a patched base or updated lockfile. Promote only the new signed artifact through the full evidence chain. Roll back any still-affected deployment to the previous trusted version. The SBOM-driven triage starts with facts instead of guesswork.
Q4. How do you explain this without sounding like generic security theater? Tie it back to customer impact specific to MangaAssist. A bad artifact could affect content moderation guardrails (allowing inappropriate manga content through), change prompt handling behavior silently, or introduce a library vulnerability in the inference path. We cared about rapid containment and explainability — being able to say exactly which users were served by the affected version and exactly what changed — not compliance checkboxes.
Q5. What is the best "senior engineer" sentence here? I treat container images as governed release artifacts, not build by-products. If an image is unsigned, missing an SBOM, or missing promotion evidence, it fails closed and never reaches production. The same rigor we apply to code changes should apply to every dependency that bundles into the image.
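The SBOM-driven triage described in Q3 can be sketched as a lookup over stored SBOMs. This is an illustration under assumptions: the function name, the release IDs, and the package data are made up, and each SBOM is reduced to a flat list of name/version pairs. Real SBOM formats such as CycloneDX or SPDX carry far more structure, and real advisories usually specify version ranges rather than the exact-match list used here.

```python
def affected_releases(sboms, package, vulnerable_versions):
    """Return release IDs whose SBOM lists `package` at a vulnerable version.

    sboms maps a release ID to a list of {"name": ..., "version": ...} components.
    Matching is by exact version string, which is a simplification.
    """
    vulnerable = set(vulnerable_versions)
    hits = []
    for release_id, components in sboms.items():
        for component in components:
            if component["name"] == package and component["version"] in vulnerable:
                hits.append(release_id)
                break  # one match is enough to flag the release
    return sorted(hits)


# Hypothetical SBOM data for three releases.
sboms = {
    "release-41": [{"name": "openssl", "version": "3.0.1"}],
    "release-42": [{"name": "openssl", "version": "3.0.7"}],
    "release-43": [
        {"name": "openssl", "version": "3.0.1"},
        {"name": "zlib", "version": "1.3"},
    ],
}
print(affected_releases(sboms, "openssl", ["3.0.1"]))
# ['release-41', 'release-43']
```

The point of the sketch is that triage becomes a query over recorded facts: no image needs to be pulled or unpacked during the incident.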
Optimizations We Can Credibly Claim
- ECR vulnerability scanning on every image push
- Promotion policy enforced via evidence gate — no evidence = no promotion
- SBOM-based incident triage — minutes to identify affected releases instead of hours
- Fast rollback to a previously trusted signed artifact
- Lockfile-pinned dependencies — no floating versions that change without a PR
Better-Than-Naive Explanation
The naive answer is "we scan Docker images." The stronger answer: we made image trust enforceable at promotion time and auditable after release. Scanning is one signal. The full picture is: what's in the image (SBOM), how was it built (provenance), was it changed after build (signature), what CVEs does it carry (scan), and does it behave correctly (evaluation). That is how you manage supply-chain risk in a production LLM chatbot where runtime behavior also depends on model and prompt artifacts that live inside the container.
Decision Table
| Dimension | Details |
|---|---|
| Why ECR over self-hosted registry | Native AWS scanning integration, IAM-based access control, no registry infrastructure to operate |
| Why SBOM matters for MangaAssist | Image bundles prompt handling + guardrails + model serving — need full dependency inventory for incident response |
| Why provenance matters | Proves the image was built on trusted infra from known inputs — detects supply-chain injection |
| Image signing rationale | Detects tampering between build and deployment — especially critical for model-serving containers |
| Evaluation results in evidence | A security-clean but quality-degraded model container is still a bad release |
| Rollback unit | Previous signed artifact — not a branch, not a commit, the artifact itself |
| Tradeoff: evidence requirement vs deployment speed | More gates = slower promotion; payoff = rapid containment, explainability, auditable trail |
| Key outcome | Auditable, fail-closed production: an unsigned or unattested image never reaches the serving path |
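The "rollback unit" row can be made concrete with a small selection routine. This is a sketch under assumptions: `DeployedArtifact`, its fields, and `rollback_target` are hypothetical names, and the deployment history is assumed to be an ordered list whose last entry is the currently deployed artifact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployedArtifact:
    image_digest: str
    signed: bool
    revoked: bool = False


def rollback_target(history):
    """Pick the most recent earlier artifact that is signed and not revoked.

    history is ordered oldest to newest; the last entry is the current
    deployment. Returns None when no trusted artifact exists, so the caller
    has to escalate instead of guessing.
    """
    if not history:
        return None
    current = history[-1]
    for artifact in reversed(history[:-1]):
        if artifact.image_digest == current.image_digest:
            continue  # never roll back onto the artifact being replaced
        if artifact.signed and not artifact.revoked:
            return artifact
    return None


# Hypothetical history: the newest trusted predecessor is "sha256:a".
history = [
    DeployedArtifact("sha256:a", signed=True),
    DeployedArtifact("sha256:b", signed=True, revoked=True),
    DeployedArtifact("sha256:c", signed=False),
    DeployedArtifact("sha256:d", signed=True),
]
print(rollback_target(history).image_digest)  # sha256:a
```

Selecting by image digest rather than by branch or commit matches the table: the rollback target is an artifact that already passed the gate, so nothing has to be rebuilt during the incident.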
Tradeoffs Discussed
| Option considered | Why rejected or scoped |
|---|---|
| Scan-only (no SBOM/provenance) | Misses provenance and tampering vectors; insufficient for incident response |
| Self-hosted container registry | Adds registry infrastructure burden; no product advantage for MangaAssist's team size |
| Manual promotion with no evidence trail | Fast and low-overhead, but no auditability and no rollback precision; unacceptable for a regulated content platform |
| Code-level rollback only (no artifact rollback) | Rebuilding from code takes longer under incident pressure; rolling back to an already-signed artifact is faster |
| Evaluation results optional | Security-clean but quality-degraded container could silently harm recommendation quality |
Scale Planned
| Scope | Enforcement |
|---|---|
| Every image push | ECR vulnerability scan triggered automatically |
| Every promotion to staging | SBOM + build provenance + dependency diff required |
| Every promotion to production | Full evidence gate: lockfile + diff + SBOM + provenance + signature + scan + eval + approval |
| Incident response | SBOM-driven triage: minutes to identify affected releases and which images to roll back |
| Rollback SLA | Rollback to previous signed artifact: minutes, not hours |
Supply-Chain Threat Model For MangaAssist
Threat 1 — Vulnerable base image
Detection: ECR scan + SBOM identify the affected OS packages
Response: Rebuild from patched base, re-promote through evidence chain
Threat 2 — Compromised build system
Detection: Build provenance attestation proves build origin
Response: Re-build on trusted system, new signature required
Threat 3 — Post-build image tampering
Detection: Artifact signature verification at deployment time
Response: Unsigned/invalid-signature image fails to deploy
Threat 4 — Floating dependency introduces bad version
Detection: Lockfile digest + dependency diff flags unexpected changes
Response: Pin dependency, rebuild, re-promote
Threat 5 — Quality regression in model-serving container
Detection: Evaluation results required in evidence gate
Response: Block promotion until evaluation passes
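The detection step for Threat 4 relies on two small computations: a digest of the lockfile and a diff of the resolved dependency set against the previous release. The sketch below assumes the lockfile has already been parsed into a package-to-version mapping; the function names and the example packages are hypothetical, and lockfile parsing itself is out of scope.

```python
import hashlib


def lockfile_digest(lockfile_bytes):
    """SHA-256 hex digest of the raw lockfile contents."""
    return hashlib.sha256(lockfile_bytes).hexdigest()


def dependency_diff(previous, current):
    """Compare two {package: version} mappings from consecutive releases."""
    added = {p: v for p, v in current.items() if p not in previous}
    removed = {p: v for p, v in previous.items() if p not in current}
    changed = {
        p: (previous[p], current[p])
        for p in current
        if p in previous and previous[p] != current[p]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical dependency sets for two consecutive releases.
previous = {"requests": "2.31.0", "numpy": "1.26.4"}
current = {"requests": "2.32.0", "numpy": "1.26.4", "orjson": "3.10.0"}
print(dependency_diff(previous, current))
# {'added': {'orjson': '3.10.0'}, 'removed': {}, 'changed': {'requests': ('2.31.0', '2.32.0')}}
```

An empty diff alongside a changed digest, or a non-empty diff with no corresponding PR, is the kind of mismatch a reviewer would want surfaced before promotion.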
Intuition From This Scenario
A container image is a release artifact that encapsulates your entire runtime trust surface. Checking the code is not enough — the image also bundles OS libraries, transitive dependencies, prompt-handling logic, and for MangaAssist, model-serving components. Any one of those layers can be the vector for a silent regression or security incident. The discipline is treating image promotion with the same rigor you apply to a database migration: evidence-gated, reversible, auditable, and with a clear rollback path that doesn't require rebuilding from source under incident pressure. Signed artifacts are your rollback targets, not your git branches.