
# Cost Optimization and Offline Testing - MangaAssist

This folder explains how I would test MangaAssist cheaply without sending every code change to Bedrock. The core idea is simple: most chatbot regressions are not "LLM quality" problems. They are routing, retrieval, memory, guardrail, schema, latency, or prompt-shape problems, and those can be tested offline at little or no GenAI cost.
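Most of those failure classes reduce to pure functions that can be pinned with ordinary unit tests. As a minimal sketch (the `route_intent` function, its keyword rules, and the intent labels are hypothetical stand-ins for MangaAssist's real router, not its actual implementation):

```python
def route_intent(utterance: str) -> str:
    """Toy keyword router standing in for the production intent router."""
    text = utterance.lower()
    if "refund" in text or "return" in text:
        return "policy_retrieval"   # answered from policy docs, no LLM call
    if "price" in text or "cost" in text or "in stock" in text:
        return "catalog_lookup"     # answered from catalog data, no LLM call
    return "llm_fallback"           # only this path would ever cost tokens


def test_routing_regressions():
    # Labeled examples act as a tiny golden set; any routing drift
    # fails deterministically in CI at zero GenAI cost.
    cases = [
        ("What is your refund policy?", "policy_retrieval"),
        ("How much does volume 12 cost?", "catalog_lookup"),
        ("Recommend me something like Vinland Saga", "llm_fallback"),
    ]
    for utterance, expected in cases:
        assert route_intent(utterance) == expected


test_routing_regressions()
```

A regression in a test like this is caught in milliseconds, long before any Bedrock invocation.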


## Files

### Foundation (general offline testing for the chatbot)

| File | What It Covers |
|------|----------------|
| `01-offline-testing-strategy.md` | The offline-first test architecture, dataset design, CI/CD gates, and spend controls for MangaAssist |
| `02-offline-testing-scenarios-with-answers.md` | Deep-dive scenarios showing exactly how to validate prompt, retrieval, memory, guardrail, and routing changes while keeping GenAI cost low |

### Cost-Optimization Deep-Dive (per-scenario for the 8 user stories)

The files below apply offline testing specifically to the 8 cost-optimization user stories in ../Cost-Optimization-User-Stories/. Each story (US-01 through US-08) is dissected for what its offline test should look like and what an Amazon ML/AI Engineer or MLOps Engineer would be asked about it on a loop.

| File | What It Covers |
|------|----------------|
| `03-foundations-and-primitives-for-cost-optimization-testing.md` | Why cost-opt offline testing is structurally different from quality testing; the 4 primitives (counterfactual replay, decision-equivalence, cost-aware golden, stress/saturation); paired-metric pattern; unified test-pipeline shape |
| `04-scenario-deep-dives-per-cost-story.md` | One section per US story (US-01 through US-08): cost lever, quality contract, offline test design, mermaid pipeline diagram, real-incident sketches, threshold reconciliation |
| `05-ml-ai-engineer-grill-chains.md` | Full interview grill chains (Opening + 4 follow-ups + 3 architect-level + intuition) per scenario, framed for the Amazon ML/AI Engineer loop — model behavior, calibration, distribution shift, statistical rigor |
| `06-mlops-engineer-grill-chains.md` | Same scenarios from the MLOps Engineer lens — telemetry, deployment, observability, kill switches, CI gates, runbooks, on-call burden |
| `07-cross-cutting-system-grill.md` | 6 system-level questions spanning multiple scenarios: compounding savings/risk, offline-online correlation, cost vs. quality SLO breaches, CI gate design, auditability, the ratchet problem; closing scoring rubric for both roles |

## Reading Paths

| You are... | Read in this order |
|------------|--------------------|
| New to MangaAssist offline testing | 01 → 02 → 03 → 04 |
| Designing an offline test for one specific cost story | 03 (primitives) → 04 (find your scenario) |
| Preparing for an Amazon ML/AI Engineer loop | 03 → 04 → 05 → 07 |
| Preparing for an Amazon MLOps Engineer loop | 03 → 04 → 06 → 07 |
| Preparing for a system-design / staff loop | 03 → 04 → 07 |

## Core Principle

For MangaAssist, the testing ladder should always move from cheapest to most expensive:

1. Deterministic tests with no LLM calls
2. Replay tests on labeled datasets with mocked services
3. Local open-source model smoke tests when prompt behavior must be exercised
4. Small, capped paid-model evaluation only for the final promotion gate

If a change fails in step 1 or step 2, it should never reach Bedrock evaluation.
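The ladder can be expressed as an ordered, fail-fast CI gate. The sketch below is illustrative, assuming hypothetical tier names and stub runner lambdas rather than MangaAssist's real test suite:

```python
def run_tiered_gates(tiers):
    """Run tiers cheapest-first and stop at the first failure, so a
    change that breaks in step 1 or 2 never reaches paid evaluation."""
    for name, passed in tiers:
        if not passed():
            return f"blocked at: {name}"
    return "promoted"


# Example: a simulated replay-dataset regression in tier 2 blocks the
# change before any paid Bedrock call is made in tier 4.
tiers = [
    ("1: deterministic, no LLM calls",  lambda: True),
    ("2: replay on labeled datasets",   lambda: False),  # simulated failure
    ("3: local OSS model smoke test",   lambda: True),
    ("4: capped paid-model evaluation", lambda: True),
]
print(run_tiered_gates(tiers))  # prints "blocked at: 2: replay on labeled datasets"
```

Ordering the gates by marginal cost means the expensive tier only runs for changes that have already survived every free check.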


## Why This Matters

MangaAssist is a hybrid chatbot:

- many intents should never hit the LLM
- product facts must come from catalog data
- policy answers must come from retrieval
- prices and ASINs must be validated after generation

That architecture is good for production cost and also good for testing cost. It lets us validate most of the system offline, then reserve paid GenAI usage for the narrow set of questions that only the target model can answer.
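The post-generation validation step is a good example: because it compares generated text against catalog ground truth, it is a pure function that can be exercised offline at zero GenAI cost. A minimal sketch, assuming a hypothetical in-memory catalog and a loose ASIN pattern (the real validator and data source would differ):

```python
import re

# Hypothetical catalog: ASIN -> current price.
CATALOG = {"B0EXAMPLE1": 9.99}

# Loose illustrative pattern: "B0" followed by 8 uppercase alphanumerics.
ASIN_RE = re.compile(r"\bB0[A-Z0-9]{8}\b")


def validate_answer(text: str) -> list[str]:
    """Return every catalog violation found in a generated answer."""
    problems = []
    for asin in ASIN_RE.findall(text):
        if asin not in CATALOG:
            problems.append(f"unknown ASIN: {asin}")
    for price in re.findall(r"\$(\d+\.\d{2})", text):
        if float(price) not in CATALOG.values():
            problems.append(f"price not in catalog: ${price}")
    return problems


# A correct answer passes; a hallucinated ASIN and stale price are flagged.
print(validate_answer("Volume 12 (B0EXAMPLE1) costs $9.99"))   # prints []
print(validate_answer("Only $12.50 for B0NOTREAL9!"))
# prints ['unknown ASIN: B0NOTREAL9', 'price not in catalog: $12.50']
```

Replaying a corpus of previously generated answers through a validator like this gives a regression signal for the "prices and ASINs" contract without a single model invocation.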