Cost Optimization Offline Testing

Cost levers, ranked by leverage: prompt caching, model right-sizing, context trimming, batch APIs, and retrieval-before-LLM, with self-hosting as a last resort. Each lever has an offline harness pattern, so you can measure the win before shipping.
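The harness pattern can be sketched roughly as follows: run a baseline and a candidate configuration over the same query set offline, record token usage per query, and compare total cost. This is a minimal sketch, not the harness from this folder. The names `RunRecord`, `cost_usd`, and `paired_delta` and the per-million-token prices passed in are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """Token usage for one query under one configuration (hypothetical schema)."""
    query_id: str
    input_tokens: int
    output_tokens: int


def cost_usd(rec: RunRecord, in_price_per_mtok: float, out_price_per_mtok: float) -> float:
    """Cost of a single call, given per-million-token input and output prices."""
    return (rec.input_tokens * in_price_per_mtok
            + rec.output_tokens * out_price_per_mtok) / 1_000_000


def paired_delta(baseline, candidate, base_prices, cand_prices):
    """Return (baseline total, candidate total, relative saving) on the same query ids."""
    base_by_id = {r.query_id: r for r in baseline}
    cand_by_id = {r.query_id: r for r in candidate}
    # Refuse to compare arms that did not see exactly the same queries.
    if base_by_id.keys() != cand_by_id.keys():
        raise ValueError("both arms must cover the same query set")
    base_total = sum(cost_usd(r, *base_prices) for r in base_by_id.values())
    cand_total = sum(cost_usd(r, *cand_prices) for r in cand_by_id.values())
    saving = 0.0 if base_total == 0 else (base_total - cand_total) / base_total
    return base_total, cand_total, saving
```

Rejecting mismatched query sets is deliberate: the delta is only clean when both arms are measured on identical inputs.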

Interview talking points

Order of cost levers. 1) Prompt cache, 2) model right-sizing via Arena, 3) trim context, 4) batch what you can, 5) retrieval-before-LLM, 6) self-host (last resort).
Why offline first? A production A/B test is the slowest measurement loop. Offline, holding the query set and the prompt fixed gives a clean delta.
Cache-hit math. Anthropic prompt cache: cached input tokens cost ~10% of the normal price, with ~5x faster TTFT (time to first token). That justifies a 1024+ token system prompt, provided you reuse it across calls.
Model arena methodology. Blind voting, plus a promotion rule locked in before the votes are counted: the cheapest model that wins ≥X% of votes ships.
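The cache-hit math and the promotion rule above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the first call writes the cache and every later call is a hit, the 10% read multiplier comes from the figure above, and the 1.25x cache-write multiplier is an assumed default to check against current pricing. The function names, the win-rate definition (share of blind votes won), and the threshold are hypothetical.

```python
def prefix_cost_usd(prefix_tokens, calls, base_price_per_mtok,
                    read_multiplier=0.10, write_multiplier=1.25):
    """Input cost of a shared prompt prefix over `calls` requests.

    Returns (cost without caching, cost with caching). Assumes one cache
    write on the first call and a cache hit on every later call.
    """
    if calls < 1:
        raise ValueError("calls must be at least 1")
    per_call = prefix_tokens * base_price_per_mtok / 1_000_000
    without_cache = per_call * calls
    with_cache = per_call * (write_multiplier + (calls - 1) * read_multiplier)
    return without_cache, with_cache


def pick_model(win_rates, prices, threshold):
    """Locked promotion rule: cheapest model whose win rate meets the threshold.

    `win_rates` maps model name to its share of blind votes won (0.0 to 1.0);
    `prices` maps model name to a comparable unit price. Returns None when
    no model clears the bar.
    """
    eligible = [name for name, rate in win_rates.items() if rate >= threshold]
    if not eligible:
        return None
    return min(eligible, key=lambda name: prices[name])
```

Under these assumptions a single call costs more with caching than without (the write premium), so the saving only appears once the prefix is reused.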

Files in this folder

File	Title
01-offline-testing-strategy.md	01. Offline Testing Strategy - How I Keep MangaAssist Testing Cheap
02-offline-testing-scenarios-with-answers.md	02. Offline Testing Scenarios With Answers
03-foundations-and-primitives-for-cost-optimization-testing.md	03. Foundations and Primitives for Cost-Optimization Offline Testing
04-scenario-deep-dives-per-cost-story.md	04. Scenario Deep-Dives — Offline Testing for Each Cost-Optimization User Story
05-ml-ai-engineer-grill-chains.md	05. ML/AI Engineer Grill Chains — Cost-Optimization Offline Testing
06-mlops-engineer-grill-chains.md	06. MLOps Engineer Grill Chains — Cost-Optimization Offline Testing
07-cross-cutting-system-grill.md	07. Cross-Cutting System Grill — Cost Optimization at Amazon-Loop Depth
README.md	Cost Optimization and Offline Testing - MangaAssist
