# Cost Optimization Offline Testing
Cost levers ranked by leverage: prompt cache, model right-sizing, context trimming, batch APIs, retrieval-before-LLM. Each lever has an offline harness pattern so you can measure the win before shipping.
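The offline harness pattern can be sketched as a minimal cost-delta calculator. It replays one fixed query set through a baseline and a candidate configuration and compares dollar cost. This is an illustration, not the folder's actual harness. `Usage`, `Price`, `cost_usd`, and `cost_delta` are hypothetical names. The stub runners stand in for real model calls, and the prices are illustrative placeholders. Quality scoring, which a real harness would run on the same replay, is left out.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Usage:
    """Token counts recorded for one replayed query."""
    input_tokens: int
    output_tokens: int


@dataclass(frozen=True)
class Price:
    """Illustrative USD prices per million tokens (placeholders, not list prices)."""
    input_per_mtok: float
    output_per_mtok: float


def cost_usd(usage: Usage, price: Price) -> float:
    """Dollar cost of one call given its token usage."""
    return (usage.input_tokens * price.input_per_mtok
            + usage.output_tokens * price.output_per_mtok) / 1_000_000


def cost_delta(
    queries: Sequence[str],
    run_baseline: Callable[[str], Usage],
    run_candidate: Callable[[str], Usage],
    price_baseline: Price,
    price_candidate: Price,
) -> tuple[float, float, float]:
    """Replay the same query set through both configs.

    Returns (baseline_cost, candidate_cost, relative_saving).
    """
    baseline = sum(cost_usd(run_baseline(q), price_baseline) for q in queries)
    candidate = sum(cost_usd(run_candidate(q), price_candidate) for q in queries)
    saving = (baseline - candidate) / baseline if baseline else 0.0
    return baseline, candidate, saving


# Hypothetical stub runners for a context-trimming lever (no real API calls).
def full_context(q: str) -> Usage:
    return Usage(input_tokens=4000, output_tokens=200)


def trimmed_context(q: str) -> Usage:
    return Usage(input_tokens=1500, output_tokens=200)


queries = ["q1", "q2", "q3", "q4"]
price = Price(input_per_mtok=3.0, output_per_mtok=15.0)
base, cand, saving = cost_delta(queries, full_context, trimmed_context, price, price)
print(f"baseline ${base:.4f}, candidate ${cand:.4f}, saving {saving:.0%}")
# → baseline $0.0600, candidate $0.0300, saving 50%
```

The same `cost_delta` call works for every lever. Only the runner (trimmed prompt, smaller model, cached prefix) or the price table changes, so the query set and the prompt stay constant between runs.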
## Interview talking points
- Order of cost levers. 1) Prompt caching, 2) model right-sizing via Arena, 3) context trimming, 4) batching wherever latency allows, 5) retrieval-before-LLM, 6) self-hosting (last resort).
- Why offline first? Production A/B is the slowest measurement loop. Offline, the same query set and the same prompt go through both baseline and candidate, so the difference is a clean delta.
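- Cache-hit math. With Anthropic prompt caching, cached input tokens cost ~10% of the normal price and TTFT is ~5x faster. Cache writes cost 25% more than normal input tokens (5-minute TTL), so the saving appears only on reuse. That trade justifies a long (1024+ token) system prompt as long as you reuse it; 1024 tokens is also the minimum cacheable prefix length on many Claude models.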
- Model arena methodology. Blind voting plus a promotion rule locked before voting starts: the cheapest model that wins ≥X% of votes ships.
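The cache-hit math can be checked with a small worked calculation. It is a sketch that assumes every call lands inside the cache TTL, so the static prefix is written once and then read on each later call. The multipliers (1.25x write, 0.1x read) and the 1024-token minimum follow Anthropic's published prompt-caching terms as described above. Those terms are model-dependent and can change, so verify them against current pricing. `input_cost_usd` and the constant names are hypothetical, and the $3 per million input tokens price is illustrative.

```python
# Multipliers on the base input-token price (assumed: Anthropic prompt caching, 5-minute TTL).
CACHE_WRITE_MULT = 1.25      # first call writes the prefix: 25% premium
CACHE_READ_MULT = 0.10       # later calls read it: ~10% of the normal price
MIN_CACHEABLE_TOKENS = 1024  # model-dependent minimum; verify for your model


def input_cost_usd(prefix_tokens: int, per_call_tokens: int, calls: int,
                   price_per_mtok: float, use_cache: bool) -> float:
    """Input-token cost of `calls` requests that share one static prefix.

    Assumes all calls fall inside the cache TTL, so the prefix is written
    once and read on every later call.
    """
    if calls <= 0:
        return 0.0
    per_token = price_per_mtok / 1_000_000
    if not use_cache or prefix_tokens < MIN_CACHEABLE_TOKENS:
        # Caching off, or prefix too short to cache: full price on every call.
        return calls * (prefix_tokens + per_call_tokens) * per_token
    # One cache write, then (calls - 1) cache reads of the prefix.
    prefix_units = prefix_tokens * (CACHE_WRITE_MULT + CACHE_READ_MULT * (calls - 1))
    # The per-call (dynamic) part of the prompt is never cached.
    return (prefix_units + calls * per_call_tokens) * per_token


no_cache = input_cost_usd(2000, 200, 100, 3.0, use_cache=False)
cached = input_cost_usd(2000, 200, 100, 3.0, use_cache=True)
print(f"uncached ${no_cache:.4f}, cached ${cached:.4f}, saving {1 - cached / no_cache:.0%}")
# → uncached $0.6600, cached $0.1269, saving 81%
```

With these multipliers a single call costs more cached than uncached (the write premium), but caching wins from the second reuse onward. That is the arithmetic behind "if you reuse it".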
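The locked promotion rule can be expressed as a pure function over blind-vote tallies. This sketch assumes each candidate is compared blind against a reference answer, such as the current production model's. The win threshold (the X above) and a minimum vote count are fixed before any votes are read. `ArenaResult`, `pick_model`, the `min_votes` guard, and the model names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ArenaResult:
    model: str
    cost_per_1k_queries: float  # USD, measured by the offline harness
    wins: int   # blind votes preferring this model's answer over the reference
    votes: int  # blind comparisons this model took part in


def pick_model(results: Sequence[ArenaResult], win_threshold: float,
               min_votes: int = 50) -> Optional[str]:
    """Cheapest model whose win rate meets the locked threshold ships.

    win_threshold and min_votes must be fixed before voting starts.
    Returns None when no candidate qualifies (keep the current model).
    """
    required = max(min_votes, 1)  # also guards against division by zero
    eligible = [r for r in results
                if r.votes >= required and r.wins / r.votes >= win_threshold]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r.cost_per_1k_queries).model


results = [
    ArenaResult("model-large", 12.0, wins=60, votes=100),
    ArenaResult("model-mid", 4.0, wins=52, votes=100),
    ArenaResult("model-small", 1.0, wins=38, votes=100),
]
print(pick_model(results, win_threshold=0.50))  # → model-mid
```

Keeping the rule as code, committed before the vote, removes the temptation to move X after seeing which model won.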
## Files in this folder
| File | Title |
|---|---|
| 01-offline-testing-strategy.md | 01. Offline Testing Strategy - How I Keep MangaAssist Testing Cheap |
| 02-offline-testing-scenarios-with-answers.md | 02. Offline Testing Scenarios With Answers |
| 03-foundations-and-primitives-for-cost-optimization-testing.md | 03. Foundations and Primitives for Cost-Optimization Offline Testing |
| 04-scenario-deep-dives-per-cost-story.md | 04. Scenario Deep-Dives — Offline Testing for Each Cost-Optimization User Story |
| 05-ml-ai-engineer-grill-chains.md | 05. ML/AI Engineer Grill Chains — Cost-Optimization Offline Testing |
| 06-mlops-engineer-grill-chains.md | 06. MLOps Engineer Grill Chains — Cost-Optimization Offline Testing |
| 07-cross-cutting-system-grill.md | 07. Cross-Cutting System Grill — Cost Optimization at Amazon-Loop Depth |
| README.md | Cost Optimization and Offline Testing - MangaAssist |