Cost-Effective Model Selection — Scenarios and Runbooks
MangaAssist Context: JP Manga store chatbot running on AWS. Bedrock Claude 3 Sonnet ($3/$15 per 1M input/output tokens) handles complex queries; Haiku ($0.25/$1.25 per 1M input/output tokens) handles simple ones. 1M messages/day across product search, order status, manga recommendations, and Q&A. Infrastructure: OpenSearch Serverless, DynamoDB, ECS Fargate, API Gateway WebSocket, ElastiCache Redis.
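A back-of-the-envelope check of what these prices mean per message (the token counts below are illustrative assumptions, not figures from this runbook):

```python
# Illustrative per-message cost from the published Bedrock prices above.
# The 2,000-input / 300-output token mix is an assumption for illustration.
SONNET = {"in": 3.00 / 1_000_000, "out": 15.00 / 1_000_000}  # $ per token
HAIKU = {"in": 0.25 / 1_000_000, "out": 1.25 / 1_000_000}

def cost_per_message(price: dict, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one chat turn at the given per-token prices."""
    return input_tokens * price["in"] + output_tokens * price["out"]

sonnet_cost = cost_per_message(SONNET, 2_000, 300)  # 0.0105
haiku_cost = cost_per_message(HAIKU, 2_000, 300)    # 0.000875

print(f"Sonnet: ${sonnet_cost:.6f}/msg, Haiku: ${haiku_cost:.6f}/msg")
```

At this mix, Haiku is 12x cheaper per message, which is why routing even a modest share of traffic to the wrong tier moves the daily bill materially.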
Skill Mapping
| AWS AIP-C01 Domain | Task | Skill | This File Covers |
|---|---|---|---|
| Domain 4 Operational Efficiency | Task 4.1 Cost Optimization | 4.1.2 Cost-Effective Model Selection | 5 production scenarios — wrong model, wasteful routing, budget overrun, quality regression, unmapped intent |
Scenario Overview
| # | Scenario | Core Problem | Severity |
|---|---|---|---|
| 1 | Haiku producing poor manga recommendations | Wrong model for complex task | HIGH |
| 2 | Sonnet routing for simple "where's my order" queries | Wasteful over-routing | MEDIUM |
| 3 | Budget overrun during manga sale event | Traffic spike exhausts daily LLM budget | CRITICAL |
| 4 | Quality regression after model tier rebalancing | Routing change degraded user experience | HIGH |
| 5 | New intent manga_review defaults to expensive Sonnet | Unmapped intent hits most expensive model | MEDIUM |
Scenario 1: Haiku Producing Poor Manga Recommendations
Problem Statement
After a cost optimization initiative, the recommendation intent was experimentally routed to Haiku to reduce spend. Users began reporting irrelevant recommendations — "I asked for manga like Berserk and got generic shounen suggestions." Satisfaction for recommendations dropped from 94% to 61%.
Decision Tree
```mermaid
flowchart TD
    ALERT["ALERT: Recommendation<br/>satisfaction dropped to 61%<br/>(threshold: 80%)"] --> CHECK_MODEL{"Which model is<br/>serving recommendations?"}
    CHECK_MODEL -->|Haiku| ROOT["ROOT CAUSE:<br/>Haiku lacks reasoning depth<br/>for nuanced recommendations"]
    CHECK_MODEL -->|Sonnet| OTHER["Investigate other causes:<br/>prompt drift, data quality, guardrails"]
    ROOT --> FIX1["IMMEDIATE: Revert<br/>recommendation → Sonnet"]
    FIX1 --> VERIFY{"Satisfaction<br/>recovering?"}
    VERIFY -->|Yes, back to ~90%| RESOLVED["RESOLVED<br/>Document: recommendation<br/>requires Sonnet-tier reasoning"]
    VERIFY -->|No| INVESTIGATE["Investigate:<br/>- Prompt template changed?<br/>- OpenSearch index stale?<br/>- Guardrails blocking content?"]
    INVESTIGATE --> FIX2["Fix secondary issue<br/>+ keep Sonnet routing"]
    ROOT --> PREVENT["PREVENTION:<br/>1. Add recommendation to<br/> 'Sonnet-required' locked list<br/>2. A/B test before downgrade<br/>3. Quality gate: auto-rollback<br/> if satisfaction < 80%"]
    style ALERT fill:#e74c3c,color:#fff
    style ROOT fill:#f39c12,color:#fff
    style FIX1 fill:#2ecc71,color:#fff
    style RESOLVED fill:#2ecc71,color:#fff
    style PREVENT fill:#3498db,color:#fff
```
Runbook
Detection
- CloudWatch alarm: `recommendation_satisfaction_rate < 0.80` for 15 minutes.
- Metric source: User thumbs-up/down feedback aggregated per intent per model.
- Dashboard query:

  ```sql
  SELECT AVG(satisfaction)
  FROM mangaassist.user_feedback
  WHERE intent = 'recommendation'
    AND timestamp > NOW() - INTERVAL 1 HOUR
  GROUP BY model_tier;
  ```
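The alarm above can be provisioned programmatically; a boto3 sketch, where the namespace, metric name, and SNS topic ARN are placeholder assumptions to adapt to your own metric publishing:

```python
# Alarm definition for the satisfaction threshold: 3 consecutive 5-minute
# datapoints below 0.80 equals "15 minutes below threshold".
# Namespace, metric name, and topic ARN are illustrative placeholders.
ALARM_PARAMS = {
    "AlarmName": "mangaassist-recommendation-satisfaction-low",
    "Namespace": "MangaAssist/Quality",
    "MetricName": "recommendation_satisfaction_rate",
    "Statistic": "Average",
    "Period": 300,            # 5-minute datapoints
    "EvaluationPeriods": 3,   # 3 x 5 min = 15 minutes
    "Threshold": 0.80,
    "ComparisonOperator": "LessThanThreshold",
    "TreatMissingData": "notBreaching",
    "AlarmActions": ["arn:aws:sns:ap-northeast-1:123456789012:mangaassist-oncall"],
}

def create_alarm(params: dict, client=None) -> None:
    """Create or update the CloudWatch alarm from the parameter dict."""
    if client is None:
        import boto3  # deferred so the module imports without AWS dependencies
        client = boto3.client("cloudwatch")
    client.put_metric_alarm(**params)
```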
Root Cause Analysis
| Factor | Expected | Actual | Verdict |
|---|---|---|---|
| Model tier for `recommendation` | Sonnet | Haiku | Incorrect routing |
| Haiku quality score for recommendations | N/A (untested) | 6.2/10 | Below threshold |
| Prompt template | v3 (unchanged) | v3 | Not the cause |
| OpenSearch index freshness | < 1 hour | 45 min | Not the cause |
Root cause: Haiku cannot perform the multi-step reasoning required for personalized manga recommendations. Queries like "manga like Berserk but less dark and more hopeful" require understanding thematic nuance, tone comparison, and reader preference modeling — capabilities where Sonnet scores 9.4 and Haiku scores 6.2.
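The tier decision implied by these quality scores can be sketched as a small helper; the quality bar and tier ordering are illustrative assumptions, not the production router:

```python
# Pick the cheapest model whose measured quality clears the bar.
# QUALITY_BAR is an assumed threshold; Template is excluded because
# recommendations cannot be served from static templates.
QUALITY_BAR = 8.0
TIERS = ["haiku", "sonnet"]  # ordered cheapest first

def required_tier(scores: dict) -> str:
    """Return the cheapest tier meeting the bar, else the strongest model."""
    for tier in TIERS:
        if scores.get(tier, 0.0) >= QUALITY_BAR:
            return tier
    return TIERS[-1]

# Scenario 1 numbers: Haiku 6.2 vs Sonnet 9.4 means recommendation needs Sonnet.
```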
Resolution Steps
- Immediate (< 5 min): Update routing config in DynamoDB:

  ```python
  # Revert recommendation intent to Sonnet
  dynamodb.update_item(
      TableName="mangaassist-routing-config",
      Key={"intent": {"S": "recommendation"}},
      UpdateExpression="SET model_tier = :tier, updated_by = :by, updated_at = :ts",
      ExpressionAttributeValues={
          ":tier": {"S": "sonnet"},
          ":by": {"S": "oncall-engineer"},
          ":ts": {"S": "2026-03-31T14:30:00Z"},
      },
  )
  ```

- Verify (< 30 min): Monitor `recommendation_satisfaction_rate` — should recover to > 85% within 30 minutes.
- Post-mortem: Document that the `recommendation` intent is in the "Sonnet-locked" category.
Prevention
- Lock list: Maintain a `sonnet_required_intents` set in config that cannot be overridden by automated cost optimization.
- A/B testing gate: Any routing change must go through a 10% A/B test with n >= 2,000 before full rollout.
- Auto-rollback: If any intent's satisfaction drops > 15% within 1 hour of a routing change, automatically revert.
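The auto-rollback rule above can be sketched as two small checks; the function and field names are hypothetical, not taken from the production codebase:

```python
# Sketch of the auto-rollback rule: revert a routing change if the affected
# intent's satisfaction drops more than 15% within 1 hour of the change.
ROLLBACK_DROP = 0.15     # relative satisfaction drop that triggers a revert
WATCH_WINDOW_S = 3600    # seconds after a routing change to keep watching

def should_rollback(baseline: float, current: float) -> bool:
    """True if satisfaction fell more than ROLLBACK_DROP relative to baseline."""
    if baseline <= 0:
        return False
    return (baseline - current) / baseline > ROLLBACK_DROP

def check_recent_change(change: dict, metrics: dict, now: float) -> bool:
    """Evaluate one routing change: revert only while inside the watch window
    and the monitored intent has degraded past the threshold."""
    in_window = now - change["applied_at"] <= WATCH_WINDOW_S
    return in_window and should_rollback(
        metrics["baseline_satisfaction"], metrics["current_satisfaction"]
    )
```

With the Scenario 1 numbers (94% baseline, 61% after the downgrade), the drop is ~35%, so the change would have been reverted automatically within the hour.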
Scenario 2: Sonnet Routing for Simple "Where's My Order" Queries
Problem Statement
Cost analysis reveals that 12% of Sonnet invocations are for order_status queries like "Where is my order #98765?" These should be handled by the Template tier (zero cost, pure DynamoDB lookup) but are being misclassified as complex queries due to a bug in the intent classifier.
Decision Tree
```mermaid
flowchart TD
    ALERT["ALERT: Sonnet cost 18% above<br/>projection for 3 consecutive days"] --> ANALYZE{"Analyze Sonnet<br/>invocations by intent"}
    ANALYZE --> FIND["FINDING: 12% of Sonnet calls<br/>are order_status queries<br/>($200/day wasted)"]
    FIND --> WHY{"Why are order_status<br/>queries hitting Sonnet?"}
    WHY -->|Intent classifier bug| BUG_FIX["FIX: Patch intent classifier<br/>order_status regex missed<br/>Japanese order number formats"]
    WHY -->|Complexity scorer too high| SCORER_FIX["FIX: Tune complexity scorer<br/>threshold for order_status"]
    WHY -->|Missing template pattern| PATTERN_FIX["FIX: Add missing patterns<br/>to template fast path"]
    BUG_FIX --> DEPLOY["Deploy fix to ECS"]
    SCORER_FIX --> DEPLOY
    PATTERN_FIX --> DEPLOY
    DEPLOY --> VERIFY{"Order_status queries<br/>now hitting Template?"}
    VERIFY -->|Yes| SAVINGS["RESOLVED<br/>Savings: ~$6,000/month"]
    VERIFY -->|Partially| ITERATE["Add more patterns<br/>Retrain classifier"]
    SAVINGS --> PREVENT["PREVENTION:<br/>1. Weekly intent-model audit<br/>2. Alert on expensive-model<br/> usage for template intents<br/>3. Cost anomaly detection"]
    style ALERT fill:#f39c12,color:#fff
    style FIND fill:#e74c3c,color:#fff
    style SAVINGS fill:#2ecc71,color:#fff
    style PREVENT fill:#3498db,color:#fff
```
Runbook
Detection
- CloudWatch alarm: `sonnet_cost_daily > projected_daily * 1.15` for 3 consecutive days.
- Investigation query:

  ```sql
  SELECT intent, COUNT(*) AS count, SUM(cost) AS total_cost
  FROM mangaassist.inference_log
  WHERE model = 'sonnet' AND date = CURRENT_DATE
  GROUP BY intent
  ORDER BY total_cost DESC;
  ```

- Red flag: `order_status` appearing in the Sonnet invocation log at all.
Root Cause Analysis
| Factor | Expected | Actual | Verdict |
|---|---|---|---|
| `order_status` model tier | Template | Sonnet | Misrouted |
| Intent classifier accuracy | > 95% | 88% for order_status | Bug |
| Missed pattern | N/A | Japanese order formats (注文番号12345) | Missing regex |
| Complexity score for "注文番号12345はどこ?" | < 0.2 | 0.65 | Inflated by Japanese chars |
Root cause: The complexity classifier's japanese_char_ratio feature adds +0.05 to the score for any query with Japanese characters. Since all order status queries from Japanese users contain Japanese, they get inflated complexity scores. Additionally, the template fast path was missing Japanese-language order patterns.
Resolution Steps
- Immediate: Add Japanese order status patterns to the template fast path:

  ```python
  # Add to ComplexityClassifier.TEMPLATE_PATTERNS
  re.compile(r"(?i)(注文|オーダー)\s*(番号|#|#)?\s*\d+"),
  re.compile(r"(?i)(配送|配達|届|delivery).*(状況|ステータス|status)"),
  ```

- Short-term: Adjust the complexity scorer to not penalize queries that match a known template intent:

  ```python
  def classify(self, query: str, detected_intent: str = None) -> float:
      # If the intent is already classified as template-eligible, cap the score
      if detected_intent in ("order_status", "escalation", "chitchat"):
          return min(self._raw_score(query), 0.15)
      return self._raw_score(query)
  ```

- Verify: Confirm zero Sonnet invocations for `order_status` in the next 24 hours.
Prevention
- Weekly audit: Automated report of model utilization by intent — flag any template-eligible intent appearing in Sonnet logs.
- Cost anomaly alert: CloudWatch anomaly detection on per-intent Sonnet spend.
- Test coverage: Add Japanese-language order queries to the classifier test suite.
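The test-coverage item above can start from a minimal regression check that reproduces the patterns from the resolution step; the helper and the query lists are illustrative:

```python
import re

# Regression check for the Japanese order-status fast-path patterns.
# Keep TEMPLATE_PATTERNS in sync with the ComplexityClassifier source.
TEMPLATE_PATTERNS = [
    re.compile(r"(?i)(注文|オーダー)\s*(番号|#|#)?\s*\d+"),
    re.compile(r"(?i)(配送|配達|届|delivery).*(状況|ステータス|status)"),
]

def matches_template(query: str) -> bool:
    """True if any template fast-path pattern matches the query."""
    return any(p.search(query) for p in TEMPLATE_PATTERNS)

# Queries that must take the zero-cost template path.
TEMPLATE_QUERIES = [
    "注文番号12345はどこ?",
    "オーダー #98765 の状況を教えて",
    "配送ステータスを確認したい",
]

# Queries that must NOT be short-circuited to the template path.
NON_TEMPLATE_QUERIES = [
    "ベルセルクみたいなマンガのおすすめは?",
]
```

Running these lists through the classifier in CI catches the Scenario 2 regression class: Japanese-language order queries silently escaping the template path.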
Scenario 3: Budget Overrun During Manga Sale Event
Problem Statement
During the annual "Manga Matsuri" sale event, traffic spikes to 3.2x normal (3.2M messages/day). The daily FM budget of $2,500 is exhausted by 3:00 PM JST. The Budget Guardian enters EMERGENCY mode, routing all queries to Haiku/Template. Recommendation quality plummets for the remaining 9 hours of peak sale traffic.
Decision Tree
```mermaid
flowchart TD
    ALERT["CRITICAL ALERT: Budget 95%<br/>consumed at 15:00 JST<br/>9 hours of peak traffic remaining"] --> ASSESS{"Is this a known<br/>traffic event?"}
    ASSESS -->|"Yes (Manga Matsuri)"| PLANNED["Should have been planned.<br/>Increase daily budget<br/>for event duration."]
    ASSESS -->|"No (unexpected spike)"| UNEXPECTED["Investigate traffic source:<br/>- Organic growth?<br/>- Bot traffic?<br/>- Marketing campaign?"]
    PLANNED --> INCREASE["IMMEDIATE: Increase daily<br/>budget from $2,500 to $6,000<br/>for event duration (3 days)"]
    INCREASE --> VERIFY_B{"Budget Guardian<br/>exits EMERGENCY?"}
    VERIFY_B -->|Yes| QUALITY_CHECK["Monitor recommendation<br/>quality recovery"]
    VERIFY_B -->|No| REDIS_FIX["Check Redis config —<br/>budget key may be cached"]
    QUALITY_CHECK --> RESOLVED["RESOLVED:<br/>Quality restored during sale"]
    UNEXPECTED --> BOT{"Is it bot<br/>traffic?"}
    BOT -->|Yes| BLOCK["Block bots at WAF layer<br/>Do NOT increase budget"]
    BOT -->|No| SCALE["Increase budget +<br/>enable auto-scale provisions"]
    RESOLVED --> PREVENT["PREVENTION:<br/>1. Pre-scale budget for known events<br/>2. Event calendar integration<br/>3. Gradual budget ramp (not cliff)<br/>4. Separate event budget pool"]
    style ALERT fill:#e74c3c,color:#fff
    style INCREASE fill:#2ecc71,color:#fff
    style RESOLVED fill:#2ecc71,color:#fff
    style BLOCK fill:#e74c3c,color:#fff
    style PREVENT fill:#3498db,color:#fff
```
Runbook
Detection
- PagerDuty alert: `budget_utilization_pct > 95%` triggers a page.
- Early warning: `budget_utilization_pct > 60% AND hour < 12` (budget draining faster than expected).
- Dashboard: Real-time budget burn rate vs projected end-of-day.
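The early-warning signal amounts to projecting the burn rate to end of day; a minimal sketch, assuming a 24-hour JST budget day and linear extrapolation (both assumptions for illustration):

```python
# Burn-rate projection behind the early-warning alert.
def projected_eod_spend(spent_so_far: float, hours_elapsed: float) -> float:
    """Linearly extrapolate spend so far to the end of the 24h budget day."""
    if hours_elapsed <= 0:
        return 0.0
    return spent_so_far * (24.0 / hours_elapsed)

def early_warning(spent: float, budget: float, hour: int) -> bool:
    """Page early when more than 60% of the budget is gone before noon JST."""
    return hour < 12 and spent / budget > 0.60

# Manga Matsuri numbers: $2,375 spent by 15:00 projects to $3,800 by midnight,
# well past the $2,500 daily budget.
```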
Root Cause Analysis
| Factor | Expected | Actual | Verdict |
|---|---|---|---|
| Daily traffic | 1M messages | 3.2M messages | 3.2x spike |
| Daily budget | $2,500 | $2,500 (unchanged) | Budget not pre-scaled |
| Budget exhaustion time | 23:59 JST | 15:00 JST | 9 hours of degraded service |
| Emergency mode duration | 0 hours | 9 hours | Unacceptable quality loss |
| Recommendation satisfaction (15:00-24:00) | 94% | 58% | Severe degradation |
Root cause: The operations team did not pre-scale the daily budget for the Manga Matsuri sale event. The Budget Guardian correctly activated emergency mode to protect spend, but the budget was too low for the traffic level, causing extended quality degradation during peak sale hours — the worst possible time for poor recommendations.
Resolution Steps
- Immediate (during incident):

  ```python
  # Increase daily budget via Redis override
  redis_client.set("budget:override:2026-03-31", "6000.00")
  redis_client.expire("budget:override:2026-03-31", 86400)
  # Reset today's budget mode tracking —
  # BudgetGuardian will re-evaluate on the next request
  ```

- Same day: Update the Budget Guardian to check for override values:

  ```python
  def _get_daily_budget(self) -> float:
      override = self.redis.get(f"budget:override:{time.strftime('%Y-%m-%d')}")
      if override:
          return float(override)
      return self.DAILY_BUDGET
  ```

- Post-event: Create event budget calendar integration.
Prevention
- Event calendar: Maintain a `budget_events` DynamoDB table with known traffic events:

  ```json
  {
      "event_id": "manga-matsuri-2026",
      "start_date": "2026-03-29",
      "end_date": "2026-04-01",
      "expected_traffic_multiplier": 3.5,
      "budget_multiplier": 3.0,
      "approved_by": "platform-lead"
  }
  ```

- Gradual budget ramp: Instead of a cliff at 95%, implement proportional quality reduction:

  ```python
  quality_factor = min(1.0, remaining_budget / projected_remaining_cost)
  ```

- Separate event budget pool: Allocate a separate pool for planned events so the normal-day budget is unaffected.
- Pre-event checklist:
  - [ ] Budget multiplier configured for event dates
  - [ ] Traffic projections reviewed
  - [ ] On-call staffing increased
  - [ ] Fallback templates updated for event context
  - [ ] CloudWatch alarms tuned for event thresholds
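The gradual-ramp idea from the prevention list can be sketched by scaling the Sonnet-eligible traffic share with the remaining-budget factor; the function names and base share are illustrative assumptions:

```python
# Proportional quality reduction: instead of a hard cutover to EMERGENCY at
# 95% utilization, shrink Sonnet-eligible traffic as headroom disappears.
def quality_factor(remaining_budget: float, projected_remaining_cost: float) -> float:
    """1.0 means full quality; approaches 0 as spend outpaces the budget."""
    if projected_remaining_cost <= 0:
        return 1.0
    return min(1.0, remaining_budget / projected_remaining_cost)

def sonnet_share(base_share: float, factor: float) -> float:
    """Scale the share of traffic eligible for Sonnet by the budget factor."""
    return base_share * factor

# Example: $500 left vs $1,000 projected cost gives factor 0.5, so a 15%
# baseline Sonnet share degrades gracefully to 7.5% instead of dropping to 0.
```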
Scenario 4: Quality Regression After Model Tier Rebalancing
Problem Statement
To reduce costs, the platform team changed the routing split from 15% Sonnet / 50% Haiku / 35% Template to 8% Sonnet / 52% Haiku / 40% Template. The change shipped globally at once (no A/B test). Within 24 hours, the overall CSAT score dropped from 4.2 to 3.6 (out of 5). The team cannot pinpoint which intent degraded because the change affected multiple intents simultaneously.
Decision Tree
```mermaid
flowchart TD
    ALERT["ALERT: CSAT dropped<br/>from 4.2 to 3.6 in 24 hours<br/>(post-routing change)"] --> CORRELATE{"Does timing correlate<br/>with routing change?"}
    CORRELATE -->|Yes| ROLLBACK_Q{"Can we rollback<br/>the routing change?"}
    CORRELATE -->|No| OTHER_CAUSES["Investigate:<br/>- Prompt template change?<br/>- Data pipeline issue?<br/>- Guardrails update?"]
    ROLLBACK_Q -->|Yes| ROLLBACK["IMMEDIATE: Rollback<br/>routing to previous config<br/>(15/50/35 split)"]
    ROLLBACK_Q -->|No, config lost| MANUAL["Manually reconstruct<br/>previous routing config"]
    ROLLBACK --> RECOVERING{"CSAT recovering<br/>within 4 hours?"}
    RECOVERING -->|Yes| ROOT_CAUSE["Confirmed: routing change<br/>caused quality regression"]
    RECOVERING -->|No| COMPOUND["Compound issue —<br/>investigate other changes"]
    ROOT_CAUSE --> ANALYZE["Analyze per-intent impact:<br/>Which intents degraded most<br/>when moved to cheaper tier?"]
    ANALYZE --> INTENT1["manga_qa: Sonnet→Haiku<br/>Quality: 9.1→6.5 (-28.6%)"]
    ANALYZE --> INTENT2["product_search (complex):<br/>Sonnet→Haiku<br/>Quality: 8.9→7.8 (-12.4%)"]
    ANALYZE --> INTENT3["shipping_info: unchanged<br/>Quality: stable"]
    INTENT1 & INTENT2 & INTENT3 --> PLAN["PLAN: Phased rebalancing<br/>with A/B tests per intent"]
    PLAN --> PREVENT["PREVENTION:<br/>1. Never change multiple intents at once<br/>2. Mandatory A/B test per intent<br/>3. Canary rollout (1%→10%→50%→100%)<br/>4. Automated rollback on CSAT drop"]
    style ALERT fill:#e74c3c,color:#fff
    style ROLLBACK fill:#f39c12,color:#fff
    style ROOT_CAUSE fill:#2ecc71,color:#fff
    style PREVENT fill:#3498db,color:#fff
```
Runbook
Detection
- CloudWatch alarm: `csat_score_24h < 3.8` (baseline 4.2, threshold: 10% drop).
- Supporting signals: Per-intent satisfaction rates, model-tier distribution shift, user complaint volume in support tickets.
Root Cause Analysis
| Intent | Before (Model / Quality) | After (Model / Quality) | Quality Delta | User Impact |
|---|---|---|---|---|
| `manga_qa` | Sonnet / 9.1 | Haiku / 6.5 | -28.6% | Very noticeable |
| `product_search` (complex) | Sonnet / 8.9 | Haiku / 7.8 | -12.4% | Noticeable |
| `recommendation` | Sonnet / 9.4 | Sonnet / 9.4 | 0% | None |
| `product_search` (simple) | Haiku / 8.1 | Haiku / 8.1 | 0% | None |
| `shipping_info` | Haiku / 8.8 | Haiku / 8.8 | 0% | None |
| `order_status` | Template / 9.5 | Template / 9.5 | 0% | None |
Root cause: The blanket reduction from 15% to 8% Sonnet moved manga_qa from Sonnet to Haiku. This single change accounts for ~70% of the CSAT drop. The lack of A/B testing meant the impact was not caught before full rollout.
Resolution Steps
- Immediate (< 10 min): Roll back the routing config to the 15/50/35 split.
- Verify (< 4 hours): Confirm the CSAT recovery trend.
- Plan (next sprint):
  - A/B test `manga_qa` on Haiku with 10% traffic.
  - A/B test `product_search` (complex subset) on Haiku with 10% traffic.
  - Only proceed with each change if the quality drop is < 10%.
Prevention
- Change management policy: Routing changes must follow the staged rollout process:

  Config change → A/B test (10%, 48 hours, n >= 2,000) → Canary (25%, 24 hours) → Full rollout

- Per-intent quality gates:

  ```python
  # Automated quality gate check before promoting a routing change
  def can_promote_routing_change(intent: str, ab_test_results: dict) -> bool:
      control = ab_test_results["control"]
      treatment = ab_test_results["treatment"]
      quality_drop = (control["quality"] - treatment["quality"]) / control["quality"]
      satisfaction_drop = (
          control["satisfaction"] - treatment["satisfaction"]
      ) / control["satisfaction"]
      return (
          quality_drop < 0.10                    # less than 10% quality drop
          and satisfaction_drop < 0.05           # less than 5% satisfaction drop
          and treatment["sample_count"] >= 2000  # enough samples
      )
  ```

- One-intent-at-a-time rule: Never change routing for more than one intent in a single deployment.
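The staged rollout policy can be encoded as data plus two small checks; a sketch, with helper names that are illustrative rather than from the production deployment tooling:

```python
# Stages mirror the rollout policy: A/B test (10%, 48h, n >= 2,000),
# then canary (25%, 24h), then full rollout.
STAGES = [
    {"name": "ab_test", "traffic_pct": 10, "min_hours": 48, "min_samples": 2000},
    {"name": "canary", "traffic_pct": 25, "min_hours": 24, "min_samples": 0},
    {"name": "full", "traffic_pct": 100, "min_hours": 0, "min_samples": 0},
]

def next_stage(current: str):
    """Return the stage after `current`, or None if already fully rolled out."""
    names = [s["name"] for s in STAGES]
    i = names.index(current)
    return STAGES[i + 1] if i + 1 < len(STAGES) else None

def can_advance(stage: dict, hours_elapsed: float, samples: int, gate_passed: bool) -> bool:
    """A stage may promote only after its dwell time, sample count, and
    quality gate (e.g. can_promote_routing_change) are all satisfied."""
    return (
        gate_passed
        and hours_elapsed >= stage["min_hours"]
        and samples >= stage["min_samples"]
    )
```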
Scenario 5: New Intent manga_review Defaults to Expensive Sonnet
Problem Statement
The product team launches a new "write a manga review" feature. The intent classifier correctly detects manga_review as a new intent, but the Model Router has no explicit mapping for it. The fallback logic defaults unknown intents to Sonnet (the safest but most expensive option). Within a week, manga_review becomes 8% of traffic, adding $1,200/day in unnecessary Sonnet spend. Analysis shows Haiku handles review generation at 85% quality — more than adequate for user-generated content.
Decision Tree
```mermaid
flowchart TD
    ALERT["ALERT: New intent 'manga_review'<br/>detected in Sonnet invocation logs<br/>8% of traffic, $1,200/day"] --> CHECK{"Is manga_review<br/>in the routing map?"}
    CHECK -->|No — using default| DEFAULT_Q{"What is the default<br/>model for unmapped intents?"}
    CHECK -->|Yes| VERIFY_MAP["Verify correct model<br/>is assigned"]
    DEFAULT_Q -->|Sonnet (current)| PROBLEM["PROBLEM: Unmapped intents<br/>default to most expensive model"]
    PROBLEM --> EVALUATE{"What model does<br/>manga_review need?"}
    EVALUATE --> TEST["Run quick evaluation:<br/>- 100 sample reviews on Sonnet<br/>- 100 sample reviews on Haiku<br/>- LLM-as-judge scoring"]
    TEST --> RESULT["Results:<br/>Sonnet: 9.0 quality, $0.0111/req<br/>Haiku: 8.5 quality, $0.0003/req<br/>Quality gap: 5.6% (acceptable)"]
    RESULT --> ASSIGN["ASSIGN: manga_review → Haiku<br/>Update routing config"]
    ASSIGN --> VERIFY{"Cost reduced?<br/>Quality acceptable?"}
    VERIFY -->|Yes| RESOLVED["RESOLVED<br/>Savings: $1,176/day ($35,280/month)"]
    RESOLVED --> PREVENT["PREVENTION:<br/>1. Change default for unknown intents<br/> from Sonnet to Haiku<br/>2. Alert on any new unmapped intent<br/>3. Require model assignment in<br/> feature launch checklist<br/>4. Weekly unmapped intent audit"]
    style ALERT fill:#f39c12,color:#fff
    style PROBLEM fill:#e74c3c,color:#fff
    style RESULT fill:#3498db,color:#fff
    style RESOLVED fill:#2ecc71,color:#fff
    style PREVENT fill:#3498db,color:#fff
```
Runbook
Detection
- Weekly audit report: Automated check for intents appearing in inference logs that are not in the routing map.

  ```python
  # Audit query: find unmapped intents
  unmapped = set(inference_log_intents) - set(INTENT_MODEL_MAP.keys())
  if unmapped:
      alert(f"Unmapped intents detected: {unmapped}")
  ```

- Cost anomaly: Unexpected Sonnet spend increase not correlated with traffic growth.
Root Cause Analysis
| Factor | Expected | Actual | Verdict |
|---|---|---|---|
| `manga_review` in routing map | Yes | No | Missing mapping |
| Default for unmapped intents | Haiku (safe default) | Sonnet | Overly conservative default |
| Feature launch checklist | Includes model assignment | Skipped | Process gap |
| Weekly intent audit | Running | Not configured | Missing automation |
Root cause: Two process failures compounded: (1) the feature launch checklist does not require a model assignment for new intents, and (2) the default model for unmapped intents is Sonnet instead of Haiku.
Resolution Steps
- Immediate: Add `manga_review` to the routing config:

  ```python
  INTENT_MODEL_MAP[Intent.MANGA_REVIEW] = ModelTier.HAIKU
  ```

- Same day: Change the default model for unmapped intents from Sonnet to Haiku:

  ```python
  # In ModelRouter.route()
  default_tier = INTENT_MODEL_MAP.get(intent, ModelTier.HAIKU)  # changed from SONNET
  ```

- This sprint: Add weekly unmapped-intent audit automation.
Prevention
- Safe default: Change the system-wide default from Sonnet to Haiku for unmapped intents. Rationale: Haiku provides acceptable quality (7.4+) for most intents, and the 24.5x cost savings far outweigh occasional slight quality dips while the team evaluates the correct model tier.
- Feature launch checklist update:

  ```markdown
  ## New Feature Launch — Model Routing Checklist
  - [ ] New intent name defined and added to Intent enum
  - [ ] Complexity evaluation performed (100 sample queries scored)
  - [ ] Model tier assigned based on evaluation
  - [ ] Routing config updated in DynamoDB
  - [ ] A/B test configured for first 2 weeks
  - [ ] Cost projection added to monthly forecast
  - [ ] On-call team notified of new intent
  ```

- Automated guardrail:

  ```python
  # Run daily: detect and alert on unmapped intents hitting Sonnet
  def audit_unmapped_intents(inference_logs: list, routing_map: dict) -> list:
      unmapped = []
      for log in inference_logs:
          if log["intent"] not in routing_map and log["model"] == "sonnet":
              unmapped.append({
                  "intent": log["intent"],
                  "count": log["count"],
                  "daily_cost": log["total_cost"],
                  "first_seen": log["first_seen"],
              })
      return sorted(unmapped, key=lambda x: x["daily_cost"], reverse=True)
  ```
Cross-Scenario Summary
Common Patterns Across All 5 Scenarios
| Pattern | Scenarios | Mitigation |
|---|---|---|
| Missing or incorrect model assignment | 1, 2, 5 | Intent-to-model map with locked tiers + default to Haiku |
| No A/B testing before routing changes | 1, 4 | Mandatory A/B test gate (10%, n >= 2,000) |
| Budget not scaled for known events | 3 | Event calendar with auto-budget adjustment |
| No automated detection of misrouting | 2, 5 | Weekly intent-model audit + cost anomaly alerts |
| Rollout too fast (no canary) | 4 | Staged rollout: 1% → 10% → 50% → 100% |
Prevention Priority Matrix
```mermaid
quadrantChart
    title Prevention Priority — Impact vs Effort
    x-axis Low Effort --> High Effort
    y-axis Low Impact --> High Impact
    quadrant-1 Do First
    quadrant-2 Plan Carefully
    quadrant-3 Quick Wins
    quadrant-4 Deprioritize
    "Change default to Haiku": [0.2, 0.7]
    "Weekly intent audit": [0.3, 0.6]
    "Event budget calendar": [0.5, 0.9]
    "A/B test gate": [0.6, 0.85]
    "Canary rollout system": [0.7, 0.8]
    "Per-intent quality gates": [0.8, 0.75]
    "Feature launch checklist": [0.15, 0.5]
    "Auto-rollback system": [0.9, 0.9]
```
Monitoring Checklist
| Metric | Alarm Threshold | Scenario It Catches |
|---|---|---|
| `intent_satisfaction_rate` | < 80% for any intent | Scenarios 1, 4 |
| `sonnet_daily_cost` | > 115% of projection | Scenarios 2, 5 |
| `budget_utilization_pct` | > 60% before noon JST | Scenario 3 |
| `csat_score_24h` | < 3.8 (baseline 4.2) | Scenario 4 |
| `unmapped_intent_count` | > 0 | Scenario 5 |
| `model_tier_distribution` | Sonnet > 20% of traffic | Scenario 2 |
| `budget_mode` | != "normal" for > 2 hours | Scenario 3 |
Incident Response Quick Reference
```mermaid
flowchart LR
    INCIDENT[Model Routing<br/>Incident] --> TYPE{Incident Type?}
    TYPE -->|Quality Drop| QD["1. Check which intent dropped<br/>2. Check model tier for that intent<br/>3. Revert if tier changed recently<br/>4. A/B test before re-applying"]
    TYPE -->|Cost Spike| CS["1. Check per-intent Sonnet usage<br/>2. Look for unmapped intents<br/>3. Look for misclassified queries<br/>4. Check if traffic event is known"]
    TYPE -->|Budget Exhaustion| BE["1. Is it a planned event?<br/> → Increase budget<br/>2. Is it bot traffic?<br/> → Block at WAF<br/>3. Is it organic growth?<br/> → Adjust daily budget"]
    style QD fill:#e74c3c,color:#fff
    style CS fill:#f39c12,color:#fff
    style BE fill:#e74c3c,color:#fff
```
Previous: 02-inference-cost-optimization.md -- Budget guardian, A/B testing, fallback chains, cost projections.
Back to: 01-model-selection-framework.md -- Model selection architecture, complexity classifier, routing maps.