
Cost-Effective Model Selection — Scenarios and Runbooks

MangaAssist Context: JP Manga store chatbot running on AWS. Bedrock Claude 3 Sonnet ($3/$15 per 1M input/output tokens) handles complex queries; Haiku ($0.25/$1.25 per 1M input/output tokens) handles simple ones. 1M messages/day across product search, order status, manga recommendations, and Q&A. Infrastructure: OpenSearch Serverless, DynamoDB, ECS Fargate, API Gateway WebSocket, ElastiCache Redis.
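As a rough illustration of what the per-token prices above mean per request, the sketch below computes the cost of a single call. The `request_cost` helper and the token counts (1,200 input, 500 output) are illustrative assumptions, not measured MangaAssist values.

```python
# Prices in USD per 1M tokens as (input, output), taken from the context above.
PRICES_PER_MILLION = {
    "sonnet": (3.00, 15.00),
    "haiku": (0.25, 1.25),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request for the given token counts."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical request size: 1,200 input tokens and 500 output tokens.
sonnet = request_cost("sonnet", 1200, 500)
haiku = request_cost("haiku", 1200, 500)
print(f"Sonnet: ${sonnet:.4f}  Haiku: ${haiku:.6f}  ratio: {sonnet / haiku:.1f}x")
```

Because the input and output prices differ by the same factor, Haiku comes out 12x cheaper than Sonnet for any request with identical token counts.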


Skill Mapping

| AWS AIP-C01 Domain | Task | Skill | This File Covers |
| --- | --- | --- | --- |
| Domain 4 Operational Efficiency | Task 4.1 Cost Optimization | 4.1.2 Cost-Effective Model Selection | 5 production scenarios: wrong model, wasteful routing, budget overrun, quality regression, unmapped intent |

Scenario Overview

| # | Scenario | Core Problem | Severity |
| --- | --- | --- | --- |
| 1 | Haiku producing poor manga recommendations | Wrong model for complex task | HIGH |
| 2 | Sonnet routing for simple "where's my order" queries | Wasteful over-routing | MEDIUM |
| 3 | Budget overrun during manga sale event | Traffic spike exhausts daily LLM budget | CRITICAL |
| 4 | Quality regression after model tier rebalancing | Routing change degraded user experience | HIGH |
| 5 | New intent manga_review defaults to expensive Sonnet | Unmapped intent hits most expensive model | MEDIUM |

Scenario 1: Haiku Producing Poor Manga Recommendations

Problem Statement

After a cost optimization initiative, the recommendation intent was experimentally routed to Haiku to reduce spend. Users began reporting irrelevant recommendations — "I asked for manga like Berserk and got generic shounen suggestions." Satisfaction for recommendations dropped from 94% to 61%.

Decision Tree

flowchart TD
    ALERT["ALERT: Recommendation<br/>satisfaction dropped to 61%<br/>(threshold: 80%)"] --> CHECK_MODEL{"Which model is<br/>serving recommendations?"}

    CHECK_MODEL -->|Haiku| ROOT["ROOT CAUSE:<br/>Haiku lacks reasoning depth<br/>for nuanced recommendations"]
    CHECK_MODEL -->|Sonnet| OTHER["Investigate other causes:<br/>prompt drift, data quality, guardrails"]

    ROOT --> FIX1["IMMEDIATE: Revert<br/>recommendation → Sonnet"]
    FIX1 --> VERIFY{"Satisfaction<br/>recovering?"}

    VERIFY -->|Yes, back to ~90%| RESOLVED["RESOLVED<br/>Document: recommendation<br/>requires Sonnet-tier reasoning"]
    VERIFY -->|No| INVESTIGATE["Investigate:<br/>- Prompt template changed?<br/>- OpenSearch index stale?<br/>- Guardrails blocking content?"]

    INVESTIGATE --> FIX2["Fix secondary issue<br/>+ keep Sonnet routing"]

    ROOT --> PREVENT["PREVENTION:<br/>1. Add recommendation to<br/>   'Sonnet-required' locked list<br/>2. A/B test before downgrade<br/>3. Quality gate: auto-rollback<br/>   if satisfaction < 80%"]

    style ALERT fill:#e74c3c,color:#fff
    style ROOT fill:#f39c12,color:#fff
    style FIX1 fill:#2ecc71,color:#fff
    style RESOLVED fill:#2ecc71,color:#fff
    style PREVENT fill:#3498db,color:#fff

Runbook

Detection

  • CloudWatch alarm: recommendation_satisfaction_rate < 0.80 for 15 minutes.
  • Metric source: User thumbs-up/down feedback aggregated per intent per model.
  • Dashboard query:
    SELECT model_tier, AVG(satisfaction) AS avg_satisfaction
    FROM mangaassist.user_feedback
    WHERE intent = 'recommendation'
    AND timestamp > NOW() - INTERVAL 1 HOUR
    GROUP BY model_tier
    

Root Cause Analysis

| Factor | Expected | Actual | Verdict |
| --- | --- | --- | --- |
| Model tier for recommendation | Sonnet | Haiku | Incorrect routing |
| Haiku quality score for recommendations | N/A (untested) | 6.2/10 | Below threshold |
| Prompt template | v3 (unchanged) | v3 | Not the cause |
| OpenSearch index freshness | < 1 hour | 45 min | Not the cause |

Root cause: Haiku cannot perform the multi-step reasoning required for personalized manga recommendations. Queries like "manga like Berserk but less dark and more hopeful" require understanding thematic nuance, tone comparison, and reader preference modeling. On these capabilities Sonnet scores 9.4/10 and Haiku scores 6.2/10.

Resolution Steps

  1. Immediate (< 5 min): Update routing config in DynamoDB:
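    # Revert recommendation intent to Sonnet
    from datetime import datetime, timezone

    import boto3

    dynamodb = boto3.client("dynamodb")
    dynamodb.update_item(
        TableName="mangaassist-routing-config",
        Key={"intent": {"S": "recommendation"}},
        UpdateExpression="SET model_tier = :tier, updated_by = :by, updated_at = :ts",
        ExpressionAttributeValues={
            ":tier": {"S": "sonnet"},
            ":by": {"S": "oncall-engineer"},
            # Record the actual time of the change (UTC, ISO 8601)
            ":ts": {"S": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")},
        },
    )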
    
  2. Verify (< 30 min): Monitor recommendation_satisfaction_rate — should recover to > 85% within 30 minutes.
  3. Post-mortem: Document that recommendation intent is in the "Sonnet-locked" category.

Prevention

  1. Lock list: Maintain a sonnet_required_intents set in config that cannot be overridden by automated cost optimization.
  2. A/B testing gate: Any routing change must go through a 10% A/B test with n >= 2,000 before full rollout.
  3. Auto-rollback: If any intent's satisfaction drops > 15% within 1 hour of a routing change, automatically revert.
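The lock list and auto-rollback rules above could be enforced with checks along these lines. This is a minimal sketch: the function names are hypothetical, and the "drops > 15%" rule is read here as a relative drop against the pre-change baseline, which is an assumption.

```python
# Hypothetical lock list: intents that automated cost optimization must not downgrade.
SONNET_REQUIRED_INTENTS = frozenset({"recommendation"})


def is_change_allowed(intent: str, new_tier: str, automated: bool) -> bool:
    """Reject automated downgrades for intents on the lock list."""
    if automated and intent in SONNET_REQUIRED_INTENTS and new_tier != "sonnet":
        return False
    return True


def should_auto_rollback(
    baseline: float, current: float, max_relative_drop: float = 0.15
) -> bool:
    """True when satisfaction fell by more than max_relative_drop vs. the baseline."""
    if baseline <= 0:
        return False  # No meaningful baseline to compare against
    return (baseline - current) / baseline > max_relative_drop


# Scenario 1 numbers: satisfaction fell from 94% to 61% after the routing change.
print(is_change_allowed("recommendation", "haiku", automated=True))
print(should_auto_rollback(baseline=0.94, current=0.61))
```

With the Scenario 1 figures, the downgrade would be blocked up front, and the rollback check would also fire because the relative drop is about 35%.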

Scenario 2: Sonnet Routing for Simple "Where's My Order" Queries

Problem Statement

Cost analysis reveals that 12% of Sonnet invocations are for order_status queries like "Where is my order #98765?" These should be handled by the Template tier (no LLM call, just a DynamoDB lookup) but are being misclassified as complex queries due to a bug in the intent classifier.

Decision Tree

flowchart TD
    ALERT["ALERT: Sonnet cost 18% above<br/>projection for 3 consecutive days"] --> ANALYZE{"Analyze Sonnet<br/>invocations by intent"}

    ANALYZE --> FIND["FINDING: 12% of Sonnet calls<br/>are order_status queries<br/>($200/day wasted)"]

    FIND --> WHY{"Why are order_status<br/>queries hitting Sonnet?"}

    WHY -->|Intent classifier bug| BUG_FIX["FIX: Patch intent classifier<br/>order_status regex missed<br/>Japanese order number formats"]
    WHY -->|Complexity scorer too high| SCORER_FIX["FIX: Tune complexity scorer<br/>threshold for order_status"]
    WHY -->|Missing template pattern| PATTERN_FIX["FIX: Add missing patterns<br/>to template fast path"]

    BUG_FIX --> DEPLOY["Deploy fix to ECS"]
    SCORER_FIX --> DEPLOY
    PATTERN_FIX --> DEPLOY

    DEPLOY --> VERIFY{"Order_status queries<br/>now hitting Template?"}

    VERIFY -->|Yes| SAVINGS["RESOLVED<br/>Savings: ~$6,000/month"]
    VERIFY -->|Partially| ITERATE["Add more patterns<br/>Retrain classifier"]

    SAVINGS --> PREVENT["PREVENTION:<br/>1. Weekly intent-model audit<br/>2. Alert on expensive-model<br/>   usage for template intents<br/>3. Cost anomaly detection"]

    style ALERT fill:#f39c12,color:#fff
    style FIND fill:#e74c3c,color:#fff
    style SAVINGS fill:#2ecc71,color:#fff
    style PREVENT fill:#3498db,color:#fff

Runbook

Detection

  • CloudWatch alarm: sonnet_cost_daily > projected_daily * 1.15 for 3 consecutive days.
  • Investigation query:
    SELECT intent, COUNT(*) as count, SUM(cost) as total_cost
    FROM mangaassist.inference_log
    WHERE model = 'sonnet'
    AND date = CURRENT_DATE
    GROUP BY intent
    ORDER BY total_cost DESC
    
  • Red flag: order_status appearing in the Sonnet invocation log at all.

Root Cause Analysis

| Factor | Expected | Actual | Verdict |
| --- | --- | --- | --- |
| order_status model tier | Template | Sonnet | Misrouted |
| Intent classifier accuracy | > 95% | 88% for order_status | Bug |
| Missed pattern | N/A | Japanese order formats (注文番号12345) | Missing regex |
| Complexity score for "注文番号12345はどこ?" | < 0.2 | 0.65 | Inflated by Japanese chars |

Root cause: The complexity classifier's japanese_char_ratio feature adds +0.05 to the score for any query with Japanese characters. Since all order status queries from Japanese users contain Japanese, they get inflated complexity scores. Additionally, the template fast path was missing Japanese-language order patterns.

Resolution Steps

  1. Immediate: Add Japanese order status patterns to the template fast path:
    # Add to ComplexityClassifier.TEMPLATE_PATTERNS
    re.compile(r"(?i)(注文|オーダー)\s*(番号|#|＃)?\s*\d+"),
    re.compile(r"(?i)(配送|配達|届|delivery).*(状況|ステータス|status)"),
    
  2. Short-term: Adjust complexity scorer to not penalize queries that match a known template intent:
    def classify(self, query: str, detected_intent: str = None) -> float:
        # If intent is already classified as template-eligible, cap score
        if detected_intent in ("order_status", "escalation", "chitchat"):
            return min(self._raw_score(query), 0.15)
        return self._raw_score(query)
    
  3. Verify: Confirm zero Sonnet invocations for order_status in the next 24 hours.
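Before deploying, the new patterns from step 1 can be checked against sample queries. The sketch below is an assumption-laden regression check: `matches_template_fast_path` is a hypothetical helper, the sample queries are made up, and the full-width `＃` alternative is assumed to be intended alongside the ASCII `#`.

```python
import re

# Japanese order-status patterns from step 1 (full-width "＃" is an assumption).
TEMPLATE_PATTERNS = [
    re.compile(r"(?i)(注文|オーダー)\s*(番号|#|＃)?\s*\d+"),
    re.compile(r"(?i)(配送|配達|届|delivery).*(状況|ステータス|status)"),
]


def matches_template_fast_path(query: str) -> bool:
    """Return True if any template pattern is found anywhere in the query."""
    return any(pattern.search(query) for pattern in TEMPLATE_PATTERNS)


# Made-up sample queries: the first three are order-status, the last is not.
samples = [
    "注文番号12345はどこ?",
    "オーダー #98765 の状況",
    "配送状況を教えてください",
    "Berserkみたいな漫画を教えて",
]
for query in samples:
    print(query, matches_template_fast_path(query))
```

Queries of this kind are also natural candidates for the classifier test suite.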

Prevention

  • Weekly audit: Automated report of model utilization by intent — flag any template-eligible intent appearing in Sonnet logs.
  • Cost anomaly alert: CloudWatch anomaly detection on per-intent Sonnet spend.
  • Test coverage: Add Japanese-language order queries to the classifier test suite.
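The weekly audit could be sketched as a simple pass over inference log rows. The row shape (one dict per invocation with `intent` and `model` keys) and the `find_misrouted` name are assumptions; the template-eligible intents are the ones named in the complexity scorer fix above.

```python
from collections import Counter

# Intents that should never reach an LLM tier, per the scorer fix above.
TEMPLATE_ELIGIBLE_INTENTS = {"order_status", "escalation", "chitchat"}


def find_misrouted(inference_rows: list) -> dict:
    """Count Sonnet invocations per template-eligible intent."""
    counts = Counter(
        row["intent"]
        for row in inference_rows
        if row["model"] == "sonnet" and row["intent"] in TEMPLATE_ELIGIBLE_INTENTS
    )
    return dict(counts)


# Hypothetical log sample
rows = [
    {"intent": "order_status", "model": "sonnet"},
    {"intent": "order_status", "model": "template"},
    {"intent": "recommendation", "model": "sonnet"},
    {"intent": "order_status", "model": "sonnet"},
]
print(find_misrouted(rows))
```

Any non-empty result is the red flag described under Detection.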

Scenario 3: Budget Overrun During Manga Sale Event

Problem Statement

During the annual "Manga Matsuri" sale event, traffic spikes to 3.2x normal (3.2M messages/day). The daily FM budget of $2,500 is exhausted by 3:00 PM JST. The Budget Guardian enters EMERGENCY mode, routing all queries to Haiku/Template. Recommendation quality plummets for the remaining 9 hours of peak sale traffic.

Decision Tree

flowchart TD
    ALERT["CRITICAL ALERT: Budget 95%<br/>consumed at 15:00 JST<br/>9 hours of peak traffic remaining"] --> ASSESS{"Is this a known<br/>traffic event?"}

    ASSESS -->|"Yes (Manga Matsuri)"| PLANNED["Should have been planned.<br/>Increase daily budget<br/>for event duration."]
    ASSESS -->|"No (unexpected spike)"| UNEXPECTED["Investigate traffic source:<br/>- Organic growth?<br/>- Bot traffic?<br/>- Marketing campaign?"]

    PLANNED --> INCREASE["IMMEDIATE: Increase daily<br/>budget from $2,500 to $6,000<br/>for event duration (3 days)"]

    INCREASE --> VERIFY_B{"Budget Guardian<br/>exits EMERGENCY?"}
    VERIFY_B -->|Yes| QUALITY_CHECK["Monitor recommendation<br/>quality recovery"]
    VERIFY_B -->|No| REDIS_FIX["Check Redis config —<br/>budget key may be cached"]

    QUALITY_CHECK --> RESOLVED["RESOLVED:<br/>Quality restored during sale"]

    UNEXPECTED --> BOT{"Is it bot<br/>traffic?"}
    BOT -->|Yes| BLOCK["Block bots at WAF layer<br/>Do NOT increase budget"]
    BOT -->|No| SCALE["Increase budget +<br/>enable auto-scale provisions"]

    RESOLVED --> PREVENT["PREVENTION:<br/>1. Pre-scale budget for known events<br/>2. Event calendar integration<br/>3. Gradual budget ramp (not cliff)<br/>4. Separate event budget pool"]

    style ALERT fill:#e74c3c,color:#fff
    style INCREASE fill:#2ecc71,color:#fff
    style RESOLVED fill:#2ecc71,color:#fff
    style BLOCK fill:#e74c3c,color:#fff
    style PREVENT fill:#3498db,color:#fff

Runbook

Detection

  • PagerDuty alert: budget_utilization_pct > 95% triggers page.
  • Early warning: budget_utilization_pct > 60% AND hour < 12 (budget draining faster than expected).
  • Dashboard: Real-time budget burn rate vs projected end-of-day.
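The burn-rate comparison on the dashboard can be approximated with a linear projection. This is a sketch only: it assumes spend accrues evenly across the day, which ignores intraday traffic shape, and both function names are hypothetical.

```python
def projected_end_of_day_spend(spent_so_far: float, hours_elapsed: float) -> float:
    """Linearly extrapolate today's spend to a full 24-hour day."""
    if hours_elapsed <= 0:
        raise ValueError("hours_elapsed must be positive")
    return spent_so_far * 24.0 / hours_elapsed


def early_warning(utilization_pct: float, hour_jst: int) -> bool:
    """Early-warning rule from above: more than 60% used before noon JST."""
    return utilization_pct > 60 and hour_jst < 12


# Scenario 3: 95% of the $2,500 budget is consumed by 15:00 JST.
spent = 0.95 * 2500
print(f"Projected end-of-day spend: ${projected_end_of_day_spend(spent, 15):,.0f}")
```

Under this linear assumption the incident-day projection lands well above the $2,500 budget.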

Root Cause Analysis

| Factor | Expected | Actual | Verdict |
| --- | --- | --- | --- |
| Daily traffic | 1M messages | 3.2M messages | 3.2x spike |
| Daily budget | $2,500 | $2,500 (unchanged) | Budget not pre-scaled |
| Budget exhaustion time | 23:59 JST | 15:00 JST | 9 hours of degraded service |
| Emergency mode duration | 0 hours | 9 hours | Unacceptable quality loss |
| Recommendation satisfaction (15:00-24:00) | 94% | 58% | Severe degradation |

Root cause: The operations team did not pre-scale the daily budget for the Manga Matsuri sale event. The Budget Guardian correctly activated emergency mode to protect spend, but the budget was too low for the traffic level, causing extended quality degradation during peak sale hours — the worst possible time for poor recommendations.

Resolution Steps

  1. Immediate (during incident):

    # Increase daily budget via Redis override.
    # `ex` sets the 24-hour TTL atomically with the write.
    redis_client.set("budget:override:2026-03-31", "6000.00", ex=86400)
    
    # Reset today's budget mode tracking
    # BudgetGuardian will re-evaluate on next request
    

  2. Same day: Update Budget Guardian to check for override values:

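    from datetime import datetime
    from zoneinfo import ZoneInfo

    def _get_daily_budget(self) -> float:
        # Key by the JST date (budget days are tracked in JST), not server-local time
        today = datetime.now(ZoneInfo("Asia/Tokyo")).strftime("%Y-%m-%d")
        override = self.redis.get(f"budget:override:{today}")
        if override:
            return float(override)
        return self.DAILY_BUDGET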
    

  3. Post-event: Create event budget calendar integration.

Prevention

  1. Event calendar: Maintain a budget_events DynamoDB table with known traffic events:

    {
      "event_id": "manga-matsuri-2026",
      "start_date": "2026-03-29",
      "end_date": "2026-04-01",
      "expected_traffic_multiplier": 3.5,
      "budget_multiplier": 3.0,
      "approved_by": "platform-lead"
    }
    

  2. Gradual budget ramp: Instead of a cliff at 95%, implement proportional quality reduction:

    quality_factor = min(1.0, remaining_budget / projected_remaining_cost)
    

  3. Separate event budget pool: Allocate a separate pool for planned events so normal-day budget is unaffected.

  4. Pre-event checklist:
    - [ ] Budget multiplier configured for event dates
    - [ ] Traffic projections reviewed
    - [ ] On-call staffing increased
    - [ ] Fallback templates updated for event context
    - [ ] CloudWatch alarms tuned for event thresholds
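The event calendar and the gradual ramp from the list above could fit together roughly as follows. This is a sketch under stated assumptions: `effective_daily_budget` and `quality_factor` are hypothetical helpers, the event's `end_date` is treated as inclusive, and the ramp is clamped to [0, 1] with a guard against a zero projection.

```python
from datetime import date

BASE_DAILY_BUDGET = 2500.0


def effective_daily_budget(
    day: date, events: list[dict], base: float = BASE_DAILY_BUDGET
) -> float:
    """Scale the base budget if the day falls inside a known event window."""
    for event in events:
        start = date.fromisoformat(event["start_date"])
        end = date.fromisoformat(event["end_date"])
        if start <= day <= end:  # end_date assumed inclusive
            return base * event["budget_multiplier"]
    return base


def quality_factor(remaining_budget: float, projected_remaining_cost: float) -> float:
    """Proportional quality reduction instead of a cliff, clamped to [0, 1]."""
    if projected_remaining_cost <= 0:
        return 1.0  # Nothing left to spend on, so no reduction is needed
    return max(0.0, min(1.0, remaining_budget / projected_remaining_cost))


events = [{
    "event_id": "manga-matsuri-2026",
    "start_date": "2026-03-29",
    "end_date": "2026-04-01",
    "budget_multiplier": 3.0,
}]
print(effective_daily_budget(date(2026, 3, 31), events))
print(quality_factor(remaining_budget=500.0, projected_remaining_cost=1000.0))
```

A factor of 0.5 could then be read, for example, as routing only half of the Sonnet-eligible traffic to Sonnet; how the factor maps to routing is left open here.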


Scenario 4: Quality Regression After Model Tier Rebalancing

Problem Statement

To reduce costs, the platform team changed the routing split from 15% Sonnet / 50% Haiku / 35% Template to 8% Sonnet / 52% Haiku / 40% Template. The change shipped globally at once (no A/B test). Within 24 hours, the overall CSAT score dropped from 4.2 to 3.6 (out of 5). The team cannot pinpoint which intent degraded because the change affected multiple intents simultaneously.

Decision Tree

flowchart TD
    ALERT["ALERT: CSAT dropped<br/>from 4.2 to 3.6 in 24 hours<br/>(post-routing change)"] --> CORRELATE{"Does timing correlate<br/>with routing change?"}

    CORRELATE -->|Yes| ROLLBACK_Q{"Can we rollback<br/>the routing change?"}
    CORRELATE -->|No| OTHER_CAUSES["Investigate:<br/>- Prompt template change?<br/>- Data pipeline issue?<br/>- Guardrails update?"]

    ROLLBACK_Q -->|Yes| ROLLBACK["IMMEDIATE: Rollback<br/>routing to previous config<br/>(15/50/35 split)"]
    ROLLBACK_Q -->|No, config lost| MANUAL["Manually reconstruct<br/>previous routing config"]

    ROLLBACK --> RECOVERING{"CSAT recovering<br/>within 4 hours?"}

    RECOVERING -->|Yes| ROOT_CAUSE["Confirmed: routing change<br/>caused quality regression"]
    RECOVERING -->|No| COMPOUND["Compound issue —<br/>investigate other changes"]

    ROOT_CAUSE --> ANALYZE["Analyze per-intent impact:<br/>Which intents degraded most<br/>when moved to cheaper tier?"]

    ANALYZE --> INTENT1["manga_qa: Sonnet→Haiku<br/>Quality: 9.1→6.5 (-28.6%)"]
    ANALYZE --> INTENT2["product_search (complex):<br/>Sonnet→Haiku<br/>Quality: 8.9→7.8 (-12.4%)"]
    ANALYZE --> INTENT3["shipping_info: unchanged<br/>Quality: stable"]

    INTENT1 & INTENT2 & INTENT3 --> PLAN["PLAN: Phased rebalancing<br/>with A/B tests per intent"]

    PLAN --> PREVENT["PREVENTION:<br/>1. Never change multiple intents at once<br/>2. Mandatory A/B test per intent<br/>3. Canary rollout (1%→10%→50%→100%)<br/>4. Automated rollback on CSAT drop"]

    style ALERT fill:#e74c3c,color:#fff
    style ROLLBACK fill:#f39c12,color:#fff
    style ROOT_CAUSE fill:#2ecc71,color:#fff
    style PREVENT fill:#3498db,color:#fff

Runbook

Detection

  • CloudWatch alarm: csat_score_24h < 3.8 (baseline 4.2, threshold 10% drop).
  • Supporting signals: Per-intent satisfaction rates, model-tier distribution shift, user complaint volume in support tickets.

Root Cause Analysis

| Intent | Before (Model/Quality) | After (Model/Quality) | Quality Delta | User Impact |
| --- | --- | --- | --- | --- |
| manga_qa | Sonnet / 9.1 | Haiku / 6.5 | -28.6% | Very noticeable |
| product_search (complex) | Sonnet / 8.9 | Haiku / 7.8 | -12.4% | Noticeable |
| recommendation | Sonnet / 9.4 | Sonnet / 9.4 | 0% | None |
| product_search (simple) | Haiku / 8.1 | Haiku / 8.1 | 0% | None |
| shipping_info | Haiku / 8.8 | Haiku / 8.8 | 0% | None |
| order_status | Template / 9.5 | Template / 9.5 | 0% | None |

Root cause: The blanket reduction from 15% to 8% Sonnet moved manga_qa and the complex subset of product_search from Sonnet to Haiku. The manga_qa move alone accounts for ~70% of the CSAT drop. Because the change shipped without an A/B test, the impact was not caught before full rollout.
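The quality deltas in the table are relative changes against the pre-change score. A small sketch of that arithmetic, with `quality_delta_pct` as a hypothetical helper name:

```python
def quality_delta_pct(before: float, after: float) -> float:
    """Relative change in quality score, in percent (negative means worse)."""
    return (after - before) / before * 100


# Before/after quality scores from the root cause table
scores = {
    "manga_qa": (9.1, 6.5),
    "product_search (complex)": (8.9, 7.8),
    "recommendation": (9.4, 9.4),
}
for intent, (before, after) in scores.items():
    print(f"{intent}: {quality_delta_pct(before, after):.1f}%")
```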

Resolution Steps

  1. Immediate (< 10 min): Rollback routing config to 15/50/35 split.
  2. Verify (< 4 hours): Confirm CSAT recovery trend.
  3. Plan (next sprint):
    • A/B test manga_qa on Haiku with 10% traffic.
    • A/B test product_search (complex subset) on Haiku with 10% traffic.
    • Only proceed with each change if quality drop < 10%.
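One common way to carve out a stable 10% treatment group for such a test is deterministic hash bucketing. The sketch below is an assumption, not the team's documented mechanism: `in_treatment` is a hypothetical helper, and it assigns users (rather than individual messages) so that a given user always sees the same tier for an intent.

```python
import hashlib


def in_treatment(user_id: str, intent: str, treatment_pct: float = 10.0) -> bool:
    """Deterministically place roughly treatment_pct percent of users in treatment."""
    digest = hashlib.sha256(f"{intent}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # Stable bucket in 0..9999
    return bucket < treatment_pct * 100


# The same user always lands in the same arm for a given intent.
print(in_treatment("user-42", "manga_qa"))
share = sum(in_treatment(f"user-{i}", "manga_qa") for i in range(20_000)) / 20_000
print(f"Treatment share: {share:.3f}")
```

Salting the hash with the intent name keeps the treatment groups for different intents independent of each other.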

Prevention

  1. Change management policy: Routing changes must follow the staged rollout process:

    Config change → A/B test (10%, 48 hours, n >= 2,000) → Canary (25%, 24 hours) → Full rollout
    

  2. Per-intent quality gates:

    # Automated quality gate check before promoting routing change
    def can_promote_routing_change(intent: str, ab_test_results: dict) -> bool:
        control = ab_test_results["control"]
        treatment = ab_test_results["treatment"]
    
        quality_drop = (control["quality"] - treatment["quality"]) / control["quality"]
        satisfaction_drop = (control["satisfaction"] - treatment["satisfaction"]) / control["satisfaction"]
    
        return (
            quality_drop < 0.10          # Less than 10% quality drop
            and satisfaction_drop < 0.05  # Less than 5% satisfaction drop
            and treatment["sample_count"] >= 2000  # Enough samples
        )
    

  3. One-intent-at-a-time rule: Never change routing for more than one intent in a single deployment.
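The one-intent-at-a-time rule can be enforced mechanically by diffing the old and new routing configs before deployment. This is a minimal sketch; the config shape (intent name mapped to tier name) and both function names are assumptions.

```python
def changed_intents(old: dict, new: dict) -> list:
    """Return the sorted intents whose tier differs between two routing configs."""
    intents = set(old) | set(new)
    return sorted(i for i in intents if old.get(i) != new.get(i))


def validate_single_intent_change(old: dict, new: dict) -> list:
    """Raise if a deployment changes routing for more than one intent."""
    changed = changed_intents(old, new)
    if len(changed) > 1:
        raise ValueError(f"Routing change touches multiple intents: {changed}")
    return changed


old = {"manga_qa": "sonnet", "product_search_complex": "sonnet", "shipping_info": "haiku"}
one_change = {**old, "manga_qa": "haiku"}
two_changes = {**one_change, "product_search_complex": "haiku"}

print(validate_single_intent_change(old, one_change))
try:
    validate_single_intent_change(old, two_changes)
except ValueError as err:
    print(f"Rejected: {err}")
```

The Scenario 4 change would have been rejected by this check because it moved two intents at once.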


Scenario 5: New Intent manga_review Defaults to Expensive Sonnet

Problem Statement

The product team launches a new "write a manga review" feature. The intent classifier correctly detects manga_review as a new intent, but the Model Router has no explicit mapping for it. The fallback logic defaults unknown intents to Sonnet (the safest but most expensive option). Within a week, manga_review becomes 8% of traffic, adding $1,200/day in unnecessary Sonnet spend. Analysis shows Haiku scores 8.5/10 on review generation versus 9.0/10 for Sonnet, which is more than adequate for user-generated content.

Decision Tree

flowchart TD
    ALERT["ALERT: New intent 'manga_review'<br/>detected in Sonnet invocation logs<br/>8% of traffic, $1,200/day"] --> CHECK{"Is manga_review<br/>in the routing map?"}

    CHECK -->|No — using default| DEFAULT_Q{"What is the default<br/>model for unmapped intents?"}
    CHECK -->|Yes| VERIFY_MAP["Verify correct model<br/>is assigned"]

    DEFAULT_Q -->|"Sonnet (current)"| PROBLEM["PROBLEM: Unmapped intents<br/>default to most expensive model"]

    PROBLEM --> EVALUATE{"What model does<br/>manga_review need?"}

    EVALUATE --> TEST["Run quick evaluation:<br/>- 100 sample reviews on Sonnet<br/>- 100 sample reviews on Haiku<br/>- LLM-as-judge scoring"]

    TEST --> RESULT["Results:<br/>Sonnet: 9.0 quality, $0.0111/req<br/>Haiku: 8.5 quality, $0.0003/req<br/>Quality gap: 5.6% (acceptable)"]

    RESULT --> ASSIGN["ASSIGN: manga_review → Haiku<br/>Update routing config"]

    ASSIGN --> VERIFY{"Cost reduced?<br/>Quality acceptable?"}

    VERIFY -->|Yes| RESOLVED["RESOLVED<br/>Savings: $1,176/day ($35,280/month)"]

    RESOLVED --> PREVENT["PREVENTION:<br/>1. Change default for unknown intents<br/>   from Sonnet to Haiku<br/>2. Alert on any new unmapped intent<br/>3. Require model assignment in<br/>   feature launch checklist<br/>4. Weekly unmapped intent audit"]

    style ALERT fill:#f39c12,color:#fff
    style PROBLEM fill:#e74c3c,color:#fff
    style RESULT fill:#3498db,color:#fff
    style RESOLVED fill:#2ecc71,color:#fff
    style PREVENT fill:#3498db,color:#fff

Runbook

Detection

  • Weekly audit report: Automated check for intents appearing in inference logs that are not in the routing map.
    # Audit query: find unmapped intents
    unmapped = set(inference_log_intents) - set(INTENT_MODEL_MAP.keys())
    if unmapped:
        alert(f"Unmapped intents detected: {unmapped}")
    
  • Cost anomaly: Unexpected Sonnet spend increase not correlated with traffic growth.

Root Cause Analysis

| Factor | Expected | Actual | Verdict |
| --- | --- | --- | --- |
| manga_review in routing map | Yes | No | Missing mapping |
| Default for unmapped intents | Haiku (safe default) | Sonnet | Overly conservative default |
| Feature launch checklist | Includes model assignment | Skipped | Process gap |
| Weekly intent audit | Running | Not configured | Missing automation |

Root cause: Two process failures compounded: (1) the feature launch checklist does not require a model assignment for new intents, and (2) the default model for unmapped intents is Sonnet instead of Haiku.

Resolution Steps

  1. Immediate: Add manga_review to routing config:

    INTENT_MODEL_MAP[Intent.MANGA_REVIEW] = ModelTier.HAIKU
    

  2. Same day: Change the default model for unmapped intents from Sonnet to Haiku:

    # In ModelRouter.route()
    default_tier = INTENT_MODEL_MAP.get(intent, ModelTier.HAIKU)  # Changed from SONNET
    

  3. This sprint: Add weekly unmapped intent audit automation.
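Steps 1 and 2 together amount to a router that falls back to Haiku and makes the fallback visible. The sketch below is a simplified stand-in, not the actual ModelRouter: it keys the map by plain strings rather than the `Intent` enum, and the logger name and `route` function are assumptions.

```python
import logging
from enum import Enum

logger = logging.getLogger("model_router")


class ModelTier(Enum):
    TEMPLATE = "template"
    HAIKU = "haiku"
    SONNET = "sonnet"


# Simplified routing map keyed by intent name (the real map may use an Intent enum).
INTENT_MODEL_MAP = {
    "recommendation": ModelTier.SONNET,
    "order_status": ModelTier.TEMPLATE,
    "manga_review": ModelTier.HAIKU,
}

DEFAULT_TIER = ModelTier.HAIKU  # Changed from SONNET


def route(intent: str) -> ModelTier:
    """Return the mapped tier, or the Haiku default with a warning if unmapped."""
    tier = INTENT_MODEL_MAP.get(intent)
    if tier is None:
        logger.warning(
            "Unmapped intent %r, falling back to %s", intent, DEFAULT_TIER.value
        )
        return DEFAULT_TIER
    return tier


print(route("manga_review"), route("brand_new_intent"))
```

Logging the fallback gives the unmapped-intent audit a signal to count, instead of relying on cost anomalies alone.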

Prevention

  1. Safe default: Change the system-wide default from Sonnet to Haiku for unmapped intents. Rationale: Haiku provides acceptable quality (7.4+) for most intents, and at the listed rates its per-token price is 12x lower than Sonnet's, a saving that far outweighs occasional slight quality dips while the team evaluates the correct model tier.

  2. Feature launch checklist update:

    ## New Feature Launch — Model Routing Checklist
    - [ ] New intent name defined and added to Intent enum
    - [ ] Complexity evaluation performed (100 sample queries scored)
    - [ ] Model tier assigned based on evaluation
    - [ ] Routing config updated in DynamoDB
    - [ ] A/B test configured for first 2 weeks
    - [ ] Cost projection added to monthly forecast
    - [ ] On-call team notified of new intent
    

  3. Automated guardrail:

    # Run daily: detect and alert on unmapped intents hitting Sonnet
    def audit_unmapped_intents(inference_logs: list, routing_map: dict) -> list:
        unmapped = []
        for log in inference_logs:
            if log["intent"] not in routing_map and log["model"] == "sonnet":
                unmapped.append({
                    "intent": log["intent"],
                    "count": log["count"],
                    "daily_cost": log["total_cost"],
                    "first_seen": log["first_seen"],
                })
        return sorted(unmapped, key=lambda x: x["daily_cost"], reverse=True)
    


Cross-Scenario Summary

Common Patterns Across All 5 Scenarios

| Pattern | Scenarios | Mitigation |
| --- | --- | --- |
| Missing or incorrect model assignment | 1, 2, 5 | Intent-to-model map with locked tiers + default to Haiku |
| No A/B testing before routing changes | 1, 4 | Mandatory A/B test gate (10%, n >= 2,000) |
| Budget not scaled for known events | 3 | Event calendar with auto-budget adjustment |
| No automated detection of misrouting | 2, 5 | Weekly intent-model audit + cost anomaly alerts |
| Rollout too fast (no canary) | 4 | Staged rollout: 1% → 10% → 50% → 100% |

Prevention Priority Matrix

quadrantChart
    title Prevention Priority — Impact vs Effort
    x-axis Low Effort --> High Effort
    y-axis Low Impact --> High Impact
    quadrant-1 Plan Carefully
    quadrant-2 Do First
    quadrant-3 Quick Wins
    quadrant-4 Deprioritize
    "Change default to Haiku": [0.2, 0.7]
    "Weekly intent audit": [0.3, 0.6]
    "Event budget calendar": [0.5, 0.9]
    "A/B test gate": [0.6, 0.85]
    "Canary rollout system": [0.7, 0.8]
    "Per-intent quality gates": [0.8, 0.75]
    "Feature launch checklist": [0.15, 0.5]
    "Auto-rollback system": [0.9, 0.9]

Monitoring Checklist

| Metric | Alarm Threshold | Scenario It Catches |
| --- | --- | --- |
| intent_satisfaction_rate | < 80% for any intent | Scenario 1, 4 |
| sonnet_daily_cost | > 115% of projection | Scenario 2, 5 |
| budget_utilization_pct | > 60% before noon JST | Scenario 3 |
| csat_score_24h | < 3.8 (baseline 4.2) | Scenario 4 |
| unmapped_intent_count | > 0 | Scenario 5 |
| model_tier_distribution | Sonnet > 20% of traffic | Scenario 2 |
| budget_mode | != "normal" for > 2 hours | Scenario 3 |
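For local testing or a dry run, the thresholds in this checklist can be expressed as a single evaluation function. This is only a sketch: the shape of the metrics snapshot and its key names are assumptions, and in production these checks would live in CloudWatch alarms rather than application code.

```python
def breached_alarms(m: dict) -> list[str]:
    """Return the names of checklist alarms breached by a metrics snapshot."""
    breached = []
    if any(rate < 0.80 for rate in m["intent_satisfaction_rate"].values()):
        breached.append("intent_satisfaction_rate")
    if m["sonnet_daily_cost"] > 1.15 * m["sonnet_projected_daily_cost"]:
        breached.append("sonnet_daily_cost")
    if m["budget_utilization_pct"] > 60 and m["hour_jst"] < 12:
        breached.append("budget_utilization_pct")
    if m["csat_score_24h"] < 3.8:
        breached.append("csat_score_24h")
    if m["unmapped_intent_count"] > 0:
        breached.append("unmapped_intent_count")
    if m["sonnet_traffic_share"] > 0.20:
        breached.append("model_tier_distribution")
    if m["hours_in_non_normal_budget_mode"] > 2:
        breached.append("budget_mode")
    return breached


# Hypothetical snapshot resembling Scenario 1: one intent's satisfaction collapsed.
snapshot = {
    "intent_satisfaction_rate": {"recommendation": 0.61, "order_status": 0.95},
    "sonnet_daily_cost": 900.0,
    "sonnet_projected_daily_cost": 1000.0,
    "budget_utilization_pct": 40.0,
    "hour_jst": 10,
    "csat_score_24h": 4.1,
    "unmapped_intent_count": 0,
    "sonnet_traffic_share": 0.15,
    "hours_in_non_normal_budget_mode": 0,
}
print(breached_alarms(snapshot))
```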

Incident Response Quick Reference

flowchart LR
    INCIDENT[Model Routing<br/>Incident] --> TYPE{Incident Type?}

    TYPE -->|Quality Drop| QD["1. Check which intent dropped<br/>2. Check model tier for that intent<br/>3. Revert if tier changed recently<br/>4. A/B test before re-applying"]

    TYPE -->|Cost Spike| CS["1. Check per-intent Sonnet usage<br/>2. Look for unmapped intents<br/>3. Look for misclassified queries<br/>4. Check if traffic event is known"]

    TYPE -->|Budget Exhaustion| BE["1. Is it a planned event?<br/>   → Increase budget<br/>2. Is it bot traffic?<br/>   → Block at WAF<br/>3. Is it organic growth?<br/>   → Adjust daily budget"]

    style QD fill:#e74c3c,color:#fff
    style CS fill:#f39c12,color:#fff
    style BE fill:#e74c3c,color:#fff

Previous: 02-inference-cost-optimization.md -- Budget guardian, A/B testing, fallback chains, cost projections.

Back to: 01-model-selection-framework.md -- Model selection architecture, complexity classifier, routing maps.