# US-10: Unified Optimization Decision Dashboard

## User Story

As the engineering leadership team (Cost, Performance, and Inference leads), we want a single dashboard that shows how every optimization decision affects all three dimensions simultaneously, so that we can make informed tradeoff decisions in monthly reviews and respond quickly when a reversal trigger fires.
## Why This Dashboard Exists

```mermaid
graph TD
    A["Without Unified Dashboard"] --> B["Cost Team watches<br/>AWS billing dashboard"]
    A --> C["Performance Team watches<br/>CloudWatch latency metrics"]
    A --> D["Inference Team watches<br/>quality evaluation pipeline"]
    B --> E["Each team optimizes<br/>their metrics independently"]
    E --> F["Local optimizations<br/>create global pessimum"]
    G["With Unified Dashboard"] --> H["All three dimensions<br/>visible in one place"]
    H --> I["QACPI trend shows<br/>global system health"]
    I --> J["Tradeoff decisions are<br/>data-driven, not political"]
    style F fill:#eb3b5a,stroke:#333,color:#fff
    style J fill:#2d8659,stroke:#333,color:#fff
```
## Acceptance Criteria

- QACPI is computed and displayed in near-real-time, at 5-minute granularity.
- All reversal triggers from US-01 through US-09 are monitored with automated alerts.
- Each dashboard panel links to the relevant user story for context.
- The monthly tradeoff review has a standardized report generated from dashboard data.
- Any team member can see the projected impact of a proposed change before deploying it.
## Dashboard Architecture

```mermaid
graph TD
    subgraph "Data Sources"
        CW["CloudWatch<br/>Latency, throughput,<br/>error rates"]
        BR["Bedrock Metrics<br/>Token counts, model usage,<br/>provisioned utilization"]
        AB["AWS Billing<br/>Daily spend by service"]
        QE["Quality Evaluator<br/>Hallucination rate, CSAT,<br/>groundedness scores"]
        AN["Analytics Pipeline<br/>Intent distribution,<br/>cache hit rates,<br/>escalation rates"]
    end
    subgraph "Processing"
        KIN["Kinesis Data Stream<br/>Real-time aggregation"]
        RS["Redshift<br/>Historical analysis"]
    end
    subgraph "Dashboard Layer"
        DASH["Unified Dashboard<br/>(CloudWatch Dashboard + QuickSight)"]
    end
    CW --> KIN
    BR --> KIN
    AB --> RS
    QE --> KIN
    AN --> KIN
    KIN --> DASH
    RS --> DASH
    style DASH fill:#54a0ff,stroke:#333,color:#000
```
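The acceptance criteria require 5-minute granularity on the real-time (Kinesis) path. A minimal sketch of that windowing step, assuming records arrive as `(epoch_seconds, metric, value)` tuples — the record shape and names here are hypothetical, not the actual consumer schema:

```python
from collections import defaultdict
from statistics import mean

WINDOW_SECONDS = 300  # 5-minute granularity, per the acceptance criteria

def bucket_start(epoch_seconds: int) -> int:
    """Align a timestamp to the start of its 5-minute window."""
    return epoch_seconds - (epoch_seconds % WINDOW_SECONDS)

def aggregate(records):
    """records: iterable of (epoch_seconds, metric, value) tuples, as they
    might arrive from the Kinesis consumer (shape is an assumption)."""
    windows = defaultdict(list)
    for ts, metric, value in records:
        windows[(bucket_start(ts), metric)].append(value)
    return {key: mean(values) for key, values in windows.items()}

# Three latency samples landing in two adjacent 5-minute windows.
rollup = aggregate([
    (1_699_999_810, "p95_latency_ms", 900),
    (1_699_999_900, "p95_latency_ms", 1100),
    (1_700_000_150, "p95_latency_ms", 1300),
])
```

In production the same rollup would typically be done with Kinesis tumbling windows or CloudWatch metric math rather than in application code; the sketch only shows the bucketing semantics.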
## Dashboard Panels

### Panel 1: QACPI Composite Score (Hero Metric)

```mermaid
graph LR
    subgraph "QACPI Trend (30 days)"
        direction TB
        QACPI["Current QACPI: 188,889<br/>7d trend: +2.3%<br/>30d trend: +8.7%"]
        TARGET["Target: > 150,000<br/>Alert: < 120,000"]
    end
    subgraph "QACPI Components"
        Q["Quality: 0.87<br/>(target > 0.85)"]
        T["Throughput: 1,200 RPS<br/>(target > 1,000)"]
        CPR["Cost/Req: $0.006<br/>(target < $0.008)"]
        LAT["p95 Latency: 0.9s<br/>(target < 2.0s)"]
    end
    QACPI --> Q
    QACPI --> T
    QACPI --> CPR
    QACPI --> LAT
    style QACPI fill:#2d8659,stroke:#333,color:#fff
```
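For reference, one plausible way the composite could be computed. The exact QACPI formula belongs to US-01 and is not restated in this story, so the form below (quality times throughput over cost times latency) is an assumption:

```python
def qacpi(quality: float, throughput_rps: float,
          cost_per_request: float, p95_latency_s: float) -> float:
    """Composite score: reward quality and throughput, penalize cost and
    latency. The formula is assumed here; substitute the US-01
    definition if it differs."""
    return (quality * throughput_rps) / (cost_per_request * p95_latency_s)

# Panel 1 component values:
score = qacpi(quality=0.87, throughput_rps=1200,
              cost_per_request=0.006, p95_latency_s=0.9)
```

With the Panel 1 components this yields roughly 193,000 rather than the displayed 188,889, so the production formula or input snapshot evidently differs slightly; treat the sketch as illustrative and wire in the US-01 definition verbatim.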
### Panel 2: The Three Dimensions (At a Glance)

```mermaid
graph TD
    subgraph "Cost Health"
        C1["Daily Spend: $5,200<br/>Budget: $6,000/day<br/>Utilization: 87%"]
        C2["LLM: $3,100<br/>Compute: $1,400<br/>Storage: $450<br/>Other: $250"]
    end
    subgraph "Performance Health"
        P1["p50 Latency: 420ms<br/>p95 Latency: 1,200ms<br/>p99 Latency: 1,850ms"]
        P2["Throughput: 1,200 RPS<br/>Error Rate: 0.02%<br/>Cache Hit Rate: 58%"]
    end
    subgraph "Inference Health"
        I1["Quality Score: 0.87<br/>Hallucination Rate: 3.2%<br/>CSAT: 4.1/5.0"]
        I2["Guardrail Block Rate: 2.8%<br/>False Positive Rate: 0.8%<br/>Escalation Rate: 4.5%"]
    end
    style C1 fill:#f9d71c,stroke:#333,color:#000
    style P1 fill:#4ecdc4,stroke:#333,color:#000
    style I1 fill:#ff6b6b,stroke:#333,color:#000
```
### Panel 3: Model Tiering Effectiveness (US-02)

```mermaid
graph TD
    subgraph "Traffic Distribution by Tier"
        T1["Template: 32%<br/>(target: 30%)"]
        T2["Haiku: 38%<br/>(target: 40%)"]
        T3["Sonnet: 30%<br/>(target: 30%)"]
    end
    subgraph "Quality by Tier"
        Q1["Template: N/A<br/>(deterministic)"]
        Q2["Haiku avg quality: 0.79<br/>(floor: 0.72)"]
        Q3["Sonnet avg quality: 0.92<br/>(floor: 0.88)"]
    end
    subgraph "Cost by Tier"
        CO1["Template: $0/day"]
        CO2["Haiku: $400/day"]
        CO3["Sonnet: $4,500/day"]
    end
    style T1 fill:#2d8659,stroke:#333,color:#fff
    style T2 fill:#fd9644,stroke:#333,color:#000
    style T3 fill:#eb3b5a,stroke:#333,color:#fff
```
### Panel 4: Latency Budget Compliance (US-03)

```mermaid
graph LR
    subgraph "Pipeline Stage Latency (p95)"
        E["Edge: 35ms<br/>(budget: 40ms) ✅"]
        S["Session: 42ms<br/>(budget: 50ms) ✅"]
        IC["Intent: 38ms<br/>(budget: 50ms) ✅"]
        RAG["RAG: 230ms<br/>(budget: 250ms) ⚠️"]
        SVC["Services: 180ms<br/>(budget: 200ms) ✅"]
        LLM["LLM TTFT: 380ms<br/>(budget: 400ms) ⚠️"]
        GR["Guardrails: 85ms<br/>(budget: 100ms) ✅"]
    end
    style RAG fill:#fd9644,stroke:#333,color:#000
    style LLM fill:#fd9644,stroke:#333,color:#000
```
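The ⚠️ marks above can be reproduced with a simple budget-consumption check. The 90% warning threshold below is an assumption that happens to match the panel; tune it to your alerting policy:

```python
def stage_status(measured_ms: float, budget_ms: float,
                 warn_fraction: float = 0.9) -> str:
    """Flag a stage once it consumes more than warn_fraction of its
    p95 latency budget; 'breach' means the budget itself is exceeded."""
    if measured_ms > budget_ms:
        return "breach"
    if measured_ms > warn_fraction * budget_ms:
        return "warn"
    return "ok"

# (measured_ms, budget_ms) pairs from the panel above.
stages = {"Edge": (35, 40), "Session": (42, 50), "Intent": (38, 50),
          "RAG": (230, 250), "Services": (180, 200),
          "LLM TTFT": (380, 400), "Guardrails": (85, 100)}
report = {name: stage_status(m, b) for name, (m, b) in stages.items()}
```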
### Panel 5: RAG and Caching Health (US-05, US-06)
| Metric | Current | Target | Status |
|---|---|---|---|
| RAG recall@3 | 0.86 | ≥ 0.85 | ✅ |
| RAG hallucination rate | 3.8% | < 4% | ⚠️ |
| RAG avg latency | 115ms | < 250ms | ✅ |
| Cache hit rate (product) | 72% | ≥ 70% | ✅ |
| Cache hit rate (FAQ semantic) | 38% | ≥ 40% | ⚠️ |
| Cache hit rate (recommendations) | 11% | ≥ 10% | ✅ |
| Stale data complaints | 0.06% | < 0.1% | ✅ |
### Panel 6: Guardrail Health (US-07)

```mermaid
graph TD
    subgraph "Guardrail Metrics"
        G1["Total Block Rate: 2.8%<br/>(target: 2-4%)"]
        G2["False Positive Rate: 0.8%<br/>(target: < 2%)"]
        G3["PII Detections: 1,240/day<br/>(all blocked ✅)"]
        G4["Price Corrections: 89/day<br/>(all corrected ✅)"]
        G5["Async Retractions: 12/day<br/>(hallucinations caught post-delivery)"]
    end
    style G1 fill:#2d8659,stroke:#333,color:#fff
    style G2 fill:#2d8659,stroke:#333,color:#fff
```
### Panel 7: Autoscaling Health (US-08)

```mermaid
graph TD
    subgraph "Scaling Events (Last 24h)"
        SE1["Predictive scale-ups: 4<br/>(all on schedule)"]
        SE2["Reactive scale-ups: 2<br/>(unexpected traffic)"]
        SE3["Lambda overflow events: 1<br/>(handled 8K requests)"]
        SE4["Scale-downs: 6<br/>(all smooth)"]
    end
    subgraph "Capacity Utilization"
        CU1["Current tasks: 45<br/>(capacity: 60)"]
        CU2["Headroom: 25%"]
        CU3["Idle cost: $380/day<br/>(23% of compute)"]
    end
    style CU3 fill:#fd9644,stroke:#333,color:#000
```
### Panel 8: Token Budget Health (US-09)
| Metric | Current | Target | Status |
|---|---|---|---|
| Avg input tokens | 1,780 | < 2,000 | ✅ |
| p95 input tokens | 2,850 | < 3,000 | ⚠️ |
| Avg output tokens | 280 | < 300 | ✅ |
| Prompt cache hit rate | 68% | ≥ 60% | ✅ |
| History summarization triggers/day | 185K | Monitor only | - |
| Token budget overflows/day | 2,300 | < 5,000 | ✅ |
## 2026 Update: Turn the Dashboard into a Decision Cockpit

Treat everything above this section as the baseline dashboard architecture. This update preserves that original monitoring design and shows how it evolves into a trace-level decision cockpit.

A useful optimization dashboard now needs to operate at trace level, not just as a static KPI board.
- Add serving-path metrics such as queue time, TTFT, ITL/TPOT, prefill time, decode time, waiting requests, prompt cache hit rate, prefix cache hit rate, and invalidation lag.
- Segment every major panel by intent, route class, model class, traffic cohort, and experiment ID. Blended averages hide which tradeoff is actually failing.
- Join offline evals, online judge scores, human review outcomes, and business metrics in one experiment record so every change has an attributable impact trail.
- Annotate releases and use trace replay / what-if simulation before rollout. The dashboard should compare expected vs actual impact by decision ID, not only after-the-fact trends.
- Keep QACPI, but pair it with a constraint board and Pareto frontier view. Leadership needs to see when a "better" composite score violates a hard safety or policy threshold.
Recent references: AWS Bedrock model invocation logging, CloudWatch generative AI observability, Bedrock Guardrails CloudWatch metrics, vLLM metrics, Anthropic evaluation guidance.
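As one concrete example of the serving-path metrics listed above, TTFT and ITL/TPOT can be derived from four per-request timestamps. The record layout here is illustrative, not a fixed schema:

```python
from dataclasses import dataclass

@dataclass
class TraceTimings:
    """Per-request serving-path timings (field names are illustrative)."""
    enqueued_at: float     # request accepted
    prefill_start: float   # left the queue, prefill begins
    first_token_at: float  # first decode token emitted
    finished_at: float     # last token emitted
    output_tokens: int

    @property
    def queue_time(self) -> float:
        return self.prefill_start - self.enqueued_at

    @property
    def ttft(self) -> float:
        """Time to first token, measured from request acceptance."""
        return self.first_token_at - self.enqueued_at

    @property
    def itl(self) -> float:
        """Mean inter-token latency (a.k.a. TPOT) across decode."""
        if self.output_tokens < 2:
            return 0.0
        return (self.finished_at - self.first_token_at) / (self.output_tokens - 1)

t = TraceTimings(enqueued_at=0.00, prefill_start=0.05,
                 first_token_at=0.38, finished_at=2.38, output_tokens=101)
```

Segmenting these per-request values by intent, route class, and experiment ID is what turns the KPI board into a cockpit: a TTFT regression that blended averages hide becomes visible in the affected cohort.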
## Reversal Trigger Monitoring

```mermaid
graph TD
    subgraph "Active Reversal Triggers (0 firing, 2 warning)"
        RT1["US-02: Haiku quality < 0.72<br/>Status: 0.79 ✅"]
        RT2["US-03: End-to-end p95 > 2s<br/>Status: 1.2s ✅"]
        RT3["US-05: Hallucination > 5%<br/>Status: 3.8% ⚠️ (watching)"]
        RT4["US-06: Stale complaints > 0.1%<br/>Status: 0.06% ✅"]
        RT5["US-07: False positive > 3%<br/>Status: 0.8% ✅"]
        RT6["US-08: Idle cost > 30%<br/>Status: 23% ✅"]
        RT7["US-09: Quality < 0.82 on multi-turn<br/>Status: 0.84 ⚠️ (watching)"]
    end
    style RT3 fill:#fd9644,stroke:#333,color:#000
    style RT7 fill:#fd9644,stroke:#333,color:#000
```
## Automated Alert Rules
| Alert Level | Condition | Notification | Response |
|---|---|---|---|
| Info | Metric within 10% of trigger threshold | Slack message to team channel | Awareness; no action required |
| Warning | Metric within 5% of trigger threshold | Slack + email to team leads | Investigate; prepare rollback |
| Critical | Reversal trigger fires | PagerDuty + Slack + auto-generated Jira ticket | Execute reversal playbook within 2 hours |
| Emergency | PII leak or price error detected | PagerDuty (P1) + automatic traffic shift to safe mode | Immediate response; safe mode until root-caused |
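The Info/Warning/Critical bands can be expressed as distance from the reversal threshold. Emergency is event-driven (PII leak, price error) rather than threshold-based, so it is excluded from this sketch:

```python
def alert_level(value: float, threshold: float,
                higher_is_worse: bool = True) -> str:
    """Map a metric's remaining margin to its reversal threshold onto
    the alert levels in the table above (bands are from that table)."""
    if higher_is_worse:
        margin = (threshold - value) / threshold
    else:  # e.g. a quality floor: firing means dropping BELOW threshold
        margin = (value - threshold) / threshold
    if margin <= 0:
        return "critical"   # reversal trigger fired
    if margin <= 0.05:
        return "warning"    # within 5% of the threshold
    if margin <= 0.10:
        return "info"       # within 10% of the threshold
    return "none"

# RT7: multi-turn quality 0.84 against its 0.82 floor -> "warning",
# matching the ⚠️ (watching) state in the reversal-trigger panel.
level = alert_level(0.84, 0.82, higher_is_worse=False)
```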
## Monthly Tradeoff Review Report

The dashboard auto-generates a monthly report for the engineering leadership meeting:

```mermaid
graph TD
    subgraph "Monthly Review Template"
        M1["1. QACPI Trend<br/>Is the system getting better<br/>or worse overall?"]
        M2["2. Dimension Health<br/>Which dimension improved?<br/>Which degraded?"]
        M3["3. Tradeoff Decisions Made<br/>What changed this month?<br/>What was the measured impact?"]
        M4["4. Reversal Triggers Status<br/>Any close to firing?<br/>Any fired and resolved?"]
        M5["5. Next Month Proposals<br/>What optimization is proposed?<br/>Expected impact on all 3 dims?"]
    end
    M1 --> M2 --> M3 --> M4 --> M5
    style M1 fill:#54a0ff,stroke:#333,color:#000
    style M5 fill:#ff9f43,stroke:#333,color:#000
```
### Example Monthly Summary
| Category | January | February | Trend |
|---|---|---|---|
| QACPI | 175,000 | 188,889 | ↑ +7.9% |
| Monthly Spend | $168,000 | $156,000 | ↓ -7.1% |
| p95 Latency | 1.4s | 1.2s | ↓ -14.3% |
| Quality Score | 0.86 | 0.87 | ↑ +1.2% |
| Hallucination Rate | 4.1% | 3.8% | ↓ -7.3% |
| Cache Hit Rate | 52% | 58% | ↑ +11.5% |
| Escalation Rate | 5.2% | 4.5% | ↓ -13.5% |
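The Trend column is plain month-over-month percent change; a one-line helper keeps the report generator consistent with the table:

```python
def pct_change(prev: float, curr: float) -> float:
    """Month-over-month percent change, rounded as in the Trend column."""
    return round((curr - prev) / prev * 100, 1)

trend = {
    "QACPI": pct_change(175_000, 188_889),          # +7.9
    "Monthly Spend": pct_change(168_000, 156_000),  # -7.1
    "p95 Latency": pct_change(1.4, 1.2),            # -14.3
}
```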
## Simulation Mode: "What If" Analysis

Before making any tradeoff decision, the dashboard supports simulation:

```mermaid
sequenceDiagram
    participant Lead as Engineering Lead
    participant Sim as Simulation Engine
    participant Dash as Dashboard
    Lead->>Sim: "What if we route FAQ to Haiku instead of Sonnet?"
    Sim->>Sim: Calculate impact on cost, latency, quality
    Sim-->>Dash: Projected QACPI: 195,000 (+3.2%)
    Sim-->>Dash: Projected cost: -$12K/month
    Sim-->>Dash: Projected FAQ quality: -0.08 (0.84→0.76)
    Sim-->>Dash: Projected FAQ latency: -400ms
    Lead->>Lead: Quality drop acceptable for FAQ?<br/>Consult Inference Team.
    Note over Lead: Decision: Approve if FAQ quality stays > 0.72
```
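The approval note at the end of the sequence ("approve if FAQ quality stays > 0.72") generalizes to a floor check that the dashboard can run automatically on any simulation output. The names below are illustrative; how the projections themselves are estimated belongs to the simulation engine and is not modeled here:

```python
def evaluate_what_if(projected: dict, floors: dict) -> tuple[bool, list]:
    """Approve a simulated change only if every hard floor survives.
    floors maps metric name -> minimum acceptable value."""
    violations = [name for name, floor in floors.items()
                  if projected.get(name, floor) < floor]
    return (not violations, violations)

# Projection from the sequence diagram: route FAQ from Sonnet to Haiku.
projected = {"faq_quality": 0.76, "overall_quality": 0.87}
approved, violations = evaluate_what_if(
    projected, {"faq_quality": 0.72, "overall_quality": 0.85})
```

Pairing this check with the Pareto/constraint view from the 2026 update keeps a "better" projected QACPI from masking a hard-floor violation.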
### Simulation Inputs
| Parameter | Options |
|---|---|
| Model tier change | Move intent X from Sonnet→Haiku, or Haiku→Template |
| RAG configuration | Change chunk count, add/remove reranker |
| Cache TTL change | Increase/decrease TTL for data type X |
| Scaling change | Adjust provisioned capacity, change predictive schedule |
| Token budget change | Reallocate tokens between sections |
| Guardrail change | Loosen/tighten threshold for guardrail X |
## Summary: How All 10 User Stories Connect

```mermaid
graph TD
    US01["US-01: Trilemma Framework<br/>(Decision methodology)"] --> US02["US-02: Model Tiering<br/>(Which model for which query?)"]
    US01 --> US03["US-03: Latency Budget<br/>(How much time per stage?)"]
    US01 --> US04["US-04: Real-Time vs Batch<br/>(What to pre-compute?)"]
    US01 --> US05["US-05: RAG Depth<br/>(How many chunks?)"]
    US01 --> US06["US-06: Caching<br/>(What to cache, how long?)"]
    US01 --> US07["US-07: Guardrails<br/>(How strict?)"]
    US01 --> US08["US-08: Autoscaling<br/>(How much headroom?)"]
    US01 --> US09["US-09: Token Budget<br/>(How to partition context?)"]
    US02 --> US10["US-10: Unified Dashboard<br/>(Track everything together)"]
    US03 --> US10
    US04 --> US10
    US05 --> US10
    US06 --> US10
    US07 --> US10
    US08 --> US10
    US09 --> US10
    style US01 fill:#ff9f43,stroke:#333,color:#000
    style US10 fill:#5f27cd,stroke:#333,color:#fff
```
Every decision in US-02 through US-09 feeds metrics into this dashboard. The dashboard enables the monthly review cycle that keeps all three optimization dimensions balanced. No team optimizes in isolation. Every tradeoff is visible, measured, and reversible.