# Monitoring GenAI Systems — AWS AIP-C01 Task 4.3

## Overview
This folder provides a comprehensive deep-dive into monitoring, observability, and troubleshooting systems for Foundation Model (FM) applications, aligned with AWS AIP-C01 Task 4.3. All scenarios are grounded in the MangaAssist e-commerce chatbot architecture (Bedrock Claude 3, OpenSearch Serverless, DynamoDB, ECS Fargate, API Gateway WebSocket).
**Key differentiator:** GenAI monitoring is fundamentally different from traditional application monitoring. Token economics, hallucination detection, prompt drift, reasoning failures, and vector store health are entirely new operational dimensions that require specialized frameworks.
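Token economics, for instance, means every request needs a cost-attribution step before it can roll up into dashboards. A minimal sketch of per-invocation cost estimation (the price table and model keys below are placeholder assumptions, not actual Bedrock pricing):

```python
# Per-request cost attribution sketch.
# NOTE: prices below are placeholder assumptions, NOT current Bedrock pricing.
PRICE_PER_1K = {
    "claude-3-sonnet": {"input": 0.003, "output": 0.015},
    "claude-3-haiku": {"input": 0.00025, "output": 0.00125},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return estimated USD cost for one model invocation."""
    p = PRICE_PER_1K[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

print(round(estimate_cost("claude-3-haiku", 2000, 500), 6))  # → 0.001125
```

In production this figure would be emitted as a CloudWatch metric dimensioned by model and intent, which is exactly what the per-model/per-intent tracking in Skill 4.3.2 builds on.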
## Master Mind Map — All 6 Skills
```mermaid
mindmap
  root((Task 4.3<br/>Monitoring<br/>GenAI Systems))
    **4.3.1 Holistic Observability**
      Metrics Collection
        Infrastructure Metrics
        Service Metrics
        FM Operational Metrics
      Performance Tracing
        X-Ray Distributed Traces
        FM Interaction Traces
        Trace-to-Business Correlation
      Business Impact
        Revenue Attribution
        Conversion Tracking
        CSAT Correlation
      Custom Dashboards
        Executive Layer
        Operational Layer
        Debug Layer
    **4.3.2 GenAI Monitoring**
      Token Usage & Cost
        Per-Model Tracking
        Per-Intent Tracking
        Cost Anomaly CUSUM
      Prompt Effectiveness
        Version Registry
        A/B Comparison
        Effectiveness Scoring
      Hallucination & Quality
        Golden Dataset Scoring
        Quality Dimensions
        Degradation Alerting
      Anomaly Detection
        Token Burst Patterns
        Response Drift
        Isolation Forest
      Bedrock Invocation Logs
        Log Pipeline to Athena
        Request/Response Analysis
        Performance Benchmarks
    **4.3.3 Integrated Observability**
      Operational Dashboards
        Real-Time Widgets
        Alert Rules
        SLA Tracking
      Business Visualizations
        Revenue Attribution
        Funnel Analysis
        QuickSight Dashboards
      Compliance Monitoring
        PII Detection Pipeline
        Content Policy Tracking
        GDPR/SOC2 Evidence
      Forensic Traceability
        Immutable Audit Logs
        Request Replay
        Hash-Chain Integrity
      User & Model Behavior
        Session Analytics
        Behavior Clustering
        Abandonment Analysis
    **4.3.4 Tool Performance**
      Call Pattern Tracking
        Invocation Frequency
        Call Chain Analysis
        Dependency Mapping
      Performance Metrics
        Per-Tool Latency
        Success/Timeout Rate
        Parameter Accuracy
      Multi-Agent Coordination
        Handoff Tracing
        Context Transfer
        Overhead Metrics
      Usage Baselines
        Rolling Averages
        Anomaly Types
        Health State Machine
    **4.3.5 Vector Store Ops**
      Performance Monitoring
        Query Latency p50/p95/p99
        Throughput Metrics
        Index Health
      Index Optimization
        HNSW Auto-Tuning
        Segment Compaction
        Benchmark-Driven
      Data Quality
        Embedding Freshness
        Stale Content Detection
        Quality Scoring
    **4.3.6 FM Troubleshooting**
      Golden Datasets
        Stratified Design
        Automated Scoring
        Hallucination Detection
      Output Diffing
        Text Diff
        Semantic Diff
        Structured Field Diff
      Reasoning Tracing
        CoT Step Extraction
        Logical Error Detection
        Contradiction Finder
      Specialized Pipelines
        FM vs Traditional ML
        Pluggable Scoring
        Drift Detection
```
## Skill-to-Folder Mapping
| AWS Skill | Folder | Focus | Files |
|---|---|---|---|
| 4.3.1 Holistic Observability | `Skill-4.3.1-Holistic-Observability/` | Complete visibility — metrics, tracing, business impact, dashboards | 7 |
| 4.3.2 GenAI Monitoring | `Skill-4.3.2-GenAI-Monitoring/` | Proactive issue detection — tokens, hallucinations, anomalies, Bedrock logs | 8 |
| 4.3.3 Integrated Observability | `Skill-4.3.3-Integrated-Observability/` | Actionable insights — dashboards, compliance, forensics, behavior tracking | 8 |
| 4.3.4 Tool Performance | `Skill-4.3.4-Tool-Performance-Frameworks/` | Tool operation — call patterns, metrics, multi-agent, baselines | 7 |
| 4.3.5 Vector Store Ops | `Skill-4.3.5-Vector-Store-Operations/` | Vector DB — performance, index optimization, data quality | 6 |
| 4.3.6 FM Troubleshooting | `Skill-4.3.6-FM-Troubleshooting-Frameworks/` | FM-specific failures — hallucinations, diffing, reasoning, pipelines | 7 |
**Total:** 44 files (43 across the six skill folders, plus this README)
## Architecture Overview — How the 6 Skills Interconnect
```mermaid
graph TB
    subgraph "Data Sources"
        BEDROCK[Amazon Bedrock<br/>Model Invocations]
        OPENSEARCH[OpenSearch Serverless<br/>Vector Store]
        ECS[ECS Fargate<br/>Orchestrator]
        APIGW[API Gateway<br/>WebSocket]
        DYNAMO[DynamoDB<br/>Sessions/Products]
        GUARDRAILS[Bedrock Guardrails]
    end
    subgraph "4.3.1 — Holistic Observability Layer"
        XRAY[X-Ray Traces]
        CW_METRICS[CloudWatch Metrics]
        CW_LOGS[CloudWatch Logs]
        EVENTS[EventBridge Events]
    end
    subgraph "4.3.2 — GenAI Monitoring Engine"
        TOKEN_MON[Token/Cost Monitor]
        QUALITY_MON[Quality Scorer]
        ANOMALY[Anomaly Detector]
        INVOC_LOGS[Invocation Log Analyzer]
    end
    subgraph "4.3.4 — Tool Performance"
        TOOL_TRACK[Tool Call Tracker]
        AGENT_TRACE[Agent Coordinator Tracer]
    end
    subgraph "4.3.5 — Vector Store Ops"
        VEC_MON[Vector DB Monitor]
        IDX_OPT[Index Optimizer]
        DATA_QUAL[Data Quality Validator]
    end
    subgraph "4.3.6 — FM Troubleshooting"
        GOLDEN[Golden Dataset Scorer]
        DIFFER[Output Differ]
        REASON[Reasoning Tracer]
    end
    subgraph "4.3.3 — Integrated Observability Platform"
        OPS_DASH[Operational Dashboards]
        BIZ_DASH[Business Impact Viz]
        COMPLIANCE[Compliance Monitor]
        AUDIT[Forensic Audit Trail]
        BEHAVIOR[User/Model Behavior]
    end
    BEDROCK --> XRAY & CW_LOGS & INVOC_LOGS
    OPENSEARCH --> VEC_MON
    ECS --> XRAY & CW_METRICS
    APIGW --> CW_METRICS & CW_LOGS
    DYNAMO --> CW_METRICS
    GUARDRAILS --> CW_LOGS & COMPLIANCE
    XRAY --> TOKEN_MON & TOOL_TRACK
    CW_METRICS --> ANOMALY & OPS_DASH
    CW_LOGS --> QUALITY_MON & AUDIT
    INVOC_LOGS --> ANOMALY & GOLDEN
    TOKEN_MON --> OPS_DASH & BIZ_DASH
    QUALITY_MON --> OPS_DASH & GOLDEN
    ANOMALY --> OPS_DASH & COMPLIANCE
    TOOL_TRACK --> AGENT_TRACE & OPS_DASH
    VEC_MON --> IDX_OPT & OPS_DASH
    DATA_QUAL --> GOLDEN & COMPLIANCE
    GOLDEN --> DIFFER --> REASON
    REASON --> BEHAVIOR
    style BEDROCK fill:#ff6b35,color:#fff
    style OPENSEARCH fill:#ff6b35,color:#fff
    style ECS fill:#ff6b35,color:#fff
    style OPS_DASH fill:#2ecc71,color:#fff
    style BIZ_DASH fill:#2ecc71,color:#fff
    style COMPLIANCE fill:#2ecc71,color:#fff
    style AUDIT fill:#2ecc71,color:#fff
    style ANOMALY fill:#e74c3c,color:#fff
    style GOLDEN fill:#e74c3c,color:#fff
```
## File Index by Skill

### Skill 4.3.1 — Holistic Observability
| # | File | Description |
|---|---|---|
| 01 | `01-observability-architecture.md` | 4 pillars of GenAI observability, architecture diagram, mind map |
| 02 | `02-operational-metrics-deep-dive.md` | Infrastructure, service, and FM operational metrics with CloudWatch publishers |
| 03 | `03-performance-tracing-fm-interaction.md` | X-Ray distributed tracing, FM interaction traces, trace-to-business correlation |
| 04 | `04-business-impact-metrics.md` | FM quality → business KPI attribution, revenue impact tracking |
| 05 | `05-custom-dashboards-design.md` | CloudWatch/Grafana dashboard specs, JSON templates, wireframes |
| 06 | `06-scenarios-and-runbooks.md` | 5 MangaAssist production scenarios with decision trees |
| 07 | `07-3d-visualizations.md` | Plotly 3D surface/scatter + Three.js service dependency graph |
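The FM operational metrics above ultimately have to land in CloudWatch. One low-overhead route is the Embedded Metric Format (EMF): the application prints a structured JSON log line, and CloudWatch extracts metrics from it without explicit `PutMetricData` calls. A minimal sketch (the namespace, dimension, and metric names are illustrative, not taken from the folder's files):

```python
import json
import time

def emf_record(namespace: str, dimensions: dict, metrics: dict) -> str:
    """Build a CloudWatch Embedded Metric Format (EMF) log line.

    Dimension values and metric values appear as top-level keys; the
    `_aws` block tells CloudWatch how to extract them. Unit is fixed to
    Milliseconds here for simplicity.
    """
    doc = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions.keys())],
                "Metrics": [{"Name": name, "Unit": "Milliseconds"} for name in metrics],
            }],
        },
        **dimensions,
        **metrics,
    }
    return json.dumps(doc)

# Hypothetical dimension/metric names for the MangaAssist orchestrator:
print(emf_record("MangaAssist/FM", {"Model": "claude-3-haiku"}, {"InvocationLatency": 412}))
```

Emitting one such line per invocation gives per-model latency metrics for free wherever EMF extraction is enabled (e.g. a Lambda or ECS log group).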
### Skill 4.3.2 — GenAI Monitoring
| # | File | Description |
|---|---|---|
| 01 | `01-monitoring-architecture.md` | CloudWatch-centric monitoring stack, Bedrock logging pipeline, mind map |
| 02 | `02-token-usage-cost-monitoring.md` | Per-model/intent token tracking, CUSUM cost anomaly detection |
| 03 | `03-prompt-effectiveness-tracking.md` | Prompt version registry, A/B metrics, effectiveness scoring |
| 04 | `04-hallucination-response-quality.md` | Hallucination rate pipeline, multi-dimensional quality scoring |
| 05 | `05-anomaly-detection-systems.md` | Token burst detection, response drift, Z-score + Isolation Forest |
| 06 | `06-bedrock-invocation-log-analysis.md` | Model Invocation Logs → Athena pipeline, pre-built queries |
| 07 | `07-scenarios-and-runbooks.md` | 5 MangaAssist anomaly detection scenarios |
| 08 | `08-3d-visualizations.md` | Plotly token 3D surface + Three.js anomaly stream viz |
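The CUSUM cost anomaly detection named in file 02 can be illustrated with a short sketch. This is a generic one-sided CUSUM, not the folder's implementation; the target, slack `k`, and threshold `h` values are illustrative:

```python
def cusum_alerts(values, target, k, h):
    """One-sided CUSUM: accumulate deviations above target beyond slack k;
    flag the index whenever the cumulative sum exceeds threshold h, then
    reset. Returns the list of alert indices."""
    s, alerts = 0.0, []
    for i, x in enumerate(values):
        s = max(0.0, s + (x - target - k))
        if s > h:
            alerts.append(i)
            s = 0.0
    return alerts

# Hourly FM spend hovers near $10, then drifts upward:
costs = [10, 10.2, 9.8, 10.1, 12, 13, 14, 15]
print(cusum_alerts(costs, target=10, k=0.5, h=5))  # → [6]
```

CUSUM catches slow drifts that a fixed threshold would miss, because small sustained excesses accumulate instead of being evaluated point by point.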
### Skill 4.3.3 — Integrated Observability
| # | File | Description |
|---|---|---|
| 01 | `01-integrated-observability-architecture.md` | Unified platform design, multi-stakeholder dashboards, mind map |
| 02 | `02-operational-dashboards.md` | Real-time widget specs, CloudWatch JSON, Grafana alert rules |
| 03 | `03-business-impact-visualizations.md` | Revenue attribution, conversion funnel overlay, QuickSight |
| 04 | `04-compliance-monitoring.md` | PII detection, content policy tracking, GDPR/SOC2 evidence |
| 05 | `05-forensic-traceability-audit.md` | Immutable audit logs, request replay, hash-chain integrity |
| 06 | `06-user-interaction-model-behavior.md` | Session analytics, behavior clustering, abandonment analysis |
| 07 | `07-scenarios-and-runbooks.md` | 5 compliance/audit/forensic scenarios |
| 08 | `08-3d-visualizations.md` | Three.js user journey 3D + Plotly compliance heatmap |
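The hash-chain integrity idea from file 05 can be sketched in a few lines: each audit entry stores the hash of the previous entry's hash plus its own payload, so any retroactive edit breaks verification from that point on. Function and field names here are illustrative, not the folder's API:

```python
import hashlib
import json

def append_entry(chain, record):
    """Append a record; the entry's hash covers the previous hash + payload."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    payload = json.dumps(record, sort_keys=True)  # deterministic serialization
    entry_hash = hashlib.sha256((prev + payload).encode()).hexdigest()
    chain.append({"prev": prev, "record": record, "hash": entry_hash})
    return chain

def verify_chain(chain):
    """Recompute every hash; any tampered record breaks the chain."""
    prev = "0" * 64
    for e in chain:
        payload = json.dumps(e["record"], sort_keys=True)
        if e["prev"] != prev or e["hash"] != hashlib.sha256((prev + payload).encode()).hexdigest():
            return False
        prev = e["hash"]
    return True

log = []
append_entry(log, {"request_id": "r-1", "action": "invoke_model"})
append_entry(log, {"request_id": "r-2", "action": "tool_call"})
print(verify_chain(log))               # True
log[0]["record"]["action"] = "tampered"
print(verify_chain(log))               # False
```

In a production forensic trail the chain head would additionally be anchored somewhere append-only (e.g. an object-lock bucket) so the whole chain cannot be silently rewritten.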
### Skill 4.3.4 — Tool Performance Frameworks

| # | File | Description |
|---|---|---|
| 01 | `01-tool-performance-architecture.md` | Tool calling monitoring framework, mind map, MangaAssist tool inventory |
| 02 | `02-call-pattern-tracking.md` | Invocation frequency, call chain analysis, dependency mapping |
| 03 | `03-tool-performance-metrics.md` | Per-tool latency/success/accuracy, instrumentation decorator |
| 04 | `04-multi-agent-coordination.md` | Agent handoff tracing, coordination overhead, span-based tracking |
| 05 | `05-usage-baselines-anomaly-detection.md` | Rolling baseline modeling, anomaly types, health state machine |
| 06 | `06-scenarios-and-runbooks.md` | 5 tool/agent failure scenarios |
| 07 | `07-3d-visualizations.md` | Three.js multi-agent graph + Plotly tool 3D scatter |
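File 03 mentions an instrumentation decorator for per-tool latency and success metrics. A minimal in-process version might look like the following; the registry dict and tool name are hypothetical, and a production version would publish to CloudWatch instead of accumulating locally:

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-process registry; stands in for a metrics backend.
TOOL_STATS = defaultdict(lambda: {"calls": 0, "errors": 0, "latency_ms": []})

def instrument_tool(name):
    """Decorator: record call count, error count, and latency per tool."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception:
                TOOL_STATS[name]["errors"] += 1
                raise
            finally:
                # Count every attempt, successful or not.
                TOOL_STATS[name]["calls"] += 1
                TOOL_STATS[name]["latency_ms"].append((time.perf_counter() - start) * 1000)
        return inner
    return wrap

@instrument_tool("product_search")
def product_search(query):
    return [f"result for {query}"]

product_search("one piece vol 1")
print(TOOL_STATS["product_search"]["calls"])  # → 1
```

Success rate then falls out as `1 - errors / calls`, and the latency list feeds the same percentile machinery used for vector store monitoring.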
### Skill 4.3.5 — Vector Store Operations
| # | File | Description |
|---|---|---|
| 01 | `01-vector-store-ops-architecture.md` | Vector DB monitoring framework, comparison matrix, mind map |
| 02 | `02-vector-db-performance-monitoring.md` | Query latency p50/p95/p99, throughput, connection monitoring |
| 03 | `03-automated-index-optimization.md` | HNSW auto-tuning, segment compaction, benchmark-driven selection |
| 04 | `04-data-quality-validation.md` | Embedding freshness, stale content, quality scoring |
| 05 | `05-scenarios-and-runbooks.md` | 5 vector store operational scenarios |
| 06 | `06-3d-visualizations.md` | Plotly 3D embedding space + Three.js HNSW layer viz |
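The p50/p95/p99 latency tracking in file 02 reduces to a percentile computation over a sliding window of query timings. A nearest-rank sketch with no external dependencies (a real monitor would use CloudWatch percentile statistics or `numpy.percentile`):

```python
def percentile(samples, p):
    """Nearest-rank percentile (p in [0, 100]) over a latency window."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, -(-p * len(ordered) // 100))  # ceil(p * n / 100), at least 1
    return ordered[rank - 1]

# One slow outlier dominates the tail percentiles but not the median:
latencies_ms = [12, 15, 11, 240, 14, 13, 16, 18, 13, 12]
print(percentile(latencies_ms, 50), percentile(latencies_ms, 95), percentile(latencies_ms, 99))
# → 13 240 240
```

This is why the folder tracks p95/p99 alongside p50: the median hides exactly the tail behavior that degrades the chat experience.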
### Skill 4.3.6 — FM Troubleshooting Frameworks
| # | File | Description |
|---|---|---|
| 01 | `01-troubleshooting-architecture.md` | FM failure taxonomy, state diagram, traditional ML comparison |
| 02 | `02-golden-datasets-hallucination.md` | Golden dataset design, automated hallucination scoring pipeline |
| 03 | `03-output-diffing-consistency.md` | Text/semantic/structured diff, temporal stability analysis |
| 04 | `04-reasoning-path-tracing.md` | CoT extraction, logical error detection, contradiction finder |
| 05 | `05-specialized-observability-pipelines.md` | FM-specific vs traditional ML pipelines, pluggable scoring |
| 06 | `06-scenarios-and-runbooks.md` | 5 FM troubleshooting scenarios |
| 07 | `07-3d-visualizations.md` | Plotly reasoning tree 3D + Three.js failure mode viz |
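The golden-dataset scoring loop from file 02 can be sketched as follows. Lexical similarity via `difflib` stands in for the semantic scoring a real pipeline would use, and the dataset entry, threshold, and function names are toy assumptions:

```python
from difflib import SequenceMatcher

# Toy golden dataset; real entries pair prompts with vetted reference answers.
GOLDEN = [
    {"prompt": "Return policy?", "reference": "Returns are accepted within 30 days."},
]

def score_against_golden(generate, threshold=0.6):
    """Run a model function over the golden set and flag low-similarity
    outputs as regression candidates. Lexical ratio is a crude stand-in
    for embedding-based semantic scoring."""
    failures = []
    for case in GOLDEN:
        output = generate(case["prompt"])
        sim = SequenceMatcher(None, output.lower(), case["reference"].lower()).ratio()
        if sim < threshold:
            failures.append({"prompt": case["prompt"], "similarity": round(sim, 2)})
    return failures

# A stubbed "model" standing in for a Bedrock call:
print(score_against_golden(lambda p: "Returns are accepted within 30 days."))  # → []
```

Running this on every prompt or model change turns hallucination detection into a regression test rather than a production surprise.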
## How to Use This Folder

### For AWS AIP-C01 Exam Prep
- Start with this README for the complete overview
- Read each `01-*-architecture.md` for conceptual understanding
- Study the scenario files (`06/07-scenarios-and-runbooks.md`) for applied knowledge
- Review the mind maps in the visualization files for quick revision
### For Production Implementation
- Start with `Skill-4.3.1/01-observability-architecture.md` for the foundational design
- Implement the `Skill-4.3.2` monitoring components (token tracking, anomaly detection)
- Build the integrated dashboards from `Skill-4.3.3`
- Add tool and vector store monitoring from `Skill-4.3.4` and `Skill-4.3.5`
- Establish troubleshooting frameworks from `Skill-4.3.6`
### For Interview Preparation
- Focus on architecture files and scenarios
- Practice explaining the mind maps verbally
- Know the MangaAssist scenarios — they demonstrate real-world application
- Understand tradeoffs (latency vs completeness, cost vs coverage)
## Cross-References to Existing Content

The folders below contain complementary material. This folder builds on top of them and does not duplicate their content.
| Existing Folder | Relationship | Key Files |
|---|---|---|
| `Debugging/` | Foundation — logging infrastructure that monitoring systems consume | `01-bedrock-logging.md`, `02-application-logging.md` |
| `Troubleshoot-GenAI-Applications/` | Sibling — troubleshooting playbooks that monitoring systems trigger | `04-retrieval-system-troubleshooting.md`, `02-fm-integration-troubleshooting.md` |
| `Evaluation-Systems-GenAI/` | Upstream — evaluation metrics that monitoring systems track continuously | `07-agent-performance-framework.md`, `08-reporting-visualization-systems.md` |
| `LLMOps/` | Lifecycle — LLMOps processes that monitoring systems support | `llmops-user-stories.md` |
| Root `13-metrics.md` | Foundation — business/operational metric definitions | Full file |
## MangaAssist System Context
All scenarios in this folder reference the MangaAssist e-commerce chatbot:
```mermaid
graph LR
    USER[Customer] --> APIGW[API Gateway<br/>WebSocket]
    APIGW --> ECS[ECS Fargate<br/>Orchestrator]
    ECS --> BEDROCK[Bedrock Claude 3<br/>Sonnet/Haiku]
    ECS --> OPENSEARCH[OpenSearch<br/>Serverless<br/>Vector Store]
    ECS --> DYNAMO[DynamoDB<br/>Sessions/Products]
    ECS --> GUARD[Bedrock<br/>Guardrails]
    BEDROCK --> ECS
    OPENSEARCH --> ECS
    ECS --> APIGW --> USER
    style BEDROCK fill:#ff9900,color:#000
    style OPENSEARCH fill:#ff9900,color:#000
    style ECS fill:#ff9900,color:#000
    style DYNAMO fill:#ff9900,color:#000
```
**Components:** API Gateway (WebSocket) → ECS Fargate (orchestrator) → Bedrock Claude 3 Sonnet (complex) / Haiku (simple) → OpenSearch Serverless (product embeddings) → DynamoDB (sessions, products, orders) → Bedrock Guardrails (content filtering)
## Technology Stack for Visualizations
All visualization files include runnable code examples using:
| Library | Version | Purpose |
|---|---|---|
| Plotly (Python) | 5.x | Interactive 3D surface plots, scatter plots, heatmaps |
| Three.js (JavaScript) | r150+ | Browser-based 3D rendering — service graphs, agent networks, embedding spaces |
| Mermaid | 10.x | In-markdown diagrams — flowcharts, sequence, mindmaps, state, class |
| NumPy | 1.24+ | Data generation for Plotly visualizations |
| Pandas | 2.0+ | Data manipulation for metric analysis |
Install the Python dependencies:

```bash
pip install plotly pandas numpy scikit-learn
```