# Monitoring GenAI Systems — AWS AIP-C01 Task 4.3

## Overview
This folder provides a comprehensive deep-dive into monitoring, observability, and troubleshooting systems for Foundation Model (FM) applications, aligned with AWS AIP-C01 Task 4.3. All scenarios are grounded in the MangaAssist e-commerce chatbot architecture (Bedrock Claude 3, OpenSearch Serverless, DynamoDB, ECS Fargate, API Gateway WebSocket).
**Key differentiator:** GenAI monitoring is fundamentally different from traditional application monitoring. Token economics, hallucination detection, prompt drift, reasoning failures, and vector store health are entirely new operational dimensions that require specialized frameworks.
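Token economics, for instance, means every request needs a cost-attribution step before it can roll up into dashboards. A minimal sketch of per-invocation cost estimation (the price table and model keys below are placeholder assumptions, not actual Bedrock pricing):

```python
# Per-request cost attribution sketch.
# NOTE: prices below are placeholder assumptions, NOT current Bedrock pricing.
PRICE_PER_1K = {
    "claude-3-sonnet": {"input": 0.003, "output": 0.015},
    "claude-3-haiku": {"input": 0.00025, "output": 0.00125},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return estimated USD cost for one model invocation."""
    p = PRICE_PER_1K[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

print(round(estimate_cost("claude-3-haiku", 2000, 500), 6))  # → 0.001125
```

In production this figure would be emitted as a CloudWatch metric dimensioned by model and intent, which is exactly what the per-model/per-intent tracking in Skill 4.3.2 builds on.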
## Master Mind Map — All 6 Skills
```mermaid
mindmap
  root((Task 4.3<br/>Monitoring<br/>GenAI Systems))
    **4.3.1 Holistic Observability**
      Metrics Collection
        Infrastructure Metrics
        Service Metrics
        FM Operational Metrics
      Performance Tracing
        X-Ray Distributed Traces
        FM Interaction Traces
        Trace-to-Business Correlation
      Business Impact
        Revenue Attribution
        Conversion Tracking
        CSAT Correlation
      Custom Dashboards
        Executive Layer
        Operational Layer
        Debug Layer
    **4.3.2 GenAI Monitoring**
      Token Usage & Cost
        Per-Model Tracking
        Per-Intent Tracking
        Cost Anomaly CUSUM
      Prompt Effectiveness
        Version Registry
        A/B Comparison
        Effectiveness Scoring
      Hallucination & Quality
        Golden Dataset Scoring
        Quality Dimensions
        Degradation Alerting
      Anomaly Detection
        Token Burst Patterns
        Response Drift
        Isolation Forest
      Bedrock Invocation Logs
        Log Pipeline to Athena
        Request/Response Analysis
        Performance Benchmarks
    **4.3.3 Integrated Observability**
      Operational Dashboards
        Real-Time Widgets
        Alert Rules
        SLA Tracking
      Business Visualizations
        Revenue Attribution
        Funnel Analysis
        QuickSight Dashboards
      Compliance Monitoring
        PII Detection Pipeline
        Content Policy Tracking
        GDPR/SOC2 Evidence
      Forensic Traceability
        Immutable Audit Logs
        Request Replay
        Hash-Chain Integrity
      User & Model Behavior
        Session Analytics
        Behavior Clustering
        Abandonment Analysis
    **4.3.4 Tool Performance**
      Call Pattern Tracking
        Invocation Frequency
        Call Chain Analysis
        Dependency Mapping
      Performance Metrics
        Per-Tool Latency
        Success/Timeout Rate
        Parameter Accuracy
      Multi-Agent Coordination
        Handoff Tracing
        Context Transfer
        Overhead Metrics
      Usage Baselines
        Rolling Averages
        Anomaly Types
        Health State Machine
    **4.3.5 Vector Store Ops**
      Performance Monitoring
        Query Latency p50/p95/p99
        Throughput Metrics
        Index Health
      Index Optimization
        HNSW Auto-Tuning
        Segment Compaction
        Benchmark-Driven
      Data Quality
        Embedding Freshness
        Stale Content Detection
        Quality Scoring
    **4.3.6 FM Troubleshooting**
      Golden Datasets
        Stratified Design
        Automated Scoring
        Hallucination Detection
      Output Diffing
        Text Diff
        Semantic Diff
        Structured Field Diff
      Reasoning Tracing
        CoT Step Extraction
        Logical Error Detection
        Contradiction Finder
      Specialized Pipelines
        FM vs Traditional ML
        Pluggable Scoring
        Drift Detection
```
## Skill-to-Folder Mapping
| AWS Skill | Folder | Focus | Files |
|---|---|---|---|
| 4.3.1 Holistic Observability | `Skill-4.3.1-Holistic-Observability/` | Complete visibility — metrics, tracing, business impact, dashboards | 7 |
| 4.3.2 GenAI Monitoring | `Skill-4.3.2-GenAI-Monitoring/` | Proactive issue detection — tokens, hallucinations, anomalies, Bedrock logs | 8 |
| 4.3.3 Integrated Observability | `Skill-4.3.3-Integrated-Observability/` | Actionable insights — dashboards, compliance, forensics, behavior tracking | 8 |
| 4.3.4 Tool Performance | `Skill-4.3.4-Tool-Performance-Frameworks/` | Tool operation — call patterns, metrics, multi-agent, baselines | 7 |
| 4.3.5 Vector Store Ops | `Skill-4.3.5-Vector-Store-Operations/` | Vector DB — performance, index optimization, data quality | 6 |
| 4.3.6 FM Troubleshooting | `Skill-4.3.6-FM-Troubleshooting-Frameworks/` | FM-specific failures — hallucinations, diffing, reasoning, pipelines | 7 |
**Total:** 44 files (43 across the six skill folders, plus this README)
## Architecture Overview — How the 6 Skills Interconnect
```mermaid
graph TB
    subgraph "Data Sources"
        BEDROCK[Amazon Bedrock<br/>Model Invocations]
        OPENSEARCH[OpenSearch Serverless<br/>Vector Store]
        ECS[ECS Fargate<br/>Orchestrator]
        APIGW[API Gateway<br/>WebSocket]
        DYNAMO[DynamoDB<br/>Sessions/Products]
        GUARDRAILS[Bedrock Guardrails]
    end
    subgraph "4.3.1 — Holistic Observability Layer"
        XRAY[X-Ray Traces]
        CW_METRICS[CloudWatch Metrics]
        CW_LOGS[CloudWatch Logs]
        EVENTS[EventBridge Events]
    end
    subgraph "4.3.2 — GenAI Monitoring Engine"
        TOKEN_MON[Token/Cost Monitor]
        QUALITY_MON[Quality Scorer]
        ANOMALY[Anomaly Detector]
        INVOC_LOGS[Invocation Log Analyzer]
    end
    subgraph "4.3.4 — Tool Performance"
        TOOL_TRACK[Tool Call Tracker]
        AGENT_TRACE[Agent Coordinator Tracer]
    end
    subgraph "4.3.5 — Vector Store Ops"
        VEC_MON[Vector DB Monitor]
        IDX_OPT[Index Optimizer]
        DATA_QUAL[Data Quality Validator]
    end
    subgraph "4.3.6 — FM Troubleshooting"
        GOLDEN[Golden Dataset Scorer]
        DIFFER[Output Differ]
        REASON[Reasoning Tracer]
    end
    subgraph "4.3.3 — Integrated Observability Platform"
        OPS_DASH[Operational Dashboards]
        BIZ_DASH[Business Impact Viz]
        COMPLIANCE[Compliance Monitor]
        AUDIT[Forensic Audit Trail]
        BEHAVIOR[User/Model Behavior]
    end
    BEDROCK --> XRAY & CW_LOGS & INVOC_LOGS
    OPENSEARCH --> VEC_MON
    ECS --> XRAY & CW_METRICS
    APIGW --> CW_METRICS & CW_LOGS
    DYNAMO --> CW_METRICS
    GUARDRAILS --> CW_LOGS & COMPLIANCE
    XRAY --> TOKEN_MON & TOOL_TRACK
    CW_METRICS --> ANOMALY & OPS_DASH
    CW_LOGS --> QUALITY_MON & AUDIT
    INVOC_LOGS --> ANOMALY & GOLDEN
    TOKEN_MON --> OPS_DASH & BIZ_DASH
    QUALITY_MON --> OPS_DASH & GOLDEN
    ANOMALY --> OPS_DASH & COMPLIANCE
    TOOL_TRACK --> AGENT_TRACE & OPS_DASH
    VEC_MON --> IDX_OPT & OPS_DASH
    DATA_QUAL --> GOLDEN & COMPLIANCE
    GOLDEN --> DIFFER --> REASON
    REASON --> BEHAVIOR
    style BEDROCK fill:#ff6b35,color:#fff
    style OPENSEARCH fill:#ff6b35,color:#fff
    style ECS fill:#ff6b35,color:#fff
    style OPS_DASH fill:#2ecc71,color:#fff
    style BIZ_DASH fill:#2ecc71,color:#fff
    style COMPLIANCE fill:#2ecc71,color:#fff
    style AUDIT fill:#2ecc71,color:#fff
    style ANOMALY fill:#e74c3c,color:#fff
    style GOLDEN fill:#e74c3c,color:#fff
```
## File Index by Skill

### Skill 4.3.1 — Holistic Observability
| # | File | Description |
|---|---|---|
| 01 | `01-observability-architecture.md` | 4 pillars of GenAI observability, architecture diagram, mind map |
| 02 | `02-operational-metrics-deep-dive.md` | Infrastructure, service, and FM operational metrics with CloudWatch publishers |
| 03 | `03-performance-tracing-fm-interaction.md` | X-Ray distributed tracing, FM interaction traces, trace-to-business correlation |
| 04 | `04-business-impact-metrics.md` | FM quality → business KPI attribution, revenue impact tracking |
| 05 | `05-custom-dashboards-design.md` | CloudWatch/Grafana dashboard specs, JSON templates, wireframes |
| 06 | `06-scenarios-and-runbooks.md` | 5 MangaAssist production scenarios with decision trees |
| 07 | `07-3d-visualizations.md` | Plotly 3D surface/scatter + Three.js service dependency graph |
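The FM operational metrics above ultimately have to land in CloudWatch. One low-overhead route is the Embedded Metric Format (EMF): the application prints a structured JSON log line, and CloudWatch extracts metrics from it without explicit `PutMetricData` calls. A minimal sketch (the namespace, dimension, and metric names are illustrative, not taken from the folder's files):

```python
import json
import time

def emf_record(namespace: str, dimensions: dict, metrics: dict) -> str:
    """Build a CloudWatch Embedded Metric Format (EMF) log line.

    Dimension values and metric values appear as top-level keys; the
    `_aws` block tells CloudWatch how to extract them. Unit is fixed to
    Milliseconds here for simplicity.
    """
    doc = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions.keys())],
                "Metrics": [{"Name": name, "Unit": "Milliseconds"} for name in metrics],
            }],
        },
        **dimensions,
        **metrics,
    }
    return json.dumps(doc)

# Hypothetical dimension/metric names for the MangaAssist orchestrator:
print(emf_record("MangaAssist/FM", {"Model": "claude-3-haiku"}, {"InvocationLatency": 412}))
```

Emitting one such line per invocation gives per-model latency metrics for free wherever EMF extraction is enabled (e.g. a Lambda or ECS log group).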
### Skill 4.3.2 — GenAI Monitoring
| # | File | Description |
|---|---|---|
| 01 | `01-monitoring-architecture.md` | CloudWatch-centric monitoring stack, Bedrock logging pipeline, mind map |
| 02 | `02-token-usage-cost-monitoring.md` | Per-model/intent token tracking, CUSUM cost anomaly detection |
| 03 | `03-prompt-effectiveness-tracking.md` | Prompt version registry, A/B metrics, effectiveness scoring |
| 04 | `04-hallucination-response-quality.md` | Hallucination rate pipeline, multi-dimensional quality scoring |
| 05 | `05-anomaly-detection-systems.md` | Token burst detection, response drift, Z-score + Isolation Forest |
| 06 | `06-bedrock-invocation-log-analysis.md` | Model Invocation Logs → Athena pipeline, pre-built queries |
| 07 | `07-scenarios-and-runbooks.md` | 5 MangaAssist anomaly detection scenarios |
| 08 | `08-3d-visualizations.md` | Plotly token 3D surface + Three.js anomaly stream viz |
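The CUSUM cost anomaly detection named in file 02 can be illustrated with a short sketch. This is a generic one-sided CUSUM, not the folder's implementation; the target, slack `k`, and threshold `h` values are illustrative:

```python
def cusum_alerts(values, target, k, h):
    """One-sided CUSUM: accumulate deviations above target beyond slack k;
    flag the index whenever the cumulative sum exceeds threshold h, then
    reset. Returns the list of alert indices."""
    s, alerts = 0.0, []
    for i, x in enumerate(values):
        s = max(0.0, s + (x - target - k))
        if s > h:
            alerts.append(i)
            s = 0.0
    return alerts

# Hourly FM spend hovers near $10, then drifts upward:
costs = [10, 10.2, 9.8, 10.1, 12, 13, 14, 15]
print(cusum_alerts(costs, target=10, k=0.5, h=5))  # → [6]
```

CUSUM catches slow drifts that a fixed threshold would miss, because small sustained excesses accumulate instead of being evaluated point by point.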
### Skill 4.3.3 — Integrated Observability
| # | File | Description |
|---|---|---|
| 01 | `01-integrated-observability-architecture.md` | Unified platform design, multi-stakeholder dashboards, mind map |
| 02 | `02-operational-dashboards.md` | Real-time widget specs, CloudWatch JSON, Grafana alert rules |
| 03 | `03-business-impact-visualizations.md` | Revenue attribution, conversion funnel overlay, QuickSight |
| 04 | `04-compliance-monitoring.md` | PII detection, content policy tracking, GDPR/SOC2 evidence |
| 05 | `05-forensic-traceability-audit.md` | Immutable audit logs, request replay, hash-chain integrity |
| 06 | `06-user-interaction-model-behavior.md` | Session analytics, behavior clustering, abandonment analysis |
| 07 | `07-scenarios-and-runbooks.md` | 5 compliance/audit/forensic scenarios |
| 08 | `08-3d-visualizations.md` | Three.js user journey 3D + Plotly compliance heatmap |
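The hash-chain integrity idea from file 05 can be sketched in a few lines: each audit entry stores the hash of the previous entry's hash plus its own payload, so any retroactive edit breaks verification from that point on. Function and field names here are illustrative, not the folder's API:

```python
import hashlib
import json

def append_entry(chain, record):
    """Append a record; the entry's hash covers the previous hash + payload."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    payload = json.dumps(record, sort_keys=True)  # deterministic serialization
    entry_hash = hashlib.sha256((prev + payload).encode()).hexdigest()
    chain.append({"prev": prev, "record": record, "hash": entry_hash})
    return chain

def verify_chain(chain):
    """Recompute every hash; any tampered record breaks the chain."""
    prev = "0" * 64
    for e in chain:
        payload = json.dumps(e["record"], sort_keys=True)
        if e["prev"] != prev or e["hash"] != hashlib.sha256((prev + payload).encode()).hexdigest():
            return False
        prev = e["hash"]
    return True

log = []
append_entry(log, {"request_id": "r-1", "action": "invoke_model"})
append_entry(log, {"request_id": "r-2", "action": "tool_call"})
print(verify_chain(log))               # True
log[0]["record"]["action"] = "tampered"
print(verify_chain(log))               # False
```

In a production forensic trail the chain head would additionally be anchored somewhere append-only (e.g. an object-lock bucket) so the whole chain cannot be silently rewritten.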
### Skill 4.3.4 — Tool Performance Frameworks

| # | File | Description |
|---|---|---|
| 01 | `01-tool-performance-architecture.md` | Tool calling monitoring framework, mind map, MangaAssist tool inventory |
| 02 | `02-call-pattern-tracking.md` | Invocation frequency, call chain analysis, dependency mapping |
| 03 | `03-tool-performance-metrics.md` | Per-tool latency/success/accuracy, instrumentation decorator |
| 04 | `04-multi-agent-coordination.md` | Agent handoff tracing, coordination overhead, span-based tracking |
| 05 | `05-usage-baselines-anomaly-detection.md` | Rolling baseline modeling, anomaly types, health state machine |
| 06 | `06-scenarios-and-runbooks.md` | 5 tool/agent failure scenarios |
| 07 | `07-3d-visualizations.md` | Three.js multi-agent graph + Plotly tool 3D scatter |
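File 03 mentions an instrumentation decorator for per-tool latency and success metrics. A minimal in-process version might look like the following; the registry dict and tool name are hypothetical, and a production version would publish to CloudWatch instead of accumulating locally:

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-process registry; stands in for a metrics backend.
TOOL_STATS = defaultdict(lambda: {"calls": 0, "errors": 0, "latency_ms": []})

def instrument_tool(name):
    """Decorator: record call count, error count, and latency per tool."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception:
                TOOL_STATS[name]["errors"] += 1
                raise
            finally:
                # Count every attempt, successful or not.
                TOOL_STATS[name]["calls"] += 1
                TOOL_STATS[name]["latency_ms"].append((time.perf_counter() - start) * 1000)
        return inner
    return wrap

@instrument_tool("product_search")
def product_search(query):
    return [f"result for {query}"]

product_search("one piece vol 1")
print(TOOL_STATS["product_search"]["calls"])  # → 1
```

Success rate then falls out as `1 - errors / calls`, and the latency list feeds the same percentile machinery used for vector store monitoring.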
### Skill 4.3.5 — Vector Store Operations
| # | File | Description |
|---|---|---|
| 01 | `01-vector-store-ops-architecture.md` | Vector DB monitoring framework, comparison matrix, mind map |
| 02 | `02-vector-db-performance-monitoring.md` | Query latency p50/p95/p99, throughput, connection monitoring |
| 03 | `03-automated-index-optimization.md` | HNSW auto-tuning, segment compaction, benchmark-driven selection |
| 04 | `04-data-quality-validation.md` | Embedding freshness, stale content, quality scoring |
| 05 | `05-scenarios-and-runbooks.md` | 5 vector store operational scenarios |
| 06 | `06-3d-visualizations.md` | Plotly 3D embedding space + Three.js HNSW layer viz |
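The p50/p95/p99 latency tracking in file 02 reduces to a percentile computation over a sliding window of query timings. A nearest-rank sketch with no external dependencies (a real monitor would use CloudWatch percentile statistics or `numpy.percentile`):

```python
def percentile(samples, p):
    """Nearest-rank percentile (p in [0, 100]) over a latency window."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, -(-p * len(ordered) // 100))  # ceil(p * n / 100), at least 1
    return ordered[rank - 1]

# One slow outlier dominates the tail percentiles but not the median:
latencies_ms = [12, 15, 11, 240, 14, 13, 16, 18, 13, 12]
print(percentile(latencies_ms, 50), percentile(latencies_ms, 95), percentile(latencies_ms, 99))
# → 13 240 240
```

This is why the folder tracks p95/p99 alongside p50: the median hides exactly the tail behavior that degrades the chat experience.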
### Skill 4.3.6 — FM Troubleshooting Frameworks
| # | File | Description |
|---|---|---|
| 01 | `01-troubleshooting-architecture.md` | FM failure taxonomy, state diagram, traditional ML comparison |
| 02 | `02-golden-datasets-hallucination.md` | Golden dataset design, automated hallucination scoring pipeline |
| 03 | `03-output-diffing-consistency.md` | Text/semantic/structured diff, temporal stability analysis |
| 04 | `04-reasoning-path-tracing.md` | CoT extraction, logical error detection, contradiction finder |
| 05 | `05-specialized-observability-pipelines.md` | FM-specific vs traditional ML pipelines, pluggable scoring |
| 06 | `06-scenarios-and-runbooks.md` | 5 FM troubleshooting scenarios |
| 07 | `07-3d-visualizations.md` | Plotly reasoning tree 3D + Three.js failure mode viz |
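The golden-dataset scoring loop from file 02 can be sketched as follows. Lexical similarity via `difflib` stands in for the semantic scoring a real pipeline would use, and the dataset entry, threshold, and function names are toy assumptions:

```python
from difflib import SequenceMatcher

# Toy golden dataset; real entries pair prompts with vetted reference answers.
GOLDEN = [
    {"prompt": "Return policy?", "reference": "Returns are accepted within 30 days."},
]

def score_against_golden(generate, threshold=0.6):
    """Run a model function over the golden set and flag low-similarity
    outputs as regression candidates. Lexical ratio is a crude stand-in
    for embedding-based semantic scoring."""
    failures = []
    for case in GOLDEN:
        output = generate(case["prompt"])
        sim = SequenceMatcher(None, output.lower(), case["reference"].lower()).ratio()
        if sim < threshold:
            failures.append({"prompt": case["prompt"], "similarity": round(sim, 2)})
    return failures

# A stubbed "model" standing in for a Bedrock call:
print(score_against_golden(lambda p: "Returns are accepted within 30 days."))  # → []
```

Running this on every prompt or model change turns hallucination detection into a regression test rather than a production surprise.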
## How to Use This Folder

### For AWS AIP-C01 Exam Prep
- Start with this README for the complete overview
- Read each `01-*-architecture.md` for conceptual understanding
- Study the scenario files (`06/07-scenarios-and-runbooks.md`) for applied knowledge
- Review the mind maps in the visualization files for quick revision
### For Production Implementation
- Start with `Skill-4.3.1/01-observability-architecture.md` for the foundational design
- Implement the `Skill-4.3.2` monitoring components (token tracking, anomaly detection)
- Build the integrated dashboards from `Skill-4.3.3`
- Add tool and vector store monitoring from `Skill-4.3.4` and `Skill-4.3.5`
- Establish troubleshooting frameworks from `Skill-4.3.6`
### For Interview Preparation
- Focus on architecture files and scenarios
- Practice explaining the mind maps verbally
- Know the MangaAssist scenarios — they demonstrate real-world application
- Understand tradeoffs (latency vs completeness, cost vs coverage)
## Cross-References to Existing Content

The folders below contain complementary material. This folder builds on top of them and does not duplicate their content.
| Existing Folder | Relationship | Key Files |
|---|---|---|
| `Debugging/` | Foundation — logging infrastructure that monitoring systems consume | `01-bedrock-logging.md`, `02-application-logging.md` |
| `Troubleshoot-GenAI-Applications/` | Sibling — troubleshooting playbooks that monitoring systems trigger | `04-retrieval-system-troubleshooting.md`, `02-fm-integration-troubleshooting.md` |
| `Evaluation-Systems-GenAI/` | Upstream — evaluation metrics that monitoring systems track continuously | `07-agent-performance-framework.md`, `08-reporting-visualization-systems.md` |
| `LLMOps/` | Lifecycle — LLMOps processes that monitoring systems support | `llmops-user-stories.md` |
| Root `13-metrics.md` | Foundation — business/operational metric definitions | Full file |
## MangaAssist System Context
All scenarios in this folder reference the MangaAssist e-commerce chatbot:
```mermaid
graph LR
    USER[Customer] --> APIGW[API Gateway<br/>WebSocket]
    APIGW --> ECS[ECS Fargate<br/>Orchestrator]
    ECS --> BEDROCK[Bedrock Claude 3<br/>Sonnet/Haiku]
    ECS --> OPENSEARCH[OpenSearch<br/>Serverless<br/>Vector Store]
    ECS --> DYNAMO[DynamoDB<br/>Sessions/Products]
    ECS --> GUARD[Bedrock<br/>Guardrails]
    BEDROCK --> ECS
    OPENSEARCH --> ECS
    ECS --> APIGW --> USER
    style BEDROCK fill:#ff9900,color:#000
    style OPENSEARCH fill:#ff9900,color:#000
    style ECS fill:#ff9900,color:#000
    style DYNAMO fill:#ff9900,color:#000
```
**Components:** API Gateway (WebSocket) → ECS Fargate (orchestrator) → Bedrock Claude 3 Sonnet (complex) / Haiku (simple) → OpenSearch Serverless (product embeddings) → DynamoDB (sessions, products, orders) → Bedrock Guardrails (content filtering)
## Technology Stack for Visualizations
All visualization files include runnable code examples using:
| Library | Version | Purpose |
|---|---|---|
| Plotly (Python) | 5.x | Interactive 3D surface plots, scatter plots, heatmaps |
| Three.js (JavaScript) | r150+ | Browser-based 3D rendering — service graphs, agent networks, embedding spaces |
| Mermaid | 10.x | In-markdown diagrams — flowcharts, sequence, mindmaps, state, class |
| NumPy | 1.24+ | Data generation for Plotly visualizations |
| Pandas | 2.0+ | Data manipulation for metric analysis |
Install the Python dependencies:

```bash
pip install plotly pandas numpy scikit-learn
```