4. Debugging Quick Reference

This is the scan-first guide for MangaAssist incidents. Use it to decide where to look before diving into the longer documents.

  • Bedrock-specific analysis: 01-bedrock-logging.md
  • Full application trace path: 02-application-logging.md
  • Interview narratives and root-cause stories: 03-debugging-scenarios.md

Triage Matrix

Each symptom lists the first, second, and third place to check.

  • Wrong answer: classifier output and confidence → retrieval chunks and reranker scores → Bedrock prompt plus raw output
  • Slow answer: orchestrator latency breakdown → Bedrock latency and throttling → Lambda cold start or ECS saturation
  • No answer: API Gateway status and backend exceptions → WebSocket disconnects → Bedrock timeout or fallback rate
  • Blocked answer: guardrail trace and block category → prompt wording for risky phrasing → intent-specific false-positive rate
  • Stale answer: chunk last_updated metadata → catalog freshness → reindex or refresh pipeline
  • Misrouted answer: classifier confusion → rule-based prefilter coverage → shadow-mode regression comparison

First Three Places To Check by Symptom

If the answer is wrong

  1. Check classifier prediction and confidence.
  2. Check retrieved chunks, reranker scores, and source freshness.
  3. Check Bedrock prompt payload, raw output, and post-generation validation.
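
For a single bad answer, pull that request's structured log events and read all three signals side by side. A minimal sketch, assuming the orchestrator writes JSON logs with the field names used by the queries later in this file; the request ID and time window are placeholders.

# Placeholder request ID; --start-time is epoch milliseconds (GNU date shown).
aws logs filter-log-events \
  --log-group-name /aws/lambda/mangaassist-orchestrator \
  --filter-pattern '{ $.request_id = "req-123" }' \
  --start-time $(date -d '-1 hour' +%s000)

The matching events should carry the intent, intent_confidence, retrieved_chunk_ids, and prompt payload for that request.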

If the answer is slow

  1. Check orchestrator trace for the slowest span.
  2. Check Bedrock invocation latency and throttling rate.
  3. Check runtime overhead such as Lambda cold starts, ECS saturation, or open circuit breakers.
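
When the slow span points at the model, the per-model runtime metrics in the AWS/Bedrock namespace confirm it from the CloudWatch side. A sketch with a placeholder model ID; the time window mirrors the CLI examples below.

# InvocationLatency is reported in milliseconds.
aws cloudwatch get-metric-statistics \
  --namespace AWS/Bedrock \
  --metric-name InvocationLatency \
  --dimensions Name=ModelId,Value=anthropic.claude-3-haiku-20240307-v1:0 \
  --start-time 2026-03-22T17:00:00Z --end-time 2026-03-22T18:00:00Z \
  --period 60 --extended-statistics p99

# Throttle counts over the same window; a sustained rise past 5% of invocations is incident-worthy.
aws cloudwatch get-metric-statistics \
  --namespace AWS/Bedrock \
  --metric-name InvocationThrottles \
  --dimensions Name=ModelId,Value=anthropic.claude-3-haiku-20240307-v1:0 \
  --start-time 2026-03-22T17:00:00Z --end-time 2026-03-22T18:00:00Z \
  --period 60 --statistics Sum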

If there is no answer

  1. Check API Gateway request acceptance and response status.
  2. Check orchestrator exceptions or early fallback behavior.
  3. Check WebSocket delivery logs and disconnect reasons.
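
Before reading application logs, confirm whether the gateway is surfacing errors at all. A sketch for a REST API with a placeholder ApiName; the 4XXError and 5XXError pair applies to REST APIs, while a WebSocket API reports a different metric set in the same namespace.

# Repeat with 4XXError to separate client rejections from backend failures.
aws cloudwatch get-metric-statistics \
  --namespace AWS/ApiGateway \
  --metric-name 5XXError \
  --dimensions Name=ApiName,Value=mangaassist-api \
  --start-time 2026-03-22T17:00:00Z --end-time 2026-03-22T18:00:00Z \
  --period 60 --statistics Sum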

If the answer is blocked or evasive

  1. Check Bedrock guardrail category and confidence.
  2. Check whether the content is a domain false positive.
  3. Check whether the fallback text replaced an otherwise valid raw answer.
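
Pair the blocked category from the logs with the guardrail configuration itself to see which policy a manga-domain term tripped. A sketch, assuming managed Bedrock guardrails; the identifier and version are placeholders.

# List guardrails to find the identifier, then inspect the configured topic, content, and word filters.
aws bedrock list-guardrails
aws bedrock get-guardrail \
  --guardrail-identifier mangaassist-guardrail-id \
  --guardrail-version DRAFT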

If the answer is stale

  1. Check last_updated on selected chunks.
  2. Check product catalog freshness and event propagation.
  3. Check whether reindexing or cache warming affected retrieval.
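
If retrieval runs on a Bedrock knowledge base, the ingestion-job history shows whether the refresh path is lagging behind the catalog. This is an assumption about the stack, and the knowledge base and data source IDs are placeholders.

# A stale or failed most-recent job means current catalog data never reached retrieval.
aws bedrock-agent list-ingestion-jobs \
  --knowledge-base-id KB1234567890 \
  --data-source-id DS1234567890 \
  --max-results 5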

Dashboard Checklist

Dependency Health

  • API Gateway 4xx and 5xx
  • circuit breaker open events
  • DynamoDB throttle events
  • OpenSearch latency and error spikes

Model Health

  • TTFT P50 and P99
  • Bedrock throttling events
  • input and output token trends
  • timeout and fallback rate

Retrieval Health

  • Recall@3 trend
  • high-frequency chunk dominance
  • stale chunk selection
  • reranker score drift

Guardrail Behavior

  • guardrail block rate
  • false positives by intent or genre
  • ASIN validation failures
  • price validation failures

Business Impact

  • escalation rate
  • thumbs-down trend
  • conversion rate for chat users
  • support deflection trend

Thresholds Worth Memorizing

These are the most useful numbers to remember from the repo:

  • P99 first-token latency: < 1.5s target
  • P99 full response latency: < 3s target
  • Error rate: < 0.5% target
  • Intent accuracy: > 90% target
  • Hallucination rate: < 2% target
  • Guardrail block rate: < 5% target
  • LLM timeout rate: > 5% is incident-worthy
  • Bedrock throttling: > 5% triggers a response action
  • Guardrail pass rate in eval: >= 95%
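
The Bedrock throttling threshold is the easiest of these to wire into an alarm rather than memorize. A hedged sketch using metric math over the AWS/Bedrock runtime metrics; the alarm name, model ID, and five-minute period are placeholders rather than values from the repo.

# Alarms when throttled invocations exceed 5% of total invocations for three consecutive periods.
aws cloudwatch put-metric-alarm \
  --alarm-name mangaassist-bedrock-throttle-rate \
  --comparison-operator GreaterThanThreshold \
  --threshold 5 \
  --evaluation-periods 3 \
  --metrics '[
    {"Id": "throttles", "ReturnData": false,
     "MetricStat": {"Metric": {"Namespace": "AWS/Bedrock", "MetricName": "InvocationThrottles",
      "Dimensions": [{"Name": "ModelId", "Value": "anthropic.claude-3-haiku-20240307-v1:0"}]},
      "Period": 300, "Stat": "Sum"}},
    {"Id": "invocations", "ReturnData": false,
     "MetricStat": {"Metric": {"Namespace": "AWS/Bedrock", "MetricName": "Invocations",
      "Dimensions": [{"Name": "ModelId", "Value": "anthropic.claude-3-haiku-20240307-v1:0"}]},
      "Period": 300, "Stat": "Sum"}},
    {"Id": "throttle_rate", "Label": "Bedrock throttle rate (%)",
     "Expression": "100 * throttles / invocations", "ReturnData": true}
  ]'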

Reusable CloudWatch Logs Insights Queries

Bedrock failures

fields @timestamp, request_id, model_id, invocation_status, error_code, bedrock_latency_ms
| filter invocation_status in ["timeout", "throttled", "error"]
| sort @timestamp desc
| limit 50

Guardrail block concentration

filter ispresent(guardrail_category)
| stats count(*) as blocked by intent, guardrail_category
| sort blocked desc

Token inflation by intent

stats avg(input_tokens) as avg_in, avg(output_tokens) as avg_out by intent, model_id
| sort avg_out desc

Frequently retrieved chunks

fields retrieved_chunk_ids
| unnest retrieved_chunk_ids as chunk_id
| stats count(*) as retrieval_count by chunk_id
| sort retrieval_count desc
| limit 20

Misrouting investigation

fields @timestamp, request_id, intent, intent_confidence, fallback_used
| filter intent_confidence < 0.70
| sort @timestamp desc
| limit 100
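
These queries can be run without the console: the CLI starts a query asynchronously and returns a query ID to poll. A sketch using the Bedrock failures query above; the log group matches the CLI examples below and the three-hour window is a placeholder (GNU date shown).

QUERY_ID=$(aws logs start-query \
  --log-group-name /aws/bedrock/mangaassist \
  --start-time $(date -d '-3 hours' +%s) \
  --end-time $(date +%s) \
  --query-string 'fields @timestamp, request_id, model_id, invocation_status, error_code, bedrock_latency_ms
| filter invocation_status in ["timeout", "throttled", "error"]
| sort @timestamp desc
| limit 50' \
  --query queryId --output text)

# Poll until the query status is Complete, then read the rows.
aws logs get-query-results --query-id "$QUERY_ID"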

Useful AWS CLI Checks

These are interview-friendly examples, not environment-specific production commands.

Tail recent Lambda orchestrator logs

aws logs tail /aws/lambda/mangaassist-orchestrator --since 15m --follow

Search a Bedrock log group for one request ID

aws logs filter-log-events --log-group-name /aws/bedrock/mangaassist --filter-pattern '"req-123"'

Inspect a SageMaker endpoint's health

# Percentiles go through --extended-statistics; request Average in a separate call with --statistics.
aws cloudwatch get-metric-statistics \
  --namespace AWS/SageMaker \
  --metric-name ModelLatency \
  --dimensions Name=EndpointName,Value=mangaassist-intent-classifier Name=VariantName,Value=AllTraffic \
  --start-time 2026-03-22T17:00:00Z --end-time 2026-03-22T18:00:00Z \
  --period 60 --extended-statistics p99

Look at recent DynamoDB throttling or latency metrics

# Latency is reported per operation; swap the metric for ThrottledRequests to check throttling instead.
aws cloudwatch get-metric-statistics \
  --namespace AWS/DynamoDB \
  --metric-name SuccessfulRequestLatency \
  --dimensions Name=TableName,Value=MangaAssistSessions Name=Operation,Value=GetItem \
  --start-time 2026-03-22T17:00:00Z --end-time 2026-03-22T18:00:00Z \
  --period 60 --statistics Average Maximum

Interview Cheat Sheet

Bedrock throttle incident

I separated upstream fan-out latency from Bedrock latency, found throttling above the 5% threshold, and shifted more traffic to lighter-weight or non-LLM paths until the spike passed.

Stale RAG incident

I proved the catalog was current, then used retrieval traces to show stale chunks were being selected because the knowledge-base refresh path lagged behind.

Misrouting incident

I traced recommendation-like prompts into the classifier, found they were being labeled as product_question, and corrected it with more training data plus route validation.

Guardrail over-blocking incident

I inspected blocked samples, found manga-domain terms were being treated as unsafe, and tuned guardrails toward context-aware confidence scoring instead of rigid binary blocking.

Price hallucination incident

I confirmed the source catalog data was correct, then showed the model invented a discount narrative from dense comparative context and fixed it with stricter prompting plus hard validation.


Usage Guidance

Use this file when you need to decide where to start.

Use 01-bedrock-logging.md when the issue is likely inside generation, retrieval visibility, or guardrails.

Use 02-application-logging.md when the issue may be anywhere in the request path.

Use 03-debugging-scenarios.md when you want a full interview answer with context, root cause, fix, and prevention.