
MangaAssist Architecture — Deep Dive Series

A narrative-driven walk through the MangaAssist chatbot architecture. Start with the story, then drill into any layer.

The series intentionally focuses on the agent layer and cross-cutting concerns. The seven MCP servers themselves already have detailed deep dives under ../RAG-MCP-Integration/ — those are referenced rather than duplicated.

Convention: every agent dive ends with a Constraint Sanity Check

Architecture docs tend to quote optimistic latency budgets, cache hit rates, and throughput targets that don't survive load testing. To resist that drift, every agent deep dive in this series ends with a Validation: Constraint Sanity Check section that:

  • Lists each claimed metric (latency, cache hit rate, iteration limits, throughput)
  • Assigns each a verdict: Realistic / Aggressive / Inconsistent / Unrealistic
  • Shows the math or component breakdown that justifies the verdict
  • Flags inconsistencies between metrics that look fine in isolation but contradict each other
  • Recommends honest revisions where the original number doesn't hold

The point isn't to tear down the architecture — it's to separate whiteboard targets from measured behavior. If a number came from a load test, link the test. If it came from a whiteboard, label it that way.
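The "show the math" step can be as simple as summing a latency budget's components against the claimed end-to-end number. A minimal sketch of that check, with hypothetical component names and numbers (not measured MangaAssist figures):

```python
def sanity_check(claimed_p50_ms: float, components_ms: dict[str, float]) -> str:
    """Compare a claimed end-to-end latency against the sum of its parts."""
    total = sum(components_ms.values())
    if total <= claimed_p50_ms:
        return f"Realistic: components sum to {total} ms <= claimed {claimed_p50_ms} ms"
    return f"Inconsistent: components sum to {total} ms > claimed {claimed_p50_ms} ms"

# Illustrative numbers only: a whiteboard 800 ms target vs. a plausible breakdown.
verdict = sanity_check(
    claimed_p50_ms=800,
    components_ms={"llm_first_token": 400, "tool_call": 250, "llm_final": 300},
)
```

This is the flavor of arithmetic each Validation section walks through by hand; metrics that each look fine in isolation get caught when the components are forced to add up.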


Reading order

  #   Document                  What you'll learn
  00  The Story                 One query end-to-end. Read first.
  01  Orchestrator Agent        The supervisor — lifecycle, system prompt, tool dispatch, stopping conditions
  02  ProductSearchAgent        Catalog discovery sub-agent backed by OpenSearch
  03  OrderStatusAgent          Order tracking and inventory sub-agent backed by RDS + ElastiCache
  04  RecommendationAgent       Personalization sub-agent backed by DynamoDB + Personalize vectors
  05  MangaQAAgent              Editorial Q&A sub-agent backed by OpenSearch + S3
  06  Tool Dispatch & Routing   Why the router IS the LLM. Tool description engineering. Parallel vs. sequential dispatch.
  07  Failure Handling          Circuit breakers, retries, stopping conditions, graceful degradation
  08  Memory Architecture       Three-tier memory: conversation, session, long-term. Summarization. Prompt caching.
  09  Escalation Workflow       When and how to hand off to a human via Amazon Connect

How the layers fit

        ┌──────────────────────────────────┐
        │  Orchestrator (Claude 3.5)       │  ← reasoning, no knowledge
        │  ┌────────────────────────────┐  │
        │  │  4 Sub-Agents              │  │  ← bounded domains
        │  │  ┌──────────────────────┐  │  │
        │  │  │  7 RAG-MCP servers   │  │  │  ← knowledge, no reasoning
        │  │  └──────────────────────┘  │  │
        │  └────────────────────────────┘  │
        └──────────────────────────────────┘
  • Orchestrator — reasons, does not know. Holds the system prompt, tool manifests, conversation context.
  • Sub-Agents — bounded specialists. Each owns one domain. They report up, never sideways.
  • RAG-MCP Servers — pure knowledge retrieval. Each is a self-contained RAG pipeline.
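The "report up, never sideways" rule means every sub-agent result flows back through the orchestrator; no specialist ever calls another specialist. A hypothetical sketch of that routing shape (agent/domain names mirror the table above; the dispatch function is illustrative, not the real codebase):

```python
# Each sub-agent owns exactly one domain; the mapping is the whole routing table.
SUB_AGENTS = {
    "ProductSearchAgent": "catalog",
    "OrderStatusAgent": "orders",
    "RecommendationAgent": "personalization",
    "MangaQAAgent": "editorial-qa",
}

def dispatch(query_domain: str) -> str:
    """Route a query to the single sub-agent owning its domain.

    The result always returns to the orchestrator (report up), and
    sub-agents never appear in each other's routing tables (never sideways).
    """
    for agent, domain in SUB_AGENTS.items():
        if domain == query_domain:
            return agent
    return "orchestrator-direct"  # no specialist matched; orchestrator answers itself
```

The deliberate asymmetry is that the orchestrator holds this table but no domain knowledge, while each sub-agent holds knowledge but no routing authority.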

Source documents this series builds on