
MangaAssist Architecture — Deep Dive Series

A narrative-driven walk through the MangaAssist chatbot architecture. Start with the story, then drill into any layer.

The series intentionally focuses on the agent layer and cross-cutting concerns. The seven MCP servers themselves already have detailed deep dives under ../RAG-MCP-Integration/ — those are referenced rather than duplicated.

Convention: every agent dive ends with a Constraint Sanity Check

Architecture docs tend to quote optimistic latency budgets, cache hit rates, and throughput targets that don't survive load testing. To resist that drift, every agent deep dive in this series ends with a Validation: Constraint Sanity Check section that:

  • Lists each claimed metric (latency, cache hit rate, iteration limits, throughput)
  • Assigns each a verdict: Realistic / Aggressive / Inconsistent / Unrealistic
  • Shows the math or component breakdown that justifies the verdict
  • Flags inconsistencies between metrics that look fine in isolation but contradict each other
  • Recommends honest revisions where the original number doesn't hold

The point isn't to tear down the architecture — it's to separate whiteboard targets from measured behavior. If a number came from a load test, link the test. If it came from a whiteboard, label it that way.
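The "show the math" step can be as simple as summing a latency budget's components against the claimed end-to-end number. A minimal sketch of that check, with hypothetical component names and numbers (not measured MangaAssist figures):

```python
def sanity_check(claimed_p50_ms: float, components_ms: dict[str, float]) -> str:
    """Compare a claimed end-to-end latency against the sum of its parts."""
    total = sum(components_ms.values())
    if total <= claimed_p50_ms:
        return f"Realistic: components sum to {total} ms <= claimed {claimed_p50_ms} ms"
    return f"Inconsistent: components sum to {total} ms > claimed {claimed_p50_ms} ms"

# Illustrative numbers only: a whiteboard 800 ms target vs. a plausible breakdown.
verdict = sanity_check(
    claimed_p50_ms=800,
    components_ms={"llm_first_token": 400, "tool_call": 250, "llm_final": 300},
)
```

This is the flavor of arithmetic each Validation section walks through by hand; metrics that each look fine in isolation get caught when the components are forced to add up.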


Reading order

  #   Document                  What you'll learn
  00  The Story                 One query end-to-end. Read first.
  01  Orchestrator Agent        The supervisor — lifecycle, system prompt, tool dispatch, stopping conditions
  02  ProductSearchAgent        Catalog discovery sub-agent backed by OpenSearch
  03  OrderStatusAgent          Order tracking and inventory sub-agent backed by RDS + ElastiCache
  04  RecommendationAgent       Personalization sub-agent backed by DynamoDB + Personalize vectors
  05  MangaQAAgent              Editorial Q&A sub-agent backed by OpenSearch + S3
  06  Tool Dispatch & Routing   Why the router IS the LLM. Tool description engineering. Parallel vs. sequential dispatch.
  07  Failure Handling          Circuit breakers, retries, stopping conditions, graceful degradation
  08  Memory Architecture       Three-tier memory: conversation, session, long-term. Summarization. Prompt caching.
  09  Escalation Workflow       When and how to hand off to a human via Amazon Connect

How the layers fit

        ┌──────────────────────────────────┐
        │  Orchestrator (Claude 3.5)       │  ← reasoning, no knowledge
        │  ┌────────────────────────────┐  │
        │  │  4 Sub-Agents              │  │  ← bounded domains
        │  │  ┌──────────────────────┐  │  │
        │  │  │  7 RAG-MCP servers   │  │  │  ← knowledge, no reasoning
        │  │  └──────────────────────┘  │  │
        │  └────────────────────────────┘  │
        └──────────────────────────────────┘
  • Orchestrator — reasons, does not know. Holds the system prompt, tool manifests, conversation context.
  • Sub-Agents — bounded specialists. Each owns one domain. They report up, never sideways.
  • RAG-MCP Servers — pure knowledge retrieval. Each is a self-contained RAG pipeline.
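The "report up, never sideways" rule means every sub-agent result flows back through the orchestrator; no specialist ever calls another specialist. A hypothetical sketch of that routing shape (agent/domain names mirror the table above; the dispatch function is illustrative, not the real codebase):

```python
# Each sub-agent owns exactly one domain; the mapping is the whole routing table.
SUB_AGENTS = {
    "ProductSearchAgent": "catalog",
    "OrderStatusAgent": "orders",
    "RecommendationAgent": "personalization",
    "MangaQAAgent": "editorial-qa",
}

def dispatch(query_domain: str) -> str:
    """Route a query to the single sub-agent owning its domain.

    The result always returns to the orchestrator (report up), and
    sub-agents never appear in each other's routing tables (never sideways).
    """
    for agent, domain in SUB_AGENTS.items():
        if domain == query_domain:
            return agent
    return "orchestrator-direct"  # no specialist matched; orchestrator answers itself
```

The deliberate asymmetry is that the orchestrator holds this table but no domain knowledge, while each sub-agent holds knowledge but no routing authority.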

Source documents this series builds on