Agents — MangaAssist Architecture Guide
MangaAssist uses a layered agent architecture built on Strands Agents and AWS Agent Squad, deployed via Amazon Bedrock (Claude 3.5 Sonnet). This document covers the agent model, lifecycle, memory, tool dispatch, and failure handling.
Agent Types
Orchestrator Agent (Supervisor)
The central control plane. It receives every user message, classifies intent, and decides which tools or sub-agents to invoke.
- Model: Claude 3.5 Sonnet via Amazon Bedrock
- Role: Stateful supervisor — maintains conversation context and issues tool calls
- Routing: The LLM itself is the router; there is no hardcoded routing code. Tool descriptions act as routing logic. The model reads all tool manifests at inference time and selects the right combination.
- Latency budget: P99 < 3 seconds end-to-end; first token streamed to the user at ~950 ms
Specialized Sub-Agents (AWS Agent Squad)
Discrete agents that handle bounded domains. Each is independently deployable and exposes a defined interface to the orchestrator.
| Sub-Agent | Domain | Backing Data |
|---|---|---|
| ProductSearchAgent | Catalog discovery | OpenSearch (5 M+ titles) |
| OrderStatusAgent | Order tracking & returns | RDS + ElastiCache |
| RecommendationAgent | Personalized picks | DynamoDB + personalization vectors |
| MangaQAAgent | Product Q&A and editorial | OpenSearch + S3 |
Agent Lifecycle
Initialize → Plan → Act → Observe → Reflect → (loop or terminate)
- Initialize — Attach system prompt, inject conversation history from DynamoDB, load user preferences from cache.
- Plan — Claude reasons over the intent + context and proposes a tool-call plan.
- Act — Tool calls are dispatched (parallel where independent, sequential where chained).
- Observe — Tool results are appended to the context window.
- Reflect — Claude synthesizes results into a grounded response; post-generation guardrails validate before streaming to the user.
- Terminate — Session state is persisted back to DynamoDB; metrics emitted to CloudWatch.
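The loop above can be sketched as a plain driver function. This is a minimal illustration, not the actual implementation: the plan/tool/synthesis callables are stubs standing in for Bedrock invocations and MCP dispatch, and all helper names are assumed.

```python
# Sketch of the agent loop: Initialize → Plan → Act → Observe → Reflect.
# propose_plan, call_tool, and synthesize are injected stubs; in the real
# system they would be Bedrock calls and MCP tool dispatch.
def run_turn(message, propose_plan, call_tool, synthesize, max_rounds=10):
    """Drive one user turn through the agentic loop."""
    context = [{"role": "user", "content": message}]       # Initialize
    for _ in range(max_rounds):                            # stopping condition
        plan = propose_plan(context)                       # Plan
        if not plan:
            break                                          # model is done
        for tool_call in plan:                             # Act
            result = call_tool(tool_call)
            context.append({"role": "tool", "content": result})  # Observe
    return synthesize(context)                             # Reflect

# Stub wiring: plan one tool round, then stop once a tool result exists.
def demo_plan(ctx):
    return [] if any(m["role"] == "tool" for m in ctx) else ["catalog_search_mcp"]

answer = run_turn(
    "vol 3 in stock?",
    propose_plan=demo_plan,
    call_tool=lambda tool: f"{tool}: ok",
    synthesize=lambda ctx: ctx[-1]["content"],
)
# answer == "catalog_search_mcp: ok"
```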
State Management
Agents maintain three layers of state.
Conversation State (Short-term)
- Store: ElastiCache Redis, TTL 30 minutes
- Contents: Recent message turns, active intent, pending tool results
- Schema key: session:{session_id}:context
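The key scheme and TTL above can be captured in a tiny helper. This is a sketch; the helper name is assumed, and the commented redis-py call shows how it would be used.

```python
# Conversation-state key scheme and TTL (30 minutes, per the spec above).
SESSION_TTL_SECONDS = 30 * 60

def context_key(session_id: str) -> str:
    """Build the Redis key for a session's short-term context."""
    return f"session:{session_id}:context"

# With redis-py, a write-with-TTL would look like:
#   r.setex(context_key(session_id), SESSION_TTL_SECONDS, json.dumps(context))
```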
Session State (Medium-term)
- Store: Amazon DynamoDB (MangaAssistSessions table)
- Contents: Full turn history, summary after N turns, extracted entities (series name, ASIN, volume, language)
- TTL: 24 hours (configurable per user plan)
Long-term Memory
- Store: DynamoDB user profile table
- Contents: Favourite genres, purchase history, language preferences, escalation history
- Updated: After session close or significant preference signal
Tool Dispatch
Parallel Dispatch
When a user request requires independent data sources, tools are called concurrently using asyncio.gather. Up to 5 tools can run simultaneously.
User: "Show me volume 3 of Naruto — is it in stock and are there good reviews?"
Parallel dispatch:
├── catalog_search_mcp(series="Naruto", volume=3)
├── order_inventory_mcp(asin="...")
└── review_sentiment_mcp(asin="...")
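The fan-out above can be sketched with asyncio.gather, using a semaphore to enforce the 5-tool concurrency cap. The tool bodies are stubs; real calls would hit MCP servers over the network.

```python
import asyncio

async def dispatch_parallel(tool_calls, max_parallel=5):
    """Run independent tool coroutines concurrently, capped at max_parallel.

    gather preserves the order of its inputs in the returned results.
    """
    sem = asyncio.Semaphore(max_parallel)

    async def bounded(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(bounded(c) for c in tool_calls))

async def fake_tool(name):
    await asyncio.sleep(0)          # stand-in for network I/O
    return f"{name}:done"

results = asyncio.run(dispatch_parallel([
    fake_tool("catalog_search_mcp"),
    fake_tool("order_inventory_mcp"),
    fake_tool("review_sentiment_mcp"),
]))
# results arrive in input order, regardless of completion order
```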
Sequential Chaining
When one tool's output is needed as input to the next, calls are chained. The chaining emerges from the model's reasoning; it is not hardcoded.
1. catalog_search_mcp → returns ASIN
2. review_sentiment_mcp(asin=<result from step 1>)
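In code, the chain above reduces to passing one stub result into the next call. Both tools here are placeholders with invented return values; in production the model, not code, decides this ordering at inference time.

```python
# Stub tools illustrating the data dependency: the ASIN from step 1
# becomes the input to step 2. "ASIN-STUB" is a placeholder value.
def catalog_search_mcp(series, volume):
    return {"asin": "ASIN-STUB", "title": f"{series} Vol. {volume}"}

def review_sentiment_mcp(asin):
    return {"asin": asin, "sentiment": "positive"}

hit = catalog_search_mcp(series="Naruto", volume=3)
review = review_sentiment_mcp(asin=hit["asin"])   # chained on step-1 output
```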
Tool Selection Logic
The orchestrator agent reads tool manifests (name, description, input schema) injected into the system prompt. Claude selects tools purely based on their descriptions. Manifest quality is therefore the primary lever for improving routing accuracy — the description IS the routing rule.
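A manifest injected into the system prompt might look like the following sketch. The exact field names and description text are illustrative, but the shape (name, description, JSON-schema input) matches the manifest contents listed above; note how the description carries the routing rule.

```python
# Hypothetical tool manifest. The description doubles as the routing
# rule the model reasons over, so it states when to use the tool and
# what it returns.
CATALOG_SEARCH_MANIFEST = {
    "name": "catalog_search_mcp",
    "description": (
        "Search the manga catalog by series title, author, or volume. "
        "Use for any product-discovery question. Returns matching ASINs "
        "with title, price, and availability."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "series": {"type": "string"},
            "volume": {"type": "integer"},
        },
        "required": ["series"],
    },
}
```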
Memory Patterns
Conversation Summarization
After every 10 turns, the orchestrator summarises the conversation and stores the summary in DynamoDB. The full turn buffer is trimmed to the last 3 turns plus the summary. This keeps the context window manageable without losing long-range context.
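The summarise-then-trim policy can be sketched as below. The summariser is a stub for the actual Bedrock summarisation call; the constants match the numbers above.

```python
# Compact the turn buffer: once it reaches 10 turns, summarise everything
# except the last 3 turns and keep summary + tail.
SUMMARY_EVERY = 10
KEEP_TAIL = 3

def maybe_compact(turns, summarise):
    if len(turns) < SUMMARY_EVERY:
        return turns                       # buffer still small enough
    summary = summarise(turns[:-KEEP_TAIL])
    return [{"role": "system", "content": f"Summary: {summary}"}] + turns[-KEEP_TAIL:]

turns = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
compacted = maybe_compact(turns, summarise=lambda ts: f"{len(ts)} turns")
# compacted is 4 entries: the summary plus the last 3 turns
```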
Prompt Caching
System prompts and static tool manifests are cached using Bedrock's prompt caching feature. Cache hit rate target: > 85%. This reduces both latency and token cost on every request.
Entity Persistence
Entities extracted during a session (series name, ASIN, volume number, preferred language) are stored in session state and automatically injected into subsequent turns, avoiding repeated entity extraction.
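The injection step can be sketched as a merge of session-level entities into each turn's context. Field names are assumed; the design choice shown is that a fresh turn-level extraction overrides the stored value.

```python
# Merge persisted session entities into a turn's context.
# Turn-level entities win over stale session-level ones.
def inject_entities(turn_context, session_entities):
    merged = dict(session_entities)
    merged.update(turn_context.get("entities", {}))
    return {**turn_context, "entities": merged}

ctx = inject_entities(
    {"entities": {"volume": 3}},                    # freshly extracted
    {"series": "Naruto", "volume": 1},              # persisted last turn
)
```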
Failure Handling
Stopping Conditions
The agent loop has explicit stopping conditions to prevent runaway execution:
- Max iterations: 10 tool-call rounds per user message
- Wall-clock timeout: 8 seconds for the full agentic loop
- Token budget: Hard cap on context window growth per session
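The first two stopping conditions combine into a single guard checked each round. This is a sketch with assumed names; the constants are the budgets listed above, and the clock is injectable for testing.

```python
import time

# Stop the loop when either budget is exhausted.
WALL_CLOCK_BUDGET_S = 8.0   # full agentic loop
MAX_ROUNDS = 10             # tool-call rounds per user message

def should_stop(round_no, started_at, now=None):
    now = time.monotonic() if now is None else now
    return round_no >= MAX_ROUNDS or (now - started_at) >= WALL_CLOCK_BUDGET_S
```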
Circuit Breakers
Each MCP server has a circuit breaker with three states: Closed → Open → Half-Open.
- Threshold: 5 failures in a 60-second window opens the circuit
- Recovery probe: Single request in half-open state; success closes the circuit
- Fallback: Graceful degradation message returned to user; orchestrator skips that tool
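A minimal single-threaded sketch of that breaker follows. The failure threshold and window match the spec above; the 30-second cooldown before probing is an assumption, and a production version would need locking and metrics.

```python
# Circuit breaker with the three states described above:
# closed → open (5 failures / 60 s) → half-open (one probe) → closed.
class CircuitBreaker:
    def __init__(self, threshold=5, window_s=60.0, cooldown_s=30.0):
        self.threshold = threshold
        self.window_s = window_s
        self.cooldown_s = cooldown_s    # ASSUMED value, not in the spec
        self.failures = []              # timestamps of recent failures
        self.opened_at = None
        self.state = "closed"

    def record_failure(self, now):
        # Keep only failures inside the sliding window, then count.
        self.failures = [t for t in self.failures if now - t < self.window_s]
        self.failures.append(now)
        if len(self.failures) >= self.threshold:
            self.state, self.opened_at = "open", now

    def record_success(self, now):
        if self.state == "half-open":   # recovery probe succeeded
            self.state, self.failures = "closed", []

    def allow_request(self, now):
        if self.state == "open" and now - self.opened_at >= self.cooldown_s:
            self.state = "half-open"    # allow a single recovery probe
        return self.state != "open"
```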
Retry Policy
- Retries: Up to 2 retries with exponential backoff (100 ms → 200 ms → 400 ms)
- Idempotency: All MCP tool calls are read-only by design; retries are safe
- Non-retryable: 400-class errors from tools (bad input) are not retried
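The policy above can be sketched as a small wrapper. The sleep function is injectable so the sketch runs without real delays; ToolError and its status field are assumed names.

```python
import time

class ToolError(Exception):
    """Assumed error type carrying an HTTP-like status code."""
    def __init__(self, status):
        super().__init__(status)
        self.status = status

def call_with_retry(fn, retries=2, base_delay_s=0.1, sleep=time.sleep):
    """Retry fn up to `retries` times; delay doubles each retry.

    400-class errors (bad input) are raised immediately, per the policy.
    """
    delay = base_delay_s
    for attempt in range(retries + 1):
        try:
            return fn()
        except ToolError as e:
            if 400 <= e.status < 500 or attempt == retries:
                raise                   # non-retryable, or out of retries
            sleep(delay)
            delay *= 2

# Demo: fail twice with a 503, succeed on the third attempt.
attempts, delays = [], []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ToolError(503)
    return "ok"

result = call_with_retry(flaky, sleep=delays.append)
```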
Human-in-the-Loop Escalation
When the orchestrator detects any of the following, it triggers the escalation workflow:
- Confidence score below 0.6 after the two-stage classifier
- User explicit request ("talk to a person")
- Guardrail violation that cannot be resolved by rephrasing
- Consecutive failed tool calls (> 2)
On escalation:
1. Session context snapshot is serialised to DynamoDB
2. Escalation event published to SNS → connects to human agent queue (Amazon Connect)
3. Human agent receives full context summary; conversation continues seamlessly
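The trigger list above reduces to a single boolean gate. This is a sketch with assumed parameter names; the thresholds are the ones stated in the list.

```python
# Escalation gate: any one trigger routes the session to a human.
CONFIDENCE_FLOOR = 0.6
MAX_CONSECUTIVE_TOOL_FAILURES = 2

def should_escalate(confidence, user_asked_human,
                    guardrail_blocked, consecutive_tool_failures):
    return (
        confidence < CONFIDENCE_FLOOR
        or user_asked_human
        or guardrail_blocked
        or consecutive_tool_failures > MAX_CONSECUTIVE_TOOL_FAILURES
    )
```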
Security Boundaries
- All agent-to-tool calls are IAM-scoped; each MCP server runs under a least-privilege execution role
- User PII is stripped from tool inputs before dispatch (guardrail layer)
- Tool outputs are validated before insertion into the context window (ASIN validation, price sanity check, link validation)
- Agent actions are traced end-to-end via AWS X-Ray; every tool call emits a trace segment
Related Documents
- RAG-MCP-Integration/08-mcp-orchestration-router.md — Orchestration and tool selection deep dive
- Implementation-Integration-Domain2/ — AWS AIP-C01 Domain 2 agent skills
- LLD-Questions/topic-deep-dives/01-orchestrator-request-flow.md — Orchestrator state machine and latency budget
- mangaassist_workflow_interview_pack/ — All critical and non-critical workflow references
- subagents.md — MCP server sub-agents detail
- skills.md — AWS AIP-C01 Domain 2 skills inventory