Agents — MangaAssist Architecture Guide
MangaAssist uses a layered agent architecture built on Strands Agents and AWS Agent Squad, deployed via Amazon Bedrock (Claude 3.5 Sonnet). This document covers the agent model, lifecycle, memory, tool dispatch, and failure handling.
Agent Types
Orchestrator Agent (Supervisor)
The central control plane. It receives every user message, classifies intent, and decides which tools or sub-agents to invoke.
- Model: Claude 3.5 Sonnet via Amazon Bedrock
- Role: Stateful supervisor — maintains conversation context and issues tool calls
- Routing: The LLM itself is the router; there is no hardcoded routing code. Tool descriptions act as routing logic. The model reads all tool manifests at inference time and selects the right combination.
- Latency budget: P99 < 3 seconds end-to-end; first token streamed to the user at ~950 ms
Specialized Sub-Agents (AWS Agent Squad)
Discrete agents that handle bounded domains. Each is independently deployable and exposes a defined interface to the orchestrator.
| Sub-Agent | Domain | Backing Data |
|---|---|---|
| ProductSearchAgent | Catalog discovery | OpenSearch (5 M+ titles) |
| OrderStatusAgent | Order tracking & returns | RDS + ElastiCache |
| RecommendationAgent | Personalized picks | DynamoDB + personalization vectors |
| MangaQAAgent | Product Q&A and editorial | OpenSearch + S3 |
Agent Lifecycle
Initialize → Plan → Act → Observe → Reflect → (loop or terminate)
- Initialize — Attach system prompt, inject conversation history from DynamoDB, load user preferences from cache.
- Plan — Claude reasons over the intent + context and proposes a tool-call plan.
- Act — Tool calls are dispatched (parallel where independent, sequential where chained).
- Observe — Tool results are appended to the context window.
- Reflect — Claude synthesizes results into a grounded response; post-generation guardrails validate before streaming to the user.
- Terminate — Session state is persisted back to DynamoDB; metrics emitted to CloudWatch.
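The loop above can be sketched as a plain driver function. This is a minimal illustration, not the actual implementation: the plan/tool/synthesis callables are stubs standing in for Bedrock invocations and MCP dispatch, and all helper names are assumed.

```python
# Sketch of the agent loop: Initialize → Plan → Act → Observe → Reflect.
# propose_plan, call_tool, and synthesize are injected stubs; in the real
# system they would be Bedrock calls and MCP tool dispatch.
def run_turn(message, propose_plan, call_tool, synthesize, max_rounds=10):
    """Drive one user turn through the agentic loop."""
    context = [{"role": "user", "content": message}]       # Initialize
    for _ in range(max_rounds):                            # stopping condition
        plan = propose_plan(context)                       # Plan
        if not plan:
            break                                          # model is done
        for tool_call in plan:                             # Act
            result = call_tool(tool_call)
            context.append({"role": "tool", "content": result})  # Observe
    return synthesize(context)                             # Reflect

# Stub wiring: plan one tool round, then stop once a tool result exists.
def demo_plan(ctx):
    return [] if any(m["role"] == "tool" for m in ctx) else ["catalog_search_mcp"]

answer = run_turn(
    "vol 3 in stock?",
    propose_plan=demo_plan,
    call_tool=lambda tool: f"{tool}: ok",
    synthesize=lambda ctx: ctx[-1]["content"],
)
# answer == "catalog_search_mcp: ok"
```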
State Management
Agents maintain three layers of state.
Conversation State (Short-term)
- Store: ElastiCache Redis, TTL 30 minutes
- Contents: Recent message turns, active intent, pending tool results
- Schema key: session:{session_id}:context
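The key scheme and TTL above can be captured in a tiny helper. This is a sketch; the helper name is assumed, and the commented redis-py call shows how it would be used.

```python
# Conversation-state key scheme and TTL (30 minutes, per the spec above).
SESSION_TTL_SECONDS = 30 * 60

def context_key(session_id: str) -> str:
    """Build the Redis key for a session's short-term context."""
    return f"session:{session_id}:context"

# With redis-py, a write-with-TTL would look like:
#   r.setex(context_key(session_id), SESSION_TTL_SECONDS, json.dumps(context))
```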
Session State (Medium-term)
- Store: Amazon DynamoDB (MangaAssistSessions table)
- Contents: Full turn history, summary after N turns, extracted entities (series name, ASIN, volume, language)
- TTL: 24 hours (configurable per user plan)
Long-term Memory
- Store: DynamoDB user profile table
- Contents: Favourite genres, purchase history, language preferences, escalation history
- Updated: After session close or significant preference signal
Tool Dispatch
Parallel Dispatch
When a user request requires independent data sources, tools are called concurrently using asyncio.gather. Up to 5 tools can run simultaneously.
User: "Show me volume 3 of Naruto — is it in stock and are there good reviews?"
Parallel dispatch:
├── catalog_search_mcp(series="Naruto", volume=3)
├── order_inventory_mcp(asin="...")
└── review_sentiment_mcp(asin="...")
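The fan-out above can be sketched with asyncio.gather, using a semaphore to enforce the 5-tool concurrency cap. The tool bodies are stubs; real calls would hit MCP servers over the network.

```python
import asyncio

async def dispatch_parallel(tool_calls, max_parallel=5):
    """Run independent tool coroutines concurrently, capped at max_parallel.

    gather preserves the order of its inputs in the returned results.
    """
    sem = asyncio.Semaphore(max_parallel)

    async def bounded(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(bounded(c) for c in tool_calls))

async def fake_tool(name):
    await asyncio.sleep(0)          # stand-in for network I/O
    return f"{name}:done"

results = asyncio.run(dispatch_parallel([
    fake_tool("catalog_search_mcp"),
    fake_tool("order_inventory_mcp"),
    fake_tool("review_sentiment_mcp"),
]))
# results arrive in input order, regardless of completion order
```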
Sequential Chaining
When one tool's output is needed as input to the next, calls are chained. The chaining emerges from the model's reasoning; it is not hardcoded.
1. catalog_search_mcp → returns ASIN
2. review_sentiment_mcp(asin=<result from step 1>)
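In code, the chain above reduces to passing one stub result into the next call. Both tools here are placeholders with invented return values; in production the model, not code, decides this ordering at inference time.

```python
# Stub tools illustrating the data dependency: the ASIN from step 1
# becomes the input to step 2. "ASIN-STUB" is a placeholder value.
def catalog_search_mcp(series, volume):
    return {"asin": "ASIN-STUB", "title": f"{series} Vol. {volume}"}

def review_sentiment_mcp(asin):
    return {"asin": asin, "sentiment": "positive"}

hit = catalog_search_mcp(series="Naruto", volume=3)
review = review_sentiment_mcp(asin=hit["asin"])   # chained on step-1 output
```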
Tool Selection Logic
The orchestrator agent reads tool manifests (name, description, input schema) injected into the system prompt. Claude selects tools purely based on their descriptions. Manifest quality is therefore the primary lever for improving routing accuracy — the description IS the routing rule.
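A manifest injected into the system prompt might look like the following sketch. The exact field names and description text are illustrative, but the shape (name, description, JSON-schema input) matches the manifest contents listed above; note how the description carries the routing rule.

```python
# Hypothetical tool manifest. The description doubles as the routing
# rule the model reasons over, so it states when to use the tool and
# what it returns.
CATALOG_SEARCH_MANIFEST = {
    "name": "catalog_search_mcp",
    "description": (
        "Search the manga catalog by series title, author, or volume. "
        "Use for any product-discovery question. Returns matching ASINs "
        "with title, price, and availability."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "series": {"type": "string"},
            "volume": {"type": "integer"},
        },
        "required": ["series"],
    },
}
```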
Memory Patterns
Conversation Summarization
After every 10 turns, the orchestrator summarises the conversation and stores the summary in DynamoDB. The full turn buffer is trimmed to the last 3 turns plus the summary. This keeps the context window manageable without losing long-range context.
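The summarise-then-trim policy can be sketched as below. The summariser is a stub for the actual Bedrock summarisation call; the constants match the numbers above.

```python
# Compact the turn buffer: once it reaches 10 turns, summarise everything
# except the last 3 turns and keep summary + tail.
SUMMARY_EVERY = 10
KEEP_TAIL = 3

def maybe_compact(turns, summarise):
    if len(turns) < SUMMARY_EVERY:
        return turns                       # buffer still small enough
    summary = summarise(turns[:-KEEP_TAIL])
    return [{"role": "system", "content": f"Summary: {summary}"}] + turns[-KEEP_TAIL:]

turns = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
compacted = maybe_compact(turns, summarise=lambda ts: f"{len(ts)} turns")
# compacted is 4 entries: the summary plus the last 3 turns
```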
Prompt Caching
System prompts and static tool manifests are cached using Bedrock's prompt caching feature. Cache hit rate target: > 85%. This reduces both latency and token cost on every request.
Entity Persistence
Entities extracted during a session (series name, ASIN, volume number, preferred language) are stored in session state and automatically injected into subsequent turns, avoiding repeated entity extraction.
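The injection step can be sketched as a merge of session-level entities into each turn's context. Field names are assumed; the design choice shown is that a fresh turn-level extraction overrides the stored value.

```python
# Merge persisted session entities into a turn's context.
# Turn-level entities win over stale session-level ones.
def inject_entities(turn_context, session_entities):
    merged = dict(session_entities)
    merged.update(turn_context.get("entities", {}))
    return {**turn_context, "entities": merged}

ctx = inject_entities(
    {"entities": {"volume": 3}},                    # freshly extracted
    {"series": "Naruto", "volume": 1},              # persisted last turn
)
```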
Failure Handling
Stopping Conditions
The agent loop has explicit stopping conditions to prevent runaway execution:
- Max iterations: 10 tool-call rounds per user message
- Wall-clock timeout: 8 seconds for the full agentic loop
- Token budget: Hard cap on context window growth per session
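The first two stopping conditions combine into a single guard checked each round. This is a sketch with assumed names; the constants are the budgets listed above, and the clock is injectable for testing.

```python
import time

# Stop the loop when either budget is exhausted.
WALL_CLOCK_BUDGET_S = 8.0   # full agentic loop
MAX_ROUNDS = 10             # tool-call rounds per user message

def should_stop(round_no, started_at, now=None):
    now = time.monotonic() if now is None else now
    return round_no >= MAX_ROUNDS or (now - started_at) >= WALL_CLOCK_BUDGET_S
```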
Circuit Breakers
Each MCP server has a circuit breaker with three states: Closed → Open → Half-Open.
- Threshold: 5 failures in a 60-second window opens the circuit
- Recovery probe: Single request in half-open state; success closes the circuit
- Fallback: Graceful degradation message returned to user; orchestrator skips that tool
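A minimal single-threaded sketch of that breaker follows. The failure threshold and window match the spec above; the 30-second cooldown before probing is an assumption, and a production version would need locking and metrics.

```python
# Circuit breaker with the three states described above:
# closed → open (5 failures / 60 s) → half-open (one probe) → closed.
class CircuitBreaker:
    def __init__(self, threshold=5, window_s=60.0, cooldown_s=30.0):
        self.threshold = threshold
        self.window_s = window_s
        self.cooldown_s = cooldown_s    # ASSUMED value, not in the spec
        self.failures = []              # timestamps of recent failures
        self.opened_at = None
        self.state = "closed"

    def record_failure(self, now):
        # Keep only failures inside the sliding window, then count.
        self.failures = [t for t in self.failures if now - t < self.window_s]
        self.failures.append(now)
        if len(self.failures) >= self.threshold:
            self.state, self.opened_at = "open", now

    def record_success(self, now):
        if self.state == "half-open":   # recovery probe succeeded
            self.state, self.failures = "closed", []

    def allow_request(self, now):
        if self.state == "open" and now - self.opened_at >= self.cooldown_s:
            self.state = "half-open"    # allow a single recovery probe
        return self.state != "open"
```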
Retry Policy
- Retries: Up to 2 retries with exponential backoff (100 ms → 200 ms → 400 ms)
- Idempotency: All MCP tool calls are read-only by design; retries are safe
- Non-retryable: 400-class errors from tools (bad input) are not retried
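The policy above can be sketched as a small wrapper. The sleep function is injectable so the sketch runs without real delays; ToolError and its status field are assumed names.

```python
import time

class ToolError(Exception):
    """Assumed error type carrying an HTTP-like status code."""
    def __init__(self, status):
        super().__init__(status)
        self.status = status

def call_with_retry(fn, retries=2, base_delay_s=0.1, sleep=time.sleep):
    """Retry fn up to `retries` times; delay doubles each retry.

    400-class errors (bad input) are raised immediately, per the policy.
    """
    delay = base_delay_s
    for attempt in range(retries + 1):
        try:
            return fn()
        except ToolError as e:
            if 400 <= e.status < 500 or attempt == retries:
                raise                   # non-retryable, or out of retries
            sleep(delay)
            delay *= 2

# Demo: fail twice with a 503, succeed on the third attempt.
attempts, delays = [], []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ToolError(503)
    return "ok"

result = call_with_retry(flaky, sleep=delays.append)
```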
Human-in-the-Loop Escalation
When the orchestrator detects any of the following, it triggers the escalation workflow:
- Confidence score below 0.6 after the two-stage classifier
- User explicit request ("talk to a person")
- Guardrail violation that cannot be resolved by rephrasing
- Consecutive failed tool calls (> 2)
On escalation:
1. Session context snapshot is serialised to DynamoDB
2. Escalation event published to SNS → connects to human agent queue (Amazon Connect)
3. Human agent receives full context summary; conversation continues seamlessly
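The trigger list above reduces to a single boolean gate. This is a sketch with assumed parameter names; the thresholds are the ones stated in the list.

```python
# Escalation gate: any one trigger routes the session to a human.
CONFIDENCE_FLOOR = 0.6
MAX_CONSECUTIVE_TOOL_FAILURES = 2

def should_escalate(confidence, user_asked_human,
                    guardrail_blocked, consecutive_tool_failures):
    return (
        confidence < CONFIDENCE_FLOOR
        or user_asked_human
        or guardrail_blocked
        or consecutive_tool_failures > MAX_CONSECUTIVE_TOOL_FAILURES
    )
```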
Security Boundaries
- All agent-to-tool calls are IAM-scoped; each MCP server runs under a least-privilege execution role
- User PII is stripped from tool inputs before dispatch (guardrail layer)
- Tool outputs are validated before insertion into the context window (ASIN validation, price sanity check, link validation)
- Agent actions are traced end-to-end via AWS X-Ray; every tool call emits a trace segment
Related Documents
- RAG-MCP-Integration/08-mcp-orchestration-router.md — Orchestration and tool selection deep dive
- Implementation-Integration-Domain2/ — AWS AIP-C01 Domain 2 agent skills
- LLD-Questions/topic-deep-dives/01-orchestrator-request-flow.md — Orchestrator state machine and latency budget
- mangaassist_workflow_interview_pack/ — All critical and non-critical workflow references
- subagents.md — MCP server sub-agents detail
- skills.md — AWS AIP-C01 Domain 2 skills inventory