4. End-to-End Architecture - High-Level Design (HLD)
Architecture Overview
MangaAssist is a microservices-based, event-driven chatbot that sits inside Amazon's existing service-oriented architecture. It does not reinvent infrastructure; it composes catalog, order, recommendation, and support services behind a new orchestration layer. The assistant uses a hybrid execution model: structured requests stay on deterministic API or template paths, while open-ended and ambiguous requests use grounded LLM generation.
High-Level Architecture Diagram
graph TB
subgraph "Client Layer"
A[Amazon.com JP Manga Store<br>Web / Mobile] -->|WebSocket / HTTPS| B[Chat Edge<br>CloudFront + ALB / API Gateway]
end
subgraph "Edge and Auth"
B --> C[Auth and Session Service]
B --> D[Rate Limiter]
end
subgraph "Orchestration Layer"
C --> E[Chatbot Orchestrator]
D --> E
E --> F[Intent Classifier]
E --> G[Conversation Memory<br>DynamoDB]
end
subgraph "Intelligence Layer"
E -->|discovery / recommendation| H[Recommendation Engine]
E -->|faq / policy| I[RAG Pipeline]
E -->|order / checkout / support| J[Order, Checkout, and Support Router]
E -->|product question| K[Product Q&A Service]
E --> L[Bedrock LLM<br>Claude 3.5 Sonnet]
end
subgraph "Caching Layer"
E --> Cache[ElastiCache<br>Product / Promo / Reco Cache]
end
subgraph "Data Layer"
K --> M[Product Catalog]
J --> N[Order Service]
J --> O[Returns Service]
J --> P1[Checkout Service]
H --> P2[User Profile and History]
I --> Q[Knowledge Base<br>OpenSearch]
E --> R[Promotions Service]
end
subgraph "Safety and Output"
L --> S[Guardrails]
S --> T[Response Formatter]
T --> B
end
subgraph "Observability"
E --> U[Logging]
E --> V[Metrics]
E --> W[Analytics]
T --> X[Feedback Capture]
end
subgraph "Fallback"
E -->|escalation| Y[Amazon Connect<br>Human Agent Queue]
end
Component Breakdown
1. Frontend Integration
- A chat widget is embedded in JP Manga store pages on web and mobile.
- The widget uses WebSocket for streaming responses and HTTPS fallback for environments that block streaming.
- Streaming over WebSocket makes the assistant feel faster because users see tokens as they are generated.
2. Chat Edge
- Single ingress point for chat traffic.
- Supports WebSocket streaming with HTTPS fallback for clients that cannot hold a persistent connection.
- Session initialization happens via POST /chat/init, which validates the user and creates a session before messages are exchanged.
- Handles routing, TLS termination, throttling, and request validation.
- Decouples the frontend transport from downstream service contracts.
- WebSocket connections use heartbeat pings every 30 seconds; idle connections are closed after 5 minutes.
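The idle-connection policy above (30-second heartbeats, 5-minute idle cutoff) can be sketched as a small reaper function; a minimal illustration, assuming the edge keeps a per-connection map of last-heartbeat timestamps (the function and field names here are illustrative, not the real service's API):

```python
def connections_to_close(last_seen, now, idle_timeout=300.0):
    """Return connection IDs whose last heartbeat is older than the idle timeout.

    `last_seen` maps connection ID -> timestamp of the last heartbeat ping.
    With pings every 30 s, a healthy client never comes close to the
    300 s (5-minute) idle timeout; anything past it is safe to close.
    """
    return [cid for cid, ts in last_seen.items() if now - ts > idle_timeout]
```

A reaper like this would run periodically on the WebSocket handler and close the returned connections.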
3. Authentication and Session
- Logged-in users are identified through the Amazon session token.
- Guest users receive a temporary session ID.
- Personalization features require authentication; discovery and FAQ do not.
4. Rate Limiter
- Token-bucket rate limiter per user or session.
- Protects downstream services and limits abuse.
- Keeps LLM usage under control.
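The token-bucket behavior can be shown in a few lines; a simplified single-process sketch (a production limiter would keep the bucket state in Redis so all edge nodes share it):

```python
import time

class TokenBucket:
    """Per-user/session token bucket: bursts up to `capacity` requests,
    refilled continuously at `rate` tokens per second."""

    def __init__(self, capacity, rate, now=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity     # start full so a new session can burst
        self.now = now             # injectable clock for testing
        self.last = now()

    def allow(self):
        """Consume one token if available; otherwise reject the request."""
        t = self.now()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Rejected requests would get an immediate "slow down" response at the edge, before any downstream or LLM cost is incurred.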
5. Chatbot Orchestrator
- The central coordinator for every user message.
- Loads conversation state, calls the intent classifier, fans out to downstream systems, chooses between template/API-first and LLM-backed response paths, runs guardrails, and returns the response.
- Does not do NLP or data fetching itself; it coordinates work.
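The coordination-only role described above can be sketched as one turn of the orchestrator loop; a simplified illustration with injected dependencies (the store class and handler signatures are illustrative stand-ins, not the real service interfaces):

```python
class InMemorySessionStore:
    """Stand-in for the DynamoDB conversation table."""

    def __init__(self):
        self._turns = {}

    def load(self, session_id):
        return list(self._turns.get(session_id, []))

    def append(self, session_id, role, text):
        self._turns.setdefault(session_id, []).append({"role": role, "text": text})


def handle_message(session_id, message, *, memory, classify, handlers, guardrails):
    """One orchestrator turn: load state, classify, dispatch, validate, persist."""
    history = memory.load(session_id)                  # conversation memory
    intent = classify(message)                         # intent classifier call
    handler = handlers.get(intent, handlers["fallback"])
    draft = handler(message, history)                  # downstream fan-out
    response = guardrails(draft)                       # safety pass before returning
    memory.append(session_id, "user", message)
    memory.append(session_id, "assistant", response)
    return intent, response
```

Note the orchestrator itself contains no NLP and no data access: classification, retrieval, and generation all live behind the injected callables.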
6. Intent Classifier
- Lightweight classifier that maps messages to intents.
- Common intents include product discovery, product question, FAQ, order tracking, return request, promotion inquiry, recommendation, checkout help, escalation, and chitchat.
- Deterministic routing is cheaper and faster than sending every message to the LLM.
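To make the routing idea concrete, here is a toy keyword router standing in for the SageMaker classifier endpoint. The real component is a trained model; these rules only illustrate the intent set and why a cheap deterministic step in front of the LLM pays off:

```python
def classify_intent(message):
    """Toy keyword classifier mapping a message to one of the chatbot's intents.

    Stand-in for the real SageMaker model; first matching rule wins,
    and anything unmatched falls through to chitchat.
    """
    rules = [
        ("track", "order_tracking"),
        ("return", "return_request"),
        ("coupon", "promotion_inquiry"),
        ("recommend", "recommendation"),
        ("checkout", "checkout_help"),
        ("agent", "escalation"),
    ]
    lowered = message.lower()
    for keyword, intent in rules:
        if keyword in lowered:
            return intent
    return "chitchat"
```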
7. Conversation Memory
- Stores the last N turns per session.
- Implemented in DynamoDB with TTL.
- Supports multi-turn context such as "What about the second one you mentioned?"
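The last-N-turns-with-TTL behavior can be sketched in memory; a simplified stand-in for the DynamoDB table (the class name and 30-minute default TTL are illustrative choices, not the production values):

```python
import time
from collections import deque

class ConversationMemory:
    """In-memory stand-in for the DynamoDB conversation table:
    keeps the last `max_turns` turns per session, expired via TTL."""

    def __init__(self, max_turns=10, ttl_seconds=1800, now=time.time):
        self.max_turns = max_turns
        self.ttl = ttl_seconds
        self.now = now
        self._store = {}  # session_id -> (expires_at, deque of turns)

    def append(self, session_id, role, text):
        # Each write refreshes the TTL, mirroring DynamoDB's per-item TTL attribute.
        _, turns = self._store.get(session_id, (None, deque(maxlen=self.max_turns)))
        turns.append({"role": role, "text": text})
        self._store[session_id] = (self.now() + self.ttl, turns)

    def load(self, session_id):
        entry = self._store.get(session_id)
        if entry is None or entry[0] < self.now():
            self._store.pop(session_id, None)  # expired: treat as a fresh session
            return []
        return list(entry[1])
```

The bounded deque is what lets follow-ups like "What about the second one you mentioned?" resolve against recent turns without unbounded storage growth.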
8. RAG Pipeline
- Used for FAQ, policy, and product knowledge.
- Retrieves chunks from the knowledge base, augments the prompt, and grounds the LLM in real data.
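The retrieve-then-augment flow can be sketched as two small functions. This is a deliberately naive illustration: the real pipeline ranks chunks with vector search in OpenSearch, whereas the toy `retrieve` here scores by word overlap, and the prompt wording is invented:

```python
import re

def _tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=3):
    """Rank KB chunks by naive word overlap with the question.

    Stand-in for OpenSearch vector retrieval; only the shape of the
    step (question in, top-k chunks out) matches the real pipeline.
    """
    q = _tokens(question)
    scored = sorted(documents, key=lambda doc: len(q & _tokens(doc)), reverse=True)
    return scored[:top_k]

def build_grounded_prompt(question, chunks):
    """Augment the prompt so the LLM answers only from retrieved context."""
    context = "\n".join(f"- {c}" for c in chunks)
    return (
        "Answer using only the context below. "
        "If the context does not cover the question, say you don't know.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Grounding the prompt this way is what keeps FAQ and policy answers tied to real knowledge-base content rather than model memory.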
9. Recommendation Engine
- Returns ranked ASINs based on browsing history, past purchases, and current query.
- Reuses Amazon's existing personalization strength instead of rebuilding it.
10. Product Catalog Service
- Provides product data by ASIN: title, author, price, format, availability, images, and review summary.
- Every product-related response depends on it.
11. User Profile and History
- Stores user-level preferences, past purchases, browsing history, and reading history.
- Feeds the Recommendation Engine so results reflect what this specific user has read or bought.
- Authenticated sessions only; guest sessions use page context and in-session browsing instead.
12. Promotions Service
- Returns active promotions, coupons, and deals relevant to manga.
- Queried by the Orchestrator for any response that could benefit from a promotional nudge.
- Results are cached in ElastiCache with event-driven invalidation.
13. Order, Checkout, and Support Services
- Existing Order, Returns, and Checkout services handle purchase and post-purchase use cases.
- They are queried with customer, session, or cart context depending on the workflow.
14. LLM Response Generation
- Bedrock hosts the generation model.
- The LLM receives structured context and generates a natural-language response only for recommendation, FAQ, product explanation, and other ambiguous flows.
- Structured intents such as greetings, simple order lookups, and low-ambiguity promotion answers can bypass the LLM and use templates.
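The template-versus-LLM split can be expressed as a simple dispatch; a minimal sketch in which the template strings and intent names are illustrative (real copy would come from a localized template store):

```python
# Illustrative templates for structured, low-ambiguity intents.
TEMPLATES = {
    "greeting": "Hi! How can I help with your manga search today?",
    "order_tracking": "Order {order_id} is currently: {status}.",
}

def respond(intent, slots, llm_generate):
    """Serve structured intents from templates; everything else goes
    to grounded LLM generation via the injected `llm_generate` callable."""
    template = TEMPLATES.get(intent)
    if template is not None:
        return template.format(**slots)   # deterministic, cheap, instant
    return llm_generate(intent, slots)    # ambiguous flows pay the LLM cost
```

Every message that takes the template branch is one fewer Bedrock invocation, which is where both the latency and the cost of the hybrid model are saved.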
15. Guardrails and Moderation
- Filters PII leakage, toxic content, off-topic responses, competitor mentions, and hallucinated prices or dates.
- Guardrails are mandatory because trust is the product.
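One of the listed checks, hallucinated prices, can be sketched as a post-generation filter; an illustrative check only (the regex, function name, and pass/fail contract are assumptions, and the real guardrail layer would run PII, toxicity, and off-topic filters alongside it):

```python
import re

# Matches yen or dollar amounts, e.g. "¥1,200" or "$9.99".
PRICE_RE = re.compile(r"[¥$]\d[\d,]*(?:\.\d+)?")

def price_guardrail(response, grounded_prices):
    """Flag any price in the response that was not present in the grounded
    context handed to the LLM, so hallucinated prices never reach the user."""
    for price in PRICE_RE.findall(response):
        if price not in grounded_prices:
            return False, f"hallucinated price: {price}"
    return True, "ok"
```

A failed check would route the turn to a template fallback or regeneration rather than shipping the response.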
16. Analytics and Monitoring
- CloudWatch, Kinesis, and Redshift provide operational and business visibility.
- Metrics include latency, intent distribution, resolution rate, and escalation rate.
17. Human Handoff
- Amazon Connect receives the escalation payload.
- The agent gets a conversation summary plus user context so the customer does not repeat themselves.
18. Caching Layer
- ElastiCache (Redis) sits between the Orchestrator and frequently queried data services.
- Product details, recommendations, and promotions are cached to reduce latency and protect downstream services.
- Prices are never cached; they are always fetched live to avoid stale pricing.
- Cache invalidation is event-driven for catalog and promotion changes.
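The "cache everything except price" policy above can be sketched as a cache-aside wrapper; a simplified in-process illustration (the real cache is Redis in ElastiCache, and the fetcher signatures here are invented):

```python
import time

class ProductCache:
    """Cache-aside sketch: non-price product fields are cached with a TTL,
    while price is always fetched live from the origin service."""

    def __init__(self, fetch_product, fetch_price, ttl=300, now=time.time):
        self.fetch_product = fetch_product
        self.fetch_price = fetch_price
        self.ttl = ttl
        self.now = now
        self._cache = {}  # asin -> (expires_at, detail dict without price)

    def get(self, asin):
        entry = self._cache.get(asin)
        if entry is None or entry[0] < self.now():
            detail = dict(self.fetch_product(asin))
            detail.pop("price", None)             # never cache price
            entry = (self.now() + self.ttl, detail)
            self._cache[asin] = entry
        result = dict(entry[1])
        result["price"] = self.fetch_price(asin)  # always live, never stale
        return result

    def invalidate(self, asin):
        """Called from the catalog/promotion change event stream."""
        self._cache.pop(asin, None)
```

The TTL handles drift for fields that change slowly; the event-driven `invalidate` path handles catalog and promotion changes that must show up immediately.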
Deployment Architecture
graph TD
subgraph "Edge"
CF[CloudFront] --> ALB[Application Load Balancer]
end
subgraph "Compute - ECS Fargate Cluster"
ALB --> ORC[Orchestrator Service<br>Auto-scaling: 10-100 tasks]
ALB --> WS[WebSocket Handler<br>Sticky sessions]
end
subgraph "Compute - Lambda"
ORC -->|overflow| LAMB[Lambda Burst Workers<br>Concurrency: 1000]
end
subgraph "ML Inference"
ORC --> SM[SageMaker Endpoint<br>Intent Classifier]
ORC --> BR[Amazon Bedrock<br>Claude 3.5 Sonnet]
end
subgraph "Storage"
ORC --> DDB[DynamoDB<br>Conversation Memory]
ORC --> OS[OpenSearch Serverless<br>Vector Store]
ORC --> EC[ElastiCache Redis<br>Response Cache]
end
subgraph "Async / Analytics"
ORC --> KIN[Kinesis Data Stream]
KIN --> RS[Redshift<br>Analytics Warehouse]
end
subgraph "Support"
ORC --> AC[Amazon Connect<br>Human Handoff]
end
Error Handling and Degraded Modes
Not every dependency is equally critical. When a service is down, the chatbot degrades gracefully rather than failing entirely.
| Service Down | Impact | Fallback Behavior |
|---|---|---|
| Recommendation Engine | Cannot personalize suggestions | Return trending / popular manga from cache |
| Product Catalog | Cannot show product details | Return a search link with a brief apology |
| Order Service | Cannot look up order status | Ask the user to check the order tracking page |
| RAG / Knowledge Base | Cannot retrieve grounded context | Use LLM with system knowledge only (flag as lower confidence) |
| LLM / Bedrock | Cannot generate natural-language responses | Return template-based responses for known intents; show unavailable message for open-ended queries |
| ElastiCache | Cannot serve cached data | Fall through to origin services directly (higher latency) |
| DynamoDB (Memory) | Cannot load or save conversation turns | Treat request as stateless; warn user that context may be lost |
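The degraded-mode pattern in the table can be implemented as a uniform fallback wrapper around each dependency call; a minimal sketch in which the function names and the hardcoded trending list are purely illustrative:

```python
def with_fallback(primary, fallback):
    """Wrap a dependency call so an outage degrades the answer
    instead of failing the whole conversation turn."""
    def call(*args, **kwargs):
        try:
            return primary(*args, **kwargs)
        except Exception:
            return fallback(*args, **kwargs)
    return call

def trending_from_cache(user_id):
    # Degraded mode for the Recommendation Engine row of the table:
    # serve cached trending titles instead of personalized picks.
    return ["Trending Vol. 1", "Trending Vol. 2"]
```

Each row of the table becomes one `with_fallback` pairing: the live dependency as `primary`, the degraded behavior as `fallback`. In production the wrapper would also distinguish timeouts from hard errors and emit a degraded-mode metric.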
Data Flow Summary
sequenceDiagram
participant User
participant Frontend
participant ChatEdge
participant Orchestrator
participant IntentClassifier
participant Services
participant LLM
participant Guardrails
User->>Frontend: Types message
Frontend->>ChatEdge: WebSocket message
ChatEdge->>Orchestrator: Authenticated request
Orchestrator->>IntentClassifier: Classify intent
IntentClassifier-->>Orchestrator: intent = recommendation
Orchestrator->>Services: Query Recommendation Engine + Catalog
Services-->>Orchestrator: Product data
Orchestrator->>LLM: Context + prompt
LLM-->>Orchestrator: Generated response
Orchestrator->>Guardrails: Validate response
Guardrails-->>Orchestrator: Approved response
Orchestrator-->>ChatEdge: Final response
ChatEdge-->>Frontend: Stream to user
Frontend-->>User: Display response
Technology Stack Summary
| Layer | Technology | Why |
|---|---|---|
| Frontend | React | Existing stack |
| Transport | WebSocket + HTTPS | Streaming plus reliability |
| Chat Edge | CloudFront + ALB / API Gateway | WebSocket plus HTTPS ingress |
| Auth | Internal session validation service | Reuses Amazon auth context |
| Orchestrator | ECS Fargate service + Lambda burst workers | Streaming-friendly and elastic |
| Intent Classifier | SageMaker endpoint | Low-latency ML inference |
| LLM | Amazon Bedrock (Claude 3.5 Sonnet) | Managed and cost-effective |
| Vector Store | OpenSearch Serverless | Managed vector search |
| Memory | DynamoDB | Low-latency key-value store |
| Cache | ElastiCache (Redis) | Sub-millisecond reads for hot data |
| Catalog | DynamoDB + Elasticsearch | Existing infrastructure |
| Analytics | CloudWatch + Kinesis + Redshift | Full observability |
| Escalation | Amazon Connect | Existing support infrastructure |