4. End-to-End Architecture - High-Level Design (HLD)
Architecture Overview
MangaAssist is a microservices-based, event-driven chatbot that sits inside Amazon's existing service-oriented architecture. It does not reinvent infrastructure; it composes catalog, order, recommendation, and support services behind a new orchestration layer. The assistant uses a hybrid execution model: structured requests stay on deterministic API or template paths, while open-ended and ambiguous requests use grounded LLM generation.
High-Level Architecture Diagram
graph TB
subgraph "Client Layer"
A[Amazon.com JP Manga Store<br>Web / Mobile] -->|WebSocket / HTTPS| B[Chat Edge<br>CloudFront + ALB / API Gateway]
end
subgraph "Edge and Auth"
B --> C[Auth and Session Service]
B --> D[Rate Limiter]
end
subgraph "Orchestration Layer"
C --> E[Chatbot Orchestrator]
D --> E
E --> F[Intent Classifier]
E --> G[Conversation Memory<br>DynamoDB]
end
subgraph "Intelligence Layer"
E -->|discovery / recommendation| H[Recommendation Engine]
E -->|faq / policy| I[RAG Pipeline]
E -->|order / checkout / support| J[Order, Checkout, and Support Router]
E -->|product question| K[Product Q&A Service]
E --> L[Bedrock LLM<br>Claude 3.5 Sonnet]
end
subgraph "Caching Layer"
E --> Cache[ElastiCache<br>Product / Promo / Reco Cache]
end
subgraph "Data Layer"
K --> M[Product Catalog]
J --> N[Order Service]
J --> O[Returns Service]
J --> P1[Checkout Service]
H --> P2[User Profile and History]
I --> Q[Knowledge Base<br>OpenSearch]
E --> R[Promotions Service]
end
subgraph "Safety and Output"
L --> S[Guardrails]
S --> T[Response Formatter]
T --> B
end
subgraph "Observability"
E --> U[Logging]
E --> V[Metrics]
E --> W[Analytics]
T --> X[Feedback Capture]
end
subgraph "Fallback"
E -->|escalation| Y[Amazon Connect<br>Human Agent Queue]
end
Component Breakdown
1. Frontend Integration
- A chat widget is embedded in JP Manga store pages on web and mobile.
- The widget uses WebSocket for streaming responses and HTTPS fallback for environments that block streaming.
- Streaming over WebSocket makes the assistant feel faster because users see tokens as they are generated.
2. Chat Edge
- Single ingress point for chat traffic.
- Supports WebSocket streaming with HTTPS fallback for clients that cannot hold a persistent connection.
- Session initialization happens via POST /chat/init, which validates the user and creates a session before messages are exchanged.
- Handles routing, TLS termination, throttling, and request validation.
- Decouples the frontend transport from downstream service contracts.
- WebSocket connections use heartbeat pings every 30 seconds; idle connections are closed after 5 minutes.
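The idle-connection policy above (30-second heartbeats, 5-minute idle cutoff) can be sketched as a small reaper function; a minimal illustration, assuming the edge keeps a per-connection map of last-heartbeat timestamps (the function and field names here are illustrative, not the real service's API):

```python
def connections_to_close(last_seen, now, idle_timeout=300.0):
    """Return connection IDs whose last heartbeat is older than the idle timeout.

    `last_seen` maps connection ID -> timestamp of the last heartbeat ping.
    With pings every 30 s, a healthy client never comes close to the
    300 s (5-minute) idle timeout; anything past it is safe to close.
    """
    return [cid for cid, ts in last_seen.items() if now - ts > idle_timeout]
```

A reaper like this would run periodically on the WebSocket handler and close the returned connections.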
3. Authentication and Session
- Logged-in users are identified through the Amazon session token.
- Guest users receive a temporary session ID.
- Personalization features require authentication; discovery and FAQ do not.
4. Rate Limiter
- Token-bucket rate limiter per user or session.
- Protects downstream services and limits abuse.
- Keeps LLM usage under control.
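The token-bucket behavior can be shown in a few lines; a simplified single-process sketch (a production limiter would keep the bucket state in Redis so all edge nodes share it):

```python
import time

class TokenBucket:
    """Per-user/session token bucket: bursts up to `capacity` requests,
    refilled continuously at `rate` tokens per second."""

    def __init__(self, capacity, rate, now=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity     # start full so a new session can burst
        self.now = now             # injectable clock for testing
        self.last = now()

    def allow(self):
        """Consume one token if available; otherwise reject the request."""
        t = self.now()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Rejected requests would get an immediate "slow down" response at the edge, before any downstream or LLM cost is incurred.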
5. Chatbot Orchestrator
- The central coordinator for every user message.
- Loads conversation state, calls the intent classifier, fans out to downstream systems, chooses between template/API-first and LLM-backed response paths, runs guardrails, and returns the response.
- Does not do NLP or data fetching itself; it coordinates work.
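The coordination-only role described above can be sketched as one turn of the orchestrator loop; a simplified illustration with injected dependencies (the store class and handler signatures are illustrative stand-ins, not the real service interfaces):

```python
class InMemorySessionStore:
    """Stand-in for the DynamoDB conversation table."""

    def __init__(self):
        self._turns = {}

    def load(self, session_id):
        return list(self._turns.get(session_id, []))

    def append(self, session_id, role, text):
        self._turns.setdefault(session_id, []).append({"role": role, "text": text})


def handle_message(session_id, message, *, memory, classify, handlers, guardrails):
    """One orchestrator turn: load state, classify, dispatch, validate, persist."""
    history = memory.load(session_id)                  # conversation memory
    intent = classify(message)                         # intent classifier call
    handler = handlers.get(intent, handlers["fallback"])
    draft = handler(message, history)                  # downstream fan-out
    response = guardrails(draft)                       # safety pass before returning
    memory.append(session_id, "user", message)
    memory.append(session_id, "assistant", response)
    return intent, response
```

Note the orchestrator itself contains no NLP and no data access: classification, retrieval, and generation all live behind the injected callables.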
6. Intent Classifier
- Lightweight classifier that maps messages to intents.
- Common intents include product discovery, product question, FAQ, order tracking, return request, promotion inquiry, recommendation, checkout help, escalation, and chitchat.
- Deterministic routing is cheaper and faster than sending every message to the LLM.
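To make the routing idea concrete, here is a toy keyword router standing in for the SageMaker classifier endpoint. The real component is a trained model; these rules only illustrate the intent set and why a cheap deterministic step in front of the LLM pays off:

```python
def classify_intent(message):
    """Toy keyword classifier mapping a message to one of the chatbot's intents.

    Stand-in for the real SageMaker model; first matching rule wins,
    and anything unmatched falls through to chitchat.
    """
    rules = [
        ("track", "order_tracking"),
        ("return", "return_request"),
        ("coupon", "promotion_inquiry"),
        ("recommend", "recommendation"),
        ("checkout", "checkout_help"),
        ("agent", "escalation"),
    ]
    lowered = message.lower()
    for keyword, intent in rules:
        if keyword in lowered:
            return intent
    return "chitchat"
```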
7. Conversation Memory
- Stores the last N turns per session.
- Implemented in DynamoDB with TTL.
- Supports multi-turn context such as "What about the second one you mentioned?"
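The last-N-turns-with-TTL behavior can be sketched in memory; a simplified stand-in for the DynamoDB table (the class name and 30-minute default TTL are illustrative choices, not the production values):

```python
import time
from collections import deque

class ConversationMemory:
    """In-memory stand-in for the DynamoDB conversation table:
    keeps the last `max_turns` turns per session, expired via TTL."""

    def __init__(self, max_turns=10, ttl_seconds=1800, now=time.time):
        self.max_turns = max_turns
        self.ttl = ttl_seconds
        self.now = now
        self._store = {}  # session_id -> (expires_at, deque of turns)

    def append(self, session_id, role, text):
        # Each write refreshes the TTL, mirroring DynamoDB's per-item TTL attribute.
        _, turns = self._store.get(session_id, (None, deque(maxlen=self.max_turns)))
        turns.append({"role": role, "text": text})
        self._store[session_id] = (self.now() + self.ttl, turns)

    def load(self, session_id):
        entry = self._store.get(session_id)
        if entry is None or entry[0] < self.now():
            self._store.pop(session_id, None)  # expired: treat as a fresh session
            return []
        return list(entry[1])
```

The bounded deque is what lets follow-ups like "What about the second one you mentioned?" resolve against recent turns without unbounded storage growth.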
8. RAG Pipeline
- Used for FAQ, policy, and product knowledge.
- Retrieves chunks from the knowledge base, augments the prompt, and grounds the LLM in real data.
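The retrieve-then-augment flow can be sketched as two small functions. This is a deliberately naive illustration: the real pipeline ranks chunks with vector search in OpenSearch, whereas the toy `retrieve` here scores by word overlap, and the prompt wording is invented:

```python
import re

def _tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=3):
    """Rank KB chunks by naive word overlap with the question.

    Stand-in for OpenSearch vector retrieval; only the shape of the
    step (question in, top-k chunks out) matches the real pipeline.
    """
    q = _tokens(question)
    scored = sorted(documents, key=lambda doc: len(q & _tokens(doc)), reverse=True)
    return scored[:top_k]

def build_grounded_prompt(question, chunks):
    """Augment the prompt so the LLM answers only from retrieved context."""
    context = "\n".join(f"- {c}" for c in chunks)
    return (
        "Answer using only the context below. "
        "If the context does not cover the question, say you don't know.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Grounding the prompt this way is what keeps FAQ and policy answers tied to real knowledge-base content rather than model memory.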
9. Recommendation Engine
- Returns ranked ASINs based on browsing history, past purchases, and current query.
- Reuses Amazon's existing personalization strength instead of rebuilding it.
10. Product Catalog Service
- Provides product data by ASIN: title, author, price, format, availability, images, and review summary.
- Every product-related response depends on it.
11. User Profile and History
- Stores user-level preferences, past purchases, browsing history, and reading history.
- Feeds the Recommendation Engine so results reflect what this specific user has read or bought.
- Authenticated sessions only; guest sessions use page context and in-session browsing instead.
12. Promotions Service
- Returns active promotions, coupons, and deals relevant to manga.
- Queried by the Orchestrator for any response that could benefit from a promotional nudge.
- Results are cached in ElastiCache with event-driven invalidation.
13. Order, Checkout, and Support Services
- Existing Order, Returns, and Checkout services handle purchase and post-purchase use cases.
- They are queried with customer, session, or cart context depending on the workflow.
14. LLM Response Generation
- Bedrock hosts the generation model.
- The LLM receives structured context and generates a natural-language response only for recommendation, FAQ, product explanation, and other ambiguous flows.
- Structured intents such as greetings, simple order lookups, and low-ambiguity promotion answers can bypass the LLM and use templates.
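The template-versus-LLM split can be expressed as a simple dispatch; a minimal sketch in which the template strings and intent names are illustrative (real copy would come from a localized template store):

```python
# Illustrative templates for structured, low-ambiguity intents.
TEMPLATES = {
    "greeting": "Hi! How can I help with your manga search today?",
    "order_tracking": "Order {order_id} is currently: {status}.",
}

def respond(intent, slots, llm_generate):
    """Serve structured intents from templates; everything else goes
    to grounded LLM generation via the injected `llm_generate` callable."""
    template = TEMPLATES.get(intent)
    if template is not None:
        return template.format(**slots)   # deterministic, cheap, instant
    return llm_generate(intent, slots)    # ambiguous flows pay the LLM cost
```

Every message that takes the template branch is one fewer Bedrock invocation, which is where both the latency and the cost of the hybrid model are saved.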
15. Guardrails and Moderation
- Filters PII leakage, toxic content, off-topic responses, competitor mentions, and hallucinated prices or dates.
- Guardrails are mandatory because trust is the product.
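One of the listed checks, hallucinated prices, can be sketched as a post-generation filter; an illustrative check only (the regex, function name, and pass/fail contract are assumptions, and the real guardrail layer would run PII, toxicity, and off-topic filters alongside it):

```python
import re

# Matches yen or dollar amounts, e.g. "¥1,200" or "$9.99".
PRICE_RE = re.compile(r"[¥$]\d[\d,]*(?:\.\d+)?")

def price_guardrail(response, grounded_prices):
    """Flag any price in the response that was not present in the grounded
    context handed to the LLM, so hallucinated prices never reach the user."""
    for price in PRICE_RE.findall(response):
        if price not in grounded_prices:
            return False, f"hallucinated price: {price}"
    return True, "ok"
```

A failed check would route the turn to a template fallback or regeneration rather than shipping the response.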
16. Analytics and Monitoring
- CloudWatch, Kinesis, and Redshift provide operational and business visibility.
- Metrics include latency, intent distribution, resolution rate, and escalation rate.
17. Human Handoff
- Amazon Connect receives the escalation payload.
- The agent gets a conversation summary plus user context so the customer does not repeat themselves.
18. Caching Layer
- ElastiCache (Redis) sits between the Orchestrator and frequently queried data services.
- Product details, recommendations, and promotions are cached to reduce latency and protect downstream services.
- Prices are never cached; they are always fetched live to avoid stale pricing.
- Cache invalidation is event-driven for catalog and promotion changes.
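The "cache everything except price" policy above can be sketched as a cache-aside wrapper; a simplified in-process illustration (the real cache is Redis in ElastiCache, and the fetcher signatures here are invented):

```python
import time

class ProductCache:
    """Cache-aside sketch: non-price product fields are cached with a TTL,
    while price is always fetched live from the origin service."""

    def __init__(self, fetch_product, fetch_price, ttl=300, now=time.time):
        self.fetch_product = fetch_product
        self.fetch_price = fetch_price
        self.ttl = ttl
        self.now = now
        self._cache = {}  # asin -> (expires_at, detail dict without price)

    def get(self, asin):
        entry = self._cache.get(asin)
        if entry is None or entry[0] < self.now():
            detail = dict(self.fetch_product(asin))
            detail.pop("price", None)             # never cache price
            entry = (self.now() + self.ttl, detail)
            self._cache[asin] = entry
        result = dict(entry[1])
        result["price"] = self.fetch_price(asin)  # always live, never stale
        return result

    def invalidate(self, asin):
        """Called from the catalog/promotion change event stream."""
        self._cache.pop(asin, None)
```

The TTL handles drift for fields that change slowly; the event-driven `invalidate` path handles catalog and promotion changes that must show up immediately.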
Deployment Architecture
graph TD
subgraph "Edge"
CF[CloudFront] --> ALB[Application Load Balancer]
end
subgraph "Compute - ECS Fargate Cluster"
ALB --> ORC[Orchestrator Service<br>Auto-scaling: 10-100 tasks]
ALB --> WS[WebSocket Handler<br>Sticky sessions]
end
subgraph "Compute - Lambda"
ORC -->|overflow| LAMB[Lambda Burst Workers<br>Concurrency: 1000]
end
subgraph "ML Inference"
ORC --> SM[SageMaker Endpoint<br>Intent Classifier]
ORC --> BR[Amazon Bedrock<br>Claude 3.5 Sonnet]
end
subgraph "Storage"
ORC --> DDB[DynamoDB<br>Conversation Memory]
ORC --> OS[OpenSearch Serverless<br>Vector Store]
ORC --> EC[ElastiCache Redis<br>Response Cache]
end
subgraph "Async / Analytics"
ORC --> KIN[Kinesis Data Stream]
KIN --> RS[Redshift<br>Analytics Warehouse]
end
subgraph "Support"
ORC --> AC[Amazon Connect<br>Human Handoff]
end
Error Handling and Degraded Modes
Not every dependency is equally critical. When a service is down, the chatbot degrades gracefully rather than failing entirely.
| Service Down | Impact | Fallback Behavior |
|---|---|---|
| Recommendation Engine | Cannot personalize suggestions | Return trending / popular manga from cache |
| Product Catalog | Cannot show product details | Return a search link with a brief apology |
| Order Service | Cannot look up order status | Ask the user to check the order tracking page |
| RAG / Knowledge Base | Cannot retrieve grounded context | Use LLM with system knowledge only (flag as lower confidence) |
| LLM / Bedrock | Cannot generate natural-language responses | Return template-based responses for known intents; show unavailable message for open-ended queries |
| ElastiCache | Cannot serve cached data | Fall through to origin services directly (higher latency) |
| DynamoDB (Memory) | Cannot load or save conversation turns | Treat request as stateless; warn user that context may be lost |
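The degraded-mode pattern in the table can be implemented as a uniform fallback wrapper around each dependency call; a minimal sketch in which the function names and the hardcoded trending list are purely illustrative:

```python
def with_fallback(primary, fallback):
    """Wrap a dependency call so an outage degrades the answer
    instead of failing the whole conversation turn."""
    def call(*args, **kwargs):
        try:
            return primary(*args, **kwargs)
        except Exception:
            return fallback(*args, **kwargs)
    return call

def trending_from_cache(user_id):
    # Degraded mode for the Recommendation Engine row of the table:
    # serve cached trending titles instead of personalized picks.
    return ["Trending Vol. 1", "Trending Vol. 2"]
```

Each row of the table becomes one `with_fallback` pairing: the live dependency as `primary`, the degraded behavior as `fallback`. In production the wrapper would also distinguish timeouts from hard errors and emit a degraded-mode metric.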
Data Flow Summary
sequenceDiagram
participant User
participant Frontend
participant ChatEdge
participant Orchestrator
participant IntentClassifier
participant Services
participant LLM
participant Guardrails
User->>Frontend: Types message
Frontend->>ChatEdge: WebSocket message
ChatEdge->>Orchestrator: Authenticated request
Orchestrator->>IntentClassifier: Classify intent
IntentClassifier-->>Orchestrator: intent = recommendation
Orchestrator->>Services: Query Recommendation Engine + Catalog
Services-->>Orchestrator: Product data
Orchestrator->>LLM: Context + prompt
LLM-->>Orchestrator: Generated response
Orchestrator->>Guardrails: Validate response
Guardrails-->>Orchestrator: Approved response
Orchestrator-->>ChatEdge: Final response
ChatEdge-->>Frontend: Stream to user
Frontend-->>User: Display response
Technology Stack Summary
| Layer | Technology | Why |
|---|---|---|
| Frontend | React | Existing stack |
| Transport | WebSocket + HTTPS | Streaming plus reliability |
| Chat Edge | CloudFront + ALB / API Gateway | WebSocket plus HTTPS ingress |
| Auth | Internal session validation service | Reuses Amazon auth context |
| Orchestrator | ECS Fargate service + Lambda burst workers | Streaming-friendly and elastic |
| Intent Classifier | SageMaker endpoint | Low-latency ML inference |
| LLM | Amazon Bedrock (Claude 3.5 Sonnet) | Managed and cost-effective |
| Vector Store | OpenSearch Serverless | Managed vector search |
| Memory | DynamoDB | Low-latency key-value store |
| Cache | ElastiCache (Redis) | Sub-millisecond reads for hot data |
| Catalog | DynamoDB + Elasticsearch | Existing infrastructure |
| Analytics | CloudWatch + Kinesis + Redshift | Full observability |
| Escalation | Amazon Connect | Existing support infrastructure |