
4. End-to-End Architecture - High-Level Design (HLD)

Architecture Overview

MangaAssist is a microservices-based, event-driven chatbot that sits inside Amazon's existing service-oriented architecture. It does not reinvent infrastructure; it composes catalog, order, recommendation, and support services behind a new orchestration layer. The assistant uses a hybrid execution model: structured requests stay on deterministic API or template paths, while open-ended and ambiguous requests use grounded LLM generation.

High-Level Architecture Diagram

graph TB
    subgraph "Client Layer"
        A[Amazon.com JP Manga Store<br>Web / Mobile] -->|WebSocket / HTTPS| B[Chat Edge<br>CloudFront + ALB / API Gateway]
    end

    subgraph "Edge and Auth"
        B --> C[Auth and Session Service]
        B --> D[Rate Limiter]
    end

    subgraph "Orchestration Layer"
        C --> E[Chatbot Orchestrator]
        D --> E
        E --> F[Intent Classifier]
        E --> G[Conversation Memory<br>DynamoDB]
    end

    subgraph "Intelligence Layer"
        E -->|discovery / recommendation| H[Recommendation Engine]
        E -->|faq / policy| I[RAG Pipeline]
        E -->|order / checkout / support| J[Order, Checkout, and Support Router]
        E -->|product question| K[Product Q&A Service]
        E --> L[Bedrock LLM<br>Claude 3.5 Sonnet]
    end

    subgraph "Caching Layer"
        E --> Cache[ElastiCache<br>Product / Promo / Reco Cache]
    end

    subgraph "Data Layer"
        K --> M[Product Catalog]
        J --> N[Order Service]
        J --> O[Returns Service]
        J --> P1[Checkout Service]
        H --> P2[User Profile and History]
        I --> Q[Knowledge Base<br>OpenSearch]
        E --> R[Promotions Service]
    end

    subgraph "Safety and Output"
        L --> S[Guardrails]
        S --> T[Response Formatter]
        T --> B
    end

    subgraph "Observability"
        E --> U[Logging]
        E --> V[Metrics]
        E --> W[Analytics]
        T --> X[Feedback Capture]
    end

    subgraph "Fallback"
        E -->|escalation| Y[Amazon Connect<br>Human Agent Queue]
    end

Component Breakdown

1. Frontend Integration

  • A chat widget is embedded in JP Manga store pages on web and mobile.
  • The widget uses WebSocket for streaming responses and HTTPS fallback for environments that block streaming.
  • WebSocket streaming makes the assistant feel faster because users see tokens as they are generated.

2. Chat Edge

  • Single ingress point for chat traffic.
  • Supports WebSocket streaming with HTTPS fallback for clients that cannot hold a persistent connection.
  • Session initialization happens via POST /chat/init, which validates the user and creates a session before messages are exchanged.
  • Handles routing, TLS termination, throttling, and request validation.
  • Decouples the frontend transport from downstream service contracts.
  • WebSocket connections use heartbeat pings every 30 seconds; idle connections are closed after 5 minutes.
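
The heartbeat and idle-timeout policy above can be sketched as a simple reaper over last-pong timestamps. The connection ids and the `connections_to_close` helper are illustrative, not a real edge API:

```python
HEARTBEAT_INTERVAL = 30   # seconds between pings (per the edge policy above)
IDLE_TIMEOUT = 300        # close connections idle for 5 minutes

def connections_to_close(last_seen: dict[str, float], now: float) -> list[str]:
    """Given the last pong time per connection id, return ids past the idle timeout."""
    return [cid for cid, ts in last_seen.items() if now - ts > IDLE_TIMEOUT]
```

A periodic task at the edge would call this and close the returned connections.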

3. Authentication and Session

  • Logged-in users are identified through the Amazon session token.
  • Guest users receive a temporary session ID.
  • Personalization features require authentication; discovery and FAQ do not.

4. Rate Limiter

  • Token-bucket rate limiter per user or session.
  • Protects downstream services and limits abuse.
  • Keeps LLM usage under control.
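
A minimal token-bucket sketch of the per-session limiter; the capacity and refill rate are illustrative defaults, not tuned values:

```python
import time

class TokenBucket:
    """Per-session token bucket: bursts up to `capacity` requests,
    refilled continuously at `rate` tokens per second."""

    def __init__(self, capacity: float = 10, rate: float = 1.0):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

In production the bucket state would live in ElastiCache so all edge nodes share it.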

5. Chatbot Orchestrator

  • The central coordinator for every user message.
  • Loads conversation state, calls the intent classifier, fans out to downstream systems, chooses between template/API-first and LLM-backed response paths, runs guardrails, and returns the response.
  • Does not do NLP or data fetching itself; it coordinates work.
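
One way to sketch the template-vs-LLM path decision; the intent names and confidence threshold are assumptions, not the real taxonomy:

```python
# Hypothetical intent names; the real taxonomy is defined by the classifier.
TEMPLATE_INTENTS = {"greeting", "order_tracking", "promotion_inquiry"}

def choose_path(intent: str, confidence: float, threshold: float = 0.8) -> str:
    """Structured intents stay on the deterministic template/API path;
    low-confidence classifications and open-ended intents go to the LLM."""
    if confidence < threshold:
        return "llm"
    if intent == "escalation":
        return "human_handoff"
    if intent in TEMPLATE_INTENTS:
        return "template"
    return "llm"
```

The orchestrator then fans out to the services each path needs before rendering a response.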

6. Intent Classifier

  • Lightweight classifier that maps messages to intents.
  • Common intents include product discovery, product question, FAQ, order tracking, return request, promotion inquiry, recommendation, checkout help, escalation, and chitchat.
  • Deterministic routing is cheaper and faster than sending every message to the LLM.

7. Conversation Memory

  • Stores the last N turns per session.
  • Implemented in DynamoDB with TTL.
  • Supports multi-turn context such as "What about the second one you mentioned?"
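
An in-memory stand-in illustrates the last-N-turns-with-TTL behavior; DynamoDB itself would use a TTL attribute on the session item rather than an in-process sweep:

```python
import time
from collections import deque

class ConversationMemory:
    """In-memory sketch of the DynamoDB table: last N turns per session,
    expired after `ttl_seconds`."""

    def __init__(self, max_turns: int = 10, ttl_seconds: int = 1800):
        self.max_turns = max_turns
        self.ttl = ttl_seconds
        self.sessions: dict[str, tuple[float, deque]] = {}

    def append(self, session_id: str, role: str, text: str) -> None:
        _, turns = self.sessions.get(session_id, (0, deque(maxlen=self.max_turns)))
        turns.append({"role": role, "text": text})
        # Each write refreshes the expiry, like updating a DynamoDB TTL attribute.
        self.sessions[session_id] = (time.time() + self.ttl, turns)

    def load(self, session_id: str) -> list:
        entry = self.sessions.get(session_id)
        if entry is None or entry[0] < time.time():
            self.sessions.pop(session_id, None)  # expired
            return []
        return list(entry[1])
```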

8. RAG Pipeline

  • Used for FAQ, policy, and product knowledge.
  • Retrieves chunks from the knowledge base, augments the prompt, and grounds the LLM in real data.
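
The retrieve-augment step can be sketched with a toy lexical retriever standing in for OpenSearch vector search; the prompt wording is illustrative:

```python
def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Toy word-overlap retriever; production uses vector search in OpenSearch."""
    qwords = set(query.lower().split())
    scored = sorted(chunks, key=lambda c: -len(qwords & set(c.lower().split())))
    return scored[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    """Augment the user query with retrieved context so the LLM stays grounded."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks))
    return (
        "Answer using only the context below. If the context is insufficient, say so.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```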

9. Recommendation Engine

  • Returns ranked ASINs based on browsing history, past purchases, and current query.
  • Reuses Amazon's existing personalization strength instead of rebuilding it.

10. Product Catalog Service

  • Provides product data by ASIN: title, author, price, format, availability, images, and review summary.
  • Every product-related response depends on it.

11. User Profile and History

  • Stores user-level preferences, past purchases, browsing history, and reading history.
  • Feeds the Recommendation Engine so results reflect what this specific user has read or bought.
  • Authenticated sessions only; guest sessions use page context and in-session browsing instead.

12. Promotions Service

  • Returns active promotions, coupons, and deals relevant to manga.
  • Queried by the Orchestrator for any response that could benefit from a promotional nudge.
  • Results are cached in ElastiCache with event-driven invalidation.

13. Order, Checkout, and Support Services

  • Existing Order, Returns, and Checkout services handle purchase and post-purchase use cases.
  • They are queried with customer, session, or cart context depending on the workflow.

14. LLM Response Generation

  • Bedrock hosts the generation model.
  • The LLM receives structured context and generates a natural-language response only for recommendation, FAQ, product explanation, and other ambiguous flows.
  • Structured intents such as greetings, simple order lookups, and low-ambiguity promotion answers can bypass the LLM and use templates.
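
A sketch of the template bypass: structured intents render deterministically, and everything else falls through to the LLM path. The template strings and slot names are hypothetical:

```python
from typing import Optional

# Hypothetical templates keyed by intent; slots are filled from service data.
TEMPLATES = {
    "greeting": "Hi! I can help you find manga, track orders, or answer questions.",
    "order_tracking": "Order {order_id} is {status}. Expected delivery: {eta}.",
}

def render(intent: str, **slots) -> Optional[str]:
    """Return a deterministic response for templated intents, or None to
    signal that the orchestrator should fall through to the LLM path."""
    template = TEMPLATES.get(intent)
    return template.format(**slots) if template else None
```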

15. Guardrails and Moderation

  • Filters PII leakage, toxic content, off-topic responses, competitor mentions, and hallucinated prices or dates.
  • Guardrails are mandatory because trust is the product.
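
An illustrative slice of the output checks: regex-based PII detection and price validation against the structured context. Real guardrails (e.g. Bedrock Guardrails) are far broader; these patterns are assumptions for the sketch:

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PRICE = re.compile(r"[¥$]\s?\d[\d,]*")

def check(response: str, allowed_prices: set) -> list:
    """Return violations: leaked email PII, or prices absent from the
    structured context (a proxy for hallucinated prices)."""
    violations = []
    if EMAIL.search(response):
        violations.append("pii:email")
    for price in PRICE.findall(response):
        if price not in allowed_prices:
            violations.append(f"hallucinated_price:{price}")
    return violations
```

A non-empty violation list blocks the response and triggers a templated fallback.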

16. Analytics and Monitoring

  • CloudWatch, Kinesis, and Redshift provide operational and business visibility.
  • Metrics include latency, intent distribution, resolution rate, and escalation rate.

17. Human Handoff

  • Amazon Connect receives the escalation payload.
  • The agent gets a conversation summary plus user context so the customer does not repeat themselves.
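
The escalation payload might look like this; the field names are assumptions, not the actual Amazon Connect contract:

```python
def build_escalation_payload(session_id: str, user_id: str,
                             turns: list, summary: str) -> dict:
    """Bundle summary plus recent context so the agent sees the whole
    conversation and the customer never repeats themselves."""
    return {
        "session_id": session_id,
        "user_id": user_id,
        "summary": summary,
        "recent_turns": turns[-5:],  # last few turns for quick scanning
        "channel": "chat",
    }
```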

18. Caching Layer

  • ElastiCache (Redis) sits between the Orchestrator and frequently queried data services.
  • Product details, recommendations, and promotions are cached to reduce latency and protect downstream services.
  • Prices are never cached; they are always fetched live to avoid stale pricing.
  • Cache invalidation is event-driven for catalog and promotion changes.
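
A cache-aside sketch that enforces the never-cache-price rule; `fetch_catalog` and `fetch_price` are assumed client callables, not real service APIs:

```python
import time

NEVER_CACHE = {"price"}  # prices are always fetched live to avoid staleness

class ProductCache:
    """Cache-aside sketch standing in for ElastiCache."""

    def __init__(self, fetch_catalog, fetch_price, ttl: float = 300):
        self.fetch_catalog = fetch_catalog
        self.fetch_price = fetch_price
        self.ttl = ttl
        self.store: dict[str, tuple[float, dict]] = {}

    def get(self, asin: str) -> dict:
        entry = self.store.get(asin)
        if entry is None or entry[0] <= time.time():
            product = dict(self.fetch_catalog(asin))
            for field in NEVER_CACHE:
                product.pop(field, None)          # never cache price fields
            entry = (time.time() + self.ttl, product)
            self.store[asin] = entry
        result = dict(entry[1])
        result["price"] = self.fetch_price(asin)  # always a live read
        return result

    def invalidate(self, asin: str) -> None:
        """Hook for event-driven invalidation on catalog/promotion changes."""
        self.store.pop(asin, None)
```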

Deployment Architecture

graph TD
    subgraph "Edge"
        CF[CloudFront] --> ALB[Application Load Balancer]
    end

    subgraph "Compute - ECS Fargate Cluster"
        ALB --> ORC[Orchestrator Service<br>Auto-scaling: 10-100 tasks]
        ALB --> WS[WebSocket Handler<br>Sticky sessions]
    end

    subgraph "Compute - Lambda"
        ORC -->|overflow| LAMB[Lambda Burst Workers<br>Concurrency: 1000]
    end

    subgraph "ML Inference"
        ORC --> SM[SageMaker Endpoint<br>Intent Classifier]
        ORC --> BR[Amazon Bedrock<br>Claude 3.5 Sonnet]
    end

    subgraph "Storage"
        ORC --> DDB[DynamoDB<br>Conversation Memory]
        ORC --> OS[OpenSearch Serverless<br>Vector Store]
        ORC --> EC[ElastiCache Redis<br>Response Cache]
    end

    subgraph "Async / Analytics"
        ORC --> KIN[Kinesis Data Stream]
        KIN --> RS[Redshift<br>Analytics Warehouse]
    end

    subgraph "Support"
        ORC --> AC[Amazon Connect<br>Human Handoff]
    end

Error Handling and Degraded Modes

Not every dependency is equally critical. When a service is down, the chatbot degrades gracefully rather than failing entirely.

| Service Down | Impact | Fallback Behavior |
| --- | --- | --- |
| Recommendation Engine | Cannot personalize suggestions | Return trending / popular manga from cache |
| Product Catalog | Cannot show product details | Return a search link with a brief apology |
| Order Service | Cannot look up order status | Ask the user to check the order tracking page |
| RAG / Knowledge Base | Cannot retrieve grounded context | Use LLM with system knowledge only (flag as lower confidence) |
| LLM / Bedrock | Cannot generate natural-language responses | Return template-based responses for known intents; show unavailable message for open-ended queries |
| ElastiCache | Cannot serve cached data | Fall through to origin services directly (higher latency) |
| DynamoDB (Memory) | Cannot load or save conversation turns | Treat request as stateless; warn user that context may be lost |
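
The degrade-don't-fail pattern above reduces to a small wrapper, shown here for the Recommendation Engine row; the client callables are hypothetical:

```python
def with_fallback(call, fallback, logger=print):
    """Run a dependency call; on failure, log and return the degraded
    result instead of failing the whole conversation."""
    try:
        return call()
    except Exception as exc:
        logger(f"degraded: {exc}")
        return fallback()

def get_recommendations(user_id, reco_client, trending_cache):
    """Personalized recommendations, degrading to cached trending titles."""
    return with_fallback(
        lambda: reco_client(user_id),
        lambda: trending_cache(),
        logger=lambda msg: None,  # silenced in this sketch
    )
```

Each row of the table would supply its own `fallback` callable.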

Data Flow Summary

sequenceDiagram
    participant User
    participant Frontend
    participant ChatEdge
    participant Orchestrator
    participant IntentClassifier
    participant Services
    participant LLM
    participant Guardrails

    User->>Frontend: Types message
    Frontend->>ChatEdge: WebSocket message
    ChatEdge->>Orchestrator: Authenticated request
    Orchestrator->>IntentClassifier: Classify intent
    IntentClassifier-->>Orchestrator: intent = recommendation
    Orchestrator->>Services: Query Recommendation Engine + Catalog
    Services-->>Orchestrator: Product data
    Orchestrator->>LLM: Context + prompt
    LLM-->>Orchestrator: Generated response
    Orchestrator->>Guardrails: Validate response
    Guardrails-->>Orchestrator: Approved response
    Orchestrator-->>ChatEdge: Final response
    ChatEdge-->>Frontend: Stream to user
    Frontend-->>User: Display response

Technology Stack Summary

| Layer | Technology | Why |
| --- | --- | --- |
| Frontend | React | Existing stack |
| Transport | WebSocket + HTTPS | Streaming plus reliability |
| Chat Edge | CloudFront + ALB / API Gateway | WebSocket plus HTTPS ingress |
| Auth | Internal session validation service | Reuses Amazon auth context |
| Orchestrator | ECS Fargate service + Lambda burst workers | Streaming-friendly and elastic |
| Intent Classifier | SageMaker endpoint | Low-latency ML inference |
| LLM | Amazon Bedrock (Claude 3.5 Sonnet) | Managed and cost-effective |
| Vector Store | OpenSearch Serverless | Managed vector search |
| Memory | DynamoDB | Low-latency key-value store |
| Cache | ElastiCache (Redis) | Sub-millisecond reads for hot data |
| Catalog | DynamoDB + Elasticsearch | Existing infrastructure |
| Analytics | CloudWatch + Kinesis + Redshift | Full observability |
| Escalation | Amazon Connect | Existing support infrastructure |