
4b. Low-Level Design (LLD) - Component Deep Dives

This document expands the high-level architecture into service contracts, state transitions, storage schemas, and validation pipelines for MangaAssist.

How to Use This Document

  • Start with LLD-1 to understand the orchestrator, because it is the control plane for every request.
  • Read LLD-2 through LLD-6 next if you want the core AI and data path: intent classification, RAG, memory, prompt construction, and guardrails.
  • Use LLD-7 and LLD-8 as implementation references for APIs and persistence contracts.
  • Use LLD-9 for the escalation and human handoff flow.
  • Use LLD-10 for the caching strategy and cache invalidation design.

LLD-1: Chatbot Orchestrator Service

Class Diagram

classDiagram
    class ChatbotOrchestrator {
        -conversationMemory: ConversationMemoryClient
        -intentClassifier: IntentClassifierClient
        -responseGenerator: ResponseGeneratorClient
        -guardrails: GuardrailsClient
        -metricsEmitter: MetricsEmitter
        +handleMessage(request: ChatRequest): ResponseHandle
        -routeByIntent(intent: Intent, context: ConversationContext): ServiceResponse
        -buildLLMPrompt(intent: Intent, serviceData: Map, history: List~Turn~): Prompt
    }

    class ChatRequest {
        +sessionId: String
        +customerId: String?
        +message: String
        +pageContext: PageContext
        +timestamp: Instant
    }

    class ChatResponse {
        +sessionId: String
        +responseText: String
        +products: List~ProductCard~?
        +actions: List~ActionButton~?
        +metadata: ResponseMetadata
    }

    class ResponseHandle {
        +sessionId: String
        +responseId: String
        +status: String
        +deliveryChannel: String
    }

    class PageContext {
        +currentASIN: String?
        +storeSection: String
        +cartASINs: List~String~
        +browsingHistory: List~String~
    }

    class ConversationContext {
        +sessionId: String
        +turns: List~Turn~
        +userProfile: UserProfile?
        +currentIntent: Intent
        +pageContext: PageContext
    }

    class Turn {
        +role: Role
        +content: String
        +timestamp: Instant
        +intent: Intent?
    }

    class Intent {
        +type: IntentType
        +confidence: Float
        +entities: Map~String, String~
    }

    ChatbotOrchestrator --> ChatRequest
    ChatbotOrchestrator --> ResponseHandle
    ChatbotOrchestrator --> ChatResponse
    ChatbotOrchestrator --> ConversationContext
    ConversationContext --> Turn
    ConversationContext --> PageContext
    Turn --> Intent

Orchestrator Flow - Internal State Machine

stateDiagram-v2
    [*] --> ReceiveMessage
    ReceiveMessage --> LoadContext: Load session + memory
    LoadContext --> ClassifyIntent: Call Intent Classifier
    ClassifyIntent --> RouteToService: Based on intent type

    RouteToService --> ProductDiscovery: product_discovery
    RouteToService --> ProductQA: product_question
    RouteToService --> FAQRetrieval: faq
    RouteToService --> OrderLookup: order_tracking
    RouteToService --> ReturnFlow: return_request
    RouteToService --> PromoLookup: promotion
    RouteToService --> Recommendation: recommendation
    RouteToService --> CheckoutHelp: checkout_help
    RouteToService --> ChitChat: chitchat
    RouteToService --> Escalation: escalation

    ProductDiscovery --> AggregateData
    ProductQA --> AggregateData
    FAQRetrieval --> AggregateData
    OrderLookup --> AggregateData
    ReturnFlow --> AggregateData
    PromoLookup --> AggregateData
    Recommendation --> AggregateData
    CheckoutHelp --> AggregateData
    ChitChat --> TemplateResponse
    Escalation --> HandoffToAgent

    AggregateData --> DecideResponseMode
    DecideResponseMode --> TemplateResponse: structured / low-ambiguity
    DecideResponseMode --> GenerateResponse: explanatory / ambiguous
    GenerateResponse --> ApplyGuardrails
    TemplateResponse --> ApplyGuardrails
    ApplyGuardrails --> SaveTurn: Save to memory
    SaveTurn --> ReturnResponse
    ReturnResponse --> [*]

    HandoffToAgent --> [*]
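
The control flow above maps naturally onto a single handler method. Below is a minimal sketch of that mapping in Python; the injected clients, their method names, and the `low_ambiguity` flag are illustrative stand-ins, not part of the contract defined by the class diagram.

```python
# Minimal orchestrator sketch mirroring the state machine above.
# Every injected client is a hypothetical stand-in; only the control
# flow is taken from the diagram.

class ChatbotOrchestrator:
    def __init__(self, memory, classifier, router, generator, guardrails):
        self.memory = memory          # ConversationMemoryClient
        self.classifier = classifier  # IntentClassifierClient
        self.router = router          # dispatches intents to downstream services
        self.generator = generator    # ResponseGeneratorClient
        self.guardrails = guardrails  # GuardrailsClient

    def handle_message(self, request: dict) -> dict:
        # ReceiveMessage -> LoadContext: session META plus recent turns
        context = self.memory.load(request["session_id"])

        # ClassifyIntent
        intent = self.classifier.classify(request["message"], context)

        if intent["type"] == "escalation":
            # HandoffToAgent ends the bot flow (see LLD-9)
            return self.router.escalate(context)

        if intent["type"] == "chitchat":
            # ChitChat skips data aggregation entirely
            response = self.generator.from_template(intent)
        else:
            # RouteToService -> AggregateData
            service_data = self.router.dispatch(intent, context)
            # DecideResponseMode: template for structured, low-ambiguity
            # answers; LLM generation for explanatory or ambiguous ones
            if service_data.get("low_ambiguity"):
                response = self.generator.from_template(intent, service_data)
            else:
                response = self.generator.generate(intent, service_data, context)

        # ApplyGuardrails -> SaveTurn -> ReturnResponse
        response = self.guardrails.check_output(response, context)
        self.memory.save_turn(request["session_id"], request["message"],
                              response, intent)
        return response  # delivered asynchronously via a ResponseHandle in practice
```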

LLD-2: Intent Classifier

Design

The Intent Classifier is a two-stage system:

Stage 1 - Fast Rule-Based Pre-filter:

  • Regex patterns catch high-confidence intents cheaply.
  • Example: messages containing "where is my order" or "track" -> order_tracking with 0.95 confidence.
  • Example: messages containing "return" + "damaged" -> return_request.

Stage 2 - ML Model (Fallback):

  • If Stage 1 confidence < 0.8, a fine-tuned BERT classifier runs on SageMaker.
  • Trained on labeled Amazon customer service conversations + manga-specific training data.
  • Returns intent + confidence + extracted entities (ASIN, series name, volume number).

graph LR
    A[User Message] --> B{Rule-Based<br>Matcher}
    B -->|confidence >= 0.8| C[Return Intent]
    B -->|confidence < 0.8| D[BERT Classifier<br>SageMaker]
    D --> E{confidence >= 0.6?}
    E -->|yes| C
    E -->|no| F[Fallback: general_query<br>Send full message to LLM]
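
A compact sketch of this two-stage flow in Python; the two regex patterns are the examples given above (the real pattern table would be much larger), and `bert_classifier` stands in for the SageMaker endpoint client.

```python
import re

# Stage 1: regex pre-filter. Each entry is (compiled pattern, intent, confidence).
# Only the two examples from the text are shown here.
RULES = [
    (re.compile(r"\b(where is my order|track)\b", re.I), "order_tracking", 0.95),
    (re.compile(r"(?=.*\breturn\b)(?=.*\bdamaged\b)", re.I), "return_request", 0.90),
]

RULE_THRESHOLD = 0.8   # below this, fall through to the BERT classifier
MODEL_THRESHOLD = 0.6  # below this, fall back to general_query

def classify(message: str, bert_classifier) -> dict:
    # Stage 1: cheap rule match
    for pattern, intent, confidence in RULES:
        if pattern.search(message) and confidence >= RULE_THRESHOLD:
            return {"type": intent, "confidence": confidence, "entities": {}}

    # Stage 2: fine-tuned BERT model on SageMaker (client is a stand-in)
    result = bert_classifier.predict(message)
    if result["confidence"] >= MODEL_THRESHOLD:
        return result

    # Low-confidence catch-all: send the full message to the LLM, no routing
    return {"type": "general_query", "confidence": result["confidence"], "entities": {}}
```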

Intent Taxonomy

| Intent | Example Messages | Routed To |
| --- | --- | --- |
| product_discovery | "Show me horror manga", "What's popular?" | Recommendation Engine |
| product_question | "Is this in English?", "How many pages?" | Product Catalog |
| recommendation | "Something like One Piece" | Recommendation Engine + RAG |
| faq | "What's the return policy?" | RAG Pipeline |
| order_tracking | "Where is my order?" | Order Service |
| return_request | "I want to return this" | Returns Service |
| promotion | "Any deals on manga?" | Promotions Service |
| checkout_help | "Can I use gift cards?" | FAQ + Checkout Service |
| escalation | "Talk to a human" | Human Handoff |
| chitchat | "Hello", "Thanks" | Template Response |
| general_query | Low-confidence catch-all (Stage 2 below 0.6) | LLM with system prompt only, no service routing |

Entity Extraction

{
  "intent": "product_question",
  "confidence": 0.92,
  "entities": {
    "series_name": "Demon Slayer",
    "volume_number": "12",
    "attribute": "language",
    "asin": null
  }
}

LLD-3: RAG Pipeline

Indexing Pipeline

graph LR
    subgraph "Data Sources"
        A[Product Descriptions]
        B[FAQ Pages]
        C[Return Policies]
        D[Editorial Content]
        E[Review Summaries]
    end

    subgraph "Processing"
        A --> F[Chunker<br>Size varies by source type]
        B --> F
        C --> F
        D --> F
        E --> F
        F --> G[Embedding Model<br>Titan Embeddings V2]
        G --> H[OpenSearch Serverless<br>Vector Index]
    end

    subgraph "Metadata"
        F --> I[Attach metadata:<br>source_type, asin,<br>category, last_updated]
        I --> H
    end

Chunk Strategy by Content Type

| Content Type | Chunk Size | Overlap | Metadata | Rationale |
| --- | --- | --- | --- | --- |
| Product descriptions | 256 tokens | 25 tokens | ASIN, category, format | Short, self-contained entries |
| FAQ articles | 512 tokens | 50 tokens | topic, last_updated | Longer explanatory content |
| Return and shipping policies | 512 tokens | 50 tokens | policy_type, region | Must preserve full policy context |
| Editorial content | 512 tokens | 50 tokens | genre, author | Narrative content needs larger windows |
| Review summaries | 128 tokens | 0 | ASIN, sentiment | Already condensed; no overlap needed |
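
For illustration, a minimal sliding-window chunker applying the sizes and overlaps in the table above. The whitespace tokenizer and the `CHUNK_CONFIG` constant are simplifications; a production pipeline would count real model tokens (e.g. with the embedding model's tokenizer).

```python
# Sliding-window chunker using the table above. Whitespace "tokens" stand in
# for real model tokens; CHUNK_CONFIG is an illustrative constant.
CHUNK_CONFIG = {
    "product_description": {"size": 256, "overlap": 25},
    "faq":                 {"size": 512, "overlap": 50},
    "policy":              {"size": 512, "overlap": 50},
    "editorial":           {"size": 512, "overlap": 50},
    "review_summary":      {"size": 128, "overlap": 0},
}

def chunk(text: str, source_type: str, metadata: dict) -> list[dict]:
    cfg = CHUNK_CONFIG[source_type]
    tokens = text.split()  # simplification: real pipelines count model tokens
    step = cfg["size"] - cfg["overlap"]
    chunks = []
    for i, start in enumerate(range(0, max(len(tokens), 1), step)):
        window = tokens[start:start + cfg["size"]]
        if not window:
            break
        chunks.append({
            "chunk_id": f"{source_type}-{metadata.get('asin', 'na')}-{i:03d}",
            "content": " ".join(window),
            "source_type": source_type,
            **metadata,  # attach asin, category, last_updated, etc.
        })
    return chunks
```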

Index Refresh Strategy

  • Product descriptions and prices: Re-indexed every 6 hours or on catalog change events.
  • FAQ and policy pages: Re-indexed daily during off-peak hours.
  • Editorial content: Re-indexed weekly or on publish.
  • Review summaries: Re-indexed daily.
  • Stale chunk handling: Each chunk carries a last_updated timestamp. Chunks older than their source's refresh cycle are deprioritized during retrieval.

Retrieval Flow

sequenceDiagram
    participant Orchestrator
    participant Embedder
    participant VectorStore
    participant Reranker
    participant LLM

    Orchestrator->>Embedder: Embed user query
    Embedder-->>Orchestrator: Query vector (1024-dim)
    Orchestrator->>VectorStore: KNN search (top 10)
    VectorStore-->>Orchestrator: 10 candidate chunks
    Orchestrator->>Reranker: Rerank by relevance
    Reranker-->>Orchestrator: Top 3 chunks
    Orchestrator->>LLM: System prompt + 3 chunks + query
    LLM-->>Orchestrator: Grounded response
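
A sketch of the retrieval step, assuming a boto3 Bedrock runtime client for Titan Embeddings V2 and the opensearch-py client. The endpoint host, index name, and the `reranker` object are placeholders; note Titan Embeddings V2 emits 1024-dimensional vectors at its default setting.

```python
import json
import boto3
from opensearchpy import OpenSearch  # pip install opensearch-py

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
opensearch = OpenSearch(hosts=["https://example-vector-endpoint:443"])  # placeholder

INDEX = "manga-knowledge-base"  # assumed index name

def retrieve(query: str, reranker, k: int = 10, top_n: int = 3) -> list[dict]:
    # 1. Embed the user query with Titan Embeddings V2 (1024-dim by default)
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": query}),
    )
    query_vector = json.loads(resp["body"].read())["embedding"]

    # 2. KNN search for the top-k candidate chunks
    hits = opensearch.search(
        index=INDEX,
        body={"size": k,
              "query": {"knn": {"embedding": {"vector": query_vector, "k": k}}}},
    )["hits"]["hits"]
    candidates = [h["_source"] for h in hits]

    # 3. Rerank and keep the top_n chunks for the prompt (reranker is a stand-in)
    return reranker.rerank(query, candidates)[:top_n]
```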

Chunk Schema

{
  "chunk_id": "faq-return-policy-003",
  "content": "Manga volumes can be returned within 30 days of delivery if they are in original condition. Damaged items can be returned for a full refund or replacement.",
  "source_type": "faq",
  "source_url": "/help/returns",
  "asin": null,
  "category": "manga",
  "embedding": [0.012, -0.034, ...],
  "last_updated": "2025-12-01"
}

LLD-4: Conversation Memory

DynamoDB Schema

| Attribute | Type | Applies To | Description |
| --- | --- | --- | --- |
| pk | String | All items | `SESSION#<session_id>` |
| sk | String | All items | `META`, `TURN#<timestamp>`, or `SUMMARY#<window_id>` |
| customer_id (GSI1PK) | String | Meta item | Amazon customer ID for authenticated session lookup |
| updated_at (GSI1SK) | Number | Meta item | Supports fetching the most recent sessions |
| role | String | Turn item | user, assistant, or system |
| content | String | Turn and summary items | Raw turn text or compressed summary |
| intent | String | Turn item | Classified intent for the turn |
| page_context | Map | Meta item | Latest page ASIN, store section, and cart snapshot |
| created_at | Number | All items | Epoch timestamp |
| ttl | Number | Meta and turn items | Epoch + 86400 (24-hour expiry) |
| turn_count | Number | Meta item | Total turns in session |

Memory Management

graph TD
    A[New Message Arrives] --> B[Read META + latest turns]
    B --> C{Turn count > 20?}
    C -->|No| D[Write new TURN item]
    C -->|Yes| E[Summarize oldest unsummarized window<br>Window size: 10 turns]
    E --> F[Write SUMMARY item]
    F --> D
    D --> G[Update META item<br>turn_count, updated_at, last_intent]

Window size for summarization: 10 turns (5 user messages + 5 assistant responses). The LLM compresses these into a 2-3 sentence summary preserving: what the user was looking for, what was recommended, and any unresolved issues.

Why summarize instead of truncate? Truncation silently drops the oldest turns; summarization keeps their key context available to the LLM at a fraction of the token cost.

Why not store the full conversation as one DynamoDB item? Separate turn items avoid the 400 KB item-size ceiling, reduce rewrite amplification on every new message, and make concurrent writes safer during streaming or retries.

DynamoDB throttle handling: If a TURN write is throttled, the Orchestrator retries with exponential backoff (max 2 retries). If the write still fails, the response is still delivered to the user, but the turn is enqueued to an SQS retry queue and persisted by an async worker to avoid blocking the response path. Messages that exhaust the worker's retries land in a DLQ for manual review.
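
A sketch of that write path with boto3; the table name matches LLD-8, while the queue URL and the `turn_item` shape are placeholders. (boto3's built-in retry logic also applies; the explicit loop here just makes the described policy visible.)

```python
import json
import time
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")
sqs = boto3.client("sqs")

TABLE = "manga_chatbot_memory"
RETRY_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/turn-retry"  # placeholder

def save_turn(session_id: str, turn_item: dict) -> None:
    item = {
        "pk": {"S": f"SESSION#{session_id}"},
        "sk": {"S": f"TURN#{turn_item['timestamp']}"},
        "role": {"S": turn_item["role"]},
        "content": {"S": turn_item["content"]},
        "intent": {"S": turn_item.get("intent", "")},
        "created_at": {"N": str(turn_item["timestamp"])},
    }
    # Synchronous path: initial attempt plus 2 retries with exponential backoff
    for attempt in range(3):
        try:
            dynamodb.put_item(TableName=TABLE, Item=item)
            return
        except ClientError as err:
            code = err.response["Error"]["Code"]
            if code not in ("ProvisionedThroughputExceededException", "ThrottlingException"):
                raise
            time.sleep(0.1 * (2 ** attempt))  # 100ms, 200ms, 400ms

    # Still throttled: don't block the response path. Enqueue for the async
    # worker; messages that exhaust its retries land in the DLQ.
    sqs.send_message(
        QueueUrl=RETRY_QUEUE_URL,
        MessageBody=json.dumps({"session_id": session_id, "turn": turn_item}),
    )
```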


LLD-5: Response Generation - Prompt Engineering

System Prompt Template

You are MangaAssist, a helpful shopping assistant for the JP Manga store on Amazon.com.

RULES:
1. Only recommend products that exist in the provided product data.
2. Never invent prices, availability, or delivery dates.
3. If you don't have the information, say so and offer to help differently.
4. Keep responses concise (2-4 sentences for simple questions, up to a short paragraph for recommendations).
5. Always include product links when recommending items.
6. Never discuss competitors by name.
7. If the user needs help beyond your capabilities, offer to connect them with a support agent.

CONTEXT:
- Store section: {{store_section}}
- Current product (if any): {{current_product_json}}
- User's recent browsing: {{browsing_history}}
- Active promotions: {{active_promos}}

RETRIEVED INFORMATION:
{{rag_chunks}}

CONVERSATION HISTORY:
{{conversation_turns}}

USER MESSAGE:
{{user_message}}

Response Format Contract

{
  "response_text": "Based on your love for action manga, here are 3 titles you might enjoy:",
  "products": [
    {
      "asin": "B08X1YRSTR",
      "title": "Chainsaw Man, Vol. 1",
      "price": "$9.99",
      "image_url": "https://...",
      "product_url": "https://amazon.com/dp/..."
    }
  ],
  "actions": [
    {
      "label": "Add to Cart",
      "type": "add_to_cart",
      "asin": "B08X1YRSTR"
    },
    {
      "label": "See More Like This",
      "type": "more_recommendations"
    }
  ],
  "follow_up_suggestions": [
    "Tell me about the art style",
    "Is there a box set?",
    "Show me horror manga instead"
  ]
}
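
Because the LLM must emit exactly this structure, the Orchestrator can validate the payload before rendering it. A sketch using pydantic; the model names here are illustrative, not types from the class diagram.

```python
from pydantic import BaseModel

class ProductCard(BaseModel):
    asin: str
    title: str
    price: str
    image_url: str
    product_url: str

class ActionButton(BaseModel):
    label: str
    type: str
    asin: str | None = None

class ChatResponsePayload(BaseModel):
    response_text: str
    products: list[ProductCard] = []
    actions: list[ActionButton] = []
    follow_up_suggestions: list[str] = []

# Raises pydantic.ValidationError if the LLM output drifts from the contract:
payload = ChatResponsePayload.model_validate_json('{"response_text": "Hi!"}')
```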

LLD-6: Guardrails Pipeline

graph LR
    A[User Message] --> B[Input Safety<br>PII scrub, prompt injection, abuse checks]
    B --> C{Blocked?}
    C -->|Yes| H[Safe Fallback or Escalation<br>+ Log for review]
    C -->|No| D[LLM or Template Response]
    D --> E[Output Safety<br>PII, price, toxicity, competitor, ASIN, scope]
    E --> F{All checks pass?}
    F -->|Yes| G[Return to User]
    F -->|No| H

Guardrail Rules

| Rule | What It Checks | Action on Failure |
| --- | --- | --- |
| Input PII Scrub | SSN, credit card, phone, email, address in user input before LLM or analytics | Mask sensitive tokens and log |
| Prompt Injection | Instruction override attempts or encoded jailbreak patterns | Block or safe-refuse; log |
| Output PII Redaction | Sensitive PII reproduced in output | Redact and log |
| Price Accuracy | Prices in response match catalog | Replace with correct price |
| Toxicity | Offensive or inappropriate content | Block response, return safe fallback |
| Competitor Mention | Names of competitors (Barnes & Noble, etc.) | Remove mention |
| ASIN Validation | Product IDs mentioned actually exist | Remove invalid product |
| Scope Check | Response stays on topic (manga/Amazon) | Redirect to on-topic |
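
A sketch of two of these output checks; the catalog client and the competitor pattern are illustrative stand-ins, and `guardrail_flags` ties into the TURN item field in LLD-8.

```python
import re

# Illustrative competitor pattern; the production list would live in config.
COMPETITORS = re.compile(r"\bBarnes\s*&\s*Noble\b", re.I)

def check_price_accuracy(response: dict, catalog) -> dict:
    """Price Accuracy rule: replace any quoted price that disagrees with the catalog."""
    for product in response.get("products", []):
        live_price = catalog.get_price(product["asin"])  # stand-in catalog client
        if product["price"] != live_price:
            product["price"] = live_price
            response.setdefault("guardrail_flags", []).append("price_corrected")
    return response

def check_competitor_mentions(response: dict) -> dict:
    """Competitor Mention rule: remove competitor names from the response text."""
    text, hits = COMPETITORS.subn("another retailer", response["response_text"])
    if hits:
        response["response_text"] = text
        response.setdefault("guardrail_flags", []).append("competitor_removed")
    return response
```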

LLD-7: API Contracts

The transport is split into synchronous submission and asynchronous delivery so the contract matches the streaming architecture.

POST /chat/init

Creates a new chat session and returns a welcome message. Must be called before sending messages.

Request:

{
  "session_token": "amzn-session-xyz789",
  "page_context": {
    "current_asin": "B08X1YRSTR",
    "store_section": "manga-home",
    "url": "/stores/page/jp-manga"
  },
  "locale": "ja_JP"
}

Response (200 OK):

{
  "session_id": "sess_abc123",
  "customer_id": "C123",
  "is_authenticated": true,
  "welcome_message": "Welcome to the JP Manga store! How can I help you today?",
  "quick_actions": [
    { "label": "Browse Popular Manga", "type": "discovery" },
    { "label": "Track My Order", "type": "order_tracking" },
    { "label": "Get Recommendations", "type": "recommendation" }
  ],
  "websocket_url": "wss://chat.amazon.com/ws/sess_abc123"
}

POST /chat/message

Request:

{
  "session_id": "sess_abc123",
  "message": "Recommend something like Naruto",
  "page_context": {
    "current_asin": null,
    "store_section": "manga-home",
    "cart_asins": ["B09XYZ"],
    "url": "/stores/page/jp-manga"
  }
}

Response (202 Accepted):

{
  "session_id": "sess_abc123",
  "response_id": "resp_def456",
  "status": "accepted",
  "delivery_channel": "websocket"
}

WebSocket Events

Delta event:

{
  "type": "chat.response.delta",
  "session_id": "sess_abc123",
  "response_id": "resp_def456",
  "delta": "If you loved Naruto, you'll enjoy these..."
}

Completed event:

{
  "type": "chat.response.completed",
  "session_id": "sess_abc123",
  "response_id": "resp_def456",
  "response_text": "If you loved Naruto, you'll enjoy these...",
  "products": [...],
  "actions": [...],
  "follow_up_suggestions": [...],
  "metadata": {
    "intent": "recommendation",
    "latency_ms": 1842,
    "model": "claude-3-5-sonnet",
    "sources": ["recommendation_engine", "product_catalog"]
  }
}
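
A client-side sketch showing how the delta and completed events compose, using the Python websockets package; the URL comes from the /chat/init response, and incremental rendering is reduced to a print here.

```python
import asyncio
import json
import websockets  # pip install websockets

async def stream_response(websocket_url: str, response_id: str) -> dict:
    """Render deltas as they arrive, then return the completed payload."""
    async with websockets.connect(websocket_url) as ws:
        async for raw in ws:
            event = json.loads(raw)
            if event.get("response_id") != response_id:
                continue  # belongs to another in-flight response
            if event["type"] == "chat.response.delta":
                print(event["delta"], end="", flush=True)  # incremental render
            elif event["type"] == "chat.response.completed":
                return event  # carries the authoritative full response_text
    # Socket dropped before completion: fall back to GET /chat/message/{response_id}
    raise ConnectionError("stream interrupted before completion")

# asyncio.run(stream_response("wss://chat.amazon.com/ws/sess_abc123", "resp_def456"))
```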

GET /chat/message/{response_id}

HTTPS fallback for clients that cannot keep a WebSocket connection. Returns the latest buffered state or the final completed response.

POST /chat/feedback

{
  "response_id": "resp_def456",
  "feedback": "thumbs_up",
  "comment": null
}

POST /chat/escalate

{
  "session_id": "sess_abc123",
  "reason": "user_requested",
  "summary": "User wants to return damaged Naruto Vol 5, order #112-3456789"
}

Error Response Schema

All endpoints return a consistent error structure on failure.

{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Too many requests. Please wait a moment and try again.",
    "retry_after_ms": 5000
  }
}

| HTTP Status | Error Code | When |
| --- | --- | --- |
| 400 | INVALID_REQUEST | Missing required fields or malformed payload |
| 401 | UNAUTHORIZED | Session token expired or invalid |
| 403 | FORBIDDEN | User banned or action not allowed for guest |
| 404 | SESSION_NOT_FOUND | Session ID does not exist or expired |
| 429 | RATE_LIMITED | Token bucket exhausted |
| 500 | INTERNAL_ERROR | Unexpected server error |
| 503 | SERVICE_UNAVAILABLE | Downstream dependency failure; includes fallback message if available |

LLD-8: Database Schemas

Conversation Memory (DynamoDB)

Table: manga_chatbot_memory
  PK: pk (String)                  -- SESSION#<session_id>
  SK: sk (String)                  -- META | TURN#<timestamp> | SUMMARY#<window_id>

  GSI1PK: customer_id (String)     -- nullable for guests
  GSI1SK: updated_at (Number)

  META item:
    session_id: String
    customer_id: String (nullable)
    page_context: Map
    created_at: Number
    updated_at: Number
    ttl: Number
    turn_count: Number
    last_intent: String

  TURN item:
    role: String
    content: String
    intent: String
    created_at: Number
    response_id: String
    guardrail_flags: List<String>

  SUMMARY item:
    covered_turns: String
    content: String
    created_at: Number

Analytics Events (Kinesis -> Redshift)

CREATE TABLE chatbot_events (
    event_id        VARCHAR(64) PRIMARY KEY,
    session_id      VARCHAR(64),
    customer_id     VARCHAR(64),
    event_type      VARCHAR(32),   -- message, response, feedback, escalation
    intent          VARCHAR(32),
    message_text    VARCHAR(2000), -- PII-scrubbed
    response_text   VARCHAR(4000),
    products_shown  VARCHAR(500),  -- comma-separated ASINs
    latency_ms      INTEGER,
    model_id        VARCHAR(64),
    feedback        VARCHAR(16),   -- thumbs_up, thumbs_down, null
    created_at      TIMESTAMP
);

RAG Knowledge Base Index (OpenSearch)

{
  "mappings": {
    "properties": {
      "chunk_id": { "type": "keyword" },
      "content": { "type": "text" },
      "embedding": {
        "type": "knn_vector",
        "dimension": 1536,
        "method": { "name": "hnsw", "engine": "nmslib" }
      },
      "source_type": { "type": "keyword" },
      "asin": { "type": "keyword" },
      "category": { "type": "keyword" },
      "last_updated": { "type": "date" }
    }
  }
}

LLD-9: Escalation and Human Handoff

Escalation Flow

sequenceDiagram
    participant User
    participant Orchestrator
    participant SummaryLLM as LLM (Summary)
    participant Memory as DynamoDB
    participant Connect as Amazon Connect
    participant Agent as Human Agent

    User->>Orchestrator: "I want to talk to a human"
    Orchestrator->>Memory: Load full conversation history
    Memory-->>Orchestrator: All turns + metadata
    Orchestrator->>SummaryLLM: Summarize conversation for agent handoff
    SummaryLLM-->>Orchestrator: Summary text
    Orchestrator->>Connect: POST /escalate {session_id, summary, customer_id, intent_history, sentiment}
    Connect-->>Orchestrator: {queue_id, estimated_wait_seconds}
    Orchestrator-->>User: "Connecting you with a support agent. Estimated wait: ~2 minutes."
    Connect->>Agent: Route to available agent with context payload
    Agent-->>User: "Hi, I see you need help with a damaged manga return. Let me help."

Escalation Payload to Amazon Connect

{
  "session_id": "sess_abc123",
  "customer_id": "C123",
  "is_prime": true,
  "reason": "user_requested",
  "conversation_summary": "Customer wants to return Naruto Vol 5 (ASIN B09XYZ) from order #112-3456789 due to water damage. Bot confirmed the return window is still open.",
  "intent_history": ["product_question", "return_request", "escalation"],
  "sentiment": "frustrated",
  "turn_count": 6,
  "unresolved_issue": "Return label generation failed"
}
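
The `POST /escalate` call above is schematic. One plausible wiring, if the handoff is implemented as an Amazon Connect task, is sketched below with boto3; the instance and contact-flow IDs are placeholders and the attribute mapping is illustrative. (Queue position and estimated wait would come from Connect's own metrics, not this call.)

```python
import boto3

connect = boto3.client("connect")

# Placeholders: these identifiers come from the Connect instance configuration.
INSTANCE_ID = "instance-id-placeholder"
CONTACT_FLOW_ID = "contact-flow-id-placeholder"

def escalate(payload: dict) -> dict:
    """Hand the session to a human agent as a Connect task carrying full context."""
    resp = connect.start_task_contact(
        InstanceId=INSTANCE_ID,
        ContactFlowId=CONTACT_FLOW_ID,
        Name=f"Chatbot escalation {payload['session_id']}",
        Description=payload["conversation_summary"],
        Attributes={
            "session_id": payload["session_id"],
            "customer_id": payload["customer_id"],
            "reason": payload["reason"],
            "sentiment": payload["sentiment"],
            "intent_history": ",".join(payload["intent_history"]),
        },
    )
    return {"contact_id": resp["ContactId"]}
```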

LLD-10: Caching Strategy

Cache Architecture

graph TD
    subgraph "Orchestrator"
        A[Request Handler]
    end

    subgraph "ElastiCache Redis Cluster"
        B[Product Cache<br>TTL: 5 min]
        C[Recommendation Cache<br>TTL: 15 min]
        D[Promotion Cache<br>TTL: 15 min]
        E[Review Cache<br>TTL: 1 hour]
    end

    subgraph "Origin Services"
        F[Product Catalog]
        G[Recommendation Engine]
        H[Promotions Service]
        I[Reviews Service]
    end

    A -->|cache-aside read| B
    B -->|miss| F
    F -->|populate cache| B

    A -->|cache-aside read| C
    C -->|miss| G
    G -->|populate cache| C

    A -->|cache-aside read| D
    D -->|miss| H
    H -->|populate cache| D

    A -->|cache-aside read| E
    E -->|miss| I
    I -->|populate cache| E

Cache Rules

| Data | Cache Key Pattern | TTL | Invalidation |
| --- | --- | --- | --- |
| Product details | `product:{asin}` | 5 min | Catalog change event via SNS |
| Recommendations | `reco:{user_id}:{seed_asin}` | 15 min | New session or explicit refresh |
| Promotions | `promo:{store_section}` | 15 min | Promotion change event via SNS |
| Reviews / ratings | `review:{asin}` | 1 hour | Scheduled refresh |
| Prices | Never cached | N/A | Always fetched live |

Why cache-aside? The Orchestrator checks the cache first and only calls the origin service on a miss. The pattern is simple, bounds staleness via TTLs and event-driven invalidation rather than requiring write-through coordination, and keeps the cache layer optional — if ElastiCache is down, requests fall through to origin services with higher latency but correct data.
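
A sketch of that read path with redis-py, using the product key pattern and TTL from the table; the endpoint and the catalog client are stand-ins.

```python
import json
import redis  # pip install redis

r = redis.Redis(host="example-cache-endpoint", port=6379)  # placeholder endpoint

PRODUCT_TTL_SECONDS = 300  # 5 min, per the cache rules table

def get_product(asin: str, catalog) -> dict:
    key = f"product:{asin}"
    try:
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)
    except redis.RedisError:
        pass  # cache layer is optional: fall through to origin on any failure

    product = catalog.get_product(asin)  # origin call (stand-in client)
    product.pop("price", None)           # prices are never cached; fetched live

    try:
        r.setex(key, PRODUCT_TTL_SECONDS, json.dumps(product))
    except redis.RedisError:
        pass  # a failed populate only costs the next request a cache miss
    return product
```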