
4b. Low-Level Design (LLD) - Component Deep Dives

This document expands the high-level architecture into service contracts, state transitions, storage schemas, and validation pipelines for MangaAssist.

How to Use This Document

  • Start with LLD-1 to understand the orchestrator, because it is the control plane for every request.
  • Read LLD-2 through LLD-6 next if you want the core AI and data path: intent classification, RAG, memory, prompt construction, and guardrails.
  • Use LLD-7 and LLD-8 as implementation references for APIs and persistence contracts.
  • Use LLD-9 for the escalation and human handoff flow.
  • Use LLD-10 for the caching strategy and cache invalidation design.

LLD-1: Chatbot Orchestrator Service

Class Diagram

classDiagram
    class ChatbotOrchestrator {
        -conversationMemory: ConversationMemoryClient
        -intentClassifier: IntentClassifierClient
        -responseGenerator: ResponseGeneratorClient
        -guardrails: GuardrailsClient
        -metricsEmitter: MetricsEmitter
        +handleMessage(request: ChatRequest): ResponseHandle
        -routeByIntent(intent: Intent, context: ConversationContext): ServiceResponse
        -buildLLMPrompt(intent: Intent, serviceData: Map, history: List~Turn~): Prompt
    }

    class ChatRequest {
        +sessionId: String
        +customerId: String?
        +message: String
        +pageContext: PageContext
        +timestamp: Instant
    }

    class ChatResponse {
        +sessionId: String
        +responseText: String
        +products: List~ProductCard~?
        +actions: List~ActionButton~?
        +metadata: ResponseMetadata
    }

    class ResponseHandle {
        +sessionId: String
        +responseId: String
        +status: String
        +deliveryChannel: String
    }

    class PageContext {
        +currentASIN: String?
        +storeSection: String
        +cartASINs: List~String~
        +browsingHistory: List~String~
    }

    class ConversationContext {
        +sessionId: String
        +turns: List~Turn~
        +userProfile: UserProfile?
        +currentIntent: Intent
        +pageContext: PageContext
    }

    class Turn {
        +role: Role
        +content: String
        +timestamp: Instant
        +intent: Intent?
    }

    class Intent {
        +type: IntentType
        +confidence: Float
        +entities: Map~String, String~
    }

    ChatbotOrchestrator --> ChatRequest
    ChatbotOrchestrator --> ResponseHandle
    ChatbotOrchestrator --> ChatResponse
    ChatbotOrchestrator --> ConversationContext
    ConversationContext --> Turn
    ConversationContext --> PageContext
    Turn --> Intent

Orchestrator Flow - Internal State Machine

stateDiagram-v2
    [*] --> ReceiveMessage
    ReceiveMessage --> LoadContext: Load session + memory
    LoadContext --> ClassifyIntent: Call Intent Classifier
    ClassifyIntent --> RouteToService: Based on intent type

    RouteToService --> ProductDiscovery: product_discovery
    RouteToService --> ProductQA: product_question
    RouteToService --> FAQRetrieval: faq
    RouteToService --> OrderLookup: order_tracking
    RouteToService --> ReturnFlow: return_request
    RouteToService --> PromoLookup: promotion
    RouteToService --> Recommendation: recommendation
    RouteToService --> CheckoutHelp: checkout_help
    RouteToService --> ChitChat: chitchat
    RouteToService --> Escalation: escalation

    ProductDiscovery --> AggregateData
    ProductQA --> AggregateData
    FAQRetrieval --> AggregateData
    OrderLookup --> AggregateData
    ReturnFlow --> AggregateData
    PromoLookup --> AggregateData
    Recommendation --> AggregateData
    CheckoutHelp --> AggregateData
    ChitChat --> TemplateResponse
    Escalation --> HandoffToAgent

    AggregateData --> DecideResponseMode
    DecideResponseMode --> TemplateResponse: structured / low-ambiguity
    DecideResponseMode --> GenerateResponse: explanatory / ambiguous
    GenerateResponse --> ApplyGuardrails
    TemplateResponse --> ApplyGuardrails
    ApplyGuardrails --> SaveTurn: Save to memory
    SaveTurn --> ReturnResponse
    ReturnResponse --> [*]

    HandoffToAgent --> [*]
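
The control flow above maps naturally onto a single handler method. Below is a minimal sketch of that mapping in Python; the injected clients, their method names, and the `low_ambiguity` flag are illustrative stand-ins, not part of the contract defined by the class diagram.

```python
# Minimal orchestrator sketch mirroring the state machine above.
# Every injected client is a hypothetical stand-in; only the control
# flow is taken from the diagram.

class ChatbotOrchestrator:
    def __init__(self, memory, classifier, router, generator, guardrails):
        self.memory = memory          # ConversationMemoryClient
        self.classifier = classifier  # IntentClassifierClient
        self.router = router          # dispatches intents to downstream services
        self.generator = generator    # ResponseGeneratorClient
        self.guardrails = guardrails  # GuardrailsClient

    def handle_message(self, request: dict) -> dict:
        # ReceiveMessage -> LoadContext: session META plus recent turns
        context = self.memory.load(request["session_id"])

        # ClassifyIntent
        intent = self.classifier.classify(request["message"], context)

        if intent["type"] == "escalation":
            # HandoffToAgent ends the bot flow (see LLD-9)
            return self.router.escalate(context)

        if intent["type"] == "chitchat":
            # ChitChat skips data aggregation entirely
            response = self.generator.from_template(intent)
        else:
            # RouteToService -> AggregateData
            service_data = self.router.dispatch(intent, context)
            # DecideResponseMode: template for structured, low-ambiguity
            # answers; LLM generation for explanatory or ambiguous ones
            if service_data.get("low_ambiguity"):
                response = self.generator.from_template(intent, service_data)
            else:
                response = self.generator.generate(intent, service_data, context)

        # ApplyGuardrails -> SaveTurn -> ReturnResponse
        response = self.guardrails.check_output(response, context)
        self.memory.save_turn(request["session_id"], request["message"],
                              response, intent)
        return response  # delivered asynchronously via a ResponseHandle in practice
```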

LLD-2: Intent Classifier

Design

The Intent Classifier is a two-stage system:

Stage 1 - Fast Rule-Based Pre-filter:

  • Regex patterns catch high-confidence intents cheaply.
  • Example: messages containing "where is my order" or "track" -> order_tracking with 0.95 confidence.
  • Example: messages containing "return" + "damaged" -> return_request.

Stage 2 - ML Model (Fallback):

  • If Stage 1 confidence < 0.8, a fine-tuned BERT classifier runs on SageMaker.
  • Trained on labeled Amazon customer service conversations + manga-specific training data.
  • Returns intent + confidence + extracted entities (ASIN, series name, volume number).

graph LR
    A[User Message] --> B{Rule-Based<br>Matcher}
    B -->|confidence >= 0.8| C[Return Intent]
    B -->|confidence < 0.8| D[BERT Classifier<br>SageMaker]
    D --> E{confidence >= 0.6?}
    E -->|yes| C
    E -->|no| F[Fallback: general_query<br>Send full message to LLM]
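
A compact sketch of this two-stage flow in Python; the two regex patterns are the examples given above (the real pattern table would be much larger), and `bert_classifier` stands in for the SageMaker endpoint client.

```python
import re

# Stage 1: regex pre-filter. Each entry is (compiled pattern, intent, confidence).
# Only the two examples from the text are shown here.
RULES = [
    (re.compile(r"\b(where is my order|track)\b", re.I), "order_tracking", 0.95),
    (re.compile(r"(?=.*\breturn\b)(?=.*\bdamaged\b)", re.I), "return_request", 0.90),
]

RULE_THRESHOLD = 0.8   # below this, fall through to the BERT classifier
MODEL_THRESHOLD = 0.6  # below this, fall back to general_query

def classify(message: str, bert_classifier) -> dict:
    # Stage 1: cheap rule match
    for pattern, intent, confidence in RULES:
        if pattern.search(message) and confidence >= RULE_THRESHOLD:
            return {"type": intent, "confidence": confidence, "entities": {}}

    # Stage 2: fine-tuned BERT model on SageMaker (client is a stand-in)
    result = bert_classifier.predict(message)
    if result["confidence"] >= MODEL_THRESHOLD:
        return result

    # Low-confidence catch-all: send the full message to the LLM, no routing
    return {"type": "general_query", "confidence": result["confidence"], "entities": {}}
```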

Intent Taxonomy

| Intent | Example Messages | Routed To |
| --- | --- | --- |
| product_discovery | "Show me horror manga", "What's popular?" | Recommendation Engine |
| product_question | "Is this in English?", "How many pages?" | Product Catalog |
| recommendation | "Something like One Piece" | Recommendation Engine + RAG |
| faq | "What's the return policy?" | RAG Pipeline |
| order_tracking | "Where is my order?" | Order Service |
| return_request | "I want to return this" | Returns Service |
| promotion | "Any deals on manga?" | Promotions Service |
| checkout_help | "Can I use gift cards?" | FAQ + Checkout Service |
| escalation | "Talk to a human" | Human Handoff |
| chitchat | "Hello", "Thanks" | Template Response |
| general_query | Low-confidence catch-all (Stage 2 below 0.6) | LLM with system prompt only, no service routing |

Entity Extraction

{
  "intent": "product_question",
  "confidence": 0.92,
  "entities": {
    "series_name": "Demon Slayer",
    "volume_number": "12",
    "attribute": "language",
    "asin": null
  }
}

LLD-3: RAG Pipeline

Indexing Pipeline

graph LR
    subgraph "Data Sources"
        A[Product Descriptions]
        B[FAQ Pages]
        C[Return Policies]
        D[Editorial Content]
        E[Review Summaries]
    end

    subgraph "Processing"
        A --> F[Chunker<br>Size varies by source type]
        B --> F
        C --> F
        D --> F
        E --> F
        F --> G[Embedding Model<br>Titan Embeddings V2]
        G --> H[OpenSearch Serverless<br>Vector Index]
    end

    subgraph "Metadata"
        F --> I[Attach metadata:<br>source_type, asin,<br>category, last_updated]
        I --> H
    end

Chunk Strategy by Content Type

| Content Type | Chunk Size | Overlap | Metadata | Rationale |
| --- | --- | --- | --- | --- |
| Product descriptions | 256 tokens | 25 tokens | ASIN, category, format | Short, self-contained entries |
| FAQ articles | 512 tokens | 50 tokens | topic, last_updated | Longer explanatory content |
| Return and shipping policies | 512 tokens | 50 tokens | policy_type, region | Must preserve full policy context |
| Editorial content | 512 tokens | 50 tokens | genre, author | Narrative content needs larger windows |
| Review summaries | 128 tokens | 0 | ASIN, sentiment | Already condensed; no overlap needed |
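
For illustration, a minimal sliding-window chunker applying the sizes and overlaps in the table above. The whitespace tokenizer and the `CHUNK_CONFIG` constant are simplifications; a production pipeline would count real model tokens (e.g. with the embedding model's tokenizer).

```python
# Sliding-window chunker using the table above. Whitespace "tokens" stand in
# for real model tokens; CHUNK_CONFIG is an illustrative constant.
CHUNK_CONFIG = {
    "product_description": {"size": 256, "overlap": 25},
    "faq":                 {"size": 512, "overlap": 50},
    "policy":              {"size": 512, "overlap": 50},
    "editorial":           {"size": 512, "overlap": 50},
    "review_summary":      {"size": 128, "overlap": 0},
}

def chunk(text: str, source_type: str, metadata: dict) -> list[dict]:
    cfg = CHUNK_CONFIG[source_type]
    tokens = text.split()  # simplification: real pipelines count model tokens
    step = cfg["size"] - cfg["overlap"]
    chunks = []
    for i, start in enumerate(range(0, max(len(tokens), 1), step)):
        window = tokens[start:start + cfg["size"]]
        if not window:
            break
        chunks.append({
            "chunk_id": f"{source_type}-{metadata.get('asin', 'na')}-{i:03d}",
            "content": " ".join(window),
            "source_type": source_type,
            **metadata,  # attach asin, category, last_updated, etc.
        })
    return chunks
```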

Index Refresh Strategy

  • Product descriptions and prices: Re-indexed every 6 hours or on catalog change events.
  • FAQ and policy pages: Re-indexed daily during off-peak hours.
  • Editorial content: Re-indexed weekly or on publish.
  • Review summaries: Re-indexed daily.
  • Stale chunk handling: Each chunk carries a last_updated timestamp. Chunks older than their source's refresh cycle are deprioritized during retrieval.

Retrieval Flow

sequenceDiagram
    participant Orchestrator
    participant Embedder
    participant VectorStore
    participant Reranker
    participant LLM

    Orchestrator->>Embedder: Embed user query
    Embedder-->>Orchestrator: Query vector (1024-dim)
    Orchestrator->>VectorStore: KNN search (top 10)
    VectorStore-->>Orchestrator: 10 candidate chunks
    Orchestrator->>Reranker: Rerank by relevance
    Reranker-->>Orchestrator: Top 3 chunks
    Orchestrator->>LLM: System prompt + 3 chunks + query
    LLM-->>Orchestrator: Grounded response
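
A sketch of the retrieval step, assuming a boto3 Bedrock runtime client for Titan Embeddings V2 and the opensearch-py client. The endpoint host, index name, and the `reranker` object are placeholders; note Titan Embeddings V2 emits 1024-dimensional vectors at its default setting.

```python
import json
import boto3
from opensearchpy import OpenSearch  # pip install opensearch-py

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
opensearch = OpenSearch(hosts=["https://example-vector-endpoint:443"])  # placeholder

INDEX = "manga-knowledge-base"  # assumed index name

def retrieve(query: str, reranker, k: int = 10, top_n: int = 3) -> list[dict]:
    # 1. Embed the user query with Titan Embeddings V2 (1024-dim by default)
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": query}),
    )
    query_vector = json.loads(resp["body"].read())["embedding"]

    # 2. KNN search for the top-k candidate chunks
    hits = opensearch.search(
        index=INDEX,
        body={"size": k,
              "query": {"knn": {"embedding": {"vector": query_vector, "k": k}}}},
    )["hits"]["hits"]
    candidates = [h["_source"] for h in hits]

    # 3. Rerank and keep the top_n chunks for the prompt (reranker is a stand-in)
    return reranker.rerank(query, candidates)[:top_n]
```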

Chunk Schema

{
  "chunk_id": "faq-return-policy-003",
  "content": "Manga volumes can be returned within 30 days of delivery if they are in original condition. Damaged items can be returned for a full refund or replacement.",
  "source_type": "faq",
  "source_url": "/help/returns",
  "asin": null,
  "category": "manga",
  "embedding": [0.012, -0.034, ...],
  "last_updated": "2025-12-01"
}

LLD-4: Conversation Memory

DynamoDB Schema

| Attribute | Type | Applies To | Description |
| --- | --- | --- | --- |
| pk | String | All items | `SESSION#<session_id>` |
| sk | String | All items | `META`, `TURN#<timestamp>`, or `SUMMARY#<window_id>` |
| customer_id (GSI1PK) | String | Meta item | Amazon customer ID for authenticated session lookup |
| updated_at (GSI1SK) | Number | Meta item | Supports fetching the most recent sessions |
| role | String | Turn item | user, assistant, or system |
| content | String | Turn and summary items | Raw turn text or compressed summary |
| intent | String | Turn item | Classified intent for the turn |
| page_context | Map | Meta item | Latest page ASIN, store section, and cart snapshot |
| created_at | Number | All items | Epoch timestamp |
| ttl | Number | Meta and turn items | Epoch + 86400 (24-hour expiry) |
| turn_count | Number | Meta item | Total turns in session |

Memory Management

graph TD
    A[New Message Arrives] --> B[Read META + latest turns]
    B --> C{Turn count > 20?}
    C -->|No| D[Write new TURN item]
    C -->|Yes| E[Summarize oldest unsummarized window<br>Window size: 10 turns]
    E --> F[Write SUMMARY item]
    F --> D
    D --> G[Update META item<br>turn_count, updated_at, last_intent]

Window size for summarization: 10 turns (5 user messages + 5 assistant responses). The LLM compresses these into a 2-3 sentence summary preserving: what the user was looking for, what was recommended, and any unresolved issues.

Why summarize instead of truncate? Truncation silently drops the oldest turns; summarization keeps their key context available to the LLM at a fraction of the token cost.

Why not store the full conversation as one DynamoDB item? Separate turn items avoid the 400 KB item-size ceiling, reduce rewrite amplification on every new message, and make concurrent writes safer during streaming or retries.

DynamoDB throttle handling: If a TURN write is throttled, the Orchestrator retries with exponential backoff (max 2 retries). If the write still fails, the response is still delivered to the user, but the turn is enqueued to an SQS retry queue and persisted by an async worker to avoid blocking the response path. Messages that exhaust the worker's retries land in a DLQ for manual review.
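
A sketch of that write path with boto3; the table name matches LLD-8, while the queue URL and the `turn_item` shape are placeholders. (boto3's built-in retry logic also applies; the explicit loop here just makes the described policy visible.)

```python
import json
import time
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")
sqs = boto3.client("sqs")

TABLE = "manga_chatbot_memory"
RETRY_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/turn-retry"  # placeholder

def save_turn(session_id: str, turn_item: dict) -> None:
    item = {
        "pk": {"S": f"SESSION#{session_id}"},
        "sk": {"S": f"TURN#{turn_item['timestamp']}"},
        "role": {"S": turn_item["role"]},
        "content": {"S": turn_item["content"]},
        "intent": {"S": turn_item.get("intent", "")},
        "created_at": {"N": str(turn_item["timestamp"])},
    }
    # Synchronous path: initial attempt plus 2 retries with exponential backoff
    for attempt in range(3):
        try:
            dynamodb.put_item(TableName=TABLE, Item=item)
            return
        except ClientError as err:
            code = err.response["Error"]["Code"]
            if code not in ("ProvisionedThroughputExceededException", "ThrottlingException"):
                raise
            time.sleep(0.1 * (2 ** attempt))  # 100ms, 200ms, 400ms

    # Still throttled: don't block the response path. Enqueue for the async
    # worker; messages that exhaust its retries land in the DLQ.
    sqs.send_message(
        QueueUrl=RETRY_QUEUE_URL,
        MessageBody=json.dumps({"session_id": session_id, "turn": turn_item}),
    )
```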


LLD-5: Response Generation - Prompt Engineering

System Prompt Template

You are MangaAssist, a helpful shopping assistant for the JP Manga store on Amazon.com.

RULES:
1. Only recommend products that exist in the provided product data.
2. Never invent prices, availability, or delivery dates.
3. If you don't have the information, say so and offer to help differently.
4. Keep responses concise (2-4 sentences for simple questions, up to a short paragraph for recommendations).
5. Always include product links when recommending items.
6. Never discuss competitors by name.
7. If the user needs help beyond your capabilities, offer to connect them with a support agent.

CONTEXT:
- Store section: {{store_section}}
- Current product (if any): {{current_product_json}}
- User's recent browsing: {{browsing_history}}
- Active promotions: {{active_promos}}

RETRIEVED INFORMATION:
{{rag_chunks}}

CONVERSATION HISTORY:
{{conversation_turns}}

USER MESSAGE:
{{user_message}}

Response Format Contract

{
  "response_text": "Based on your love for action manga, here are 3 titles you might enjoy:",
  "products": [
    {
      "asin": "B08X1YRSTR",
      "title": "Chainsaw Man, Vol. 1",
      "price": "$9.99",
      "image_url": "https://...",
      "product_url": "https://amazon.com/dp/..."
    }
  ],
  "actions": [
    {
      "label": "Add to Cart",
      "type": "add_to_cart",
      "asin": "B08X1YRSTR"
    },
    {
      "label": "See More Like This",
      "type": "more_recommendations"
    }
  ],
  "follow_up_suggestions": [
    "Tell me about the art style",
    "Is there a box set?",
    "Show me horror manga instead"
  ]
}
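
Because the LLM must emit exactly this structure, the Orchestrator can validate the payload before rendering it. A sketch using pydantic; the model names here are illustrative, not types from the class diagram.

```python
from pydantic import BaseModel

class ProductCard(BaseModel):
    asin: str
    title: str
    price: str
    image_url: str
    product_url: str

class ActionButton(BaseModel):
    label: str
    type: str
    asin: str | None = None

class ChatResponsePayload(BaseModel):
    response_text: str
    products: list[ProductCard] = []
    actions: list[ActionButton] = []
    follow_up_suggestions: list[str] = []

# Raises pydantic.ValidationError if the LLM output drifts from the contract:
payload = ChatResponsePayload.model_validate_json('{"response_text": "Hi!"}')
```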

LLD-6: Guardrails Pipeline

graph LR
    A[User Message] --> B[Input Safety<br>PII scrub, prompt injection, abuse checks]
    B --> C{Blocked?}
    C -->|Yes| H[Safe Fallback or Escalation<br>+ Log for review]
    C -->|No| D[LLM or Template Response]
    D --> E[Output Safety<br>PII, price, toxicity, competitor, ASIN, scope]
    E --> F{All checks pass?}
    F -->|Yes| G[Return to User]
    F -->|No| H

Guardrail Rules

| Rule | What It Checks | Action on Failure |
| --- | --- | --- |
| Input PII Scrub | SSN, credit card, phone, email, address in user input before LLM or analytics | Mask sensitive tokens and log |
| Prompt Injection | Instruction override attempts or encoded jailbreak patterns | Block or safe-refuse; log |
| Output PII Redaction | Sensitive PII reproduced in output | Redact and log |
| Price Accuracy | Prices in response match catalog | Replace with correct price |
| Toxicity | Offensive or inappropriate content | Block response, return safe fallback |
| Competitor Mention | Names of competitors (Barnes & Noble, etc.) | Remove mention |
| ASIN Validation | Product IDs mentioned actually exist | Remove invalid product |
| Scope Check | Response stays on topic (manga/Amazon) | Redirect to on-topic |
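
A sketch of two of these output checks; the catalog client and the competitor pattern are illustrative stand-ins, and `guardrail_flags` ties into the TURN item field in LLD-8.

```python
import re

# Illustrative competitor pattern; the production list would live in config.
COMPETITORS = re.compile(r"\bBarnes\s*&\s*Noble\b", re.I)

def check_price_accuracy(response: dict, catalog) -> dict:
    """Price Accuracy rule: replace any quoted price that disagrees with the catalog."""
    for product in response.get("products", []):
        live_price = catalog.get_price(product["asin"])  # stand-in catalog client
        if product["price"] != live_price:
            product["price"] = live_price
            response.setdefault("guardrail_flags", []).append("price_corrected")
    return response

def check_competitor_mentions(response: dict) -> dict:
    """Competitor Mention rule: remove competitor names from the response text."""
    text, hits = COMPETITORS.subn("another retailer", response["response_text"])
    if hits:
        response["response_text"] = text
        response.setdefault("guardrail_flags", []).append("competitor_removed")
    return response
```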

LLD-7: API Contracts

The transport is split into synchronous submission and asynchronous delivery so the contract matches the streaming architecture.

POST /chat/init

Creates a new chat session and returns a welcome message. Must be called before sending messages.

Request:

{
  "session_token": "amzn-session-xyz789",
  "page_context": {
    "current_asin": "B08X1YRSTR",
    "store_section": "manga-home",
    "url": "/stores/page/jp-manga"
  },
  "locale": "ja_JP"
}

Response (200 OK):

{
  "session_id": "sess_abc123",
  "customer_id": "C123",
  "is_authenticated": true,
  "welcome_message": "Welcome to the JP Manga store! How can I help you today?",
  "quick_actions": [
    { "label": "Browse Popular Manga", "type": "discovery" },
    { "label": "Track My Order", "type": "order_tracking" },
    { "label": "Get Recommendations", "type": "recommendation" }
  ],
  "websocket_url": "wss://chat.amazon.com/ws/sess_abc123"
}

POST /chat/message

Request:

{
  "session_id": "sess_abc123",
  "message": "Recommend something like Naruto",
  "page_context": {
    "current_asin": null,
    "store_section": "manga-home",
    "cart_asins": ["B09XYZ"],
    "url": "/stores/page/jp-manga"
  }
}

Response (202 Accepted):

{
  "session_id": "sess_abc123",
  "response_id": "resp_def456",
  "status": "accepted",
  "delivery_channel": "websocket"
}

WebSocket Events

Delta event:

{
  "type": "chat.response.delta",
  "session_id": "sess_abc123",
  "response_id": "resp_def456",
  "delta": "If you loved Naruto, you'll enjoy these..."
}

Completed event:

{
  "type": "chat.response.completed",
  "session_id": "sess_abc123",
  "response_id": "resp_def456",
  "response_text": "If you loved Naruto, you'll enjoy these...",
  "products": [...],
  "actions": [...],
  "follow_up_suggestions": [...],
  "metadata": {
    "intent": "recommendation",
    "latency_ms": 1842,
    "model": "claude-3-5-sonnet",
    "sources": ["recommendation_engine", "product_catalog"]
  }
}
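
A client-side sketch showing how the delta and completed events compose, using the Python websockets package; the URL comes from the /chat/init response, and incremental rendering is reduced to a print here.

```python
import asyncio
import json
import websockets  # pip install websockets

async def stream_response(websocket_url: str, response_id: str) -> dict:
    """Render deltas as they arrive, then return the completed payload."""
    async with websockets.connect(websocket_url) as ws:
        async for raw in ws:
            event = json.loads(raw)
            if event.get("response_id") != response_id:
                continue  # belongs to another in-flight response
            if event["type"] == "chat.response.delta":
                print(event["delta"], end="", flush=True)  # incremental render
            elif event["type"] == "chat.response.completed":
                return event  # carries the authoritative full response_text
    # Socket dropped before completion: fall back to GET /chat/message/{response_id}
    raise ConnectionError("stream interrupted before completion")

# asyncio.run(stream_response("wss://chat.amazon.com/ws/sess_abc123", "resp_def456"))
```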

GET /chat/message/{response_id}

HTTPS fallback for clients that cannot keep a WebSocket connection. Returns the latest buffered state or the final completed response.

POST /chat/feedback

{
  "response_id": "resp_def456",
  "feedback": "thumbs_up",
  "comment": null
}

POST /chat/escalate

{
  "session_id": "sess_abc123",
  "reason": "user_requested",
  "summary": "User wants to return damaged Naruto Vol 5, order #112-3456789"
}

Error Response Schema

All endpoints return a consistent error structure on failure.

{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Too many requests. Please wait a moment and try again.",
    "retry_after_ms": 5000
  }
}

| HTTP Status | Error Code | When |
| --- | --- | --- |
| 400 | INVALID_REQUEST | Missing required fields or malformed payload |
| 401 | UNAUTHORIZED | Session token expired or invalid |
| 403 | FORBIDDEN | User banned or action not allowed for guest |
| 404 | SESSION_NOT_FOUND | Session ID does not exist or expired |
| 429 | RATE_LIMITED | Token bucket exhausted |
| 500 | INTERNAL_ERROR | Unexpected server error |
| 503 | SERVICE_UNAVAILABLE | Downstream dependency failure; includes fallback message if available |

LLD-8: Database Schemas

Conversation Memory (DynamoDB)

Table: manga_chatbot_memory
  PK: pk (String)                  -- SESSION#<session_id>
  SK: sk (String)                  -- META | TURN#<timestamp> | SUMMARY#<window_id>

  GSI1PK: customer_id (String)     -- nullable for guests
  GSI1SK: updated_at (Number)

  META item:
    session_id: String
    customer_id: String (nullable)
    page_context: Map
    created_at: Number
    updated_at: Number
    ttl: Number
    turn_count: Number
    last_intent: String

  TURN item:
    role: String
    content: String
    intent: String
    created_at: Number
    response_id: String
    guardrail_flags: List<String>

  SUMMARY item:
    covered_turns: String
    content: String
    created_at: Number

Analytics Events (Kinesis -> Redshift)

CREATE TABLE chatbot_events (
    event_id        VARCHAR(64) PRIMARY KEY,
    session_id      VARCHAR(64),
    customer_id     VARCHAR(64),
    event_type      VARCHAR(32),   -- message, response, feedback, escalation
    intent          VARCHAR(32),
    message_text    VARCHAR(2000), -- PII-scrubbed
    response_text   VARCHAR(4000),
    products_shown  VARCHAR(500),  -- comma-separated ASINs
    latency_ms      INTEGER,
    model_id        VARCHAR(64),
    feedback        VARCHAR(16),   -- thumbs_up, thumbs_down, null
    created_at      TIMESTAMP
);

RAG Knowledge Base Index (OpenSearch)

{
  "mappings": {
    "properties": {
      "chunk_id": { "type": "keyword" },
      "content": { "type": "text" },
      "embedding": {
        "type": "knn_vector",
        "dimension": 1536,
        "method": { "name": "hnsw", "engine": "nmslib" }
      },
      "source_type": { "type": "keyword" },
      "asin": { "type": "keyword" },
      "category": { "type": "keyword" },
      "last_updated": { "type": "date" }
    }
  }
}

LLD-9: Escalation and Human Handoff

Escalation Flow

sequenceDiagram
    participant User
    participant Orchestrator
    participant SummaryLLM as LLM (Summary)
    participant Memory as DynamoDB
    participant Connect as Amazon Connect
    participant Agent as Human Agent

    User->>Orchestrator: "I want to talk to a human"
    Orchestrator->>Memory: Load full conversation history
    Memory-->>Orchestrator: All turns + metadata
    Orchestrator->>SummaryLLM: Summarize conversation for agent handoff
    SummaryLLM-->>Orchestrator: Summary text
    Orchestrator->>Connect: POST /escalate {session_id, summary, customer_id, intent_history, sentiment}
    Connect-->>Orchestrator: {queue_id, estimated_wait_seconds}
    Orchestrator-->>User: "Connecting you with a support agent. Estimated wait: ~2 minutes."
    Connect->>Agent: Route to available agent with context payload
    Agent-->>User: "Hi, I see you need help with a damaged manga return. Let me help."

Escalation Payload to Amazon Connect

{
  "session_id": "sess_abc123",
  "customer_id": "C123",
  "is_prime": true,
  "reason": "user_requested",
  "conversation_summary": "Customer wants to return Naruto Vol 5 (ASIN B09XYZ) from order #112-3456789 due to water damage. Bot confirmed the return window is still open.",
  "intent_history": ["product_question", "return_request", "escalation"],
  "sentiment": "frustrated",
  "turn_count": 6,
  "unresolved_issue": "Return label generation failed"
}
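
The `POST /escalate` call above is schematic. One plausible wiring, if the handoff is implemented as an Amazon Connect task, is sketched below with boto3; the instance and contact-flow IDs are placeholders and the attribute mapping is illustrative. (Queue position and estimated wait would come from Connect's own metrics, not this call.)

```python
import boto3

connect = boto3.client("connect")

# Placeholders: these identifiers come from the Connect instance configuration.
INSTANCE_ID = "instance-id-placeholder"
CONTACT_FLOW_ID = "contact-flow-id-placeholder"

def escalate(payload: dict) -> dict:
    """Hand the session to a human agent as a Connect task carrying full context."""
    resp = connect.start_task_contact(
        InstanceId=INSTANCE_ID,
        ContactFlowId=CONTACT_FLOW_ID,
        Name=f"Chatbot escalation {payload['session_id']}",
        Description=payload["conversation_summary"],
        Attributes={
            "session_id": payload["session_id"],
            "customer_id": payload["customer_id"],
            "reason": payload["reason"],
            "sentiment": payload["sentiment"],
            "intent_history": ",".join(payload["intent_history"]),
        },
    )
    return {"contact_id": resp["ContactId"]}
```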

LLD-10: Caching Strategy

Cache Architecture

graph TD
    subgraph "Orchestrator"
        A[Request Handler]
    end

    subgraph "ElastiCache Redis Cluster"
        B[Product Cache<br>TTL: 5 min]
        C[Recommendation Cache<br>TTL: 15 min]
        D[Promotion Cache<br>TTL: 15 min]
        E[Review Cache<br>TTL: 1 hour]
    end

    subgraph "Origin Services"
        F[Product Catalog]
        G[Recommendation Engine]
        H[Promotions Service]
        I[Reviews Service]
    end

    A -->|cache-aside read| B
    B -->|miss| F
    F -->|populate cache| B

    A -->|cache-aside read| C
    C -->|miss| G
    G -->|populate cache| C

    A -->|cache-aside read| D
    D -->|miss| H
    H -->|populate cache| D

    A -->|cache-aside read| E
    E -->|miss| I
    I -->|populate cache| E

Cache Rules

| Data | Cache Key Pattern | TTL | Invalidation |
| --- | --- | --- | --- |
| Product details | `product:{asin}` | 5 min | Catalog change event via SNS |
| Recommendations | `reco:{user_id}:{seed_asin}` | 15 min | New session or explicit refresh |
| Promotions | `promo:{store_section}` | 15 min | Promotion change event via SNS |
| Reviews / ratings | `review:{asin}` | 1 hour | Scheduled refresh |
| Prices | Never cached | N/A | Always fetched live |

Why cache-aside? The Orchestrator checks the cache first and only calls the origin service on a miss. The pattern is simple, bounds staleness via TTLs and event-driven invalidation rather than requiring write-through coordination, and keeps the cache layer optional — if ElastiCache is down, requests fall through to origin services with higher latency but correct data.
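
A sketch of that read path with redis-py, using the product key pattern and TTL from the table; the endpoint and the catalog client are stand-ins.

```python
import json
import redis  # pip install redis

r = redis.Redis(host="example-cache-endpoint", port=6379)  # placeholder endpoint

PRODUCT_TTL_SECONDS = 300  # 5 min, per the cache rules table

def get_product(asin: str, catalog) -> dict:
    key = f"product:{asin}"
    try:
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)
    except redis.RedisError:
        pass  # cache layer is optional: fall through to origin on any failure

    product = catalog.get_product(asin)  # origin call (stand-in client)
    product.pop("price", None)           # prices are never cached; fetched live

    try:
        r.setex(key, PRODUCT_TTL_SECONDS, json.dumps(product))
    except redis.RedisError:
        pass  # a failed populate only costs the next request a cache miss
    return product
```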