4b. Low-Level Design (LLD) - Component Deep Dives
This document expands the high-level architecture into service contracts, state transitions, storage schemas, and validation pipelines for MangaAssist.
How to Use This Document
- Start with LLD-1 to understand the orchestrator, because it is the control plane for every request.
- Read LLD-2 through LLD-6 next if you want the core AI and data path: intent classification, RAG, memory, prompt construction, and guardrails.
- Use LLD-7 and LLD-8 as implementation references for APIs and persistence contracts.
- Use LLD-9 for the escalation and human handoff flow.
- Use LLD-10 for the caching strategy and cache invalidation design.
LLD-1: Chatbot Orchestrator Service
Class Diagram
classDiagram
class ChatbotOrchestrator {
-conversationMemory: ConversationMemoryClient
-intentClassifier: IntentClassifierClient
-responseGenerator: ResponseGeneratorClient
-guardrails: GuardrailsClient
-metricsEmitter: MetricsEmitter
+handleMessage(request: ChatRequest): ResponseHandle
-routeByIntent(intent: Intent, context: ConversationContext): ServiceResponse
-buildLLMPrompt(intent: Intent, serviceData: Map, history: List~Turn~): Prompt
}
class ChatRequest {
+sessionId: String
+customerId: String?
+message: String
+pageContext: PageContext
+timestamp: Instant
}
class ChatResponse {
+sessionId: String
+responseText: String
+products: List~ProductCard~?
+actions: List~ActionButton~?
+metadata: ResponseMetadata
}
class ResponseHandle {
+sessionId: String
+responseId: String
+status: String
+deliveryChannel: String
}
class PageContext {
+currentASIN: String?
+storeSection: String
+cartASINs: List~String~
+browsingHistory: List~String~
}
class ConversationContext {
+sessionId: String
+turns: List~Turn~
+userProfile: UserProfile?
+currentIntent: Intent
+pageContext: PageContext
}
class Turn {
+role: Role
+content: String
+timestamp: Instant
+intent: Intent?
}
class Intent {
+type: IntentType
+confidence: Float
+entities: Map~String, String~
}
ChatbotOrchestrator --> ChatRequest
ChatbotOrchestrator --> ResponseHandle
ChatbotOrchestrator --> ChatResponse
ChatbotOrchestrator --> ConversationContext
ConversationContext --> Turn
ConversationContext --> PageContext
Turn --> Intent
Orchestrator Flow - Internal State Machine
stateDiagram-v2
[*] --> ReceiveMessage
ReceiveMessage --> LoadContext: Load session + memory
LoadContext --> ClassifyIntent: Call Intent Classifier
ClassifyIntent --> RouteToService: Based on intent type
RouteToService --> ProductDiscovery: product_discovery
RouteToService --> ProductQA: product_question
RouteToService --> FAQRetrieval: faq
RouteToService --> OrderLookup: order_tracking
RouteToService --> ReturnFlow: return_request
RouteToService --> PromoLookup: promotion
RouteToService --> Recommendation: recommendation
RouteToService --> CheckoutHelp: checkout_help
RouteToService --> ChitChat: chitchat
RouteToService --> Escalation: escalation
ProductDiscovery --> AggregateData
ProductQA --> AggregateData
FAQRetrieval --> AggregateData
OrderLookup --> AggregateData
ReturnFlow --> AggregateData
PromoLookup --> AggregateData
Recommendation --> AggregateData
CheckoutHelp --> AggregateData
ChitChat --> TemplateResponse
Escalation --> HandoffToAgent
AggregateData --> DecideResponseMode
DecideResponseMode --> TemplateResponse: structured / low-ambiguity
DecideResponseMode --> GenerateResponse: explanatory / ambiguous
GenerateResponse --> ApplyGuardrails
TemplateResponse --> ApplyGuardrails
ApplyGuardrails --> SaveTurn: Save to memory
SaveTurn --> ReturnResponse
ReturnResponse --> [*]
HandoffToAgent --> [*]
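The state machine above reduces to a single entry point on the orchestrator. Below is a minimal Python sketch of that control flow, assuming hypothetical client interfaces for memory, intent classification, downstream services, response generation, and guardrails; the method names and dictionary shapes are illustrative, not a defined contract.

```python
# Minimal sketch of the orchestrator control flow from the state machine above.
# The memory/intent_classifier/services/generator/guardrails objects are assumed
# interfaces; their names and signatures are illustrative only.
from dataclasses import dataclass


@dataclass
class ResponseHandle:
    session_id: str
    response_id: str
    status: str = "accepted"
    delivery_channel: str = "websocket"


def handle_message(request, memory, intent_classifier, services, generator, guardrails):
    # ReceiveMessage -> LoadContext: session META + recent turns and summaries
    context = memory.load_context(request["session_id"])

    # ClassifyIntent
    intent = intent_classifier.classify(request["message"], context)

    # Escalation bypasses generation entirely and hands off to a human agent
    if intent["type"] == "escalation":
        return services["handoff"].escalate(request["session_id"], context)

    # RouteToService -> AggregateData (chitchat uses a canned template, no service call)
    if intent["type"] == "chitchat":
        service_data = {}
    else:
        service_data = services[intent["type"]].fetch(intent, context)

    # DecideResponseMode -> Generate/Template -> ApplyGuardrails
    draft = generator.generate(intent, service_data, context)
    final = guardrails.apply(draft, service_data)

    # SaveTurn, then return a handle; the response body streams over the WebSocket channel
    memory.save_turn(request["session_id"], request["message"], final, intent)
    return ResponseHandle(request["session_id"], final["response_id"])
```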
LLD-2: Intent Classifier
Design
The Intent Classifier is a two-stage system:
Stage 1 - Fast Rule-Based Pre-filter:
- Regex patterns catch high-confidence intents cheaply.
- Example: Messages containing "where is my order" or "track" -> order_tracking with 0.95 confidence.
- Example: Messages containing "return" + "damaged" -> return_request.
Stage 2 - ML Model (Fallback):
- If Stage 1 confidence < 0.8, a fine-tuned BERT classifier runs on SageMaker.
- Trained on labeled Amazon customer service conversations + manga-specific training data.
- Returns intent + confidence + extracted entities (ASIN, series name, volume number).
graph LR
A[User Message] --> B{Rule-Based<br>Matcher}
B -->|confidence >= 0.8| C[Return Intent]
B -->|confidence < 0.8| D[BERT Classifier<br>SageMaker]
D --> E{confidence >= 0.6?}
E -->|yes| C
E -->|no| F[Fallback: general_query<br>Send full message to LLM]
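A sketch of the Stage 1 pre-filter is shown below. The patterns and confidence values are illustrative examples of the rules described above; in practice they would live in configuration and be maintained alongside the intent taxonomy.

```python
# Sketch of the Stage 1 rule-based pre-filter. Patterns and confidences are
# illustrative; anything below the 0.8 threshold falls through to the
# SageMaker BERT classifier (Stage 2).
import re

RULES = [
    (re.compile(r"\b(where is my order|track)\b", re.I), "order_tracking", 0.95),
    (re.compile(r"\breturn\b.*\bdamaged\b|\bdamaged\b.*\breturn\b", re.I), "return_request", 0.90),
    (re.compile(r"\b(talk|speak) to a (human|person|agent)\b", re.I), "escalation", 0.95),
    (re.compile(r"^\s*(hi|hello|hey|thanks|thank you)\s*[.!]?\s*$", re.I), "chitchat", 0.90),
]

STAGE1_THRESHOLD = 0.8


def classify_stage1(message: str):
    """Return an Intent dict if a rule fires above threshold, else None."""
    for pattern, intent, confidence in RULES:
        if pattern.search(message) and confidence >= STAGE1_THRESHOLD:
            return {"type": intent, "confidence": confidence, "entities": {}}
    return None  # caller falls back to Stage 2
```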
Intent Taxonomy
| Intent | Example Messages | Routed To |
|---|---|---|
| product_discovery | "Show me horror manga", "What's popular?" | Recommendation Engine |
| product_question | "Is this in English?", "How many pages?" | Product Catalog |
| recommendation | "Something like One Piece" | Recommendation Engine + RAG |
| faq | "What's the return policy?" | RAG Pipeline |
| order_tracking | "Where is my order?" | Order Service |
| return_request | "I want to return this" | Returns Service |
| promotion | "Any deals on manga?" | Promotions Service |
| checkout_help | "Can I use gift cards?" | FAQ + Checkout Service |
| escalation | "Talk to a human" | Human Handoff |
| chitchat | "Hello", "Thanks" | Template Response |
| general_query | Low-confidence catch-all (Stage 2 below 0.6) | LLM with system prompt only, no service routing |
Entity Extraction
{
"intent": "product_question",
"confidence": 0.92,
"entities": {
"series_name": "Demon Slayer",
"volume_number": "12",
"attribute": "language",
"asin": null
}
}
LLD-3: RAG Pipeline
Indexing Pipeline
graph LR
subgraph "Data Sources"
A[Product Descriptions]
B[FAQ Pages]
C[Return Policies]
D[Editorial Content]
E[Review Summaries]
end
subgraph "Processing"
A --> F[Chunker<br>Size varies by source type]
B --> F
C --> F
D --> F
E --> F
F --> G[Embedding Model<br>Titan Embeddings V2]
G --> H[OpenSearch Serverless<br>Vector Index]
end
subgraph "Metadata"
F --> I[Attach metadata:<br>source_type, asin,<br>category, last_updated]
I --> H
end
Chunk Strategy by Content Type
| Content Type | Chunk Size | Overlap | Metadata | Rationale |
|---|---|---|---|---|
| Product descriptions | 256 tokens | 25 tokens | ASIN, category, format | Short, self-contained entries |
| FAQ articles | 512 tokens | 50 tokens | topic, last_updated | Longer explanatory content |
| Return and shipping policies | 512 tokens | 50 tokens | policy_type, region | Must preserve full policy context |
| Editorial content | 512 tokens | 50 tokens | genre, author | Narrative content needs larger windows |
| Review summaries | 128 tokens | 0 | ASIN, sentiment | Already condensed; no overlap needed |
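The per-source-type settings in the table can be expressed as a small configuration map driving one shared chunker. The sketch below approximates token counts by whitespace splitting; a production chunker would use the embedding model's tokenizer, and the `source_id` metadata key is an assumption.

```python
# Sketch of a config-driven chunker for the table above. Token counts are
# approximated by whitespace splitting; real chunking would use the embedding
# model's tokenizer.
CHUNK_CONFIG = {
    "product_description": {"size": 256, "overlap": 25},
    "faq":                 {"size": 512, "overlap": 50},
    "policy":              {"size": 512, "overlap": 50},
    "editorial":           {"size": 512, "overlap": 50},
    "review_summary":      {"size": 128, "overlap": 0},
}


def chunk(text: str, source_type: str, metadata: dict) -> list[dict]:
    cfg = CHUNK_CONFIG[source_type]
    tokens = text.split()
    step = cfg["size"] - cfg["overlap"]
    chunks = []
    for i, start in enumerate(range(0, len(tokens), step)):
        window = tokens[start : start + cfg["size"]]
        chunks.append({
            "chunk_id": f"{metadata.get('source_id', source_type)}-{i:03d}",
            "content": " ".join(window),
            "source_type": source_type,
            **metadata,  # asin, category, last_updated, etc.
        })
    return chunks
```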
Index Refresh Strategy
- Product descriptions and prices: Re-indexed every 6 hours or on catalog change events.
- FAQ and policy pages: Re-indexed daily during off-peak hours.
- Editorial content: Re-indexed weekly or on publish.
- Review summaries: Re-indexed daily.
- Stale chunk handling: Each chunk carries a last_updated timestamp. Chunks older than their source's refresh cycle are deprioritized during retrieval.
Retrieval Flow
sequenceDiagram
participant Orchestrator
participant Embedder
participant VectorStore
participant Reranker
participant LLM
Orchestrator->>Embedder: Embed user query
Embedder-->>Orchestrator: Query vector (1024-dim)
Orchestrator->>VectorStore: KNN search (top 10)
VectorStore-->>Orchestrator: 10 candidate chunks
Orchestrator->>Reranker: Rerank by relevance
Reranker-->>Orchestrator: Top 3 chunks
Orchestrator->>LLM: System prompt + 3 chunks + query
LLM-->>Orchestrator: Grounded response
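A sketch of the retrieval leg follows, assuming Bedrock Titan Text Embeddings V2 for the query vector and an OpenSearch k-NN index named manga_kb. The index name is an assumption, and the reranker is treated as an opaque client with an assumed rerank() method.

```python
# Sketch of the retrieval flow above: embed the query, pull the top-10
# candidates from the k-NN index, rerank, and keep the top 3 for the prompt.
# Index name ("manga_kb") and the reranker client are assumptions.
import json
import boto3
from opensearchpy import OpenSearch

bedrock = boto3.client("bedrock-runtime")


def embed_query(query: str) -> list[float]:
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": query}),
    )
    return json.loads(resp["body"].read())["embedding"]


def retrieve(client: OpenSearch, reranker, query: str, k: int = 10, keep: int = 3):
    vector = embed_query(query)
    hits = client.search(
        index="manga_kb",
        body={
            "size": k,
            "query": {"knn": {"embedding": {"vector": vector, "k": k}}},
            "_source": ["chunk_id", "content", "source_type", "asin", "last_updated"],
        },
    )["hits"]["hits"]
    candidates = [h["_source"] for h in hits]
    return reranker.rerank(query, candidates)[:keep]
```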
Chunk Schema
{
"chunk_id": "faq-return-policy-003",
"content": "Manga volumes can be returned within 30 days of delivery if they are in original condition. Damaged items can be returned for a full refund or replacement.",
"source_type": "faq",
"source_url": "/help/returns",
"asin": null,
"category": "manga",
"embedding": [0.012, -0.034, ...],
"last_updated": "2025-12-01"
}
LLD-4: Conversation Memory
DynamoDB Schema
| Attribute | Type | Applies To | Description |
|---|---|---|---|
| pk | String | All items | SESSION#<session_id> |
| sk | String | All items | META, TURN#<timestamp>, or SUMMARY#<window_id> |
| customer_id (GSI1PK) | String | Meta item | Amazon customer ID for authenticated session lookup |
| updated_at (GSI1SK) | Number | Meta item | Supports fetching the most recent sessions |
| role | String | Turn item | user, assistant, or system |
| content | String | Turn and summary items | Raw turn text or compressed summary |
| intent | String | Turn item | Classified intent for the turn |
| page_context | Map | Meta item | Latest page ASIN, store section, and cart snapshot |
| created_at | Number | All items | Epoch timestamp |
| ttl | Number | Meta and turn items | Epoch + 86400 (24-hour expiry) |
| turn_count | Number | Meta item | Total turns in session |
Memory Management
graph TD
A[New Message Arrives] --> B[Read META + latest turns]
B --> C{Turn count > 20?}
C -->|No| D[Write new TURN item]
C -->|Yes| E[Summarize oldest unsummarized window<br>Window size: 10 turns]
E --> F[Write SUMMARY item]
F --> D
D --> G[Update META item<br>turn_count, updated_at, last_intent]
Window size for summarization: 10 turns (5 user messages + 5 assistant responses). The LLM compresses these into a 2-3 sentence summary preserving: what the user was looking for, what was recommended, and any unresolved issues.
Why summarize instead of truncate? Summarization preserves key context (what the user was looking for, what was recommended) while reducing token count for LLM prompts.
Why not store the full conversation as one DynamoDB item? Separate turn items avoid the 400 KB item-size ceiling, reduce rewrite amplification on every new message, and make concurrent writes safer during streaming or retries.
DynamoDB throttle handling: If a TURN write is throttled, the Orchestrator retries with exponential backoff (max 2 retries). If the write still fails, the response is still delivered to the user, but the turn is enqueued to an SQS retry queue and persisted by an async worker to avoid blocking the response path. Messages that exhaust the worker's retries land in a DLQ for manual review.
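A sketch of the TURN write path with that throttle handling is shown below, using boto3 against the manga_chatbot_memory table. The SQS retry queue URL is an assumption; the exact backoff parameters are illustrative.

```python
# Sketch of the TURN write path described above: bounded in-line retries with
# exponential backoff, then hand the item to an SQS retry queue so the
# user-facing response is never blocked. Queue URL is an assumption.
import json
import time
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")
sqs = boto3.client("sqs")
RETRY_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/chat-turn-retries"  # assumed


def save_turn(session_id: str, role: str, content: str, intent: str, ts_epoch: int):
    item = {
        "pk": {"S": f"SESSION#{session_id}"},
        "sk": {"S": f"TURN#{ts_epoch}"},
        "role": {"S": role},
        "content": {"S": content},
        "intent": {"S": intent},
        "created_at": {"N": str(ts_epoch)},
        "ttl": {"N": str(ts_epoch + 86400)},  # 24-hour expiry
    }
    for attempt in range(3):  # initial try + 2 retries
        try:
            dynamodb.put_item(TableName="manga_chatbot_memory", Item=item)
            return
        except ClientError as err:
            code = err.response["Error"]["Code"]
            if code not in ("ProvisionedThroughputExceededException", "ThrottlingException"):
                raise
            time.sleep(0.1 * (2 ** attempt))  # exponential backoff
    # Still throttled: enqueue for the async persistence worker instead of blocking
    sqs.send_message(QueueUrl=RETRY_QUEUE_URL, MessageBody=json.dumps(item))
```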
LLD-5: Response Generation - Prompt Engineering
System Prompt Template
You are MangaAssist, a helpful shopping assistant for the JP Manga store on Amazon.com.
RULES:
1. Only recommend products that exist in the provided product data.
2. Never invent prices, availability, or delivery dates.
3. If you don't have the information, say so and offer to help differently.
4. Keep responses concise (2-4 sentences for simple questions, up to a short paragraph for recommendations).
5. Always include product links when recommending items.
6. Never discuss competitors by name.
7. If the user needs help beyond your capabilities, offer to connect them with a support agent.
CONTEXT:
- Store section: {{store_section}}
- Current product (if any): {{current_product_json}}
- User's recent browsing: {{browsing_history}}
- Active promotions: {{active_promos}}
RETRIEVED INFORMATION:
{{rag_chunks}}
CONVERSATION HISTORY:
{{conversation_turns}}
USER MESSAGE:
{{user_message}}
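The placeholders in the template are filled at request time from the conversation context, the RAG results, and the incoming message. A sketch of that assembly step follows; the serialization choices for history and chunks are illustrative, and SYSTEM_PROMPT_TEMPLATE is assumed to hold the literal template text above.

```python
# Sketch of prompt assembly for the template above. Serialization choices for
# chunks and history are illustrative; `template` is the literal template text.
import json


def build_prompt(template: str, context: dict, rag_chunks: list[dict], user_message: str) -> str:
    history = "\n".join(f"{t['role']}: {t['content']}" for t in context["turns"][-10:])
    chunks = "\n\n".join(f"[{c['source_type']}] {c['content']}" for c in rag_chunks)
    return (
        template
        .replace("{{store_section}}", context["page_context"]["store_section"])
        .replace("{{current_product_json}}", json.dumps(context.get("current_product") or {}))
        .replace("{{browsing_history}}", ", ".join(context["page_context"].get("browsing_history", [])))
        .replace("{{active_promos}}", json.dumps(context.get("active_promos", [])))
        .replace("{{rag_chunks}}", chunks)
        .replace("{{conversation_turns}}", history)
        .replace("{{user_message}}", user_message)
    )
```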
Response Format Contract
{
"response_text": "Based on your love for action manga, here are 3 titles you might enjoy:",
"products": [
{
"asin": "B08X1YRSTR",
"title": "Chainsaw Man, Vol. 1",
"price": "$9.99",
"image_url": "https://...",
"product_url": "https://amazon.com/dp/..."
}
],
"actions": [
{
"label": "Add to Cart",
"type": "add_to_cart",
"asin": "B08X1YRSTR"
},
{
"label": "See More Like This",
"type": "more_recommendations"
}
],
"follow_up_suggestions": [
"Tell me about the art style",
"Is there a box set?",
"Show me horror manga instead"
]
}
LLD-6: Guardrails Pipeline
graph LR
A[User Message] --> B[Input Safety<br>PII scrub, prompt injection, abuse checks]
B --> C{Blocked?}
C -->|Yes| H[Safe fallback or escalation]
C -->|No| D[LLM or Template Response]
D --> E[Output Safety<br>PII, price, toxicity, competitor, ASIN, scope]
E --> F{All checks pass?}
F -->|Yes| G[Return to User]
F -->|No| H[Fallback Response<br>+ Log for review]
Guardrail Rules
| Rule | What It Checks | Action on Failure |
|---|---|---|
| Input PII Scrub | SSN, credit card, phone, email, address in user input before LLM or analytics | Mask sensitive tokens and log |
| Prompt Injection | Instruction override attempts or encoded jailbreak patterns | Block or safe-refuse; log |
| Output PII Redaction | Sensitive PII reproduced in output | Redact and log |
| Price Accuracy | Prices in response match catalog | Replace with correct price |
| Toxicity | Offensive or inappropriate content | Block response, return safe fallback |
| Competitor Mention | Names of competitors (Barnes & Noble, etc.) | Remove mention |
| ASIN Validation | Product IDs mentioned actually exist | Remove invalid product |
| Scope Check | Response stays on topic (manga/Amazon) | Redirect to on-topic |
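Two of the output checks from the table, sketched as pure functions over the draft response and the product data already fetched for the turn. The catalog shape, helper names, and guardrail_flags field are illustrative assumptions.

```python
# Sketch of two output guardrails from the table above: price accuracy and
# ASIN validation. `catalog` maps ASIN -> product data fetched for this turn;
# the function names and guardrail_flags field are illustrative.


def check_price_accuracy(response: dict, catalog: dict) -> dict:
    """Replace any price in a product card that disagrees with the catalog."""
    for card in response.get("products", []):
        live_price = catalog.get(card["asin"], {}).get("price")
        if live_price is not None and card.get("price") != live_price:
            card["price"] = live_price  # action on failure: replace with correct price
            response.setdefault("guardrail_flags", []).append("price_corrected")
    return response


def check_asin_validity(response: dict, catalog: dict) -> dict:
    """Drop product cards whose ASIN does not exist in the fetched catalog data."""
    products = response.get("products", [])
    kept = [card for card in products if card["asin"] in catalog]
    if len(kept) != len(products):
        response["products"] = kept
        response.setdefault("guardrail_flags", []).append("invalid_asin_removed")
    return response
```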
LLD-7: API Contracts
The transport is split into synchronous submission and asynchronous delivery so the contract matches the streaming architecture.
POST /chat/init
Creates a new chat session and returns a welcome message. Must be called before sending messages.
Request:
{
"session_token": "amzn-session-xyz789",
"page_context": {
"current_asin": "B08X1YRSTR",
"store_section": "manga-home",
"url": "/stores/page/jp-manga"
},
"locale": "ja_JP"
}
Response (200 OK):
{
"session_id": "sess_abc123",
"customer_id": "C123",
"is_authenticated": true,
"welcome_message": "Welcome to the JP Manga store! How can I help you today?",
"quick_actions": [
{ "label": "Browse Popular Manga", "type": "discovery" },
{ "label": "Track My Order", "type": "order_tracking" },
{ "label": "Get Recommendations", "type": "recommendation" }
],
"websocket_url": "wss://chat.amazon.com/ws/sess_abc123"
}
POST /chat/message
Request:
{
"session_id": "sess_abc123",
"message": "Recommend something like Naruto",
"page_context": {
"current_asin": null,
"store_section": "manga-home",
"cart_asins": ["B09XYZ"],
"url": "/stores/page/jp-manga"
}
}
Response (202 Accepted):
{
"session_id": "sess_abc123",
"response_id": "resp_def456",
"status": "accepted",
"delivery_channel": "websocket"
}
WebSocket Events
Delta event:
{
"type": "chat.response.delta",
"session_id": "sess_abc123",
"response_id": "resp_def456",
"delta": "If you loved Naruto, you'll enjoy these..."
}
Completed event:
{
"type": "chat.response.completed",
"session_id": "sess_abc123",
"response_id": "resp_def456",
"response_text": "If you loved Naruto, you'll enjoy these...",
"products": [...],
"actions": [...],
"follow_up_suggestions": [...],
"metadata": {
"intent": "recommendation",
"latency_ms": 1842,
"model": "claude-3-5-sonnet",
"sources": ["recommendation_engine", "product_catalog"]
}
}
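A client consumes these events by opening the websocket_url returned by /chat/init, rendering delta events incrementally, and treating the completed event as the authoritative payload. A minimal test-client sketch, assuming the third-party websockets package:

```python
# Sketch of a test client for the WebSocket events above: stream deltas into a
# buffer until the completed event for the same response_id arrives. Uses the
# third-party `websockets` package; the URL comes from /chat/init.
import asyncio
import json
import websockets


async def receive_response(ws_url: str, response_id: str) -> dict:
    buffer = []
    async with websockets.connect(ws_url) as ws:
        async for raw in ws:
            event = json.loads(raw)
            if event.get("response_id") != response_id:
                continue
            if event["type"] == "chat.response.delta":
                buffer.append(event["delta"])   # a real UI renders this incrementally
            elif event["type"] == "chat.response.completed":
                return event                    # full payload supersedes the deltas

# asyncio.run(receive_response("wss://chat.amazon.com/ws/sess_abc123", "resp_def456"))
```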
GET /chat/message/{response_id}
HTTPS fallback for clients that cannot maintain a WebSocket connection. Returns the latest buffered state or the final completed response.
POST /chat/feedback
{
"response_id": "resp_def456",
"feedback": "thumbs_up",
"comment": null
}
POST /chat/escalate
{
"session_id": "sess_abc123",
"reason": "user_requested",
"summary": "User wants to return damaged Naruto Vol 5, order #112-3456789"
}
Error Response Schema
All endpoints return a consistent error structure on failure.
{
"error": {
"code": "RATE_LIMITED",
"message": "Too many requests. Please wait a moment and try again.",
"retry_after_ms": 5000
}
}
| HTTP Status | Error Code | When |
|---|---|---|
| 400 | INVALID_REQUEST | Missing required fields or malformed payload |
| 401 | UNAUTHORIZED | Session token expired or invalid |
| 403 | FORBIDDEN | User banned or action not allowed for guest |
| 404 | SESSION_NOT_FOUND | Session ID does not exist or expired |
| 429 | RATE_LIMITED | Token bucket exhausted |
| 500 | INTERNAL_ERROR | Unexpected server error |
| 503 | SERVICE_UNAVAILABLE | Downstream dependency failure; includes fallback message if available |
LLD-8: Database Schemas
Conversation Memory (DynamoDB)
Table: manga_chatbot_memory
PK: pk (String) -- SESSION#<session_id>
SK: sk (String) -- META | TURN#<timestamp> | SUMMARY#<window_id>
GSI1PK: customer_id (String) -- nullable for guests
GSI1SK: updated_at (Number)
META item:
session_id: String
customer_id: String (nullable)
page_context: Map
created_at: Number
updated_at: Number
ttl: Number
turn_count: Number
last_intent: String
TURN item:
role: String
content: String
intent: String
created_at: Number
response_id: String
guardrail_flags: List<String>
SUMMARY item:
covered_turns: String
content: String
created_at: Number
Analytics Events (Kinesis -> Redshift)
CREATE TABLE chatbot_events (
event_id VARCHAR(64) PRIMARY KEY,
session_id VARCHAR(64),
customer_id VARCHAR(64),
event_type VARCHAR(32), -- message, response, feedback, escalation
intent VARCHAR(32),
message_text VARCHAR(2000), -- PII-scrubbed
response_text VARCHAR(4000),
products_shown VARCHAR(500), -- comma-separated ASINs
latency_ms INTEGER,
model_id VARCHAR(64),
feedback VARCHAR(16), -- thumbs_up, thumbs_down, null
created_at TIMESTAMP
);
RAG Knowledge Base Index (OpenSearch)
{
"mappings": {
"properties": {
"chunk_id": { "type": "keyword" },
"content": { "type": "text" },
"embedding": {
"type": "knn_vector",
"dimension": 1536,
"method": { "name": "hnsw", "engine": "nmslib" }
},
"source_type": { "type": "keyword" },
"asin": { "type": "keyword" },
"category": { "type": "keyword" },
"last_updated": { "type": "date" }
}
}
}
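A sketch of creating the index with this mapping via opensearch-py is shown below. The collection endpoint and index name are assumptions, SigV4 authentication is omitted, and the dimension matches Titan Text Embeddings V2's default 1024-dim output; knn_vector fields require index.knn to be enabled in the index settings.

```python
# Sketch of creating the knowledge-base index with the mapping above using
# opensearch-py. Endpoint, index name, and auth are illustrative/omitted.
from opensearchpy import OpenSearch

INDEX_BODY = {
    "settings": {"index.knn": True},  # required for knn_vector fields
    "mappings": {
        "properties": {
            "chunk_id": {"type": "keyword"},
            "content": {"type": "text"},
            "embedding": {
                "type": "knn_vector",
                "dimension": 1024,
                "method": {"name": "hnsw", "engine": "nmslib"},
            },
            "source_type": {"type": "keyword"},
            "asin": {"type": "keyword"},
            "category": {"type": "keyword"},
            "last_updated": {"type": "date"},
        }
    },
}

client = OpenSearch(
    hosts=[{"host": "example-collection.us-east-1.aoss.amazonaws.com", "port": 443}],  # assumed endpoint
    use_ssl=True,
)
if not client.indices.exists(index="manga_kb"):
    client.indices.create(index="manga_kb", body=INDEX_BODY)
```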
LLD-9: Escalation and Human Handoff
Escalation Flow
sequenceDiagram
participant User
participant Orchestrator
participant SummaryLLM as LLM (Summary)
participant Memory as DynamoDB
participant Connect as Amazon Connect
participant Agent as Human Agent
User->>Orchestrator: "I want to talk to a human"
Orchestrator->>Memory: Load full conversation history
Memory-->>Orchestrator: All turns + metadata
Orchestrator->>SummaryLLM: Summarize conversation for agent handoff
SummaryLLM-->>Orchestrator: Summary text
Orchestrator->>Connect: POST /escalate {session_id, summary, customer_id, intent_history, sentiment}
Connect-->>Orchestrator: {queue_id, estimated_wait_seconds}
Orchestrator-->>User: "Connecting you with a support agent. Estimated wait: ~2 minutes."
Connect->>Agent: Route to available agent with context payload
Agent-->>User: "Hi, I see you need help with a damaged manga return. Let me help."
Escalation Payload to Amazon Connect
{
"session_id": "sess_abc123",
"customer_id": "C123",
"is_prime": true,
"reason": "user_requested",
"conversation_summary": "Customer wants to return Naruto Vol 5 (ASIN B09XYZ) from order #112-3456789 due to water damage. Bot confirmed the return window is still open.",
"intent_history": ["product_question", "return_request", "escalation"],
"sentiment": "frustrated",
"turn_count": 6,
"unresolved_issue": "Return label generation failed"
}
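A sketch of producing and sending this payload follows. The summary and sentiment come from an LLM pass over the loaded turns, and the payload is posted to the internal /escalate route shown in the sequence diagram; the URL, memory client, and summarizer client are all illustrative assumptions.

```python
# Sketch of the escalation step from the sequence diagram: load the full
# history, summarize it for the agent, and post the handoff payload to the
# internal /escalate route in front of Amazon Connect. URL and client objects
# are assumptions.
import requests

ESCALATE_URL = "https://chat-internal.example.com/escalate"  # assumed internal route


def escalate(session_id: str, memory, summarizer, reason: str = "user_requested") -> dict:
    turns = memory.load_all_turns(session_id)   # full history, not just the recent window
    meta = memory.load_meta(session_id)
    payload = {
        "session_id": session_id,
        "customer_id": meta.get("customer_id"),
        "is_prime": meta.get("is_prime", False),
        "reason": reason,
        "conversation_summary": summarizer.summarize(turns),  # LLM handoff summary
        "intent_history": [t["intent"] for t in turns if t.get("intent")],
        "sentiment": summarizer.sentiment(turns),
        "turn_count": len(turns),
    }
    resp = requests.post(ESCALATE_URL, json=payload, timeout=5)
    resp.raise_for_status()
    return resp.json()  # expected shape: {"queue_id": ..., "estimated_wait_seconds": ...}
```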
LLD-10: Caching Strategy
Cache Architecture
graph TD
subgraph "Orchestrator"
A[Request Handler]
end
subgraph "ElastiCache Redis Cluster"
B[Product Cache<br>TTL: 5 min]
C[Recommendation Cache<br>TTL: 15 min]
D[Promotion Cache<br>TTL: 15 min]
E[Review Cache<br>TTL: 1 hour]
end
subgraph "Origin Services"
F[Product Catalog]
G[Recommendation Engine]
H[Promotions Service]
I[Reviews Service]
end
A -->|cache-aside read| B
B -->|miss| F
F -->|populate cache| B
A -->|cache-aside read| C
C -->|miss| G
G -->|populate cache| C
A -->|cache-aside read| D
D -->|miss| H
H -->|populate cache| D
A -->|cache-aside read| E
E -->|miss| I
I -->|populate cache| E
Cache Rules
| Data | Cache Key Pattern | TTL | Invalidation |
|---|---|---|---|
| Product details | product:{asin} | 5 min | Catalog change event via SNS |
| Recommendations | reco:{user_id}:{seed_asin} | 15 min | New session or explicit refresh |
| Promotions | promo:{store_section} | 15 min | Promotion change event via SNS |
| Reviews / ratings | review:{asin} | 1 hour | Scheduled refresh |
| Prices | Never cached | N/A | Always fetched live |
Why cache-aside? The Orchestrator checks the cache first and only calls the origin service on a miss. This pattern is simple, keeps writes flowing only through the origin services, and keeps the cache layer optional: if ElastiCache is down, requests fall through to origin services with higher latency but correct data. Staleness is bounded by the short TTLs and the SNS-driven invalidation events in the table above.
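A sketch of the cache-aside read for product details, using the product:{asin} key pattern and 5-minute TTL from the table. The Redis endpoint and catalog client are illustrative; any Redis failure falls through to the origin service so the cache stays optional.

```python
# Sketch of cache-aside for product details using the product:{asin} key and
# the 5-minute TTL from the table above. Redis endpoint and catalog client
# are illustrative; Redis failures fall through to the origin service.
import json
import redis

r = redis.Redis(host="manga-chat-cache.example.cache.amazonaws.com", port=6379)  # assumed endpoint
PRODUCT_TTL_SECONDS = 300  # 5 minutes


def get_product(asin: str, catalog_client) -> dict:
    key = f"product:{asin}"
    try:
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)
    except redis.RedisError:
        pass  # cache is optional: fall through to origin on any Redis failure

    product = catalog_client.get_product(asin)  # origin call on miss
    try:
        r.setex(key, PRODUCT_TTL_SECONDS, json.dumps(product))
    except redis.RedisError:
        pass  # failing to populate the cache must not fail the request
    return product
```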