6. Detailed Workflow — Step-by-Step Request Flow
End-to-End Request Lifecycle
```mermaid
sequenceDiagram
actor User
participant FE as Frontend<br>(React Widget)
participant GW as API Gateway
participant Auth as Auth Service
participant Orch as Orchestrator
participant IC as Intent Classifier
participant Mem as Conversation<br>Memory (DDB)
participant Rec as Recommendation<br>Engine
participant Cat as Product Catalog
participant RAG as RAG Pipeline
participant LLM as LLM (Bedrock)
participant GR as Guardrails
participant Ana as Analytics<br>(Kinesis)
Note over User,Ana: Step 1 — User opens JP Manga store
User->>FE: Opens amazon.com/stores/jp-manga
FE->>FE: Render chat FAB button
Note over User,Ana: Step 2 — User opens chatbot
User->>FE: Clicks chat FAB
FE->>GW: POST /chat/init (session start)
GW->>Auth: Validate session token
Auth-->>GW: customer_id = C123 (or guest)
GW->>Orch: Create session
Orch->>Mem: Create session entry
Orch-->>FE: Welcome message + quick chips
Note over User,Ana: Step 3 — User asks a question
User->>FE: "Recommend something like Attack on Titan"
FE->>FE: Attach page_context (current ASIN, section, cart)
FE->>GW: POST /chat/message (WebSocket)
Note over User,Ana: Step 4 — Backend validates and routes
GW->>GW: Rate limit check (30 msg/min)
GW->>Auth: Verify session still valid
GW->>Orch: Forward authenticated request
Note over User,Ana: Step 5 — Orchestrator processes
Orch->>Mem: Load conversation history (last 10 turns)
Mem-->>Orch: Previous turns []
Orch->>IC: Classify("Recommend something like Attack on Titan")
IC-->>Orch: intent=recommendation, entity={seed="Attack on Titan"}, confidence=0.94
Note over User,Ana: Step 6 — Service calls (parallel)
par Recommendation
Orch->>Rec: getSimilar(seed="Attack on Titan", userId=C123)
Rec-->>Orch: [ASIN1, ASIN2, ASIN3, ASIN4, ASIN5]
and Product Details
Orch->>Cat: getProducts([ASIN1...ASIN5])
Cat-->>Orch: [{title, price, rating, image, availability}...]
and RAG Context
Orch->>RAG: retrieve("manga similar to Attack on Titan")
RAG-->>Orch: [editorial_chunk_1, genre_description_chunk_2]
end
Note over User,Ana: Step 7 — LLM generates response
Orch->>Orch: Build prompt with:<br>- System instructions<br>- Conversation history<br>- Retrieved chunks<br>- Product data<br>- User message
Orch->>LLM: Generate response (streaming)
LLM-->>Orch: Streamed tokens
Note over User,Ana: Step 8 — Guardrails validate output
Orch->>GR: Validate response
GR->>GR: Check: PII? Toxicity? Price accuracy? Valid ASINs?
GR-->>Orch: ✅ Approved (or ❌ Blocked → fallback)
Note over User,Ana: Step 9 — Response delivered
Orch->>Mem: Save turn (user msg + bot response)
Orch-->>GW: Streamed response
GW-->>FE: WebSocket frames
FE-->>User: Display response with product cards + chips
Note over User,Ana: Step 10 — Logging and analytics
Orch->>Ana: Emit event (session, intent, latency, products shown)
User->>FE: Clicks 👍 (thumbs up)
FE->>GW: POST /chat/feedback
GW->>Ana: Emit feedback event
```
Step-by-Step Breakdown
Step 1: User Opens Store
- User navigates to `amazon.com/stores/page/jp-manga`.
- Frontend loads the manga store layout plus the MangaAssist chat widget (lazy-loaded, ~15KB bundle).
- The chat FAB renders in the bottom-right corner.
Step 2: Chat Session Initialization
- User clicks the FAB.
- Frontend sends `POST /chat/init` with the user's session token and current page context.
- The Auth Service validates the token and returns the customer ID (or marks the session as guest).
- The Orchestrator creates a new session in DynamoDB with a 24-hour TTL.
- A welcome message is returned with contextual quick-action chips.
Step 3: User Sends a Message
- User types: "Recommend something like Attack on Titan."
- Frontend attaches `page_context` (current ASIN if on a product page, store section, cart contents, locale).
- The message is sent over the existing WebSocket connection.
Step 4: Validation & Routing
- API Gateway checks the rate limiter (token bucket, 30 messages/minute per user).
- Auth is re-verified (session not expired, user not banned).
- Request is forwarded to the Orchestrator.
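The token-bucket check from this step can be sketched as follows. The class and its interface are assumptions; only the 30 messages/minute limit comes from the text (which works out to a refill rate of 0.5 tokens/second).

```python
import time

class TokenBucket:
    """Per-user token bucket: capacity 30, refilled at 30 tokens/minute (Step 4)."""

    def __init__(self, capacity: int = 30, refill_per_sec: float = 0.5):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the 30 msg/min limit; gateway would reject the message
```

A full bucket lets a user burst 30 messages, after which one message becomes available every two seconds.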
Step 5: Orchestration
- Orchestrator loads the last 10 turns of conversation from DynamoDB.
- Orchestrator calls the Intent Classifier with the message + conversation context.
- Classifier returns: `intent=recommendation`, `seed_entity="Attack on Titan"`, `confidence=0.94`.
- Orchestrator determines which downstream services to call based on the intent.
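The intent-to-services routing decision could be as simple as a lookup table. Everything here except the `recommendation` intent and its three downstream services is an assumption for illustration, including the other intents and the 0.7 confidence floor.

```python
# Illustrative routing table: which downstream calls each intent fans out to.
INTENT_ROUTES: dict[str, list[str]] = {
    "recommendation": ["recommendation_engine", "product_catalog", "rag_pipeline"],
    "product_question": ["product_catalog", "rag_pipeline"],  # assumed intent
    "order_status": ["order_service"],                        # assumed intent
}

CONFIDENCE_FLOOR = 0.7  # assumed threshold below which the bot asks to clarify

def plan_calls(intent: str, confidence: float) -> list[str]:
    """Map a classified intent to the downstream services to invoke (Step 5)."""
    if confidence < CONFIDENCE_FLOOR or intent not in INTENT_ROUTES:
        return ["clarify_with_user"]
    return INTENT_ROUTES[intent]
```

With the classifier output above (`recommendation`, 0.94), this routes to all three services called in Step 6.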
Step 6: Parallel Service Calls
Three calls happen in parallel (critical for latency):
| Call | Service | Input | Output | Latency Target |
|---|---|---|---|---|
| 1 | Recommendation Engine | Seed ASIN + user ID | 5 similar ASINs | < 200ms |
| 2 | Product Catalog | 5 ASINs | Full product details | < 100ms |
| 3 | RAG Pipeline | Query text | 3 relevant chunks | < 300ms |
Total parallel wall time: ~300ms (bounded by the slowest call).
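The fan-out can be sketched with `asyncio.gather`, using sleeps in place of the three latency targets. The service stubs are assumptions; note that in the sequence diagram the catalog call consumes the recommender's ASINs, which would chain those two calls, so this sketch passes the seed to the catalog stub to keep all three genuinely concurrent.

```python
import asyncio
import time

# Stubbed service calls; each sleep stands in for the latency target above.
async def get_similar(seed: str, user_id: str) -> list[str]:
    await asyncio.sleep(0.2)   # Recommendation Engine, < 200 ms
    return ["ASIN1", "ASIN2", "ASIN3", "ASIN4", "ASIN5"]

async def get_products(seed: str) -> list[dict]:
    await asyncio.sleep(0.1)   # Product Catalog, < 100 ms
    return [{"asin": f"ASIN{i}", "title": f"Stub title {i}"} for i in range(1, 6)]

async def retrieve_chunks(query: str) -> list[str]:
    await asyncio.sleep(0.3)   # RAG Pipeline, < 300 ms
    return ["editorial_chunk_1", "genre_description_chunk_2"]

async def fan_out(seed: str, user_id: str, query: str):
    # gather() runs all three concurrently; wall time is bounded by the slowest.
    return await asyncio.gather(
        get_similar(seed, user_id),
        get_products(seed),
        retrieve_chunks(query),
    )

start = time.monotonic()
recs, products, chunks = asyncio.run(
    fan_out("Attack on Titan", "C123", "manga similar to Attack on Titan"))
elapsed = time.monotonic() - start  # ~0.3 s, not the 0.6 s a serial version would take
```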
Step 7: LLM Generation
- Orchestrator assembles the prompt:
- System instructions (persona, rules, constraints).
- Conversation history (previous turns).
- Retrieved RAG chunks (editorial descriptions, genre info).
- Product data (titles, prices, ratings, availability).
- The user's message.
- Sends to Amazon Bedrock (Claude) for streaming generation.
- LLM generates a natural response referencing the actual product data.
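Prompt assembly from the five sections above could look like the sketch below. The section labels, formatting, and sample product are illustrative, not the production prompt template.

```python
def build_prompt(system: str, history: list[str], chunks: list[str],
                 products: list[dict], user_msg: str) -> str:
    """Join the five prompt sections in the order listed in Step 7."""
    product_lines = "\n".join(
        f"- {p['title']} | ${p['price']} | {p['rating']}★ | {p['availability']}"
        for p in products
    )
    return "\n\n".join([
        system,
        "Conversation so far:\n" + "\n".join(history),
        "Reference passages:\n" + "\n".join(chunks),
        "Products (only recommend from this list):\n" + product_lines,
        "User: " + user_msg,
    ])
```

Grounding the LLM in the exact catalog rows it may cite is what makes the price-accuracy and ASIN checks in Step 8 possible.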
Step 8: Guardrails
- The full response is validated before delivery:
- PII check: No customer data leaked.
- Price accuracy: Prices in the response match the catalog data that was provided.
- ASIN validation: All product IDs mentioned are real.
- Toxicity filter: No offensive content.
- Competitor filter: No competitor names.
- If any check fails, the specific part is corrected or the response is replaced with a safe fallback.
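Two of the deterministic checks (ASIN validity and price accuracy) can be sketched with regular expressions against the catalog data that was fed into the prompt. The ASIN pattern and function are illustrative assumptions.

```python
import re

ASIN_RE = re.compile(r"\bB0[A-Z0-9]{8}\b")   # illustrative ASIN shape
PRICE_RE = re.compile(r"\$(\d+\.\d{2})")

def passes_guardrails(response: str, catalog: dict[str, float]) -> bool:
    """Step 8 sketch: every quoted ASIN and price must match the catalog."""
    for asin in ASIN_RE.findall(response):
        if asin not in catalog:
            return False                      # hallucinated product ID → block
    known_prices = {f"{p:.2f}" for p in catalog.values()}
    for price in PRICE_RE.findall(response):
        if price not in known_prices:
            return False                      # price drifted from catalog data → block
    return True
```

The PII, toxicity, and competitor filters would typically be model- or wordlist-based rather than regex checks like these.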
Step 9: Response Delivery
- Response is streamed token-by-token over WebSocket for perceived speed.
- Product cards (images, prices, "Add to Cart" buttons) render as structured elements after the text stream completes.
- Follow-up suggestion chips appear below the response.
- The turn is saved to conversation memory.
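The delivery order (text tokens first, then structured cards and chips) can be expressed as a frame generator. The frame shapes and example chip labels are assumptions, not the actual wire protocol.

```python
import json
from typing import Iterator

def frame_stream(tokens: list[str], product_cards: list[dict],
                 chips: list[str]) -> Iterator[str]:
    """Yield WebSocket frames: text tokens first, structured elements after (Step 9)."""
    for token in tokens:
        yield json.dumps({"type": "token", "text": token})
    yield json.dumps({"type": "cards", "items": product_cards})  # after text completes
    yield json.dumps({"type": "chips", "items": chips})          # follow-up suggestions
    yield json.dumps({"type": "done"})
```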
Step 10: Logging & Analytics
- An event is emitted to Kinesis with: session ID, intent, latency, products shown, model used.
- If the user clicks thumbs up/down, a feedback event is captured.
- All data flows to Redshift for dashboards and model improvement.
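The analytics event might be shaped like the record below, ready to pass to a Kinesis `put_record` call. The field names inside the payload are illustrative; only the listed contents (session ID, intent, latency, products shown, model) come from the text.

```python
import json
import time

def build_analytics_record(session_id: str, intent: str, latency_ms: int,
                           products_shown: list[str], model: str) -> dict:
    """Build a Kinesis record for the Step 10 event (field names are illustrative)."""
    payload = {
        "session_id": session_id,
        "intent": intent,
        "latency_ms": latency_ms,
        "products_shown": products_shown,
        "model": model,
        "ts": int(time.time()),
    }
    return {
        "Data": json.dumps(payload).encode("utf-8"),
        "PartitionKey": session_id,  # keeps one session's events ordered on one shard
    }
```

Partitioning by session ID preserves per-session event ordering downstream, which simplifies funnel analysis in Redshift.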
Latency Budget
```mermaid
gantt
title Request Latency Budget (Target: < 3 seconds)
dateFormat X
axisFormat %L ms
section Gateway
Auth + Rate Limit :0, 50
section Orchestrator
Load Memory :50, 100
Intent Classification :100, 150
section Service Calls (Parallel)
Recommendation Engine :150, 350
Product Catalog :150, 250
RAG Retrieval :150, 450
section Generation
LLM First Token :450, 950
LLM Full Response :950, 2500
section Safety
Guardrails :2500, 2600
section Delivery
WebSocket Send :2600, 2650
```
Key insight: The user sees the first token at ~950ms (streaming), so the perceived latency is under 1 second. The full response completes within ~2.7 seconds.