6. Detailed Workflow — Step-by-Step Request Flow
End-to-End Request Lifecycle
```mermaid
sequenceDiagram
actor User
participant FE as Frontend<br>(React Widget)
participant GW as API Gateway
participant Auth as Auth Service
participant Orch as Orchestrator
participant IC as Intent Classifier
participant Mem as Conversation<br>Memory (DDB)
participant Rec as Recommendation<br>Engine
participant Cat as Product Catalog
participant RAG as RAG Pipeline
participant LLM as LLM (Bedrock)
participant GR as Guardrails
participant Ana as Analytics<br>(Kinesis)
Note over User,Ana: Step 1 — User opens JP Manga store
User->>FE: Opens amazon.com/stores/jp-manga
FE->>FE: Render chat FAB button
Note over User,Ana: Step 2 — User opens chatbot
User->>FE: Clicks chat FAB
FE->>GW: POST /chat/init (session start)
GW->>Auth: Validate session token
Auth-->>GW: customer_id = C123 (or guest)
GW->>Orch: Create session
Orch->>Mem: Create session entry
Orch-->>FE: Welcome message + quick chips
Note over User,Ana: Step 3 — User asks a question
User->>FE: "Recommend something like Attack on Titan"
FE->>FE: Attach page_context (current ASIN, section, cart)
FE->>GW: POST /chat/message (WebSocket)
Note over User,Ana: Step 4 — Backend validates and routes
GW->>GW: Rate limit check (30 msg/min)
GW->>Auth: Verify session still valid
GW->>Orch: Forward authenticated request
Note over User,Ana: Step 5 — Orchestrator processes
Orch->>Mem: Load conversation history (last 10 turns)
Mem-->>Orch: Previous turns []
Orch->>IC: Classify("Recommend something like Attack on Titan")
IC-->>Orch: intent=recommendation, entity={seed="Attack on Titan"}, confidence=0.94
Note over User,Ana: Step 6 — Service calls (parallel)
par Recommendation
Orch->>Rec: getSimilar(seed="Attack on Titan", userId=C123)
Rec-->>Orch: [ASIN1, ASIN2, ASIN3, ASIN4, ASIN5]
and Product Details
Orch->>Cat: getProducts([ASIN1...ASIN5])
Cat-->>Orch: [{title, price, rating, image, availability}...]
and RAG Context
Orch->>RAG: retrieve("manga similar to Attack on Titan")
RAG-->>Orch: [editorial_chunk_1, genre_description_chunk_2]
end
Note over User,Ana: Step 7 — LLM generates response
Orch->>Orch: Build prompt with:<br>- System instructions<br>- Conversation history<br>- Retrieved chunks<br>- Product data<br>- User message
Orch->>LLM: Generate response (streaming)
LLM-->>Orch: Streamed tokens
Note over User,Ana: Step 8 — Guardrails validate output
Orch->>GR: Validate response
GR->>GR: Check: PII? Toxicity? Price accuracy? Valid ASINs?
GR-->>Orch: ✅ Approved (or ❌ Blocked → fallback)
Note over User,Ana: Step 9 — Response delivered
Orch->>Mem: Save turn (user msg + bot response)
Orch-->>GW: Streamed response
GW-->>FE: WebSocket frames
FE-->>User: Display response with product cards + chips
Note over User,Ana: Step 10 — Logging and analytics
Orch->>Ana: Emit event (session, intent, latency, products shown)
User->>FE: Clicks 👍 (thumbs up)
FE->>GW: POST /chat/feedback
GW->>Ana: Emit feedback event
```
Step-by-Step Breakdown
Step 1: User Opens Store
- User navigates to `amazon.com/stores/page/jp-manga`.
- Frontend loads the manga store layout plus the MangaAssist chat widget (lazy-loaded, ~15KB bundle).
- The chat FAB renders in the bottom-right corner.
Step 2: Chat Session Initialization
- User clicks the FAB.
- Frontend sends `POST /chat/init` with the user's session token and current page context.
- The Auth Service validates the token and returns the customer ID (or marks the session as guest).
- The Orchestrator creates a new session in DynamoDB with a 24-hour TTL.
- A welcome message is returned with contextual quick-action chips.
Step 3: User Sends a Message
- User types: "Recommend something like Attack on Titan."
- Frontend attaches `page_context` (current ASIN if on a product page, store section, cart contents, locale).
- The message is sent over the existing WebSocket connection.
Step 4: Validation & Routing
- API Gateway checks the rate limiter (token bucket, 30 messages/minute per user).
- Auth is re-verified (session not expired, user not banned).
- Request is forwarded to the Orchestrator.
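The token-bucket check from this step can be sketched as follows. The class and its interface are assumptions; only the 30 messages/minute limit comes from the text (which works out to a refill rate of 0.5 tokens/second).

```python
import time

class TokenBucket:
    """Per-user token bucket: capacity 30, refilled at 30 tokens/minute (Step 4)."""

    def __init__(self, capacity: int = 30, refill_per_sec: float = 0.5):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the 30 msg/min limit; gateway would reject the message
```

A full bucket lets a user burst 30 messages, after which one message becomes available every two seconds.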
Step 5: Orchestration
- Orchestrator loads the last 10 turns of conversation from DynamoDB.
- Orchestrator calls the Intent Classifier with the message + conversation context.
- Classifier returns: `intent=recommendation`, `seed_entity="Attack on Titan"`, `confidence=0.94`.
- Orchestrator determines which downstream services to call based on the intent.
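The intent-to-services routing decision could be as simple as a lookup table. Everything here except the `recommendation` intent and its three downstream services is an assumption for illustration, including the other intents and the 0.7 confidence floor.

```python
# Illustrative routing table: which downstream calls each intent fans out to.
INTENT_ROUTES: dict[str, list[str]] = {
    "recommendation": ["recommendation_engine", "product_catalog", "rag_pipeline"],
    "product_question": ["product_catalog", "rag_pipeline"],  # assumed intent
    "order_status": ["order_service"],                        # assumed intent
}

CONFIDENCE_FLOOR = 0.7  # assumed threshold below which the bot asks to clarify

def plan_calls(intent: str, confidence: float) -> list[str]:
    """Map a classified intent to the downstream services to invoke (Step 5)."""
    if confidence < CONFIDENCE_FLOOR or intent not in INTENT_ROUTES:
        return ["clarify_with_user"]
    return INTENT_ROUTES[intent]
```

With the classifier output above (`recommendation`, 0.94), this routes to all three services called in Step 6.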
Step 6: Parallel Service Calls
Three calls happen in parallel (critical for latency):
| Call | Service | Input | Output | Latency Target |
|---|---|---|---|---|
| 1 | Recommendation Engine | Seed ASIN + user ID | 5 similar ASINs | < 200ms |
| 2 | Product Catalog | 5 ASINs | Full product details | < 100ms |
| 3 | RAG Pipeline | Query text | 3 relevant chunks | < 300ms |
Total parallel wall time: ~300ms (bounded by the slowest call).
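The fan-out can be sketched with `asyncio.gather`, using sleeps in place of the three latency targets. The service stubs are assumptions; note that in the sequence diagram the catalog call consumes the recommender's ASINs, which would chain those two calls, so this sketch passes the seed to the catalog stub to keep all three genuinely concurrent.

```python
import asyncio
import time

# Stubbed service calls; each sleep stands in for the latency target above.
async def get_similar(seed: str, user_id: str) -> list[str]:
    await asyncio.sleep(0.2)   # Recommendation Engine, < 200 ms
    return ["ASIN1", "ASIN2", "ASIN3", "ASIN4", "ASIN5"]

async def get_products(seed: str) -> list[dict]:
    await asyncio.sleep(0.1)   # Product Catalog, < 100 ms
    return [{"asin": f"ASIN{i}", "title": f"Stub title {i}"} for i in range(1, 6)]

async def retrieve_chunks(query: str) -> list[str]:
    await asyncio.sleep(0.3)   # RAG Pipeline, < 300 ms
    return ["editorial_chunk_1", "genre_description_chunk_2"]

async def fan_out(seed: str, user_id: str, query: str):
    # gather() runs all three concurrently; wall time is bounded by the slowest.
    return await asyncio.gather(
        get_similar(seed, user_id),
        get_products(seed),
        retrieve_chunks(query),
    )

start = time.monotonic()
recs, products, chunks = asyncio.run(
    fan_out("Attack on Titan", "C123", "manga similar to Attack on Titan"))
elapsed = time.monotonic() - start  # ~0.3 s, not the 0.6 s a serial version would take
```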
Step 7: LLM Generation
- Orchestrator assembles the prompt:
- System instructions (persona, rules, constraints).
- Conversation history (previous turns).
- Retrieved RAG chunks (editorial descriptions, genre info).
- Product data (titles, prices, ratings, availability).
- The user's message.
- Sends to Amazon Bedrock (Claude) for streaming generation.
- LLM generates a natural response referencing the actual product data.
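Prompt assembly from the five sections above could look like the sketch below. The section labels, formatting, and sample product are illustrative, not the production prompt template.

```python
def build_prompt(system: str, history: list[str], chunks: list[str],
                 products: list[dict], user_msg: str) -> str:
    """Join the five prompt sections in the order listed in Step 7."""
    product_lines = "\n".join(
        f"- {p['title']} | ${p['price']} | {p['rating']}★ | {p['availability']}"
        for p in products
    )
    return "\n\n".join([
        system,
        "Conversation so far:\n" + "\n".join(history),
        "Reference passages:\n" + "\n".join(chunks),
        "Products (only recommend from this list):\n" + product_lines,
        "User: " + user_msg,
    ])
```

Grounding the LLM in the exact catalog rows it may cite is what makes the price-accuracy and ASIN checks in Step 8 possible.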
Step 8: Guardrails
- The full response is validated before delivery:
- PII check: No customer data leaked.
- Price accuracy: Prices in the response match the catalog data that was provided.
- ASIN validation: All product IDs mentioned are real.
- Toxicity filter: No offensive content.
- Competitor filter: No competitor names.
- If any check fails, the specific part is corrected or the response is replaced with a safe fallback.
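Two of the deterministic checks (ASIN validity and price accuracy) can be sketched with regular expressions against the catalog data that was fed into the prompt. The ASIN pattern and function are illustrative assumptions.

```python
import re

ASIN_RE = re.compile(r"\bB0[A-Z0-9]{8}\b")   # illustrative ASIN shape
PRICE_RE = re.compile(r"\$(\d+\.\d{2})")

def passes_guardrails(response: str, catalog: dict[str, float]) -> bool:
    """Step 8 sketch: every quoted ASIN and price must match the catalog."""
    for asin in ASIN_RE.findall(response):
        if asin not in catalog:
            return False                      # hallucinated product ID → block
    known_prices = {f"{p:.2f}" for p in catalog.values()}
    for price in PRICE_RE.findall(response):
        if price not in known_prices:
            return False                      # price drifted from catalog data → block
    return True
```

The PII, toxicity, and competitor filters would typically be model- or wordlist-based rather than regex checks like these.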
Step 9: Response Delivery
- Response is streamed token-by-token over WebSocket for perceived speed.
- Product cards (images, prices, "Add to Cart" buttons) render as structured elements after the text stream completes.
- Follow-up suggestion chips appear below the response.
- The turn is saved to conversation memory.
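The delivery order (text tokens first, then structured cards and chips) can be expressed as a frame generator. The frame shapes and example chip labels are assumptions, not the actual wire protocol.

```python
import json
from typing import Iterator

def frame_stream(tokens: list[str], product_cards: list[dict],
                 chips: list[str]) -> Iterator[str]:
    """Yield WebSocket frames: text tokens first, structured elements after (Step 9)."""
    for token in tokens:
        yield json.dumps({"type": "token", "text": token})
    yield json.dumps({"type": "cards", "items": product_cards})  # after text completes
    yield json.dumps({"type": "chips", "items": chips})          # follow-up suggestions
    yield json.dumps({"type": "done"})
```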
Step 10: Logging & Analytics
- An event is emitted to Kinesis with: session ID, intent, latency, products shown, model used.
- If the user clicks thumbs up/down, a feedback event is captured.
- All data flows to Redshift for dashboards and model improvement.
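The analytics event might be shaped like the record below, ready to pass to a Kinesis `put_record` call. The field names inside the payload are illustrative; only the listed contents (session ID, intent, latency, products shown, model) come from the text.

```python
import json
import time

def build_analytics_record(session_id: str, intent: str, latency_ms: int,
                           products_shown: list[str], model: str) -> dict:
    """Build a Kinesis record for the Step 10 event (field names are illustrative)."""
    payload = {
        "session_id": session_id,
        "intent": intent,
        "latency_ms": latency_ms,
        "products_shown": products_shown,
        "model": model,
        "ts": int(time.time()),
    }
    return {
        "Data": json.dumps(payload).encode("utf-8"),
        "PartitionKey": session_id,  # keeps one session's events ordered on one shard
    }
```

Partitioning by session ID preserves per-session event ordering downstream, which simplifies funnel analysis in Redshift.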
Latency Budget
```mermaid
gantt
title Request Latency Budget (Target: < 3 seconds)
dateFormat X
axisFormat %L ms
section Gateway
Auth + Rate Limit :0, 50
section Orchestrator
Load Memory :50, 100
Intent Classification :100, 150
section Service Calls (Parallel)
Recommendation Engine :150, 350
Product Catalog :150, 250
RAG Retrieval :150, 450
section Generation
LLM First Token :450, 950
LLM Full Response :950, 2500
section Safety
Guardrails :2500, 2600
section Delivery
WebSocket Send :2600, 2650
```
Key insight: The user sees the first token at ~950ms (streaming), so the perceived latency is under 1 second. The full response completes within ~2.7 seconds.