
6. Detailed Workflow — Step-by-Step Request Flow

End-to-End Request Lifecycle

sequenceDiagram
    actor User
    participant FE as Frontend<br>(React Widget)
    participant GW as API Gateway
    participant Auth as Auth Service
    participant Orch as Orchestrator
    participant IC as Intent Classifier
    participant Mem as Conversation<br>Memory (DDB)
    participant Rec as Recommendation<br>Engine
    participant Cat as Product Catalog
    participant RAG as RAG Pipeline
    participant LLM as LLM (Bedrock)
    participant GR as Guardrails
    participant Ana as Analytics<br>(Kinesis)

    Note over User,Ana: Step 1 — User opens JP Manga store
    User->>FE: Opens amazon.com/stores/jp-manga
    FE->>FE: Render chat FAB button

    Note over User,Ana: Step 2 — User opens chatbot
    User->>FE: Clicks chat FAB
    FE->>GW: POST /chat/init (session start)
    GW->>Auth: Validate session token
    Auth-->>GW: customer_id = C123 (or guest)
    GW->>Orch: Create session
    Orch->>Mem: Create session entry
    Orch-->>FE: Welcome message + quick chips

    Note over User,Ana: Step 3 — User asks a question
    User->>FE: "Recommend something like Attack on Titan"
    FE->>FE: Attach page_context (current ASIN, section, cart)
    FE->>GW: POST /chat/message (WebSocket)

    Note over User,Ana: Step 4 — Backend validates and routes
    GW->>GW: Rate limit check (30 msg/min)
    GW->>Auth: Verify session still valid
    GW->>Orch: Forward authenticated request

    Note over User,Ana: Step 5 — Orchestrator processes
    Orch->>Mem: Load conversation history (last 10 turns)
    Mem-->>Orch: Previous turns []
    Orch->>IC: Classify("Recommend something like Attack on Titan")
    IC-->>Orch: intent=recommendation, entity={seed="Attack on Titan"}, confidence=0.94

    Note over User,Ana: Step 6 — Service calls (parallel)
    par Recommendation
        Orch->>Rec: getSimilar(seed="Attack on Titan", userId=C123)
        Rec-->>Orch: [ASIN1, ASIN2, ASIN3, ASIN4, ASIN5]
    and Product Details
        Orch->>Cat: getProducts([ASIN1...ASIN5])
        Cat-->>Orch: [{title, price, rating, image, availability}...]
    and RAG Context
        Orch->>RAG: retrieve("manga similar to Attack on Titan")
        RAG-->>Orch: [editorial_chunk_1, genre_description_chunk_2]
    end

    Note over User,Ana: Step 7 — LLM generates response
    Orch->>Orch: Build prompt with:<br>- System instructions<br>- Conversation history<br>- Retrieved chunks<br>- Product data<br>- User message
    Orch->>LLM: Generate response (streaming)
    LLM-->>Orch: Streamed tokens

    Note over User,Ana: Step 8 — Guardrails validate output
    Orch->>GR: Validate response
    GR->>GR: Check: PII? Toxicity? Price accuracy? Valid ASINs?
    GR-->>Orch: ✅ Approved (or ❌ Blocked → fallback)

    Note over User,Ana: Step 9 — Response delivered
    Orch->>Mem: Save turn (user msg + bot response)
    Orch-->>GW: Streamed response
    GW-->>FE: WebSocket frames
    FE-->>User: Display response with product cards + chips

    Note over User,Ana: Step 10 — Logging and analytics
    Orch->>Ana: Emit event (session, intent, latency, products shown)
    User->>FE: Clicks 👍 (thumbs up)
    FE->>GW: POST /chat/feedback
    GW->>Ana: Emit feedback event

Step-by-Step Breakdown

Step 1: User Opens Store

  • User navigates to amazon.com/stores/page/jp-manga.
  • Frontend loads the manga store layout plus the MangaAssist chat widget (lazy-loaded, ~15KB bundle).
  • The chat FAB renders in the bottom-right corner.

Step 2: Chat Session Initialization

  • User clicks the FAB.
  • Frontend sends POST /chat/init with the user's session token and current page context.
  • The Auth Service validates the token and returns the customer ID (or marks as guest).
  • The Orchestrator creates a new session in DynamoDB with a 24-hour TTL.
  • A welcome message is returned with contextual quick-action chips.
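The session-creation step above can be sketched as follows. This is a minimal illustration, assuming a DynamoDB table (the name `chat_sessions` is hypothetical) that uses an epoch-seconds `ttl` attribute for the 24-hour expiry:

```python
import time
import uuid

SESSION_TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL, per the session design above

def build_session_item(customer_id=None):
    """Build the DynamoDB item for a new chat session.

    DynamoDB's TTL feature expects an epoch-seconds attribute; items are
    purged automatically once `ttl` is in the past.
    """
    now = int(time.time())
    return {
        "session_id": str(uuid.uuid4()),
        "customer_id": customer_id or "guest",
        "created_at": now,
        "ttl": now + SESSION_TTL_SECONDS,
        "turns": [],
    }

# With boto3, the write would look roughly like:
#   boto3.resource("dynamodb").Table("chat_sessions").put_item(Item=build_session_item("C123"))
```

The `turns` list and field names are illustrative; the real schema may differ.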

Step 3: User Sends a Message

  • User types: "Recommend something like Attack on Titan."
  • Frontend attaches page_context (current ASIN if on a product page, store section, cart contents, locale).
  • Message is sent over the existing WebSocket connection.
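A sketch of the message frame the frontend might send over the WebSocket, with `page_context` attached. The field names are assumptions, not a documented wire format:

```python
import json

def build_chat_message(text, current_asin, section, cart_asins, locale="en-US"):
    """Serialize one outgoing WebSocket chat message with its page_context.

    `current_asin` is None when the user is not on a product page.
    """
    return json.dumps({
        "type": "chat.message",
        "text": text,
        "page_context": {
            "current_asin": current_asin,
            "store_section": section,
            "cart": cart_asins,
            "locale": locale,
        },
    })
```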

Step 4: Validation & Routing

  • API Gateway checks the rate limiter (token bucket, 30 messages/minute per user).
  • Auth is re-verified (session not expired, user not banned).
  • Request is forwarded to the Orchestrator.
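The token-bucket limiter described above (30 messages/minute, i.e. a refill rate of 0.5 tokens/second) can be sketched like this; in practice the gateway would keep one bucket per user, typically in a shared store:

```python
import time

class TokenBucket:
    """Token-bucket limiter: 30 messages/minute per user (refills 0.5 tokens/s)."""

    def __init__(self, capacity=30.0, refill_per_sec=0.5, now=None):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity                  # bucket starts full
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        # Refill in proportion to elapsed time, capped at bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A full bucket lets a user burst up to 30 messages, after which they earn one new message every two seconds.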

Step 5: Orchestration

  • Orchestrator loads the last 10 turns of conversation from DynamoDB.
  • Orchestrator calls the Intent Classifier with the message + conversation context.
  • Classifier returns: intent=recommendation, seed_entity="Attack on Titan", confidence=0.94.
  • Orchestrator determines which downstream services to call based on the intent.
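The intent-to-services routing decision can be sketched as a simple dispatch table. The non-recommendation intents, service names, and the 0.7 confidence threshold are illustrative assumptions:

```python
def plan_calls(intent, confidence, threshold=0.7):
    """Map a classified intent to the downstream services the orchestrator calls."""
    routes = {
        "recommendation": ["recommendation_engine", "product_catalog", "rag_pipeline"],
        "product_question": ["product_catalog", "rag_pipeline"],
        "order_status": ["order_service"],
    }
    if confidence < threshold:
        return ["clarification"]  # low confidence: ask the user to rephrase
    return routes.get(intent, [])  # unknown/small-talk intents go straight to the LLM
```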

Step 6: Parallel Service Calls

Three calls happen in parallel (critical for latency):

| Call | Service               | Input               | Output               | Latency Target |
|------|-----------------------|---------------------|----------------------|----------------|
| 1    | Recommendation Engine | Seed ASIN + user ID | 5 similar ASINs      | < 200ms        |
| 2    | Product Catalog       | 5 ASINs             | Full product details | < 100ms        |
| 3    | RAG Pipeline          | Query text          | 3 relevant chunks    | < 300ms        |

Total parallel wall time: ~300ms (bounded by the slowest call).

Step 7: LLM Generation

  • Orchestrator assembles the prompt from:
      • System instructions (persona, rules, constraints).
      • Conversation history (previous turns).
      • Retrieved RAG chunks (editorial descriptions, genre info).
      • Product data (titles, prices, ratings, availability).
      • The user's message.
  • Sends to Amazon Bedrock (Claude) for streaming generation.
  • LLM generates a natural response referencing the actual product data.
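The prompt assembly above can be sketched as follows. The `{"system": ..., "messages": [...]}` shape only loosely mirrors Bedrock's Converse API; treat the field names and grounding-text layout as illustrative:

```python
def build_prompt(system, history, chunks, products, user_msg):
    """Assemble the model payload in the order listed above.

    Grounding data (RAG chunks + catalog facts) is folded into the system
    prompt so the model treats it as authoritative context, not user input.
    """
    grounding = "\n".join(
        ["Editorial context:"]
        + [f"- {c}" for c in chunks]
        + ["Products (quote these exact titles and prices):"]
        + [f"- {p['asin']}: {p['title']} (${p['price']})" for p in products]
    )
    messages = [{"role": t["role"], "content": t["content"]} for t in history]
    messages.append({"role": "user", "content": user_msg})
    return {"system": system + "\n\n" + grounding, "messages": messages}
```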

Step 8: Guardrails

  • The full response is validated before delivery:
      • PII check: No customer data leaked.
      • Price accuracy: Prices in the response match the catalog data that was provided.
      • ASIN validation: All product IDs mentioned are real.
      • Toxicity filter: No offensive content.
      • Competitor filter: No competitor names.
  • If any check fails, the specific part is corrected or the response is replaced with a safe fallback.
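Two of the deterministic checks (price accuracy and ASIN validation) can be sketched as below; the regexes are toy patterns matching the placeholder ASINs used in this document, and `catalog` is assumed to map the ASINs that were fed into the prompt to their prices:

```python
import re

ASIN_RE = re.compile(r"\b(ASIN\d+|B0[A-Z0-9]{8})\b")  # toy pattern for the examples above
PRICE_RE = re.compile(r"\$(\d+(?:\.\d{2})?)")

def check_response(text, catalog):
    """Return a list of guardrail violations (empty list = approved)."""
    violations = []
    known_prices = {f"{p:.2f}" for p in catalog.values()}
    for price in PRICE_RE.findall(text):
        if f"{float(price):.2f}" not in known_prices:
            violations.append(f"price ${price} not in catalog data")
    for asin in ASIN_RE.findall(text):
        if asin not in catalog:
            violations.append(f"unknown ASIN {asin}")
    return violations
```

PII, toxicity, and competitor filtering would sit alongside these, typically as model- or service-backed checks rather than regexes.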

Step 9: Response Delivery

  • Response is streamed token-by-token over WebSocket for perceived speed.
  • Product cards (images, prices, "Add to Cart" buttons) render as structured elements after the text stream completes.
  • Follow-up suggestion chips appear below the response.
  • The turn is saved to conversation memory.
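The token-by-token delivery can be sketched as a generator that wraps streamed LLM tokens into WebSocket frames. The `delta`/`done` frame names are illustrative, not a documented protocol:

```python
import json

def frame_stream(tokens, session_id):
    """Wrap streamed LLM tokens into WebSocket frames, ending with a `done` frame."""
    for tok in tokens:
        yield json.dumps({"type": "delta", "session": session_id, "text": tok})
    yield json.dumps({"type": "done", "session": session_id})
```

The terminal `done` frame is what lets the frontend know the text stream is complete and it can render the structured product cards and chips.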

Step 10: Logging & Analytics

  • An event is emitted to Kinesis with: session ID, intent, latency, products shown, model used.
  • If the user clicks thumbs up/down, a feedback event is captured.
  • All data flows to Redshift for dashboards and model improvement.
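A sketch of the Kinesis record for one chat turn, with fields mirroring the list above (the stream name and exact schema are assumptions):

```python
import json
import time

def build_chat_event(session_id, intent, latency_ms, products_shown, model):
    """Build the Kinesis record for one chat turn."""
    return {
        "PartitionKey": session_id,  # keeps one session's events ordered on one shard
        "Data": json.dumps({
            "session_id": session_id,
            "intent": intent,
            "latency_ms": latency_ms,
            "products_shown": products_shown,
            "model": model,
            "ts": int(time.time() * 1000),
        }),
    }

# With boto3:
#   boto3.client("kinesis").put_record(StreamName="chat-analytics", **build_chat_event(...))
```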

Latency Budget

gantt
    title Request Latency Budget (Target: < 3 seconds)
    dateFormat X
    axisFormat %L ms

    section Gateway
    Auth + Rate Limit           :0, 50

    section Orchestrator
    Load Memory                 :50, 100
    Intent Classification       :100, 150

    section Service Calls (Parallel)
    Recommendation Engine       :150, 350
    Product Catalog             :150, 250
    RAG Retrieval               :150, 450

    section Generation
    LLM First Token             :450, 950
    LLM Full Response           :950, 2500

    section Safety
    Guardrails                  :2500, 2600

    section Delivery
    WebSocket Send              :2600, 2650

Key insight: The user sees the first token at ~950ms (streaming), so the perceived latency is under 1 second. The full response completes within ~2.7 seconds.