HLD Deep Dive: Recommendations, Personalization & Caching
Questions covered: Q10, Q16, Q19 (partial), Q29 (async patterns)
Interviewer level: Senior Engineer → Staff Engineer
Q10. Why reuse Amazon Personalize instead of building a custom recommendation engine?
Short Answer
Amazon already has one of the best recommendation engines in the world. Reusing it saves months of development, leverages existing user signals, and is battle-tested at billions of interactions per day.
Deep Dive
What building custom would involve:
Custom Recommendation Engine (6-12 months):
Data Pipeline:
- Collect user events (views, purchases, ratings, clicks)
- Store interaction matrix (users × items)
- De-duplicate, normalize, handle cold start
Model Training:
- Collaborative Filtering (Matrix Factorization, ALS)
- Content-Based Filtering (item feature vectors)
- Hybrid model combining both
- Retrain pipeline (weekly? daily? real-time?)
Serving Infrastructure:
- Low-latency serving endpoint (<100ms)
- Model versioning and A/B testing
- Real-time feature updates
Evaluation:
- Offline metrics (NDCG, Recall@K)
- Online A/B testing
- Cold start handling (new users, new items)
Timeline: 6-12 months. Cost: $500K–$2M in engineering time.
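To make that scope concrete, here is a toy sketch of the core machinery the custom path would need: one alternating-least-squares half-sweep for matrix factorization, in plain numpy. Everything here (the dense toy ratings matrix, rank, regularization value) is illustrative; a real pipeline would add sparse storage, implicit-feedback weighting, distributed training, and offline evaluation on top.
import numpy as np

def als_user_step(R, U, V, lam=0.1):
    # One ALS half-sweep: hold item factors V fixed, re-solve user factors U.
    # R is (users x items) with 0 meaning "unobserved".
    k = U.shape[1]
    for u in range(R.shape[0]):
        observed = R[u] > 0                # mask of items this user rated
        V_o = V[observed]                  # factors of those items
        A = V_o.T @ V_o + lam * np.eye(k)  # regularized normal equations
        b = V_o.T @ R[u, observed]
        U[u] = np.linalg.solve(A, b)
    return U

# Toy run: 4 users x 5 items, rank-2 factors; alternate the two steps to train
rng = np.random.default_rng(0)
R = rng.integers(0, 6, size=(4, 5)).astype(float)
U = rng.normal(size=(4, 2))
V = rng.normal(size=(5, 2))
U = als_user_step(R, U, V)  # the item step is symmetric: als_user_step(R.T, V, U)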
What Amazon Personalize gives you out of the box:
Amazon Personalize handles:
✅ Data ingestion API (events, interactions, item catalog)
✅ Model training (HRNN, SIMS, Popularity-Count, etc.)
✅ Automatic retraining (real-time or scheduled)
✅ Cold start handling (new users → popularity-based)
✅ A/B testing infrastructure (experiments)
✅ Campaign management (deploy, update model)
✅ Context-aware recommendations (pass current items/query)
✅ High-availability serving endpoint
What you still control:
✅ Item catalog (what products are available)
✅ User event schema (what signals to send)
✅ Business rules (filter out-of-stock items)
✅ Contextual parameters (current page, search query)
Integration pattern:
import asyncio

import boto3

personalize_runtime = boto3.client("personalize-runtime")

async def get_recommendations(customer_id: str, query_context: str,
                              num_results: int = 10) -> list:
    # boto3 is synchronous, so run the blocking call in a worker thread
    # rather than stalling the event loop
    response = await asyncio.to_thread(
        personalize_runtime.get_recommendations,
        campaignArn="arn:aws:personalize:ap-northeast-1:...:campaign/manga-recs",
        userId=customer_id,
        numResults=num_results,
        context={
            "CURRENT_QUERY": query_context,  # e.g. "dark fantasy manga"
            "DEVICE_TYPE": "desktop",
        },
        filterArn="arn:aws:personalize:...:filter/in-stock-filter",  # only in-stock items
    )
    asins = [item["itemId"] for item in response["itemList"]]
    # Enrich with product details from the catalog service
    products = await catalog.batch_get_products(asins)
    return products
Cold start problem — what Personalize does for new users:
New User (no history):
→ Personalize falls back to "POPULARITY_COUNT" recipe
→ Returns top-N most popular manga in the relevant category
→ As user interacts, real-time signals update recommendations immediately
Guest User:
→ Use session-level context (current page category, search query)
→ In-session recommendations only ("similar to what you're viewing now")
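For the guest path, Personalize's related-items (SIMS) recipes can be queried with an itemId instead of a userId, so no purchase history is required. A hedged sketch, reusing the client from the integration snippet above; the `manga-similar-items` campaign name is an assumption:
async def get_session_recommendations(current_asin: str,
                                      num_results: int = 10) -> list:
    # Guest user: no userId, so query a SIMS (similar-items) campaign
    # keyed on the item the visitor is viewing right now
    response = await asyncio.to_thread(
        personalize_runtime.get_recommendations,
        campaignArn="arn:aws:personalize:ap-northeast-1:...:campaign/manga-similar-items",
        itemId=current_asin,
        numResults=num_results,
    )
    return [item["itemId"] for item in response["itemList"]]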
Why this is a "build vs. buy" win:
- Recommendation quality from day 1 is excellent (Amazon's training data is their competitive advantage, but the model architecture in Personalize is sound).
- Engineering resources can focus on the chatbot's unique value (NLP, orchestration) rather than commodity ML infrastructure.
- At scale, Amazon's recommendation signals (all of amazon.com purchase data) feed downstream models — the manga chatbot benefits from cross-category signals.
Caching Strategy: What Gets Cached and Why
ElastiCache Redis as the Hot Path
┌──────────────────────────────────────────────────────────────┐
│ CACHING DECISION TABLE │
├─────────────────────┬──────────┬──────────┬────────────────┤
│ Data Type │ Cached? │ TTL │ Reason │
├─────────────────────┼──────────┼──────────┼────────────────┤
│ Product details │ ✅ Yes │ 1 hour │ Changes rarely │
│ Promotions/deals │ ✅ Yes │ 5 min │ Time-sensitive │
│ Recommendations │ ✅ Yes │ 15 min │ Expensive call │
│ FAQ answers │ ✅ Yes │ 24 hours │ Very stable │
│ Current PRICE │ ❌ NO │ N/A │ MUST be live │
│ Inventory status │ ❌ NO │ N/A │ MUST be live │
│ Order status │ ❌ NO │ N/A │ MUST be live │
│ Conversation memory │ ✅ Yes │ 1 hour │ DynamoDB backup│
└─────────────────────┴──────────┴──────────┴────────────────┘
Cache key design:
# Product details cache
cache_key = f"product:{asin}" # e.g., "product:B08KTZ8X3Q"
# Recommendation cache (user-specific, 15min TTL)
cache_key = f"recs:{customer_id}:{query_hash}" # e.g., "recs:cust123:a7f3b2"
# FAQ answer cache
cache_key = f"faq:{question_hash}" # e.g., "faq:3f8a1c"
# Promotions cache (global, 5min TTL)
cache_key = f"promotions:category:{category}" # e.g., "promotions:category:manga"
Cache invalidation — event-driven:
import json

# When a product is updated in the catalog:
async def on_product_updated(event: ProductUpdateEvent):
    asin = event.asin

    # Invalidate the product cache entry
    await redis_client.delete(f"product:{asin}")

    # Invalidate any recommendation caches that reference this product
    # (secondary index: asin → set of cache keys that contain it)
    cache_keys = await redis_client.smembers(f"product_cache_refs:{asin}")
    if cache_keys:
        await redis_client.delete(*cache_keys)

    # Trigger re-indexing if the product description changed
    # (PartitionKey is required by Kinesis; an async client such as
    # aioboto3 is assumed here)
    if event.description_changed:
        await kinesis.put_record(
            StreamName="rag-reindex",
            Data=json.dumps({"asin": asin, "action": "reindex"}),
            PartitionKey=asin,
        )
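The invalidation above only works if the product_cache_refs secondary index is populated on the write path. A sketch of that half, reusing the same client and key scheme; the index TTL mirrors the 15-minute recommendation entry so it expires with the data:
async def cache_recommendations(customer_id: str, query_hash: str,
                                asins: list, products: list):
    key = f"recs:{customer_id}:{query_hash}"
    await redis_client.set(key, json.dumps(products), ex=900)  # 15-minute TTL
    # Record, per ASIN, which cache keys reference it so that
    # on_product_updated can find and delete them later
    for asin in asins:
        await redis_client.sadd(f"product_cache_refs:{asin}", key)
        await redis_client.expire(f"product_cache_refs:{asin}", 900)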
Why prices are NEVER cached:
If the chatbot shows a user a price from cache that's $5 cheaper than the current price, and the user clicks "Buy" and sees a different price → trust is broken. Worse, Amazon is at risk of legal/regulatory issues for advertising a price it doesn't honor. Prices are always fetched from the live catalog, with zero caching.
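One way to honor this rule without giving up caching on the rest of the product record: serve static details from cache and overlay price and stock from the live source on every request. A sketch; `pricing.get_live_price` is a hypothetical live-pricing client:
async def get_product_for_display(asin: str) -> dict:
    # Static details (title, author, description) may come from cache...
    product = await get_product_cached(asin)
    # ...but price and availability are always fetched live, never cached
    live = await pricing.get_live_price(asin)  # hypothetical client
    product["price"] = live["price"]
    product["in_stock"] = live["in_stock"]
    return product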
Q29. Where to introduce asynchronous patterns?
Short Answer
Analytics, feedback, RAG re-indexing, human handoff, and slow response generation are all candidates for async.
Deep Dive
Async patterns serve two purposes:
1. Fire-and-forget — the chatbot doesn't need the result before sending its response.
2. Decoupling — the real-time response path is separated from slower background operations.
Current synchronous calls (keep sync):
Intent Classification → needs result before routing (~20ms)
Product Catalog query → needs data for LLM context (~50ms)
Recommendations fetch → needs data for LLM context (~200ms)
LLM generation → needs output to send to user (~1,500ms)
Guardrails validation → must complete before sending to user (~100ms)
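All five stay synchronous, but the two context fetches don't depend on each other and can at least overlap. A sketch with asyncio.gather; `catalog.search` is an assumed catalog client method:
import asyncio

async def build_context(customer_id: str, query: str) -> dict:
    # Both results are needed before LLM generation, but not sequentially:
    # running them concurrently cuts ~50ms + ~200ms down to ~200ms
    products, recs = await asyncio.gather(
        catalog.search(query),                    # ~50ms
        get_recommendations(customer_id, query),  # ~200ms
    )
    return {"products": products, "recommendations": recs}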
Where to go async:
1. Analytics Logging (currently async via Kinesis — correct)
import asyncio
import time

_background_tasks: set = set()  # hold refs so pending tasks aren't garbage-collected

async def handle_message(session_id: str, message: str):
    start = time.monotonic()
    intent = await classify_intent(message)  # assumed helper from the routing step
    response = await generate_response(message)
    elapsed = (time.monotonic() - start) * 1000

    # Don't await — fire and forget
    task = asyncio.create_task(
        analytics.log_event({
            "session_id": session_id,
            "message": message,
            "response": response,
            "intent": intent,
            "latency_ms": elapsed,
        })
    )
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)

    return response  # return immediately without waiting for analytics
2. Feedback Processing (SQS queue)
import json
from datetime import datetime, timezone

async def submit_feedback(session_id: str, turn_id: str, rating: int):
    # Queue the message for async processing
    # (an async SQS client such as aioboto3 is assumed)
    await sqs.send_message(
        QueueUrl=FEEDBACK_QUEUE_URL,
        MessageBody=json.dumps({
            "session_id": session_id,
            "turn_id": turn_id,
            "rating": rating,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
    )
    # Immediately return 200 OK — the user doesn't wait for feedback to be processed
    return {"status": "received"}
3. RAG Re-indexing (event-driven, not real-time path)
import json

# Triggered by a catalog change event (S3 upload, DynamoDB Streams)
async def on_knowledge_base_updated(event: S3Event):
    for record in event.records:
        # PartitionKey is required by Kinesis; an async client is assumed
        await kinesis.put_record(
            StreamName="rag-reindex",
            Data=json.dumps({
                "s3_key": record.s3.key,
                "action": "upsert",
                "category": infer_category(record.s3.key),
            }),
            PartitionKey=record.s3.key,
        )
    # Indexing happens asynchronously — the current RAG index continues serving
    # The new index becomes available in ~minutes without any downtime
4. Human Handoff (Amazon Connect — async escalation)
async def escalate_to_human(session_id: str, customer_id: str,
                            conversation_summary: str):
    # Create a task in the human agent queue
    # (start_task_contact also requires InstanceId and Name; the values
    # below are placeholders, and an async Connect client is assumed)
    await connect.start_task_contact(
        InstanceId=CONNECT_INSTANCE_ID,
        Name="Chatbot escalation",
        ContactFlowId=ESCALATION_FLOW_ID,
        Attributes={
            "customer_id": customer_id,
            "conversation_summary": conversation_summary,
            "session_id": session_id,
            "escalation_reason": "user_requested",
        },
    )
    # Immediately acknowledge to user — agent availability is checked async
    # (get_estimated_wait_time is an assumed wrapper around Connect queue
    # metrics such as get_current_metric_data, not a native API call)
    return {
        "message": "I've connected you with a support agent. They'll be with you shortly.",
        "estimated_wait": await connect.get_estimated_wait_time(),
    }
5. Slow response generation (typing indicator pattern)
For responses expected to take >3 seconds:
1. Immediately send: { "type": "typing_indicator", "message": "MangaAssist is thinking..." }
2. Generate response asynchronously
3. When ready, push via WebSocket: { "type": "response", "content": "..." }
This keeps the user informed and prevents them from thinking the chatbot is broken.
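A sketch of the push side over an API Gateway WebSocket: post_to_connection is the real apigatewaymanagementapi call, while the endpoint URL, connection handling, and the reuse of generate_response are illustrative.
import asyncio
import json

import boto3

apigw = boto3.client(
    "apigatewaymanagementapi",
    endpoint_url="https://abc123.execute-api.ap-northeast-1.amazonaws.com/prod",  # placeholder
)

def push(connection_id: str, payload: dict):
    apigw.post_to_connection(ConnectionId=connection_id,
                             Data=json.dumps(payload).encode())

async def respond_slow(connection_id: str, message: str):
    # 1. Tell the user we're working on it, immediately
    await asyncio.to_thread(push, connection_id,
                            {"type": "typing_indicator", "message": "MangaAssist is thinking..."})
    # 2. Generate the response (the slow part, >3 seconds)
    content = await generate_response(message)
    # 3. Push the finished response over the same WebSocket
    await asyncio.to_thread(push, connection_id,
                            {"type": "response", "content": content})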
Decision framework for sync vs. async:
| Operation | Need result before responding? | Impact if delayed? | Pattern |
|---|---|---|---|
| Intent classification | ✅ Yes | Blocks routing | Sync |
| Product data | ✅ Yes | Can't build context | Sync |
| Analytics write | ❌ No | None to user | Async (Kinesis) |
| Feedback save | ❌ No | None to user | Async (SQS) |
| Cache update | ❌ No | None to user | Async background |
| Human handoff | Partial (acknowledge sync) | Lose the customer | Async queue + sync ack |