
15. Trade-offs and Challenges

Key Trade-offs

1. Rule-Based vs. LLM-Based Responses

graph LR
    subgraph "Rule-Based"
        A[Fast<br>< 10ms]
        B[Predictable<br>No hallucination]
        C[Rigid<br>Cannot handle novel queries]
        D[Cheap<br>No LLM cost]
    end

    subgraph "LLM-Based"
        E[Slower<br>1-3 seconds]
        F[Flexible<br>Handles more cases]
        G[Risk<br>Can hallucinate]
        H[Expensive<br>Per-token cost]
    end

Decision: Hybrid. Use rules for greetings, simple lookups, and obvious order status questions. Use the LLM for recommendations, complex Q&A, and multi-turn conversations.
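The hybrid decision above can be sketched as a cheap rules-first router that only falls through to the LLM for open-ended requests. This is a minimal illustration; the intent names and patterns are assumptions, not the production rule set.

```python
import re

# Hypothetical rule patterns for trivially matchable intents (illustrative only).
RULE_PATTERNS = {
    "greeting": re.compile(r"^(hi|hello|hey)\b", re.I),
    "order_status": re.compile(r"\b(where is|track|status of)\b.*\border\b", re.I),
}

def route(message: str) -> str:
    """Return 'rules:<intent>' when a cheap pattern matches, else 'llm'."""
    for intent, pattern in RULE_PATTERNS.items():
        if pattern.search(message):
            return f"rules:{intent}"
    return "llm"  # recommendations, complex Q&A, multi-turn -> LLM path
```

In practice the rule layer would be a proper intent classifier, but the shape is the same: never pay LLM latency or cost for a query a template can answer.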

2. RAG vs. Direct API-Based Answers

| Approach   | Pros                           | Cons                           | Best For                         |
| ---------- | ------------------------------ | ------------------------------ | -------------------------------- |
| RAG        | Broad knowledge, flexible      | Retrieval can fail, slower     | FAQ, policies, editorial content |
| Direct API | Precise, fast, no hallucination| Rigid, needs per-intent coding | Order status, prices, inventory  |

Decision: Use both. RAG for knowledge-heavy questions, direct API for structured data lookups.
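A minimal sketch of that dispatch, assuming a fixed set of intent labels (the names here are illustrative, not from the source):

```python
# Assumed intent labels; structured lookups bypass retrieval entirely.
KNOWLEDGE_INTENTS = {"faq", "policy", "editorial"}
STRUCTURED_INTENTS = {"order_status", "price", "inventory"}

def answer_source(intent: str) -> str:
    """Pick the answer backend for an intent: direct API for structured
    data, RAG for knowledge-heavy and open-ended questions."""
    if intent in STRUCTURED_INTENTS:
        return "direct_api"
    return "rag"  # default open-ended questions to retrieval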

3. Latency vs. Quality

graph TD
    A[User asks question] --> B{Simple or Complex?}
    B -->|Simple| C[Fast Path<br>Rule + Template<br>< 200ms]
    B -->|Complex| D{How complex?}
    D -->|Medium| E[RAG + LLM<br>1-2 seconds]
    D -->|High| F[Multi-step reasoning<br>2-3 seconds]

Decision: Target under 1 second for first token, use a smaller model for simple tasks, reserve the best model for truly complex requests, and never let one request exceed 5 seconds.
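The tiering decision reduces to a lookup plus a hard budget check. The handler names and per-tier budgets below are assumptions chosen to match the diagram's latency targets:

```python
# Illustrative tiers; handler names and budgets are assumptions.
MODEL_TIERS = {
    "simple": {"handler": "rule_template", "budget_ms": 200},
    "medium": {"handler": "small_model_rag", "budget_ms": 2000},
    "high":   {"handler": "large_model_multistep", "budget_ms": 3000},
}
HARD_TIMEOUT_MS = 5000  # no single request may exceed this

def pick_tier(complexity: str) -> dict:
    """Select a handler by estimated complexity; unknown values default
    to the medium tier rather than the most expensive one."""
    tier = MODEL_TIERS.get(complexity, MODEL_TIERS["medium"])
    assert tier["budget_ms"] <= HARD_TIMEOUT_MS
    return tier
```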

4. Personalization vs. Privacy

Decision: Prefer privacy-first personalization.

  • Authenticated users can use purchase history and browsing data.
  • Guest users use current session data only.
  • Never display full addresses, payment info, or sensitive data.
  • Keep all personalization data encrypted and access-controlled.
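The rules above can be enforced at a single choke point: one function builds the only context the model ever sees, so sensitive fields never reach the prompt. The field names are assumptions for illustration:

```python
def personalization_context(user: dict) -> dict:
    """Build the model-visible context (field names are assumptions).
    Authenticated users expose history; guests expose only the session.
    Addresses and payment data are never copied into the context."""
    if user.get("authenticated"):
        ctx = {
            "purchase_history": user.get("purchase_history", []),
            "browsing": user.get("browsing", []),
        }
    else:
        ctx = {"session_views": user.get("session_views", [])}
    # Defensive: strip anything sensitive that leaked into the dict upstream.
    for key in ("address", "payment", "email"):
        ctx.pop(key, None)
    return ctx
```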

5. Cost vs. Accuracy

Decision: Optimize for the sweet spot.

  • Use Sonnet-class generation for quality.
  • Retrieve the top 3 RAG chunks with reranking.
  • Keep 10 turns of memory.
  • Cache product data but never cache prices.
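The "cache product data but never cache prices" rule can be made mechanical by splitting each record into cacheable metadata and live-only fields. The field names here are assumptions:

```python
def split_for_cache(product: dict) -> tuple:
    """Return (cacheable metadata, live-only fields). Price and
    availability are always refetched, never served from cache."""
    LIVE = {"price", "availability"}
    meta = {k: v for k, v in product.items() if k not in LIVE}
    live = {k: v for k, v in product.items() if k in LIVE}
    return meta, live
```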

6. Automation vs. Human Support

pie title "Target Resolution Mix (V2)"
    "Fully Automated" : 70
    "Automated + Confirmation" : 15
    "Escalated to Human" : 15

Decision:

  • Fully automated: discovery, recommendations, FAQ, product Q&A, order tracking.
  • Automated with confirmation: return initiation.
  • Always human: billing disputes, fraud reports, legal issues, deeply frustrated users.
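That policy is just a lookup with two overrides: frustrated users always escalate, and unrecognized intents fail safe to a human. Intent labels and the frustration threshold are assumptions:

```python
# Assumed intent labels mirroring the decision above.
POLICY = {
    "discovery": "auto", "recommendation": "auto", "faq": "auto",
    "product_qa": "auto", "order_tracking": "auto",
    "return_initiation": "auto_with_confirmation",
    "billing_dispute": "human", "fraud_report": "human", "legal": "human",
}

def handling_mode(intent: str, frustration_score: float = 0.0) -> str:
    """Map an intent to a handling mode; deeply frustrated users and
    unknown intents always go to a human."""
    if frustration_score > 0.8:  # threshold is an assumption
        return "human"
    return POLICY.get(intent, "human")
```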

Key Challenges

Challenge 1: Manga Domain Knowledge

Problem: General-purpose models may not know about specific editions, translations, or niche titles available on Amazon.

Solution: Use RAG indexed with Amazon's actual product descriptions and editorial content.
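A sketch of how that corpus might be assembled, assuming products carry title, editorial, and description text keyed by ASIN (the record shape is an assumption):

```python
def build_rag_docs(products: list) -> list:
    """Turn catalog records into retrieval documents keyed by ASIN, so
    answers are grounded in Amazon's actual product text."""
    return [
        {
            "id": p["asin"],
            "text": f"{p['title']}\n{p.get('editorial', '')}\n{p.get('description', '')}",
        }
        for p in products
    ]
```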

Challenge 2: Multi-Format Complexity

Problem: One manga series can have many formats, editions, and bundled products.

Solution: Build a series resolver that groups ASINs by series and presents clear comparisons.
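A minimal version of that resolver, assuming each product record carries a series key, volume number, and format string:

```python
from collections import defaultdict

def group_by_series(products: list) -> dict:
    """Hypothetical series resolver: bucket ASIN records by series, then
    sort each bucket by volume and format for side-by-side comparison."""
    groups = defaultdict(list)
    for p in products:
        groups[p["series"]].append(p)
    for series in groups:
        groups[series].sort(key=lambda p: (p.get("volume", 0), p.get("format", "")))
    return dict(groups)
```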

Challenge 3: Hallucinated Recommendations

Problem: The LLM might recommend manga that does not exist on Amazon.

Solution: Only feed the LLM real ASINs from the recommendation engine and validate product references after generation.
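The post-generation check is cheap: extract every ASIN-shaped token from the reply and reject anything outside the set the recommendation engine supplied. The regex is a simplified ASIN pattern, not the full format:

```python
import re

# Simplified ASIN shape (assumption); real ASINs have a few more variants.
ASIN_RE = re.compile(r"\bB0[A-Z0-9]{8}\b")

def validate_references(reply: str, allowed_asins: set) -> bool:
    """Return False if the reply mentions any ASIN the recommendation
    engine did not supply -- a cheap hallucination guard."""
    return all(asin in allowed_asins for asin in ASIN_RE.findall(reply))
```

A failed check would trigger regeneration or fall back to a template listing only verified products.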

Challenge 4: Cold Start for New Users

Problem: New users with no browsing or purchase history get generic recommendations.

Solution: Use popularity-based recommendations, ask one clarifying question, and fall back to genre-based suggestions.
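The fallback chain is simple to express: use the genre the clarifying question elicited when available, otherwise popularity. The data shapes below are assumptions:

```python
def cold_start_recs(user: dict, popular: list, by_genre: dict) -> list:
    """Illustrative cold-start chain: genre-based suggestions if the
    user answered the clarifying question, else popularity-based."""
    genre = user.get("stated_genre")  # answer to one clarifying question
    if genre and genre in by_genre:
        return by_genre[genre][:5]
    return popular[:5]
```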

Challenge 5: Real-Time Data Consistency

Problem: Price or availability may change between chat and add-to-cart.

Solution: Always fetch prices in real time, warn that prices may change, and re-check critical data at action time.
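The action-time re-check might look like this, where `fetch_price` stands in for a real-time pricing lookup (an assumed callable, injected here for testability):

```python
def add_to_cart(asin: str, quoted_price: float, fetch_price) -> dict:
    """Re-verify the price at action time; if it drifted from what the
    chat quoted, surface the new price instead of silently proceeding."""
    live = fetch_price(asin)  # always a real-time lookup, never cached
    if abs(live - quoted_price) > 1e-9:
        return {"ok": False, "reason": "price_changed", "new_price": live}
    return {"ok": True, "price": live}
```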

Challenge 6: Prompt Injection

Problem: Malicious users may try to manipulate the chatbot with crafted messages.

Solution: Use input scanning, prompt isolation, explicit system rules, and output guardrails.
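The input-scanning layer can start as a naive phrase filter; the phrase list below is illustrative only, and flagged messages should get stricter handling (e.g., a constrained response path), not silent rejection:

```python
# Illustrative injection markers; a real filter would be a classifier.
SUSPICIOUS = ("ignore previous instructions", "system prompt", "you are now")

def scan_input(message: str) -> bool:
    """First-pass injection filter: return False for messages that
    should be routed to stricter handling."""
    lowered = message.lower()
    return not any(phrase in lowered for phrase in SUSPICIOUS)
```

This is only the first of the four layers named above; prompt isolation and output guardrails catch what pattern matching misses.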

Challenge 7: Measuring True Impact

Problem: It can be hard to know whether the chatbot caused the purchase.

Solution: Use an A/B test, compare conversion and AOV between groups, and keep a small holdout group after launch.
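Group assignment for the holdout needs to be deterministic so a user stays in the same bucket across sessions; a stable hash does this without storing assignments. The 5% holdout size is an assumption:

```python
import hashlib

def assign_group(user_id: str, holdout_pct: int = 5) -> str:
    """Deterministically bucket a user: a stable hash of the user ID
    keeps assignment consistent across sessions without extra storage."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "holdout" if bucket < holdout_pct else "chatbot"
```

Conversion and AOV are then compared between the two groups, with the holdout retained after launch to keep measuring lift.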