
15. Trade-offs and Challenges

Key Trade-offs

1. Rule-Based vs. LLM-Based Responses

graph LR
    subgraph "Rule-Based"
        A[Fast<br>< 10ms]
        B[Predictable<br>No hallucination]
        C[Rigid<br>Cannot handle novel queries]
        D[Cheap<br>No LLM cost]
    end

    subgraph "LLM-Based"
        E[Slower<br>1-3 seconds]
        F[Flexible<br>Handles more cases]
        G[Risk<br>Can hallucinate]
        H[Expensive<br>Per-token cost]
    end

Decision: Hybrid. Use rules for greetings, simple lookups, and obvious order status questions. Use the LLM for recommendations, complex Q&A, and multi-turn conversations.
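The hybrid decision above can be sketched as a cheap rules-first router that only falls through to the LLM for open-ended requests. This is a minimal illustration; the intent names and patterns are assumptions, not the production rule set.

```python
import re

# Hypothetical rule patterns for trivially matchable intents (illustrative only).
RULE_PATTERNS = {
    "greeting": re.compile(r"^(hi|hello|hey)\b", re.I),
    "order_status": re.compile(r"\b(where is|track|status of)\b.*\border\b", re.I),
}

def route(message: str) -> str:
    """Return 'rules:<intent>' when a cheap pattern matches, else 'llm'."""
    for intent, pattern in RULE_PATTERNS.items():
        if pattern.search(message):
            return f"rules:{intent}"
    return "llm"  # recommendations, complex Q&A, multi-turn -> LLM path
```

In practice the rule layer would be a proper intent classifier, but the shape is the same: never pay LLM latency or cost for a query a template can answer.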

2. RAG vs. Direct API-Based Answers

| Approach   | Pros                           | Cons                           | Best For                         |
| ---------- | ------------------------------ | ------------------------------ | -------------------------------- |
| RAG        | Broad knowledge, flexible      | Retrieval can fail, slower     | FAQ, policies, editorial content |
| Direct API | Precise, fast, no hallucination| Rigid, needs per-intent coding | Order status, prices, inventory  |

Decision: Use both. RAG for knowledge-heavy questions, direct API for structured data lookups.
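A minimal sketch of that dispatch, assuming a fixed set of intent labels (the names here are illustrative, not from the source):

```python
# Assumed intent labels; structured lookups bypass retrieval entirely.
KNOWLEDGE_INTENTS = {"faq", "policy", "editorial"}
STRUCTURED_INTENTS = {"order_status", "price", "inventory"}

def answer_source(intent: str) -> str:
    """Pick the answer backend for an intent: direct API for structured
    data, RAG for knowledge-heavy and open-ended questions."""
    if intent in STRUCTURED_INTENTS:
        return "direct_api"
    return "rag"  # default open-ended questions to retrieval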

3. Latency vs. Quality

graph TD
    A[User asks question] --> B{Simple or Complex?}
    B -->|Simple| C[Fast Path<br>Rule + Template<br>< 200ms]
    B -->|Complex| D{How complex?}
    D -->|Medium| E[RAG + LLM<br>1-2 seconds]
    D -->|High| F[Multi-step reasoning<br>2-3 seconds]

Decision: Target under 1 second for first token, use a smaller model for simple tasks, reserve the best model for truly complex requests, and never let one request exceed 5 seconds.
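The tiering decision reduces to a lookup plus a hard budget check. The handler names and per-tier budgets below are assumptions chosen to match the diagram's latency targets:

```python
# Illustrative tiers; handler names and budgets are assumptions.
MODEL_TIERS = {
    "simple": {"handler": "rule_template", "budget_ms": 200},
    "medium": {"handler": "small_model_rag", "budget_ms": 2000},
    "high":   {"handler": "large_model_multistep", "budget_ms": 3000},
}
HARD_TIMEOUT_MS = 5000  # no single request may exceed this

def pick_tier(complexity: str) -> dict:
    """Select a handler by estimated complexity; unknown values default
    to the medium tier rather than the most expensive one."""
    tier = MODEL_TIERS.get(complexity, MODEL_TIERS["medium"])
    assert tier["budget_ms"] <= HARD_TIMEOUT_MS
    return tier
```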

4. Personalization vs. Privacy

Decision: Prefer privacy-first personalization.

  • Authenticated users can use purchase history and browsing data.
  • Guest users use current session data only.
  • Never display full addresses, payment info, or sensitive data.
  • Keep all personalization data encrypted and access-controlled.
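The rules above can be enforced at a single choke point: one function builds the only context the model ever sees, so sensitive fields never reach the prompt. The field names are assumptions for illustration:

```python
def personalization_context(user: dict) -> dict:
    """Build the model-visible context (field names are assumptions).
    Authenticated users expose history; guests expose only the session.
    Addresses and payment data are never copied into the context."""
    if user.get("authenticated"):
        ctx = {
            "purchase_history": user.get("purchase_history", []),
            "browsing": user.get("browsing", []),
        }
    else:
        ctx = {"session_views": user.get("session_views", [])}
    # Defensive: strip anything sensitive that leaked into the dict upstream.
    for key in ("address", "payment", "email"):
        ctx.pop(key, None)
    return ctx
```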

5. Cost vs. Accuracy

Decision: Optimize for the sweet spot.

  • Use Sonnet-class generation for quality.
  • Retrieve the top 3 RAG chunks with reranking.
  • Keep 10 turns of memory.
  • Cache product data but never cache prices.
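The "cache product data but never cache prices" rule can be made mechanical by splitting each record into cacheable metadata and live-only fields. The field names here are assumptions:

```python
def split_for_cache(product: dict) -> tuple:
    """Return (cacheable metadata, live-only fields). Price and
    availability are always refetched, never served from cache."""
    LIVE = {"price", "availability"}
    meta = {k: v for k, v in product.items() if k not in LIVE}
    live = {k: v for k, v in product.items() if k in LIVE}
    return meta, live
```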

6. Automation vs. Human Support

pie title "Target Resolution Mix (V2)"
    "Fully Automated" : 70
    "Automated + Confirmation" : 15
    "Escalated to Human" : 15

Decision:

  • Fully automated: discovery, recommendations, FAQ, product Q&A, order tracking.
  • Automated with confirmation: return initiation.
  • Always human: billing disputes, fraud reports, legal issues, deeply frustrated users.
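That policy is just a lookup with two overrides: frustrated users always escalate, and unrecognized intents fail safe to a human. Intent labels and the frustration threshold are assumptions:

```python
# Assumed intent labels mirroring the decision above.
POLICY = {
    "discovery": "auto", "recommendation": "auto", "faq": "auto",
    "product_qa": "auto", "order_tracking": "auto",
    "return_initiation": "auto_with_confirmation",
    "billing_dispute": "human", "fraud_report": "human", "legal": "human",
}

def handling_mode(intent: str, frustration_score: float = 0.0) -> str:
    """Map an intent to a handling mode; deeply frustrated users and
    unknown intents always go to a human."""
    if frustration_score > 0.8:  # threshold is an assumption
        return "human"
    return POLICY.get(intent, "human")
```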

Key Challenges

Challenge 1: Manga Domain Knowledge

Problem: General-purpose models may not know about specific editions, translations, or niche titles available on Amazon.

Solution: Use RAG indexed with Amazon's actual product descriptions and editorial content.
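A sketch of how that corpus might be assembled, assuming products carry title, editorial, and description text keyed by ASIN (the record shape is an assumption):

```python
def build_rag_docs(products: list) -> list:
    """Turn catalog records into retrieval documents keyed by ASIN, so
    answers are grounded in Amazon's actual product text."""
    return [
        {
            "id": p["asin"],
            "text": f"{p['title']}\n{p.get('editorial', '')}\n{p.get('description', '')}",
        }
        for p in products
    ]
```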

Challenge 2: Multi-Format Complexity

Problem: One manga series can have many formats, editions, and bundled products.

Solution: Build a series resolver that groups ASINs by series and presents clear comparisons.
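A minimal version of that resolver, assuming each product record carries a series key, volume number, and format string:

```python
from collections import defaultdict

def group_by_series(products: list) -> dict:
    """Hypothetical series resolver: bucket ASIN records by series, then
    sort each bucket by volume and format for side-by-side comparison."""
    groups = defaultdict(list)
    for p in products:
        groups[p["series"]].append(p)
    for series in groups:
        groups[series].sort(key=lambda p: (p.get("volume", 0), p.get("format", "")))
    return dict(groups)
```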

Challenge 3: Hallucinated Recommendations

Problem: The LLM might recommend manga that does not exist on Amazon.

Solution: Only feed the LLM real ASINs from the recommendation engine and validate product references after generation.
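The post-generation check is cheap: extract every ASIN-shaped token from the reply and reject anything outside the set the recommendation engine supplied. The regex is a simplified ASIN pattern, not the full format:

```python
import re

# Simplified ASIN shape (assumption); real ASINs have a few more variants.
ASIN_RE = re.compile(r"\bB0[A-Z0-9]{8}\b")

def validate_references(reply: str, allowed_asins: set) -> bool:
    """Return False if the reply mentions any ASIN the recommendation
    engine did not supply -- a cheap hallucination guard."""
    return all(asin in allowed_asins for asin in ASIN_RE.findall(reply))
```

A failed check would trigger regeneration or fall back to a template listing only verified products.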

Challenge 4: Cold Start for New Users

Problem: New users with no browsing or purchase history get generic recommendations.

Solution: Use popularity-based recommendations, ask one clarifying question, and fall back to genre-based suggestions.
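The fallback chain is simple to express: use the genre the clarifying question elicited when available, otherwise popularity. The data shapes below are assumptions:

```python
def cold_start_recs(user: dict, popular: list, by_genre: dict) -> list:
    """Illustrative cold-start chain: genre-based suggestions if the
    user answered the clarifying question, else popularity-based."""
    genre = user.get("stated_genre")  # answer to one clarifying question
    if genre and genre in by_genre:
        return by_genre[genre][:5]
    return popular[:5]
```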

Challenge 5: Real-Time Data Consistency

Problem: Price or availability may change between chat and add-to-cart.

Solution: Always fetch prices in real time, warn that prices may change, and re-check critical data at action time.
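The action-time re-check might look like this, where `fetch_price` stands in for a real-time pricing lookup (an assumed callable, injected here for testability):

```python
def add_to_cart(asin: str, quoted_price: float, fetch_price) -> dict:
    """Re-verify the price at action time; if it drifted from what the
    chat quoted, surface the new price instead of silently proceeding."""
    live = fetch_price(asin)  # always a real-time lookup, never cached
    if abs(live - quoted_price) > 1e-9:
        return {"ok": False, "reason": "price_changed", "new_price": live}
    return {"ok": True, "price": live}
```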

Challenge 6: Prompt Injection

Problem: Malicious users may try to manipulate the chatbot with crafted messages.

Solution: Use input scanning, prompt isolation, explicit system rules, and output guardrails.
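The input-scanning layer can start as a naive phrase filter; the phrase list below is illustrative only, and flagged messages should get stricter handling (e.g., a constrained response path), not silent rejection:

```python
# Illustrative injection markers; a real filter would be a classifier.
SUSPICIOUS = ("ignore previous instructions", "system prompt", "you are now")

def scan_input(message: str) -> bool:
    """First-pass injection filter: return False for messages that
    should be routed to stricter handling."""
    lowered = message.lower()
    return not any(phrase in lowered for phrase in SUSPICIOUS)
```

This is only the first of the four layers named above; prompt isolation and output guardrails catch what pattern matching misses.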

Challenge 7: Measuring True Impact

Problem: It can be hard to know whether the chatbot caused the purchase.

Solution: Use an A/B test, compare conversion and AOV between groups, and keep a small holdout group after launch.
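Group assignment for the holdout needs to be deterministic so a user stays in the same bucket across sessions; a stable hash does this without storing assignments. The 5% holdout size is an assumption:

```python
import hashlib

def assign_group(user_id: str, holdout_pct: int = 5) -> str:
    """Deterministically bucket a user: a stable hash of the user ID
    keeps assignment consistent across sessions without extra storage."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "holdout" if bucket < holdout_pct else "chatbot"
```

Conversion and AOV are then compared between the two groups, with the holdout retained after launch to keep measuring lift.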