RAG-Based MCP Integration — MangaAssist Amazon Chatbot
What Is MCP (Model Context Protocol)?
MCP is Anthropic's open standard that lets an LLM call typed, schema-validated tools backed by external servers. Each MCP server exposes:
- A tool manifest (name, description, JSON-Schema inputs/outputs)
- A transport (stdio, HTTP/SSE, or WebSocket)
- Security (OAuth 2.0 or an API key per server)
The LLM decides at inference time which tool to call — no hardcoded routing.
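Concretely, a manifest entry is what the server returns from capability discovery. The tool below is a hypothetical entry for a catalog-search tool, shaped like a `tools/list` response item:

```python
# Hypothetical manifest entry, in the shape an MCP server returns
# from tools/list. The tool name and fields are illustrative.
search_manga_tool = {
    "name": "search_manga",
    "description": (
        "Full-text and semantic search over the manga catalog. "
        "Use for title, genre, or author lookups."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Free-text search query"},
            "genre": {"type": "string", "description": "Optional genre filter"},
            "limit": {"type": "integer", "default": 10},
        },
        "required": ["query"],
    },
}
```

The `description` string is doing the routing work: the LLM reads it at inference time to decide whether this tool fits the user's intent.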
What Is RAG-Based MCP?
A plain MCP tool is a live API call (database lookup, REST endpoint). A RAG-based MCP tool runs a retrieval pipeline inside the server — embed, retrieve, rerank, format — before returning a result to the LLM.
Key distinction: The MCP server IS the RAG system. The LLM doesn't hold knowledge — it holds reasoning; the MCP servers hold knowledge.
```mermaid
flowchart TD
    A([User Query]) --> B[MCP Router\nLLM decides tool]
    B --> C[MCP Server]
    C --> D[1 · Embed Query\nTitan Embed v2]
    D --> E[2 · Retrieve\nOpenSearch / DynamoDB / S3]
    E --> F[3 · Rerank\nCross-encoder / RRF fusion]
    F --> G[4 · Format\nStructured context block]
    G --> H[Tool Result returned to LLM]
    H --> I([LLM synthesises\nfinal answer])
    style A fill:#4A90D9,color:#fff
    style I fill:#27AE60,color:#fff
    style C fill:#8E44AD,color:#fff
```
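The four stages in the flowchart above can be sketched end to end. The helpers here are toy stand-ins — a bag-of-words "embedding" and Jaccard overlap in place of Titan Embed v2, OpenSearch, and the cross-encoder — so the control flow, not the models, is what's illustrated:

```python
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str
    text: str

CORPUS = [
    Doc("m1", "Berserk dark fantasy seinen manga"),
    Doc("m2", "One Piece adventure shonen manga"),
    Doc("m3", "Vinland Saga dark historical seinen manga"),
]

def embed(text: str) -> set[str]:
    # Stage 1 stand-in: bag-of-words instead of a 1024-dim Titan vector
    return set(text.lower().split())

def retrieve(query_vec: set[str], k: int = 3) -> list[tuple[float, Doc]]:
    # Stage 2 stand-in: Jaccard overlap instead of OpenSearch k-NN
    scored = [
        (len(query_vec & embed(d.text)) / len(query_vec | embed(d.text)), d)
        for d in CORPUS
    ]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:k]

def rerank(candidates: list[tuple[float, Doc]], top_n: int = 2):
    # Stage 3 stand-in: a real server would call a cross-encoder here
    return candidates[:top_n]

def run_pipeline(query: str) -> str:
    # Stage 4: format a numbered, structured context block for the LLM
    hits = rerank(retrieve(embed(query)))
    return "\n".join(
        f"[{i + 1}] ({score:.2f}) {d.text}" for i, (score, d) in enumerate(hits)
    )

print(run_pipeline("dark fantasy manga"))
```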
Chatbot MCP Landscape — All Seven Servers
| # | MCP Server | RAG Source | Primary Purpose | Critical Scale Factor |
|---|---|---|---|---|
| 1 | Catalog Search MCP | OpenSearch (manga metadata) | Title/genre/author lookup | 5M+ manga titles, multilingual |
| 2 | User Preference MCP | DynamoDB + Personalize vectors | Personalized recs | 10M users × sparse preference vectors |
| 3 | Order & Inventory MCP | RDS + ElastiCache (hybrid RAG) | Order status, stock check | Real-time freshness requirement |
| 4 | Review & Sentiment MCP | OpenSearch (review corpus) | Community opinion synthesis | 50M+ reviews, sentiment aggregation |
| 5 | Support & Policy MCP | S3 + OpenSearch (doc RAG) | FAQ, returns, billing help | Policy doc freshness, version control |
| 6 | Trending & Discovery MCP | DynamoDB Streams + Kinesis | New releases, bestsellers | Sub-5s time-to-trend requirement |
| 7 | Cross-Title Link MCP | Neptune (graph) + OpenSearch | "If you liked X, try Y" | Multi-hop graph traversal + RAG |
High-Level Architecture
```mermaid
flowchart TD
    User([User]) --> APIGW[API Gateway]
    APIGW --> Lambda[Lambda Handler]
    Lambda --> Claude[Claude claude-sonnet-4-6\nMCP Client]
    Claude -->|Tool dispatch\nLLM decides| MCP1[Catalog\nSearch MCP]
    Claude --> MCP2[User Preference\nMCP]
    Claude --> MCP3[Order &\nInventory MCP]
    Claude --> MCP4[Review &\nSentiment MCP]
    Claude --> MCP5[Support &\nPolicy MCP]
    Claude --> MCP6[Trending &\nDiscovery MCP]
    Claude --> MCP7[Cross-Title\nLink MCP]
    MCP1 --> DS1[(OpenSearch\nManga Index)]
    MCP2 --> DS2[(DynamoDB +\nPersonalize)]
    MCP3 --> DS3[(RDS +\nElastiCache)]
    MCP4 --> DS4[(OpenSearch\nReview Corpus)]
    MCP5 --> DS5[(S3 +\nOpenSearch Docs)]
    MCP6 --> DS6[(Kinesis +\nDynamoDB Streams)]
    MCP7 --> DS7[(Neptune Graph\n+ OpenSearch)]
    style User fill:#4A90D9,color:#fff
    style Claude fill:#8E44AD,color:#fff
    style MCP1 fill:#E67E22,color:#fff
    style MCP2 fill:#E67E22,color:#fff
    style MCP3 fill:#E67E22,color:#fff
    style MCP4 fill:#E67E22,color:#fff
    style MCP5 fill:#E67E22,color:#fff
    style MCP6 fill:#E67E22,color:#fff
    style MCP7 fill:#E67E22,color:#fff
```
Shared Infrastructure
Embedding Layer (All MCP Servers)
```python
# All RAG-based MCP servers share the same embedding service
embedding_config = {
    "model": "amazon.titan-embed-text-v2:0",
    "dimensions": 1024,
    "normalize": True,
    "batch_size": 100,
}
# Deployed as an ECS Fargate service, behind ElastiCache for deduplication
```
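The dedup idea can be sketched as a cache keyed on a content hash, with the actual Bedrock call injected as a callable. In production `embed_fn` would invoke `amazon.titan-embed-text-v2:0` via `bedrock-runtime` and the cache would be ElastiCache rather than a local dict; both stand-ins are assumptions for illustration:

```python
import hashlib
import json

class EmbeddingService:
    """Dedup wrapper: identical texts hit the cache, not Bedrock."""

    def __init__(self, embed_fn, cache=None):
        # embed_fn: callable(text) -> list[float]. A local dict stands in
        # for ElastiCache (Redis) here.
        self.embed_fn = embed_fn
        self.cache = cache if cache is not None else {}
        self.misses = 0

    def embed(self, text: str) -> list[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.embed_fn(text)
        return self.cache[key]

# The request body Titan Embed v2 expects, matching embedding_config above:
titan_body = json.dumps(
    {"inputText": "Berserk", "dimensions": 1024, "normalize": True}
)
```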
MCP Server Transport
All servers use HTTP/SSE transport on Amazon ECS Fargate:
- POST /tools/list — capability discovery
- POST /tools/call — tool execution
- JWT-authenticated via Cognito → API Gateway → ECS
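Behind `POST /tools/call`, each server dispatches on tool name and wraps the outcome in the MCP result shape (`content` blocks plus an `isError` flag). A minimal sketch, with the web framework and JWT check omitted and the handlers as hypothetical stand-ins for the real RAG pipelines:

```python
# Hypothetical tool handlers standing in for the real retrieval pipelines.
TOOL_HANDLERS = {
    "search_manga": lambda args: {"hits": [f"result for {args['query']}"]},
    "get_manga_details": lambda args: {"title": args["title"]},
}

def handle_tools_call(request: dict) -> dict:
    name, args = request["name"], request.get("arguments", {})
    if name not in TOOL_HANDLERS:
        return {
            "isError": True,
            "content": [{"type": "text", "text": f"unknown tool: {name}"}],
        }
    result = TOOL_HANDLERS[name](args)
    # Tool results go back as typed content blocks
    return {
        "isError": False,
        "content": [{"type": "text", "text": str(result)}],
    }
```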
Tool Invocation Flow (Claude SDK)
```python
import anthropic

client = anthropic.Anthropic()

# Anthropic's MCP connector takes remote servers via `mcp_servers`;
# the URLs below are illustrative placeholders for the Fargate endpoints.
response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    mcp_servers=[
        {"type": "url", "name": "catalog-mcp", "url": "https://catalog-mcp.example.internal/sse",
         "tool_configuration": {"allowed_tools": ["search_manga", "get_manga_details"]}},
        {"type": "url", "name": "user-pref-mcp", "url": "https://user-pref-mcp.example.internal/sse",
         "tool_configuration": {"allowed_tools": ["get_recommendations"]}},
        {"type": "url", "name": "order-mcp", "url": "https://order-mcp.example.internal/sse",
         "tool_configuration": {"allowed_tools": ["get_order_status", "check_stock"]}},
        {"type": "url", "name": "review-mcp", "url": "https://review-mcp.example.internal/sse",
         "tool_configuration": {"allowed_tools": ["get_sentiment_summary"]}},
        {"type": "url", "name": "support-mcp", "url": "https://support-mcp.example.internal/sse",
         "tool_configuration": {"allowed_tools": ["answer_faq", "get_return_policy"]}},
        {"type": "url", "name": "trending-mcp", "url": "https://trending-mcp.example.internal/sse",
         "tool_configuration": {"allowed_tools": ["get_trending", "get_new_releases"]}},
        {"type": "url", "name": "graph-mcp", "url": "https://graph-mcp.example.internal/sse",
         "tool_configuration": {"allowed_tools": ["get_similar_titles"]}},
    ],
    messages=[{"role": "user", "content": user_message}],
    betas=["mcp-client-2025-04-04"],
)
```
RAG Pipeline — Shared Stages Across All MCPs
Stage 1: Query Understanding (Pre-Retrieval)
```mermaid
flowchart LR
    Q([Raw Query\n'dark fantasy manga\nlike Berserk for adults']) --> IC[Intent Classifier]
    IC --> QR[Query Rewriter]
    QR --> SD[Sub-query\nDecomposer]
    SD --> S1[manga similar to Berserk\n→ Cross-Title MCP]
    SD --> S2[dark fantasy genre filter\n→ Catalog MCP]
    SD --> S3[adult / mature themes\n→ User Preference MCP]
    style Q fill:#4A90D9,color:#fff
    style S1 fill:#27AE60,color:#fff
    style S2 fill:#27AE60,color:#fff
    style S3 fill:#27AE60,color:#fff
```
Stage 2: Hybrid Retrieval
```mermaid
flowchart LR
    Q([Query]) --> DE[Dense Embedding\nTitan Embed v2]
    Q --> BM[BM25 Keyword\nSparse Match]
    DE --> SF[Score Fusion\nRRF / Weighted Sum]
    BM --> SF
    SF --> TK[Top-K Candidates]
    style Q fill:#4A90D9,color:#fff
    style TK fill:#27AE60,color:#fff
```
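Reciprocal Rank Fusion combines the dense and BM25 rank lists without needing their raw scores to be comparable: each document earns `1 / (k + rank)` from every list it appears in. A minimal sketch with the conventional `k = 60`:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each ranking is a list of doc IDs, best first. Documents ranked
    # well by BOTH retrievers accumulate the highest fused score.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["berserk", "vinland_saga", "claymore"]
bm25 = ["claymore", "berserk", "gantz"]
print(rrf_fuse([dense, bm25]))
# → ['berserk', 'claymore', 'vinland_saga', 'gantz']
```

`berserk` wins because it ranks highly in both lists, even though neither retriever put `claymore` last.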
Stage 3: Reranking → Context Assembly
```mermaid
flowchart LR
    TK[Top-K Candidates] --> RR[Cross-Encoder Reranker\nBGE-reranker-v2-m3\non SageMaker]
    RR --> T3[Top-3 Results]
    T3 --> CA[Context Assembly\nStructured XML block]
    CA --> TR([Tool Result\nreturned to Claude])
    style TK fill:#E67E22,color:#fff
    style TR fill:#27AE60,color:#fff
```
Stage 4: Context Assembly Code
```python
def assemble_context(results: list["RetrievedDoc"], tool_name: str) -> str:
    # Number each retrieved doc and wrap the batch in a <tool_result> block.
    # RetrievedDoc is the server's internal retrieval record type.
    body = "\n".join(f"[{i + 1}] {r.to_structured_xml()}" for i, r in enumerate(results))
    return f'<tool_result name="{tool_name}">\n{body}\n</tool_result>'
```
MCP Selection Decision Tree
```mermaid
flowchart TD
    IN([User Intent Detected]) --> Q1{Intent type?}
    Q1 -->|Browse / Search| MCP1[Catalog Search MCP]
    Q1 -->|Personalised Rec| MCP2A[User Preference MCP]
    MCP2A --> MCP2B[+ Catalog MCP]
    Q1 -->|Order Status| MCP3[Order & Inventory MCP]
    Q1 -->|In stock?| MCP3
    Q1 -->|Reader opinions| MCP4[Review & Sentiment MCP]
    Q1 -->|Return policy\nCancel / Billing| MCP5[Support & Policy MCP]
    Q1 -->|What's new?\nTrending?| MCP6[Trending & Discovery MCP]
    Q1 -->|More like this\nSame author| MCP7[Cross-Title Link MCP]
    Q1 -->|2+ intents| PAR[Multi-MCP\nParallel Call]
    style IN fill:#4A90D9,color:#fff
    style PAR fill:#C0392B,color:#fff
```
Multi-MCP Parallel Tool Calls
```mermaid
sequenceDiagram
    actor User
    participant Claude
    participant CatalogMCP
    participant TrendingMCP
    participant GraphMCP
    User->>Claude: "I loved Berserk — what's trending\nsimilar to it, and can I order vol 42?"
    par Parallel dispatch
        Claude->>CatalogMCP: search_manga("Berserk volume 42")
    and
        Claude->>TrendingMCP: get_trending(genre="dark_fantasy")
    and
        Claude->>GraphMCP: get_similar_titles("Berserk")
    end
    CatalogMCP-->>Claude: Stock: in stock, ¥1,200
    TrendingMCP-->>Claude: [Vagabond, Vinland Saga, Claymore]
    GraphMCP-->>Claude: [Vinland Saga, Claymore, Gantz]
    Claude->>User: Synthesised answer combining\nstock status + trending + similar titles
```
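When the model requests several tool calls in one turn, the client can execute them concurrently rather than serially. A sketch with `asyncio.gather`, using a fake `call_mcp` in place of real network dispatch:

```python
import asyncio

async def call_mcp(server: str, tool: str, **kwargs) -> tuple[str, str]:
    # Fake executor standing in for an HTTP/SSE round-trip to one MCP server
    await asyncio.sleep(0.01)  # simulated network latency
    return server, f"{tool}({kwargs})"

async def dispatch_parallel() -> dict[str, str]:
    # One gather = one "par" block in the sequence diagram above
    results = await asyncio.gather(
        call_mcp("catalog-mcp", "search_manga", query="Berserk volume 42"),
        call_mcp("trending-mcp", "get_trending", genre="dark_fantasy"),
        call_mcp("graph-mcp", "get_similar_titles", title="Berserk"),
    )
    return dict(results)

results = asyncio.run(dispatch_parallel())
```

With serial dispatch the turn would cost the sum of the three calls' latencies; with `gather` it costs roughly the slowest one.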
Key Architecture Decisions + Trade-offs
| Decision | Choice | Why |
|---|---|---|
| MCP transport | HTTP/SSE | Streaming support; stdio only for local dev |
| Embedding model | Titan Embed v2 | AWS-native, no egress cost, 1024-dim |
| Retrieval store | OpenSearch Serverless | Auto-scaling, FAISS under the hood, no cluster management |
| Reranker | BGE-reranker on SageMaker | Better precision than BM25 alone, <50ms |
| MCP server host | ECS Fargate | Per-MCP isolation, independent scaling |
| Auth | Cognito → JWT per MCP | Fine-grained per-server permissions |
| Cache layer | ElastiCache (Redis) | Embedding dedup, result caching for trending |
Interview Grill: Quick-Fire
Q: Why not just put all tools in one MCP server?
A: A single server is a single blast radius: if the Review MCP goes down, Catalog still works. It also allows independent scaling — Trending spikes on Mondays, Order spikes on sale days.
Q: How does the LLM know which MCP to call?
A: Tool descriptions are the API. Each tool carries a rich description string; Claude reads it at inference time, so there is no routing code in the application layer.
Q: What happens if a RAG retrieval returns nothing?
A: Each MCP server implements a fallback_strategy: (1) broaden filters, (2) semantic fallback with lower threshold, (3) return structured no_results object with suggested alternatives — never silent empty.
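That escalation can be sketched as a chain of retrievers tried strictest-first, where the retrievers and the `no_results` shape are illustrative stand-ins:

```python
def retrieve_with_fallback(query: str, retrievers: list) -> dict:
    # retrievers: list of (strategy_name, callable(query) -> list) pairs,
    # ordered strictest-first: strict filters, broadened filters,
    # low-threshold semantic match.
    for strategy, retriever in retrievers:
        hits = retriever(query)
        if hits:
            return {"status": "ok", "strategy": strategy, "hits": hits}
    # Never return a silent empty result
    return {
        "status": "no_results",
        "suggestion": "Try a broader genre or a different title spelling.",
    }
```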
Q: How do you prevent prompt injection in MCP tool results?
A: Tool results are wrapped in <tool_result> XML tags. Claude treats content inside these tags as data, not instructions. Additionally, the MCP server strips markdown headings and code fences from user-generated content before returning.
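The stripping step can be sketched with two regex passes — fenced code blocks first, then heading lines — over user-generated content before it is returned:

```python
import re

def sanitize_tool_content(text: str) -> str:
    # Remove fenced code blocks, then markdown heading lines, so review
    # text cannot smuggle instruction-shaped structure into the context.
    text = re.sub(r"```.*?```", "", text, flags=re.DOTALL)
    text = re.sub(r"^#{1,6}\s.*$", "", text, flags=re.MULTILINE)
    return text.strip()
```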
Q: What's the latency budget per MCP call?
A: P99 < 800 ms per tool call. Breakdown: embed (50 ms) + retrieve (200 ms) + rerank (100 ms) + format (10 ms) = 360 ms, leaving a 440 ms buffer for network and cold starts.