MCP Orchestration Router — How Claude Selects and Sequences Tools
Purpose
Documents the decision layer between Claude and the seven MCP servers: how the LLM selects tools, handles multi-step reasoning chains, manages parallel dispatch, and recovers from tool failures. This layer is invisible in code — it lives entirely in tool descriptions and the system prompt — but it is the most critical design surface in the entire chatbot.
The Central Principle
There is no routing code. The router IS Claude.
Tool descriptions are the routing logic. A poorly written tool description is a misrouted request. The orchestration layer is a prompt engineering discipline, not a software architecture problem.
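Concretely, routing is configured by registering those descriptions with the model at inference time. A minimal sketch, assuming the Anthropic Messages API; the tool schema, wording, and model id here are illustrative, not the production definitions:

import anthropic

client = anthropic.Anthropic()

# The description string below IS the routing logic: Claude matches user
# intent against it at inference time. Schema and wording are illustrative.
TOOLS = [
    {
        "name": "get_price",
        "description": (
            "Look up the current price of a specific manga volume. "
            "Use for: any price question. "
            "Do NOT use for: stock checks (use check_stock)."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "manga_id": {"type": "string"},
                "volume_number": {"type": "integer"},
            },
            "required": ["manga_id", "volume_number"],
        },
    },
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative model id
    max_tokens=1024,
    system="You are MangaAssist...",   # full prompt architecture below
    tools=TOOLS,
    messages=[{"role": "user", "content": "How much is Berserk volume 3?"}],
)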
Tool Selection Flow
flowchart TD
UM([User Message]) --> SP[System Prompt\nRole + constraints + tool inventory]
SP --> CI[Claude Inference\nIntent classification via tool descriptions]
CI --> TD{Requires\ntool call?}
TD -->|No| DR([Direct Response\ne.g. general manga trivia])
TD -->|Yes - single intent| ST[Single Tool Call]
TD -->|Yes - multi-intent| MT[Parallel Tool Calls\nasyncio.gather pattern]
ST --> TR[Tool Result]
MT --> TR
TR --> CV{Needs follow-up\ntool call?}
CV -->|Yes - chained| CT[Chain: next tool call\nusing prior result as input]
CV -->|No| FA([Final Answer Synthesis])
CT --> TR
style UM fill:#4A90D9,color:#fff
style FA fill:#27AE60,color:#fff
style CI fill:#8E44AD,color:#fff
System Prompt Architecture
flowchart LR
SP([System Prompt]) --> RD[Role Definition\n'You are MangaAssist...']
SP --> TC[Tool Constraints\n'For policy Qs always use support MCP'\n'Never estimate prices']
SP --> TI[Tool Inventory Summary\n7 MCP servers listed with scope]
SP --> RS[Response Style\nConcise · Cite sources · No hallucination]
SP --> ES[Escalation Rules\nWhen to call escalate_to_agent]
RD --> CL[Claude at Inference]
TC --> CL
TI --> CL
RS --> CL
ES --> CL
style SP fill:#8E44AD,color:#fff
style CL fill:#4A90D9,color:#fff
Critical System Prompt Rules (Non-Negotiable)
TOOL USAGE RULES:
1. For any price question, ALWAYS call get_price. Never state or estimate a price.
2. For any order-related question, ALWAYS call get_order_status or check_stock.
3. For return/refund questions, ALWAYS call answer_faq or check_refund_eligibility. Never answer from training knowledge.
4. For recommendations, ALWAYS call get_recommendations or get_similar_titles.
5. For trending content, ALWAYS call get_trending or get_new_releases.
6. ALWAYS cite the tool source when answering policy questions.
7. If a tool returns escalation_suggested=true, ALWAYS call escalate_to_agent next.
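These rules are one of five components concatenated into the final prompt. A minimal assembly sketch; the component strings are abbreviated placeholders inferred from the diagram above, not the production wording:

# Hypothetical assembly of the five system prompt components.
ROLE_DEFINITION = "You are MangaAssist, a customer assistant for a manga retailer."
TOOL_CONSTRAINTS = "TOOL USAGE RULES:\n1. For any price question, ALWAYS call get_price. ..."
TOOL_INVENTORY = "MCP servers: catalog, order, support, recommendations, trending, reviews, escalation."
RESPONSE_STYLE = "Be concise. Cite the tool source for policy answers. Never invent facts."
ESCALATION_RULES = "If a tool returns escalation_suggested=true, call escalate_to_agent next."

SYSTEM_PROMPT = "\n\n".join([
    ROLE_DEFINITION,
    TOOL_CONSTRAINTS,
    TOOL_INVENTORY,
    RESPONSE_STYLE,
    ESCALATION_RULES,
])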
Tool Description Engineering
Each tool description acts as a routing signal. The description must be specific enough that Claude never confuses two tools.
Good vs Bad Tool Descriptions
# BAD — ambiguous, causes misrouting
@app.tool()
async def search(query: str) -> dict:
    """Search for manga information."""
    ...

# GOOD — specific, scoped, with negative examples
@app.tool()
async def search_manga(query: str, filters: dict | None = None) -> dict:
    """
    Search the MangaAssist catalog by title, author, genre, or free-text description.
    Use for: browsing/discovering titles, finding manga by description, author search.
    Do NOT use for: checking stock (use check_stock), recommendations (use get_recommendations),
    or trending titles (use get_trending).
    Returns: ranked list of manga with title, author, genres, and manga_id.
    """
    ...
Multi-Tool Orchestration Patterns
Pattern 1: Parallel Independent Calls
sequenceDiagram
actor User
participant Claude
participant TrendingMCP
participant ReviewMCP
User->>Claude: "What's trending in dark fantasy, and how\nare readers rating those titles?"
par Tool dispatch
Claude->>TrendingMCP: get_trending(genre="dark_fantasy")
and
Claude->>ReviewMCP: get_sentiment_summary(genre="dark_fantasy", aspect="overall")
end
TrendingMCP-->>Claude: [Vinland Saga, Claymore, Dorohedoro]
ReviewMCP-->>Claude: [4.8, 4.6, 4.4] avg ratings
Claude->>User: Synthesised: trending titles + their community reception
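Host-side, the "asyncio.gather pattern" means executing every tool_use block from one assistant turn concurrently. A sketch, where dispatch_tool is the traced dispatcher defined in the Observability section below and the tool_result shape follows the Anthropic Messages API:

import asyncio
import json

async def run_tool_batch(tool_use_blocks: list, user_id: str) -> list[dict]:
    """Execute all tool_use blocks from one assistant turn in parallel."""
    results = await asyncio.gather(
        *(dispatch_tool(b.name, b.input, user_id) for b in tool_use_blocks),
        return_exceptions=True,  # one failing server must not sink the batch
    )
    # Return each result as a tool_result block, matched by tool_use_id.
    return [
        {
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": f"error: {result}" if isinstance(result, Exception) else json.dumps(result),
            "is_error": isinstance(result, Exception),
        }
        for block, result in zip(tool_use_blocks, results)
    ]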
Pattern 2: Sequential Chained Calls
sequenceDiagram
actor User
participant Claude
participant CatalogMCP
participant OrderMCP
User->>Claude: "Is volume 10 of Claymore available?"
Claude->>CatalogMCP: search_manga(query="Claymore")
CatalogMCP-->>Claude: {manga_id: "CLAYMORE", ...}
Claude->>OrderMCP: check_stock(manga_id="CLAYMORE", volume_number=10)
OrderMCP-->>Claude: {in_stock: true, quantity: 7}
Claude->>User: "Yes, Claymore volume 10 is in stock (7 copies available)."
Note: Claude used the manga_id from the first call as input to the second. This chaining is emergent from reasoning, not programmed.
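What makes the chaining emergent is that the host runs a plain loop: it feeds tool results back and lets Claude decide whether another hop is needed. A sketch reusing SYSTEM_PROMPT, TOOLS, and run_tool_batch from the earlier snippets; the model id is illustrative:

import anthropic

aclient = anthropic.AsyncAnthropic()

async def agent_loop(messages: list[dict], user_id: str) -> str:
    """Call Claude until it stops requesting tools, then return the answer."""
    while True:
        response = await aclient.messages.create(
            model="claude-sonnet-4-20250514",  # illustrative model id
            max_tokens=1024,
            system=SYSTEM_PROMPT,
            tools=TOOLS,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            # Final answer synthesis: no further tool calls requested.
            return "".join(b.text for b in response.content if b.type == "text")
        # Append Claude's turn, then the tool results; Claude picks the next hop.
        tool_blocks = [b for b in response.content if b.type == "tool_use"]
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": await run_tool_batch(tool_blocks, user_id)})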
Pattern 3: Conditional Branching
flowchart TD
Q([User: 'I want a refund for my last order']) --> OS[get_order_status\nfetch last order]
OS --> OD{Order found?}
OD -->|No| EX[escalate_to_agent\nCan't find order]
OD -->|Yes| RE[check_refund_eligibility\norder_id + reason]
RE --> EL{Eligible?}
EL -->|Yes| PI[Return eligible\nProvide steps]
EL -->|No, borderline| GW[Offer goodwill credit\nas alternative]
EL -->|No| DN[Explain ineligibility\nwith policy citation]
DN --> SU{User unsatisfied?}
SU -->|Yes| EX
SU -->|No| END([End])
style Q fill:#4A90D9,color:#fff
style EX fill:#C0392B,color:#fff
style PI fill:#27AE60,color:#fff
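The branches above are driven by structured fields in the tool result rather than by prose; Claude reads the flags and picks the path. A hypothetical result shape for check_refund_eligibility (field names are illustrative, not the real schema):

async def check_refund_eligibility(order_id: str, reason: str) -> dict:
    """Hypothetical result shape; the production schema may differ."""
    # ... policy lookup elided ...
    return {
        "order_id": order_id,
        "eligible": False,
        "reason_code": "outside_return_window",
        "days_since_delivery": 41,
        "window_days": 30,
        "goodwill_credit_available": True,       # enables the borderline branch
        "policy_ref": "returns-policy/section-2", # lets Claude cite the policy
    }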
Tool Failure Handling
flowchart TD
TC([Tool Call]) --> TR{Tool result\nstatus?}
TR -->|success| PR([Process result normally])
TR -->|not_found| FB[Use fallback_strategy\nfrom tool result]
TR -->|error 5xx| RT{Retry count\n< 2?}
RT -->|Yes| TC
RT -->|No| DG[Degrade gracefully\nReturn partial answer + disclaimer]
TR -->|timeout| TM[Inform user of delay\nOffer to try again]
TR -->|escalation_suggested| ES[Call escalate_to_agent\nmandatory per system prompt]
style TC fill:#4A90D9,color:#fff
style PR fill:#27AE60,color:#fff
style DG fill:#E67E22,color:#fff
style ES fill:#C0392B,color:#fff
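A sketch of the host-side half of this flow, covering the retry and graceful-degradation branches; MCPServerError and dispatch_tool (defined in the Observability section below) are assumed names, not a real library API:

import asyncio

class MCPServerError(Exception):
    """Assumed transport exception for a 5xx-equivalent server failure."""

async def call_with_retry(tool_name: str, args: dict, user_id: str) -> dict:
    for attempt in range(3):  # initial call plus at most 2 retries
        try:
            return await dispatch_tool(tool_name, args, user_id)
        except MCPServerError:
            if attempt == 2:
                # Degrade gracefully: give Claude a structured failure so it
                # can produce a partial answer with a disclaimer.
                return {"status": "error", "detail": f"{tool_name} unavailable after 3 attempts"}
            await asyncio.sleep(0.2 * (attempt + 1))  # brief linear backoff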
Context Window Budget Management
With 7 MCP servers and multi-tool calls, tool results can overflow Claude's context window.
pie title Context Window Budget (200k tokens)
"System prompt" : 2000
"Conversation history" : 40000
"Pending tool results" : 30000
"Claude reasoning scratchpad" : 20000
"Safety buffer" : 108000
Tool Result Trimming
Each MCP server is instructed to return structured, minimal results — not prose:
# BAD — prose burns context tokens
return {"answer": "Berserk is a dark fantasy manga by Kentaro Miura published by Hakusensha..."}

# GOOD — structured, Claude synthesises prose
return {
    "manga_id": "BERSERK",
    "title_en": "Berserk",
    "author": "Kentaro Miura",
    "genre": ["dark_fantasy", "action", "seinen"],
    "volumes": 42,
    "status": "ongoing",
    "avg_rating": 4.9,
}
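Trimming can also be enforced host-side before results enter the context. A rough sketch; the per-result cap and the 4-characters-per-token estimate are heuristics, not a real tokenizer:

import json

def trim_tool_result(result: dict, max_tokens: int = 2_000) -> dict:
    """Cap an oversized result while preserving its structured envelope."""
    max_chars = max_tokens * 4  # crude chars-per-token heuristic
    if len(json.dumps(result)) <= max_chars:
        return result
    # Truncate list-valued fields first; they are usually the bulk.
    slim = {k: (v[:10] if isinstance(v, list) else v) for k, v in result.items()}
    slim["truncated"] = True  # Claude can disclose that results were cut
    return slim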
Latency Orchestration: End-to-End Budget
gantt
title End-to-End Latency Budget (P99) for Complex Multi-Tool Query
dateFormat X
axisFormat %Lms
section Phase 1 — First inference
Claude reads user message + tools list :0, 300
Claude decides which tools to call :300, 500
section Phase 2 — Parallel tool execution
Catalog MCP :500, 1000
Trending MCP :500, 800
Review MCP :500, 950
section Phase 3 — Final synthesis
Claude reads all tool results :1000, 1200
Claude generates final answer :1200, 1700
Streaming tokens to user begins :1500, 1700
Target P99 total: <3 seconds. For simple single-tool queries, target P99 <1.5 seconds.
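One way to hold the Phase 2 budget is a hard per-tool timeout, so a single slow MCP server cannot blow the end-to-end target. A sketch; the 1.2 s figure is illustrative headroom over the gantt above, and dispatch_tool is defined in the next section:

import asyncio

async def call_with_timeout(tool_name: str, args: dict, user_id: str,
                            timeout_s: float = 1.2) -> dict:
    try:
        return await asyncio.wait_for(dispatch_tool(tool_name, args, user_id), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Maps to the "timeout" branch of the failure-handling flow above.
        return {"status": "timeout", "tool": tool_name}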
Observability: Tracing Tool Calls
Every tool invocation is traced end-to-end with AWS X-Ray:
import json
import time

import boto3
from aws_xray_sdk.core import xray_recorder
from aws_xray_sdk.core.async_context import AsyncContext

xray_recorder.configure(context=AsyncContext())
cloudwatch = boto3.client("cloudwatch")

@xray_recorder.capture_async("mcp_tool_call")  # async-aware capture for coroutines
async def dispatch_tool(tool_name: str, args: dict, user_id: str) -> dict:
    subsegment = xray_recorder.current_subsegment()
    subsegment.put_annotation("tool", tool_name)
    subsegment.put_annotation("user_id", user_id)

    start = time.monotonic()
    result = await _call_mcp_server(tool_name, args)
    latency_ms = (time.monotonic() - start) * 1000

    subsegment.put_metadata("latency_ms", latency_ms)
    subsegment.put_metadata("result_size_bytes", len(json.dumps(result)))

    # Emit to CloudWatch for SLA dashboard
    cloudwatch.put_metric_data(
        Namespace="MangaAssist/MCP",
        MetricData=[{
            "MetricName": "ToolLatencyMs",
            "Dimensions": [{"Name": "Tool", "Value": tool_name}],
            "Value": latency_ms,
            "Unit": "Milliseconds",
        }],
    )
    return result
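Design note: X-Ray annotations (tool, user_id) are indexed and filterable in trace queries, while metadata (latency_ms, result_size_bytes) is stored but not indexed; mirroring latency to CloudWatch is what makes the per-tool SLA dashboard and alarms possible.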
Interview Grill
Q: How do you ensure Claude doesn't call a tool when it doesn't need to?
A: Two levers: (1) Tool descriptions include explicit "Do NOT use for:" sections. (2) The system prompt says "If you can answer factual/general questions from your training knowledge without real-time data, do so without a tool call." This reduces unnecessary tool invocations for questions like "What genre is shonen?"
Q: What if Claude calls the wrong tool?
A: Tool call logs are emitted to CloudWatch. A monthly sample of 500 tool calls is reviewed by the ML team. Wrong-tool calls inform tool description rewrites; it's an iterative prompt engineering feedback loop, not a code fix.
Q: How do you handle a user asking something across 4 intents simultaneously?
A: Claude can dispatch up to 5 tools in a single parallel batch. If more than 5 tools are needed, Claude naturally prioritises; the system prompt says "Answer the most critical aspect first; ask the user to clarify the rest." This is preferable to a 6-tool parallel call that risks blowing the 3-second P99 latency budget.
Q: How does the chatbot handle a tool result that contradicts another tool result?
A: Tool results are tagged with source and freshness. If Trending MCP says "in stock" but Catalog MCP has no listing, Claude surfaces the discrepancy: "The trending list includes X, but I couldn't find it in the catalog — it may be a very recent listing." Contradiction surfacing is preferred over silent resolution.
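"Tagged with source and freshness" can be as simple as wrapping every raw MCP result in an envelope before it reaches Claude. A minimal sketch; the field names are illustrative:

from datetime import datetime, timezone

def tag_result(tool_name: str, server: str, result: dict) -> dict:
    """Wrap a raw MCP result with provenance so Claude can weigh conflicts."""
    return {
        "source": server,  # e.g. "trending_mcp" vs "catalog_mcp"
        "tool": tool_name,
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "data": result,
    }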