
MCP Orchestration Router — How Claude Selects and Sequences Tools

Purpose

Documents the decision layer between Claude and the seven MCP servers: how the LLM selects tools, handles multi-step reasoning chains, manages parallel dispatch, and recovers from tool failures. This layer is invisible in code — it lives entirely in tool descriptions and the system prompt — but it is the most critical design surface in the entire chatbot.


The Central Principle

There is no routing code. The router IS Claude.

Tool descriptions are the routing logic. A poorly written tool description is a misrouted request. The orchestration layer is a prompt engineering discipline, not a software architecture problem.


Tool Selection Flow

flowchart TD
    UM([User Message]) --> SP[System Prompt\nRole + constraints + tool inventory]
    SP --> CI[Claude Inference\nIntent classification via tool descriptions]
    CI --> TD{Requires\ntool call?}
    TD -->|No| DR([Direct Response\ne.g. general manga trivia])
    TD -->|Yes - single intent| ST[Single Tool Call]
    TD -->|Yes - multi-intent| MT[Parallel Tool Calls\nasyncio.gather pattern]

    ST --> TR[Tool Result]
    MT --> TR
    TR --> CV{Needs follow-up\ntool call?}
    CV -->|Yes - chained| CT[Chain: next tool call\nusing prior result as input]
    CV -->|No| FA([Final Answer Synthesis])
    CT --> TR

    style UM fill:#4A90D9,color:#fff
    style FA fill:#27AE60,color:#fff
    style CI fill:#8E44AD,color:#fff

System Prompt Architecture

flowchart LR
    SP([System Prompt]) --> RD[Role Definition\n'You are MangaAssist...']
    SP --> TC[Tool Constraints\n'For policy Qs always use support MCP'\n'Never estimate prices']
    SP --> TI[Tool Inventory Summary\n7 MCP servers listed with scope]
    SP --> RS[Response Style\nConcise · Cite sources · No hallucination]
    SP --> ES[Escalation Rules\nWhen to call escalate_to_agent]

    RD --> CL[Claude at Inference]
    TC --> CL
    TI --> CL
    RS --> CL
    ES --> CL

    style SP fill:#8E44AD,color:#fff
    style CL fill:#4A90D9,color:#fff

Critical System Prompt Rules (Non-Negotiable)

TOOL USAGE RULES:
1. For any price question, ALWAYS call get_price. Never state or estimate a price.
2. For any order-related question, ALWAYS call get_order_status or check_stock.
3. For return/refund questions, ALWAYS call answer_faq or check_refund_eligibility. Never answer from training knowledge.
4. For recommendations, ALWAYS call get_recommendations or get_similar_titles.
5. For trending content, ALWAYS call get_trending or get_new_releases.
6. ALWAYS cite the tool source when answering policy questions.
7. If a tool returns escalation_suggested=true, ALWAYS call escalate_to_agent next.
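At runtime these rules are nothing more than text in the system prompt string. A minimal sketch of how they might be assembled before each inference call (the `TOOL_USAGE_RULES` list and `build_system_prompt` helper are illustrative names, not the production code):

```python
# Illustrative assembly of the non-negotiable rules into the system prompt.
TOOL_USAGE_RULES = [
    "For any price question, ALWAYS call get_price. Never state or estimate a price.",
    "For any order-related question, ALWAYS call get_order_status or check_stock.",
    "For return/refund questions, ALWAYS call answer_faq or check_refund_eligibility.",
    "For recommendations, ALWAYS call get_recommendations or get_similar_titles.",
    "For trending content, ALWAYS call get_trending or get_new_releases.",
    "ALWAYS cite the tool source when answering policy questions.",
    "If a tool returns escalation_suggested=true, ALWAYS call escalate_to_agent next.",
]

def build_system_prompt(role: str = "You are MangaAssist...") -> str:
    """Join the role definition and the numbered tool rules into one string."""
    rules = "\n".join(f"{i}. {r}" for i, r in enumerate(TOOL_USAGE_RULES, start=1))
    return f"{role}\n\nTOOL USAGE RULES:\n{rules}"
```

Keeping the rules as a list makes them individually testable: a CI check can assert that every mandatory rule survives prompt refactors.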

Tool Description Engineering

Each tool description acts as a routing signal. The description must be specific enough that Claude never confuses two tools.

Good vs Bad Tool Descriptions

# BAD — ambiguous, causes misrouting
@app.tool()
async def search(query: str) -> dict:
    """Search for manga information."""
    ...

# GOOD — specific, scoped, with negative examples
@app.tool()
async def search_manga(query: str, filters: dict | None = None) -> dict:
    """
    Search the MangaAssist catalog by title, author, genre, or free-text description.
    Use for: browsing/discovering titles, finding manga by description, author search.
    Do NOT use for: checking stock (use check_stock), recommendations (use get_recommendations),
    or trending titles (use get_trending).
    Returns: ranked list of manga with title, author, genres, and manga_id.
    """
    ...

Multi-Tool Orchestration Patterns

Pattern 1: Parallel Independent Calls

sequenceDiagram
    actor User
    participant Claude
    participant CatalogMCP
    participant TrendingMCP
    participant ReviewMCP

    User->>Claude: "What's trending in dark fantasy, and how are readers rating those titles?"

    par Tool dispatch
        Claude->>TrendingMCP: get_trending(genre="dark_fantasy")
    and
        Claude->>ReviewMCP: get_sentiment_summary(aspect="overall")
    end

    TrendingMCP-->>Claude: [Vinland Saga, Claymore, Dorohedoro]
    ReviewMCP-->>Claude: [4.8, 4.6, 4.4] avg ratings

    Claude->>User: Synthesised: trending titles + their community reception
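When the host process executes Claude's parallel tool_use blocks itself, the asyncio.gather pattern named in the selection flow looks roughly like this (`call_tool` is a placeholder for the real MCP client call):

```python
import asyncio

async def call_tool(name: str, args: dict) -> dict:
    await asyncio.sleep(0)  # stands in for network I/O to the MCP server
    return {"tool": name, "args": args}

async def dispatch_parallel(tool_calls: list[tuple[str, dict]]) -> list[dict]:
    """Fire all independent tool calls concurrently; gather preserves order."""
    return await asyncio.gather(
        *(call_tool(name, args) for name, args in tool_calls)
    )

results = asyncio.run(dispatch_parallel([
    ("get_trending", {"genre": "dark_fantasy"}),
    ("get_sentiment_summary", {"aspect": "overall"}),
]))
```

Because `asyncio.gather` returns results in submission order, matching each result back to the tool_use block that requested it is trivial.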

Pattern 2: Sequential Chained Calls

sequenceDiagram
    actor User
    participant Claude
    participant CatalogMCP
    participant OrderMCP

    User->>Claude: "Is volume 10 of Claymore available?"

    Claude->>CatalogMCP: search_manga(query="Claymore")
    CatalogMCP-->>Claude: {manga_id: "CLAYMORE", ...}

    Claude->>OrderMCP: check_stock(manga_id="CLAYMORE", volume_number=10)
    OrderMCP-->>Claude: {in_stock: true, quantity: 7}

    Claude->>User: "Yes, Claymore volume 10 is in stock (7 copies available)."

Note: Claude used the manga_id from the first call as input to the second. This chaining is emergent from reasoning, not programmed.
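The chaining itself is emergent, but the host loop that makes it possible is mechanical: keep calling the model, execute whatever tool it requests, append the result, stop when it produces a final answer. A stripped-down sketch with a scripted stub standing in for Claude (all names here are illustrative):

```python
def run_chain(model, tools, user_message: str) -> str:
    """Generic agent loop: execute requested tools until the model finalises."""
    messages = [{"role": "user", "content": user_message}]
    while True:
        reply = model(messages)  # one inference step
        if reply["type"] == "final":
            return reply["text"]
        # Model requested a tool: execute it and feed the result back.
        result = tools[reply["tool"]](**reply["args"])
        messages.append({"role": "tool_result", "tool": reply["tool"], "content": result})

def stub_model(messages):
    # Scripted stand-in for Claude: search first, then check stock using
    # the manga_id returned by the search — the chaining from the diagram.
    last = messages[-1]
    if last["role"] == "user":
        return {"type": "tool", "tool": "search_manga", "args": {"query": "Claymore"}}
    if last["tool"] == "search_manga":
        manga_id = last["content"]["manga_id"]
        return {"type": "tool", "tool": "check_stock",
                "args": {"manga_id": manga_id, "volume_number": 10}}
    return {"type": "final", "text": f"In stock: {last['content']['in_stock']}"}

tools = {
    "search_manga": lambda query: {"manga_id": "CLAYMORE"},
    "check_stock": lambda manga_id, volume_number: {"in_stock": True, "quantity": 7},
}
answer = run_chain(stub_model, tools, "Is volume 10 of Claymore available?")
```

The loop knows nothing about manga IDs or tool ordering; the decision to pass `manga_id` from call one into call two comes entirely from the model's reasoning.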

Pattern 3: Conditional Branching

flowchart TD
    Q([User: 'I want a refund for my last order']) --> OS[get_order_status\nfetch last order]
    OS --> OD{Order found?}
    OD -->|No| EX[escalate_to_agent\nCan't find order]
    OD -->|Yes| RE[check_refund_eligibility\norder_id + reason]
    RE --> EL{Eligible?}
    EL -->|Yes| PI[Return eligible\nProvide steps]
    EL -->|No, borderline| GW[Offer goodwill credit\nas alternative]
    EL -->|No| DN[Explain ineligibility\nwith policy citation]
    DN --> SU{User unsatisfied?}
    SU -->|Yes| EX
    SU -->|No| END([End])

    style Q fill:#4A90D9,color:#fff
    style EX fill:#C0392B,color:#fff
    style PI fill:#27AE60,color:#fff

Tool Failure Handling

flowchart TD
    TC([Tool Call]) --> TR{Tool result\nstatus?}
    TR -->|success| PR([Process result normally])
    TR -->|not_found| FB[Use fallback_strategy\nfrom tool result]
    TR -->|error 5xx| RT{Retry count\n< 2?}
    RT -->|Yes| TC
    RT -->|No| DG[Degrade gracefully\nReturn partial answer + disclaimer]
    TR -->|timeout| TM[Inform user of delay\nOffer to try again]
    TR -->|escalation_suggested| ES[Call escalate_to_agent\nmandatory per system prompt]

    style TC fill:#4A90D9,color:#fff
    style PR fill:#27AE60,color:#fff
    style DG fill:#E67E22,color:#fff
    style ES fill:#C0392B,color:#fff
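The 5xx branch of this flow can live client-side as a thin retry wrapper around the MCP call. A sketch under assumed names (`ServerError` and the degraded-result shape are illustrative, not a real SDK contract):

```python
import asyncio

class ServerError(Exception):
    """Stands in for a 5xx-style failure from an MCP server."""

async def call_with_retry(call_tool, name: str, args: dict,
                          max_retries: int = 2) -> dict:
    """Retry transient failures up to max_retries, then degrade gracefully."""
    for attempt in range(max_retries + 1):
        try:
            return await call_tool(name, args)
        except ServerError:
            if attempt == max_retries:
                # Degrade gracefully: downstream code attaches the disclaimer
                # and returns a partial answer to the user.
                return {"status": "degraded",
                        "detail": f"{name} unavailable after {max_retries} retries"}
            await asyncio.sleep(0.1 * 2 ** attempt)  # simple exponential backoff
```

Note the retry cap matches the flowchart's "Retry count < 2" guard; the `not_found`, `timeout`, and `escalation_suggested` branches are handled by Claude itself per the system prompt, not by this wrapper.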

Context Window Budget Management

With 7 MCP servers and multi-tool calls, tool results can overflow Claude's context window.

pie title Context Window Budget (200k tokens)
    "System prompt" : 2000
    "Conversation history" : 40000
    "Pending tool results" : 30000
    "Claude reasoning scratchpad" : 20000
    "Safety buffer" : 108000

Tool Result Trimming

Each MCP server is instructed to return structured, minimal results — not prose:

# BAD — prose burns context tokens
return {"answer": "Berserk is a dark fantasy manga by Kentaro Miura published by Hakusensha..."}

# GOOD — structured, Claude synthesises prose
return {
    "manga_id": "BERSERK",
    "title_en": "Berserk",
    "author": "Kentaro Miura",
    "genre": ["dark_fantasy", "action", "seinen"],
    "volumes": 42,
    "status": "ongoing",
    "avg_rating": 4.9,
}
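A server-side trim pass can enforce the budget mechanically before results go back to Claude. A sketch, assuming a rough 4-characters-per-token heuristic and an illustrative field-priority list:

```python
import json

# Illustrative: fields worth keeping when a result must shrink.
PRIORITY_FIELDS = ["manga_id", "title_en", "author", "status", "avg_rating"]

def trim_result(result: dict, max_tokens: int = 200) -> dict:
    """Drop low-priority fields when the serialised result exceeds the budget.
    Token count is estimated at ~4 characters per token (a heuristic)."""
    est_tokens = len(json.dumps(result)) // 4
    if est_tokens <= max_tokens:
        return result
    return {k: result[k] for k in PRIORITY_FIELDS if k in result}
```

The same idea extends to list results: cap ranked lists at the top N entries, since Claude rarely needs more than a handful to synthesise an answer.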

Latency Orchestration: Total End-to-End

gantt
    title End-to-End Latency Budget (P99) for Complex Multi-Tool Query
    dateFormat X
    axisFormat %Lms

    section Phase 1 — First inference
    Claude reads user message + tools list   :0, 300
    Claude decides which tools to call       :300, 500

    section Phase 2 — Parallel tool execution
    Catalog MCP                              :500, 1000
    Trending MCP                             :500, 800
    Review MCP                               :500, 950

    section Phase 3 — Final synthesis
    Claude reads all tool results            :1000, 1200
    Claude generates final answer            :1200, 1700
    Streaming tokens to user begins          :1500, 1700

Target P99 total: <3 seconds. For simple single-tool queries, target P99 <1.5 seconds.


Observability: Tracing Tool Calls

Every tool invocation is traced end-to-end with AWS X-Ray:

import json
import time

import boto3
from aws_xray_sdk.core import xray_recorder

cloudwatch = boto3.client("cloudwatch")

@xray_recorder.capture("mcp_tool_call")
async def dispatch_tool(tool_name: str, args: dict, user_id: str) -> dict:
    subsegment = xray_recorder.current_subsegment()
    subsegment.put_annotation("tool", tool_name)
    subsegment.put_annotation("user_id", user_id)

    start = time.monotonic()
    result = await _call_mcp_server(tool_name, args)
    latency_ms = (time.monotonic() - start) * 1000

    subsegment.put_metadata("latency_ms", latency_ms)
    subsegment.put_metadata("result_size_bytes", len(json.dumps(result)))

    # Emit to CloudWatch for SLA dashboard
    cloudwatch.put_metric_data(
        Namespace="MangaAssist/MCP",
        MetricData=[{
            "MetricName": "ToolLatencyMs",
            "Dimensions": [{"Name": "Tool", "Value": tool_name}],
            "Value": latency_ms,
            "Unit": "Milliseconds",
        }]
    )
    return result

Interview Grill

Q: How do you ensure Claude doesn't call a tool when it doesn't need to? A: Two levers: (1) Tool descriptions include explicit "Do NOT use for:" sections. (2) System prompt says "If you can answer factual/general questions from your training knowledge without real-time data, do so without a tool call." This reduces unnecessary tool invocations for questions like "What genre is shonen?"

Q: What if Claude calls the wrong tool? A: Tool call logs are emitted to CloudWatch. A monthly sample of 500 tool calls is reviewed by the ML team. Wrong-tool calls inform tool description rewrites — it's an iterative prompt engineering feedback loop, not a code fix.

Q: How do you handle a user asking something across 4 intents simultaneously? A: Claude can dispatch up to 5 tools in a single parallel batch. If >5 tools are needed, Claude naturally prioritises — the system prompt says "Answer the most critical aspect first; ask the user to clarify the rest." This is preferable to a 6-tool parallel call that risks blowing the 3-second P99 latency budget.

Q: How does the chatbot handle a tool result that contradicts another tool result? A: Tool results are tagged with source and freshness. If Trending MCP says "in stock" but Catalog MCP has no listing, Claude surfaces the discrepancy: "The trending list includes X, but I couldn't find it in the catalog — it may be a very recent listing." Contradiction surfacing is preferred over silent resolution.