# 02 — ProductSearchAgent
Catalog discovery for 5M+ multilingual manga titles. Thin domain wrapper over the Catalog Search MCP.
The ProductSearchAgent is the sub-agent the Orchestrator calls whenever a user message contains a search-shaped intent: title lookup, genre browsing, author discovery, ISBN/ASIN resolution. It is not a separate Claude instance — it is a logical handler exposed to the Orchestrator as a set of MCP tools, plus the small amount of domain logic that lives between the tools and the catalog index.
## What it is
A logical sub-agent that owns three responsibilities:
- Query understanding — locale detection (English / Japanese / JP romaji), entity extraction (series name, volume number, edition).
- Retrieval orchestration — calls the Catalog Search MCP with the right parameters and post-processes results.
- Result shaping — enriches raw catalog hits with stock signals, price, and availability before returning to the Orchestrator.
It is backed by the Catalog Search MCP (../RAG-MCP-Integration/01-catalog-search-mcp.md), which itself wraps an OpenSearch Serverless index of 5M+ titles.
## Tools exposed to the Orchestrator
| Tool | Purpose | Typical use |
|---|---|---|
| search_manga(query, filters) | Hybrid retrieval over the catalog | "Find dark fantasy manga" |
| get_manga_details(asin) | Structured fetch of a single title | "Tell me about Berserk volume 42" |
| find_by_author(author, lang) | Author-scoped listing | "Show me everything by Naoki Urasawa" |
| find_by_series(series_name, volume?) | Series + volume resolution | "Vinland Saga volume 12 in Japanese" |
Each tool description is engineered to disambiguate from similar tools in other MCPs — see 06-tool-dispatch-and-routing.md for tool description engineering.
## Retrieval pipeline
```
User query → Locale detection → Entity extraction
           → Hybrid retrieval (BM25 + Titan dense)
           → RRF fusion (top-50 candidates)
           → BGE-reranker (top-3)
           → Stock + price enrichment
           → Return to Orchestrator
```
### Hybrid retrieval
Pure BM25 misses semantic matches ("dark fantasy" → "Berserk" requires no shared keywords). Pure dense retrieval misses exact-title matches in the long tail. We run both and fuse with Reciprocal Rank Fusion:
```python
from collections import defaultdict

def rrf(rank_lists, k=60):
    # Each document's score is the sum of reciprocal ranks across lists.
    scores = defaultdict(float)
    for rank_list in rank_lists:
        for rank, doc_id in enumerate(rank_list):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda x: -x[1])
```
k=60 is empirically chosen. Lower k gives outsized weight to the top-ranked hit in each list, which sharpens exact-title matches at the cost of semantic coverage; higher k flattens the contribution so deeper-ranked candidates from either list count for more.
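A toy fusion run makes the behavior concrete (the `rrf` definition is repeated so the snippet runs standalone; the document titles are made up):

```python
from collections import defaultdict

def rrf(rank_lists, k=60):  # as defined above
    scores = defaultdict(float)
    for rank_list in rank_lists:
        for rank, doc_id in enumerate(rank_list):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda x: -x[1])

# The two retrieval legs disagree on ordering and coverage.
bm25_ranking = ["berserk_v1", "claymore_v1", "vinland_v1"]
dense_ranking = ["claymore_v1", "berserk_v1", "made_in_abyss_v1"]

fused = [doc_id for doc_id, _ in rrf([bm25_ranking, dense_ranking])]
print(fused)  # documents present in both lists outrank single-list candidates
```

This is the property the fusion buys: a title that both legs agree on beats a title that only one leg surfaced, regardless of absolute BM25 or cosine scores.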
### Multilingual handling
Titan Embeddings v2 has decent cross-lingual alignment for English/Japanese, but Japanese romaji ("Naruto" written romaji vs. ナルト in kana) is a known weak spot. Mitigation: a small lookup table normalizes common romaji titles to their canonical Japanese form before embedding.
## State management
Stateless per call. The agent owns no persistent state of its own — everything it needs is in the call payload or in the Catalog MCP's index.
Two caches sit in the path:
| Cache | Store | TTL | What it caches |
|---|---|---|---|
| Embedding cache | ElastiCache | 1 hour | Query → embedding vector (dedupe identical queries) |
| Result cache | ElastiCache | 5 minutes | Query+filter hash → top-3 results |
Result-cache TTL is short because catalog stock and price change frequently; we don't want a 1-hour-stale "in stock" lying to a customer who clicks through.
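The two cache keys can be sketched as follows. This assumes a Redis-style keyspace; the prefixes, hashing choice, and helper names are illustrative:

```python
import hashlib
import json

# TTLs mirror the table above.
EMBED_TTL_S = 3600   # 1 hour: the embedding of a given query never changes
RESULT_TTL_S = 300   # 5 minutes: stock and price go stale quickly

def embedding_cache_key(query: str) -> str:
    """Identical query text -> identical key, deduping Titan calls."""
    return "emb:" + hashlib.sha256(query.encode("utf-8")).hexdigest()

def result_cache_key(query: str, filters: dict) -> str:
    """Hash query + filters; sort keys so logically-equal requests collide."""
    payload = json.dumps({"q": query, "f": filters}, sort_keys=True)
    return "res:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Sorting the filter keys before hashing matters: `{"genre": "x", "lang": "ja"}` and `{"lang": "ja", "genre": "x"}` must hit the same cache entry.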
## Failure handling
| Failure | Detection | Recovery |
|---|---|---|
| Empty result set | len(hits) == 0 | Broaden filters → drop genre constraint → semantic fallback with lower threshold → structured no_results with suggested alternatives |
| OpenSearch timeout | 800ms exceeded | One retry → return cached if available → fail with structured error |
| Reranker SageMaker cold | First call after 15min idle | Skip rerank, return RRF top-3 directly with reranked: false flag |
| Malformed input (bad ASIN format) | Schema validation | Return 400 with reason, no retry |
| Locale mis-detection | Manual heuristic flag | Run both English and Japanese pipelines, merge results |
Critical principle: never return silent empty. The Orchestrator's downstream behavior depends on knowing whether "no results" means "we searched and found nothing" vs. "the search failed."
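A sketch of what that contract could look like on the wire. The field names are assumptions; the point is that the two outcomes are structurally distinct, so the Orchestrator can never confuse them:

```python
# Hypothetical response shapes enforcing "never return silent empty".
def no_results_response(query: str, suggestions: list[str]) -> dict:
    """We searched successfully and found nothing."""
    return {
        "status": "no_results",
        "query": query,
        "suggested_alternatives": suggestions,
    }

def search_failed_response(reason: str) -> dict:
    """The search itself did not complete; results are unknown."""
    return {
        "status": "search_failed",
        "reason": reason,
        "retryable": True,
    }
```

A unit test asserting that an empty hit list always flows through `no_results_response` (never an empty list or `None`) is the cheap way to keep this principle from regressing.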
## Latency budget
Target: P99 < 800ms per tool call.
| Stage | Budgeted | Notes |
|---|---|---|
| Locale + entity | 20ms | Regex + small lookup, in-process |
| Embedding | 50ms | Titan call, often cache hit (~5ms) |
| OpenSearch hybrid | 200ms | Two parallel queries (BM25 + KNN), fused |
| Reranker | 100ms | BGE on SageMaker, warm endpoint |
| Enrichment | 30ms | Stock + price from ElastiCache |
| Format + return | 10ms | XML wrap |
| Total | 410ms | Leaves 390ms for network + cold start |
## Why this shape
| Alternative | Why we rejected it |
|---|---|
| Single retrieval method (BM25 only) | Misses semantic matches in long-tail genre queries |
| Single retrieval method (dense only) | Misses exact-title matches; BM25 dominates exact-match precision |
| Direct OpenSearch from Orchestrator (no MCP boundary) | Couples LLM context to query DSL; hard to evolve |
| Pre-filter by genre before retrieval | Cuts recall on cross-genre titles; reranker handles this better |
| Use a single OpenSearch query that does both | OpenSearch supports this but RRF fusion outside the index gives better tunability |
## Validation: Constraint Sanity Check
| Claimed metric | Verdict | Why |
|---|---|---|
| P99 < 800ms per tool call | Aggressive under load | Component sum of medians is 410ms; P99 of components stacks to ~1.1–1.4s during concurrent traffic. Holds at low concurrency, breaks at peak. |
| OpenSearch hybrid 200ms | Optimistic | OpenSearch Serverless KNN over 5M vectors with HNSW: P50 ~80–150ms with ef_search=64, P99 ~300–500ms under concurrent load. Two parallel queries (BM25 + KNN) take max of both, so 200ms P99 requires aggressive HNSW tuning. |
| Reranker 100ms | Cold-start failure mode | BGE-reranker-v2-m3 on SageMaker endpoint warm: 60–100ms. Cold (after autoscale-down): 800ms–2s. The fallback to "skip rerank" is the right call but degrades quality silently — needs a dashboard to track rerank skip rate. |
| Embedding 50ms | Cache-hit dependent | Titan call P99 over network: 80–150ms. The 50ms target only holds with cache hit (which dedupes identical queries). Fresh queries pay full cost. |
| 5M+ titles, multilingual | Quality claim, not validated here | Cross-lingual retrieval quality (English query → Japanese results and vice versa) needs an offline eval set. The romaji lookup table is a band-aid; properly handling JP romaji probably needs a dedicated tokenizer or fine-tuned embedding. No eval data quoted. |
| Result-cache TTL 5 min | Inventory freshness conflict | Stock changes within minutes during sales. 5-min cache means up to 5 minutes of "in stock" lies. Either drop the cache to 30–60s, or invalidate on stock-change events from the inventory service. |
| RRF fusion k=60 | Empirically chosen — undocumented | Says "empirically chosen." If true, link the experiment. If not, it's a hyperparameter someone copied from a paper. |
| "Never return silent empty" principle | Realistic if enforced | Depends on actual implementation. Easy to regress — needs a unit test that explicit-empty results never bypass the structured-no-results path. |
### The biggest issue: P99 latency under concurrency
The 800ms budget assumes the components run in their isolated steady-state. Under realistic concurrent load:
- OpenSearch P99 for KNN on 5M vectors with ef_search=64 and 50 concurrent queries: ~400–600ms (not 200ms).
- SageMaker reranker endpoint queue depth grows when QPS spikes; reranker P99 can hit 500ms during busy periods.
- Titan API rate limits can introduce queueing on the embedding side.
Realistic P99 at production load: 1.2–1.8s. The 800ms target holds at P95, possibly P97. Restate the SLA accordingly.
### The freshness/cache contradiction
The 5-minute result cache and the "real-time stock" guarantee in the problem statement are in tension. Two paths to reconcile:
1. Cache only the retrieval result (top-3 ASINs), not the enrichment (price/stock). Always re-fetch enrichment from the inventory service. This shortens the lie window to whatever the inventory cache TTL is.
2. Listen to inventory-change events on a Kinesis stream and invalidate result-cache entries proactively when their ASINs change. Engineering cost is real; do this only if Path 1 isn't enough.
The current architecture as written caches the whole result for 5 minutes. That's the lie window.
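Path 1 is a small structural change. A sketch, with the cache modeled as a plain dict and the retrieval and inventory calls passed in as hypothetical callables (names are illustrative):

```python
# Path 1 sketch: cache only the ranked ASINs; enrichment is always live.
def cached_search(query, filters, cache, retrieve, fetch_live_stock_and_price):
    key = (query, tuple(sorted(filters.items())))
    asins = cache.get(key)
    if asins is None:
        asins = retrieve(query, filters)   # expensive hybrid retrieval
        cache[key] = asins                 # safe to cache: ranking is stable
    # Stock and price are re-fetched on every call, so they are never
    # staler than the inventory service's own freshness.
    return [fetch_live_stock_and_price(asin) for asin in asins]
```

The trade: the expensive part (retrieval, fusion, rerank) stays cached, while the cheap ElastiCache enrichment runs every time, shrinking the lie window from 5 minutes to near zero at a cost of ~30ms per call.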
## Related documents
- 01-orchestrator-agent.md — How the Orchestrator dispatches to this agent
- 06-tool-dispatch-and-routing.md — Tool description engineering
- ../RAG-MCP-Integration/01-catalog-search-mcp.md — Catalog MCP internals
- ../RAG-MCP-Integration/09-rag-retrieval-pipeline-deep-dive.md — Shared retrieval pipeline