Embedding Retrieval Scenarios - MangaAssist
This companion document turns the embedding fine-tuning topic into concrete MangaAssist scenarios. It assumes MangaAssist already uses a fine-tuned intent classifier for routing and then calls retrieval for product discovery, recommendations, product questions, and FAQ grounding.
When This Topic Matters
Use embedding fine-tuning when the chatbot understands the user's intent but retrieves the wrong manga, help article, or catalog cluster.
| Symptom | Example user message | Likely failure |
|---|---|---|
| Similar titles are missed | "dark fantasy like Berserk but less brutal" | embedding space overweights popularity |
| Tone is missed | "cozy slice of life manga with found family" | genre tags are too shallow |
| Japanese/English phrasing drifts | "iyashikei manga for stress relief" | multilingual domain phrase not close to catalog tags |
| Support retrieval is weak | "why did preorder shipping split my order?" | FAQ article not near the query |
Scenario 1 - Catalog Search Adapter
MangaAssist keeps Titan embeddings as the base encoder and trains a small projection adapter on top of 1024-dimensional vectors.
Training data:
| Source | Count | Positive signal | Negative signal |
|---|---|---|---|
| search clicks | 18,000 | clicked product | skipped but shown result |
| add-to-cart events | 4,000 | purchased or wishlisted title | high-rank non-click |
| editorial pairs | 2,000 | curated similar title | same broad genre, wrong tone |
| synthetic pairs | 3,000 | generated from metadata | filtered by reviewer |
Training objective:
loss = -log( exp(sim(query, positive) / tau)
             / sum_i exp(sim(query, candidate_i) / tau) )
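The objective above is the standard InfoNCE contrastive loss. A minimal NumPy sketch, assuming unit-normalized embeddings so `sim` reduces to a dot product (the function name and shapes are illustrative, not from the production code):

```python
import numpy as np

def info_nce_loss(query, positive, negatives, tau=0.07):
    """InfoNCE: pull the positive toward the query, push negatives away.

    query:     (d,)   unit-normalized query embedding
    positive:  (d,)   unit-normalized positive embedding
    negatives: (n, d) unit-normalized hard-negative embeddings
    """
    candidates = np.vstack([positive, negatives])  # positive sits at index 0
    logits = candidates @ query / tau              # sim(query, c_i) / tau
    logits -= logits.max()                         # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                       # -log p(positive | candidates)
```

With tau = 0.07 the softmax is sharp, so a positive that is even slightly closer than the hard negatives drives the loss near zero; raising tau softens the gradient signal.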
Recommended first run:
| Setting | Value |
|---|---|
| Adapter | 1024 -> 512 -> 1024 MLP with LayerNorm |
| Loss | InfoNCE |
| Temperature | 0.07 |
| Batch size | 128 |
| Hard negatives | 7 per query |
| Epochs | 4 |
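The adapter row above fixes only the shape (1024 -> 512 -> 1024 with LayerNorm). A minimal forward-pass sketch; the residual connection and ReLU activation are wiring assumptions, and the random weights stand in for trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 1024, 512
W1, b1 = rng.normal(0, 0.02, (D, H)), np.zeros(H)  # placeholder for trained weights
W2, b2 = rng.normal(0, 0.02, (H, D)), np.zeros(D)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def adapter(x):
    """1024 -> 512 -> 1024 MLP with LayerNorm on top of the Titan vector.
    The residual path (an assumption) keeps the base geometry recoverable."""
    h = np.maximum(0.0, x @ W1 + b1)   # ReLU bottleneck
    out = layer_norm(x + h @ W2 + b2)  # residual + LayerNorm
    return out / np.linalg.norm(out)   # re-normalize for cosine retrieval
```

Because the base Titan encoder stays frozen, only W1, b1, W2, b2 train, which is what keeps the latency add-on within the 2 ms gate.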
Promotion gate:
| Metric | Baseline | Candidate gate |
|---|---|---|
| Recall@3 | 0.68 | >= 0.80 |
| NDCG@10 | 0.71 | >= 0.78 |
| Bad top-1 rate | 14% | <= 8% |
| Embedding latency add-on | 0 ms | <= 2 ms |
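The gate is a conjunction: every threshold in the table must hold at once. A small sketch of the check, with metric key names chosen here for illustration:

```python
def passes_promotion_gate(metrics):
    """All candidate-gate thresholds from the table must hold simultaneously.
    Rates are fractions (0.08 == 8%), latency in milliseconds."""
    return (
        metrics["recall_at_3"] >= 0.80
        and metrics["ndcg_at_10"] >= 0.78
        and metrics["bad_top1_rate"] <= 0.08
        and metrics["latency_add_ms"] <= 2.0
    )
```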
Scenario 2 - Query Rewriting Feedback Loop
Some failures are caused by query wording, not the catalog vectors. Keep a rejected-query bucket fed by the intent classifier and the retrieval layer; add a query when any of the following holds:
- the top retrieval score falls below threshold,
- the user reformulates within 30 seconds,
- the user clicks none of the top 10 results,
- the message uses manga-specific slang or romanized Japanese.
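The bucketing rule above can be sketched as a single predicate. The score threshold and the slang list are hypothetical placeholders, not values from the source:

```python
SLANG_TERMS = {"iyashikei", "isekai", "seinen", "josei"}  # illustrative subset

def should_bucket(top_score, reformulated_within_s, clicked_top10, tokens,
                  score_threshold=0.35):
    """Flag a query for the rejected-query bucket when any signal fires.
    score_threshold=0.35 is a hypothetical default."""
    return (
        top_score < score_threshold
        or (reformulated_within_s is not None and reformulated_within_s <= 30)
        or not clicked_top10
        or any(t in SLANG_TERMS for t in tokens)
    )
```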
Cluster those failures and create new training pairs. Example cluster:
"slow burn rivals to lovers"
"enemies to lovers manga but not fantasy"
"rivals become couple manga"
Action: add positives from romance subgenre metadata and hard negatives from action-rivalry titles without romance.
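The cluster-to-pair action can be sketched as a small pair builder. The tag names (`romance`, `rivals`, `rivalry`) and catalog record shape are illustrative assumptions:

```python
def build_pairs(cluster_queries, catalog):
    """For each failed query in a cluster, emit (query, positive, hard_negative)
    triples: positives from romance-subgenre metadata, hard negatives from
    rivalry titles without romance. Tag names are illustrative."""
    positives = [t for t in catalog
                 if "romance" in t["genres"] and "rivals" in t["themes"]]
    hard_negatives = [t for t in catalog
                      if "rivalry" in t["themes"] and "romance" not in t["genres"]]
    return [(q, p["title"], n["title"])
            for q in cluster_queries
            for p in positives
            for n in hard_negatives]
```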
Scenario 3 - Metadata-Aware Retrieval
For titles with sparse descriptions, create enriched product text:
title + author + demographic + genre + tone + themes + age rating + availability
Fine-tune the adapter on the enriched text, but evaluate on live user queries. This prevents the model from winning only on artificial metadata similarity.
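A minimal sketch of the enrichment template, concatenating the fields listed above and skipping any that are missing (field names follow the template; the separator is an assumption):

```python
def enrich(product):
    """Build enriched retrieval text from product metadata.
    List-valued fields are sorted and joined for deterministic output."""
    fields = ["title", "author", "demographic", "genre", "tone",
              "themes", "age_rating", "availability"]
    parts = []
    for f in fields:
        v = product.get(f)
        if isinstance(v, (list, tuple, set)):
            v = ", ".join(sorted(v))
        if v:
            parts.append(str(v))
    return " | ".join(parts)
```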
Production Logs To Add
{
"event": "retrieval_eval",
"query": "dark fantasy like Berserk but less brutal",
"intent": "recommendation",
"top3_titles": ["Claymore", "Vinland Saga", "Vagabond"],
"recall_at_3_hit": true,
"adapter_version": "embed-adapter-v03",
"latency_ms": 3.1
}
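The `recall_at_3_hit` field in the log above can be computed at logging time. A minimal sketch, assuming a per-query relevance set is available from clicks or editorial labels:

```python
def recall_at_3_hit(top3_titles, relevant_titles):
    """True if any of the top-3 retrieved titles is in the relevance set."""
    return any(t in relevant_titles for t in top3_titles)
```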
Failure Modes
| Failure | Detection | Fix |
|---|---|---|
| popularity collapse | top results repeat best sellers | add popularity-balanced negatives |
| synthetic drift | synthetic queries outperform real logs | cap synthetic share below 20% |
| metadata leakage | eval positives share exact tags | evaluate on held-out real queries |
| over-narrow retrieval | same subgenre only | add diverse positives within user taste |
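The "top results repeat best sellers" detector in the first row can be approximated with a simple statistic over a sample of queries; the alerting threshold would be a tuning choice, not something the source specifies:

```python
from collections import Counter

def popularity_collapse_rate(result_lists):
    """Share of queries whose top-1 result is the single most common top-1
    title in the sample; values near 1.0 signal popularity collapse."""
    top1 = [r[0] for r in result_lists if r]
    if not top1:
        return 0.0
    _, count = Counter(top1).most_common(1)[0]
    return count / len(top1)
```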
Final Decision
Ship an embedding adapter only if it improves retrieval on real user queries without adding latency that threatens the 15 ms intent-routing budget or downstream response time. For MangaAssist, Recall@3 and NDCG@10 matter more than raw embedding loss.