# Mixture of Experts Scenarios - MangaAssist
Mixture of Experts, or MoE, routes each request to one or more specialized expert models. For MangaAssist, MoE is useful when different traffic types need different behavior: manga recommendations, support policies, checkout help, and human escalation are not the same task.
## When This Topic Matters
Use MoE when one model shows conflicting behavior across workflows:
- recommendation responses need rich taste matching,
- support responses need strict procedure,
- escalation requires safe handoff language,
- genre-heavy queries need domain nuance,
- chitchat should stay short and not trigger expensive retrieval.
## Scenario 1 - Workflow Expert Routing
Experts:
| Expert | Trigger | Model behavior |
|---|---|---|
| catalog_reco | product discovery, recommendation | taste matching and title comparison |
| product_qa | product questions | metadata-grounded answers |
| commerce_support | order, return, checkout | policy-grounded procedural answers |
| escalation_safety | frustration or human handoff | empathetic escalation |
| smalltalk | chitchat | lightweight conversational response |
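The expert table can be mirrored in serving code as a simple dispatch map. This is a minimal sketch: the expert names come from the table above, but the handler bodies are hypothetical placeholders, not MangaAssist's real implementations.

```python
# Sketch: dispatch map from expert name to a handler stub.
# Expert names follow the table above; handler bodies are placeholders.
EXPERT_HANDLERS = {
    "catalog_reco": lambda q: f"[taste-matched recommendations for: {q}]",
    "product_qa": lambda q: f"[metadata-grounded answer for: {q}]",
    "commerce_support": lambda q: f"[policy-grounded procedure for: {q}]",
    "escalation_safety": lambda q: f"[empathetic handoff for: {q}]",
    "smalltalk": lambda q: f"[short reply to: {q}]",
}

def dispatch(expert, query):
    # Unknown expert names fall back to smalltalk rather than erroring,
    # which keeps a misconfigured router from failing the request outright.
    handler = EXPERT_HANDLERS.get(expert, EXPERT_HANDLERS["smalltalk"])
    return handler(query)
```

A real system would call model endpoints here; the fallback-to-cheapest-expert choice mirrors the document's preference for safe defaults.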
Router inputs:
- intent classifier probabilities,
- sentiment/frustration score,
- retrieval confidence,
- top-2 intent margin,
- business risk score.
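These inputs can be combined into a routing decision along the following lines. This is an illustrative sketch: the intent names, thresholds (0.7, 0.8, 0.15, 0.5, 0.3), and the intent-to-expert mapping are assumptions, not tuned production values.

```python
# Assumed mapping from classifier intents to experts (illustrative).
INTENT_TO_EXPERT = {
    "recommendation": "catalog_reco",
    "product_question": "product_qa",
    "order_support": "commerce_support",
    "chitchat": "smalltalk",
}

def route(intent_probs, frustration, retrieval_conf, business_risk):
    # Rank intents and compute the top-2 margin from the router inputs.
    ranked = sorted(intent_probs.items(), key=lambda kv: kv[1], reverse=True)
    top_intent, top_p = ranked[0]
    margin = top_p - (ranked[1][1] if len(ranked) > 1 else 0.0)

    # High business risk or frustration overrides intent routing entirely.
    if business_risk > 0.7 or frustration > 0.8:
        return "escalation_safety"

    # Ambiguous intent (small top-2 margin) or weak retrieval support
    # falls back to a safe default instead of a specialized expert.
    if margin < 0.15 or (top_intent == "product_question" and retrieval_conf < 0.5):
        return "commerce_support" if business_risk > 0.3 else "smalltalk"

    return INTENT_TO_EXPERT.get(top_intent, "smalltalk")
```

The override order matters: risk signals beat intent probabilities, and ambiguity beats specialization.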
Promotion gate:
| Metric | Gate |
|---|---|
| routing accuracy by workflow | >= 95% |
| expert load imbalance | no expert over 45% unless expected |
| high-risk misroute rate | <= 0.2% |
| cost per conversation | <= champion + 5% |
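The gate table translates directly into a promotion check. A minimal sketch, assuming the metric key names shown below; the thresholds are taken from the table.

```python
def passes_promotion_gates(metrics, champion_cost):
    # Sketch: check a challenger's metrics against the promotion gates
    # in the table above. Metric key names are assumed, thresholds are
    # the table's values. Returns a list of failures; empty means pass.
    failures = []
    if metrics["routing_accuracy"] < 0.95:
        failures.append("routing accuracy below 95%")
    if metrics["max_expert_load"] > 0.45 and not metrics.get("imbalance_expected"):
        failures.append("expert load over 45% without justification")
    if metrics["high_risk_misroute_rate"] > 0.002:
        failures.append("high-risk misroute rate above 0.2%")
    if metrics["cost_per_conversation"] > champion_cost * 1.05:
        failures.append("cost per conversation above champion + 5%")
    return failures
```

Returning the full failure list, rather than a bare boolean, makes gate reports actionable.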
## Scenario 2 - Genre Expert Adapters
For recommendation-heavy traffic, the router can select genre adapters:
- shonen,
- shojo,
- seinen,
- josei,
- horror,
- romance,
- slice-of-life,
- isekai.
Keep this behind a strong fallback. If genre confidence is low, use the general recommendation expert rather than forcing a niche expert.
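The fallback rule can be sketched as a small guard in front of adapter selection. The 0.75 confidence threshold and the `catalog_reco` fallback name are assumptions; the genre list comes from the bullets above.

```python
# Genre adapters from the list above.
GENRE_ADAPTERS = {"shonen", "shojo", "seinen", "josei",
                  "horror", "romance", "slice-of-life", "isekai"}

def pick_reco_expert(genre, genre_conf, threshold=0.75):
    # Sketch: only route to a niche genre adapter when the genre signal
    # is confident; otherwise fall back to the general recommendation
    # expert. The 0.75 threshold is an illustrative assumption.
    if genre in GENRE_ADAPTERS and genre_conf >= threshold:
        return f"{genre}_adapter"
    return "catalog_reco"
```

Unknown genres take the same fallback path as low-confidence ones, so a niche expert is never forced.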
## Scenario 3 - Sparse Expert Serving
MoE can reduce cost if only one or two experts run per request.
Decision rule (sketched as runnable Python; expert names follow Scenario 1):

```python
def select_experts(business_risk_high, intent_confident, genre_confident, genre=""):
    # Sparse serving: activate only the experts a request needs.
    if business_risk_high:
        return ["commerce_support", "escalation_safety"]
    if intent_confident and genre_confident:
        return ["catalog_reco", f"{genre}_adapter"]  # one genre expert
    return ["catalog_reco"]  # general recommendation expert, no adapter
```

## Failure Modes
| Failure | Detection | Fix |
|---|---|---|
| wrong expert selected | answer quality is fine but it belongs to the wrong workflow | improve the router and add fallback confidence thresholds |
| load collapse | one expert gets all traffic | add load balancing loss |
| expert disagreement | two experts conflict | add arbitration policy |
| too much complexity | ops overhead exceeds gain | use simpler adapter routing |
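For the load-collapse row, the standard fix is an auxiliary load-balancing loss of the kind used in sparse MoE training (Switch-Transformer style). A minimal numeric sketch, not MangaAssist's training code:

```python
def load_balancing_loss(expert_fractions, router_prob_means):
    # Sketch of the standard auxiliary load-balancing loss:
    #   N * sum_i f_i * P_i
    # where f_i is the fraction of requests routed to expert i and
    # P_i is the mean router probability assigned to expert i.
    # The loss is minimized (value 1.0) when both are uniform, and
    # grows as traffic collapses onto few experts.
    n = len(expert_fractions)
    return n * sum(f * p for f, p in zip(expert_fractions, router_prob_means))
```

Adding this term to the router's training objective penalizes the degenerate one-expert solution directly.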
## Production Log
```json
{
  "event": "moe_route",
  "intent": "recommendation",
  "genre_hint": "seinen",
  "selected_experts": ["catalog_reco", "seinen_adapter"],
  "router_confidence": 0.88,
  "business_risk": 0.12,
  "latency_ms": 917
}
```
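Route events in this shape feed the load-imbalance gate directly. A sketch of the aggregation, assuming one JSON event per log line with the fields shown above:

```python
import json
from collections import Counter

def expert_load(log_lines):
    # Sketch: aggregate moe_route events (one JSON object per line,
    # shaped like the sample above) into per-expert traffic shares,
    # to monitor the 45% load gate. Non-route events are skipped.
    counts = Counter()
    total = 0
    for line in log_lines:
        event = json.loads(line)
        if event.get("event") != "moe_route":
            continue
        for expert in event["selected_experts"]:
            counts[expert] += 1
        total += 1
    # Shares are per request, so multi-expert requests count once each.
    return {e: c / total for e, c in counts.items()} if total else {}
```

Because a request can activate several experts, shares can exceed 1.0 in sum; each expert's share is still its fraction of requests touched.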
## Final Decision
MoE is powerful for MangaAssist only after simpler routing, LoRA adapters, and calibrated intent probabilities are mature. Use it when specialization clearly improves quality without making the production system hard to reason about.