# Knowledge Distillation Scenarios - MangaAssist
Knowledge distillation trains a smaller student model to mimic a stronger teacher. For MangaAssist, this is useful when a high-quality model is too slow or expensive for every request.
## When This Topic Matters
Use distillation when:
- a managed LLM gives great answers but is expensive,
- a cross-encoder reranker is accurate but slow,
- the intent classifier needs a smaller mobile or CPU variant,
- you want teacher labels for unlabeled production logs.
## Scenario 1 - Response Model Distillation
Teacher:
- high-quality managed model,
- retrieval-grounded prompt,
- strict MangaAssist answer rubric.
Student:
- smaller self-hosted model,
- trained on teacher responses plus human-reviewed production examples.
Dataset:
| Source | Count | Purpose |
|---|---|---|
| production prompts | 25,000 | realistic user distribution |
| teacher responses | 25,000 | target answer behavior |
| human-corrected responses | 5,000 | prevent teacher mistakes from being copied |
| refusal/escalation examples | 2,000 | safety and support behavior |
Promotion gate:
| Metric | Gate |
|---|---|
| teacher preference match | >= 85% |
| human win rate vs base student | >= 65% |
| catalog hallucination rate | <= 4% |
| cost per 1K responses | at least 50% lower |
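The promotion gate above can be encoded as a simple automated check before a student model ships. This is a minimal sketch: the metric keys and the `metrics` dict shape are illustrative assumptions, not a fixed MangaAssist schema.

```python
# Scenario 1 promotion gate as data: metric name -> (direction, threshold).
# "min" means the metric must be at least the threshold; "max" means at most.
GATES = {
    "teacher_preference_match": ("min", 0.85),
    "human_win_rate_vs_base": ("min", 0.65),
    "catalog_hallucination_rate": ("max", 0.04),
    "cost_reduction": ("min", 0.50),  # at least 50% cheaper per 1K responses
}

def passes_promotion_gate(metrics: dict) -> bool:
    """Return True only if every gate is satisfied."""
    for name, (direction, threshold) in GATES.items():
        value = metrics[name]
        if direction == "min" and value < threshold:
            return False
        if direction == "max" and value > threshold:
            return False
    return True
```

Keeping the thresholds in one table-like dict mirrors the gate table and makes it easy to diff when the bar is raised.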
## Scenario 2 - Distilled Reranker
Train a small reranker from a larger cross-encoder teacher.
Teacher output:
```json
{
  "query": "romance manga with adult cast",
  "candidate": "Nana",
  "teacher_score": 0.94
}
```
Student objective:
```
loss = alpha * supervised_relevance_loss
     + (1 - alpha) * mse(student_score, teacher_score)
```
Use this when the teacher improves NDCG but cannot meet latency targets.
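The student objective above can be sketched as plain Python. The supervised term is written here as pointwise binary cross-entropy, which is an assumption; any pointwise relevance loss slots into the same formula.

```python
import math

def bce(pred: float, label: float, eps: float = 1e-7) -> float:
    """Pointwise binary cross-entropy against a relevance label in {0, 1}."""
    pred = min(max(pred, eps), 1 - eps)
    return -(label * math.log(pred) + (1 - label) * math.log(1 - pred))

def distill_loss(student_score: float, teacher_score: float,
                 label: float, alpha: float = 0.5) -> float:
    """alpha * supervised relevance loss + (1 - alpha) * MSE to the teacher.

    alpha=1.0 ignores the teacher; alpha=0.0 is pure score matching.
    """
    supervised = bce(student_score, label)
    mse = (student_score - teacher_score) ** 2
    return alpha * supervised + (1 - alpha) * mse
```

In practice this runs per (query, candidate) pair, with `teacher_score` taken from records like the example above.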
## Scenario 3 - Intent Soft-Label Distillation
The intent classifier can distill from an ensemble:
- DistilBERT fine-tuned classifier,
- rules for high-risk escalation phrases,
- LLM adjudicator on ambiguous labels.
Soft labels preserve ambiguity:
```
return_request: 0.55
order_tracking: 0.30
checkout_help:  0.10
faq:            0.05
```
This teaches the student that many MangaAssist messages are not cleanly one-hot.
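Training on soft labels can be sketched as cross-entropy against the teacher distribution rather than a one-hot target. The intent names mirror the example above; the distribution shapes are illustrative assumptions.

```python
import math

def soft_label_loss(student_probs: dict, teacher_probs: dict) -> float:
    """Cross-entropy H(teacher, student): minimized when the student
    reproduces the teacher's full distribution, not just its argmax."""
    eps = 1e-12
    return -sum(p * math.log(student_probs.get(intent, eps) + eps)
                for intent, p in teacher_probs.items())

# Ensemble soft labels from the example above.
teacher = {"return_request": 0.55, "order_tracking": 0.30,
           "checkout_help": 0.10, "faq": 0.05}
```

A student that collapses to a near-one-hot `return_request` prediction pays a higher loss here than one that preserves the ambiguity, which is exactly the behavior the ensemble is trying to transfer.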
## Failure Modes
| Failure | Detection | Fix |
|---|---|---|
| teacher hallucinations copied | student repeats wrong facts | human filtering and retrieval-grounded teacher prompts |
| student too small | high loss and weak eval | increase size or use LoRA |
| dark knowledge hurts rare classes | rare recall drops | blend hard labels and soft labels |
| cost win too small | serving cost barely changes | quantize student or simplify architecture |
## Production Log
```json
{
  "event": "distilled_model_eval",
  "student": "manga-student-8b-v03",
  "teacher": "managed-teacher-v07",
  "human_win_rate": 0.67,
  "hallucination_rate": 0.031,
  "cost_reduction": 0.58
}
```
## Final Decision
Distillation is worth it for MangaAssist when the teacher's behavior can be captured by a cheaper model without copying its mistakes. Always keep a human-reviewed set for policy, escalation, and factuality.