Knowledge Distillation Scenarios - MangaAssist

Knowledge distillation trains a smaller student model to mimic a stronger teacher. For MangaAssist, this is useful when a high-quality model is too slow or expensive for every request.
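In its classic form, the student is trained to match the teacher's temperature-softened output distribution while still seeing the hard labels. A minimal PyTorch-style sketch of that objective (the function name, alpha, and temperature are illustrative, not MangaAssist code):

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft term: match the teacher's softened distribution (KL divergence).
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)  # scale so gradients stay comparable across temperatures
    # Hard term: ordinary cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard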

When This Topic Matters

Use distillation when:

  • a managed LLM gives great answers but is expensive,
  • a cross-encoder reranker is accurate but slow,
  • the intent classifier needs a smaller mobile or CPU variant,
  • you want teacher labels for unlabeled production logs.

Scenario 1 - Response Model Distillation

Teacher:

  • high-quality managed model,
  • retrieval-grounded prompt,
  • strict MangaAssist answer rubric.

Student:

  • smaller self-hosted model,
  • trained on teacher responses plus human-reviewed production examples.

Dataset:

Source                        Count   Purpose
production prompts            25,000  realistic user distribution
teacher responses             25,000  target answer behavior
human-corrected responses      5,000  keep teacher mistakes out of training
refusal/escalation examples    2,000  safety and support behavior
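One way to assemble that mix is a simple weighted merge into a single JSONL training file. The paths, field layout, and caps below are hypothetical; they only make the table's proportions concrete:

import json
import random

# Hypothetical per-source files and budgets mirroring the table above.
SOURCES = [
    ("teacher_responses.jsonl", 25_000),   # prompt + teacher answer
    ("human_corrected.jsonl", 5_000),      # prompt + reviewed answer
    ("refusal_escalation.jsonl", 2_000),   # safety / support behavior
]

examples = []
for path, cap in SOURCES:
    with open(path) as f:
        rows = [json.loads(line) for line in f]
    examples.extend(rows[:cap])  # enforce the per-source budget

random.shuffle(examples)
with open("distill_train.jsonl", "w") as out:
    for example in examples:
        out.write(json.dumps(example) + "\n")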

Promotion gate:

Metric                          Gate
teacher preference match        >= 85%
human win rate vs base student  >= 65%
catalog hallucination rate      <= 4%
cost per 1K responses           at least 50% lower than the teacher
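These gates are easy to enforce as a single check before promotion. A sketch, with thresholds copied from the table and metric names assumed:

def passes_promotion_gate(metrics: dict) -> bool:
    """Return True only if every distillation gate in the table is met."""
    return all([
        metrics["teacher_preference_match"] >= 0.85,
        metrics["human_win_rate"] >= 0.65,
        metrics["hallucination_rate"] <= 0.04,
        metrics["cost_reduction"] >= 0.50,  # at least 50% cheaper than the teacher
    ])

# Illustrative call; the values here are made up, not real eval output.
promote = passes_promotion_gate({
    "teacher_preference_match": 0.88,
    "human_win_rate": 0.67,
    "hallucination_rate": 0.031,
    "cost_reduction": 0.58,
})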

Scenario 2 - Distilled Reranker

Train a small reranker from a larger cross-encoder teacher.

Teacher output:

{
  "query": "romance manga with adult cast",
  "candidate": "Nana",
  "teacher_score": 0.94
}
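Generating those teacher scores is typically a batch job over (query, candidate) pairs. A sketch using the sentence-transformers CrossEncoder API; the model name is a stand-in, not MangaAssist's actual teacher, and depending on the model the raw scores may need a sigmoid to land in [0, 1]:

from sentence_transformers import CrossEncoder

teacher = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # example teacher

query = "romance manga with adult cast"
candidates = ["Nana", "Chihayafuru", "Yotsuba&!"]

# One teacher_score per (query, candidate) pair, ready to log as a training target.
scores = teacher.predict([(query, c) for c in candidates])
for candidate, score in zip(candidates, scores):
    print({"query": query, "candidate": candidate, "teacher_score": float(score)})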

Student objective:

loss = alpha * supervised_relevance_loss
     + (1 - alpha) * mse(student_score, teacher_score)

Use this when the teacher improves NDCG but cannot meet latency targets.
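A minimal PyTorch sketch of that objective, assuming the student produces one relevance logit per (query, candidate) pair and the supervised labels are binary; alpha and the names are illustrative:

import torch
import torch.nn.functional as F

def reranker_distill_loss(student_logits, hard_labels, teacher_scores, alpha=0.5):
    # Supervised term: binary relevance against human or click labels.
    supervised = F.binary_cross_entropy_with_logits(student_logits, hard_labels)
    # Distillation term: regress the student's probability onto the teacher score.
    distill = F.mse_loss(torch.sigmoid(student_logits), teacher_scores)
    return alpha * supervised + (1 - alpha) * distill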

Scenario 3 - Intent Soft-Label Distillation

The intent classifier can be distilled from an ensemble teacher:

  • DistilBERT fine-tuned classifier,
  • rules for high-risk escalation phrases,
  • LLM adjudicator on ambiguous labels.

Soft labels preserve ambiguity:

return_request: 0.55
order_tracking: 0.30
checkout_help: 0.10
faq: 0.05

This teaches the student that many MangaAssist messages are not cleanly one-hot.
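Training against these soft labels is a cross-entropy between the student's log-probabilities and the ensemble distribution, blended with the hard label so rare intents are not washed out (see the failure-mode table below). A sketch, using the distribution from the example above and illustrative names:

import torch
import torch.nn.functional as F

INTENTS = ["return_request", "order_tracking", "checkout_help", "faq"]

def soft_label_loss(student_logits, soft_targets, hard_labels, blend=0.7):
    # Soft term: match the ensemble's full distribution, not just its argmax.
    soft = F.kl_div(F.log_softmax(student_logits, dim=-1), soft_targets,
                    reduction="batchmean")
    # Hard term: protects rare intents from being diluted by dark knowledge.
    hard = F.cross_entropy(student_logits, hard_labels)
    return blend * soft + (1 - blend) * hard

# The soft labels shown above, as a one-row target distribution.
soft_targets = torch.tensor([[0.55, 0.30, 0.10, 0.05]])
hard_labels = torch.tensor([INTENTS.index("return_request")])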

Failure Modes

Failure                            Detection                     Fix
teacher hallucinations copied      student repeats wrong facts   human filter and retrieval-grounded teacher prompts
student too small                  high loss and weak eval       increase size or use LoRA
dark knowledge hurts rare classes  rare recall drops             blend hard and soft labels
cost win too small                 serving cost barely changes   quantize student or simplify architecture

Production Log

{
  "event": "distilled_model_eval",
  "student": "manga-student-8b-v03",
  "teacher": "managed-teacher-v07",
  "human_win_rate": 0.67,
  "hallucination_rate": 0.031,
  "cost_reduction": 0.58
}

Final Decision

Distillation is worth it for MangaAssist when the teacher's behavior can be captured by a cheaper model without copying its mistakes. Always keep a human-reviewed set for policy, escalation, and factuality.