Training MLOps Scenarios - MangaAssist

This companion document turns the training infrastructure topic into a MangaAssist operating playbook. The goal is repeatable fine-tuning, validation, release, monitoring, and rollback across the full chatbot model stack.

Models In Scope

| Model | Cadence | Main gate |
|---|---|---|
| intent classifier | weekly or drift-triggered | accuracy, rare recall, business harm, latency |
| embedding adapter | monthly | Recall@3, NDCG@10 |
| cross-encoder reranker | monthly | MRR@10, latency |
| sentiment detector | weekly | escalation recall |
| LoRA/DPO response model | quarterly or quality-triggered | human preference and factuality |
| RAFT answer model | policy/catalog-triggered | grounded accuracy |

Scenario 1 - Weekly Router Release

Pipeline:

```mermaid
flowchart TD
    A[Collect labeled logs] --> B[Data validation]
    B --> C[Train candidate]
    C --> D[Offline eval]
    D --> E[Latency test]
    E --> F[Shadow deploy]
    F --> G{Promotion gates pass?}
    G -- yes --> H[Blue-green release]
    G -- no --> I[Keep champion and open error review]
```
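The flow above can be sketched as a gated pipeline driver. This is a minimal sketch with stubbed stages; all function names and thresholds are hypothetical, not MangaAssist's actual pipeline code.

```python
# Minimal sketch of the weekly router release flow, with stubbed stages.
# All names and thresholds are hypothetical placeholders.

def data_validation(rows):
    # Reject empty batches or rows missing a gold label.
    return bool(rows) and all("label" in r for r in rows)

def gates_pass(eval_report, latency_report, shadow_report):
    # Simplified gate: the full promotion table also covers rare-class
    # accuracy, business harm, and escalation misses.
    return (eval_report["accuracy_delta"] >= -0.002
            and latency_report["p95_ms"] < 15.0
            and shadow_report["disagreement_rate"] <= 0.05)

def run_weekly_release(rows, champion, train, offline_eval,
                       latency_test, shadow_deploy,
                       blue_green_release, open_error_review):
    if not data_validation(rows):
        raise ValueError("data validation failed")
    candidate = train(rows)                     # Train candidate
    eval_report = offline_eval(candidate)       # Offline eval
    latency_report = latency_test(candidate)    # Latency test
    shadow_report = shadow_deploy(candidate)    # Shadow deploy
    if gates_pass(eval_report, latency_report, shadow_report):
        blue_green_release(candidate)           # Blue-green release
        return candidate
    open_error_review(candidate)                # Keep champion, open review
    return champion
```

Note the asymmetric ending: a failed gate never blocks serving, it simply keeps the champion live and opens an error review.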

Required artifacts:

  • dataset version,
  • tokenizer hash,
  • training config,
  • model artifact,
  • eval report,
  • confusion matrix,
  • latency report,
  • rollback pointer.
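One way to make the artifact list enforceable is to bundle it into a single release manifest. The sketch below is illustrative only: every path, hash, and config value is a hypothetical placeholder.

```python
# Hypothetical release manifest bundling the required artifacts for one
# router candidate. All paths and values are illustrative.
import hashlib
import json

def artifact_hash(payload: bytes) -> str:
    # Content hash so any artifact swap is detectable at release time.
    return hashlib.sha256(payload).hexdigest()[:12]

manifest = {
    "dataset_version": "intent-data-2026-04-20",
    "tokenizer_hash": artifact_hash(b"tokenizer.json contents"),
    "training_config": {"lr": 3e-5, "epochs": 3, "seed": 17},
    "model_artifact": "s3://models/intent-distilbert/v16/model.safetensors",
    "eval_report": "reports/v16/eval.json",
    "confusion_matrix": "reports/v16/confusion.png",
    "latency_report": "reports/v16/latency.json",
    "rollback_pointer": "intent-distilbert/v15",
}
print(json.dumps(manifest, indent=2))
```

A release job can then refuse to proceed if any manifest key is missing, instead of relying on humans to remember all eight artifacts.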

Promotion gate:

| Gate | Rule |
|---|---|
| overall accuracy | candidate >= champion - 0.2 points |
| rare-class accuracy | no critical regression |
| business-weighted harm | improves or remains within threshold |
| escalation miss rate | no increase |
| P95 latency | under 15 ms |
| shadow disagreement | reviewed if over threshold |
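The gate table translates almost one-to-one into a single pass/fail function. The thresholds below are assumptions (0.2 points read as 0.002 on a 0-1 accuracy scale; the rare-class and harm tolerances are invented for illustration), not MangaAssist's real values.

```python
# One-to-one sketch of the promotion gate table. Thresholds are
# hypothetical; real values would live in versioned config.

def promotion_gates(candidate, champion, shadow_disagreement_reviewed):
    checks = {
        # overall accuracy: candidate >= champion - 0.2 points
        "overall_accuracy": candidate["accuracy"] >= champion["accuracy"] - 0.002,
        # rare-class accuracy: no critical regression (tolerance assumed)
        "rare_class": candidate["rare_accuracy"] >= champion["rare_accuracy"] - 0.01,
        # business-weighted harm: improves or remains within threshold (assumed 0.05)
        "business_harm": candidate["weighted_harm"] <= max(champion["weighted_harm"], 0.05),
        # escalation miss rate: no increase
        "escalation_miss": candidate["escalation_miss"] <= champion["escalation_miss"],
        # P95 latency under 15 ms
        "p95_latency": candidate["p95_latency_ms"] < 15.0,
        # shadow disagreement: must have been reviewed if over threshold
        "shadow_reviewed": shadow_disagreement_reviewed,
    }
    failed = [name for name, ok in checks.items() if not ok]
    return not failed, failed
```

Returning the list of failed gates, not just a boolean, is what makes the "open error review" branch actionable.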

Scenario 2 - Cross-Model Release Coordination

The embedding adapter and reranker should not be released independently: the reranker's quality depends on the candidate distribution the adapter produces, so their metrics are tightly coupled.

Example:

  • embedding adapter changes candidate distribution,
  • reranker was trained on old candidate distribution,
  • final top-3 quality drops despite separate offline wins.

Solution:

  • evaluate retrieval plus reranking together,
  • version compatible model pairs,
  • run end-to-end catalog search validation.
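A minimal way to enforce paired releases is to gate on an explicit compatibility set plus the joint end-to-end metric. The version names and metric values below are hypothetical.

```python
# Sketch of versioning compatible retrieval pairs and gating on a joint
# end-to-end metric instead of separate offline wins. All names are
# hypothetical.

COMPATIBLE_PAIRS = {
    ("embed-adapter-v7", "reranker-v4"),
    ("embed-adapter-v8", "reranker-v5"),  # reranker retrained on v8's candidates
}

def can_release(embedder: str, reranker: str,
                joint_recall_at_3: float, champion_recall_at_3: float) -> bool:
    # Release only a pair that was validated together, and only if the
    # joint top-3 quality does not regress versus the live pair.
    return ((embedder, reranker) in COMPATIBLE_PAIRS
            and joint_recall_at_3 >= champion_recall_at_3)
```

This makes the failure mode from the example structurally impossible: a new adapter with an old reranker is simply not a releasable pair.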

Scenario 3 - Model Registry Discipline

Every MangaAssist model should have a champion/challenger state.

Registry fields:

```json
{
  "model_name": "intent-distilbert",
  "version": "v15",
  "dataset_version": "intent-data-2026-04-20",
  "training_code_sha": "abc123",
  "metrics": {
    "accuracy": 0.922,
    "rare_accuracy": 0.889,
    "p95_latency_ms": 12.1
  },
  "status": "shadow"
}
```

Failure Modes

| Failure | Detection | Fix |
|---|---|---|
| train/serve skew | passes offline, fails live | tokenizer and preprocessing hash checks |
| silent data leakage | validation metrics look too good | group split by conversation/user |
| untracked manual model | no reproducibility | require a registry entry before deployment |
| shadow ignored | bad model promoted anyway | enforce the promotion checklist |
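The first fix, hash-checking the preprocessing stack, can be sketched as a fingerprint recorded at training time and re-derived at serving time. Function names here are hypothetical.

```python
# Sketch of a tokenizer/preprocessing hash check: serving refuses to run
# a model whose preprocessing fingerprint differs from the one recorded
# at training time. Names are hypothetical.
import hashlib

def preprocessing_fingerprint(tokenizer_bytes: bytes, config: dict) -> str:
    h = hashlib.sha256()
    h.update(tokenizer_bytes)
    # Sort keys so the same config always hashes identically.
    for key in sorted(config):
        h.update(f"{key}={config[key]}".encode())
    return h.hexdigest()

def assert_no_skew(trained_fp: str, serving_fp: str) -> None:
    if trained_fp != serving_fp:
        raise RuntimeError("train/serve skew: preprocessing fingerprint mismatch")
```

Failing loudly at load time turns a silent live-quality drop into an immediate, attributable deploy error.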

Final Decision

For MangaAssist, MLOps is part of model quality. A model is not done when it trains; it is done when it has a traceable dataset, repeatable pipeline, measurable gates, shadow evidence, and rollback safety.