Intent Classification — Folder Index
The MangaAssist intent classifier is a DistilBERT-based 10-way softmax router that gates every request before it reaches the recommendation engine, FAQ lookup, order system, or escalation queue. It is the gold-standard reference folder in this curriculum: every other Tier-1 topic folder mirrors its 8-file pattern (see ../SCENARIO_TEMPLATE.md).
This folder contains the main technique doc, seven research-grade deep-dives, and this README. Each deep-dive owns a slice of the routing problem (numerical proof, calibration, business cost, dry-run, multi-intent, OOD, new-intent discovery). Together they form one coherent, production-ready intent-routing system.
Reading Order
Land here, then follow the sequence below. Each step builds on the previous.
| # | File | Persona | Why read it next |
|---|---|---|---|
| 1 | 01-intent-classifier-fine-tuning.md | Priya + Aiko | start here — theory, math, architecture, training code, ablations, comparative methods, segment-wise results, failure tree |
| 2 | 01-fine_tuning_dry_run_mangaassist.md | Jordan | reproduce the result — manifest, error-injection tests, gate-failure tree |
| 3 | 01-fine_tuning_numerical_worked_examples_mangaassist.md | Aiko | every metric in concrete arithmetic; bootstrap CI procedure |
| 4 | 01-confidence_calibration_for_intent_routing_mangaassist.md | Aiko + Jordan | trust the probabilities — temperature scaling vs alternatives, T-sweep, drift SLA |
| 5 | 01-business_weighted_error_score_mangaassist.md | Sam + Marcus | trust the consequences — cost-matrix sensitivity, $ savings CIs |
| 6 | 01-multi_intent_detection_mangaassist.md | Priya + Aiko | handle requests with two valid intents — sigmoid head, pair drift detection |
| 7 | 01-ood_unknown_intent_detection_mangaassist.md | Priya + Marcus | refuse safely on unsupported queries — energy / MSP / ODIN comparison, ROC bands, adversarial robustness |
| 8 | 01-cluster_based_new_intent_discovery_mangaassist.md | Sam + Aiko | close the loop — turn rejected traffic into new intents (NIS, HDBSCAN, sensitivity sweep) |
Tip. Files 1-3 are the mechanical core (build it + reproduce it + do the math). Files 4-5 are the trust layer (probabilities + dollars). Files 6-7 handle the edges (multi + OOD). File 8 closes the feedback loop.
Shared MangaAssist Baseline (verbatim across every doc)
| Item | Value |
|---|---|
| Product | MangaAssist — Amazon retail chatbot for manga shopping & support |
| Model | DistilBERT-base, fine-tuned, 10-way softmax head |
| Intents | product_discovery (22%) · recommendation (18%) · product_question (15%) · order_tracking (12%) · faq (8%) · return_request (7%) · chitchat (6%) · promotion (5%) · checkout_help (4%) · escalation (3%) |
| Dataset | 50K production + 5K synthetic-filtered = 55K total → 80/10/10 stratified split → 44K / 5.5K / 5.5K |
| Headline accuracy | 92.1% ± 0.4% (95% CI) post fine-tuning; pre-fine-tune baseline 83.2% |
| Rare-class accuracy | 88.6% ± 1.7% on escalation |
| Latency budget | < 15 ms P95 at the routing layer |
| Calibration | temperature scaling, T = 1.6 → ECE 0.040 ± 0.005 |
| Multi-intent traffic | 18% (≥ 2 valid intents) |
| OOD traffic | ~5% (outside the 10-intent taxonomy) |
| Languages | English primary; 9% JP-EN code-switch |
| Hardware (training) | g5.12xlarge (4× A10G), SageMaker, ~37 min/3 epochs |
| Hardware (inference) | inf2.xlarge (Inferentia 2), 12 ms P95 |
| Promotion gate | acc ≥ 91.7% AND macro-F1 ≥ 0.860 AND rare-class ≥ 87.0% AND ECE ≤ 0.045 AND P95 ≤ 15 ms |
| Rollback | shadow → canary 5% → 25% → 50% → 100%; auto-rollback on any gate breach |
If a number elsewhere in the folder diverges from this table, the divergence is a bug — open an issue.
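The promotion gate in the table above is a simple conjunction of thresholds. A minimal sketch of that logic, assuming hypothetical metric names and a `metrics` dict (this is illustrative, not the folder's actual acceptance suite):

```python
# Hypothetical promotion-gate check mirroring the baseline table's thresholds.
# Metric names and the `metrics` dict shape are assumptions for illustration.
GATE = {
    "accuracy":       (">=", 0.917),
    "macro_f1":       (">=", 0.860),
    "rare_class_acc": (">=", 0.870),
    "ece":            ("<=", 0.045),
    "p95_latency_ms": ("<=", 15.0),
}

def passes_gate(metrics: dict) -> tuple[bool, list[str]]:
    """Return (passes, list of breached thresholds); any breach fails the gate."""
    breaches = []
    for name, (op, threshold) in GATE.items():
        value = metrics[name]
        ok = value >= threshold if op == ">=" else value <= threshold
        if not ok:
            breaches.append(f"{name}={value} violates {op} {threshold}")
    return (not breaches, breaches)

# Candidate matching the headline numbers in the baseline table
candidate = {"accuracy": 0.921, "macro_f1": 0.868, "rare_class_acc": 0.886,
             "ece": 0.040, "p95_latency_ms": 12.0}
ok, breaches = passes_gate(candidate)
```

Because the gate is an AND over all five thresholds, a single breach (e.g. ECE drifting to 0.050) blocks promotion and, per the rollback row, triggers auto-rollback in canary.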
Personas
| Persona | Role | Lens | Where they lead |
|---|---|---|---|
| Priya | ML Engineer | training stability, optimizer, math | files 1, 6, 7 |
| Marcus | Architect | system trade-offs, latency, scaling | files 1, 5, 7 |
| Aiko | Data Scientist | metrics, statistics, data quality | files 1, 3, 4, 8 |
| Jordan | MLOps | pipeline, reproducibility, monitoring | files 2, 4 |
| Sam | Product Manager | user/business impact, CSAT, $ | files 5, 8 |
Personas are consistent across every doc. New deep-dives must use these five names with these roles — do not invent new personas.
Prerequisites
To get full value from this folder you should already be comfortable with:
- Cross-entropy and softmax at the matrix level (logits → probabilities → loss → gradient)
- Transformers fine-tuning basics (frozen vs. unfrozen, classification head, [CLS] pooling)
- Bootstrap confidence intervals (resample with replacement; percentile method)
- Reading mermaid diagrams
Optional but helpful: ECE / Brier / NLL definitions; UMAP / HDBSCAN; energy-based OOD intuition.
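The percentile-bootstrap CI listed above is the procedure behind error bars like the headline 92.1% ± 0.4%. A minimal stdlib-only sketch, using synthetic 0/1 correctness flags as the resample unit (the deep-dives define the actual resampling protocol):

```python
# Percentile-bootstrap CI for a mean: resample with replacement, take the
# empirical alpha/2 and 1-alpha/2 quantiles of the resampled means.
import random

def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """95% percentile CI for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        sum(rng.choices(values, k=n)) / n for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Toy test set: 92.1% accuracy over 1,000 examples, encoded as 0/1 flags
flags = [1] * 921 + [0] * 79
lo, hi = bootstrap_ci(flags)
```

With n = 1,000 the interval comes out around ±1.7 points; the folder's tighter ±0.4% reflects the much larger 5.5K test split.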
Glossary
| Term | Definition |
|---|---|
| DistilBERT | 6-layer student model distilled from BERT-base; ~66M params; 97% of BERT's NLU at 60% of params (Sanh 2019) |
| Focal loss | (1-p_t)^γ · CE; down-weights easy examples (Lin 2017); we use γ = 2 |
| Discriminative LR | per-layer LR lr_i = base_lr · decay^(L-i); we use decay = 0.82 (Howard & Ruder 2018) |
| ECE | Expected Calibration Error; bin-weighted mean of \|accuracy − confidence\| across confidence bins (Naeini 2015) |
| Temperature scaling | softmax(z / T) with T fitted on val NLL; one-parameter post-hoc calibrator (Guo 2017) |
| MSP / energy / ODIN / Mahalanobis | OOD scoring functions over logits or features (Hendrycks 2017 / Liu 2020 / Liang 2018 / Lee 2018) |
| NIS | Novel Intent Score; weighted combination of cluster purity, size, growth, business pain, stability |
| HDBSCAN | Hierarchical density-based clustering with probabilistic outlier handling (Campello 2013) |
| Business-weighted error | error rate weighted by per-error-type cost matrix (Elkan 2001) |
| PCD | Pair Co-occurrence Drift; total-variation distance between train and prod label-pair distributions |
| Acceptance suite | the assert block in the dry-run doc that gates every PR |
| Promotion gate | the metric thresholds a model must pass to enter canary fleet |
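Two glossary entries that are easy to conflate are temperature scaling (fit one scalar T on validation NLL) and ECE (the binned accuracy-confidence gap it is meant to shrink). A hedged sketch of both, on synthetic stand-in logits — the grid search and bin count here are illustrative choices, not the folder's fitted T = 1.6 or its binning scheme:

```python
# Temperature scaling + ECE on synthetic logits. Everything below is a
# stand-in sketch; the production calibrator fits T on the real val set.
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    p = softmax(logits, T)
    return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()

def fit_temperature(logits, labels, grid=np.linspace(0.5, 3.0, 51)):
    """Grid-search the single scalar T that minimizes validation NLL."""
    return min(grid, key=lambda T: nll(logits, labels, T))

def ece(probs, labels, n_bins=15):
    """Bin-weighted mean |accuracy - confidence| over confidence bins."""
    conf = probs.max(axis=1)
    correct = probs.argmax(axis=1) == labels
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for b_lo, b_hi in zip(edges[:-1], edges[1:]):
        mask = (conf > b_lo) & (conf <= b_hi)
        if mask.any():
            total += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return total

rng = np.random.default_rng(0)
logits = rng.normal(size=(500, 10)) * 4        # deliberately overconfident
labels = rng.integers(0, 10, size=500)
T = fit_temperature(logits, labels)            # T > 1 softens the softmax
```

Because the synthetic logits are overconfident relative to the (random) labels, the fitted T lands well above 1 and the post-scaling ECE drops — the same qualitative effect as the folder's T = 1.6 → ECE 0.040.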
Folder Citation Index (deduplicated across the 8 files)
Every paper cited anywhere in the folder appears once below. File-level bibliographies are subsets of this index.
Foundational
- Devlin, J. et al. (2018). BERT: Pre-training of Deep Bidirectional Transformers. NAACL. https://arxiv.org/abs/1810.04805
- Sanh, V. et al. (2019). DistilBERT, a distilled version of BERT. NeurIPS-EMC². https://arxiv.org/abs/1910.01108
Optimization, schedule, fine-tuning
- Sun, C. et al. (2019). How to Fine-Tune BERT for Text Classification. CCL.
- Howard, J., Ruder, S. (2018). Universal Language Model Fine-tuning (ULMFiT). ACL.
- Smith, L. N. (2017). Cyclical Learning Rates / 1cycle. IEEE WACV.
- Loshchilov, I., Hutter, F. (2019). Decoupled Weight Decay (AdamW). ICLR.
- Bengio, Y. et al. (2009). Curriculum Learning. ICML.
Loss / class imbalance
- Lin, T.-Y. et al. (2017). Focal Loss for Dense Object Detection. ICCV.
- He, H., Garcia, E. A. (2009). Learning from Imbalanced Data. IEEE TKDE.
- Chawla, N. et al. (2002). SMOTE. JAIR.
- Szegedy, C. et al. (2016). Rethinking the Inception Architecture (label smoothing). CVPR.
- Wang, F., Liu, H. (2021). Understanding the Behaviour of Contrastive Loss. CVPR.
- Buda, M. et al. (2018). A Systematic Study of the Class Imbalance Problem. Neural Networks.
Calibration
- Guo, C. et al. (2017). On Calibration of Modern Neural Networks. ICML.
- Naeini, M. P. et al. (2015). ECE via Bayesian Binning. AAAI.
- Platt, J. (1999). Probabilistic Outputs for SVMs.
- Zadrozny, B., Elkan, C. (2001/2002). Histogram binning + isotonic regression. ICML / KDD.
- Kull, M. et al. (2017). Beta Calibration. AISTATS.
- Kull, M. et al. (2019). Dirichlet Calibration. NeurIPS.
- Gal, Y., Ghahramani, Z. (2016). Dropout as Bayesian Approximation. ICML.
- Lakshminarayanan, B. et al. (2017). Deep Ensembles. NeurIPS.
- Ovadia, Y. et al. (2019). Calibration under Distribution Shift. NeurIPS.
- Roelofs, R. et al. (2022). Mitigating Bias in Calibration Error Estimation. AISTATS.
- Stutz, D. et al. (2020). Confidence-Calibrated Adversarial Training. ICML.
OOD / open-set / uncertainty
- Hendrycks, D., Gimpel, K. (2017). MSP baseline. ICLR.
- Liang, S. et al. (2018). ODIN. ICLR.
- Lee, K. et al. (2018). Mahalanobis OOD. NeurIPS.
- Liu, W. et al. (2020). Energy-based OOD. NeurIPS.
- Sun, Y. et al. (2022). k-NN OOD. ICML.
- Wang, H. et al. (2022). ViM. CVPR.
- Huang, R. et al. (2021). GradNorm. NeurIPS.
- Hendrycks, D. et al. (2019). Outlier Exposure. ICLR.
- Bendale, A., Boult, T. E. (2016). OpenMax. CVPR.
- Sensoy, M. et al. (2018). Evidential Deep Learning. NeurIPS.
- Goodge, A. et al. (2022). Robustness of OOD Detectors. AAAI.
- Yang, J. et al. (2024). Generalized OOD Detection: A Survey. TPAMI.
- Joshi, A. J. et al. (2009). Margin-score active learning. CVPR.
Multi-label
- Read, J. et al. (2011). Classifier Chains. Machine Learning.
- Tsoumakas, G., Katakis, I. (2007). Multi-Label Classification Survey. IDA.
- Yang, P. et al. (2018). SGM. COLING.
- Bogatinovski, J. et al. (2022). Multi-Label Methods Comparative Study. TKDE.
- Lee, J. et al. (2019). Set Transformer. ICML.
- Dembczynski, K. et al. (2012). Label Dependence in Multi-Label Classification. Machine Learning.
- Wu, J. et al. (2017). Meta-learning for multi-label. EMNLP.
Cost-sensitive learning
- Elkan, C. (2001). Foundations of Cost-Sensitive Learning. IJCAI.
- Provost, F. (2000). Imbalanced Data Sets 101 / threshold moving. AAAI Workshop.
- Bahnsen, A. C. et al. (2014). Example-Dependent Cost-Sensitive Decision Trees. ESWA.
- Khan, S. H. et al. (2018). Cost-Sensitive Deep Feature Learning. IEEE TNNLS.
- Domingos, P. (1999). MetaCost. KDD.
- Dalvi, N. et al. (2004). Adversarial Classification. KDD.
Clustering / new-intent discovery
- Campello, R. J. G. B. et al. (2013). HDBSCAN. PAKDD.
- McInnes, L. et al. (2018). UMAP. arXiv.
- Ester, M. et al. (1996). DBSCAN. KDD.
- Lloyd, S. P. (1982). k-means. IEEE TIT.
- Ng, A. Y. et al. (2002). Spectral Clustering. NeurIPS.
- Rodriguez, A., Laio, A. (2014). Density-Peak Clustering. Science.
- Lin, T.-E. et al. (2020). Discovering New Intents. NAACL.
- Zhang, H. et al. (2021). Discovering New Intents (Deep Aligned Clustering). AAAI.
- Vaze, S. et al. (2022). Generalized Category Discovery. CVPR.
- Saltelli, A. et al. (2010). Variance-based Sensitivity Analysis (Sobol). Comput. Phys. Comm.
Variance, reproducibility, evaluation, fairness
- Bouthillier, X. et al. (2021). Accounting for Variance. MLSys.
- Henderson, P. et al. (2018). Deep RL that Matters. AAAI.
- Pineau, J. et al. (2021). NeurIPS Reproducibility Checklist.
- Efron, B., Tibshirani, R. (1993). An Introduction to the Bootstrap. CRC Press.
- Politis, D. N. et al. (1999). Subsampling. Springer.
- Northcutt, C. et al. (2021). Pervasive Label Errors. NeurIPS Datasets & Benchmarks.
- Mitchell, M. et al. (2019). Model Cards. FAccT.
- Gebru, T. et al. (2021). Datasheets for Datasets. CACM.
- Hashimoto, T. et al. (2018). Fairness Without Demographics. ICML.
- McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions. Psychometrika.
- Quiñonero-Candela, J. et al. (2008). Dataset Shift in ML. MIT Press.
Adversarial / robustness
- Wang, J. et al. (2021). TextAttack. EMNLP/NAACL.
- Perez, F., Ribeiro, I. (2022). Ignore Previous Prompt. NeurIPS Workshop.
- Szegedy, C. et al. (2014). Intriguing properties of neural networks. ICLR.
- Madry, A. et al. (2018). Adversarial training (PGD). ICLR.
Per-file citation budgets actually delivered: main 22 · business 8 · cluster 10 · calibration 11 · dry-run 9 · numerical 8 · multi-intent 8 · OOD 13 · README (no body cites) · total unique 70+ across the folder.
Audit Checklist (tick before declaring this folder "done")
- All 8 files exist
- Folder README exists (this file)
- Shared baseline is verbatim in every doc
- Every doc has a Research-Grade Addendum
- Every reported metric has a 95% bootstrap CI (or notes when CI not yet computed)
- Every numeric design choice has either an ablation table OR a citation
- Every doc has a failure-mode tree (mermaid)
- Dry-run doc has reproducibility manifest
- Personas are consistent across every doc
- Bibliography in each file is a subset of this README's citation index
- Validation report ../mangaassist_document_validation_report_v2.md extended for new arithmetic (in progress — see Phase B-Validate)
Cross-Folder Pointers
- Master curriculum index → ../README.md
- Topic-scenario map (numbering for every technique) → ../00-mangaassist_fine_tuning_topic_scenario_map.md
- Template all Tier-1 folders mirror → ../SCENARIO_TEMPLATE.md
- Existing arithmetic validation → ../mangaassist_document_validation_report_v2.md
- Companion Tier-1 folders (mirror this 8-file pattern) → Embedding-Fine-Tuning, Retrieval-Fine-Tuning (RAFT), Fine-Tuning-Techniques (LoRA), Alignment-RLHF, Model-Compression-Optimization (KD)