
Intent Classification — Folder Index

The MangaAssist intent classifier is a DistilBERT-based 10-way softmax router that gates every request before it reaches the recommendation engine, FAQ lookup, order system, or escalation queue. It is the gold-standard reference folder in this curriculum: every other Tier-1 topic folder mirrors its 8-file pattern (see ../SCENARIO_TEMPLATE.md).

This folder contains the main technique doc, seven research-grade deep-dives, and this README. Each deep-dive owns a slice of the routing problem (numerical proof, calibration, business cost, dry-run, multi-intent, OOD, new-intent discovery). Together they form one coherent, production-ready intent-routing system.
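The gating behavior described above can be sketched in a few lines. This is an illustrative toy, not the folder's implementation: the 0.5 confidence threshold and the defer-to-escalation fallback are assumptions made for the sketch; only the intent names come from the baseline table.

```python
import math

INTENTS = ["product_discovery", "recommendation", "product_question",
           "order_tracking", "faq", "return_request", "chitchat",
           "promotion", "checkout_help", "escalation"]

def softmax(logits):
    """Convert raw classifier logits to a probability distribution."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def route(logits, threshold=0.5):
    """Pick the top intent; below the (illustrative) confidence
    threshold, defer to the escalation queue instead of routing."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "escalation", probs[best]
    return INTENTS[best], probs[best]
```

A confident logit vector routes directly (e.g. a large logit at index 3 yields `order_tracking`); a flat one falls through to escalation.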


Reading Order

Land here, then follow the sequence below. Each step builds on the previous.

| # | File | Persona | Why read it next |
|---|------|---------|------------------|
| 1 | 01-intent-classifier-fine-tuning.md | Priya + Aiko | start here — theory, math, architecture, training code, ablations, comparative methods, segment-wise results, failure tree |
| 2 | 01-fine_tuning_dry_run_mangaassist.md | Jordan | reproduce the result — manifest, error-injection tests, gate-failure tree |
| 3 | 01-fine_tuning_numerical_worked_examples_mangaassist.md | Aiko | every metric in concrete arithmetic; bootstrap CI procedure |
| 4 | 01-confidence_calibration_for_intent_routing_mangaassist.md | Aiko + Jordan | trust the probabilities — temperature scaling vs alternatives, T-sweep, drift SLA |
| 5 | 01-business_weighted_error_score_mangaassist.md | Sam + Marcus | trust the consequences — cost-matrix sensitivity, $ savings CIs |
| 6 | 01-multi_intent_detection_mangaassist.md | Priya + Aiko | handle requests with two valid intents — sigmoid head, pair drift detection |
| 7 | 01-ood_unknown_intent_detection_mangaassist.md | Priya + Marcus | refuse safely on unsupported queries — energy / MSP / ODIN comparison, ROC bands, adversarial robustness |
| 8 | 01-cluster_based_new_intent_discovery_mangaassist.md | Sam + Aiko | close the loop — turn rejected traffic into new intents (NIS, HDBSCAN, sensitivity sweep) |

Tip. Files 1-3 are the mechanical core (build it + reproduce it + do the math). Files 4-5 are the trust layer (probabilities + dollars). Files 6-7 handle the edges (multi + OOD). File 8 closes the feedback loop.


Shared MangaAssist Baseline (verbatim across every doc)

| Item | Value |
|------|-------|
| Product | MangaAssist — Amazon retail chatbot for manga shopping & support |
| Model | DistilBERT-base, fine-tuned, 10-way softmax head |
| Intents | product_discovery (22%) · recommendation (18%) · product_question (15%) · order_tracking (12%) · faq (8%) · return_request (7%) · chitchat (6%) · promotion (5%) · checkout_help (4%) · escalation (3%) |
| Dataset | 50K production + 5K synthetic-filtered = 55K total → 80/10/10 stratified split → 44K / 5.5K / 5.5K |
| Headline accuracy | 92.1% ± 0.4% (95% CI) post fine-tuning; pre-fine-tune baseline 83.2% |
| Rare-class accuracy | 88.6% ± 1.7% on escalation |
| Latency budget | < 15 ms P95 at the routing layer |
| Calibration | temperature scaling, T = 1.6 → ECE 0.040 ± 0.005 |
| Multi-intent traffic | 18% (≥ 2 valid intents) |
| OOD traffic | ~5% (outside the 10-intent taxonomy) |
| Languages | English primary; 9% JP-EN code-switch |
| Hardware (training) | g5.12xlarge (4× A10G), SageMaker, ~37 min / 3 epochs |
| Hardware (inference) | inf2.xlarge (Inferentia 2), 12 ms P95 |
| Promotion gate | acc ≥ 91.7% AND macro-F1 ≥ 0.860 AND rare-class ≥ 87.0% AND ECE ≤ 0.045 AND P95 ≤ 15 ms |
| Rollback | shadow → canary 5% → 25% → 50% → 100%; auto-rollback on any gate breach |

If a number elsewhere in the folder diverges from this table, the divergence is a bug — open an issue.
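Two of the baseline numbers are mechanical enough to sketch directly: temperature scaling at T = 1.6 (from the Calibration row) and the ECE it is judged by. The sketch below uses equal-width confidence bins and toy inputs; it is an illustration of the definitions, not the folder's evaluation code.

```python
import math

def softmax_t(logits, T=1.6):
    """Temperature-scaled softmax: softmax(z / T). T = 1.6 is the
    value fitted on validation NLL in the baseline table; T > 1
    flattens the distribution, lowering overconfident scores."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def ece(confidences, correct, n_bins=10):
    """Expected Calibration Error: |mean confidence - accuracy| per
    confidence bin, averaged with bins weighted by their size."""
    n = len(confidences)
    total = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        acc = sum(correct[i] for i in idx) / len(idx)
        total += (len(idx) / n) * abs(avg_conf - acc)
    return total
```

For example, predictions made at 75% confidence that are right 3 times out of 4 contribute zero ECE; the same confidence with lower accuracy contributes the gap.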


Personas

| Persona | Role | Lens | Where they lead |
|---------|------|------|-----------------|
| Priya | ML Engineer | training stability, optimizer, math | files 1, 6, 7 |
| Marcus | Architect | system trade-offs, latency, scaling | files 1, 5, 7 |
| Aiko | Data Scientist | metrics, statistics, data quality | files 1, 3, 4, 8 |
| Jordan | MLOps | pipeline, reproducibility, monitoring | files 2, 4 |
| Sam | Product Manager | user/business impact, CSAT, $ | files 5, 8 |

Personas are consistent across every doc. New deep-dives must use these five names with these roles — do not invent new personas.


Prerequisites

To get full value from this folder you should already be comfortable with:

  • Cross-entropy and softmax at the matrix level (logits → probabilities → loss → gradient)
  • Transformers fine-tuning basics (frozen vs. unfrozen, classification head, [CLS] pooling)
  • Bootstrap confidence intervals (resample with replacement; percentile method)
  • Reading mermaid diagrams
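The first prerequisite can be made concrete in a few lines: for softmax plus cross-entropy, the gradient of the loss with respect to the logits is simply p − y (predicted probabilities minus the one-hot label). A toy 3-class example:

```python
import math

logits = [2.0, 1.0, 0.1]   # toy 3-class logit vector
label = 0                  # index of the true class

# logits -> probabilities (max-subtracted softmax for stability)
m = max(logits)
exps = [math.exp(z - m) for z in logits]
total = sum(exps)
probs = [e / total for e in exps]

# cross-entropy loss: negative log-probability of the true class
loss = -math.log(probs[label])

# gradient w.r.t. the logits: p - y, with y one-hot at `label`
grad = [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]
```

The gradient entries sum to zero, and the true-class entry is negative (its logit gets pushed up), which is the matrix-level picture the main doc assumes.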

Optional but helpful: ECE / Brier / NLL definitions; UMAP / HDBSCAN; energy-based OOD intuition.
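The bootstrap prerequisite is equally mechanical. A minimal percentile-bootstrap sketch, assuming the statistic of interest is a mean over per-example values (e.g. 0/1 correctness, giving a CI on accuracy); the function name and defaults are illustrative:

```python
import random

def bootstrap_ci(values, stat=lambda xs: sum(xs) / len(xs),
                 n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample with replacement, recompute the
    statistic, take the (alpha/2, 1 - alpha/2) percentiles."""
    rng = random.Random(seed)
    n = len(values)
    stats = sorted(stat([values[rng.randrange(n)] for _ in range(n)])
                   for _ in range(n_boot))
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

For a sample of 90 correct and 10 incorrect predictions, the returned 95% interval brackets the 0.90 point estimate.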


Glossary

| Term | Definition |
|------|------------|
| DistilBERT | 6-layer student model distilled from BERT-base; ~66M params; 97% of BERT's NLU at 60% of params (Sanh 2019) |
| Focal loss | (1-p_t)^γ · CE; down-weights easy examples (Lin 2017); we use γ = 2 |
| Discriminative LR | per-layer LR lr_i = base_lr · decay^(L-i); we use decay = 0.82 (Howard & Ruder 2018) |
| ECE | Expected Calibration Error; bin-size-weighted mean of \|confidence − accuracy\| across confidence bins (Naeini 2015) |
| Temperature scaling | softmax(z / T) with T fitted on val NLL; one-parameter post-hoc calibrator (Guo 2017) |
| MSP / energy / ODIN / Mahalanobis | OOD scoring functions over logits or features (Hendrycks 2017 / Liu 2020 / Liang 2018 / Lee 2018) |
| NIS | Novel Intent Score; weighted combination of cluster purity, size, growth, business pain, stability |
| HDBSCAN | Hierarchical density-based clustering with probabilistic outlier handling (Campello 2013) |
| Business-weighted error | error rate weighted by a per-error-type cost matrix (Elkan 2001) |
| PCD | Pair Co-occurrence Drift; total-variation distance between train and prod label-pair distributions |
| Acceptance suite | the assert block in the dry-run doc that gates every PR |
| Promotion gate | the metric thresholds a model must pass to enter the canary fleet |
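Two of the glossary formulas are compact enough to write out. γ = 2 and decay = 0.82 come from the glossary entries above; the base LR and layer count are illustrative assumptions, not values taken from the training doc.

```python
import math

def focal_loss(p_t, gamma=2.0):
    """Focal loss (Lin 2017): (1 - p_t)^gamma * CE, where p_t is the
    predicted probability of the true class. Easy examples (p_t near 1)
    are down-weighted by the (1 - p_t)^gamma factor."""
    return (1 - p_t) ** gamma * (-math.log(p_t))

def layer_lrs(base_lr=2e-5, decay=0.82, n_layers=6):
    """Discriminative LR (Howard & Ruder 2018): lr_i = base_lr *
    decay^(L - i), so the deepest (task-closest) layer gets the full
    base LR and earlier layers get geometrically smaller ones.
    base_lr and n_layers here are illustrative."""
    L = n_layers
    return [base_lr * decay ** (L - i) for i in range(1, L + 1)]
```

At p_t = 0.9 the focal factor is (0.1)² = 0.01, so a well-classified example contributes ~1% of its plain cross-entropy; the LR schedule rises monotonically toward the last layer.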

Folder Citation Index (deduplicated across the 8 files)

Every paper cited anywhere in the folder appears once below. File-level bibliographies are subsets of this index.

Foundational

  • Devlin, J. et al. (2018). BERT: Pre-training of Deep Bidirectional Transformers. NAACL. https://arxiv.org/abs/1810.04805
  • Sanh, V. et al. (2019). DistilBERT, a distilled version of BERT. NeurIPS-EMC². https://arxiv.org/abs/1910.01108

Optimization, schedule, fine-tuning

  • Sun, C. et al. (2019). How to Fine-Tune BERT for Text Classification. CCL.
  • Howard, J., Ruder, S. (2018). Universal Language Model Fine-tuning (ULMFiT). ACL.
  • Smith, L. N. (2017). Cyclical Learning Rates / 1cycle. IEEE WACV.
  • Loshchilov, I., Hutter, F. (2019). Decoupled Weight Decay (AdamW). ICLR.
  • Bengio, Y. et al. (2009). Curriculum Learning. ICML.

Loss / class imbalance

  • Lin, T.-Y. et al. (2017). Focal Loss for Dense Object Detection. ICCV.
  • He, H., Garcia, E. A. (2009). Learning from Imbalanced Data. IEEE TKDE.
  • Chawla, N. et al. (2002). SMOTE. JAIR.
  • Szegedy, C. et al. (2016). Rethinking the Inception Architecture (label smoothing). CVPR.
  • Wang, F., Liu, H. (2021). Understanding the Behaviour of Contrastive Loss. CVPR.
  • Buda, M. et al. (2018). A Systematic Study of the Class Imbalance Problem. Neural Networks.

Calibration

  • Guo, C. et al. (2017). On Calibration of Modern Neural Networks. ICML.
  • Naeini, M. P. et al. (2015). ECE via Bayesian Binning. AAAI.
  • Platt, J. (1999). Probabilistic Outputs for SVMs.
  • Zadrozny, B., Elkan, C. (2001/2002). Histogram binning + isotonic regression. ICML / KDD.
  • Kull, M. et al. (2017). Beta Calibration. AISTATS.
  • Kull, M. et al. (2019). Dirichlet Calibration. NeurIPS.
  • Gal, Y., Ghahramani, Z. (2016). Dropout as Bayesian Approximation. ICML.
  • Lakshminarayanan, B. et al. (2017). Deep Ensembles. NeurIPS.
  • Ovadia, Y. et al. (2019). Calibration under Distribution Shift. NeurIPS.
  • Roelofs, R. et al. (2022). Mitigating Bias in Calibration Error Estimation. AISTATS.
  • Stutz, D. et al. (2020). Confidence-Calibrated Adversarial Training. ICML.

OOD / open-set / uncertainty

  • Hendrycks, D., Gimpel, K. (2017). MSP baseline. ICLR.
  • Liang, S. et al. (2018). ODIN. ICLR.
  • Lee, K. et al. (2018). Mahalanobis OOD. NeurIPS.
  • Liu, W. et al. (2020). Energy-based OOD. NeurIPS.
  • Sun, Y. et al. (2022). k-NN OOD. ICML.
  • Wang, H. et al. (2022). ViM. CVPR.
  • Huang, R. et al. (2021). GradNorm. NeurIPS.
  • Hendrycks, D. et al. (2019). Outlier Exposure. ICLR.
  • Bendale, A., Boult, T. E. (2016). OpenMax. CVPR.
  • Sensoy, M. et al. (2018). Evidential Deep Learning. NeurIPS.
  • Goodge, A. et al. (2022). Robustness of OOD Detectors. AAAI.
  • Yang, J. et al. (2024). Generalized OOD Detection: A Survey. TPAMI.
  • Joshi, A. J. et al. (2009). Margin-score active learning. CVPR.

Multi-label

  • Read, J. et al. (2011). Classifier Chains. Machine Learning.
  • Tsoumakas, G., Katakis, I. (2007). Multi-Label Classification Survey. IDA.
  • Yang, P. et al. (2018). SGM. COLING.
  • Bogatinovski, J. et al. (2022). Multi-Label Methods Comparative Study. TKDE.
  • Lee, J. et al. (2019). Set Transformer. ICML.
  • Dembczynski, K. et al. (2012). Label Dependence in Multi-Label Classification. Machine Learning.
  • Wu, J. et al. (2017). Meta-learning for multi-label. EMNLP.

Cost-sensitive learning

  • Elkan, C. (2001). Foundations of Cost-Sensitive Learning. IJCAI.
  • Provost, F. (2000). Imbalanced Data Sets 101 / threshold moving. AAAI Workshop.
  • Bahnsen, A. C. et al. (2014). Example-Dependent Cost-Sensitive Decision Trees. ESWA.
  • Khan, S. H. et al. (2018). Cost-Sensitive Deep Feature Learning. IEEE TNNLS.
  • Domingos, P. (1999). MetaCost. KDD.
  • Dalvi, N. et al. (2004). Adversarial Classification. KDD.

Clustering / new-intent discovery

  • Campello, R. J. G. B. et al. (2013). HDBSCAN. PAKDD.
  • McInnes, L. et al. (2018). UMAP. arXiv.
  • Ester, M. et al. (1996). DBSCAN. KDD.
  • Lloyd, S. P. (1982). k-means. IEEE TIT.
  • Ng, A. Y. et al. (2002). Spectral Clustering. NeurIPS.
  • Rodriguez, A., Laio, A. (2014). Density-Peak Clustering. Science.
  • Lin, T.-E. et al. (2020). Discovering New Intents. NAACL.
  • Zhang, H. et al. (2021). Discovering New Intents (Deep Aligned Clustering). AAAI.
  • Vaze, S. et al. (2022). Generalized Category Discovery. CVPR.
  • Saltelli, A. et al. (2010). Variance-based Sensitivity Analysis (Sobol). Comput. Phys. Comm.

Variance, reproducibility, evaluation, fairness

  • Bouthillier, X. et al. (2021). Accounting for Variance. MLSys.
  • Henderson, P. et al. (2018). Deep RL that Matters. AAAI.
  • Pineau, J. et al. (2021). NeurIPS Reproducibility Checklist.
  • Efron, B., Tibshirani, R. (1993). An Introduction to the Bootstrap. CRC Press.
  • Politis, D. N. et al. (1999). Subsampling. Springer.
  • Northcutt, C. et al. (2021). Pervasive Label Errors. NeurIPS Datasets & Benchmarks.
  • Mitchell, M. et al. (2019). Model Cards. FAccT.
  • Gebru, T. et al. (2021). Datasheets for Datasets. CACM.
  • Hashimoto, T. et al. (2018). Fairness Without Demographics. ICML.
  • McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions. Psychometrika.
  • Quiñonero-Candela, J. et al. (2008). Dataset Shift in ML. MIT Press.

Adversarial / robustness

  • Wang, J. et al. (2021). TextAttack. EMNLP/NAACL.
  • Perez, F., Ribeiro, I. (2022). Ignore Previous Prompt. NeurIPS Workshop.
  • Szegedy, C. et al. (2014). Intriguing properties of neural networks. ICLR.
  • Madry, A. et al. (2018). Adversarial training (PGD). ICLR.

Per-file citation budgets actually delivered: main 22 · business 8 · cluster 10 · calibration 11 · dry-run 9 · numerical 8 · multi-intent 8 · OOD 13 · README (no body cites) · total unique 70+ across the folder.


Audit Checklist (tick before declaring this folder "done")

  • All 8 files exist
  • Folder README exists (this file)
  • Shared baseline is verbatim in every doc
  • Every doc has a Research-Grade Addendum
  • Every reported metric has a 95% bootstrap CI (or notes when CI not yet computed)
  • Every numeric design choice has either an ablation table OR a citation
  • Every doc has a failure-mode tree (mermaid)
  • Dry-run doc has reproducibility manifest
  • Personas are consistent across every doc
  • Bibliography in each file is a subset of this README's citation index
  • Validation report ../mangaassist_document_validation_report_v2.md extended for new arithmetic (in progress — see Phase B-Validate)

Cross-Folder Pointers