
Intent Classification — Folder Index

The MangaAssist intent classifier is a DistilBERT-based 10-way softmax router that gates every request before it reaches the recommendation engine, FAQ lookup, order system, or escalation queue. It is the gold-standard reference folder in this curriculum: every other Tier-1 topic folder mirrors its 8-file pattern (see ../SCENARIO_TEMPLATE.md).

This folder contains the main technique doc, seven research-grade deep-dives, and this README. Each deep-dive owns a slice of the routing problem (numerical proof, calibration, business cost, dry-run, multi-intent, OOD, new-intent discovery). Together they form one coherent, production-ready intent-routing system.
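The gating behavior described above can be sketched in a few lines. This is an illustrative toy, not the folder's implementation: the 0.5 confidence threshold and the defer-to-escalation fallback are assumptions made for the sketch; only the intent names come from the baseline table.

```python
import math

INTENTS = ["product_discovery", "recommendation", "product_question",
           "order_tracking", "faq", "return_request", "chitchat",
           "promotion", "checkout_help", "escalation"]

def softmax(logits):
    """Convert raw classifier logits to a probability distribution."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def route(logits, threshold=0.5):
    """Pick the top intent; below the (illustrative) confidence
    threshold, defer to the escalation queue instead of routing."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "escalation", probs[best]
    return INTENTS[best], probs[best]
```

A confident logit vector routes directly (e.g. a large logit at index 3 yields `order_tracking`); a flat one falls through to escalation.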


Reading Order

Land here, then follow the sequence below. Each step builds on the previous.

| # | File | Persona | Why read it next |
|---|------|---------|------------------|
| 1 | 01-intent-classifier-fine-tuning.md | Priya + Aiko | start here — theory, math, architecture, training code, ablations, comparative methods, segment-wise results, failure tree |
| 2 | 01-fine_tuning_dry_run_mangaassist.md | Jordan | reproduce the result — manifest, error-injection tests, gate-failure tree |
| 3 | 01-fine_tuning_numerical_worked_examples_mangaassist.md | Aiko | every metric in concrete arithmetic; bootstrap CI procedure |
| 4 | 01-confidence_calibration_for_intent_routing_mangaassist.md | Aiko + Jordan | trust the probabilities — temperature scaling vs alternatives, T-sweep, drift SLA |
| 5 | 01-business_weighted_error_score_mangaassist.md | Sam + Marcus | trust the consequences — cost-matrix sensitivity, $ savings CIs |
| 6 | 01-multi_intent_detection_mangaassist.md | Priya + Aiko | handle requests with two valid intents — sigmoid head, pair drift detection |
| 7 | 01-ood_unknown_intent_detection_mangaassist.md | Priya + Marcus | refuse safely on unsupported queries — energy / MSP / ODIN comparison, ROC bands, adversarial robustness |
| 8 | 01-cluster_based_new_intent_discovery_mangaassist.md | Sam + Aiko | close the loop — turn rejected traffic into new intents (NIS, HDBSCAN, sensitivity sweep) |

Tip. Files 1-3 are the mechanical core (build it + reproduce it + do the math). Files 4-5 are the trust layer (probabilities + dollars). Files 6-7 handle the edges (multi + OOD). File 8 closes the feedback loop.


Shared MangaAssist Baseline (verbatim across every doc)

| Item | Value |
|------|-------|
| Product | MangaAssist — Amazon retail chatbot for manga shopping & support |
| Model | DistilBERT-base, fine-tuned, 10-way softmax head |
| Intents | product_discovery (22%) · recommendation (18%) · product_question (15%) · order_tracking (12%) · faq (8%) · return_request (7%) · chitchat (6%) · promotion (5%) · checkout_help (4%) · escalation (3%) |
| Dataset | 50K production + 5K synthetic-filtered = 55K total → 80/10/10 stratified split → 44K / 5.5K / 5.5K |
| Headline accuracy | 92.1% ± 0.4% (95% CI) post fine-tuning; pre-fine-tune baseline 83.2% |
| Rare-class accuracy | 88.6% ± 1.7% on escalation |
| Latency budget | < 15 ms P95 at the routing layer |
| Calibration | temperature scaling, T = 1.6 → ECE 0.040 ± 0.005 |
| Multi-intent traffic | 18% (≥ 2 valid intents) |
| OOD traffic | ~5% (outside the 10-intent taxonomy) |
| Languages | English primary; 9% JP-EN code-switch |
| Hardware (training) | g5.12xlarge (4× A10G), SageMaker, ~37 min / 3 epochs |
| Hardware (inference) | inf2.xlarge (Inferentia 2), 12 ms P95 |
| Promotion gate | acc ≥ 91.7% AND macro-F1 ≥ 0.860 AND rare-class ≥ 87.0% AND ECE ≤ 0.045 AND P95 ≤ 15 ms |
| Rollback | shadow → canary 5% → 25% → 50% → 100%; auto-rollback on any gate breach |

If a number elsewhere in the folder diverges from this table, the divergence is a bug — open an issue.
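Two of the baseline numbers are mechanical enough to sketch directly: temperature scaling at T = 1.6 (from the Calibration row) and the ECE it is judged by. The sketch below uses equal-width confidence bins and toy inputs; it is an illustration of the definitions, not the folder's evaluation code.

```python
import math

def softmax_t(logits, T=1.6):
    """Temperature-scaled softmax: softmax(z / T). T = 1.6 is the
    value fitted on validation NLL in the baseline table; T > 1
    flattens the distribution, lowering overconfident scores."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def ece(confidences, correct, n_bins=10):
    """Expected Calibration Error: |mean confidence - accuracy| per
    confidence bin, averaged with bins weighted by their size."""
    n = len(confidences)
    total = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        acc = sum(correct[i] for i in idx) / len(idx)
        total += (len(idx) / n) * abs(avg_conf - acc)
    return total
```

For example, predictions made at 75% confidence that are right 3 times out of 4 contribute zero ECE; the same confidence with lower accuracy contributes the gap.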


Personas

| Persona | Role | Lens | Where they lead |
|---------|------|------|-----------------|
| Priya | ML Engineer | training stability, optimizer, math | files 1, 6, 7 |
| Marcus | Architect | system trade-offs, latency, scaling | files 1, 5, 7 |
| Aiko | Data Scientist | metrics, statistics, data quality | files 1, 3, 4, 8 |
| Jordan | MLOps | pipeline, reproducibility, monitoring | files 2, 4 |
| Sam | Product Manager | user/business impact, CSAT, $ | files 5, 8 |

Personas are consistent across every doc. New deep-dives must use these five names with these roles — do not invent new personas.


Prerequisites

To get full value from this folder you should already be comfortable with:

  • Cross-entropy and softmax at the matrix level (logits → probabilities → loss → gradient)
  • Transformers fine-tuning basics (frozen vs. unfrozen, classification head, [CLS] pooling)
  • Bootstrap confidence intervals (resample with replacement; percentile method)
  • Reading mermaid diagrams
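The first prerequisite can be made concrete in a few lines: for softmax plus cross-entropy, the gradient of the loss with respect to the logits is simply p − y (predicted probabilities minus the one-hot label). A toy 3-class example:

```python
import math

logits = [2.0, 1.0, 0.1]   # toy 3-class logit vector
label = 0                  # index of the true class

# logits -> probabilities (max-subtracted softmax for stability)
m = max(logits)
exps = [math.exp(z - m) for z in logits]
total = sum(exps)
probs = [e / total for e in exps]

# cross-entropy loss: negative log-probability of the true class
loss = -math.log(probs[label])

# gradient w.r.t. the logits: p - y, with y one-hot at `label`
grad = [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]
```

The gradient entries sum to zero, and the true-class entry is negative (its logit gets pushed up), which is the matrix-level picture the main doc assumes.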

Optional but helpful: ECE / Brier / NLL definitions; UMAP / HDBSCAN; energy-based OOD intuition.
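The bootstrap prerequisite is equally mechanical. A minimal percentile-bootstrap sketch, assuming the statistic of interest is a mean over per-example values (e.g. 0/1 correctness, giving a CI on accuracy); the function name and defaults are illustrative:

```python
import random

def bootstrap_ci(values, stat=lambda xs: sum(xs) / len(xs),
                 n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample with replacement, recompute the
    statistic, take the (alpha/2, 1 - alpha/2) percentiles."""
    rng = random.Random(seed)
    n = len(values)
    stats = sorted(stat([values[rng.randrange(n)] for _ in range(n)])
                   for _ in range(n_boot))
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

For a sample of 90 correct and 10 incorrect predictions, the returned 95% interval brackets the 0.90 point estimate.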


Glossary

| Term | Definition |
|------|------------|
| DistilBERT | 6-layer student model distilled from BERT-base; ~66M params; 97% of BERT's NLU at 60% of params (Sanh 2019) |
| Focal loss | (1-p_t)^γ · CE; down-weights easy examples (Lin 2017); we use γ = 2 |
| Discriminative LR | per-layer LR lr_i = base_lr · decay^(L-i); we use decay = 0.82 (Howard & Ruder 2018) |
| ECE | Expected Calibration Error; bin-size-weighted mean of \|confidence − accuracy\| across confidence bins (Naeini 2015) |
| Temperature scaling | softmax(z / T) with T fitted on val NLL; one-parameter post-hoc calibrator (Guo 2017) |
| MSP / energy / ODIN / Mahalanobis | OOD scoring functions over logits or features (Hendrycks 2017 / Liu 2020 / Liang 2018 / Lee 2018) |
| NIS | Novel Intent Score; weighted combination of cluster purity, size, growth, business pain, stability |
| HDBSCAN | Hierarchical density-based clustering with probabilistic outlier handling (Campello 2013) |
| Business-weighted error | error rate weighted by a per-error-type cost matrix (Elkan 2001) |
| PCD | Pair Co-occurrence Drift; total-variation distance between train and prod label-pair distributions |
| Acceptance suite | the assert block in the dry-run doc that gates every PR |
| Promotion gate | the metric thresholds a model must pass to enter the canary fleet |
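Two of the glossary formulas are compact enough to write out. γ = 2 and decay = 0.82 come from the glossary entries above; the base LR and layer count are illustrative assumptions, not values taken from the training doc.

```python
import math

def focal_loss(p_t, gamma=2.0):
    """Focal loss (Lin 2017): (1 - p_t)^gamma * CE, where p_t is the
    predicted probability of the true class. Easy examples (p_t near 1)
    are down-weighted by the (1 - p_t)^gamma factor."""
    return (1 - p_t) ** gamma * (-math.log(p_t))

def layer_lrs(base_lr=2e-5, decay=0.82, n_layers=6):
    """Discriminative LR (Howard & Ruder 2018): lr_i = base_lr *
    decay^(L - i), so the deepest (task-closest) layer gets the full
    base LR and earlier layers get geometrically smaller ones.
    base_lr and n_layers here are illustrative."""
    L = n_layers
    return [base_lr * decay ** (L - i) for i in range(1, L + 1)]
```

At p_t = 0.9 the focal factor is (0.1)² = 0.01, so a well-classified example contributes ~1% of its plain cross-entropy; the LR schedule rises monotonically toward the last layer.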

Folder Citation Index (deduplicated across the 8 files)

Every paper cited anywhere in the folder appears once below. File-level bibliographies are subsets of this index.

Foundational

  • Devlin, J. et al. (2018). BERT: Pre-training of Deep Bidirectional Transformers. NAACL. https://arxiv.org/abs/1810.04805
  • Sanh, V. et al. (2019). DistilBERT, a distilled version of BERT. NeurIPS-EMC². https://arxiv.org/abs/1910.01108

Optimization, schedule, fine-tuning

  • Sun, C. et al. (2019). How to Fine-Tune BERT for Text Classification. CCL.
  • Howard, J., Ruder, S. (2018). Universal Language Model Fine-tuning (ULMFiT). ACL.
  • Smith, L. N. (2017). Cyclical Learning Rates / 1cycle. IEEE WACV.
  • Loshchilov, I., Hutter, F. (2019). Decoupled Weight Decay (AdamW). ICLR.
  • Bengio, Y. et al. (2009). Curriculum Learning. ICML.

Loss / class imbalance

  • Lin, T.-Y. et al. (2017). Focal Loss for Dense Object Detection. ICCV.
  • He, H., Garcia, E. A. (2009). Learning from Imbalanced Data. IEEE TKDE.
  • Chawla, N. et al. (2002). SMOTE. JAIR.
  • Szegedy, C. et al. (2016). Rethinking the Inception Architecture (label smoothing). CVPR.
  • Wang, F., Liu, H. (2021). Understanding the Behaviour of Contrastive Loss. CVPR.
  • Buda, M. et al. (2018). A Systematic Study of the Class Imbalance Problem. Neural Networks.

Calibration

  • Guo, C. et al. (2017). On Calibration of Modern Neural Networks. ICML.
  • Naeini, M. P. et al. (2015). ECE via Bayesian Binning. AAAI.
  • Platt, J. (1999). Probabilistic Outputs for SVMs.
  • Zadrozny, B., Elkan, C. (2001/2002). Histogram binning + isotonic regression. ICML / KDD.
  • Kull, M. et al. (2017). Beta Calibration. AISTATS.
  • Kull, M. et al. (2019). Dirichlet Calibration. NeurIPS.
  • Gal, Y., Ghahramani, Z. (2016). Dropout as Bayesian Approximation. ICML.
  • Lakshminarayanan, B. et al. (2017). Deep Ensembles. NeurIPS.
  • Ovadia, Y. et al. (2019). Calibration under Distribution Shift. NeurIPS.
  • Roelofs, R. et al. (2022). Mitigating Bias in Calibration Error Estimation. AISTATS.
  • Stutz, D. et al. (2020). Confidence-Calibrated Adversarial Training. ICML.

OOD / open-set / uncertainty

  • Hendrycks, D., Gimpel, K. (2017). MSP baseline. ICLR.
  • Liang, S. et al. (2018). ODIN. ICLR.
  • Lee, K. et al. (2018). Mahalanobis OOD. NeurIPS.
  • Liu, W. et al. (2020). Energy-based OOD. NeurIPS.
  • Sun, Y. et al. (2022). k-NN OOD. ICML.
  • Wang, H. et al. (2022). ViM. CVPR.
  • Huang, R. et al. (2021). GradNorm. NeurIPS.
  • Hendrycks, D. et al. (2019). Outlier Exposure. ICLR.
  • Bendale, A., Boult, T. E. (2016). OpenMax. CVPR.
  • Sensoy, M. et al. (2018). Evidential Deep Learning. NeurIPS.
  • Goodge, A. et al. (2022). Robustness of OOD Detectors. AAAI.
  • Yang, J. et al. (2024). Generalized OOD Detection: A Survey. TPAMI.
  • Joshi, A. J. et al. (2009). Margin-score active learning. CVPR.

Multi-label

  • Read, J. et al. (2011). Classifier Chains. Machine Learning.
  • Tsoumakas, G., Katakis, I. (2007). Multi-Label Classification Survey. IDA.
  • Yang, P. et al. (2018). SGM. COLING.
  • Bogatinovski, J. et al. (2022). Multi-Label Methods Comparative Study. TKDE.
  • Lee, J. et al. (2019). Set Transformer. ICML.
  • Dembczynski, K. et al. (2012). Label Dependence in Multi-Label Classification. Machine Learning.
  • Wu, J. et al. (2017). Meta-learning for multi-label. EMNLP.

Cost-sensitive learning

  • Elkan, C. (2001). Foundations of Cost-Sensitive Learning. IJCAI.
  • Provost, F. (2000). Imbalanced Data Sets 101 / threshold moving. AAAI Workshop.
  • Bahnsen, A. C. et al. (2014). Example-Dependent Cost-Sensitive Decision Trees. ESWA.
  • Khan, S. H. et al. (2018). Cost-Sensitive Deep Feature Learning. IEEE TNNLS.
  • Domingos, P. (1999). MetaCost. KDD.
  • Dalvi, N. et al. (2004). Adversarial Classification. KDD.

Clustering / new-intent discovery

  • Campello, R. J. G. B. et al. (2013). HDBSCAN. PAKDD.
  • McInnes, L. et al. (2018). UMAP. arXiv.
  • Ester, M. et al. (1996). DBSCAN. KDD.
  • Lloyd, S. P. (1982). k-means. IEEE TIT.
  • Ng, A. Y. et al. (2002). Spectral Clustering. NeurIPS.
  • Rodriguez, A., Laio, A. (2014). Density-Peak Clustering. Science.
  • Lin, T.-E. et al. (2020). Discovering New Intents. NAACL.
  • Zhang, H. et al. (2021). Discovering New Intents (Deep Aligned Clustering). AAAI.
  • Vaze, S. et al. (2022). Generalized Category Discovery. CVPR.
  • Saltelli, A. et al. (2010). Variance-based Sensitivity Analysis (Sobol). Comput. Phys. Comm.

Variance, reproducibility, evaluation, fairness

  • Bouthillier, X. et al. (2021). Accounting for Variance. MLSys.
  • Henderson, P. et al. (2018). Deep RL that Matters. AAAI.
  • Pineau, J. et al. (2021). NeurIPS Reproducibility Checklist.
  • Efron, B., Tibshirani, R. (1993). An Introduction to the Bootstrap. CRC Press.
  • Politis, D. N. et al. (1999). Subsampling. Springer.
  • Northcutt, C. et al. (2021). Pervasive Label Errors. NeurIPS Datasets & Benchmarks.
  • Mitchell, M. et al. (2019). Model Cards. FAccT.
  • Gebru, T. et al. (2021). Datasheets for Datasets. CACM.
  • Hashimoto, T. et al. (2018). Fairness Without Demographics. ICML.
  • McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions. Psychometrika.
  • Quiñonero-Candela, J. et al. (2008). Dataset Shift in ML. MIT Press.

Adversarial / robustness

  • Wang, J. et al. (2021). TextAttack. EMNLP/NAACL.
  • Perez, F., Ribeiro, I. (2022). Ignore Previous Prompt. NeurIPS Workshop.
  • Szegedy, C. et al. (2014). Intriguing properties of neural networks. ICLR.
  • Madry, A. et al. (2018). Adversarial training (PGD). ICLR.

Per-file citation budgets actually delivered: main 22 · business 8 · cluster 10 · calibration 11 · dry-run 9 · numerical 8 · multi-intent 8 · OOD 13 · README (no body cites) · total unique 70+ across the folder.


Audit Checklist (tick before declaring this folder "done")

  • All 8 files exist
  • Folder README exists (this file)
  • Shared baseline is verbatim in every doc
  • Every doc has a Research-Grade Addendum
  • Every reported metric has a 95% bootstrap CI (or notes when CI not yet computed)
  • Every numeric design choice has either an ablation table OR a citation
  • Every doc has a failure-mode tree (mermaid)
  • Dry-run doc has reproducibility manifest
  • Personas are consistent across every doc
  • Bibliography in each file is a subset of this README's citation index
  • Validation report ../mangaassist_document_validation_report_v2.md extended for new arithmetic (in progress — see Phase B-Validate)

Cross-Folder Pointers