Statistical Inference in MangaAssist
This folder covers how hypothesis testing, confidence intervals, and formal statistical tests were applied throughout the MangaAssist project — from A/B testing and canary analysis to model evaluation and drift detection.
Documents
| Document | Description |
|---|---|
| 01-hypothesis-testing.md | Null/alternative hypotheses, p-values, significance levels, Type I/II errors, and how hypothesis testing drove every rollout decision |
| 02-confidence-intervals.md | Interval estimation for rates, means, and proportions — applied to conversion rate, latency, hallucination rate, and A/B test lifts |
| 03-t-tests.md | One-sample, two-sample, and paired t-tests for comparing latency, token cost, and response quality across model versions |
| 04-chi-square-tests.md | Chi-square tests for intent distribution shifts, guardrail outcome independence, and categorical variable associations |
| 05-additional-statistical-tests.md | Z-tests for proportions, Mann-Whitney U, KS tests, Fisher's exact test, ANOVA, Bonferroni correction, and sequential testing |
| 06-tools-and-libraries.md | Tools, libraries, and platforms that enabled statistical inference in the project — scipy, statsmodels, pandas, CloudWatch, and more |
| 07-importance-for-mlops-engineers.md | Why hypothesis testing matters operationally for MLOps engineers, with production scenarios across deployment, monitoring, drift detection, and experimentation |
| 08-deep-dive-scenarios-hypothesis-testing.md | Interview-style deep dive on exactly where hypothesis testing was used in the chatbot, including rollout scenarios, critical decisions, and panel-style follow-up questions |
How This Connects to the Project
Statistical inference was the decision-making backbone of MangaAssist:
- Canary rollouts used two-proportion z-tests to decide whether to promote or roll back model changes
- A/B tests used confidence intervals and hypothesis tests to measure conversion lift, changes in average order value (AOV), and customer satisfaction (CSAT) impact
- Model evaluation used t-tests and chi-square tests to detect regressions in latency, intent accuracy, and retrieval quality
- Drift detection used KL divergence and distribution tests (such as chi-square tests for intent distribution shifts) to catch silent degradation in production
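
As a rough illustration of two of the checks listed above, here is a minimal sketch of a pooled two-proportion z-test (the kind of test a canary gate could rely on) and a discrete KL divergence (a possible drift signal over intent distributions). It uses only the Python standard library so it runs anywhere; the project's own tooling (scipy, statsmodels) is covered in 06-tools-and-libraries.md. The function names, the sample counts, and the `alpha = 0.05` threshold are all assumptions made for this example, not values taken from the project.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Two-sided z-test of H0: p_a == p_b, using the pooled standard error."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats for two discrete distributions over the same categories.

    eps guards against division by zero when q has an empty category.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Hypothetical canary check: error counts for baseline vs. canary traffic.
alpha = 0.05  # assumed significance level
z, p_value = two_proportion_ztest(success_a=200, n_a=10_000, success_b=58, n_b=2_000)
decision = "roll back" if (p_value < alpha and z > 0) else "promote"
print(f"z={z:.2f}, p={p_value:.4f}, decision={decision}")

# Hypothetical drift check: intent mix last week vs. this week.
reference = [0.50, 0.30, 0.20]
current = [0.40, 0.35, 0.25]
print(f"KL(current || reference) = {kl_divergence(current, reference):.4f}")
```

In this sketch a canary is rolled back only when its error rate is significantly higher than the baseline's; a real gate would also need a minimum sample size and, if it checks repeatedly during the rollout, a correction for sequential testing. KL divergence has no built-in significance level, so any alerting threshold on it would have to be calibrated against historical data.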