Statistical Inference in MangaAssist

This folder covers how hypothesis testing, confidence intervals, and formal statistical tests were applied throughout the MangaAssist project — from A/B testing and canary analysis to model evaluation and drift detection.

Documents

| Document | Description |
| --- | --- |
| 01-hypothesis-testing.md | Null/alternative hypotheses, p-values, significance levels, Type I/II errors, and how hypothesis testing drove every rollout decision |
| 02-confidence-intervals.md | Interval estimation for rates, means, and proportions, applied to conversion rate, latency, hallucination rate, and A/B test lifts |
| 03-t-tests.md | One-sample, two-sample, and paired t-tests for comparing latency, token cost, and response quality across model versions |
| 04-chi-square-tests.md | Chi-square tests for intent distribution shifts, guardrail outcome independence, and categorical variable associations |
| 05-additional-statistical-tests.md | Z-tests for proportions, Mann-Whitney U, KS tests, Fisher's exact test, ANOVA, Bonferroni correction, and sequential testing |
| 06-tools-and-libraries.md | Tools, libraries, and platforms that enabled statistical inference in the project: scipy, statsmodels, pandas, CloudWatch, and more |
| 07-importance-for-mlops-engineers.md | Why hypothesis testing matters operationally for MLOps engineers, with production scenarios across deployment, monitoring, drift detection, and experimentation |
| 08-deep-dive-scenarios-hypothesis-testing.md | Interview-style deep dive on exactly where hypothesis testing was used in the chatbot, including rollout scenarios, critical decisions, and panel-style follow-up questions |

How This Connects to the Project

Statistical inference was the decision-making backbone of MangaAssist:

  • Canary rollouts used two-proportion z-tests to decide whether to promote or roll back model changes
  • A/B tests used confidence intervals and hypothesis tests to measure conversion lift, changes in average order value (AOV), and customer satisfaction (CSAT) impact
  • Model evaluation used t-tests and chi-square tests to detect regressions in latency, intent accuracy, and retrieval quality
  • Drift detection used KL divergence and distribution tests to catch silent degradation in production
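
As a rough illustration of two of the techniques listed above, the sketch below implements a pooled two-proportion z-test (the kind of check a canary gate might run) and a discrete KL divergence (a simple drift score over intent distributions). It uses only the Python standard library so it runs anywhere. The function names, the example counts, and the 0.05 significance level are hypothetical and are not taken from the MangaAssist codebase. In practice, `statsmodels.stats.proportion.proportions_ztest` and `scipy.stats.entropy` cover the same ground.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided two-proportion z-test using a pooled standard error.

    Returns (z, p_value), where z is positive when group B's rate is higher.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the z-test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same bins.

    eps guards against log(0) when a category is missing from q.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


if __name__ == "__main__":
    # Hypothetical counts: errors out of total requests for baseline vs. canary.
    z, p = two_proportion_z_test(50, 10_000, 18, 2_000)
    alpha = 0.05  # assumed significance level, not a project setting
    decision = "roll back" if (p < alpha and z > 0) else "promote"
    print(f"z={z:.3f}, p={p:.4f}, decision={decision}")

    # Hypothetical intent distributions: training-time vs. production.
    baseline = [0.5, 0.3, 0.2]
    production = [0.7, 0.2, 0.1]
    print(f"KL(production || baseline) = {kl_divergence(production, baseline):.4f}")
```

The z-test pools the two samples because the null hypothesis assumes both groups share one underlying rate. The KL score is not itself a hypothesis test, so a drift alert would still need a threshold chosen from historical data or a companion test such as chi-square.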