# ML Scenarios — Index
Eight scenarios covering ground-truth evolution on the classical-ML side of the manga-chatbot stack: recommendation, sentiment/ABSA, search ranking, spam detection, demand forecasting, embeddings, and computer-vision classifiers. Each scenario is a self-contained deep dive with a 5–7 question Q&A drill.
## At-a-glance
| # | Scenario | Axis of change | Subsystem | "Truth" that moves |
|---|---|---|---|---|
| 01 | Recommendation label decay | Time | Personalize + RAG fusion | Implicit (user, item) → action label |
| 02 | Sentiment classifier domain shift | Time + Requirements | Sentiment / ABSA service | (text, sentiment_class) |
| 03 | Search-ranking UI redesign | Requirements | Search ranker | (query, item, rank) → engagement |
| 04 | Spam / abuse review detection — adversarial label expiry | Adversary | Review-Sentiment MCP anti-spam | (review_text, spam?) |
| 05 | Demand forecasting — promo distortion | Requirements + Time | Inventory / forecasting | (sku, day, units_sold) actuals |
| 06 | ABSA aspect emergence | Requirements | Review-Sentiment MCP | (text, aspect_span, polarity) |
| 07 | Embedding model — category expansion | Scale + Requirements | Embedding service | (item_a, item_b, similar?) |
| 08 | Cover-art classifier — style drift | Time + Adversary | Multimodal cover-art model | (image, art_style_label) |
## Coverage matrix (axes × scenarios)
| Axis | Scenarios that hit it |
|---|---|
| Requirements | 02, 03, 05, 06, 07 |
| Constraints | (covered by GenAI 04; ML side has it implicitly in 02 and 05) |
| Scale | 07 |
| Time | 01, 02, 05, 08 |
| Adversary | 04, 08 |
Combined with the GenAI folder, every axis is covered ≥ 2 times across the two folders. Adversary appears in both 04 and 08 (08's "AI-generated cover art" gives the cover-art classifier an adversarial vector).
## Suggested reading order
- Read first: `01-recommendation-label-decay.md` — pure Time axis, classic implicit-feedback decay; the cleanest case for "labels age."
- Then: `04-spam-review-adversarial-evolution.md` — pure Adversary axis, pairs with GenAI 07.
- Then: `03-search-ranking-ui-redesign.md` — pure Requirements axis, illustrates how product changes silently invalidate labels.
- Then: the multi-axis cases (02, 05, 06, 07, 08).
## Intuition gained per scenario
| # | One-line intuition to internalize |
|---|---|
| 01 | Implicit-feedback labels decay; a model retrained on biased recent labels bakes the bias deeper. |
| 02 | A sentiment classifier on a moving language is a frozen photo of a flowing river — annual re-labeling is the floor, not the ceiling. |
| 03 | A UI change can be a bigger ground-truth shift than a model change; the ranker's "correct" depends on the surface. |
| 04 | Spam labels have an adversarial half-life of weeks; a quarterly-retrained classifier is structurally behind. |
| 05 | Promotional events are not anomalies to filter — they're the part of the distribution business cares about most. |
| 06 | ABSA's aspect schema is a living thing; new aspects appear as the product or content evolves and old aspects become obsolete. |
| 07 | Adding a new item category is not a data-augmentation problem; it's a re-anchoring of the embedding space and a re-labeling of the similarity graph. |
| 08 | When the input distribution itself can be generated by an adversary, the classifier's "ground truth" includes a defense surface. |
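The intuition for scenario 01 — that implicit-feedback labels age and should not be trusted uniformly — can be sketched as recency weighting at training time. This is a minimal illustration, not the scenario's actual implementation; the half-life constant is an assumed tuning parameter.

```python
from datetime import date

# Assumed tuning constant: after this many days, an implicit
# (user, item) action label counts for half as much.
HALF_LIFE_DAYS = 90

def label_weight(event_date: date, today: date) -> float:
    """Exponentially down-weight an implicit-feedback label as it ages."""
    age_days = (today - event_date).days
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

# A click from today keeps full weight; one from 90 days ago keeps half.
w_fresh = label_weight(date(2024, 6, 1), date(2024, 6, 1))
w_old = label_weight(date(2024, 3, 3), date(2024, 6, 1))
```

These weights would feed into the training loss as per-example sample weights, which most gradient-boosting and neural training loops accept directly.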
## How GenAI and ML differ in these scenarios
Patterns visible across all 8 ML scenarios:
- Re-labeling cost is the binding constraint. GenAI golden sets are hundreds to thousands of items; ML training sets are millions. You cannot re-label your way out of a problem; you have to architect re-labeling.
- Implicit feedback is the dominant ground-truth source — and it's the most biased.
- Retraining is expensive enough to be a quarterly event in most ML pipelines, which means the defense layer (heuristics, blocklists, online learning) has to absorb churn between retrains. This pattern appears in 04 (spam), 05 (forecasting overlays), 08 (cover-art).
- Production engagement IS the eval signal, which means selection bias is everywhere. IPS, off-policy evaluation, and random hold-outs are not optional.
- Model performance metrics (AUC, RMSE, NDCG) often look fine while the user experience is degrading — same lying-aggregate problem as GenAI, different metric.
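The IPS correction mentioned above can be sketched in a few lines: re-weight each logged reward by the ratio of the new policy's action probability to the logging policy's, so that biased production logs yield an unbiased estimate of the new policy's value. Field names here are illustrative assumptions, not a real schema.

```python
def ips_estimate(logs, target_prob):
    """Inverse-propensity-scored estimate of a target policy's mean reward.

    logs: iterable of dicts with keys 'action', 'reward', 'logging_prob'
          (the probability the logging policy assigned to that action).
    target_prob: function action -> probability the new policy takes it.
    """
    total, n = 0.0, 0
    for row in logs:
        # Importance weight corrects for the logging policy's selection bias.
        weight = target_prob(row["action"]) / row["logging_prob"]
        total += weight * row["reward"]
        n += 1
    return total / n if n else 0.0

# Toy example: two logged actions, uniform logging and target policies.
logs = [
    {"action": "a", "reward": 1.0, "logging_prob": 0.5},
    {"action": "b", "reward": 0.0, "logging_prob": 0.5},
]
est = ips_estimate(logs, lambda a: 0.5)
```

In practice the weights are clipped or self-normalized to control variance, and a small randomized hold-out keeps the logging probabilities nonzero everywhere the target policy has support.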
## What to expect in each scenario file
Same 10-section template across all eight (matches the GenAI folder):
- TL;DR
- Context & Trigger
- The Old Ground Truth
- The New Reality
- Why Naive Approaches Fail
- Detection
- Architecture / Implementation Deep Dive (Mermaid + code/config)
- Trade-offs & Alternatives
- Production Pitfalls
- Interview Q&A Drill (opening + grills + architect-level escalation + red-flag/strong-answer indicators)
Read `../00-overview-ground-truth-evolution.md` and `../01-framework-axes-of-change.md` first if you haven't.