
ML Scenarios — Index

Eight scenarios covering ground-truth evolution on the classical-ML side of the manga-chatbot stack: recommendation, sentiment/ABSA, search ranking, spam detection, demand forecasting, embeddings, and computer-vision classifiers. Each scenario is a self-contained deep dive with a 5–7 question Q&A drill.


At-a-glance

| # | Scenario | Axis of change | Subsystem | "Truth" that moves |
|----|----------|----------------|-----------|--------------------|
| 01 | Recommendation label decay | Time | Personalize + RAG fusion | Implicit (user, item) → action label |
| 02 | Sentiment classifier domain shift | Time + Requirements | Sentiment / ABSA service | (text, sentiment_class) |
| 03 | Search-ranking UI redesign | Requirements | Search ranker | (query, item, rank) → engagement |
| 04 | Spam / abuse review detection — adversarial label expiry | Adversary | Review-Sentiment MCP anti-spam | (review_text, spam?) |
| 05 | Demand forecasting — promo distortion | Requirements + Time | Inventory / forecasting | (sku, day, units_sold) actuals |
| 06 | ABSA aspect emergence | Requirements | Review-Sentiment MCP | (text, aspect_span, polarity) |
| 07 | Embedding model — category expansion | Scale + Requirements | Embedding service | (item_a, item_b, similar?) |
| 08 | Cover-art classifier — style drift | Time + Adversary | Multimodal cover-art model | (image, art_style_label) |

Coverage matrix (axes × scenarios)

| Axis | Scenarios that hit it |
|------|------------------------|
| Requirements | 02, 03, 05, 06, 07 |
| Constraints | (covered by GenAI 04; ML side has it implicitly in 02 and 05) |
| Scale | 07 |
| Time | 01, 02, 05, 08 |
| Adversary | 04, 08 |

Combined with the GenAI folder, every axis is covered ≥ 2 times across the two folders. Adversary appears in both 04 and 08 (08's "AI-generated cover art" gives the cover-art classifier an adversarial vector).


Suggested reading order

  1. Read first: 01-recommendation-label-decay.md — pure Time axis, classic implicit-feedback decay; the cleanest case for "labels age."
  2. Then: 04-spam-review-adversarial-evolution.md — pure Adversary axis, pairs with GenAI 07.
  3. Then: 03-search-ranking-ui-redesign.md — pure Requirements axis, illustrates how product changes silently invalidate labels.
  4. Then: the multi-axis cases (02, 05, 06, 07, 08).

Intuition gained per scenario

| # | One-line intuition to internalize |
|----|-----------------------------------|
| 01 | Implicit-feedback labels decay; a model retrained on biased recent labels bakes the bias deeper. |
| 02 | A sentiment classifier on a moving language is a frozen photo of a flowing river — annual re-labeling is the floor, not the ceiling. |
| 03 | A UI change can be a bigger ground-truth shift than a model change; the ranker's "correct" depends on the surface. |
| 04 | Spam labels have an adversarial half-life of weeks; a quarterly-retrained classifier is structurally behind. |
| 05 | Promotional events are not anomalies to filter — they're the part of the distribution the business cares about most. |
| 06 | ABSA's aspect schema is a living thing; new aspects appear as the product or content evolves and old aspects become obsolete. |
| 07 | Adding a new item category is not a data-augmentation problem; it's a re-anchoring of the embedding space and a re-labeling of the similarity graph. |
| 08 | When the input distribution itself can be generated by an adversary, the classifier's "ground truth" includes a defense surface. |
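Intuitions 01 and 04 share a mechanism worth making concrete: weight each label by an exponential half-life, so stale implicit feedback (or expired spam labels) contributes less to the next retrain. A minimal sketch — the 90-day half-life and the `label_weight` helper are illustrative assumptions, not values from the scenario files:

```python
from datetime import datetime, timedelta

def label_weight(age_days: float, half_life_days: float = 90.0) -> float:
    """Exponential decay: a label loses half its training weight per half-life.

    NOTE: the 90-day default is a placeholder; scenario 04 argues spam labels
    can have a half-life of weeks, so the value is per-domain.
    """
    return 0.5 ** (age_days / half_life_days)

# Example: down-weight old implicit (user, item) click labels before retraining.
now = datetime(2024, 6, 1)
labels = [
    {"user": "u1", "item": "i9", "clicked": 1, "ts": now - timedelta(days=10)},
    {"user": "u2", "item": "i3", "clicked": 1, "ts": now - timedelta(days=300)},
]
for row in labels:
    row["weight"] = label_weight((now - row["ts"]).days)
```

The resulting `weight` column plugs into any trainer that accepts per-sample weights; the design choice is the half-life itself, which is a per-domain policy decision rather than a tuned hyperparameter.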

How GenAI and ML differ in these scenarios

Patterns visible across all eight ML scenarios:

  • Re-labeling cost is the binding constraint. GenAI golden sets are hundreds to thousands of items; ML training sets are millions. You cannot re-label your way out of the problem; you have to architect re-labeling.
  • Implicit feedback is the dominant ground-truth source — and it's the most biased.
  • Retraining is expensive enough to be a quarterly event in most ML pipelines, which means the defense layer (heuristics, blocklists, online learning) has to absorb churn between retrains. This pattern appears in 04 (spam), 05 (forecasting overlays), 08 (cover-art).
  • Production engagement IS the eval signal, which means selection bias is everywhere. IPS, off-policy evaluation, and random hold-outs are not optional.
  • Model performance metrics (AUC, RMSE, NDCG) often look fine while the user experience is degrading — same lying-aggregate problem as GenAI, different metric.
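The IPS point above can be sketched in a few lines: reweight logged engagement by the ratio of the target policy's propensity to the logging policy's, which corrects for items the production ranker over- or under-exposed. A minimal sketch with synthetic logs — `ips_estimate` and the propensity values are illustrative, not from the scenario files:

```python
def ips_estimate(logs: list[tuple[float, float, float]]) -> float:
    """Inverse propensity scoring: estimate a target policy's value from logs
    collected under a different (logging) policy.

    Each log entry: (reward, logging_propensity, target_propensity).
    """
    total = 0.0
    for reward, p_log, p_target in logs:
        total += reward * (p_target / p_log)  # importance-weight each outcome
    return total / len(logs)

# Synthetic logs: the production policy over-exposed one item (propensity 0.8)
# and under-exposed another (0.2); the target policy shows each with 0.5.
logs = [
    (1.0, 0.8, 0.5),  # click on an over-exposed item -> down-weighted
    (0.0, 0.8, 0.5),
    (1.0, 0.2, 0.5),  # click on an under-exposed item -> up-weighted
]
```

Plain IPS has high variance when logging propensities are small, which is why the scenarios also call for random hold-outs: a small uniform-random slice gives well-conditioned propensities to estimate against.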

What to expect in each scenario file

Same 10-section template across all eight (matches the GenAI folder):

  1. TL;DR
  2. Context & Trigger
  3. The Old Ground Truth
  4. The New Reality
  5. Why Naive Approaches Fail
  6. Detection
  7. Architecture / Implementation Deep Dive (Mermaid + code/config)
  8. Trade-offs & Alternatives
  9. Production Pitfalls
  10. Interview Q&A Drill (opening + grills + architect-level escalation + red-flag/strong-answer indicators)

Read ../00-overview-ground-truth-evolution.md and ../01-framework-axes-of-change.md first if you haven't.