
Scenario 04 – Implicit Signal Collection

MangaAssist JP Manga Chatbot · User-Centered Evaluation · AIP-C01 Skill 5.1.3
Parent: 03-user-centered-evaluation


Scenario Overview

Not all user feedback is explicit. MangaAssist also collects implicit behavioral signals that serve as proxies for satisfaction or dissatisfaction: cart adds after recommendations, dwell time on linked manga product pages, conversation length (which can indicate either positive engagement or frustration), re-engagement (return visits within 24 hours), and escalation to a human agent, a strong negative signal. Implicit signals are captured via Kinesis Data Streams, joined with conversation context from DynamoDB, and analyzed in Athena. Signal engineering feeds SageMaker reward models and offline evaluation.
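
A minimal sketch of what one such signal might look like as a Kinesis record. All field names (session_id, signal_type, payload, and so on) are illustrative assumptions, not a schema from the MangaAssist codebase:

```python
import json

def build_signal_event(session_id, customer_id, signal_type, payload):
    """Serialize one behavioral signal for a Kinesis put_record call.

    session_id is the assumed join key back to the conversation item in
    DynamoDB; signal_type might be cart_add, page_view, heartbeat, or
    escalation. Everything here is a sketch, not a production schema.
    """
    return json.dumps({
        "session_id": session_id,
        "customer_id": customer_id,
        "signal_type": signal_type,
        "payload": payload,
        "event_version": 1,
    })

event = build_signal_event("s-123", "c-9", "cart_add", {"asin": "B0EXAMPLE"})
```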


Architecture Snapshot

┌────────────────────────────────────────────────────────────┐
│                    MangaAssist Session                      │
│                                                            │
│  Chatbot Response → User clicks manga link → Browses page │
│       │                    │                      │        │
│       │              Page view event         Dwell time    │
│       │                    │                 tracked via    │
│       │                    │                 JS beacon      │
│       │                    │                      │        │
│       ▼                    ▼                      ▼        │
│  Conversation log     Clickstream log       Engagement log │
└───────┬────────────────────┬──────────────────────┬────────┘
        │                    │                      │
        ▼                    ▼                      ▼
     DynamoDB           Kinesis Data            Kinesis Data
   (sessions)           Streams                 Streams
        │                    │                      │
        └────────────────────┴──────────────────────┘
                             │
                      ┌──────┴──────┐
                      ▼              ▼
                 S3 (raw)       Real-time
                 → Athena       aggregation
                 → QuickSight   (Lambda)
                                  │
                                  ▼
                            ElastiCache Redis
                          (session-level signal
                            aggregation)

Interview Questions (12)

Easy (3)

Q1. After MangaAssist recommends "Chainsaw Man Vol. 12", the user clicks the product link and adds it to their cart. Describe the end-to-end event flow that captures this as a positive implicit signal. Include the specific Kinesis stream schema, the DynamoDB join key to link the cart event to the chatbot conversation, and how you'd attribute this conversion to the chatbot vs. organic browsing.
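
One plausible attribution rule for this question, sketched in code: credit the cart add to the chatbot only if it lands inside a fixed window after the user clicked a chatbot-recommended link. The window size and the session_id join key are assumptions for illustration:

```python
from datetime import datetime, timedelta

def attribute_cart_add(click_ts, cart_ts, window_minutes=30):
    """Return True if the cart add falls inside the post-click attribution
    window. Cart adds before the click, or long after it, count as organic
    browsing rather than chatbot-driven conversion."""
    return timedelta(0) <= (cart_ts - click_ts) <= timedelta(minutes=window_minutes)

click = datetime(2024, 5, 1, 12, 0)
cart = datetime(2024, 5, 1, 12, 20)
```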

Q2. MangaAssist logs conversation length (number of turns) for every session. Explain why conversation length is an ambiguous implicit signal — when is a long conversation positive (engaged discovery) vs. negative (frustrated user failing to get an answer)? Propose one additional signal you'd combine with conversation length to disambiguate.
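
One way the disambiguation asked for here could look in code. The resolved and escalated flags and the turn threshold are hypothetical stand-ins for whatever additional signal a candidate proposes:

```python
def classify_long_session(turns, resolved, escalated, long_threshold=8):
    """Label a session using turn count plus an outcome signal.

    A long conversation with a positive outcome (e.g. the user clicked a
    recommended link) reads as engaged discovery; a long one ending in
    escalation or without resolution reads as frustration."""
    if turns < long_threshold:
        return "short"
    if escalated or not resolved:
        return "long_frustrated"
    return "long_engaged"
```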

Q3. The team wants to use "escalation to human agent" (escalation intent) as a negative implicit signal. Describe an Athena query that computes the weekly escalation rate per intent and identify which intents you'd expect to have the highest escalation rates and why.
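
The Athena query in question would GROUP BY intent and a date_trunc('week', ...) bucket; the same aggregation over in-memory rows, shown here only to make the metric concrete (a week key would be added alongside intent in the real query):

```python
from collections import defaultdict

def escalation_rate_by_intent(sessions):
    """sessions: iterable of dicts with 'intent' and a boolean 'escalated'.
    Returns escalations / total sessions per intent."""
    totals = defaultdict(int)
    escalations = defaultdict(int)
    for s in sessions:
        totals[s["intent"]] += 1
        escalations[s["intent"]] += int(s["escalated"])
    return {intent: escalations[intent] / totals[intent] for intent in totals}
```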


Medium (3)

Q4. Design an implicit signal scoring model that assigns a satisfaction score (0.0–1.0) to each MangaAssist session using only behavioral signals (no explicit feedback). Specify at least 5 features (e.g., cart_add_within_30m, dwell_time_on_linked_page, conversation_abandonment_flag, re_engagement_within_24h, escalation_flag), their expected polarity (positive/negative), and a simple weighted scoring formula. Justify the weights using domain reasoning.
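
A minimal weighted-sum scorer over the features named above. The weights are illustrative, not tuned; polarity is encoded in the sign of each weight, and the sum is clamped into [0, 1] around a neutral baseline of 0.5:

```python
# Illustrative weights: escalation is the strongest negative signal, a
# cart add the strongest positive. Feature values are assumed to be in [0, 1].
WEIGHTS = {
    "cart_add_within_30m": 0.35,
    "dwell_time_norm": 0.20,
    "re_engagement_within_24h": 0.15,
    "conversation_abandonment_flag": -0.30,
    "escalation_flag": -0.40,
}

def satisfaction_score(features):
    """Weighted sum around a 0.5 baseline, clamped to [0.0, 1.0]."""
    raw = 0.5 + sum(WEIGHTS[k] * features.get(k, 0.0) for k in WEIGHTS)
    return max(0.0, min(1.0, raw))
```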

Q5. MangaAssist wants to measure page dwell time on manga product pages linked from chatbot responses. The JS beacon fires a heartbeat every 5 seconds. Design the Kinesis ingestion and Redis aggregation pipeline: (a) Kinesis record schema for heartbeats; (b) Lambda consumer logic to aggregate heartbeats into dwell time per page visit; (c) ElastiCache Redis data structure (sorted set or hash) for real-time session-level dwell aggregation; (d) how you handle tab switches and background tabs.
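
A sketch of the heartbeat-to-dwell aggregation a Lambda consumer might run. Heartbeats fire roughly every 5 seconds while the page is visible, so a gap larger than about two intervals is treated as a backgrounded tab and excluded from dwell; the gap threshold and the final-interval credit are assumptions:

```python
def dwell_time_from_heartbeats(timestamps, interval=5, max_gap=12):
    """Sum the gaps between consecutive heartbeats (epoch seconds), skipping
    gaps larger than max_gap (tab switch / background tab), then credit one
    extra interval for the final heartbeat."""
    if not timestamps:
        return 0
    ts = sorted(timestamps)
    dwell = 0
    for prev, cur in zip(ts, ts[1:]):
        gap = cur - prev
        if gap <= max_gap:
            dwell += gap
    return dwell + interval
```

With heartbeats at 0, 5, 10, then a 50-second background gap, then 60 and 65, only the visible segments count: 5 + 5 + 5 plus the final 5-second credit.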

Q6. You want to measure re-engagement — whether users who interact with MangaAssist return within 24 hours. This requires joining session data across time boundaries. Design the DynamoDB schema and Athena query approach to: (a) identify returning users (keyed by customer_id); (b) compute the re-engagement rate per intent; (c) distinguish re-engagement driven by chatbot satisfaction from re-engagement driven by unresolved issues (e.g., returning because order_tracking wasn't answered).
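
The core check behind this question, reduced to code: a customer counts as re-engaged if any later session starts within 24 hours of an earlier one. In production this would be an Athena self-join on customer_id over the session table; this in-memory version is only for illustration:

```python
from datetime import datetime, timedelta

def re_engaged_within_24h(session_starts):
    """session_starts: iterable of datetimes for one customer's sessions.
    True if any consecutive pair (in time order) is <= 24 hours apart."""
    starts = sorted(session_starts)
    return any(b - a <= timedelta(hours=24) for a, b in zip(starts, starts[1:]))
```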


Hard (3)

Q7. The data science team proposes training a reward model on SageMaker using implicit signals as reward labels instead of human annotations. Design the training pipeline: (a) feature engineering from raw Kinesis events to per-turn reward labels; (b) how you handle the sparse reward problem (most turns don't have direct implicit signals — only the session outcome); (c) the temporal credit assignment approach (which turn in a 5-turn conversation caused the cart add?); (d) the SageMaker training job configuration and the model architecture.
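
One common heuristic answer to part (c), sketched here: decay the session-level reward backwards from the final turn, so turns closer to the cart add receive more credit. The discount factor gamma is an assumed hyperparameter, not a recommended value:

```python
def assign_turn_credit(num_turns, session_reward, gamma=0.7):
    """Split a session-level reward across turns with exponential decay:
    the final turn gets the largest share, earlier turns progressively less.
    Credits are normalized so they sum to session_reward."""
    weights = [gamma ** (num_turns - 1 - t) for t in range(num_turns)]
    total = sum(weights)
    return [session_reward * w / total for w in weights]
```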

Q8. MangaAssist has both explicit (thumbs, ratings) and implicit (cart, dwell, escalation) signals. You want to validate that implicit signals are actually predictive of user satisfaction. Design a correlation analysis between implicit signal composites and explicit ratings. Specify: (a) the hypothesis and null hypothesis; (b) the statistical test (Spearman rank correlation, logistic regression, or other); (c) the minimum sample size for 80% power; (d) confounders to control for (intent type, user tenure, time of day); (e) the Athena SQL query to join implicit and explicit data for this analysis.
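
For part (b), a tie-aware Spearman rank correlation written from scratch (standard library only, so the sketch needs no SciPy). In practice scipy.stats.spearmanr would be the natural choice; this is just to make the computation concrete:

```python
def _ranks(xs):
    """1-based ranks with average ranks for ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank over the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Pearson correlation of the rank vectors of x and y."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)
```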

Q9. You observe that the promotion intent has a very high cart-add rate (28%) compared to recommendation intent (11%). Before concluding that promotion responses are better, analyze three confounding factors that could explain this difference. For each confounder, describe the debiasing analysis you'd run and the data sources needed.


Very Hard (3)

Q10. Design a causal inference framework to determine whether MangaAssist chatbot interactions actually cause users to purchase more manga, rather than merely being correlated with purchase intent. Specify: (a) the identification strategy (instrumental variable, regression discontinuity, or difference-in-differences — justify your choice); (b) the treatment definition (chatbot interaction) and the outcome (purchase within 48h); (c) the potential confounders and how you'd address them; (d) the data requirements from DynamoDB, Kinesis, and the Amazon retail data lake; (e) the SageMaker pipeline for running the causal model at scale.
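
If difference-in-differences is chosen for part (a), the point estimate reduces to simple arithmetic: the change in the treated group's purchase rate minus the change in the control group's, before vs. after chatbot exposure. The rates below are made-up example values:

```python
def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences point estimate of the treatment effect,
    given pre/post outcome rates for treated and control groups."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Hypothetical numbers: treated users' 48h purchase rate rose 10% -> 18%,
# control users' rose 10% -> 12%, implying a +6pp effect of chatbot exposure.
effect = did_estimate(0.10, 0.18, 0.10, 0.12)
```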

Q11. Build an implicit signal anomaly detection system that continuously monitors behavioral metrics across all 10 intents and automatically detects when an intent's implicit satisfaction profile changes significantly (e.g., escalation rate for checkout_help suddenly doubles after a deployment). Design: (a) the feature vector per intent per hour; (b) the anomaly detection model (isolation forest, ADTK, or Prophet — justify your choice); (c) the Kinesis + Lambda + CloudWatch pipeline for real-time detection; (d) the alert routing (which metric anomalies page on-call vs. create Jira tickets); (e) the false positive suppression strategy to avoid alert fatigue.
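
As a baseline to compare the candidate models in (b) against, a rolling z-score detector is often the simplest credible answer: flag the latest hourly escalation rate if it sits more than some threshold of standard deviations above the trailing window's mean. Threshold and window are assumed parameters:

```python
import statistics

def is_anomalous(history, latest, threshold=3.0):
    """One-sided rolling z-score check: True when the latest value exceeds
    the trailing window's mean by more than `threshold` standard deviations.
    A flat window (zero stdev) flags any deviation at all."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest != mean
    return (latest - mean) / stdev > threshold
```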

Q12. Design a multi-objective optimization system that uses implicit signals to automatically tune MangaAssist's behavior. The system has three potentially conflicting objectives: (a) maximize cart-add rate (business metric); (b) minimize escalation rate (user satisfaction); © minimize average conversation length (efficiency). Formulate this as a Pareto optimization problem. Specify: the decision variables (prompt template selection, retrieval threshold from OpenSearch, response length parameter), the objective functions computed from implicit signals, the optimization algorithm (Bayesian optimization on SageMaker, multi-armed bandit, or evolutionary — justify your choice), and the safety constraints (no single objective may degrade more than 10% from baseline).