Scenario 01 – Thumbs Feedback Interface
Parent: ../README.md · Skill: 5.1.3 – User-Centered Evaluation
System: MangaAssist – JP Manga Chatbot on Amazon.com
Stack: Amazon Bedrock (Claude 3.5 Sonnet) · DynamoDB · Kinesis Data Streams · ElastiCache Redis · ECS Fargate
Context
MangaAssist displays a thumbs-up / thumbs-down widget after every chatbot response. This binary explicit feedback is the highest-volume, lowest-friction user signal available. Designing it well means balancing collection rate against signal quality, managing storage at scale, and using the data to improve recommendations, FAQ answers, and escalation decisions across all 10 intents.
Scenario Questions (12)
Easy (3)
Q1. MangaAssist shows a 👍/👎 widget after each chatbot response. The product team wants to store every feedback event for later analysis. Design the DynamoDB table schema for storing thumbs feedback, including partition key, sort key, and the attributes you would capture alongside the binary signal.
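One possible item shape for Q1 is sketched below: partition on the user, sort by timestamp plus message id so a user's feedback reads back in time order. All attribute and key names here are illustrative assumptions, not a prescribed schema.

```python
from datetime import datetime, timezone

def build_feedback_item(user_id, session_id, message_id, intent, thumbs_up,
                        model_id="anthropic.claude-3-5-sonnet", ts=None):
    """Shape one thumbs-feedback event as a DynamoDB item (names are illustrative)."""
    ts = ts or datetime.now(timezone.utc).isoformat()
    return {
        "pk": f"USER#{user_id}",        # partition key: spreads writes across users
        "sk": f"FB#{ts}#{message_id}",  # sort key: time-ordered feedback per user
        "session_id": session_id,
        "message_id": message_id,       # join key back to the conversation log
        "intent": intent,               # one of the 10 MangaAssist intents
        "thumbs_up": thumbs_up,         # the binary signal itself
        "model_id": model_id,           # lets you compare feedback across model versions
        "ttl": None,                    # epoch seconds for DynamoDB TTL, set by the writer
    }
```

Capturing `intent` and `model_id` alongside the binary vote is what makes the later aggregation questions (Q4, Q12) answerable without a join back to the conversation table.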
Q2. After deploying the thumbs widget, only 4% of users click either button. The PM asks you to increase the feedback rate without degrading the chat experience. Propose three UI/UX strategies to boost participation, and explain which MangaAssist intents (e.g., recommendation, faq, order_tracking) would benefit most from higher feedback volume.
Q3. A junior engineer suggests storing thumbs feedback directly in the same DynamoDB table used for conversation history. Explain why this is problematic and describe the preferred architectural separation, including how Kinesis Data Streams fits into the pipeline.
Medium (3)
Q4. MangaAssist's recommendation intent has a 72% thumbs-up rate, while faq sits at 58%. The team wants to understand whether the difference is statistically significant or could be due to sample size differences. Describe the statistical test you would apply, the minimum sample size calculation, and how you would account for the fact that power users leave feedback disproportionately.
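The comparison in Q4 is a standard two-proportion z-test. A minimal stdlib-only sketch (sample sizes in the usage note are invented for illustration; the question leaves them unspecified):

```python
from math import sqrt, erf

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-proportion z-test with pooled variance under the null.
    Returns (z statistic, two-sided p-value)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # two-sided p-value from the standard normal CDF, built on erf
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value
```

With, say, 720/1000 vs 580/580-of-1000 thumbs-up, the 14-point gap is highly significant; the power-user skew the question raises is not addressed by the test itself and needs weighting or stratified sampling on top.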
Q5. You discover that thumbs-down feedback on checkout_help responses is 3× higher during flash sale events (e.g., Amazon Prime Day). Design a system that detects such temporal spikes in negative feedback in near-real-time using Kinesis Data Streams and CloudWatch, and triggers an alert to the on-call team. Include the Kinesis analytics SQL or Lambda logic.
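The Lambda-side logic for Q5 can be reduced to comparing a short window's thumbs-down rate against a longer baseline. The window sizes, 3× multiplier, and rate floor below are all assumptions; in production these would be tuned against Prime Day traffic:

```python
from collections import deque

class SpikeDetector:
    """Flags when the thumbs-down rate in a short recent window exceeds a
    multiple of the rate over a longer baseline window."""

    def __init__(self, window=100, baseline=1000, multiplier=3.0, min_rate=0.05):
        self.recent = deque(maxlen=window)      # short window, e.g. last 100 events
        self.base = deque(maxlen=baseline)      # longer baseline window
        self.multiplier = multiplier            # spike = recent > multiplier * baseline
        self.min_rate = min_rate                # floor so a near-zero baseline can't fire

    def observe(self, thumbs_down: bool) -> bool:
        self.base.append(thumbs_down)
        self.recent.append(thumbs_down)
        if len(self.recent) < self.recent.maxlen:
            return False                        # not enough data to judge yet
        recent_rate = sum(self.recent) / len(self.recent)
        base_rate = max(sum(self.base) / len(self.base), self.min_rate)
        return recent_rate > self.multiplier * base_rate
```

In the Kinesis path, a Lambda consumer would feed each decoded record's `thumbs_up` field into `observe()` and publish a CloudWatch custom metric (or alarm) when it returns `True`.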
Q6. The team wants to add an optional free-text comment field when a user clicks 👎. Design the end-to-end flow: UI trigger → API Gateway → Lambda → DynamoDB, including how you would classify the free-text reason (e.g., "wrong manga genre", "outdated price", "didn't understand my question") using a lightweight Bedrock Claude call, and how you store both the raw text and the classified category.
Hard (3)
Q7. MangaAssist caches popular responses in ElastiCache Redis. A cached recommendation response gets 500 thumbs-up and 200 thumbs-down over 48 hours. Design a feedback-driven cache invalidation policy: define the metric threshold (e.g., thumbs-down ratio > X% over Y hours), the invalidation mechanism, and how the system re-generates the response via Bedrock Claude 3.5 Sonnet before re-caching.
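The decision rule at the heart of Q7 fits in a few lines. The minimum-vote floor and 25% thumbs-down threshold are illustrative assumptions; on the question's own numbers (500 up, 200 down → ~28.6% negative) this policy would evict the cached response:

```python
def should_invalidate(up: int, down: int, min_votes: int = 100,
                      max_down_ratio: float = 0.25) -> bool:
    """Invalidate a cached response once it has enough votes and the
    thumbs-down share crosses the threshold."""
    total = up + down
    if total < min_votes:
        return False            # too little signal to act on
    return down / total > max_down_ratio
```

On invalidation, the key is deleted from Redis and the next cache miss triggers a fresh Bedrock Claude generation, which is re-cached only after it accumulates its own (reset) feedback counters.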
Q8. You need to use thumbs feedback to fine-tune the intent classifier running on SageMaker. However, thumbs feedback is noisy — a user might thumbs-down a correct faq answer because they dislike the product, not the answer. Design a data pipeline that filters, cleans, and labels thumbs feedback into a training-ready dataset, including your strategy for handling label noise and class imbalance across the 10 intents.
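One cheap noise filter for Q8 is to label a response only when multiple users voted on it and a clear majority agree; lone or split votes are dropped rather than trusted. The thresholds are assumptions, and this is only the first stage before class rebalancing:

```python
from collections import defaultdict

def filter_noisy_labels(events, min_votes=5, agreement=0.8):
    """Keep (response_id, label) pairs only where enough users voted and
    they mostly agree; events are (response_id, thumbs_up) tuples."""
    votes = defaultdict(lambda: [0, 0])          # response_id -> [up, down]
    for response_id, thumbs_up in events:
        votes[response_id][0 if thumbs_up else 1] += 1
    labeled = []
    for rid, (up, down) in votes.items():
        total = up + down
        if total < min_votes:
            continue                             # sparse signal: skip
        if max(up, down) / total >= agreement:   # consensus check
            labeled.append((rid, up > down))
    return labeled
```

Class imbalance across the 10 intents would then be handled downstream, e.g. by per-intent downsampling or loss weighting in the SageMaker training job.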
Q9. MangaAssist serves 50M chatbot interactions/month. At a 6% feedback rate, that's 3M feedback events/month. Design the full data architecture: ingestion (Kinesis), hot storage (DynamoDB with TTL), warm storage (S3 via Kinesis Firehose), and cold analytics (Athena). Include throughput calculations, DynamoDB capacity mode selection (on-demand vs. provisioned), and cost estimates.
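The throughput arithmetic in Q9 is worth making explicit, because it drives the capacity-mode choice. A back-of-envelope helper (the 10× peak multiplier is an assumption; dollar costs are deliberately left out since AWS prices change):

```python
def throughput_profile(interactions_per_month=50_000_000, feedback_rate=0.06,
                       peak_multiplier=10):
    """Back-of-envelope ingestion numbers for the feedback pipeline."""
    seconds_per_month = 30 * 24 * 3600           # ~2.59M seconds
    events = interactions_per_month * feedback_rate
    avg_eps = events / seconds_per_month
    return {
        "events_per_month": events,
        "avg_events_per_sec": avg_eps,
        "peak_events_per_sec": avg_eps * peak_multiplier,
    }
```

An average of roughly 1.2 events/sec with bursty 10× peaks is a classic case for DynamoDB on-demand mode: provisioned capacity sized for the peak would sit idle most of the month.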
Very Hard (3)
Q10. The MangaAssist team suspects that showing the thumbs widget changes user behavior (anchoring effect) — users who see the widget engage differently than those who don't. Design a rigorous A/B test to measure the causal effect of the feedback widget on conversation length, cart-add rate, and escalation rate. Address: randomization unit (session vs. user), minimum detectable effect, duration, and how you would handle the fact that the treatment group generates feedback data but the control group does not.
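For Q10's minimum-detectable-effect planning, the standard two-proportion sample-size formula applies to the binary metrics (cart-add rate, escalation rate). A stdlib sketch at α = 0.05, power = 0.80; the baseline and MDE in the test are invented for illustration:

```python
from math import sqrt

def samples_per_arm(baseline_rate, mde, alpha_z=1.96, power_z=0.84):
    """Approximate per-arm sample size to detect an absolute lift `mde`
    on a binary metric, two-sided alpha=0.05, power=0.80."""
    p1 = baseline_rate
    p2 = baseline_rate + mde
    p_bar = (p1 + p2) / 2
    n = ((alpha_z * sqrt(2 * p_bar * (1 - p_bar))
          + power_z * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / mde ** 2
    return int(n) + 1
```

Note this counts independent units: if the randomization unit is the user rather than the session, repeated sessions per user reduce the effective sample size, which argues for user-level randomization plus a cluster-aware variance estimate.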
Q11. You want to build a "feedback flywheel" where thumbs signals continuously improve MangaAssist. Design the closed-loop system: (1) feedback collection → (2) aggregation & anomaly detection → (3) automatic prompt tuning for Bedrock Claude → (4) A/B deployment of updated prompts → (5) measuring improvement via the same thumbs signal. Address the cold-start problem for new intents like product_discovery and the risk of feedback loops (optimizing for thumbs-up rate leading to sycophantic responses).
Q12. Amazon's leadership wants a single "User Satisfaction Score" (USS) that combines thumbs feedback across all 10 intents, weighted by business impact. Define the USS formula, justify the weights (e.g., recommendation thumbs-up should count more than chitchat), describe how you would handle intents with low feedback volume (sparse signal), and propose a dashboard design that decomposes USS into intent-level and time-series views using QuickSight.
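One candidate USS formula for Q12: a business-weighted average of per-intent thumbs-up rates, with each rate shrunk toward a prior so sparse intents contribute a stable (rather than noisy) number. The weights, prior, and prior strength below are illustrative assumptions to be justified per business impact:

```python
def uss(intent_stats, weights, prior_up=0.5, prior_strength=50):
    """Weighted User Satisfaction Score across intents.
    intent_stats: {intent: (thumbs_up, thumbs_down)}; weights: {intent: float}.
    Laplace-style smoothing pulls low-volume intents toward `prior_up`."""
    num = den = 0.0
    for intent, (up, down) in intent_stats.items():
        total = up + down
        # shrinkage: with few votes the rate stays near the prior
        rate = (up + prior_up * prior_strength) / (total + prior_strength)
        w = weights.get(intent, 1.0)
        num += w * rate
        den += w
    return num / den
```

A QuickSight dashboard would then show this scalar at the top, with drill-downs into the unweighted per-intent `rate` values and their time series, so a USS move can be attributed to a specific intent.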