
03 – User-Centered Evaluation

Parent: 03-user-centered-evaluation.md · AIP-C01 Skill: 5.1.3 – User-Centered Evaluation Methods
System: MangaAssist – JP Manga Chatbot on Amazon.com
Stack: Amazon Bedrock Claude 3.5 Sonnet · SageMaker · DynamoDB · OpenSearch Serverless · ElastiCache Redis · ECS Fargate


Overview

User-centered evaluation captures real human judgment about chatbot quality — the signals that offline metrics alone cannot provide. For MangaAssist, this means understanding whether manga fans actually find the recommendations useful, whether product answers resolve their questions, and whether the conversational experience feels natural.

This skill area covers four complementary feedback collection strategies:

#    Scenario                         Signal Type              Key Metric
01   Thumbs Feedback Interface        Explicit – binary        Thumbs-up rate per intent
02   Response Rating System           Explicit – ordinal       Mean rating, NPS
03   Annotation Workflow & Quality    Expert – rubric-based    Cohen's Kappa, annotation accuracy
04   Implicit Signal Collection       Behavioral – passive     Cart-add rate, re-engagement, dwell time
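The four signal types above could share a single event envelope so that downstream analysis treats them uniformly. A minimal sketch — the class and field names (`FeedbackEvent`, `signal_type`, etc.) are assumptions for illustration, not MangaAssist's actual schema:

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

# Hypothetical unified feedback event covering all four scenarios.
@dataclass
class FeedbackEvent:
    session_id: str        # session ID, never a customer ID (see privacy theme)
    intent: str            # e.g. "recommendation", "faq", "chitchat"
    signal_type: str       # "thumbs" | "rating" | "annotation" | "implicit"
    value: float           # 0/1 thumbs, 1-5 rating, rubric score, or behavioral weight
    metadata: dict = field(default_factory=dict)  # e.g. {"asin": "...", "dwell_ms": 4200}
    ts: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

# Example: a thumbs-up on a recommendation turn.
event = FeedbackEvent(session_id="sess-123", intent="recommendation",
                      signal_type="thumbs", value=1.0)
print(asdict(event)["signal_type"])  # → thumbs
```

One envelope keeps the hot-path (DynamoDB) and cold-path (S3) consumers on the same contract regardless of which of the four scenarios produced the signal.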

Why User-Centered Evaluation Matters for MangaAssist

  1. Offline metrics lie — A recommendation engine with high NDCG@10 can still surface manga titles the user already owns or genres they dislike.
  2. Intent coverage gaps — BLEU/ROUGE scores on FAQ responses don't capture whether the answer actually resolved the customer's question.
  3. Subjective satisfaction — Whether a chitchat response about a manga series feels knowledgeable requires human judgment.
  4. Business alignment — Amazon cares about conversion (cart adds after recommendation), not just response fluency.
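Point 1 can be made concrete: NDCG@10 scores a ranking only against its offline relevance labels, so a list of titles the user already owns still scores a perfect 1.0 if ownership never enters the labels. A minimal sketch in pure Python:

```python
import math

def dcg(rels):
    """Discounted cumulative gain: relevance discounted by log2 of rank."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg_at_k(rels, k=10):
    """NDCG@k: DCG of the ranking divided by DCG of the ideal ordering."""
    ideal = sorted(rels, reverse=True)
    return dcg(rels[:k]) / dcg(ideal[:k])

# Offline labels mark these three manga as highly relevant, in the ideal
# order, so NDCG@10 = 1.0 — even if the user already owns all three.
print(ndcg_at_k([3, 2, 1]))        # → 1.0
# A worse ordering of the same labels scores below 1.0.
print(ndcg_at_k([1, 2, 3]) < 1.0)  # → True
```

This is exactly the gap that explicit and implicit user feedback closes: only the user knows the "perfect" recommendation was redundant.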

Quick Navigation

03-user-centered-evaluation/
├── README.md                              ← You are here
├── 01-thumbs-feedback-interface/
│   ├── README.md                          ← 12 scenario questions
│   └── ANSWERS.md                         ← Detailed answers
├── 02-response-rating-system/
│   ├── README.md                          ← 12 scenario questions
│   └── ANSWERS.md                         ← Detailed answers
├── 03-annotation-workflow-quality/
│   ├── README.md                          ← 12 scenario questions
│   └── ANSWERS.md                         ← Detailed answers
└── 04-implicit-signal-collection/
    ├── README.md                          ← 12 scenario questions
    └── ANSWERS.md                         ← Detailed answers

Cross-Cutting Themes

  • Feedback fatigue: Asking users too often kills response rates. Balance explicit collection with implicit signals.
  • Sample bias: Power users who leave feedback are not representative of all MangaAssist users.
  • Storage & streaming: All signals flow through Kinesis Data Streams → DynamoDB (hot) / S3 (cold) for real-time and batch analysis.
  • Privacy: PII handling under Amazon's data governance — feedback is associated with session IDs, not customer IDs, unless opt-in consent is granted.
  • Closing the loop: Feedback must flow back into model fine-tuning (SageMaker), prompt optimization (Bedrock), and cache invalidation (ElastiCache Redis).
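The streaming and privacy themes above might combine in the producer as follows. A sketch of the record handed to Kinesis Data Streams — the stream name and payload fields are assumptions, and the actual `put_record` call (which requires boto3 and AWS credentials) is shown only as a comment:

```python
import json

def build_feedback_record(session_id: str, intent: str,
                          signal_type: str, value: float) -> dict:
    """Build put_record kwargs for Kinesis Data Streams.

    Partitioning by session_id keeps one session's signals ordered on a
    single shard and, per the privacy theme, never touches a customer ID.
    """
    payload = {
        "session_id": session_id,
        "intent": intent,
        "signal_type": signal_type,
        "value": value,
    }
    return {
        "Data": json.dumps(payload).encode("utf-8"),
        "PartitionKey": session_id,  # session ID, not customer ID
    }

record = build_feedback_record("sess-123", "recommendation", "thumbs", 1.0)
# In production (hypothetical stream name):
#   boto3.client("kinesis").put_record(StreamName="mangaassist-feedback", **record)
print(record["PartitionKey"])  # → sess-123
```

From the stream, one consumer writes the hot copy to DynamoDB for real-time dashboards while Kinesis Data Firehose (or a batch consumer) lands the cold copy in S3 for offline analysis and fine-tuning.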