# 03 – User-Centered Evaluation

Parent: `03-user-centered-evaluation.md` · AIP-C01 Skill: 5.1.3 – User-Centered Evaluation Methods
System: MangaAssist – JP Manga Chatbot on Amazon.com
Stack: AWS Bedrock Claude 3.5 Sonnet · SageMaker · DynamoDB · OpenSearch Serverless · ElastiCache Redis · ECS Fargate

## Overview
User-centered evaluation captures real human judgment about chatbot quality — the signals that offline metrics alone cannot provide. For MangaAssist, this means understanding whether manga fans actually find the recommendations useful, whether product answers resolve their questions, and whether the conversational experience feels natural.
This skill area covers four complementary feedback collection strategies:
| # | Scenario | Signal Type | Key Metric |
|---|---|---|---|
| 01 | Thumbs Feedback Interface | Explicit – binary | Thumbs-up rate per intent |
| 02 | Response Rating System | Explicit – ordinal | Mean rating, NPS |
| 03 | Annotation Workflow & Quality | Expert – rubric-based | Cohen's Kappa, annotation accuracy |
| 04 | Implicit Signal Collection | Behavioral – passive | Cart-add rate, re-engagement, dwell time |
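The explicit metrics in the table can be computed straight from logged feedback events. A minimal sketch, assuming a hypothetical event schema (the `intent` and `thumb` field names are illustrative, not the actual MangaAssist schema):

```python
from collections import defaultdict

def thumbs_up_rate_per_intent(events):
    """Thumbs-up rate per intent from binary feedback events.

    Each event is a dict with assumed keys 'intent' and
    'thumb' ('up' or 'down').
    """
    ups, totals = defaultdict(int), defaultdict(int)
    for e in events:
        totals[e["intent"]] += 1
        if e["thumb"] == "up":
            ups[e["intent"]] += 1
    return {intent: ups[intent] / totals[intent] for intent in totals}

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators labeling the same items
    (the inter-annotator agreement metric from scenario 03)."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    labels = set(a) | set(b)
    pe = sum((a.count(l) / n) * (b.count(l) / n)        # chance agreement
             for l in labels)
    return 1.0 if pe == 1 else (po - pe) / (1 - pe)

events = [
    {"intent": "recommendation", "thumb": "up"},
    {"intent": "recommendation", "thumb": "down"},
    {"intent": "faq", "thumb": "up"},
    {"intent": "faq", "thumb": "up"},
]
rates = thumbs_up_rate_per_intent(events)
# rates == {"recommendation": 0.5, "faq": 1.0}

kappa = cohens_kappa(["good", "good", "bad", "bad"],
                     ["good", "bad", "bad", "bad"])
# kappa == 0.5 (75% observed agreement vs. 50% expected by chance)
```

In production, `sklearn.metrics.cohen_kappa_score` does the same computation; the hand-rolled version above just makes the observed-vs-chance agreement explicit.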
## Why User-Centered Evaluation Matters for MangaAssist
- Offline metrics lie — A recommendation engine with high NDCG@10 can still surface manga titles the user already owns or genres they dislike.
- Intent coverage gaps — BLEU/ROUGE scores on `faq` responses don't capture whether the answer actually resolved the customer's question.
- Subjective satisfaction — Whether a `chitchat` response about a manga series feels knowledgeable requires human judgment.
- Business alignment — Amazon cares about conversion (cart adds after a `recommendation`), not just response fluency.
## Quick Navigation
```
03-user-centered-evaluation/
├── README.md                        ← You are here
├── 01-thumbs-feedback-interface/
│   ├── README.md                    ← 12 scenario questions
│   └── ANSWERS.md                   ← Detailed answers
├── 02-response-rating-system/
│   ├── README.md                    ← 12 scenario questions
│   └── ANSWERS.md                   ← Detailed answers
├── 03-annotation-workflow-quality/
│   ├── README.md                    ← 12 scenario questions
│   └── ANSWERS.md                   ← Detailed answers
└── 04-implicit-signal-collection/
    ├── README.md                    ← 12 scenario questions
    └── ANSWERS.md                   ← Detailed answers
```
## Cross-Cutting Themes
- Feedback fatigue: Asking users too often kills response rates. Balance explicit collection with implicit signals.
- Sample bias: Power users who leave feedback are not representative of all MangaAssist users.
- Storage & streaming: All signals flow through Kinesis Data Streams → DynamoDB (hot) / S3 (cold) for real-time and batch analysis.
- Privacy: PII handling under Amazon's data governance — feedback is associated with session IDs, not customer IDs, unless opt-in consent is granted.
- Closing the loop: Feedback must flow back into model fine-tuning (SageMaker), prompt optimization (Bedrock), and cache invalidation (ElastiCache Redis).
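The streaming and privacy themes above can be sketched together as a small producer that builds a Kinesis record keyed by session ID rather than customer ID. The event schema and the stream name `mangaassist-feedback` are illustrative assumptions; the actual `boto3` publish call is left commented out so the sketch stays self-contained:

```python
import json
import time
# import boto3  # uncomment to publish: kinesis = boto3.client("kinesis")

def build_feedback_record(session_id, intent, signal_type, payload):
    """Build a Kinesis-ready feedback record keyed by session ID.

    Partitioning on session_id (not customer_id) keeps PII out of
    the hot path, per the privacy theme. Field names and stream
    name are illustrative assumptions, not the real schema.
    """
    event = {
        "session_id": session_id,
        "intent": intent,
        "signal_type": signal_type,  # e.g. "thumbs", "rating", "cart_add"
        "payload": payload,
        "ts": int(time.time() * 1000),
    }
    return {
        "StreamName": "mangaassist-feedback",
        "Data": json.dumps(event).encode("utf-8"),
        "PartitionKey": session_id,  # same session → same shard, ordered
    }

record = build_feedback_record(
    "sess-123", "recommendation", "thumbs", {"thumb": "up"}
)
# kinesis.put_record(**record)  # real publish requires AWS credentials
```

Using the session ID as the partition key also guarantees per-session ordering within a shard, which matters when downstream consumers reconstruct a conversation's feedback timeline.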