# 03 – User-Centered Evaluation

Parent: `03-user-centered-evaluation.md` · AIP-C01 Skill: 5.1.3 – User-Centered Evaluation Methods
System: MangaAssist – JP Manga Chatbot on Amazon.com
Stack: AWS Bedrock Claude 3.5 Sonnet · SageMaker · DynamoDB · OpenSearch Serverless · ElastiCache Redis · ECS Fargate

## Overview
User-centered evaluation captures real human judgment about chatbot quality — the signals that offline metrics alone cannot provide. For MangaAssist, this means understanding whether manga fans actually find the recommendations useful, whether product answers resolve their questions, and whether the conversational experience feels natural.
This skill area covers four complementary feedback collection strategies:
| # | Scenario | Signal Type | Key Metric |
|---|---|---|---|
| 01 | Thumbs Feedback Interface | Explicit – binary | Thumbs-up rate per intent |
| 02 | Response Rating System | Explicit – ordinal | Mean rating, NPS |
| 03 | Annotation Workflow & Quality | Expert – rubric-based | Cohen's Kappa, annotation accuracy |
| 04 | Implicit Signal Collection | Behavioral – passive | Cart-add rate, re-engagement, dwell time |
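The explicit metrics in the table can be computed straight from logged feedback events. A minimal sketch, assuming a hypothetical event schema (the `intent` and `thumb` field names are illustrative, not the actual MangaAssist schema):

```python
from collections import defaultdict

def thumbs_up_rate_per_intent(events):
    """Thumbs-up rate per intent from binary feedback events.

    Each event is a dict with assumed keys 'intent' and
    'thumb' ('up' or 'down').
    """
    ups, totals = defaultdict(int), defaultdict(int)
    for e in events:
        totals[e["intent"]] += 1
        if e["thumb"] == "up":
            ups[e["intent"]] += 1
    return {intent: ups[intent] / totals[intent] for intent in totals}

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators labeling the same items
    (the inter-annotator agreement metric from scenario 03)."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    labels = set(a) | set(b)
    pe = sum((a.count(l) / n) * (b.count(l) / n)        # chance agreement
             for l in labels)
    return 1.0 if pe == 1 else (po - pe) / (1 - pe)

events = [
    {"intent": "recommendation", "thumb": "up"},
    {"intent": "recommendation", "thumb": "down"},
    {"intent": "faq", "thumb": "up"},
    {"intent": "faq", "thumb": "up"},
]
rates = thumbs_up_rate_per_intent(events)
# rates == {"recommendation": 0.5, "faq": 1.0}

kappa = cohens_kappa(["good", "good", "bad", "bad"],
                     ["good", "bad", "bad", "bad"])
# kappa == 0.5 (75% observed agreement vs. 50% expected by chance)
```

In production, `sklearn.metrics.cohen_kappa_score` does the same computation; the hand-rolled version above just makes the observed-vs-chance agreement explicit.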
## Why User-Centered Evaluation Matters for MangaAssist
- Offline metrics lie — A recommendation engine with high NDCG@10 can still surface manga titles the user already owns or genres they dislike.
- Intent coverage gaps — BLEU/ROUGE scores on `faq` responses don't capture whether the answer actually resolved the customer's question.
- Subjective satisfaction — Whether a `chitchat` response about a manga series feels knowledgeable requires human judgment.
- Business alignment — Amazon cares about conversion (cart adds after a `recommendation`), not just response fluency.
## Quick Navigation
```
03-user-centered-evaluation/
├── README.md                        ← You are here
├── 01-thumbs-feedback-interface/
│   ├── README.md                    ← 12 scenario questions
│   └── ANSWERS.md                   ← Detailed answers
├── 02-response-rating-system/
│   ├── README.md                    ← 12 scenario questions
│   └── ANSWERS.md                   ← Detailed answers
├── 03-annotation-workflow-quality/
│   ├── README.md                    ← 12 scenario questions
│   └── ANSWERS.md                   ← Detailed answers
└── 04-implicit-signal-collection/
    ├── README.md                    ← 12 scenario questions
    └── ANSWERS.md                   ← Detailed answers
```
## Cross-Cutting Themes
- Feedback fatigue: Asking users too often kills response rates. Balance explicit collection with implicit signals.
- Sample bias: Power users who leave feedback are not representative of all MangaAssist users.
- Storage & streaming: All signals flow through Kinesis Data Streams → DynamoDB (hot) / S3 (cold) for real-time and batch analysis.
- Privacy: PII handling under Amazon's data governance — feedback is associated with session IDs, not customer IDs, unless opt-in consent is granted.
- Closing the loop: Feedback must flow back into model fine-tuning (SageMaker), prompt optimization (Bedrock), and cache invalidation (ElastiCache Redis).
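The streaming and privacy themes above can be sketched together as a small producer that builds a Kinesis record keyed by session ID rather than customer ID. The event schema and the stream name `mangaassist-feedback` are illustrative assumptions; the actual `boto3` publish call is left commented out so the sketch stays self-contained:

```python
import json
import time
# import boto3  # uncomment to publish: kinesis = boto3.client("kinesis")

def build_feedback_record(session_id, intent, signal_type, payload):
    """Build a Kinesis-ready feedback record keyed by session ID.

    Partitioning on session_id (not customer_id) keeps PII out of
    the hot path, per the privacy theme. Field names and stream
    name are illustrative assumptions, not the real schema.
    """
    event = {
        "session_id": session_id,
        "intent": intent,
        "signal_type": signal_type,  # e.g. "thumbs", "rating", "cart_add"
        "payload": payload,
        "ts": int(time.time() * 1000),
    }
    return {
        "StreamName": "mangaassist-feedback",
        "Data": json.dumps(event).encode("utf-8"),
        "PartitionKey": session_id,  # same session → same shard, ordered
    }

record = build_feedback_record(
    "sess-123", "recommendation", "thumbs", {"thumb": "up"}
)
# kinesis.put_record(**record)  # real publish requires AWS credentials
```

Using the session ID as the partition key also guarantees per-session ordering within a shard, which matters when downstream consumers reconstruct a conversation's feedback timeline.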