Review & Sentiment MCP — Community Opinion Synthesis
Purpose
Surfaces aggregated community intelligence from 50M+ reader reviews. Rather than returning raw review text, this MCP synthesises sentiment, surfaces representative quotes, and provides nuanced opinion breakdowns — all via RAG over the review corpus.
Exposed Tools
| Tool | Input | Output | Use Case |
|---|---|---|---|
| get_sentiment_summary | manga_id, aspect? | SentimentSummary | Overall / aspect sentiment |
| get_reviews | manga_id, sort, limit | ReviewList | Sample representative reviews |
| get_aspect_breakdown | manga_id, aspects[] | AspectBreakdown | Art / story / pacing / translation |
| compare_sentiment | manga_ids[] | ComparisonTable | Side-by-side opinion comparison |
| get_volume_reception | manga_id, volume? | VolumeReception | "How was vol 8 received?" |
RAG Pipeline
```mermaid
flowchart TD
    TC([Tool Call: get_sentiment_summary\nmanga_id='BERSERK' aspect='art']) --> MF[Metadata Filter\nPre-filter by manga_id]
    MF --> EB[Embed aspect query\n'art style illustrations detail']
    EB --> OS[(OpenSearch\nReview Corpus Index)]
    MF --> OS
    OS --> HY[Hybrid Retrieval\nkNN + BM25 on review text]
    HY --> RK[Rerank top-50 → top-20]
    RK --> SA[Sentiment Aggregator\nPositive · Negative · Mixed counts]
    SA --> QX[Quote Extractor\nPick 3 representative verbatim quotes]
    QX --> SUM[Summary Builder\nStructured SentimentSummary]
    SUM --> TR([Tool Result → Claude])
    style TC fill:#4A90D9,color:#fff
    style TR fill:#27AE60,color:#fff
    style SA fill:#8E44AD,color:#fff
```
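A minimal sketch of the two retrieval legs, assuming opensearch-py and OpenSearch kNN with efficient filtering on the faiss engine. The index name, endpoint, and embed() helper are assumptions; field names follow the mapping below.

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=["https://search.internal:9200"])  # placeholder endpoint

manga_id = "BERSERK"
aspect_text = "art style illustrations detail"
aspect_vector = embed(aspect_text)  # assumed 1024-dim embedding fn, matching the mapping

# Leg 1: kNN over review embeddings, pre-filtered to one title
knn_body = {
    "size": 50,
    "query": {
        "knn": {
            "embedding": {
                "vector": aspect_vector,
                "k": 50,
                "filter": {"term": {"manga_id": manga_id}},
            }
        }
    },
}

# Leg 2: BM25 over review text, same pre-filter
bm25_body = {
    "size": 50,
    "query": {
        "bool": {
            "filter": [{"term": {"manga_id": manga_id}}],
            "must": [{"match": {"review_text": aspect_text}}],
        }
    },
}

knn_hits = client.search(index="reviews", body=knn_body)["hits"]["hits"]
bm25_hits = client.search(index="reviews", body=bm25_body)["hits"]["hits"]
# The two hit lists are fused with RRF (see the sketch after the index design below).
```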
Review Corpus Index Design
```json
{
  "mappings": {
    "properties": {
      "review_id": { "type": "keyword" },
      "manga_id": { "type": "keyword" },
      "volume_number": { "type": "integer" },
      "user_id": { "type": "keyword" },
      "rating": { "type": "float" },
      "review_text": { "type": "text", "analyzer": "english" },
      "helpful_votes": { "type": "integer" },
      "verified": { "type": "boolean" },
      "aspects": {
        "type": "object",
        "properties": {
          "art": { "type": "float" },
          "story": { "type": "float" },
          "pacing": { "type": "float" },
          "translation": { "type": "float" },
          "characters": { "type": "float" }
        }
      },
      "sentiment_label": { "type": "keyword" },
      "created_at": { "type": "date" },
      "embedding": {
        "type": "knn_vector",
        "dimension": 1024,
        "method": { "name": "hnsw", "engine": "faiss" }
      }
    }
  }
}
```
Key design choices:
- Aspect scores are pre-computed offline by an aspect-based sentiment analysis (ABSA) model and stored; they are not computed at query time.
- The verified flag feeds the anti-spam pipeline (below), keeping bot reviews out of the full index.
- helpful_votes is used as a quality signal in the RRF score fusion, as sketched next.
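A sketch of that fusion step, assuming standard RRF with k = 60 plus a small log-scaled helpful_votes bonus; the vote_weight value is an illustrative assumption, not a documented formula.

```python
import math

def rrf_fuse(bm25_hits: list[dict], knn_hits: list[dict],
             k: int = 60, vote_weight: float = 0.002) -> list[str]:
    """Reciprocal-rank fusion of the BM25 and kNN legs.

    helpful_votes enters as a small log-scaled bonus so heavily upvoted
    reviews rank higher without letting any single review dominate.
    """
    scores: dict[str, float] = {}
    votes: dict[str, int] = {}
    for hits in (bm25_hits, knn_hits):
        for rank, hit in enumerate(hits, start=1):
            rid = hit["_id"]
            scores[rid] = scores.get(rid, 0.0) + 1.0 / (k + rank)
            votes[rid] = hit["_source"].get("helpful_votes", 0)
    for rid in scores:
        scores[rid] += vote_weight * math.log1p(votes[rid])
    # Return review IDs, best-fused first
    return sorted(scores, key=scores.__getitem__, reverse=True)
```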
Aspect-Based Sentiment Breakdown
```mermaid
flowchart LR
    RV([Raw Review Text]) --> ABSA[Aspect-Based\nSentiment Model\nDeBERTa fine-tuned]
    ABSA --> AR[Art: 4.8 / 5]
    ABSA --> SR[Story: 4.6 / 5]
    ABSA --> PR[Pacing: 3.9 / 5]
    ABSA --> TR2[Translation: 4.2 / 5]
    ABSA --> CR[Characters: 4.9 / 5]
    AR --> IDX[(OpenSearch\nReview Index)]
    SR --> IDX
    PR --> IDX
    TR2 --> IDX
    CR --> IDX
    style RV fill:#4A90D9,color:#fff
    style IDX fill:#E67E22,color:#fff
```
The ABSA model runs as an offline batch job (AWS Batch, nightly). Reviews indexed without aspect scores are flagged as aspects_pending: true and excluded from aspect-specific queries.
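Concretely, aspect-specific queries can exclude unscored reviews with a term filter. Note the aspects_pending field comes from the prose above and is not shown in the mapping; assume it is indexed as a boolean.

```python
# Filter clause appended to aspect-specific queries so reviews still awaiting
# the nightly ABSA batch are skipped (assumed field, per the note above).
aspects_pending_filter = {"term": {"aspects_pending": False}}

bm25_body["query"]["bool"]["filter"].append(aspects_pending_filter)
```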
Sentiment Aggregation Logic
```python
def aggregate_sentiment(reviews: list[Review], aspect: str | None) -> SentimentSummary:
    if aspect:
        # Use `is not None` so a legitimate 0.0 aspect score is not dropped
        scores = [r.aspects.get(aspect) for r in reviews
                  if r.aspects.get(aspect) is not None]
    else:
        scores = [r.rating for r in reviews]

    positive = sum(1 for s in scores if s >= 4.0)
    mixed = sum(1 for s in scores if 2.5 <= s < 4.0)
    negative = sum(1 for s in scores if s < 2.5)
    total = len(scores) or 1  # guard against division by zero on empty result sets
    avg = sum(scores) / len(scores) if scores else 0.0

    # Pick representative quotes: one per sentiment bucket
    quotes = pick_representative_quotes(reviews, aspect)

    return SentimentSummary(
        average_score=round(avg, 2),
        positive_pct=positive / total * 100,
        mixed_pct=mixed / total * 100,
        negative_pct=negative / total * 100,
        representative_quotes=quotes,
        review_count=len(scores),
    )
```
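For the snippet above to run standalone, here are minimal sketches of the types and quote picker it assumes. The quote picker takes the highest-voted review per sentiment bucket; that ranking rule is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Review:
    review_id: str
    rating: float
    review_text: str
    sentiment_label: str                       # "positive" | "mixed" | "negative"
    helpful_votes: int = 0
    aspects: dict[str, float] = field(default_factory=dict)

@dataclass
class SentimentSummary:
    average_score: float
    positive_pct: float
    mixed_pct: float
    negative_pct: float
    representative_quotes: list[str]
    review_count: int

def pick_representative_quotes(reviews: list[Review], aspect: str | None) -> list[str]:
    """One verbatim quote per sentiment bucket (sketch: highest-voted per bucket)."""
    quotes = []
    for label in ("positive", "mixed", "negative"):
        bucket = [r for r in reviews if r.sentiment_label == label]
        if bucket:
            quotes.append(max(bucket, key=lambda r: r.helpful_votes).review_text)
    return quotes
```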
Volume-Level Reception Trend
```mermaid
xychart-beta
    title "Berserk Volume Reception Over Time (avg rating)"
    x-axis [V1, V5, V10, V15, V20, V25, V30, V35, V40]
    y-axis "Avg Rating" 1 --> 5
    line [4.1, 4.3, 4.7, 4.8, 4.9, 4.9, 4.8, 4.6, 4.7]
```
The get_volume_reception tool fetches per-volume aggregates from a DynamoDB summary table (pre-computed nightly) rather than re-running RAG over 50M reviews at query time.
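A sketch of that read path with boto3; the table name and item attributes are assumptions.

```python
import boto3

dynamodb = boto3.resource("dynamodb")
reception_table = dynamodb.Table("manga-volume-reception")  # hypothetical table name

def get_volume_reception(manga_id: str, volume: int) -> dict:
    """Fetch the nightly pre-computed per-volume aggregate; no RAG at query time."""
    resp = reception_table.get_item(
        Key={"manga_id": manga_id, "volume_number": volume}
    )
    # Assumed item shape: {"avg_rating": 4.8, "review_count": 1923, ...}
    return resp.get("Item", {})
```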
Anti-Spam and Quality Filters
```mermaid
flowchart TD
    NR([New Review Submitted]) --> VF{Verified\nPurchase?}
    VF -->|Yes| SC[Spam Classifier\nLightGBM model]
    VF -->|No| PQ[Penalised Quality Score ×0.5]
    SC -->|Clean| HV{Helpful votes\n≥ 3?}
    SC -->|Spam| DQ[Quarantine\nNot indexed]
    HV -->|Yes| FI[Full Index\nincluded in retrieval]
    HV -->|No| LQ[Low-quality bucket\nonly in full-scan queries]
    PQ --> LQ
    style NR fill:#4A90D9,color:#fff
    style DQ fill:#C0392B,color:#fff
    style FI fill:#27AE60,color:#fff
```
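The same routing expressed as code, as a sketch; the spam threshold and bucket names beyond the documented helpful-votes cutoff are illustrative assumptions.

```python
def route_review(review: dict, spam_prob: float, spam_threshold: float = 0.9) -> str:
    """Mirror of the flowchart above."""
    if not review.get("verified", False):
        review["quality_score"] = review.get("quality_score", 1.0) * 0.5  # penalise
        return "low_quality_bucket"        # only reachable via full-scan queries
    if spam_prob >= spam_threshold:        # LightGBM spam classifier output
        return "quarantine"                # never indexed
    if review.get("helpful_votes", 0) >= 3:
        return "full_index"                # included in retrieval
    return "low_quality_bucket"
```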
Failure Modes
| Failure | Symptom | Mitigation |
|---|---|---|
| Sentiment skew from review bombing | Avg drops overnight | Anomaly detector on hourly avg shifts > 0.5; auto-hold suspicious reviews |
| ABSA model drift | Aspect scores diverge from true sentiment | Monthly eval against human-labeled test set; alert if F1 < 0.80 |
| Review corpus index lag | New reviews not retrieved | Kinesis stream → OpenSearch ingest keeps lag under 60s, which is within SLA |
| Over-fetching old reviews for new volumes | Stale reception data | created_at filter: weight recent reviews 2× for volumes < 3 months old |
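For the review-bombing row, here is a sketch of the hourly anomaly check; the trigger plumbing around it (what "auto-hold" does mechanically) is an assumption.

```python
def review_bomb_detected(hourly_avgs: list[float], max_drop: float = 0.5) -> bool:
    """Flag when the average rating falls by more than max_drop between
    consecutive hourly windows (the > 0.5 shift from the table above)."""
    return any(prev - curr > max_drop
               for prev, curr in zip(hourly_avgs, hourly_avgs[1:]))
```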
Interview Grill
Q: With 50M reviews, how do you keep OpenSearch responsive?
A: Pre-filter by manga_id reduces the search space from 50M to ~50K per title. HNSW index on that filtered subset is trivially fast. For the top 1000 titles, we also maintain pre-computed summary DynamoDB items so the RAG pipeline is bypassed entirely.
Q: How do you prevent one viral negative review from dominating the summary?
A: Helpful votes are part of the RRF score fusion. High-upvote reviews rank higher, but quote extraction is capped at one quote per sentiment bucket, preventing any single review from monopolising the output.
Q: How do you distinguish between "bad art style" (genuine critique) and "I don't like this genre's art" (preference mismatch)?
A: The ABSA model is trained on genre-specific labelled data. A 3-star review of Berserk's art from a shoujo fan is tagged as preference_mismatch: true using a subclassifier, and excluded from the objective art quality scores.
Q: Should Claude read raw review text from the tool result?
A: No. The tool returns a structured SentimentSummary with aggregated scores and 3 curated quotes. Raw review text would (a) blow out the context window, (b) expose user PII, and (c) risk prompt injection from malicious review content.