
# Review & Sentiment MCP — Community Opinion Synthesis

## Purpose

Surfaces aggregated community intelligence from 50M+ reader reviews. Rather than returning raw review text, this MCP synthesises sentiment, surfaces representative quotes, and provides nuanced opinion breakdowns — all via RAG over the review corpus.


## Exposed Tools

| Tool | Input | Output | Use Case |
| --- | --- | --- | --- |
| `get_sentiment_summary` | `manga_id`, `aspect?` | `SentimentSummary` | Overall / aspect sentiment |
| `get_reviews` | `manga_id`, `sort`, `limit` | `ReviewList` | Sample representative reviews |
| `get_aspect_breakdown` | `manga_id`, `aspects[]` | `AspectBreakdown` | Art / story / pacing / translation |
| `compare_sentiment` | `manga_ids[]` | `ComparisonTable` | Side-by-side opinion comparison |
| `get_volume_reception` | `manga_id`, `volume?` | `VolumeReception` | "How was vol 8 received?" |
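
One way to expose these, sketched with the `FastMCP` helper from the Python MCP SDK; the server name and the `run_rag_pipeline` / `fetch_volume_summary` entry points are illustrative assumptions, not part of this design:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("review-sentiment")  # hypothetical server name

@mcp.tool()
def get_sentiment_summary(manga_id: str, aspect: str | None = None) -> dict:
    """Overall or aspect-level sentiment for a title, synthesised via RAG."""
    return run_rag_pipeline(manga_id, aspect)  # assumed pipeline entry point

@mcp.tool()
def get_volume_reception(manga_id: str, volume: int | None = None) -> dict:
    """Per-volume reception, served from the pre-computed summary table."""
    return fetch_volume_summary(manga_id, volume)  # assumed helper

if __name__ == "__main__":
    mcp.run()
```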

## RAG Pipeline

```mermaid
flowchart TD
    TC([Tool Call: get_sentiment_summary\nmanga_id='BERSERK' aspect='art']) --> MF[Metadata Filter\nPre-filter by manga_id]
    MF --> EB[Embed aspect query\n'art style illustrations detail']
    EB --> OS[(OpenSearch\nReview Corpus Index)]
    MF --> OS
    OS --> HY[Hybrid Retrieval\nkNN + BM25 on review text]
    HY --> RK[Rerank top-50 → top-20]
    RK --> SA[Sentiment Aggregator\nPositive · Negative · Mixed counts]
    SA --> QX[Quote Extractor\nPick 3 representative verbatim quotes]
    QX --> SUM[Summary Builder\nStructured SentimentSummary]
    SUM --> TR([Tool Result → Claude])

    style TC fill:#4A90D9,color:#fff
    style TR fill:#27AE60,color:#fff
    style SA fill:#8E44AD,color:#fff
```
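
A minimal sketch of the retrieval step, assuming `opensearch-py`, an index named `reviews`, and a pre-computed 1024-dim embedding of the aspect query. The bool/should combination here simply adds raw scores; the RRF fusion described below would run client-side over the two ranked lists:

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=["https://search.internal:9200"])  # hypothetical endpoint

def hybrid_retrieve(manga_id: str, aspect_query: str,
                    query_vec: list[float], k: int = 50) -> list[dict]:
    """Metadata pre-filter by manga_id, then BM25 + kNN over the filtered subset."""
    body = {
        "size": k,
        "query": {
            "bool": {
                "filter": [{"term": {"manga_id": manga_id}}],            # pre-filter
                "should": [
                    {"match": {"review_text": aspect_query}},            # BM25 leg
                    {"knn": {"embedding": {"vector": query_vec, "k": k}}},  # vector leg
                ],
            }
        },
    }
    return client.search(index="reviews", body=body)["hits"]["hits"]
```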

## Review Corpus Index Design

```json
{
  "mappings": {
    "properties": {
      "review_id":     { "type": "keyword" },
      "manga_id":      { "type": "keyword" },
      "volume_number": { "type": "integer" },
      "user_id":       { "type": "keyword" },
      "rating":        { "type": "float" },
      "review_text":   { "type": "text", "analyzer": "english" },
      "helpful_votes": { "type": "integer" },
      "verified":      { "type": "boolean" },
      "aspects": {
        "type": "object",
        "properties": {
          "art":         { "type": "float" },
          "story":       { "type": "float" },
          "pacing":      { "type": "float" },
          "translation": { "type": "float" },
          "characters":  { "type": "float" }
        }
      },
      "sentiment_label": { "type": "keyword" },
      "created_at":      { "type": "date" },
      "embedding":       { "type": "knn_vector", "dimension": 1024,
                           "method": { "name": "hnsw", "engine": "faiss" } }
    }
  }
}
```

Key design choices:

- `aspects` scores are pre-computed offline by an aspect-based sentiment analysis (ABSA) model and stored at index time, not computed at query time
- the `verified` flag filters out bot reviews
- `helpful_votes` is used as a quality signal in the RRF score fusion (sketched below)
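
A sketch of that fusion, with `helpful_votes` folded in as a log-damped bonus on top of standard RRF; the constants `k = 60` and `vote_weight` are illustrative and would be tuned offline:

```python
import math

def rrf_fuse(bm25_hits: list[dict], knn_hits: list[dict],
             k: int = 60, vote_weight: float = 0.05) -> list[str]:
    """Reciprocal Rank Fusion of two ranked lists, plus a helpful_votes bonus."""
    scores: dict[str, float] = {}
    votes: dict[str, int] = {}
    for hits in (bm25_hits, knn_hits):
        for rank, hit in enumerate(hits, start=1):
            doc_id = hit["_id"]
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
            votes[doc_id] = hit["_source"].get("helpful_votes", 0)
    for doc_id, v in votes.items():
        scores[doc_id] += vote_weight * math.log1p(v) / k  # quality signal
    return sorted(scores, key=scores.__getitem__, reverse=True)
```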


## Aspect-Based Sentiment Breakdown

```mermaid
flowchart LR
    RV([Raw Review Text]) --> ABSA[Aspect-Based\nSentiment Model\nDeBERTa fine-tuned]
    ABSA --> AR[Art: 4.8 / 5]
    ABSA --> SR[Story: 4.6 / 5]
    ABSA --> PR[Pacing: 3.9 / 5]
    ABSA --> TR2[Translation: 4.2 / 5]
    ABSA --> CR[Characters: 4.9 / 5]
    AR --> IDX[(OpenSearch\nReview Index)]
    SR --> IDX
    PR --> IDX
    TR2 --> IDX
    CR --> IDX

    style RV fill:#4A90D9,color:#fff
    style IDX fill:#E67E22,color:#fff
```

The ABSA model runs as an offline batch job (AWS Batch, nightly). Reviews indexed without aspect scores are flagged as `aspects_pending: true` and excluded from aspect-specific queries.
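
A sketch of the backfill step, where `score_fn` stands in for the fine-tuned DeBERTa ABSA model (out of scope here) and the partial-update shape is an assumption:

```python
from opensearchpy import OpenSearch

def nightly_absa_backfill(client: OpenSearch, pending: list[dict], score_fn) -> None:
    """Attach ABSA aspect scores to reviews flagged aspects_pending, then clear the flag."""
    for hit in pending:
        client.update(
            index="reviews",
            id=hit["_id"],
            body={"doc": {
                "aspects": score_fn(hit["_source"]["review_text"]),  # {"art": 4.8, ...}
                "aspects_pending": False,
            }},
        )
```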


## Sentiment Aggregation Logic

```python
def aggregate_sentiment(reviews: list[Review], aspect: str | None) -> SentimentSummary:
    if aspect:
        # Compare against None so a legitimate 0.0 aspect score isn't dropped
        scores = [r.aspects.get(aspect) for r in reviews
                  if r.aspects.get(aspect) is not None]
    else:
        scores = [r.rating for r in reviews]

    if not scores:
        # Guard: the percentage maths below would divide by zero
        return SentimentSummary(average_score=0.0, positive_pct=0.0, mixed_pct=0.0,
                                negative_pct=0.0, representative_quotes=[],
                                review_count=0)

    positive = sum(1 for s in scores if s >= 4.0)
    mixed    = sum(1 for s in scores if 2.5 <= s < 4.0)
    negative = sum(1 for s in scores if s < 2.5)
    avg      = sum(scores) / len(scores)

    # Pick representative quotes: one per sentiment bucket
    quotes = pick_representative_quotes(reviews, aspect)

    return SentimentSummary(
        average_score=round(avg, 2),
        positive_pct=positive / len(scores) * 100,
        mixed_pct=mixed / len(scores) * 100,
        negative_pct=negative / len(scores) * 100,
        representative_quotes=quotes,
        review_count=len(scores),
    )
```
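
`pick_representative_quotes` is not shown above; a plausible implementation consistent with the index fields (one quote per bucket, ranked by `helpful_votes`) might be:

```python
def pick_representative_quotes(reviews: list[Review], aspect: str | None) -> list[str]:
    """One verbatim quote per sentiment bucket, most-helpful review first (sketch)."""
    def bucket(r: Review) -> str:
        s = r.aspects.get(aspect) if aspect else r.rating
        if s is None:
            return "unknown"  # aspect score not yet computed; skip
        return "positive" if s >= 4.0 else ("mixed" if s >= 2.5 else "negative")

    quotes = []
    for label in ("positive", "mixed", "negative"):
        candidates = sorted((r for r in reviews if bucket(r) == label),
                            key=lambda r: r.helpful_votes, reverse=True)
        if candidates:
            quotes.append(candidates[0].review_text)
    return quotes
```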

## Volume-Level Reception Trend

```mermaid
xychart-beta
    title "Berserk Volume Reception Over Time (avg rating)"
    x-axis [V1, V5, V10, V15, V20, V25, V30, V35, V40]
    y-axis "Avg Rating" 1 --> 5
    line [4.1, 4.3, 4.7, 4.8, 4.9, 4.9, 4.8, 4.6, 4.7]
```

The `get_volume_reception` tool fetches per-volume aggregates from a DynamoDB summary table (pre-computed nightly) rather than re-running RAG over 50M reviews at query time.
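
Assuming a summary table keyed on (`manga_id`, `volume_number`), the handler reduces to a single point read; the table name and key schema are illustrative:

```python
import boto3

table = boto3.resource("dynamodb").Table("manga-volume-reception")  # hypothetical name

def get_volume_reception(manga_id: str, volume: int) -> dict:
    """O(1) read of the nightly pre-computed aggregate; no RAG at query time."""
    resp = table.get_item(Key={"manga_id": manga_id, "volume_number": volume})
    return resp.get("Item", {})
```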


## Anti-Spam and Quality Filters

```mermaid
flowchart TD
    NR([New Review Submitted]) --> VF{Verified\nPurchase?}
    VF -->|Yes| SC[Spam Classifier\nLightGBM model]
    VF -->|No| PQ[Penalised Quality Score ×0.5]
    SC -->|Clean| HV{Helpful votes\n≥ 3?}
    SC -->|Spam| DQ[Quarantine\nNot indexed]
    HV -->|Yes| FI[Full Index\nincluded in retrieval]
    HV -->|No| LQ[Low-quality bucket\nonly in full-scan queries]
    PQ --> LQ

    style NR fill:#4A90D9,color:#fff
    style DQ fill:#C0392B,color:#fff
    style FI fill:#27AE60,color:#fff
```
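
In code, the routing above reduces to a few branches; `spam_model` is a thin wrapper around the LightGBM classifier, and the bucket names are illustrative:

```python
def route_review(review: dict, spam_model) -> str:
    """Mirror the quality-gate flowchart: quarantine, full index, or low-quality bucket."""
    if not review["verified"]:
        review["quality_score"] = review.get("quality_score", 1.0) * 0.5  # penalty
        return "low_quality"                # only in full-scan queries
    if spam_model.is_spam(review):          # assumed wrapper over LightGBM
        return "quarantine"                 # not indexed at all
    if review["helpful_votes"] >= 3:
        return "full_index"                 # included in retrieval
    return "low_quality"
```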

## Failure Modes

| Failure | Symptom | Mitigation |
| --- | --- | --- |
| Sentiment skew from review bombing | Avg drops overnight | Anomaly detector on hourly avg shifts > 0.5; auto-hold suspicious reviews |
| ABSA model drift | Aspect scores diverge from true sentiment | Monthly eval against human-labelled test set; alert if F1 < 0.80 |
| Review corpus index lag | New reviews not retrieved | Kinesis stream → OpenSearch ingest takes < 60 s; SLA acceptable |
| Over-fetching old reviews for new volumes | Stale reception data | `created_at` filter: weight recent reviews 2× for volumes < 3 months old |
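
The review-bombing check itself can be simple; a sketch using the table's > 0.5 hourly-shift threshold:

```python
def review_bomb_suspected(hourly_avgs: list[float], threshold: float = 0.5) -> bool:
    """Flag when the hourly average rating swings by more than `threshold`
    between consecutive hours; suspicious reviews are then auto-held."""
    return any(abs(curr - prev) > threshold
               for prev, curr in zip(hourly_avgs, hourly_avgs[1:]))
```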

## Interview Grill

**Q: With 50M reviews, how do you keep OpenSearch responsive?**

A: Pre-filtering by `manga_id` reduces the search space from 50M to ~50K reviews per title; the HNSW index on that filtered subset is trivially fast. For the top 1000 titles, we also maintain pre-computed DynamoDB summary items so the RAG pipeline is bypassed entirely.

**Q: How do you prevent one viral negative review from dominating the summary?**

A: Helpful votes are part of the RRF score fusion, so high-upvote reviews rank higher, but quote extraction is capped at one quote per sentiment bucket — preventing any single review from monopolising the output.

**Q: How do you distinguish between "bad art style" (genuine critique) and "I don't like this genre's art" (preference mismatch)?**

A: The ABSA model is trained on genre-specific labelled data. A 3-star review of Berserk's art from a shoujo fan is tagged `preference_mismatch: true` by a subclassifier and excluded from the objective art quality scores.

**Q: Should Claude read raw review text from the tool result?**

A: No. The tool returns a structured `SentimentSummary` with aggregated scores and 3 curated quotes. Raw review text would (a) blow the context window, (b) expose user PII, and (c) risk prompt injection from malicious review content.