
04. RAG Prompt Integration

Why RAG Prompting Matters

MangaAssist depends on retrieval for policies, editorial context, and some product knowledge. A good generation model cannot compensate for poor retrieval assembly.

Prompt engineering for RAG is therefore mostly about context discipline.

Retrieval-to-Prompt Flow

  1. classify intent
  2. choose retrieval domain
  3. retrieve top candidates
  4. rerank
  5. filter by freshness and metadata
  6. compress into prompt-friendly context
  7. instruct the model how to use the retrieved evidence
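
A minimal sketch of steps 2 through 6 under assumed names: the Chunk record, its fields, and the word-overlap reranker are illustrative stand-ins, not the real MangaAssist retrieval stack. Intent classification (step 1) and raw candidate retrieval (step 3) are assumed to happen upstream.

from dataclasses import dataclass
from datetime import date

@dataclass
class Chunk:
    text: str
    source: str            # "policy", "editorial", "faq", ... (assumed labels)
    last_updated: date
    score: float = 0.0     # filled in by the reranking step

def assemble_context(query: str, chunks: list[Chunk], domain: set[str],
                     max_chunks: int = 3, min_date: date = date(2025, 1, 1)) -> str:
    # steps 2 and 5: restrict to the chosen retrieval domain and filter by freshness
    candidates = [c for c in chunks
                  if c.source in domain and c.last_updated >= min_date]
    # step 4: rerank; a toy word-overlap score stands in for a real reranker
    for c in candidates:
        c.score = len(set(query.lower().split()) & set(c.text.lower().split()))
    candidates.sort(key=lambda c: (c.score, c.last_updated), reverse=True)
    # step 6: compress into short, attributable lines carrying source metadata
    return "\n".join(
        f"[source={c.source}][last_updated={c.last_updated}] {c.text}"
        for c in candidates[:max_chunks]
    )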

RAG Context Packing Rules

Rule | Why
include source type metadata | helps the model separate policy from editorial context
keep chunks short enough to stay attributable | large chunks dilute evidence
filter by intent before reranking | prevents irrelevant but semantically similar hits
carry freshness metadata | supports conflict resolution
separate factual chunks from stylistic instructions | reduces confusion

Prompt Pattern for Grounded Answers

Use only the retrieved chunks below when answering the user's factual question.
If the chunks are insufficient, say what is known and what is missing.
If chunks conflict, prefer the newest chunk by last_updated.
Do not generalize beyond the retrieved text.
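
One way this instruction might be stitched to packed context at assembly time; GROUNDING_INSTRUCTION and build_grounded_prompt are illustrative names, not existing helpers.

GROUNDING_INSTRUCTION = (
    "Use only the retrieved chunks below when answering the user's factual question.\n"
    "If the chunks are insufficient, say what is known and what is missing.\n"
    "If chunks conflict, prefer the newest chunk by last_updated.\n"
    "Do not generalize beyond the retrieved text."
)

def build_grounded_prompt(question: str, packed_context: str) -> str:
    # packed_context is the metadata-tagged text produced by context assembly
    return (
        f"{GROUNDING_INSTRUCTION}\n\n"
        f"RETRIEVED CHUNKS\n{packed_context}\n\n"
        f"QUESTION\n{question}"
    )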

Scenario-Specific Retrieval Guidance

FAQ and Policy

  • prefer policy and FAQ chunks only
  • exclude editorial and review content
  • keep response literal and short

Recommendation Enrichment

  • use editorial chunks and high-level product descriptors
  • avoid review snippets that can introduce noisy sentiment
  • ask the foundation model to explain why the provided ranked items fit the user

Product Q and A

  • prefer structured catalog JSON first
  • only add RAG if catalog fields are sparse and the source is approved
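
The per-scenario guidance above can also be captured as retrieval configuration rather than prompt text; the intent keys, source labels, and chunk caps below are illustrative assumptions.

# Illustrative intent-to-retrieval mapping; keys, source labels, and caps are assumptions.
RETRIEVAL_CONFIG = {
    "faq_policy": {
        "sources": {"policy", "faq"},         # exclude editorial and review content
        "max_chunks": 2,                      # keep responses literal and short
    },
    "recommendation": {
        "sources": {"editorial", "catalog"},  # skip review snippets with noisy sentiment
        "max_chunks": 4,
    },
    "product_qa": {
        "sources": {"catalog"},               # structured catalog JSON first
        "max_chunks": 2,                      # add approved RAG sources only when catalog fields are sparse
    },
}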

Contradiction Handling

Contradictory retrieval results are common in evolving systems, where older chunks often stay indexed after a policy or description changes.

Prompt Strategy

Some retrieved sources may overlap.
If they conflict, use the most recent authoritative source.
If authority is unclear, state the ambiguity rather than merging the claims.

Operational Strategy

Prompt logic alone is not enough.

Also use:

  • source ranking
  • freshness metadata
  • domain filtering
  • content deduplication in the index pipeline
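
A sketch of the operational side, reusing the hypothetical Chunk record from the earlier flow sketch: deduplicate overlapping chunks and keep only the freshest version, so the prompt-level conflict rule rarely has to fire.

def resolve_conflicts(chunks: list[Chunk]) -> list[Chunk]:
    # Keep only the freshest version of near-duplicate chunks; keying on normalized
    # text is a crude stand-in for real deduplication in the index pipeline.
    newest: dict[str, Chunk] = {}
    for c in chunks:
        key = " ".join(c.text.lower().split())
        if key not in newest or c.last_updated > newest[key].last_updated:
            newest[key] = c
    return sorted(newest.values(), key=lambda c: c.last_updated, reverse=True)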

Chunk Count Strategy

More chunks are not always better.

Use Case | Suggested Chunk Count | Why
simple FAQ | 1 to 2 | reduce noise
policy edge case | 2 to 3 | enough to cover exceptions
recommendation explanation | 2 to 4 | short editorial snippets improve specificity without flooding the prompt
complex multi-intent request | per-section chunk groups | keep sub-answers grounded separately
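
These caps can live in code next to intent routing; the mapping below is an illustrative sketch of the table above, with multi-intent requests handled separately as per-section groups.

# Illustrative caps mirroring the table; multi-intent requests are handled
# with per-section chunk groups rather than a single cap.
MAX_CHUNKS_BY_USE_CASE = {
    "simple_faq": 2,
    "policy_edge_case": 3,
    "recommendation_explanation": 4,
}

def cap_chunks(chunks: list[Chunk], use_case: str) -> list[Chunk]:
    return chunks[: MAX_CHUNKS_BY_USE_CASE.get(use_case, 3)]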

RAG Failure Modes That Look Like Prompt Problems

  1. irrelevant chunk retrieval
  2. stale policy chunk outranking fresh chunk
  3. editorial chunk used as factual source
  4. high token pressure causing grounding to be truncated
  5. history dominating retrieved evidence

These often appear as prompt issues in reviews, but the root cause is retrieval quality or assembly logic.
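
A small audit along these lines can help separate assembly failures from genuine prompt problems; the whitespace token estimate and the function name are assumptions, not production logic.

def audit_grounding(final_prompt: str, selected_chunks: list[Chunk],
                    token_budget: int = 4000) -> list[str]:
    # Flag assembly-level failures that tend to get misread as prompt problems.
    issues = []
    for c in selected_chunks:
        if c.text not in final_prompt:
            issues.append(f"chunk from source={c.source} was dropped during assembly")
    # crude whitespace token estimate; a real tokenizer would be used in practice
    if len(final_prompt.split()) > token_budget:
        issues.append("prompt exceeds token budget; grounding may be truncated")
    return issues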

Context Assembly Template

RETRIEVED POLICY CHUNKS
[source=policy][last_updated=2026-02-01] ...

RETRIEVED EDITORIAL CHUNKS
[source=editorial][asin=B0...] ...

PRODUCT DATA
{...}

INSTRUCTION
Use policy chunks for facts.
Use editorial chunks only for recommendation phrasing.
Use product data for product attributes.
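
A sketch of assembling this template from typed inputs, again using the hypothetical Chunk record; the [asin=...] tag in the example above is omitted here because it depends on index fields not shown in this section.

import json

def render_context(policy_chunks: list[Chunk], editorial_chunks: list[Chunk],
                   product_data: dict) -> str:
    policy = "\n".join(
        f"[source=policy][last_updated={c.last_updated}] {c.text}" for c in policy_chunks
    )
    editorial = "\n".join(f"[source=editorial] {c.text}" for c in editorial_chunks)
    return (
        "RETRIEVED POLICY CHUNKS\n" + policy + "\n\n"
        "RETRIEVED EDITORIAL CHUNKS\n" + editorial + "\n\n"
        "PRODUCT DATA\n" + json.dumps(product_data) + "\n\n"
        "INSTRUCTION\n"
        "Use policy chunks for facts.\n"
        "Use editorial chunks only for recommendation phrasing.\n"
        "Use product data for product attributes."
    )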

When Optimization Failed

Failure

Adding more chunks to the prompt was expected to improve answer quality.

What Actually Happened

  • latency went up
  • first-token time got worse
  • the model produced blurrier answers
  • contradiction risk increased

Workaround

Reduce raw retrieval count, rerank more aggressively, and separate chunks by function before assembly.
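
A minimal curation sketch along those lines, reusing the toy reranker score from the earlier flow sketch: keep only strong matches, then split the survivors by function before assembly.

def curate(candidates: list[Chunk], min_score: float, max_chunks: int = 3) -> dict[str, list[Chunk]]:
    # Assumes candidates are already sorted by rerank score, best first.
    kept = [c for c in candidates if c.score >= min_score][:max_chunks]
    # Separate factual evidence from stylistic material before assembly.
    return {
        "factual": [c for c in kept if c.source in {"policy", "faq", "catalog"}],
        "stylistic": [c for c in kept if c.source == "editorial"],
    }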

The win came from better curation, not bigger prompts.