
04. RAG Prompt Integration

Why RAG Prompting Matters

MangaAssist depends on retrieval for policies, editorial context, and some product knowledge. A good generation model cannot compensate for poor retrieval assembly.

Prompt engineering for RAG is therefore mostly about context discipline.

Retrieval-to-Prompt Flow

  1. classify intent
  2. choose retrieval domain
  3. retrieve top candidates
  4. rerank
  5. filter by freshness and metadata
  6. compress into prompt-friendly context
  7. instruct the model how to use the retrieved evidence
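
A minimal sketch of steps 2 through 6 under assumed names: the Chunk record, its fields, and the word-overlap reranker are illustrative stand-ins, not the real MangaAssist retrieval stack. Intent classification (step 1) and raw candidate retrieval (step 3) are assumed to happen upstream.

from dataclasses import dataclass
from datetime import date

@dataclass
class Chunk:
    text: str
    source: str            # "policy", "editorial", "faq", ... (assumed labels)
    last_updated: date
    score: float = 0.0     # filled in by the reranking step

def assemble_context(query: str, chunks: list[Chunk], domain: set[str],
                     max_chunks: int = 3, min_date: date = date(2025, 1, 1)) -> str:
    # steps 2 and 5: restrict to the chosen retrieval domain and filter by freshness
    candidates = [c for c in chunks
                  if c.source in domain and c.last_updated >= min_date]
    # step 4: rerank; a toy word-overlap score stands in for a real reranker
    for c in candidates:
        c.score = len(set(query.lower().split()) & set(c.text.lower().split()))
    candidates.sort(key=lambda c: (c.score, c.last_updated), reverse=True)
    # step 6: compress into short, attributable lines carrying source metadata
    return "\n".join(
        f"[source={c.source}][last_updated={c.last_updated}] {c.text}"
        for c in candidates[:max_chunks]
    )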

RAG Context Packing Rules

Rule | Why
include source type metadata | helps the model separate policy from editorial context
keep chunks short enough to stay attributable | large chunks dilute evidence
filter by intent before reranking | prevents irrelevant but semantically similar hits
carry freshness metadata | supports conflict resolution
separate factual chunks from stylistic instructions | reduces confusion

Prompt Pattern for Grounded Answers

Use only the retrieved chunks below when answering the user's factual question.
If the chunks are insufficient, say what is known and what is missing.
If chunks conflict, prefer the newest chunk by last_updated.
Do not generalize beyond the retrieved text.
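
One way this instruction might be stitched to packed context at assembly time; GROUNDING_INSTRUCTION and build_grounded_prompt are illustrative names, not existing helpers.

GROUNDING_INSTRUCTION = (
    "Use only the retrieved chunks below when answering the user's factual question.\n"
    "If the chunks are insufficient, say what is known and what is missing.\n"
    "If chunks conflict, prefer the newest chunk by last_updated.\n"
    "Do not generalize beyond the retrieved text."
)

def build_grounded_prompt(question: str, packed_context: str) -> str:
    # packed_context is the metadata-tagged text produced by context assembly
    return (
        f"{GROUNDING_INSTRUCTION}\n\n"
        f"RETRIEVED CHUNKS\n{packed_context}\n\n"
        f"QUESTION\n{question}"
    )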

Scenario-Specific Retrieval Guidance

FAQ and Policy

  • prefer policy and FAQ chunks only
  • exclude editorial and review content
  • keep response literal and short

Recommendation Enrichment

  • use editorial chunks and high-level product descriptors
  • avoid review snippets that can introduce noisy sentiment
  • ask the foundation model to explain why the provided ranked items fit the user

Product Q and A

  • prefer structured catalog JSON first
  • only add RAG if catalog fields are sparse and the source is approved
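
The per-scenario guidance above can also be captured as retrieval configuration rather than prompt text; the intent keys, source labels, and chunk caps below are illustrative assumptions.

# Illustrative intent-to-retrieval mapping; keys, source labels, and caps are assumptions.
RETRIEVAL_CONFIG = {
    "faq_policy": {
        "sources": {"policy", "faq"},         # exclude editorial and review content
        "max_chunks": 2,                      # keep responses literal and short
    },
    "recommendation": {
        "sources": {"editorial", "catalog"},  # skip review snippets with noisy sentiment
        "max_chunks": 4,
    },
    "product_qa": {
        "sources": {"catalog"},               # structured catalog JSON first
        "max_chunks": 2,                      # add approved RAG sources only when catalog fields are sparse
    },
}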

Contradiction Handling

Contradictory retrieval results are common in evolving systems, where older chunks often stay indexed after a policy or description changes.

Prompt Strategy

Some retrieved sources may overlap.
If they conflict, use the most recent authoritative source.
If authority is unclear, state the ambiguity rather than merging the claims.

Operational Strategy

Prompt logic alone is not enough.

Also use:

  • source ranking
  • freshness metadata
  • domain filtering
  • content deduplication in the index pipeline
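
A sketch of the operational side, reusing the hypothetical Chunk record from the earlier flow sketch: deduplicate overlapping chunks and keep only the freshest version, so the prompt-level conflict rule rarely has to fire.

def resolve_conflicts(chunks: list[Chunk]) -> list[Chunk]:
    # Keep only the freshest version of near-duplicate chunks; keying on normalized
    # text is a crude stand-in for real deduplication in the index pipeline.
    newest: dict[str, Chunk] = {}
    for c in chunks:
        key = " ".join(c.text.lower().split())
        if key not in newest or c.last_updated > newest[key].last_updated:
            newest[key] = c
    return sorted(newest.values(), key=lambda c: c.last_updated, reverse=True)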

Chunk Count Strategy

More chunks are not always better.

Use Case | Suggested Chunk Count | Why
simple FAQ | 1 to 2 | reduce noise
policy edge case | 2 to 3 | enough to cover exceptions
recommendation explanation | 2 to 4 | short editorial snippets improve specificity without flooding the prompt
complex multi-intent request | per-section chunk groups | keep sub-answers grounded separately
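
These caps can live in code next to intent routing; the mapping below is an illustrative sketch of the table above, with multi-intent requests handled separately as per-section groups.

# Illustrative caps mirroring the table; multi-intent requests are handled
# with per-section chunk groups rather than a single cap.
MAX_CHUNKS_BY_USE_CASE = {
    "simple_faq": 2,
    "policy_edge_case": 3,
    "recommendation_explanation": 4,
}

def cap_chunks(chunks: list[Chunk], use_case: str) -> list[Chunk]:
    return chunks[: MAX_CHUNKS_BY_USE_CASE.get(use_case, 3)]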

RAG Failure Modes That Look Like Prompt Problems

  1. irrelevant chunk retrieval
  2. stale policy chunk outranking fresh chunk
  3. editorial chunk used as factual source
  4. high token pressure causing grounding to be truncated
  5. history dominating retrieved evidence

These often appear as prompt issues in reviews, but the root cause is retrieval quality or assembly logic.
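
A small audit along these lines can help separate assembly failures from genuine prompt problems; the whitespace token estimate and the function name are assumptions, not production logic.

def audit_grounding(final_prompt: str, selected_chunks: list[Chunk],
                    token_budget: int = 4000) -> list[str]:
    # Flag assembly-level failures that tend to get misread as prompt problems.
    issues = []
    for c in selected_chunks:
        if c.text not in final_prompt:
            issues.append(f"chunk from source={c.source} was dropped during assembly")
    # crude whitespace token estimate; a real tokenizer would be used in practice
    if len(final_prompt.split()) > token_budget:
        issues.append("prompt exceeds token budget; grounding may be truncated")
    return issues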

Context Assembly Template

RETRIEVED POLICY CHUNKS
[source=policy][last_updated=2026-02-01] ...

RETRIEVED EDITORIAL CHUNKS
[source=editorial][asin=B0...] ...

PRODUCT DATA
{...}

INSTRUCTION
Use policy chunks for facts.
Use editorial chunks only for recommendation phrasing.
Use product data for product attributes.
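
A sketch of assembling this template from typed inputs, again using the hypothetical Chunk record; the [asin=...] tag in the example above is omitted here because it depends on index fields not shown in this section.

import json

def render_context(policy_chunks: list[Chunk], editorial_chunks: list[Chunk],
                   product_data: dict) -> str:
    policy = "\n".join(
        f"[source=policy][last_updated={c.last_updated}] {c.text}" for c in policy_chunks
    )
    editorial = "\n".join(f"[source=editorial] {c.text}" for c in editorial_chunks)
    return (
        "RETRIEVED POLICY CHUNKS\n" + policy + "\n\n"
        "RETRIEVED EDITORIAL CHUNKS\n" + editorial + "\n\n"
        "PRODUCT DATA\n" + json.dumps(product_data) + "\n\n"
        "INSTRUCTION\n"
        "Use policy chunks for facts.\n"
        "Use editorial chunks only for recommendation phrasing.\n"
        "Use product data for product attributes."
    )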

When Optimization Failed

Failure

Adding more chunks to the prompt was expected to improve answer quality.

What Actually Happened

  • latency went up
  • first-token time got worse
  • the model produced blurrier answers
  • contradiction risk increased

Workaround

Reduce raw retrieval count, rerank more aggressively, and separate chunks by function before assembly.
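
A minimal curation sketch along those lines, reusing the toy reranker score from the earlier flow sketch: keep only strong matches, then split the survivors by function before assembly.

def curate(candidates: list[Chunk], min_score: float, max_chunks: int = 3) -> dict[str, list[Chunk]]:
    # Assumes candidates are already sorted by rerank score, best first.
    kept = [c for c in candidates if c.score >= min_score][:max_chunks]
    # Separate factual evidence from stylistic material before assembly.
    return {
        "factual": [c for c in kept if c.source in {"policy", "faq", "catalog"}],
        "stylistic": [c for c in kept if c.source == "editorial"],
    }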

The win came from better curation, not bigger prompts.