04. RAG Prompt Integration
Why RAG Prompting Matters
MangaAssist depends on retrieval for policies, editorial context, and some product knowledge. A good generation model cannot compensate for poor retrieval assembly.
Prompt engineering for RAG is therefore mostly about context discipline.
Retrieval-to-Prompt Flow
- classify intent
- choose retrieval domain
- retrieve top candidates
- rerank
- filter by freshness and metadata
- compress into prompt-friendly context
- instruct the model how to use the retrieved evidence
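The flow above can be sketched end to end. This is a minimal illustration, not a real MangaAssist API: the `Chunk` fields, the keyword-based `classify_intent`, and the `max_chunks` default are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source: str        # e.g. "policy", "editorial"
    last_updated: str  # ISO date
    score: float = 0.0

def classify_intent(query: str) -> str:
    # Hypothetical keyword router; a real system would use a trained classifier.
    if "refund" in query or "policy" in query:
        return "faq_policy"
    return "recommendation"

def build_context(query: str, candidates: list[Chunk], max_chunks: int = 3) -> str:
    intent = classify_intent(query)
    domain = "policy" if intent == "faq_policy" else "editorial"
    # filter by intent-selected domain, rerank by score, keep top-k
    filtered = [c for c in candidates if c.source == domain]
    reranked = sorted(filtered, key=lambda c: c.score, reverse=True)[:max_chunks]
    # compress into prompt-friendly context with source metadata attached
    return "\n".join(
        f"[source={c.source}][last_updated={c.last_updated}] {c.text}"
        for c in reranked
    )
```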
RAG Context Packing Rules
| Rule | Why |
|---|---|
| include source type metadata | helps the model separate policy from editorial context |
| keep chunks short enough to stay attributable | large chunks dilute evidence |
| filter by intent before reranking | prevents irrelevant but semantically similar hits |
| carry freshness metadata | supports conflict resolution |
| separate factual chunks from stylistic instructions | reduces confusion |
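Two of the rules above, short attributable chunks and factual/stylistic separation, can be sketched as a packing step. The 400-character cap and the dict-of-lists shape are assumptions for illustration.

```python
def pack_chunks(chunks: list[dict], max_chars: int = 400) -> dict:
    """Group tagged chunks by source type, capping length for attributability."""
    packed = {"policy": [], "editorial": []}
    for c in chunks:
        text = c["text"][:max_chars]  # large chunks dilute evidence
        tagged = f"[source={c['source']}][last_updated={c['last_updated']}] {text}"
        packed.setdefault(c["source"], []).append(tagged)
    return packed
```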
Prompt Pattern for Grounded Answers
```
Use only the retrieved chunks below when answering the user's factual question.
If the chunks are insufficient, say what is known and what is missing.
If chunks conflict, prefer the newest chunk by last_updated.
Do not generalize beyond the retrieved text.
```
Scenario-Specific Retrieval Guidance
FAQ and Policy
- prefer policy and FAQ chunks only
- exclude editorial and review content
- keep response literal and short
Recommendation Enrichment
- use editorial chunks and high-level product descriptors
- avoid review snippets that can introduce noisy sentiment
- ask the FM to explain why provided ranked items fit the user
Product Q&A
- prefer structured catalog JSON first
- only add RAG if catalog fields are sparse and the source is approved
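The scenario guidance above can be expressed as a routing table consumed by the retriever. The intent keys, domain names, and flags here are illustrative, not the actual configuration schema.

```python
# Hypothetical per-intent retrieval configuration.
RETRIEVAL_CONFIG = {
    "faq_policy": {"domains": ["policy", "faq"], "exclude": ["editorial", "review"]},
    "recommendation": {"domains": ["editorial", "product"], "exclude": ["review"]},
    "product_qa": {"domains": ["catalog"], "rag_fallback": True},
}

def retrieval_plan(intent: str) -> dict:
    """Look up the retrieval domains and exclusions for a classified intent."""
    plan = RETRIEVAL_CONFIG.get(intent)
    if plan is None:
        raise ValueError(f"unknown intent: {intent}")
    return plan
```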
Contradiction Handling
Contradictory retrieval is common in evolving systems.
Prompt Strategy
```
Some retrieved sources may overlap.
If they conflict, use the most recent authoritative source.
If authority is unclear, state the ambiguity rather than merging the claims.
```
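The same rule can be enforced in assembly code before the prompt is built. The `authoritative` flag and chunk shape are assumptions; only the decision logic mirrors the strategy above.

```python
from datetime import date

def resolve_conflict(chunks: list[dict]):
    """Prefer the newest authoritative chunk; otherwise flag the ambiguity."""
    authoritative = [c for c in chunks if c.get("authoritative")]
    if not authoritative:
        # Authority unclear: surface the ambiguity instead of merging claims.
        return "ambiguous: no authoritative source; state both claims"
    return max(authoritative, key=lambda c: date.fromisoformat(c["last_updated"]))
```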
Operational Strategy
Prompt logic alone is not enough.
Also use:
- source ranking
- freshness metadata
- domain filtering
- content deduplication in the index pipeline
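Of the operational measures above, deduplication is the easiest to sketch: hashing whitespace- and case-normalized text catches near-verbatim duplicates at index time. A production pipeline would likely use fuzzier matching; this is a minimal version.

```python
import hashlib

def dedupe(chunks: list[str]) -> list[str]:
    """Drop chunks whose normalized text has already been seen."""
    seen, out = set(), []
    for text in chunks:
        key = hashlib.sha1(" ".join(text.lower().split()).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            out.append(text)
    return out
```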
Chunk Count Strategy
More chunks are not always better.
| Use Case | Suggested Chunk Count | Why |
|---|---|---|
| simple FAQ | 1 to 2 | reduce noise |
| policy edge case | 2 to 3 | enough to cover exceptions |
| recommendation explanation | 2 to 4 short editorial snippets | improve specificity without flooding prompt |
| complex multi-intent request | per-section chunk groups | keep sub-answers grounded separately |
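The table above can live in code as a budget lookup, so assembly never exceeds the per-use-case ceiling. The keys and the choice of upper bounds are assumptions.

```python
# Upper bounds from the suggested ranges above (illustrative keys).
CHUNK_BUDGET = {
    "simple_faq": 2,
    "policy_edge_case": 3,
    "recommendation_explanation": 4,
}

def chunk_budget(use_case: str, default: int = 2) -> int:
    """Return the maximum chunks to pack for a use case; conservative default."""
    return CHUNK_BUDGET.get(use_case, default)
```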
RAG Failure Modes That Look Like Prompt Problems
- irrelevant chunk retrieval
- stale policy chunk outranking fresh chunk
- editorial chunk used as factual source
- high token pressure causing grounding to be truncated
- history dominating retrieved evidence
These often appear as prompt issues in reviews, but the root cause is retrieval quality or assembly logic.
Context Assembly Template
```
RETRIEVED POLICY CHUNKS
[source=policy][last_updated=2026-02-01] ...

RETRIEVED EDITORIAL CHUNKS
[source=editorial][asin=B0...] ...

PRODUCT DATA
{...}

INSTRUCTION
Use policy chunks for facts.
Use editorial chunks only for recommendation phrasing.
Use product data for product attributes.
```
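The template can be generated mechanically so sections never drift out of order. The chunk dict fields (`last_updated`, `asin`, `text`) follow the template tags above; everything else is a sketch.

```python
import json

def assemble_prompt(policy_chunks: list[dict],
                    editorial_chunks: list[dict],
                    product: dict) -> str:
    """Build the context-assembly template: policy, editorial, product, instruction."""
    parts = ["RETRIEVED POLICY CHUNKS"]
    parts += [f"[source=policy][last_updated={c['last_updated']}] {c['text']}"
              for c in policy_chunks]
    parts.append("RETRIEVED EDITORIAL CHUNKS")
    parts += [f"[source=editorial][asin={c['asin']}] {c['text']}"
              for c in editorial_chunks]
    parts += ["PRODUCT DATA", json.dumps(product)]
    parts += [
        "INSTRUCTION",
        "Use policy chunks for facts.",
        "Use editorial chunks only for recommendation phrasing.",
        "Use product data for product attributes.",
    ]
    return "\n".join(parts)
```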
When Optimization Failed
Failure
Adding more chunks to the prompt was expected to improve answer quality.
What Actually Happened
- latency went up
- first-token time got worse
- the model produced blurrier answers
- contradiction risk increased
Workaround
Reduce raw retrieval count, rerank more aggressively, and separate chunks by function before assembly.
The win came from better curation, not bigger prompts.
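The workaround reduces to a small curation step: rank hard, keep few, and split survivors by function before assembly. The chunk shape and `keep` default are illustrative.

```python
def curate(candidates: list[dict], keep: int = 3) -> dict:
    """Keep only the top-scored chunks, grouped by source function."""
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)[:keep]
    by_function = {}
    for c in ranked:
        by_function.setdefault(c["source"], []).append(c)
    return by_function
```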