RAG Indexing, Retrieval, And Re-Ranking
Covers Q12, Q13, Q14, Q15, Q23, Q26, Q30, Q34.
What The Interviewer Is Testing
- Whether you can explain retrieval as a pipeline, not just "we use vectors".
- Whether you understand chunking, metadata, HNSW tuning, and reranking trade-offs.
- Whether you can talk about freshness and contradiction handling in production.
Deep Dive
Ingestion Pipeline
- Source documents arrive from FAQs, product content, or policy content.
- Documents are cleaned, normalized, and chunked.
- Each chunk receives metadata.
- Embeddings are generated.
- Vectors and metadata are upserted into OpenSearch.
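The ingestion steps above can be sketched in Python. The embedding call is a stub and the field names (doc_id, source_type, chunk_seq) are illustrative stand-ins, not the actual pipeline:

```python
import hashlib

def embed_text(text):
    # Stub: a real pipeline would call an embedding model here.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:8]]

def build_upsert_actions(doc_id, source_type, chunks, index="kb-chunks"):
    """Turn cleaned chunks into OpenSearch-style bulk upsert actions:
    one action line plus one document line per chunk."""
    actions = []
    for i, chunk in enumerate(chunks):
        actions.append({"index": {"_index": index, "_id": f"{doc_id}-{i}"}})
        actions.append({
            "text": chunk,
            "embedding": embed_text(chunk),
            "doc_id": doc_id,
            "source_type": source_type,
            "chunk_seq": i,
        })
    return actions

actions = build_upsert_actions(
    "faq-42", "faq",
    ["Returns are accepted within 30 days.", "Refunds are issued to the original payment method."],
)
```

Deterministic chunk IDs (doc_id plus sequence) make re-ingestion an upsert rather than a duplicate insert, which matters later for freshness.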
Chunking Design
The stated design uses 512-token chunks with 50-token overlap. A strong answer explains why:
- Large enough for semantic coherence.
- Small enough for precise retrieval.
- Overlap preserves continuity when relevant facts straddle boundaries.
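A minimal sliding-window chunker over a token list, assuming the tokens already come from the embedding model's tokenizer (whitespace splitting would only approximate real token counts):

```python
def chunk_tokens(tokens, size=512, overlap=50):
    """Sliding-window chunking: each chunk shares `overlap` tokens
    with the previous one so facts straddling a boundary survive."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks

chunks = chunk_tokens(list(range(1200)), size=512, overlap=50)
```

The overlap is pure cost at index time (more chunks, more vectors), which is the trade-off a strong answer acknowledges.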
Metadata Matters
Metadata is not an afterthought. It enables:
- filtered retrieval by source type or category
- freshness-aware ranking
- source attribution
- safer handling of product-specific queries through asin metadata
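As a sketch, a filtered k-NN query can combine vector similarity with metadata constraints. The field names are illustrative, and whether the filter is applied pre- or post-search depends on the OpenSearch version and k-NN engine:

```python
# Illustrative query body: restrict candidates to recent policy chunks,
# then rank by vector similarity. Field names are assumptions.
query = {
    "size": 10,
    "query": {
        "bool": {
            "filter": [
                {"term": {"source_type": "policy"}},
                {"range": {"last_updated": {"gte": "now-90d"}}},
            ],
            "must": [
                {"knn": {"embedding": {"vector": [0.1, 0.2, 0.3], "k": 10}}},
            ],
        }
    },
}
```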
Retrieval Path
- Embed the user query.
- Retrieve top-k candidates through ANN search.
- Re-rank with a cross-encoder.
- Select the final chunks.
- Pass them into the prompt with source metadata.
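The online path above can be sketched as a two-stage function, where ann_search and cross_encoder_score are hypothetical stand-ins for the vector index and the reranker model:

```python
def retrieve_and_rerank(query, ann_search, cross_encoder_score, k=50, final_n=3):
    """Two-stage retrieval: cheap ANN search for recall (top-k),
    then a precise cross-encoder rerank to pick the final chunks."""
    candidates = ann_search(query, k)
    scored = [(cross_encoder_score(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:final_n]]

# Toy stand-ins to show the flow.
def fake_ann(q, k):
    return ["chunk-a", "chunk-b", "chunk-c"]

def fake_score(q, chunk):
    return {"chunk-a": 0.2, "chunk-b": 0.9, "chunk-c": 0.5}[chunk]

top = retrieve_and_rerank("return policy", fake_ann, fake_score, k=3, final_n=2)
```

The split matters: ANN k is sized for recall, final_n for prompt budget, and the cross-encoder only pays its quadratic attention cost on k candidates.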
HNSW Talking Points
- M controls graph connectivity and memory use.
- ef_construction affects index build quality and build time.
- ef_search affects recall and query latency.
For RAG, good answers usually bias slightly toward recall because poor retrieval quality hurts the final answer more than a modest search-time increase.
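For reference, these knobs map onto an OpenSearch k-NN index definition roughly as follows. The dimension, engine, and parameter values here are illustrative defaults, not a recommendation:

```python
# Illustrative OpenSearch index body for an HNSW-backed knn_vector field.
index_body = {
    "settings": {
        "index": {
            "knn": True,
            # ef_search is a search-time knob: higher = better recall, more latency.
            "knn.algo_param.ef_search": 256,
        }
    },
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 768,
                "method": {
                    "name": "hnsw",
                    "space_type": "cosinesimil",
                    "parameters": {
                        "m": 16,              # graph connectivity / memory
                        "ef_construction": 512,  # build quality / build time
                    },
                },
            }
        }
    },
}
```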
Contradictions And Freshness
Retrieved chunks can disagree. The answer should mention:
- versioning or deduplication during indexing
- last_updated metadata
- prompt instructions to prefer fresher sources
- operational flags for human review when contradictions persist
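A freshness-first dedup step at selection time might look like this sketch, assuming each chunk carries a doc_id and an ISO-8601 last_updated field:

```python
from datetime import datetime

def prefer_freshest(chunks):
    """Keep only the most recently updated chunk per doc_id,
    so stale versions never reach the prompt."""
    best = {}
    for chunk in chunks:
        key = chunk["doc_id"]
        ts = datetime.fromisoformat(chunk["last_updated"])
        if key not in best or ts > datetime.fromisoformat(best[key]["last_updated"]):
            best[key] = chunk
    return list(best.values())

survivors = prefer_freshest([
    {"doc_id": "policy-returns", "last_updated": "2024-01-01", "text": "14 days"},
    {"doc_id": "policy-returns", "last_updated": "2024-06-01", "text": "30 days"},
])
```

This belongs in the selection layer as a safety net; the real fix is versioned deletion at indexing time.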
Strong Answer Pattern
- "RAG quality is dominated by retrieval quality, not only model quality."
- "Metadata filtering often matters as much as vector similarity."
- "Reranking is worth the cost when top-3 quality matters."
- "Freshness is a pipeline problem, not just a prompt problem."
Scenario 1: Contradictory Chunks
Primary Prompt
The retriever returns one chunk saying a return policy is 30 days and another saying it is 14 days. What should happen?
Follow-Up 1
Is this primarily an LLM problem or an indexing problem?
Follow-Up 2
How should last_updated affect ranking or selection?
Follow-Up 3
What analytics event would you emit so the data pipeline team can fix the root cause?
Strong Answer Markers
- Attributes root cause to data or indexing quality first.
- Uses freshness metadata and source preference rules.
- Mentions surfacing contradictions for human cleanup.
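For Follow-Up 3, one possible shape of the analytics event; the event name and schema here are invented for illustration, not an existing pipeline contract:

```python
import json
import time

def contradiction_event(query, chunk_ids, conflicting_field):
    """Hypothetical analytics payload flagging contradictory retrieved
    chunks so the data pipeline team can trace and fix the source docs."""
    return json.dumps({
        "event_type": "rag_contradiction_detected",
        "query": query,
        "chunk_ids": chunk_ids,
        "conflicting_field": conflicting_field,
        "ts": int(time.time()),
    })

event = contradiction_event(
    "what is the return window",
    ["policy-v1-chunk-3", "policy-v2-chunk-3"],
    "return_window_days",
)
```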
Scenario 2: Retrieval Recall Is Poor For Niche Manga Queries
Primary Prompt
Users asking for obscure manga volumes are getting weak retrieval results even though the documents exist. Where do you investigate first?
Follow-Up 1
Would you adjust chunking, ef_search, metadata filters, or the embedding model first?
Follow-Up 2
How would you know if reranking is helping or hurting?
Follow-Up 3
What offline evaluation dataset would you build?
Strong Answer Markers
- Separates recall issues from reranking issues.
- Uses query-set evaluation rather than intuition.
- Talks about retrieval metrics such as recall@k and MRR.
- Understands that over-filtering metadata can kill recall.
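Recall@k and MRR are cheap to compute offline once a labeled query set exists; a minimal sketch:

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant docs that appear in the top-k retrieved list."""
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def mrr(query_results):
    """Mean reciprocal rank over (retrieved_list, relevant_set) pairs:
    1/rank of the first relevant hit, averaged across queries."""
    total = 0.0
    for retrieved, relevant in query_results:
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(query_results)

r = recall_at_k(["a", "b", "c"], {"a", "d"}, k=2)
m = mrr([(["x", "a"], {"a"}), (["b", "y"], {"b"})])
```

Running these before and after a reranking change separates "recall got worse" from "ordering got worse", which is exactly the distinction the scenario probes.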
Scenario 3: Real-Time Catalog Updates
Primary Prompt
A product is discontinued. How does that propagate through the RAG index without serving stale content for hours?
Follow-Up 1
What event source would trigger the update pipeline?
Follow-Up 2
Would you ever store prices inside chunks?
Follow-Up 3
What SLO would you define for index freshness?
Strong Answer Markers
- Uses event-driven indexing through SNS, streams, or similar triggers.
- Deletes or updates chunks for discontinued products.
- Avoids relying on static chunked prices for live pricing.
- Defines freshness explicitly, for example under five minutes.
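An event-driven purge handler can be sketched as follows; the event shape and the delete_by_query callable are assumptions for illustration, not a real catalog API:

```python
def handle_catalog_event(event, delete_by_query):
    """On a 'discontinued' status change, purge that product's chunks
    from the RAG index immediately instead of waiting for a batch rebuild."""
    if event.get("status") == "discontinued":
        delete_by_query({"term": {"asin": event["asin"]}})
        return True
    return False

# Recording stub to show the flow without a live index.
deleted = []
handled = handle_catalog_event(
    {"asin": "B000TEST01", "status": "discontinued"},
    delete_by_query=deleted.append,
)
```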
Red Flags
- Saying overlap is always good without a cost trade-off.
- Ignoring metadata filters.
- Treating HNSW parameters as random knobs.
- Solving contradictory sources only with prompt wording.
Two-Minute Whiteboard Version
Draw two planes:
- Offline ingestion plane with chunking, embedding, indexing.
- Online retrieval plane with embed, search, rerank, select, prompt.