
RAG Indexing, Retrieval, And Re-Ranking

Covers Q12, Q13, Q14, Q15, Q23, Q26, Q30, Q34.

What The Interviewer Is Testing

  • Whether you can explain retrieval as a pipeline, not just "we use vectors".
  • Whether you understand chunking, metadata, HNSW tuning, and reranking trade-offs.
  • Whether you can talk about freshness and contradiction handling in production.

Deep Dive

Ingestion Pipeline

  1. Source documents arrive from FAQs, product content, or policy content.
  2. Documents are cleaned, normalized, and chunked.
  3. Each chunk receives metadata.
  4. Embeddings are generated.
  5. Vectors and metadata are upserted into OpenSearch.
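
A minimal Python sketch of those five steps, assuming hypothetical clean(), chunk(), embed(), and upsert() callables in place of the real preprocessing, embedding, and OpenSearch clients:

  # Ingestion sketch; the callables are stand-ins, not a specific library API.
  from datetime import datetime, timezone

  def ingest_document(doc, clean, chunk, embed, upsert):
      text = clean(doc["body"])                       # 2. clean and normalize
      pieces = chunk(text)                            # 2. chunk (e.g. 512 tokens, 50 overlap)
      records = []
      for i, piece in enumerate(pieces):
          records.append({
              "chunk_id": f"{doc['id']}-{i}",
              "text": piece,
              "vector": embed(piece),                 # 4. one embedding per chunk
              "source_type": doc["source_type"],      # 3. metadata carried on every chunk
              "category": doc.get("category"),
              "asin": doc.get("asin"),
              "last_updated": datetime.now(timezone.utc).isoformat(),
          })
      upsert(records)                                 # 5. upsert vectors plus metadata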

Chunking Design

The stated design uses 512-token chunks with 50-token overlap. A strong answer explains why:

  • Large enough for semantic coherence.
  • Small enough for precise retrieval.
  • Overlap preserves continuity when relevant facts straddle boundaries.
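
A sliding-window chunker over tokens is enough to illustrate the trade-off; this sketch assumes the input is already tokenized, and the real counts depend on the tokenizer in use:

  # Fixed-size windows with overlap, so facts that straddle a boundary
  # appear intact in at least one chunk.
  def chunk_tokens(tokens, size=512, overlap=50):
      step = size - overlap
      chunks = []
      for start in range(0, len(tokens), step):
          chunks.append(tokens[start:start + size])
          if start + size >= len(tokens):
              break
      return chunks

The overlap buys boundary continuity at the cost of extra storage and some duplicated candidates at retrieval time, which is exactly the trade-off a strong answer names.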

Metadata Matters

Metadata is not an afterthought. It enables:

  • filtered retrieval by source type or category
  • freshness-aware ranking
  • source attribution
  • safer handling of product-specific queries through an asin metadata field
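
As a concrete illustration, a single chunk record might carry fields like these; the names and values are illustrative, not a fixed schema:

  # Example chunk record; field names and values are illustrative only.
  chunk_record = {
      "chunk_id": "faq-returns-0003-2",
      "text": "Items in new, unopened condition can be returned within 30 days...",
      "vector": [0.012, -0.284, 0.091],          # truncated embedding for illustration
      "source_type": "faq",                      # enables filtered retrieval
      "category": "returns",
      "asin": None,                              # set only for product-specific chunks
      "source_url": "https://example.com/help/returns",
      "last_updated": "2024-06-01T00:00:00Z",    # enables freshness-aware ranking
  }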

Retrieval Path

  1. Embed the user query.
  2. Retrieve top-k candidates through ANN search.
  3. Re-rank with a cross-encoder.
  4. Select the final chunks.
  5. Pass them into the prompt with source metadata.
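
The same path as a sketch, assuming hypothetical embed(), ann_search(), and rerank() callables; rerank() is assumed to return candidates annotated with a rerank_score:

  # Online retrieval sketch; the callables stand in for the embedding service,
  # the OpenSearch k-NN query, and a cross-encoder reranker.
  def retrieve(query, embed, ann_search, rerank, k=50, final_k=3):
      qvec = embed(query)                                   # 1. embed the query
      candidates = ann_search(qvec, k=k)                    # 2. top-k ANN candidates
      scored = rerank(query, candidates)                    # 3. cross-encoder scoring
      top = sorted(scored, key=lambda c: c["rerank_score"], reverse=True)[:final_k]
      return [                                              # 4-5. final chunks plus metadata for the prompt
          {"text": c["text"],
           "source_url": c.get("source_url"),
           "last_updated": c.get("last_updated")}
          for c in top
      ]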

HNSW Talking Points

  • M controls graph connectivity and memory use.
  • ef_construction affects index build quality and build time.
  • ef_search affects recall and query latency.

For RAG, good answers usually bias slightly toward recall, because poor retrieval quality hurts the final answer more than a modest increase in query latency.
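
As an illustration, the HNSW parameters live in the index mapping; the shape below follows the OpenSearch k-NN plugin, but the dimension and parameter values are starting points to benchmark, not recommendations:

  # Illustrative OpenSearch k-NN index body (values are assumptions to tune).
  index_body = {
      "settings": {
          "index": {
              "knn": True,
              "knn.algo_param.ef_search": 100,     # query-time recall vs. latency
          }
      },
      "mappings": {
          "properties": {
              "vector": {
                  "type": "knn_vector",
                  "dimension": 768,                # must match the embedding model
                  "method": {
                      "name": "hnsw",
                      "space_type": "cosinesimil",
                      "parameters": {
                          "m": 16,                 # graph connectivity and memory use
                          "ef_construction": 512,  # build quality vs. build time
                      },
                  },
              }
          }
      },
  }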

Contradictions And Freshness

Retrieved chunks can disagree. The answer should mention:

  • versioning or deduplication during indexing
  • last_updated metadata
  • prompt instructions to prefer fresher sources
  • operational flags for human review when contradictions persist
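
One selection-time guard, sketched under the assumption that chunks carry last_updated (ISO 8601 strings) and a source key such as source_url:

  # Freshness-aware dedup: when two chunks from the same source disagree,
  # keep the most recently updated one and flag the conflict for review.
  def dedupe_by_freshness(chunks, key="source_url"):
      newest = {}
      conflicts = []
      for c in chunks:
          k = c.get(key)
          if k in newest and newest[k]["text"] != c["text"]:
              conflicts.append((newest[k]["chunk_id"], c["chunk_id"]))
          if k not in newest or c["last_updated"] > newest[k]["last_updated"]:
              newest[k] = c
      return list(newest.values()), conflicts      # conflicts feed a human-review queue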

Strong Answer Pattern

  • "RAG quality is dominated by retrieval quality, not only model quality."
  • "Metadata filtering often matters as much as vector similarity."
  • "Reranking is worth the cost when top-3 quality matters."
  • "Freshness is a pipeline problem, not just a prompt problem."

Scenario 1: Contradictory Chunks

Primary Prompt

The retriever returns one chunk saying a return policy is 30 days and another saying it is 14 days. What should happen?

Follow-Up 1

Is this primarily an LLM problem or an indexing problem?

Follow-Up 2

How should last_updated affect ranking or selection?

Follow-Up 3

What analytics event would you emit so the data pipeline team can fix the root cause?

Strong Answer Markers

  • Attributes root cause to data or indexing quality first.
  • Uses freshness metadata and source preference rules.
  • Mentions surfacing contradictions for human cleanup.
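
For the analytics follow-up, a sketch of the kind of event that would let the data pipeline team trace the root cause; the event name and fields are assumptions, not an existing schema:

  # Hypothetical contradiction event; emit() stands in for whatever
  # analytics sink the stack actually uses.
  from datetime import datetime, timezone

  def emit_contradiction_event(emit, query, chunk_a, chunk_b):
      emit({
          "event": "rag.retrieval.contradiction",
          "query": query,
          "chunk_ids": [chunk_a["chunk_id"], chunk_b["chunk_id"]],
          "source_urls": [chunk_a.get("source_url"), chunk_b.get("source_url")],
          "last_updated": [chunk_a.get("last_updated"), chunk_b.get("last_updated")],
          "detected_at": datetime.now(timezone.utc).isoformat(),
      })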

Scenario 2: Retrieval Recall Is Poor For Niche Manga Queries

Primary Prompt

Users asking for obscure manga volumes are getting weak retrieval results even though the documents exist. Where do you investigate first?

Follow-Up 1

Would you adjust chunking, ef_search, metadata filters, or the embedding model first?

Follow-Up 2

How would you know if reranking is helping or hurting?

Follow-Up 3

What offline evaluation dataset would you build?

Strong Answer Markers

  • Separates recall issues from reranking issues.
  • Uses query-set evaluation rather than intuition.
  • Talks about retrieval metrics such as recall@k and MRR.
  • Understands that over-filtering metadata can kill recall.
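
A sketch of the offline evaluation loop behind those markers, assuming a small labeled set of (query, relevant_chunk_ids) pairs and a retrieve_ids() callable that returns a ranked list of chunk ids:

  # Offline retrieval evaluation: recall@k and MRR over a labeled query set.
  def evaluate_retrieval(eval_set, retrieve_ids, k=10):
      recalls, reciprocal_ranks = [], []
      for query, relevant in eval_set:                 # relevant: set of chunk ids
          ranked = retrieve_ids(query, k=k)
          found = {cid for cid in ranked if cid in relevant}
          recalls.append(len(found) / max(len(relevant), 1))
          rr = 0.0
          for rank, cid in enumerate(ranked, start=1):
              if cid in relevant:
                  rr = 1.0 / rank
                  break
          reciprocal_ranks.append(rr)
      n = max(len(eval_set), 1)
      return {"recall@k": sum(recalls) / n, "mrr": sum(reciprocal_ranks) / n}

Running the same query set with and without the reranker answers the "helping or hurting" follow-up directly.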

Scenario 3: Real-Time Catalog Updates

Primary Prompt

A product is discontinued. How does that propagate through the RAG index without serving stale content for hours?

Follow-Up 1

What event source would trigger the update pipeline?

Follow-Up 2

Would you ever store prices inside chunks?

Follow-Up 3

What SLO would you define for index freshness?

Strong Answer Markers

  • Uses event-driven indexing through SNS, streams, or similar triggers.
  • Deletes or updates chunks for discontinued products.
  • Avoids relying on static chunked prices for live pricing.
  • Defines freshness explicitly, for example under five minutes.
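
A sketch of the event-driven update path, assuming a hypothetical catalog-event payload and delete_chunks_by_asin() and reingest_product() helpers rather than a specific OpenSearch or SNS API:

  # Hypothetical handler for a catalog change event delivered via SNS/SQS or a stream.
  def handle_catalog_event(event, delete_chunks_by_asin, reingest_product):
      asin = event["asin"]
      if event["type"] == "product_discontinued":
          delete_chunks_by_asin(asin)          # remove stale chunks immediately
      elif event["type"] == "product_updated":
          delete_chunks_by_asin(asin)          # replace rather than patch in place
          reingest_product(asin)
      # Freshness SLO: measure event time to index-visible time and alert past the threshold.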

Red Flags

  • Saying overlap is always good without acknowledging the cost trade-off.
  • Ignoring metadata filters.
  • Treating HNSW parameters as random knobs.
  • Solving contradictory sources only with prompt wording.

Two-Minute Whiteboard Version

Draw two planes:

  1. Offline ingestion plane with chunking, embedding, indexing.
  2. Online retrieval plane with embed, search, rerank, select, prompt.