
Scenario 3: Shared Read/Write Path in OpenSearch

Scenario Summary

The same OpenSearch collection handles live retrieval traffic and heavy background indexing. When the ingest job runs, write pressure degrades online search latency and the chatbot quietly falls back to weak, ungrounded answers.

Why It Matters

Architectural design is not only about choosing the right service. It is also about separating workloads that have different latency goals, scaling behavior, and failure tolerance.

Failure Pattern

Design area        | Weak choice                                        | Better choice
-------------------+----------------------------------------------------+-------------------------------------------------------------
Workload isolation | One collection for all reads and writes            | Separate paths or capacity strategy for online vs batch work
Fallback behavior  | Retrieval timeout silently becomes "answer anyway" | Detect degraded retrieval and change the response mode
Capacity planning  | Assume auto-scaling will absorb every spike        | Model ingest windows and concurrency explicitly

Deep Dive

This failure usually stays hidden because the system still returns an answer. The real issue is that the answer is no longer grounded in current data. In RAG systems, background maintenance can quietly degrade the quality of the online path when both workloads compete for the same search capacity.

Better designs include:

  • dedicated collections or indexes for ingestion and serving,
  • staged indexing followed by controlled alias swaps,
  • scheduled ingest windows aligned to low user traffic,
  • alarms that combine retrieval latency and retrieval hit quality.
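The staged-indexing-and-alias-swap pattern above relies on OpenSearch's atomic `_aliases` API. A minimal sketch of the swap request (index and alias names are illustrative; the deployment-specific client call is omitted):

```python
def build_alias_swap(alias: str, old_index: str, new_index: str) -> dict:
    """Build the request body for OpenSearch's `_aliases` endpoint that
    repoints `alias` from `old_index` to `new_index`.

    Both actions execute atomically, so online queries never observe a
    moment where the alias resolves to zero indexes or to both at once.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

# The serving path always queries the alias, never a concrete index:
#   1. bulk-index the refreshed corpus into docs_v2 (offline, throttled)
#   2. POST build_alias_swap("docs", "docs_v1", "docs_v2") to /_aliases
#   3. delete docs_v1 once the cutover is verified
```

Because the heavy write work lands on an index the alias does not yet point to, the online path keeps its read capacity until the instant of cutover.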

Detection Signals

  • Search latency spikes during predictable ingest windows
  • The share of answers generated without retrieved context rises
  • Query timeout rates increase while batch jobs are active
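The first signal is cheap to check mechanically. A minimal sketch (timestamps and window boundaries are illustrative) that scores how strongly latency spikes line up with a known ingest window:

```python
from datetime import datetime

def spike_overlap_fraction(spike_times, window_start, window_end):
    """Fraction of latency-spike timestamps that fall inside an ingest
    window. A value near 1.0 suggests the batch job is driving the
    spikes; a value near 0.0 points at some other cause.
    """
    if not spike_times:
        return 0.0
    inside = sum(1 for t in spike_times if window_start <= t <= window_end)
    return inside / len(spike_times)

# Example: two of three spikes fall inside the 02:00-03:00 ingest window
spikes = [
    datetime(2024, 1, 1, 2, 10),
    datetime(2024, 1, 1, 2, 40),
    datetime(2024, 1, 1, 9, 0),
]
score = spike_overlap_fraction(
    spikes, datetime(2024, 1, 1, 2, 0), datetime(2024, 1, 1, 3, 0)
)
```

In practice the spike timestamps would come from your latency alarms and the window boundaries from the ingest-job schedule; the point is that this correlation can be automated rather than eyeballed.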

Runbook

  1. Correlate retrieval latency with indexing-job schedules.
  2. Separate online serving from ingest-heavy write paths.
  3. Use staged indexing and cutover instead of in-place heavy updates.
  4. Add a degraded-mode response when retrieval misses are caused by infrastructure stress.
  5. Review whether the current store is still the right fit for the update pattern.
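Step 4 amounts to a small decision function in the response path. A minimal sketch, assuming a retrieval result that carries latency, hits, and a timeout flag (the names and thresholds are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class RetrievalResult:
    latency_ms: float
    hits: list = field(default_factory=list)
    timed_out: bool = False

def choose_response_mode(result: RetrievalResult,
                         latency_budget_ms: float = 500.0,
                         min_hits: int = 1) -> str:
    """Map retrieval health to a response mode instead of silently
    answering without context.

      grounded   -> answer normally, citing retrieved passages
      degraded   -> retrieval is under infrastructure stress; tell the
                    user and do not present the answer as grounded
      no_context -> retrieval was healthy but found nothing relevant
    """
    if result.timed_out or result.latency_ms > latency_budget_ms:
        return "degraded"
    if len(result.hits) >= min_hits:
        return "grounded"
    return "no_context"
```

Separating "degraded" from "no_context" matters: the first is an infrastructure problem the runbook addresses, while the second is a corpus-coverage problem, and they deserve different user-facing behavior and different alarms.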

Questions To Ask

  • Which search workload has the stricter SLA: online queries or batch updates?
  • Can we rebuild or refresh data without touching the serving path?
  • What should the user experience be when retrieval is temporarily degraded?
  • Are we measuring retrieval quality separately from LLM response success?

Interview Drill

Why is "the model still answered" not enough evidence that the retrieval architecture is healthy?

Good Outcome

Online retrieval has protected capacity and quality stays stable even while the corpus is being refreshed.