Orchestrator Request Flow
Covers Q1, Q11, Q24, Q38, Q46.
What The Interviewer Is Testing
- Whether you can explain the orchestrator as the decision-making control plane instead of a bag of helper calls.
- Whether you understand sequencing, parallel fan-out, latency budgets, and persistence boundaries.
- Whether you can evolve an MVP design over time without over-engineering it up front.
Deep Dive
Core Mental Model
The orchestrator should decide, not do everything itself. A strong answer usually frames it this way:
- Load context.
- Classify the incoming request.
- Fan out only to the services required for that intent.
- Aggregate partial results into a prompt-ready shape.
- Invoke the LLM.
- Run guardrails.
- Persist the turn and emit analytics.
- Return a structured response.
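The eight steps above can be sketched as a single control-flow function. This is a minimal, runnable sketch; the names (`Turn`, `Deps`, `handle_turn`) and the in-memory stand-ins are illustrative, not from any specific codebase:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Turn:
    session_id: str
    text: str

class Deps:
    """Minimal in-memory stand-ins so the control flow is runnable end to end."""
    def __init__(self):
        self.store = {}

    async def load_context(self, sid):
        return self.store.get(sid, [])

    def classify(self, text):
        return "recommendation" if "recommend" in text else "smalltalk"

    async def fan_out(self, intent):
        # Fan out only to the services this intent actually needs.
        if intent == "recommendation":
            return {"catalog": ["Title A", "Title B"]}
        return {}

    async def generate(self, prompt):
        return f"llm({prompt})"

    def guardrails(self, draft):
        return draft  # pass-through in this sketch

    async def persist(self, sid, turn, reply):
        self.store.setdefault(sid, []).append((turn.text, reply))

async def handle_turn(deps, turn):
    ctx = await deps.load_context(turn.session_id)          # 1. load context
    intent = deps.classify(turn.text)                       # 2. classify
    results = await deps.fan_out(intent)                    # 3. fan out per intent
    prompt = f"{intent}|ctx={len(ctx)}|{sorted(results)}"   # 4. aggregate into prompt shape
    draft = await deps.generate(prompt)                     # 5. invoke LLM
    reply = deps.guardrails(draft)                          # 6. run guardrails
    await deps.persist(turn.session_id, turn, reply)        # 7. persist the turn
    return {"reply": reply, "intent": intent}               # 8. structured response
```

Note that the orchestrator only sequences calls and shapes data between them; every stage's real implementation lives behind a dependency.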
What Belongs Inside The Orchestrator
- State transition logic.
- Timeout and retry policy selection.
- Dependency ordering.
- Partial-result assembly.
- Correlation IDs, tracing context, and response metadata.
What Should Stay Outside
- Business logic for recommendations, catalog, or promotions.
- Prompt template ownership.
- Guardrail implementation details.
- Session storage implementation details.
Production Answer Shape
- Start with the happy-path state machine.
- Then describe parallel fan-out for non-dependent calls.
- Then explain what happens when one dependency is slow or unavailable.
- End with the latency budget and how you would decompose the class if intents and services grow.
Strong Answer Pattern
- "The orchestrator is the workflow controller."
- "It separates mandatory dependencies from best-effort dependencies."
- "It must be idempotent around retries and safe around partial failures."
- "For MVP a single class is acceptable, but I would split routing, prompt building, and response assembly once intent count and dependency count grow."
Scenario 1: Partial Fan-Out Failure
Primary Prompt
The user asks for manga recommendations. Catalog returns in 120 ms, Recommendations times out at 800 ms, Promotions succeeds. How should the orchestrator behave?
Follow-Up 1
Which dependency is critical here, and how do you encode that distinction in the orchestration layer?
Follow-Up 2
Would you retry Recommendations synchronously, or continue with partial results? What timeout budget would you use?
Follow-Up 3
How should the prompt change so the LLM does not hallucinate promotions or recommendations that were never fetched?
Strong Answer Markers
- Classifies downstreams into critical and best-effort.
- Uses scatter-gather with per-service timeouts.
- Continues with available data when the failed service is non-critical.
- Passes explicit "data unavailable" markers into prompt construction.
- Logs dependency-specific failures for later tuning.
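A minimal scatter-gather sketch for this scenario, assuming Catalog is the only critical dependency; the timeout budgets and service names are illustrative:

```python
import asyncio

# Hypothetical classification of downstreams: Catalog is critical,
# Recommendations and Promotions are best-effort. Budgets in seconds.
CRITICAL = {"catalog"}
TIMEOUTS = {"catalog": 0.3, "recommendations": 0.8, "promotions": 0.5}

async def call_with_deadline(name, coro):
    try:
        return name, await asyncio.wait_for(coro, TIMEOUTS[name]), None
    except Exception as exc:  # timeout or downstream error
        return name, None, exc

async def scatter_gather(calls):
    """calls: service name -> coroutine. Returns (results, unavailable):
    the unavailable list feeds explicit 'data unavailable' markers
    into prompt construction so the LLM does not invent missing data."""
    done = await asyncio.gather(
        *(call_with_deadline(n, c) for n, c in calls.items())
    )
    results, unavailable = {}, []
    for name, value, exc in done:
        if exc is None:
            results[name] = value
        elif name in CRITICAL:
            raise RuntimeError(f"critical dependency {name} failed") from exc
        else:
            unavailable.append(name)  # best-effort: degrade, don't fail
    return results, unavailable
```

In the scenario above, Recommendations blows its 800 ms budget and lands in `unavailable`, while the Catalog and Promotions results are kept and the turn proceeds with partial data.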
Scenario 2: Latency Regression At P95
Primary Prompt
The end-to-end p95 latency moved from 2.1 s to 4.0 s after adding a reranker and extra guardrails. Walk through your debugging plan.
Follow-Up 1
What spans and metrics must already exist to make that diagnosis fast?
Follow-Up 2
If the regression is split between LLM generation and a catalog lookup inside guardrails, what is the likely design flaw?
Follow-Up 3
What would you optimize first if product accuracy matters more than raw speed?
Strong Answer Markers
- Breaks latency down by orchestration stage.
- Mentions tracing IDs and per-span histograms.
- Identifies expensive synchronous work placed too late in the pipeline.
- Optimizes within a latency budget instead of hand-waving about caching everything.
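Localizing a regression like this is fast only if per-stage timing already exists. A real system would export spans and histograms to a tracing backend; this in-memory sketch (all names hypothetical) just shows the shape of per-stage measurement:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class StageTimer:
    """Records wall-clock durations per orchestration stage.
    Stand-in for real spans + per-span histograms."""
    def __init__(self):
        self.durations = defaultdict(list)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.durations[name].append(time.perf_counter() - start)

    def p95(self, name):
        samples = sorted(self.durations[name])
        # index of the 95th-percentile sample, clamped to the last element
        return samples[min(len(samples) - 1, int(0.95 * len(samples)))]
```

Wrapping each pipeline stage in `timer.stage("llm_generate")` and so on turns "p95 moved from 2.1 s to 4.0 s" into "the reranker added 1.2 s and the guardrail catalog lookup added 700 ms" in one query.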
Scenario 3: The MVP Orchestrator Is Becoming A Monolith
Primary Prompt
The orchestrator class now supports 50 intents and 15 downstream services. What would you refactor first?
Follow-Up 1
Would you immediately split into microservices, or first modularize in-process? Why?
Follow-Up 2
How do you avoid turning routeByIntent into a giant switch statement?
Follow-Up 3
What runtime evidence would justify moving from a modular monolith to multiple services?
Strong Answer Markers
- Chooses plugin or registry-based intent handlers.
- Separates workflow control from prompt construction and response assembly.
- Uses growth signals such as team ownership, deploy cadence, scaling profile, and blast radius.
- Avoids premature service sprawl.
Red Flags
- Describing the orchestrator as if it owns every business rule.
- Saying "retry everything" without idempotency or deadline awareness.
- Ignoring partial-result behavior.
- Proposing microservices only because the system is important.
Two-Minute Whiteboard Version
Draw a pipeline with three lanes:
- Synchronous control path.
- Parallel dependency fan-out.
- Post-generation validation and persistence.
Then annotate each stage with target latency and fallback behavior.
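An example of that annotation, with hypothetical per-stage budgets sized to a ~2 s p95 target (the specific numbers and fallbacks are illustrative, not prescribed):

```python
# Hypothetical per-stage latency budgets and fallback behaviors
# for an end-to-end p95 target of about 2 seconds.
LATENCY_BUDGET = {
    # stage:          (target_ms, fallback)
    "load_context":   (50,   "proceed with empty session context"),
    "classify":       (50,   "default to a general intent"),
    "fan_out":        (800,  "drop best-effort results, keep critical"),
    "llm_generate":   (1000, "retry once with a shorter max output"),
    "guardrails":     (100,  "block and return a safe canned response"),
    "persist":        (0,    "async, off the critical path"),
}

# The budgets must fit inside the end-to-end target.
assert sum(ms for ms, _ in LATENCY_BUDGET.values()) <= 2000
```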