Orchestrator Request Flow
Covers Q1, Q11, Q24, Q38, Q46.
What The Interviewer Is Testing
- Whether you can explain the orchestrator as the decision-making control plane instead of a bag of helper calls.
- Whether you understand sequencing, parallel fan-out, latency budgets, and persistence boundaries.
- Whether you can evolve an MVP design over time without over-engineering it up front.
Deep Dive
Core Mental Model
The orchestrator should decide, not do everything itself. A strong answer usually frames it this way:
- Load context.
- Classify the incoming request.
- Fan out only to the services required for that intent.
- Aggregate partial results into a prompt-ready shape.
- Invoke the LLM.
- Run guardrails.
- Persist the turn and emit analytics.
- Return a structured response.
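The eight steps above can be sketched as a single control-flow function. This is a minimal, runnable sketch; the names (`Turn`, `Deps`, `handle_turn`) and the in-memory stand-ins are illustrative, not from any specific codebase:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Turn:
    session_id: str
    text: str

class Deps:
    """Minimal in-memory stand-ins so the control flow is runnable end to end."""
    def __init__(self):
        self.store = {}

    async def load_context(self, sid):
        return self.store.get(sid, [])

    def classify(self, text):
        return "recommendation" if "recommend" in text else "smalltalk"

    async def fan_out(self, intent):
        # Fan out only to the services this intent actually needs.
        if intent == "recommendation":
            return {"catalog": ["Title A", "Title B"]}
        return {}

    async def generate(self, prompt):
        return f"llm({prompt})"

    def guardrails(self, draft):
        return draft  # pass-through in this sketch

    async def persist(self, sid, turn, reply):
        self.store.setdefault(sid, []).append((turn.text, reply))

async def handle_turn(deps, turn):
    ctx = await deps.load_context(turn.session_id)          # 1. load context
    intent = deps.classify(turn.text)                       # 2. classify
    results = await deps.fan_out(intent)                    # 3. fan out per intent
    prompt = f"{intent}|ctx={len(ctx)}|{sorted(results)}"   # 4. aggregate into prompt shape
    draft = await deps.generate(prompt)                     # 5. invoke LLM
    reply = deps.guardrails(draft)                          # 6. run guardrails
    await deps.persist(turn.session_id, turn, reply)        # 7. persist the turn
    return {"reply": reply, "intent": intent}               # 8. structured response
```

Note that the orchestrator only sequences calls and shapes data between them; every stage's real implementation lives behind a dependency.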
What Belongs Inside The Orchestrator
- State transition logic.
- Timeout and retry policy selection.
- Dependency ordering.
- Partial-result assembly.
- Correlation IDs, tracing context, and response metadata.
What Should Stay Outside
- Business logic for recommendations, catalog, or promotions.
- Prompt template ownership.
- Guardrail implementation details.
- Session storage implementation details.
Production Answer Shape
- Start with the happy-path state machine.
- Then describe parallel fan-out for non-dependent calls.
- Then explain what happens when one dependency is slow or unavailable.
- End with the latency budget and how you would decompose the class if intents and services grow.
Strong Answer Pattern
- "The orchestrator is the workflow controller."
- "It separates mandatory dependencies from best-effort dependencies."
- "It must be idempotent around retries and safe around partial failures."
- "For MVP a single class is acceptable, but I would split routing, prompt building, and response assembly once intent count and dependency count grow."
Scenario 1: Partial Fan-Out Failure
Primary Prompt
The user asks for manga recommendations. Catalog returns in 120 ms, Recommendations times out at 800 ms, Promotions succeeds. How should the orchestrator behave?
Follow-Up 1
Which dependency is critical here, and how do you encode that distinction in the orchestration layer?
Follow-Up 2
Would you retry Recommendations synchronously, or continue with partial results? What timeout budget would you use?
Follow-Up 3
How should the prompt change so the LLM does not hallucinate promotions or recommendations that were never fetched?
Strong Answer Markers
- Classifies downstreams into critical and best-effort.
- Uses scatter-gather with per-service timeouts.
- Continues with available data when the failed service is non-critical.
- Passes explicit "data unavailable" markers into prompt construction.
- Logs dependency-specific failures for later tuning.
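A minimal scatter-gather sketch for this scenario, assuming Catalog is the only critical dependency; the timeout budgets and service names are illustrative:

```python
import asyncio

# Hypothetical classification of downstreams: Catalog is critical,
# Recommendations and Promotions are best-effort. Budgets in seconds.
CRITICAL = {"catalog"}
TIMEOUTS = {"catalog": 0.3, "recommendations": 0.8, "promotions": 0.5}

async def call_with_deadline(name, coro):
    try:
        return name, await asyncio.wait_for(coro, TIMEOUTS[name]), None
    except Exception as exc:  # timeout or downstream error
        return name, None, exc

async def scatter_gather(calls):
    """calls: service name -> coroutine. Returns (results, unavailable):
    the unavailable list feeds explicit 'data unavailable' markers
    into prompt construction so the LLM does not invent missing data."""
    done = await asyncio.gather(
        *(call_with_deadline(n, c) for n, c in calls.items())
    )
    results, unavailable = {}, []
    for name, value, exc in done:
        if exc is None:
            results[name] = value
        elif name in CRITICAL:
            raise RuntimeError(f"critical dependency {name} failed") from exc
        else:
            unavailable.append(name)  # best-effort: degrade, don't fail
    return results, unavailable
```

In the scenario above, Recommendations blows its 800 ms budget and lands in `unavailable`, while the Catalog and Promotions results are kept and the turn proceeds with partial data.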
Scenario 2: Latency Regression At P95
Primary Prompt
The end-to-end p95 latency moved from 2.1 s to 4.0 s after adding a reranker and extra guardrails. Walk through your debugging plan.
Follow-Up 1
What spans and metrics must already exist to make that diagnosis fast?
Follow-Up 2
If the regression is split between LLM generation and a catalog lookup inside guardrails, what is the likely design flaw?
Follow-Up 3
What would you optimize first if product accuracy matters more than raw speed?
Strong Answer Markers
- Breaks latency down by orchestration stage.
- Mentions tracing IDs and per-span histograms.
- Identifies expensive synchronous work placed too late in the pipeline.
- Optimizes within a latency budget instead of hand-waving about caching everything.
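Localizing a regression like this is fast only if per-stage timing already exists. A real system would export spans and histograms to a tracing backend; this in-memory sketch (all names hypothetical) just shows the shape of per-stage measurement:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class StageTimer:
    """Records wall-clock durations per orchestration stage.
    Stand-in for real spans + per-span histograms."""
    def __init__(self):
        self.durations = defaultdict(list)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.durations[name].append(time.perf_counter() - start)

    def p95(self, name):
        samples = sorted(self.durations[name])
        # index of the 95th-percentile sample, clamped to the last element
        return samples[min(len(samples) - 1, int(0.95 * len(samples)))]
```

Wrapping each pipeline stage in `timer.stage("llm_generate")` and so on turns "p95 moved from 2.1 s to 4.0 s" into "the reranker added 1.2 s and the guardrail catalog lookup added 700 ms" in one query.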
Scenario 3: The MVP Orchestrator Is Becoming A Monolith
Primary Prompt
The orchestrator class now supports 50 intents and 15 downstream services. What would you refactor first?
Follow-Up 1
Would you immediately split into microservices, or first modularize in-process? Why?
Follow-Up 2
How do you avoid turning routeByIntent into a giant switch statement?
Follow-Up 3
What runtime evidence would justify moving from a modular monolith to multiple services?
Strong Answer Markers
- Chooses plugin or registry-based intent handlers.
- Separates workflow control from prompt construction and response assembly.
- Uses growth signals such as team ownership, deploy cadence, scaling profile, and blast radius.
- Avoids premature service sprawl.
Red Flags
- Describing the orchestrator as if it owns every business rule.
- Saying "retry everything" without idempotency or deadline awareness.
- Ignoring partial-result behavior.
- Proposing microservices only because the system is important.
Two-Minute Whiteboard Version
Draw a pipeline with three lanes:
- Synchronous control path.
- Parallel dependency fan-out.
- Post-generation validation and persistence.
Then annotate each stage with target latency and fallback behavior.
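An example of that annotation, with hypothetical per-stage budgets sized to a ~2 s p95 target (the specific numbers and fallbacks are illustrative, not prescribed):

```python
# Hypothetical per-stage latency budgets and fallback behaviors
# for an end-to-end p95 target of about 2 seconds.
LATENCY_BUDGET = {
    # stage:          (target_ms, fallback)
    "load_context":   (50,   "proceed with empty session context"),
    "classify":       (50,   "default to a general intent"),
    "fan_out":        (800,  "drop best-effort results, keep critical"),
    "llm_generate":   (1000, "retry once with a shorter max output"),
    "guardrails":     (100,  "block and return a safe canned response"),
    "persist":        (0,    "async, off the critical path"),
}

# The budgets must fit inside the end-to-end target.
assert sum(ms for ms, _ in LATENCY_BUDGET.values()) <= 2000
```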