
Orchestrator Request Flow

Covers Q1, Q11, Q24, Q38, Q46.

What The Interviewer Is Testing

  • Whether you can explain the orchestrator as the decision-making control plane instead of a bag of helper calls.
  • Whether you understand sequencing, parallel fan-out, latency budgets, and persistence boundaries.
  • Whether you can evolve an MVP design without over-engineering it.

Deep Dive

Core Mental Model

The orchestrator should decide, not do everything itself. A strong answer usually frames it this way:

  1. Load context.
  2. Classify the incoming request.
  3. Fan out only to the services required for that intent.
  4. Aggregate partial results into a prompt-ready shape.
  5. Invoke the LLM.
  6. Run guardrails.
  7. Persist the turn and emit analytics.
  8. Return a structured response.
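
The eight steps above can be sketched as a single turn handler. This is a minimal, hypothetical sketch: every downstream here is an in-process stub, and names like handleTurn, invokeLlm, and runGuardrails are assumptions for illustration, not a real API.

```typescript
type Turn = { sessionId: string; message: string };
type TurnResult = { reply: string; intent: string; requestId: string };

// --- stubbed downstreams; real versions would be network calls ---
const loadContext = async (_id: string) => ({ history: [] as string[] });
const classify = (msg: string) =>
  msg.includes("recommend") ? "recommendation" : "chitchat";
const fanOut = async (_intent: string) => ({ catalog: ["One Piece"], promos: [] as string[] });
const buildPrompt = (ctx: object, data: object) => JSON.stringify({ ctx, data });
const invokeLlm = async (prompt: string) => `reply based on ${prompt.length} chars of prompt`;
const runGuardrails = async (draft: string) => draft; // pass-through stub
const persistTurn = async (_turn: Turn, _reply: string) => {};

async function handleTurn(turn: Turn): Promise<TurnResult> {
  const requestId = `${turn.sessionId}-${Date.now()}`; // correlation ID for tracing
  const ctx = await loadContext(turn.sessionId);       // 1. load context
  const intent = classify(turn.message);               // 2. classify the request
  const partials = await fanOut(intent);               // 3. fan out per intent
  const prompt = buildPrompt(ctx, partials);           // 4. aggregate into prompt shape
  const draft = await invokeLlm(prompt);               // 5. invoke the LLM
  const reply = await runGuardrails(draft);            // 6. run guardrails
  await persistTurn(turn, reply);                      // 7. persist + emit analytics
  return { reply, intent, requestId };                 // 8. structured response
}
```

Note that the orchestrator only sequences these stages; the stubs stand in for services that own the actual business logic.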

What Belongs Inside The Orchestrator

  • State transition logic.
  • Timeout and retry policy selection.
  • Dependency ordering.
  • Partial-result assembly.
  • Correlation IDs, tracing context, and response metadata.

What Should Stay Outside

  • Business logic for recommendations, catalog, or promotions.
  • Prompt template ownership.
  • Guardrail implementation details.
  • Session storage implementation details.

Production Answer Shape

  • Start with the happy-path state machine.
  • Then describe parallel fan-out for non-dependent calls.
  • Then explain what happens when one dependency is slow or unavailable.
  • End with the latency budget and how you would decompose the class if intents and services grow.

Strong Answer Pattern

  • "The orchestrator is the workflow controller."
  • "It separates mandatory dependencies from best-effort dependencies."
  • "It must be idempotent around retries and safe around partial failures."
  • "For MVP a single class is acceptable, but I would split routing, prompt building, and response assembly once intent count and dependency count grow."
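
The "idempotent around retries" point can be made concrete with a small sketch. The store and key scheme below are assumptions for illustration: the idea is that persistence is keyed by an idempotency key derived from the turn, so an orchestrator-level retry returns the prior result instead of double-writing.

```typescript
// In-memory stand-in for a durable store keyed by idempotency key.
const seen = new Map<string, string>(); // idempotency key -> stored turn id

function persistTurnOnce(idempotencyKey: string, payload: string): string {
  const existing = seen.get(idempotencyKey);
  if (existing !== undefined) return existing; // retry path: no second write
  const turnId = `turn-${seen.size + 1}`;
  seen.set(idempotencyKey, turnId);            // first write wins
  // ...write payload to durable storage here...
  return turnId;
}
```

A natural key is sessionId plus turn sequence number, so a retried request after a timeout cannot create a duplicate turn.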

Scenario 1: Partial Fan-Out Failure

Primary Prompt

The user asks for manga recommendations. Catalog returns in 120 ms, Recommendations times out at 800 ms, Promotions succeeds. How should the orchestrator behave?

Follow-Up 1

Which dependency is critical here, and how do you encode that distinction in the orchestration layer?

Follow-Up 2

Would you retry Recommendations synchronously, or continue with partial results? What timeout budget would you use?

Follow-Up 3

How should the prompt change so the LLM does not hallucinate promotions or recommendations that were never fetched?

Strong Answer Markers

  • Classifies downstreams into critical and best-effort.
  • Uses scatter-gather with per-service timeouts.
  • Continues with available data when the failed service is non-critical.
  • Passes explicit "data unavailable" markers into prompt construction.
  • Logs dependency-specific failures for later tuning.
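
The scatter-gather behavior those markers describe can be sketched as follows. The service calls are stubs, and the per-service budgets and the "UNAVAILABLE" marker shape are assumptions; the point is that a critical failure fails the turn while a best-effort failure degrades to an explicit marker the prompt builder can see.

```typescript
type Fetch<T> = { name: string; critical: boolean; budgetMs: number; call: () => Promise<T> };

// Race a call against its per-service budget.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    p,
    new Promise<T>((_, rej) => setTimeout(() => rej(new Error("timeout")), ms)),
  ]);
}

async function gather(fetches: Fetch<string[]>[]) {
  const settled = await Promise.allSettled(
    fetches.map((f) => withTimeout(f.call(), f.budgetMs)),
  );
  const data: Record<string, string[] | "UNAVAILABLE"> = {};
  settled.forEach((res, i) => {
    const f = fetches[i];
    if (res.status === "fulfilled") {
      data[f.name] = res.value;
    } else if (f.critical) {
      throw new Error(`critical dependency ${f.name} failed`); // fail the whole turn
    } else {
      data[f.name] = "UNAVAILABLE"; // prompt builder tells the LLM not to invent this
    }
  });
  return data;
}
```

In Scenario 1, Catalog would be marked critical and Recommendations best-effort, so the 800 ms timeout yields `recommendations: "UNAVAILABLE"` rather than a failed turn.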

Scenario 2: Latency Regression At P95

Primary Prompt

The end-to-end p95 latency moved from 2.1 s to 4.0 s after adding a reranker and extra guardrails. Walk through your debugging plan.

Follow-Up 1

What spans and metrics must already exist to make that diagnosis fast?

Follow-Up 2

If the regression is split between LLM generation and a catalog lookup inside guardrails, what is the likely design flaw?

Follow-Up 3

What would you optimize first if product accuracy matters more than raw speed?

Strong Answer Markers

  • Breaks latency down by orchestration stage.
  • Mentions tracing IDs and per-span histograms.
  • Identifies expensive synchronous work placed too late in the pipeline.
  • Optimizes within a latency budget instead of hand-waving about caching everything.
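
A minimal sketch of the per-stage instrumentation those markers assume: wrap each orchestration stage in a span so a p95 regression can be attributed to a stage instead of debugged end to end. The recorder below is a toy stand-in, not a real tracing API; in production these samples would feed per-span histograms in your metrics backend.

```typescript
const spans: Record<string, number[]> = {};

// Time one orchestration stage and record the sample under its stage name.
async function traced<T>(stage: string, work: () => Promise<T>): Promise<T> {
  const start = performance.now();
  try {
    return await work();
  } finally {
    (spans[stage] ??= []).push(performance.now() - start);
  }
}

// Naive p95 over recorded samples; a metrics backend would do this for you.
function p95(samples: number[]): number {
  const sorted = [...samples].sort((a, b) => a - b);
  return sorted[Math.min(sorted.length - 1, Math.floor(sorted.length * 0.95))];
}
```

With stages like "reranker" and "guardrails" traced separately, the 2.1 s to 4.0 s regression decomposes into per-stage deltas in one query.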

Scenario 3: The MVP Orchestrator Is Becoming A Monolith

Primary Prompt

The orchestrator class now supports 50 intents and 15 downstream services. What would you refactor first?

Follow-Up 1

Would you immediately split into microservices, or first modularize in-process? Why?

Follow-Up 2

How do you avoid turning routeByIntent into a giant switch statement?

Follow-Up 3

What runtime evidence would justify moving from a modular monolith to multiple services?

Strong Answer Markers

  • Chooses plugin or registry-based intent handlers.
  • Separates workflow control from prompt construction and response assembly.
  • Uses growth signals such as team ownership, deploy cadence, scaling profile, and blast radius.
  • Avoids premature service sprawl.
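
The registry-based routing in the first marker can be sketched like this. The handler shape and registration helper are assumptions: the point is that adding an intent becomes a new registration in its own module, not another branch inside routeByIntent.

```typescript
type Handler = (message: string) => string;

const registry = new Map<string, Handler>();

function registerIntent(intent: string, handler: Handler): void {
  registry.set(intent, handler);
}

function routeByIntent(intent: string, message: string): string {
  // Unknown intents degrade to a fallback instead of throwing.
  const handler = registry.get(intent) ?? ((m) => `fallback: ${m}`);
  return handler(message);
}

// In a real codebase each handler would register from its own module.
registerIntent("recommendation", (m) => `recommending for: ${m}`);
registerIntent("order_status", (m) => `order lookup for: ${m}`);
```

At 50 intents this keeps the orchestrator's control flow flat, and the registry boundary is also the natural seam if runtime evidence later justifies extracting handlers into services.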

Red Flags

  • Describing the orchestrator as if it owns every business rule.
  • Saying "retry everything" without idempotency or deadline awareness.
  • Ignoring partial-result behavior.
  • Proposing microservices only because the system is important.

Two-Minute Whiteboard Version

Draw a pipeline with three lanes:

  1. Synchronous control path.
  2. Parallel dependency fan-out.
  3. Post-generation validation and persistence.

Then annotate each stage with target latency and fallback behavior.