API Contracts, Streaming, And Escalation
Covers Q2, Q3, Q4, Q10, Q18, Q25, Q29, Q44, Q45.
What The Interviewer Is Testing
- Whether you understand API surface design, not just payload fields.
- Whether you can explain streaming semantics and reconnect behavior.
- Whether you can extend the contract to multilingual support without rewriting the system.
Deep Dive
Why Separate Endpoints Exist
/chat/message, /chat/feedback, and /chat/escalate have different:
- latency expectations
- durability requirements
- downstream workflows
- scaling profiles
Keeping them separate prevents the message path from becoming bloated with unrelated logic.
Request And Response Contracts
The main chat request should carry identity, session, current message, and PageContext. The response should separate:
- natural language text
- structured products
- suggested actions
- follow-up prompts
- metadata such as latency or response IDs
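The contract shape above can be sketched as a pair of dataclasses. This is a minimal illustration, not the actual API: every field name here (PageContext.page_type, ChatResponse.follow_ups, and so on) is an assumption chosen to mirror the bullets above.

```python
from dataclasses import dataclass, field

@dataclass
class PageContext:
    # Hypothetical fields; real page context would carry more signals.
    url: str
    page_type: str  # e.g. "product_detail", "cart"

@dataclass
class ChatRequest:
    user_id: str          # identity
    session_id: str       # session
    message: str          # current user message
    page_context: PageContext

@dataclass
class ChatResponse:
    text: str                                       # natural-language answer
    products: list = field(default_factory=list)    # structured product cards
    actions: list = field(default_factory=list)     # suggested actions
    follow_ups: list = field(default_factory=list)  # follow-up prompts
    metadata: dict = field(default_factory=dict)    # latency, response_id, ...
```

Keeping text, products, actions, and metadata in separate fields lets the client render each independently and lets the server validate each independently.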
Streaming Design
A good streaming answer mentions explicit message types such as:
- chat_message
- response_start
- response_chunk
- response_end
- error
- ping
- pong
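One turn of such a stream might look like the frames below. The envelope shape (a type field plus a response_id correlator) is an assumption for illustration; only the message type names come from the contract above.

```python
# Illustrative stream frames for one assistant turn (envelope shape assumed).
frames = [
    {"type": "response_start", "response_id": "r-123"},
    {"type": "response_chunk", "response_id": "r-123", "text": "Here are "},
    {"type": "response_chunk", "response_id": "r-123", "text": "two options."},
    {"type": "response_end", "response_id": "r-123",
     "products": [{"sku": "A1"}], "actions": ["add_to_cart"]},
]

def assemble(frames):
    """Concatenate chunk text; structured payload arrives only on response_end."""
    text = "".join(f["text"] for f in frames if f["type"] == "response_chunk")
    end = next(f for f in frames if f["type"] == "response_end")
    return text, end.get("products", [])
```

Putting structured products only on response_end keeps chunked text progressive while the structured payload stays atomic.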
Partial Responses
Streaming requires special handling for partial generation:
- detect truncated output
- retry safely when possible
- avoid persisting incomplete turns
- present graceful degradation to the client
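A minimal sketch of the persistence rule, assuming the frame envelope used elsewhere in this section: a turn counts as complete only if response_end arrived, and anything else is stored as explicitly partial rather than as a full assistant turn.

```python
def finalize_turn(frames, store):
    """Persist only application-complete turns; a dropped stream without a
    response_end frame is marked partial, never saved as a full turn."""
    complete = any(f["type"] == "response_end" for f in frames)
    text = "".join(f.get("text", "")
                   for f in frames if f["type"] == "response_chunk")
    status = "complete" if complete else "partial"
    store.append({"role": "assistant", "text": text, "status": status})
    return complete
```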
Multilingual Extension
The API contract should stay language-agnostic. Add detected language or locale metadata, but avoid duplicating the workflow just because the content language changes.
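As a sketch of that principle, language detection can annotate the request without branching the workflow. The function and field names here are hypothetical; the point is that the orchestration path is identical regardless of the detected locale.

```python
def annotate_language(request, detector):
    """Attach detected locale as metadata (illustrative design):
    downstream orchestration reads it but never forks on it."""
    request["metadata"] = {
        **request.get("metadata", {}),
        "detected_language": detector(request["message"]),
    }
    return request
```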
Strong Answer Pattern
- "The contract is a product surface, not just a transport object."
- "Streaming changes failure handling because completion is no longer atomic."
- "Actions and product cards should be validated separately from the text stream."
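The third point can be sketched as a gate on the final frame. This is an assumed policy, not the actual implementation: structured cards pass a schema check before they are emitted, and invalid ones are dropped rather than streamed raw.

```python
def emit_final_payload(candidate_products, schema_check, send):
    """Validate product cards separately from the text stream; only
    cards that pass the check reach the client (illustrative policy)."""
    valid = [p for p in candidate_products if schema_check(p)]
    send({"type": "response_end", "products": valid})
    return valid
```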
Scenario 1: WebSocket Stream Drops Mid-Response
Primary Prompt
The client loses the WebSocket connection after receiving 40 percent of the generated answer. What should happen next?
Follow-Up 1
What session or response identifiers must the client send on reconnect?
Follow-Up 2
Would you resume the stream, replay from the start, or ask the user to retry?
Follow-Up 3
How do you ensure the incomplete answer is not saved as a full assistant turn?
Strong Answer Markers
- Uses session_id and response_id for resume logic.
- Defines a policy for replay versus restart.
- Separates transport completion from application-level completion.
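These markers can be sketched as a reconnect handshake plus a server-side policy. Everything here is an assumption for illustration (field names, the chunk-index cursor, the buffer structures); it shows the shape of the decision, not a real protocol.

```python
def resume_request(session_id, response_id, last_chunk_index):
    """Hypothetical reconnect handshake: the client reports what it
    already received so the server can resume, replay, or restart."""
    return {"type": "resume", "session_id": session_id,
            "response_id": response_id,
            "last_chunk_index": last_chunk_index}

def server_decision(resume, completed_responses, live_buffers):
    """Policy sketch: resume if the response is still buffered, replay
    if it completed server-side, otherwise ask the client to retry."""
    rid = resume["response_id"]
    if rid in live_buffers:
        # Resume: send only the chunks the client has not yet seen.
        return "resume", live_buffers[rid][resume["last_chunk_index"] + 1:]
    if rid in completed_responses:
        # Replay: generation finished while the client was disconnected.
        return "replay", completed_responses[rid]
    # Restart: nothing recoverable; the client must retry the turn.
    return "retry", None
```

The split between "resume" and "replay" is what separates transport completion from application-level completion: the server may have finished generating even though the client never saw the end of the stream.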
Scenario 2: HTTPS Fallback And Guardrails
Primary Prompt
WebSockets are unavailable for some clients, so they use HTTPS fallback. What changes in orchestration and validation?
Follow-Up 1
Why is HTTPS simpler from a guardrail perspective?
Follow-Up 2
What user-experience trade-off do you accept?
Follow-Up 3
Would you still keep the same response schema?
Strong Answer Markers
- Notes that the full payload is available before returning.
- Keeps schema parity between transport modes.
- Explains loss of progressive UX but simpler post-generation checks.
Scenario 3: Japanese User, English Catalog, Same Backend
Primary Prompt
The user asks in Japanese, but product content and some FAQs exist only in English. How do you support the conversation without cloning the system?
Follow-Up 1
Where do you perform language detection?
Follow-Up 2
Do you maintain per-language indexes or use a multilingual embedding model?
Follow-Up 3
Which guardrails need language-specific tuning?
Strong Answer Markers
- Keeps orchestration and contracts language-neutral.
- Adds language awareness in NLP, retrieval, and safety layers.
- Talks about multilingual embeddings or language-specific indexes.
Red Flags
- Merging feedback and escalation into the message endpoint for convenience.
- Streaming product cards before they are validated.
- Saving partial assistant turns as completed responses.
- Treating multilingual support as only a translation problem.
Two-Minute Whiteboard Version
Draw the same logical response flowing through two transports:
- WebSocket for chunked text and final structured payload.
- HTTPS for full-response fallback.