API Contracts, Streaming, And Escalation
Covers Q2, Q3, Q4, Q10, Q18, Q25, Q29, Q44, Q45.
What The Interviewer Is Testing
- Whether you understand API surface design, not just payload fields.
- Whether you can explain streaming semantics and reconnect behavior.
- Whether you can extend the contract to multilingual support without rewriting the system.
Deep Dive
Why Separate Endpoints Exist
/chat/message, /chat/feedback, and /chat/escalate have different:
- latency expectations
- durability requirements
- downstream workflows
- scaling profiles
Keeping them separate prevents the message path from becoming bloated with unrelated logic.
Request And Response Contracts
The main chat request should carry identity, session, current message, and PageContext. The response should separate:
- natural language text
- structured products
- suggested actions
- follow-up prompts
- metadata such as latency or response IDs
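The contract shape above can be sketched as a pair of dataclasses. This is a minimal illustration, not the actual API: every field name here (PageContext.page_type, ChatResponse.follow_ups, and so on) is an assumption chosen to mirror the bullets above.

```python
from dataclasses import dataclass, field

@dataclass
class PageContext:
    # Hypothetical fields; real page context would carry more signals.
    url: str
    page_type: str  # e.g. "product_detail", "cart"

@dataclass
class ChatRequest:
    user_id: str          # identity
    session_id: str       # session
    message: str          # current user message
    page_context: PageContext

@dataclass
class ChatResponse:
    text: str                                       # natural-language answer
    products: list = field(default_factory=list)    # structured product cards
    actions: list = field(default_factory=list)     # suggested actions
    follow_ups: list = field(default_factory=list)  # follow-up prompts
    metadata: dict = field(default_factory=dict)    # latency, response_id, ...
```

Keeping text, products, actions, and metadata in separate fields lets the client render each independently and lets the server validate each independently.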
Streaming Design
A good streaming answer mentions explicit message types such as:
- chat_message
- response_start
- response_chunk
- response_end
- error
- ping
- pong
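One turn of such a stream might look like the frames below. The envelope shape (a type field plus a response_id correlator) is an assumption for illustration; only the message type names come from the contract above.

```python
# Illustrative stream frames for one assistant turn (envelope shape assumed).
frames = [
    {"type": "response_start", "response_id": "r-123"},
    {"type": "response_chunk", "response_id": "r-123", "text": "Here are "},
    {"type": "response_chunk", "response_id": "r-123", "text": "two options."},
    {"type": "response_end", "response_id": "r-123",
     "products": [{"sku": "A1"}], "actions": ["add_to_cart"]},
]

def assemble(frames):
    """Concatenate chunk text; structured payload arrives only on response_end."""
    text = "".join(f["text"] for f in frames if f["type"] == "response_chunk")
    end = next(f for f in frames if f["type"] == "response_end")
    return text, end.get("products", [])
```

Putting structured products only on response_end keeps chunked text progressive while the structured payload stays atomic.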
Partial Responses
Streaming requires special handling for partial generation:
- detect truncated output
- retry safely when possible
- avoid persisting incomplete turns
- present graceful degradation to the client
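A minimal sketch of the persistence rule, assuming the frame envelope used elsewhere in this section: a turn counts as complete only if response_end arrived, and anything else is stored as explicitly partial rather than as a full assistant turn.

```python
def finalize_turn(frames, store):
    """Persist only application-complete turns; a dropped stream without a
    response_end frame is marked partial, never saved as a full turn."""
    complete = any(f["type"] == "response_end" for f in frames)
    text = "".join(f.get("text", "")
                   for f in frames if f["type"] == "response_chunk")
    status = "complete" if complete else "partial"
    store.append({"role": "assistant", "text": text, "status": status})
    return complete
```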
Multilingual Extension
The API contract should stay language-agnostic. Add detected language or locale metadata, but avoid duplicating the workflow just because the content language changes.
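As a sketch of that principle, language detection can annotate the request without branching the workflow. The function and field names here are hypothetical; the point is that the orchestration path is identical regardless of the detected locale.

```python
def annotate_language(request, detector):
    """Attach detected locale as metadata (illustrative design):
    downstream orchestration reads it but never forks on it."""
    request["metadata"] = {
        **request.get("metadata", {}),
        "detected_language": detector(request["message"]),
    }
    return request
```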
Strong Answer Pattern
- "The contract is a product surface, not just a transport object."
- "Streaming changes failure handling because completion is no longer atomic."
- "Actions and product cards should be validated separately from the text stream."
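The third point can be sketched as a gate on the final frame. This is an assumed policy, not the actual implementation: structured cards pass a schema check before they are emitted, and invalid ones are dropped rather than streamed raw.

```python
def emit_final_payload(candidate_products, schema_check, send):
    """Validate product cards separately from the text stream; only
    cards that pass the check reach the client (illustrative policy)."""
    valid = [p for p in candidate_products if schema_check(p)]
    send({"type": "response_end", "products": valid})
    return valid
```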
Scenario 1: WebSocket Stream Drops Mid-Response
Primary Prompt
The client loses the WebSocket connection after receiving 40 percent of the generated answer. What should happen next?
Follow-Up 1
What session or response identifiers must the client send on reconnect?
Follow-Up 2
Would you resume the stream, replay from the start, or ask the user to retry?
Follow-Up 3
How do you ensure the incomplete answer is not saved as a full assistant turn?
Strong Answer Markers
- Uses session_id and response_id for resume logic.
- Defines a policy for replay versus restart.
- Separates transport completion from application-level completion.
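These markers can be sketched as a reconnect handshake plus a server-side policy. Everything here is an assumption for illustration (field names, the chunk-index cursor, the buffer structures); it shows the shape of the decision, not a real protocol.

```python
def resume_request(session_id, response_id, last_chunk_index):
    """Hypothetical reconnect handshake: the client reports what it
    already received so the server can resume, replay, or restart."""
    return {"type": "resume", "session_id": session_id,
            "response_id": response_id,
            "last_chunk_index": last_chunk_index}

def server_decision(resume, completed_responses, live_buffers):
    """Policy sketch: resume if the response is still buffered, replay
    if it completed server-side, otherwise ask the client to retry."""
    rid = resume["response_id"]
    if rid in live_buffers:
        # Resume: send only the chunks the client has not yet seen.
        return "resume", live_buffers[rid][resume["last_chunk_index"] + 1:]
    if rid in completed_responses:
        # Replay: generation finished while the client was disconnected.
        return "replay", completed_responses[rid]
    # Restart: nothing recoverable; the client must retry the turn.
    return "retry", None
```

The split between "resume" and "replay" is what separates transport completion from application-level completion: the server may have finished generating even though the client never saw the end of the stream.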
Scenario 2: HTTPS Fallback And Guardrails
Primary Prompt
WebSockets are unavailable for some clients, so they use HTTPS fallback. What changes in orchestration and validation?
Follow-Up 1
Why is HTTPS simpler from a guardrail perspective?
Follow-Up 2
What user-experience trade-off do you accept?
Follow-Up 3
Would you still keep the same response schema?
Strong Answer Markers
- Notes that the full payload is available before returning.
- Keeps schema parity between transport modes.
- Explains loss of progressive UX but simpler post-generation checks.
Scenario 3: Japanese User, English Catalog, Same Backend
Primary Prompt
The user asks in Japanese, but product content and some FAQs exist only in English. How do you support the conversation without cloning the system?
Follow-Up 1
Where do you perform language detection?
Follow-Up 2
Do you maintain per-language indexes or use a multilingual embedding model?
Follow-Up 3
Which guardrails need language-specific tuning?
Strong Answer Markers
- Keeps orchestration and contracts language-neutral.
- Adds language awareness in NLP, retrieval, and safety layers.
- Talks about multilingual embeddings or language-specific indexes.
Red Flags
- Merging feedback and escalation into the message endpoint for convenience.
- Streaming product cards before they are validated.
- Saving partial assistant turns as completed responses.
- Treating multilingual support as only a translation problem.
Two-Minute Whiteboard Version
Draw the same logical response flowing through two transports:
- WebSocket for chunked text and final structured payload.
- HTTPS for full-response fallback.