Scale, Failures, And Architecture Evolution
Covers Q31, Q41, Q43, Q45, Q48, Q50.
What The Interviewer Is Testing
- Whether you can reason about blast radius, redundancy, and degraded modes.
- Whether you can evolve the system without rewriting every layer.
- Whether you can critique the design before launch with specific improvements.
Deep Dive
Failure Thinking
The mature answer is never "make everything HA." It is to:
- identify the most likely failure domains
- define degraded but usable behavior
- reduce blast radius through abstraction boundaries
- instrument the system to detect and recover fast
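The "degraded but usable" idea can be sketched as a circuit breaker around the LLM call with canned fallbacks for common intents. This is a minimal sketch, not a production breaker; the intent names and fallback messages are hypothetical.

```python
import time

CANNED_FALLBACKS = {
    # Hypothetical common intents that can be served without the LLM.
    "order_status": "You can track your order from the Orders page in your account.",
    "reset_password": "Use the password reset link on the sign-in page.",
}

class CircuitBreaker:
    """Minimal breaker: opens after N consecutive failures, then lets a
    probe through after a cooldown (a simplified half-open state)."""
    def __init__(self, failure_threshold=3, cooldown_s=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            # Simplified half-open: reset and let traffic probe the backend.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record(self, ok):
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()

def answer(breaker, llm_call, intent):
    """Try the LLM path when the breaker allows it; otherwise serve a
    degraded but still useful canned response."""
    if breaker.allow():
        try:
            result = llm_call(intent)
            breaker.record(ok=True)
            return result
        except Exception:
            breaker.record(ok=False)
    return CANNED_FALLBACKS.get(
        intent, "We're having trouble right now; please try again shortly.")
```

The key property to point out in an interview: the user always gets an answer, and the breaker caps how much latency a dead backend can add.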
Likely Critical Failure Domains
- LLM backend availability
- conversation state store availability or latency
- intent-routing dependency availability
- stale or broken retrieval index
- guardrail regressions that block good traffic or miss bad traffic
Backend Swap Blast Radius
If the system changes from Claude on Bedrock to an internal model, the biggest concerns are:
- prompt compatibility
- latency and throughput differences
- output-format reliability
- guardrail retuning
The clean contract to protect is the structured response shape. If that remains stable, the frontend and many downstream systems stay insulated.
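One way to make that contract concrete is an adapter at the model boundary: a stable response dataclass plus a thin backend interface that normalizes backend-specific output. A minimal sketch; the field names (`completion`, `score`) stand in for whatever raw shape the internal model actually emits.

```python
from dataclasses import dataclass, field
from typing import Protocol

@dataclass
class BotResponse:
    """Stable structured contract consumed by the frontend and downstream
    systems. Backend swaps must keep producing exactly this shape."""
    text: str
    intent: str
    confidence: float
    citations: list = field(default_factory=list)

class ModelBackend(Protocol):
    """The only surface orchestration code is allowed to depend on."""
    def generate(self, prompt: str) -> BotResponse: ...

class InternalModelBackend:
    """Adapter that maps the internal model's raw output into BotResponse,
    so callers never see backend-specific fields."""
    def __init__(self, raw_generate):
        self._raw_generate = raw_generate  # backend-specific callable

    def generate(self, prompt: str) -> BotResponse:
        raw = self._raw_generate(prompt)  # hypothetical raw dict shape
        return BotResponse(
            text=raw["completion"],
            intent=raw.get("intent", "unknown"),
            confidence=raw.get("score", 0.0),
        )
```

Swapping Bedrock-hosted Claude for the internal model then means writing one new adapter, not touching every consumer.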
Multilingual Evolution
The smart architectural move is to keep orchestration, session state, and API contracts language-neutral while making NLP, retrieval, and safety components language-aware.
Pre-Launch Review Lens
When asked for improvements before launch, prioritize:
- component latency budgets
- fallback and circuit-breaker behavior
- prompt versioning
- semantic or exact-match caching where safe
- explicit load-test evidence
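The "caching where safe" bullet is worth being able to sketch. Below is a TTL-bounded exact-match cache keyed on a normalized prompt hash, assuming the cached responses carry no per-user state (the safety condition the bullet hedges on):

```python
import hashlib
import time

class ExactMatchCache:
    """Exact-match response cache with a TTL. Keys are hashes of the
    normalized prompt, so trivially different phrasings still miss."""
    def __init__(self, ttl_s=300.0):
        self.ttl_s = ttl_s
        self._store = {}

    def _key(self, prompt):
        # Normalize whitespace and case before hashing.
        return hashlib.sha256(prompt.strip().lower().encode()).hexdigest()

    def get(self, prompt):
        key = self._key(prompt)
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if time.monotonic() - stored_at > self.ttl_s:
            del self._store[key]  # expire lazily on read
            return None
        return value

    def put(self, prompt, value):
        self._store[self._key(prompt)] = (value, time.monotonic())
```

Semantic caching (embedding-similarity lookups) buys a higher hit rate but adds a correctness risk, which is why exact-match is the safer pre-launch default.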
Strong Answer Pattern
- "I would define graceful degradation before adding more redundancy."
- "I want abstraction around the model boundary so backend changes do not leak everywhere."
- "Before launch, I want evidence for latency, scale, and fallback behavior."
Scenario 1: Bedrock Region Outage
Primary Prompt
The Bedrock region serving generation is unavailable. How should the system respond in the first five minutes?
Follow-Up 1
What is the fastest degraded mode that still helps the user?
Follow-Up 2
Would you fail over cross-region automatically or gate it behind a feature flag?
Follow-Up 3
What metrics and alerts would confirm the failover is actually working?
Strong Answer Markers
- Uses regional failover or controlled fallback.
- Describes a non-LLM degraded mode for common intents.
- Talks about error rate, latency, and completion-rate confirmation.
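The flag-gated failover from Follow-Up 2 can be sketched in a few lines. The flag names here are hypothetical; the point is that cross-region failover is an operator decision, not an automatic flip, while a narrower fallback flag handles per-request errors:

```python
def route_generation(primary_call, secondary_call, flags, prompt):
    """Route to the primary region unless operators have forced the
    secondary, or the primary fails and fallback is explicitly allowed.
    Flag names are illustrative."""
    if flags.get("force_secondary_region"):
        return secondary_call(prompt)
    try:
        return primary_call(prompt)
    except Exception:
        if flags.get("allow_secondary_fallback"):
            return secondary_call(prompt)
        raise  # surface the failure so alerts fire instead of masking it
```

Gating behind flags also gives a clean rollback path: turning the flag off restores the original routing without a deploy.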
Scenario 2: Replace The LLM Backend
Primary Prompt
Leadership wants to move from Bedrock-hosted Claude to an internally trained model. What changes and what stays stable?
Follow-Up 1
Which abstraction protects the frontend and guardrails from major churn?
Follow-Up 2
What revalidation is mandatory before cutover?
Follow-Up 3
How would you phase the migration to reduce blast radius?
Strong Answer Markers
- Anchors on structured output contract stability.
- Mentions prompt retuning, latency benchmarking, and guardrail recalibration.
- Uses shadow traffic or limited rollout before full cutover.
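The shadow-traffic and limited-rollout markers above can be sketched together: a deterministic hash bucket decides which users see the new backend, and users still on the old backend also exercise the new one in shadow for offline comparison. Function and parameter names are hypothetical.

```python
import hashlib

def rollout_bucket(user_id, percent, salt="backend-migration"):
    """Deterministic percentage rollout: hash the user id into 0-99 so a
    given user stays in the same cohort across requests."""
    h = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(h, 16) % 100 < percent

def generate(user_id, prompt, old_backend, new_backend, percent, shadow_log):
    """Serve the new backend to the rollout cohort; everyone else gets the
    old backend, with the new backend run in shadow and logged."""
    if rollout_bucket(user_id, percent):
        return new_backend(prompt)
    response = old_backend(prompt)
    try:
        # Compare offline; never serve the shadow result to the user.
        shadow_log.append((prompt, new_backend(prompt), response))
    except Exception:
        pass  # shadow failures must never affect the live request
    return response
```

Ramping `percent` from 0 to 100 gives the phased migration Follow-Up 3 asks about, with the shadow log feeding the latency and guardrail revalidation.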
Scenario 3: Five Improvements Before Launch
Primary Prompt
You are reviewing the LLD one week before launch. What are the five highest-value improvements?
Follow-Up 1
Which of those are must-have blockers versus good follow-on work?
Follow-Up 2
How would you justify the ranking to engineering leadership?
Follow-Up 3
What would you explicitly defer from the roadmap?
Strong Answer Markers
- Gives concrete, not generic, improvements.
- Prioritizes based on user harm, launch risk, and operational readiness.
- Defers nice-to-have complexity instead of bloating the launch plan.
Red Flags
- Saying redundancy alone solves availability.
- Treating a model swap as only an endpoint change.
- Offering vague launch improvements like "more testing" without specifying what kind.
- Ignoring degraded-mode UX.
Two-Minute Whiteboard Version
Draw three columns:
- Failure domain.
- Immediate fallback.
- Longer-term architectural improvement.