Scale, Failures, And Architecture Evolution
Covers Q31, Q41, Q43, Q45, Q48, Q50.
What The Interviewer Is Testing
- Whether you can reason about blast radius, redundancy, and degraded modes.
- Whether you can evolve the system without rewriting every layer.
- Whether you can critique the design before launch with specific improvements.
Deep Dive
Failure Thinking
The mature answer is never "make everything HA." It is to:
- identify the most likely failure domains
- define degraded but usable behavior
- reduce blast radius through abstraction boundaries
- instrument the system to detect and recover fast
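The "degraded but usable" idea can be sketched as a circuit breaker around the LLM call with canned fallbacks for common intents. This is a minimal sketch, not a production breaker; the intent names and fallback messages are hypothetical.

```python
import time

CANNED_FALLBACKS = {
    # Hypothetical common intents that can be served without the LLM.
    "order_status": "You can track your order from the Orders page in your account.",
    "reset_password": "Use the password reset link on the sign-in page.",
}

class CircuitBreaker:
    """Minimal breaker: opens after N consecutive failures, then lets a
    probe through after a cooldown (a simplified half-open state)."""
    def __init__(self, failure_threshold=3, cooldown_s=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            # Simplified half-open: reset and let traffic probe the backend.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record(self, ok):
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()

def answer(breaker, llm_call, intent):
    """Try the LLM path when the breaker allows it; otherwise serve a
    degraded but still useful canned response."""
    if breaker.allow():
        try:
            result = llm_call(intent)
            breaker.record(ok=True)
            return result
        except Exception:
            breaker.record(ok=False)
    return CANNED_FALLBACKS.get(
        intent, "We're having trouble right now; please try again shortly.")
```

The key property to point out in an interview: the user always gets an answer, and the breaker caps how much latency a dead backend can add.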
Likely Critical Failure Domains
- LLM backend availability
- conversation state store availability or latency
- intent-routing dependency availability
- stale or broken retrieval index
- guardrail regressions that block good traffic or miss bad traffic
Backend Swap Blast Radius
If the system changes from Claude on Bedrock to an internal model, the biggest concerns are:
- prompt compatibility
- latency and throughput differences
- output-format reliability
- guardrail retuning
The clean contract to protect is the structured response shape. If that remains stable, the frontend and many downstream systems stay insulated.
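One way to make that contract concrete is an adapter at the model boundary: a stable response dataclass plus a thin backend interface that normalizes backend-specific output. A minimal sketch; the field names (`completion`, `score`) stand in for whatever raw shape the internal model actually emits.

```python
from dataclasses import dataclass, field
from typing import Protocol

@dataclass
class BotResponse:
    """Stable structured contract consumed by the frontend and downstream
    systems. Backend swaps must keep producing exactly this shape."""
    text: str
    intent: str
    confidence: float
    citations: list = field(default_factory=list)

class ModelBackend(Protocol):
    """The only surface orchestration code is allowed to depend on."""
    def generate(self, prompt: str) -> BotResponse: ...

class InternalModelBackend:
    """Adapter that maps the internal model's raw output into BotResponse,
    so callers never see backend-specific fields."""
    def __init__(self, raw_generate):
        self._raw_generate = raw_generate  # backend-specific callable

    def generate(self, prompt: str) -> BotResponse:
        raw = self._raw_generate(prompt)  # hypothetical raw dict shape
        return BotResponse(
            text=raw["completion"],
            intent=raw.get("intent", "unknown"),
            confidence=raw.get("score", 0.0),
        )
```

Swapping Bedrock-hosted Claude for the internal model then means writing one new adapter, not touching every consumer.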
Multilingual Evolution
The smart architectural move is to keep orchestration, session state, and API contracts language-neutral while making NLP, retrieval, and safety components language-aware.
Pre-Launch Review Lens
When asked for improvements before launch, prioritize:
- component latency budgets
- fallback and circuit-breaker behavior
- prompt versioning
- semantic or exact-match caching where safe
- explicit load-test evidence
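The "caching where safe" bullet is worth being able to sketch. Below is a TTL-bounded exact-match cache keyed on a normalized prompt hash, assuming the cached responses carry no per-user state (the safety condition the bullet hedges on):

```python
import hashlib
import time

class ExactMatchCache:
    """Exact-match response cache with a TTL. Keys are hashes of the
    normalized prompt, so trivially different phrasings still miss."""
    def __init__(self, ttl_s=300.0):
        self.ttl_s = ttl_s
        self._store = {}

    def _key(self, prompt):
        # Normalize whitespace and case before hashing.
        return hashlib.sha256(prompt.strip().lower().encode()).hexdigest()

    def get(self, prompt):
        key = self._key(prompt)
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if time.monotonic() - stored_at > self.ttl_s:
            del self._store[key]  # expire lazily on read
            return None
        return value

    def put(self, prompt, value):
        self._store[self._key(prompt)] = (value, time.monotonic())
```

Semantic caching (embedding-similarity lookups) buys a higher hit rate but adds a correctness risk, which is why exact-match is the safer pre-launch default.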
Strong Answer Pattern
- "I would define graceful degradation before adding more redundancy."
- "I want abstraction around the model boundary so backend changes do not leak everywhere."
- "Before launch, I want evidence for latency, scale, and fallback behavior."
Scenario 1: Bedrock Region Outage
Primary Prompt
The Bedrock region serving generation is unavailable. How should the system respond in the first five minutes?
Follow-Up 1
What is the fastest degraded mode that still helps the user?
Follow-Up 2
Would you fail over cross-region automatically or gate it behind a feature flag?
Follow-Up 3
What metrics and alerts would confirm the failover is actually working?
Strong Answer Markers
- Uses regional failover or controlled fallback.
- Describes a non-LLM degraded mode for common intents.
- Talks about error rate, latency, and completion-rate confirmation.
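The flag-gated failover from Follow-Up 2 can be sketched in a few lines. The flag names here are hypothetical; the point is that cross-region failover is an operator decision, not an automatic flip, while a narrower fallback flag handles per-request errors:

```python
def route_generation(primary_call, secondary_call, flags, prompt):
    """Route to the primary region unless operators have forced the
    secondary, or the primary fails and fallback is explicitly allowed.
    Flag names are illustrative."""
    if flags.get("force_secondary_region"):
        return secondary_call(prompt)
    try:
        return primary_call(prompt)
    except Exception:
        if flags.get("allow_secondary_fallback"):
            return secondary_call(prompt)
        raise  # surface the failure so alerts fire instead of masking it
```

Gating behind flags also gives a clean rollback path: turning the flag off restores the original routing without a deploy.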
Scenario 2: Replace The LLM Backend
Primary Prompt
Leadership wants to move from Bedrock-hosted Claude to an internally trained model. What changes and what stays stable?
Follow-Up 1
Which abstraction protects the frontend and guardrails from major churn?
Follow-Up 2
What revalidation is mandatory before cutover?
Follow-Up 3
How would you phase the migration to reduce blast radius?
Strong Answer Markers
- Anchors on structured output contract stability.
- Mentions prompt retuning, latency benchmarking, and guardrail recalibration.
- Uses shadow traffic or limited rollout before full cutover.
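The shadow-traffic and limited-rollout markers above can be sketched together: a deterministic hash bucket decides which users see the new backend, and users still on the old backend also exercise the new one in shadow for offline comparison. Function and parameter names are hypothetical.

```python
import hashlib

def rollout_bucket(user_id, percent, salt="backend-migration"):
    """Deterministic percentage rollout: hash the user id into 0-99 so a
    given user stays in the same cohort across requests."""
    h = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(h, 16) % 100 < percent

def generate(user_id, prompt, old_backend, new_backend, percent, shadow_log):
    """Serve the new backend to the rollout cohort; everyone else gets the
    old backend, with the new backend run in shadow and logged."""
    if rollout_bucket(user_id, percent):
        return new_backend(prompt)
    response = old_backend(prompt)
    try:
        # Compare offline; never serve the shadow result to the user.
        shadow_log.append((prompt, new_backend(prompt), response))
    except Exception:
        pass  # shadow failures must never affect the live request
    return response
```

Ramping `percent` from 0 to 100 gives the phased migration Follow-Up 3 asks about, with the shadow log feeding the latency and guardrail revalidation.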
Scenario 3: Five Improvements Before Launch
Primary Prompt
You are reviewing the LLD one week before launch. What are the five highest-value improvements?
Follow-Up 1
Which of those are must-have blockers versus good follow-on work?
Follow-Up 2
How would you justify the ranking to engineering leadership?
Follow-Up 3
What would you explicitly defer from the roadmap?
Strong Answer Markers
- Gives concrete, not generic, improvements.
- Prioritizes based on user harm, launch risk, and operational readiness.
- Defers nice-to-have complexity instead of bloating the launch plan.
Red Flags
- Saying redundancy alone solves availability.
- Treating a model swap as only an endpoint change.
- Offering vague launch improvements like "more testing" without specifying what kind.
- Ignoring degraded-mode UX.
Two-Minute Whiteboard Version
Draw three columns:
- Failure domain.
- Immediate fallback.
- Longer-term architectural improvement.