
Scenario 4: Cold-Start Latency Cascade

Scenario Summary

The architecture places multiple cold-start-prone services in series: authorizer, workflow engine, functions, and container tasks. Each hop adds start-up delay, and the combined time destroys the end-user latency target during bursts or scale-out events.

Why It Matters

Architects often optimize for separation of concerns and forget that the user pays the sum of the synchronous path. Clean diagrams can still produce a fragile hot path.

Failure Pattern

Design area     | Weak choice                                 | Better choice
Critical path   | Many synchronous hops in series             | Fewer synchronous components, more inline or parallel work
Compute choice  | Cold-start-prone services on the user path  | Warm capacity for latency-sensitive orchestration
Load validation | Average latency from steady traffic         | Burst and scale-out tests against p95 and p99

Deep Dive

This scenario is less about any one service and more about composition. A design that chains API Gateway, a Lambda authorizer, Step Functions, multiple Lambda functions, and then ECS multiplies cold-start risk at every layer. Each layer has its own initialization cost and scaling lag. When traffic rises from idle, these delays can align instead of averaging out.
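A rough back-of-the-envelope sketch makes the composition problem concrete. The hop names and all latency numbers below are illustrative assumptions, not measured values; the point is only that worst-case penalties on a serial path add, they do not average:

```python
# Sketch: summing per-hop latencies on a synchronous path.
# All numbers are hypothetical (warm_ms, cold_start_penalty_ms) per hop.
hops = {
    "api_gateway":       (10,    0),
    "lambda_authorizer": (15,  400),
    "step_functions":    (25,   50),
    "lambda_worker":     (20,  600),
    "ecs_task":          (30, 9000),  # new task placement + image pull
}

# Warm path: the user pays only the steady-state cost of each hop.
warm_total = sum(warm for warm, _ in hops.values())

# Scale-out burst where the cold starts align: the user pays every penalty.
cold_total = sum(warm + cold for warm, cold in hops.values())

print(f"warm path:  {warm_total} ms")   # sum of warm costs only
print(f"cold burst: {cold_total} ms")   # two orders of magnitude worse
```

Even with generous assumptions, one aligned burst turns a ~100 ms path into a multi-second one, which is exactly the pattern the deep dive describes.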

The architectural correction is usually to collapse the synchronous chain:

  • keep the critical orchestration in one warm service,
  • reserve serverless fan-out for asynchronous side work,
  • add minimum warm capacity where the SLA depends on it.

Detection Signals

  • First-request latency is dramatically worse than warm-request latency
  • Problems cluster around scale-out events rather than all traffic
  • Traces show several small startup penalties that add up to a large total
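The first signal is easy to check mechanically from trace data: compare each request's latency against the warm baseline and flag outliers. A minimal sketch, using made-up sample latencies and an assumed 5x threshold:

```python
# Sketch: flag a cold-start signature from per-request latencies.
# The samples and the 5x threshold are illustrative assumptions.
samples_ms = [1850, 210, 195, 205, 1900, 200, 190, 220]

# Median is a reasonable warm baseline if most requests hit warm capacity.
ordered = sorted(samples_ms)
warm_median = ordered[len(ordered) // 2]

# Requests far above the warm baseline are cold-start suspects.
suspects = [s for s in samples_ms if s > 5 * warm_median]

print(f"warm median: {warm_median} ms")
print(f"cold-start suspects: {suspects}")
```

If the suspects cluster at the start of a scale-out event rather than spreading evenly through the traffic, that matches the second signal as well.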

Runbook

  1. Trace the synchronous request path end to end.
  2. Mark which hops can cold start.
  3. Consolidate hot-path orchestration into fewer warm components.
  4. Move noncritical steps to asynchronous processing.
  5. Retest with burst traffic and p99 latency goals.
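Step 5 hinges on validating tail percentiles, not averages. The sketch below generates synthetic burst results (mostly warm requests plus a few cold-start outliers, all numbers invented) and shows why p99 catches what the mean hides:

```python
import random

random.seed(7)

# Synthetic burst test: 950 warm requests plus 50 cold-start outliers
# produced during scale-out. All distributions are illustrative.
latencies = [random.gauss(120, 15) for _ in range(950)] + \
            [random.gauss(2500, 300) for _ in range(50)]

def percentile(values, p):
    # Nearest-rank percentile over the sorted samples.
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

p95 = percentile(latencies, 95)
p99 = percentile(latencies, 99)
print(f"p95: {p95:.0f} ms")  # still in warm territory
print(f"p99: {p99:.0f} ms")  # dominated by the cold-start outliers
```

With 5% of requests hitting cold starts, p95 can look healthy while p99 is an order of magnitude worse, which is why the runbook retests against p99 goals rather than steady-state averages.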

Questions To Ask

  • How many synchronous hops are on the user-facing path?
  • Which of those hops can cold start at the same time?
  • What is the real p99 budget for each hop?
  • Which components should have minimum warm capacity?

Interview Drill

When is it worth paying for always-warm capacity instead of relying entirely on scale-to-zero patterns?

Good Outcome

The hot path is intentionally short, warm where necessary, and validated against burst traffic instead of only steady-state demos.