
DynamoDB Scaling Scenarios — How DynamoDB Outperforms Relational and Other NoSQL Databases

Audience: Architects, senior developers, and interview candidates who need to articulate the specific scaling situations where DynamoDB is the right choice and why.


Why Scaling Scenarios Matter

Most interviews and design reviews do not ask "what is DynamoDB?" They ask "why would you pick DynamoDB over Postgres for this specific workload?" This document maps real scaling situations to DynamoDB's strengths, with direct comparisons to alternatives so you can justify the choice precisely.


Scenario 1: E-Commerce Flash Sale — Traffic Spike from 1× to 100× in Minutes

The Situation

An e-commerce platform runs at 5,000 requests per second during normal hours. A flash sale announcement causes traffic to spike to 500,000 requests per second within minutes. The primary workload is session state reads, cart lookups by user ID, and inventory status checks.

Why Relational Databases Struggle Here

| Problem | What Actually Happens |
|---|---|
| Connection pool exhaustion | PostgreSQL supports roughly 200–500 active connections per instance. At 500,000 RPS, a connection pool saturates in seconds, even with PgBouncer |
| Slow replica scale-out | Read replicas take 5–15 minutes to provision and warm up. The spike is over before they are ready |
| Write amplification on updates | Cart updates modify a row, trigger WAL writes, update indexes, and hold row-level locks — not ideal under contention |
| Maintenance locks during load | Index maintenance or autovacuum takes table locks at the worst possible time |

How DynamoDB Handles This

```mermaid
flowchart LR
    subgraph Normal["Normal: 5K RPS"]
        N1[Lambda] --> N2[DynamoDB On-Demand]
        N2 -->|"~5K req/sec"| N3[Auto-managed capacity]
    end

    subgraph Flash["Flash Sale: 500K RPS"]
        F1[Lambda] --> F2[DynamoDB On-Demand]
        F2 -->|"~500K req/sec"| F3[Adaptive capacity scales automatically]
    end
```

| DynamoDB Advantage | Detail |
|---|---|
| No connection limit | DynamoDB is HTTP/HTTPS-based. There is no connection pool; every Lambda invocation issues an independent request |
| On-demand mode absorbs burst | Capacity adjusts within seconds. No provisioning step is needed for the spike |
| Adaptive capacity | DynamoDB detects hot partitions and reallocates throughput to them within seconds |
| Predictable per-item latency | Latency stays at single-digit milliseconds at 5K RPS and at 500K RPS |
| Horizontal partition expansion | DynamoDB automatically splits and redistributes partitions as data and traffic grow |
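
To make the connectionless point concrete, here is a minimal sketch of a Lambda handler doing a cart lookup; the `carts` table name and `user_id` key are illustrative assumptions, not part of the scenario's spec:

```python
import boto3

# Created once per Lambda execution environment, reused across invocations.
dynamodb = boto3.resource("dynamodb")
carts = dynamodb.Table("carts")

def handler(event, context):
    # Each invocation is an independent signed HTTPS request to DynamoDB:
    # no connection setup, no pool to saturate under burst.
    resp = carts.get_item(Key={"user_id": event["user_id"]})
    return resp.get("Item", {})
```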

Comparison Summary

| Database | Flash Sale Behavior |
|---|---|
| DynamoDB (on-demand) | Handles the spike transparently; cost rises in proportion to requests |
| Aurora / PostgreSQL | Likely fails under the connection storm; requires pre-warming read replicas |
| MongoDB Atlas | Scales with sharding, but shard rebalancing takes minutes; connection limits still apply |
| Redis (standalone) | Single-threaded per shard; saturates at high write concurrency; no durability |

Scenario 2: Gaming Leaderboard — Millions of Concurrent Score Updates per Second

The Situation

A mobile game has 10 million daily active users. Every time a player completes a level, their score is updated and regional leaderboards must reflect changes within one second. Peak concurrency is 200,000 score updates per second. Leaderboard reads happen 10× more often than writes.

Why Relational Databases Struggle Here

| Problem | Root Cause |
|---|---|
| Row-level lock contention | Updating a leaderboard rank requires reading, computing, and writing a rank column under a lock |
| `ORDER BY … LIMIT` at scale | With a score index, `SELECT * FROM leaderboard ORDER BY score DESC LIMIT 100` is cheap, but computing an individual player's rank means counting large index ranges: acceptable at small scale, painful at 10M rows with concurrent writes |
| Index maintenance overhead | Every update to a high-concurrency score column rewrites B-tree index entries |

DynamoDB Solution Design

```
PK = LEADERBOARD#global        SK = SCORE#<zero_padded_score>#<player_id>
PK = LEADERBOARD#region_NA     SK = SCORE#<zero_padded_score>#<player_id>
PK = PLAYER#<player_id>        SK = PROFILE → stores raw score
```

Key insight: Score is embedded in the sort key, left-zero-padded so lexicographic sort matches numeric sort. The top-N leaderboard is a Query in reverse order — no ORDER BY needed.

| Operation | DynamoDB Design | Cost |
|---|---|---|
| Update score | `PutItem` new sort-key entry + `DeleteItem` old entry | 2 WCU |
| Read top 100 | `Query(PK=LEADERBOARD#global, ScanIndexForward=False, Limit=100)` | Billed on aggregate data read, not per item: 100 small entries total a few KB, so only a few RCUs (halved with eventual consistency) |
| Player rank lookup | `Query` starting from the player's score entry (`ExclusiveStartKey`) | Sub-millisecond with DAX |
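
A hedged sketch of the update and top-N operations above, assuming an illustrative `leaderboard` table and 10-digit zero-padded scores:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("leaderboard")  # illustrative name

def update_score(player_id, old_score, new_score):
    # Write the new rank entry, then remove the stale one (2 WCU total).
    table.put_item(Item={
        "PK": "LEADERBOARD#global",
        "SK": f"SCORE#{new_score:010d}#{player_id}",  # zero-padded score
    })
    table.delete_item(Key={
        "PK": "LEADERBOARD#global",
        "SK": f"SCORE#{old_score:010d}#{player_id}",
    })

def top_100():
    # Reverse sort-key order puts the highest scores first; no ORDER BY.
    resp = table.query(
        KeyConditionExpression=Key("PK").eq("LEADERBOARD#global"),
        ScanIndexForward=False,
        Limit=100,
    )
    return resp["Items"]
```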

Comparison Summary

| Database | Leaderboard at 200K writes/sec |
|---|---|
| DynamoDB | Sort-key-embedded scores make rank queries a bounded `Query`; scales horizontally |
| PostgreSQL | Requires careful indexing; rank queries degrade as concurrent writes increase; connection bottleneck |
| Redis Sorted Sets | Microsecond latency and excellent for leaderboards, but durability is weaker and each sorted set is limited to a single shard's capacity |
| MongoDB | Aggregation pipeline for ranks is flexible but scan-heavy at 10M rows |

DynamoDB + Redis combined: Use DynamoDB as the durable source of truth for scores. Use Redis Sorted Sets as the hot leaderboard cache. DynamoDB Streams trigger a Lambda to keep Redis in sync.


Scenario 3: IoT Sensor Ingestion — 1 Million Devices Writing Every 30 Seconds

The Situation

An industrial IoT platform collects temperature, pressure, and vibration readings from 1,000,000 devices, each sending a data point every 30 seconds. That is roughly 33,000 writes per second at baseline. Reads are time-windowed: "give me the last 24 hours of readings for device XYZ."

Why Relational Databases Struggle Here

| Problem | Root Cause |
|---|---|
| Write throughput ceiling | A single PostgreSQL primary reliably handles roughly 5,000–10,000 writes per second; 33K WPS requires aggressive sharding |
| Fragile time-range partition pruning | Without proper table partitioning (`PARTITION BY RANGE`), every time-range query scans the full index |
| Schema migrations on live tables | Adding a new sensor type at 33K WPS risks `ALTER TABLE` lock timeouts |
| Storage bloat | Dead row versions accumulate (MVCC bloat); `VACUUM` competes with active writes |

DynamoDB Solution Design

```
PK = DEVICE#<device_id>     SK = TS#<unix_epoch_ms>
TTL = epoch for 30-day retention
```

| Access Pattern | DynamoDB Operation | Why It Works |
|---|---|---|
| Ingest one reading | `PutItem` | 1 WCU per 1 KB reading; 33K WPS = 33K WCU (on-demand absorbs it) |
| Last 24 hours for a device | `Query(PK=DEVICE#id, SK BETWEEN yesterday AND now)` | Bounded range within one partition; no table scan |
| Expire old readings | TTL on a 30-day epoch attribute | Zero WCU; no delete jobs needed |
| Fan-out to alerts pipeline | DynamoDB Streams → Lambda | Every new reading triggers downstream processing asynchronously |
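
A minimal sketch of the time-windowed read, assuming the PK/SK schema above and an illustrative `sensor_readings` table name:

```python
import time
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("sensor_readings")

def last_24_hours(device_id):
    now_ms = int(time.time() * 1000)
    start_ms = now_ms - 24 * 60 * 60 * 1000
    # Epoch-millisecond strings keep the same digit count for decades,
    # so the lexicographic BETWEEN matches the numeric time range.
    resp = table.query(
        KeyConditionExpression=Key("PK").eq(f"DEVICE#{device_id}")
        & Key("SK").between(f"TS#{start_ms}", f"TS#{now_ms}")
    )
    return resp["Items"]
```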

Comparison Summary

| Database | IoT at 33K WPS |
|---|---|
| DynamoDB | On-demand handles the burst; time-range queries are partition-bounded; TTL manages retention |
| PostgreSQL | Requires a time-series extension (TimescaleDB) or manual `PARTITION BY RANGE`; the write ceiling demands careful capacity planning |
| Cassandra | Comparable write throughput, but requires manual ring management, compaction tuning, and infrastructure ownership |
| InfluxDB / TimescaleDB | Purpose-built for time series; better aggregation functions (AVG, ROLLUP); but higher operational overhead than DynamoDB |

When to prefer Timestream or TimescaleDB over DynamoDB for IoT: If the primary query is time-series analytics (moving averages, downsampling, interpolation), a purpose-built time-series database wins on query expressiveness. DynamoDB wins on scale, simplicity, and time-windowed point lookups.


Scenario 4: Multi-Tenant SaaS — Thousands of Tenants, Wildly Variable Load

The Situation

A SaaS platform serves 10,000 tenant companies. A handful of enterprise tenants generate 80% of the traffic. Smaller tenants generate almost no traffic at all. Data isolation between tenants is a compliance requirement. Some tenants grow 10× in a week after a product launch.

Why Shared Relational Databases Struggle Here

| Problem | Root Cause |
|---|---|
| One noisy tenant hurts all | A large tenant running a heavy `SELECT *` query or a long transaction blocks other tenants on the same instance |
| Provisioning for peak wastes money | You size the database for the largest enterprise tenant's peak, so every small tenant pays for unused capacity |
| Schema migrations across all tenants | Adding a column means locking the shared table, affecting every tenant simultaneously |
| Connection multiplexing | 10,000 tenants each wanting persistent connections overwhelms any reasonable connection limit |

DynamoDB Solution: Tenant-Prefixed Key Design

```
PK = TENANT#<tenant_id>#RESOURCE#<resource_id>    SK = <version_or_timestamp>
GSI: PK = TENANT#<tenant_id>, SK = CREATED_AT
```

| Scaling Benefit | How DynamoDB Delivers It |
|---|---|
| Tenant isolation at partition level | Each tenant's data lives in its own partition range; a noisy tenant throttles only its own partitions, leaving other tenants unaffected |
| Linear cost scaling | Small tenants pay for tiny WCU/RCU volumes; large tenants pay more; cost scales precisely with usage |
| No schema migrations | New feature attributes are added per item without altering a shared schema |
| No connection management | DynamoDB is connectionless; 10,000 tenants × 100 concurrent users = 1,000,000 simultaneous HTTP requests, all accepted |
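
A short sketch of a tenant-prefixed write, assuming an illustrative `tenant_data` table; the point is that a new feature attribute is just another item attribute:

```python
import boto3

table = boto3.resource("dynamodb").Table("tenant_data")  # illustrative name

def save_resource(tenant_id, resource_id, version, attributes):
    # Per-item attributes ship new features with no ALTER TABLE and
    # no lock taken across other tenants.
    table.put_item(Item={
        "PK": f"TENANT#{tenant_id}#RESOURCE#{resource_id}",
        "SK": version,
        **attributes,
    })
```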

Comparison Summary

| Database | Multi-Tenant at 10K Tenants |
|---|---|
| DynamoDB | Per-tenant partition isolation; pay-per-use costs; connectionless; no shared lock contention |
| PostgreSQL (schema-per-tenant) | Better isolation, but 10K schemas create index and connection-management complexity |
| PostgreSQL (row-level security) | Simpler than separate schemas; noisy-tenant risk remains; connection limits still apply |
| MongoDB (collection-per-tenant) | 10K collections in one cluster is manageable but adds operational overhead |

Scenario 5: Session and Authentication State — Millions of Active Sessions

The Situation

A consumer web application has 5 million logged-in users during peak hours. Each API request checks if a session token is valid and loads a small payload of session data. Session data must survive server restarts. Sessions expire after 30 minutes of inactivity.

Why Redis Alone Struggles as the Primary Session Store

| Problem | Root Cause |
|---|---|
| Data loss on restart | RDB snapshots are periodic and may lose recent sessions on a crash; AOF persistence narrows the window but adds write latency |
| Memory pressure at 5M sessions | At ~1 KB per session, 5M sessions is a ~5 GB hot working set held in memory; expensive at in-memory pricing |
| Single-shard hot-key ceiling | Naively storing all sessions under one key or a few keys bottlenecks that shard |
| Cluster failover gap | Redis Cluster failover takes 15–30 seconds; sessions are unreadable during the gap |

DynamoDB Solution

```
PK = SESSION#<session_id>     SK = META
TTL = current_time + 1800     (30-minute inactivity expiry, reset on each request)
```

| Requirement | DynamoDB Behavior |
|---|---|
| Durability | Multi-AZ replication; no data loss on AZ failure |
| 5M concurrent sessions | On-demand mode handles any read/write rate; no memory ceiling |
| 30-minute sliding expiry | `UpdateItem` resets the TTL on every request; physical deletion is automatic |
| Sub-10ms reads | `GetItem` by session_id is a direct key lookup; consistent single-digit-ms latency |
| Compliance | Encryption at rest with KMS; access audited via CloudTrail |
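
A sketch of the sliding expiry, assuming an illustrative `sessions` table with TTL enabled on a `ttl` attribute:

```python
import time
import boto3

table = boto3.resource("dynamodb").Table("sessions")  # illustrative name

def touch_session(session_id):
    # Reset the expiry on every authenticated request; DynamoDB deletes
    # the item automatically (at zero write cost) once the TTL passes.
    table.update_item(
        Key={"PK": f"SESSION#{session_id}", "SK": "META"},
        UpdateExpression="SET #ttl = :t",
        ExpressionAttributeNames={"#ttl": "ttl"},
        ExpressionAttributeValues={":t": int(time.time()) + 1800},
    )
```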

Hybrid Architecture: DynamoDB + Redis

```mermaid
flowchart LR
    REQUEST["API Request\n(session token)"] --> REDIS["Redis Cache\n(hot sessions, 60-sec TTL)"]
    REDIS -->|Cache hit| RESPONSE["Response"]
    REDIS -->|Cache miss| DYNAMO["DynamoDB\n(durable session store)"]
    DYNAMO --> REDIS
    DYNAMO --> RESPONSE
```

- Redis handles repeated reads for hot sessions (users clicking rapidly)
- DynamoDB is the durable source of truth
- A cache miss falls through to DynamoDB and warms Redis
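
A read-through sketch of the hybrid flow above, assuming redis-py and illustrative endpoint and table names:

```python
import json
import boto3
import redis

cache = redis.Redis(host="sessions-cache.internal")  # hypothetical endpoint
table = boto3.resource("dynamodb").Table("sessions")

def load_session(session_id):
    key = f"session:{session_id}"
    cached = cache.get(key)
    if cached:
        return json.loads(cached)
    # Cache miss: fall through to the durable store.
    resp = table.get_item(Key={"PK": f"SESSION#{session_id}", "SK": "META"})
    item = resp.get("Item")
    if item:
        # Warm the cache with a short TTL; DynamoDB stays the source of truth.
        cache.setex(key, 60, json.dumps(item, default=str))
    return item
```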

Comparison Summary

| Database | 5M Concurrent Sessions |
|---|---|
| DynamoDB | Durable, TTL-native, effectively unlimited horizontal scale, no connection limit |
| Redis (primary store) | Faster, but carries durability risk, memory cost at 5M sessions, and a failover gap |
| PostgreSQL | A session table at 5M rows with high-frequency TTL updates suffers bloat and lock pressure |
| Memcached | No persistence or replication; all sessions are lost on restart |

Scenario 6: Real-Time Activity Feed — Fan-Out Writes with High Read Amplification

The Situation

A social platform has 100M users. When a celebrity with 10M followers posts content, all 10M followers should see the post in their feeds within seconds. Reads are 10× more frequent than writes. Feed items should expire after 30 days.

The Fan-Out Write Challenge

The write fan-out model (write one post to 10M follower feeds) creates:

- 10M individual writes per celebrity post
- Burst write load lasting 5–30 minutes per viral post

Why Relational Databases Fail at Fan-Out Scale

| Problem | Root Cause |
|---|---|
| 10M row inserts in minutes | PostgreSQL insert throughput with indexes saturates a single primary |
| Index maintenance on insert | Every insert into a feed table with a (user_id, created_at) index triggers B-tree maintenance |
| Connection storm | Every write worker needs a connection; 1,000 parallel writers approach the connection limit |

DynamoDB Solution: Per-User Feed Partition

```
PK = FEED#<user_id>      SK = CREATED_AT#<post_id>
TTL = 30-day epoch
```

| Operation | DynamoDB Behavior |
|---|---|
| Fan-out write (celebrity post) | `BatchWriteItem` calls writing 25 items each, rate-distributed via an SQS queue with Lambda consumers |
| Read user feed | `Query(PK=FEED#user_id, ScanIndexForward=False, Limit=20)` — single-partition, reverse-sorted, fast |
| Expire old posts | TTL handles the 30-day rolling expiry without a cleanup job |
| Handle celebrity fan-out peak | SQS absorbs the burst; Lambda scales the fan-out writers to 1,000 concurrent workers |

```mermaid
flowchart TB
    POST["Celebrity posts content"] --> SQS["SQS Queue\n(fan-out job)"]
    SQS --> LAMBDA["Lambda Fan-Out Workers\n(1,000 concurrent)"]
    LAMBDA -->|"BatchWriteItem × 400K batches"| DYNAMO["DynamoDB\nFeed Table (per-user partition)"]
    USER["Follower reads feed"] -->|"Query(PK=FEED#user_id)"| DYNAMO
```
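
A sketch of one fan-out worker, assuming an invented SQS message shape (`follower_ids`, `post_id`, `created_at`, `author_id`) and an illustrative `feeds` table; `batch_writer()` handles the 25-item batching:

```python
import json
import time
import boto3

table = boto3.resource("dynamodb").Table("feeds")

def handler(event, context):
    expires = int(time.time()) + 30 * 24 * 3600  # 30-day TTL attribute
    for record in event["Records"]:
        job = json.loads(record["body"])  # one slice of the follower list
        # batch_writer() groups puts into 25-item BatchWriteItem calls
        # and retries unprocessed items automatically.
        with table.batch_writer() as batch:
            for follower_id in job["follower_ids"]:
                batch.put_item(Item={
                    "PK": f"FEED#{follower_id}",
                    "SK": f"CREATED_AT#{job['created_at']}#{job['post_id']}",
                    "author_id": job["author_id"],
                    "ttl": expires,
                })
```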

Comparison Summary

| Database | Fan-Out Write to 10M Followers |
|---|---|
| DynamoDB | Per-user partitions distribute the writes; on-demand absorbs the burst; TTL manages retention |
| PostgreSQL | Single-primary insert throughput is insufficient; read replicas help reads but not writes |
| Cassandra | Comparable write throughput and a natural per-user partition mapping, but a higher operational burden |
| MongoDB | Flexible, but the per-shard write ceiling applies; sharding configuration adds complexity |

Scenario 7: Global Active-Active — Multi-Region with Low-Latency Local Reads

The Situation

A global B2B SaaS application serves customers in North America, Europe, and Asia-Pacific. Regulatory requirements mandate that customer data in EU stays in EU. Local read latency must be under 10ms. Write conflicts must be resolved automatically.

Cross-Region Relational Database Challenges

| Problem | Root Cause |
|---|---|
| Write conflicts between regions | Active-active writes to two PostgreSQL primaries require application-level conflict resolution |
| Replication lag | Asynchronous replication means EU reads may lag US writes by hundreds of milliseconds |
| Latency over the wire | Routing US writes to EU for strong consistency adds 80–120ms of cross-region latency |
| Failover complexity | Aurora Global Database failover requires promotion steps and DNS propagation |

DynamoDB Global Tables Solution

```mermaid
flowchart LR
    subgraph US-East["us-east-1"]
        DDB_US["DynamoDB\nGlobal Table Replica"]
        LAMBDA_US["Lambda\n(US traffic)"]
    end

    subgraph EU-West["eu-west-1"]
        DDB_EU["DynamoDB\nGlobal Table Replica"]
        LAMBDA_EU["Lambda\n(EU traffic)"]
    end

    subgraph AP-SE["ap-southeast-1"]
        DDB_AP["DynamoDB\nGlobal Table Replica"]
        LAMBDA_AP["Lambda\n(AP traffic)"]
    end

    LAMBDA_US -->|"Read/Write local"| DDB_US
    LAMBDA_EU -->|"Read/Write local"| DDB_EU
    LAMBDA_AP -->|"Read/Write local"| DDB_AP

    DDB_US <-->|"Async replication\n~1 second"| DDB_EU
    DDB_EU <-->|"Async replication\n~1 second"| DDB_AP
    DDB_US <-->|"Async replication\n~1 second"| DDB_AP
```

| DynamoDB Global Tables Property | Detail |
|---|---|
| Replication lag | Typically under one second between regions |
| Conflict resolution | Last-writer-wins, based on a per-item timestamp |
| Regional isolation | Reads and writes in each region hit the local replica (sub-10ms) |
| Failover | Shifting traffic to another region requires only a DNS/routing change; the replica is already warm and up to date |
| Data residency | Global Tables replicate every item to every replica region, so regulated EU data belongs in a separate EU-only table; IAM policies can deny cross-region access paths |
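
A hedged sketch of adding replicas with the low-level client (2019.11.21 Global Tables); the table and region names are illustrative, and the table must have streams enabled and on-demand or autoscaled capacity:

```python
import boto3

client = boto3.client("dynamodb", region_name="us-east-1")

# Replicas are added one region at a time via UpdateTable.
for region in ("eu-west-1", "ap-southeast-1"):
    client.update_table(
        TableName="customers",
        ReplicaUpdates=[{"Create": {"RegionName": region}}],
    )
```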

Comparison Summary

| Database | Global Active-Active |
|---|---|
| DynamoDB Global Tables | Native multi-region active-active; ~1s replication lag; last-writer-wins conflict resolution |
| Aurora Global Database | One writer region; cross-region read replicas only; failover takes minutes |
| CockroachDB | True multi-region active-active with consensus; more configuration; higher latency for cross-region writes |
| Cassandra (multi-DC) | High write availability; requires manual rack/DC topology configuration plus compaction and repair management |

Scenario 8: Event Sourcing and Audit Logs — Immutable Append-Only Write Patterns

The Situation

A financial services platform stores every state change as an immutable event — account debits, credits, transfers, and status changes. The audit log must be query-able by account ID and time range. Older events must never be modified. Regulators require 7-year retention.

Why This Pattern Suits DynamoDB Naturally

The event log workload is purely append-heavy with ordered reads. It never modifies existing records. The access pattern is simple: "give me all events for account X between date A and date B."

Design

```
PK = ACCOUNT#<account_id>     SK = EVENT#<unix_epoch_ms>#<event_id>
event_type = "CREDIT" | "DEBIT" | "TRANSFER"
amount = 250.00
balance_after = 1750.00
actor = "system" | "user_id"
TTL = not set    ← intentionally absent: 7-year retention for compliance
```

| Access Pattern | Operation | Cost |
|---|---|---|
| Append one event | `PutItem` with `attribute_not_exists(SK)` condition | 1 WCU |
| Load full account history | `Query(PK=ACCOUNT#id)`, paginated | RCUs proportional to the data read per page |
| Load events in a date range | `Query(PK=ACCOUNT#id, SK BETWEEN start AND end)` | Bounded read |
| Audit export to S3 | DynamoDB Streams → Lambda → S3 Parquet | Near-real-time export without a `Scan` |

Conditional Write for Idempotency

```python
import boto3
from decimal import Decimal

table = boto3.resource("dynamodb").Table("event_log")  # illustrative name

# The condition makes the append idempotent: a second write with the
# same PK/SK pair is rejected instead of overwriting the event.
table.put_item(
    Item={
        "PK": f"ACCOUNT#{account_id}",
        "SK": f"EVENT#{timestamp}#{event_id}",
        "event_type": "CREDIT",
        "amount": Decimal("250.00"),
        "actor": actor_id,
    },
    ConditionExpression="attribute_not_exists(SK)",
)
```

If the same event is retried (for example, a Lambda retry after a timeout), the condition fails with a ConditionalCheckFailedException. The writer catches and discards that error, so the duplicate is rejected and the event log stays clean.
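
One way to make that retry path explicit; `append_event` is a hypothetical wrapper around the `put_item` call above, and botocore surfaces the failure as a `ClientError`:

```python
from botocore.exceptions import ClientError

try:
    append_event(account_id, timestamp, event_id)  # hypothetical wrapper
except ClientError as err:
    if err.response["Error"]["Code"] != "ConditionalCheckFailedException":
        raise
    # Duplicate delivery: drop it; the original event is already stored.
```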

Comparison Summary

| Database | Append-Only Event Log at Scale |
|---|---|
| DynamoDB | Natural per-account partitions; sort-key-ordered events; conditional writes prevent duplicates |
| PostgreSQL | Works well at moderate scale; high-volume event inserts require a table partitioning and archiving strategy |
| Kafka (event store) | Excellent for event streaming and replay; less suited to per-entity query-by-account access without a secondary index |
| EventStoreDB | Purpose-built for event sourcing with better stream projection support, but more operational overhead than DynamoDB |

Scenario 9: Burst Traffic with Cold Start — Serverless-First Architecture

The Situation

A startup deploys an API using AWS Lambda + API Gateway. At zero traffic, no compute runs. During a viral social media moment, the API receives 50,000 requests per minute within 30 seconds of the post.

The Database Cold Start Problem

Most databases have a warm-up cost:

- Aurora Serverless v1: cold starts take 25–30 seconds to resume from a paused state
- RDS: always-on, but the connection pool must be pre-allocated
- ElastiCache: the cluster must be provisioned ahead of time
- Aurora Serverless v2: faster, but still needs connection management

Why DynamoDB Is the Natural Serverless Pair

| Property | Why It Matters for Lambda |
|---|---|
| No connection state | Lambda does not hold a database connection between invocations. DynamoDB requests are stateless HTTP calls — no connection warm-up |
| DAX client pool (if used) | A caveat: the DAX client keeps persistent TCP connections from inside a VPC, reintroducing the connection state that plain DynamoDB avoids |
| On-demand mode | No pre-warming; capacity materializes as traffic arrives |
| Zero idle cost | When Lambda is at zero invocations, DynamoDB costs nothing for reads and writes |
| RDS Proxy alternative | If you must use Aurora with Lambda, RDS Proxy manages the connection pool — but adds latency and cost that DynamoDB avoids entirely |
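
A minimal sketch of provisioning nothing up front: an on-demand table (illustrative name and key) accepts the burst with no capacity planning:

```python
import boto3

client = boto3.client("dynamodb")

# PAY_PER_REQUEST (on-demand) means no WCU/RCU to size or pre-warm;
# the table absorbs the viral burst as it arrives.
client.create_table(
    TableName="api_data",
    AttributeDefinitions=[{"AttributeName": "PK", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "PK", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
)
```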

Comparison Summary

| Database | Serverless Lambda Cold Start |
|---|---|
| DynamoDB | Stateless HTTP; zero connection overhead; on-demand capacity; the natural pair |
| Aurora + RDS Proxy | Works, but adds proxy latency and cost; the proxy itself has a warm-up period |
| Aurora Serverless v1 | Cold start from a paused state was 25–30s — catastrophic for burst traffic |
| Redis (ElastiCache) | Requires a VPC and persistent TCP connections from Lambda; the idle cluster costs money even at zero traffic |

Scenario 10: Time-Based Session Memory with Sliding Window Summarization

The Situation

A conversational AI chatbot (like MangaAssist) maintains multi-turn context. Memory must be loaded on every message within a latency budget. Old turns must be summarized and compressed to avoid token overflow in LLM prompts. Sessions must expire automatically after inactivity.

The Memory Access Pattern

Every message in the chatbot triggers:

1. Load the last N turns (ordered by recency)
2. Load the latest summary item (if it exists)
3. Assemble context from turns + summary
4. Append the new user turn
5. Append the new assistant turn
6. Optionally write a new summary if the window is full

This is a purely key-based, ordered-range, append-heavy workload with no joins.

Schema Recap

```
PK = SESSION#<session_id>     SK = META           → session lifecycle state
PK = SESSION#<session_id>     SK = TURN#<epoch>   → one item per chat turn
PK = SESSION#<session_id>     SK = SUMMARY#<n>    → compressed window summary
GSI: PK = customer_id, SK = updated_at            → resume sessions by customer
```
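
A sketch of steps 1 and 2 of the access pattern above, assuming an illustrative `chat_memory` table using this schema:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("chat_memory")

def load_context(session_id, n_turns=10):
    pk = Key("PK").eq(f"SESSION#{session_id}")
    # Step 1: last N turns, newest first, one bounded round trip.
    turns = table.query(
        KeyConditionExpression=pk & Key("SK").begins_with("TURN#"),
        ScanIndexForward=False,
        Limit=n_turns,
    )["Items"]
    # Step 2: the latest summary item, if any.
    summaries = table.query(
        KeyConditionExpression=pk & Key("SK").begins_with("SUMMARY#"),
        ScanIndexForward=False,
        Limit=1,
    )["Items"]
    # Return chronological turns plus the summary for prompt assembly.
    return list(reversed(turns)), (summaries[0] if summaries else None)
```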

Why This Is Specifically Where DynamoDB Beats Every Alternative

| Access Pattern | DynamoDB | PostgreSQL | Redis |
|---|---|---|---|
| Load last 10 turns | `Query` reverse sort, `Limit=10` — one network round trip | `SELECT … ORDER BY created_at DESC LIMIT 10` — works, but slower at scale and under connection load | `LRANGE` on a list — fast, but no per-item TTL and a durability risk |
| Append one turn | `PutItem` — small item, 1 WCU | `INSERT` — adds a row, updates the index, writes WAL | `RPUSH` — fast, but weaker atomicity on crash |
| Update session metadata | `UpdateItem` — patches specific attributes | `UPDATE` — requires a row lock | `HSET` — fast; no durability guarantee |
| Natural per-item TTL | TTL attribute — no cleanup job | Requires a scheduled `DELETE WHERE` job or partition pruning | `EXPIRE` is per key; per-list-item TTL is awkward |
| Burst at 500K sessions | On-demand scales automatically | Connection pool saturates; needs read replicas | Single-shard OOM risk at 500K hot keys |

Summary: When to Choose DynamoDB Over Alternatives

| Scenario | Primary Reason to Choose DynamoDB |
|---|---|
| E-commerce flash sale | On-demand capacity absorbs instant spikes; no connection pool |
| Gaming leaderboards | Sort-key rank embedding; millisecond read latency at any scale |
| IoT sensor ingestion | High write throughput; per-device partitions; TTL for retention |
| Multi-tenant SaaS | Per-tenant partition isolation; pay-per-use; connectionless |
| Session state | Durable; TTL-native; no connection overhead; multi-AZ |
| Social feed fan-out | Per-user partitions; batch writes; TTL for rolling expiry |
| Global active-active | Global Tables; local reads under 10ms; built-in replication |
| Event sourcing / audit log | Append-only; conditional idempotency; ordered range queries |
| Serverless Lambda workloads | Stateless HTTP; no connection warm-up; zero idle cost |
| Conversational AI memory | Per-session partitions; ordered turns; summary items; TTL |

When NOT to Choose DynamoDB

| Situation | Better Choice | Why |
|---|---|---|
| Complex ad-hoc queries, reporting, analytics | PostgreSQL / Redshift | SQL joins, aggregations, GROUP BY, window functions |
| Multi-entity transactions with full ACID | PostgreSQL | True serializable transactions across arbitrary tables |
| Full-text search on document content | OpenSearch / Elasticsearch | Text tokenization, relevance scoring, faceted search |
| High-cardinality time-series aggregation | InfluxDB / TimescaleDB | Built-in rollup, downsampling, interpolation |
| Graph traversal (friends-of-friends) | Neptune / Neo4j | Efficient multi-hop traversal; DynamoDB has no native graph semantics |
| Schema evolves heavily and unpredictably | MongoDB | Flexible schema querying without key redesign |
| Small data, few users, complex queries | PostgreSQL | Operationally simple; no need for NoSQL overhead |

Architectural Principle: Fit the Access Pattern, Not the Data Shape

The single most important rule when evaluating DynamoDB for a scaling scenario:

> DynamoDB rewards workloads where you know your access patterns upfront. For every access pattern you can enumerate, DynamoDB delivers predictable latency and linear scale. For every access pattern you cannot enumerate, DynamoDB punishes you with Scans or impossible queries.

If your workload has three well-defined access patterns, DynamoDB is likely the right tool. If your product manager changes the analytics requirement every sprint, keep that layer in a relational or document store where ad-hoc queries are natural.