DynamoDB Basics — Comprehensive Reference

Audience: Developers, architects, and interview candidates who need a complete grounding in DynamoDB from first principles through production-ready patterns.


1. What Is DynamoDB?

Amazon DynamoDB is a fully managed, serverless, key-value and document NoSQL database built for any scale. It delivers single-digit millisecond latency at any throughput level, without manual sharding, schema migrations, or capacity tuning by default.

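| Property | Value |
| --- | --- |
| Type | Key-value + document NoSQL |
| Managed by | AWS (no servers to operate) |
| Latency target | Single-digit milliseconds |
| Durability and availability | Data is replicated across multiple Availability Zones; the availability SLA is 99.99% for regional tables and 99.999% for global tables |
| Consistency options | Eventually consistent or strongly consistent |
| Scaling model | On-demand or provisioned with auto-scaling |
| Max item size | 400 KB |
| Max partitions | Unlimited (managed automatically) |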

2. Core Concepts

2.1 Table

A DynamoDB table is the top-level container for data. Unlike a relational table, it does not enforce a fixed schema across all items. Every item in the same table can have different attributes, except for the primary key.

Table: customer-orders
  Item → { order_id: "ORD-001", customer: "alice", amount: 59.99 }
  Item → { order_id: "ORD-002", customer: "bob", amount: 12.00, promo: "SAVE10" }
  Item → { order_id: "ORD-003", customer: "alice", note: "gift", amount: 200.00 }

2.2 Item

An item is a single record in a table. It is a collection of attributes. Each item must include the primary key attributes. All other attributes are optional and can vary per item.

2.3 Attribute

An attribute is a named value stored in an item. Think of it as a field in a JSON object.

Supported Data Types

| Category | Types | Notes |
| --- | --- | --- |
| Scalar | S (String), N (Number), B (Binary), BOOL (Boolean), NULL | Most common building blocks |
| Set | SS (String Set), NS (Number Set), BS (Binary Set) | Unordered; no duplicates allowed |
| Document | M (Map), L (List) | Nested structures, similar to JSON objects and arrays |

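Number storage note: Numbers travel over the API as strings so that no floating-point precision is lost between languages and libraries. DynamoDB still treats them as numbers (up to 38 digits of precision), so numeric comparisons and arithmetic work correctly.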
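In Python, this shows up as `decimal.Decimal`: boto3's resource layer returns N attributes as `Decimal` and rejects plain `float` values on writes. The helper below is a minimal sketch of one way to prepare numbers before a write; the function name `to_dynamodb_number` is hypothetical and not part of any SDK.

```python
from decimal import Decimal


def to_dynamodb_number(value):
    """Convert a Python number to Decimal without binary float artifacts."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; DynamoDB has a separate BOOL type.
        raise TypeError("bool is not a DynamoDB number; use the BOOL type")
    if isinstance(value, float):
        # Going through str() keeps 59.99 as exactly 59.99 instead of the
        # long binary expansion that Decimal(59.99) would produce.
        return Decimal(str(value))
    return Decimal(value)


item = {"order_id": "ORD-001", "amount": to_dynamodb_number(59.99)}
```

Going through `str()` is a design choice for readability of stored values; if the float itself came from an imprecise calculation, round it explicitly first.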


3. Primary Keys

The primary key is the single most important design decision in DynamoDB. It determines how data is partitioned, how it is retrieved, and what access patterns are efficient.

3.1 Simple Primary Key (Partition Key Only)

Only a partition key is defined. Each item must have a unique partition key value.

PK = user_id
  "user123" → { name: "Alice", email: "alice@example.com" }
  "user456" → { name: "Bob",   email: "bob@example.com" }

Good for: Lookups where you always know the exact key. GetItem by user ID is the canonical example.

3.2 Composite Primary Key (Partition Key + Sort Key)

Both a partition key and a sort key are defined. Items with the same partition key are grouped together and sorted by the sort key. This is the most powerful and most commonly used key design.

PK = user_id  |  SK = order_date
  "user123"  |  "2024-01-01"  →  first order
  "user123"  |  "2024-03-15"  →  second order
  "user456"  |  "2024-02-10"  →  another user's order

Good for: Time-series data, hierarchical relationships, one-to-many relationships, ordered retrieval within a group.

3.3 Partition Key Design Rules

| Rule | Why It Matters |
| --- | --- |
| High cardinality | Low-cardinality keys (like status = "active") concentrate traffic on one partition |
| Write distribution | All writes to the same key value go to the same partition |
| Avoid hot keys | A single partition tops out at roughly 1,000 WCU and 3,000 RCU, so one very popular key is capped at that rate |
| Prefix patterns | Using USER#<id> or ORDER#<id> prefixes prevents cross-entity key collisions in single-table designs |

3.4 Hot Partition Mitigation

If your workload puts too much traffic on a single partition key:

  1. Write sharding: Append a random suffix (user_id#1, user_id#2, ..., user_id#N) and scatter writes across N logical partitions. Reads must aggregate across shards.
  2. Attribute bucketing: Use a time bucket as part of the partition key (user_id#2024-03) so traffic is distributed by time.
  3. DAX in front: Cache hot reads to reduce RCU pressure on a single partition.
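
The write-sharding option can be sketched in a few lines. This is an illustration under assumptions, not a prescribed implementation: `SHARD_COUNT`, `sharded_write_key`, and `all_shard_keys` are hypothetical names, and the shard count has to be sized to your own write rate.

```python
import random
from typing import List

SHARD_COUNT = 8  # hypothetical N; size it to the write rate you need


def sharded_write_key(base_key: str, shard_count: int = SHARD_COUNT) -> str:
    """Pick a random shard suffix so writes spread across shard_count keys."""
    return f"{base_key}#{random.randint(1, shard_count)}"


def all_shard_keys(base_key: str, shard_count: int = SHARD_COUNT) -> List[str]:
    """Every key a reader must query, then merge, to see the full data set."""
    return [f"{base_key}#{n}" for n in range(1, shard_count + 1)]
```

A writer calls `sharded_write_key("user123")` for each put; a reader issues one Query per entry in `all_shard_keys("user123")` and merges the results, which is the read-side cost of this technique.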

4. Indexes

4.1 Local Secondary Index (LSI)

An LSI shares the partition key with the base table but uses a different sort key. It gives you an alternate ordering of items within the same partition.

| Property | Value |
| --- | --- |
| Partition key | Same as base table |
| Sort key | Different attribute |
| Creation time | Must be defined at table creation; cannot be added later |
| Consistency | Supports strongly consistent reads |
| Limit | Up to 5 per table |
| Storage | Shares partition storage with base table |

Base table:   PK = user_id,  SK = created_at
LSI:          PK = user_id,  SK = status
→ Now you can query a user's items by status (for example, status = "active") without reading the whole partition

4.2 Global Secondary Index (GSI)

A GSI has its own partition key and optional sort key, completely independent of the base table. It lets you query the same data from a different angle.

| Property | Value |
| --- | --- |
| Partition key | Any scalar attribute (String, Number, or Binary); usually different from the base table's |
| Sort key | Optional, any scalar attribute |
| Creation time | Can be added or deleted at any time |
| Consistency | Eventually consistent only |
| Limit | Up to 20 per table (default quota, can be raised) |
| Throughput | Has its own separate WCU/RCU allocation |
| Projection | ALL, KEYS_ONLY, or INCLUDE (specific attributes) |

Base table:   PK = order_id
GSI:          PK = customer_id,  SK = order_date
→ Now you can query all orders for a customer sorted by date

GSI projection strategy:

  • KEYS_ONLY: only the index keys and the base table's primary key are stored in the index (cheapest)
  • INCLUDE: explicitly named attributes are included (a balance between cost and query needs)
  • ALL: every attribute is replicated (most flexible but most expensive)

4.3 LSI vs GSI Comparison

| Dimension | LSI | GSI |
| --- | --- | --- |
| Partition key | Must match base table | Independent |
| Consistency | Strong or eventual | Eventual only |
| Added after creation | No | Yes |
| Item collection size limit | 10 GB per partition key value | No such limit |
| Can span across partitions | No | Yes |

5. Capacity Modes

5.1 On-Demand Mode

DynamoDB automatically scales to accommodate any request rate. You pay per read/write request consumed.

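| Aspect | Detail |
| --- | --- |
| Scaling | Automatic, no capacity planning; requests can still be throttled if traffic exceeds double the previous peak within 30 minutes |
| Cost model | Pay per request (WRU/RRU) |
| Best for | Unpredictable traffic, new applications, traffic spikes |
| Pricing (2024, us-east-1) | ~$1.25 per million write request units, ~$0.25 per million read request units for most of 2024; AWS lowered on-demand prices in November 2024, so check the current pricing page |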

5.2 Provisioned Mode

You specify RCU and WCU targets in advance. You pay for the provisioned capacity whether it is used or not.

| Aspect | Detail |
| --- | --- |
| Scaling | Manual or auto-scaling (with min/max targets) |
| Cost model | Pay for provisioned throughput by the hour |
| Best for | Predictable steady traffic where cost optimization matters |
| Auto-scaling | CloudWatch-driven, reactive to actual usage with target utilization setting |

5.3 Capacity Unit Math

| Unit | Rule |
| --- | --- |
| 1 WCU | 1 write of up to 1 KB per second |
| 1 RCU (strong) | 1 read of up to 4 KB per second |
| 1 RCU (eventual) | 2 reads of up to 4 KB per second (half the cost) |
| Transactional WCU | 2× the standard WCU cost |
| Transactional RCU | 2× the standard RCU cost |

Size rounding: Item sizes are rounded up to the next 1 KB for writes and to the next 4 KB for reads. A 1.5 KB item therefore costs 2 WCU to write and 1 RCU to read with strong consistency.
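
The arithmetic for a single-item operation can be written out as follows. This is a sketch of the rules in the table above; the function names are hypothetical, and 1 KB is taken as 1,024 bytes.

```python
import math


def write_capacity_units(item_bytes: int, transactional: bool = False) -> int:
    """WCU for one write: size rounded up to 1 KB blocks, doubled in a transaction."""
    units = max(1, math.ceil(item_bytes / 1024))
    return units * 2 if transactional else units


def read_capacity_units(item_bytes: int, consistent: bool = True,
                        transactional: bool = False) -> float:
    """RCU for one read: size rounded up to 4 KB blocks.

    Eventually consistent reads cost half; transactional reads cost double.
    """
    units = max(1, math.ceil(item_bytes / 4096))
    if transactional:
        return units * 2
    return units if consistent else units / 2
```

For example, `write_capacity_units(1536)` gives 2 for the 1.5 KB item mentioned above, while `read_capacity_units(1536, consistent=False)` gives 0.5.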


6. Read and Write Operations

6.1 Single-Item Operations

| API | What It Does |
| --- | --- |
| PutItem | Creates or fully replaces an item identified by its primary key |
| GetItem | Retrieves a single item by its exact primary key |
| UpdateItem | Creates or patches an item; can increment/append/remove individual attributes |
| DeleteItem | Removes an item by its primary key |

6.2 Batch Operations

| API | What It Does | Limit |
| --- | --- | --- |
| BatchGetItem | Retrieves up to 100 items across one or more tables | 100 items, 16 MB per call |
| BatchWriteItem | Puts or deletes up to 25 items per call | 25 items, 16 MB per call |

Important: Batch operations are not atomic. Individual operations within a batch can fail independently. You must handle UnprocessedItems / UnprocessedKeys in your retry logic.
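
A retry loop for partial batch failures might look like the sketch below. It assumes `client` behaves like a boto3 DynamoDB client (its `batch_write_item` call returns an `UnprocessedItems` map); the function name, attempt count, and delay are illustrative choices.

```python
import time


def batch_write_with_retry(client, request_items, max_attempts=5, base_delay=0.05):
    """Call batch_write_item until UnprocessedItems is empty or attempts run out.

    Returns whatever is still unprocessed (an empty dict on full success).
    """
    pending = request_items
    for attempt in range(max_attempts):
        response = client.batch_write_item(RequestItems=pending)
        pending = response.get("UnprocessedItems", {})
        if not pending:
            return {}
        # Back off before resubmitting only the items DynamoDB did not accept.
        time.sleep(base_delay * (2 ** attempt))
    return pending
```

The same shape applies to BatchGetItem, with `UnprocessedKeys` in place of `UnprocessedItems`.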

6.3 Query

Reads all items that share a partition key, with optional sort key filtering.

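  • Always requires a partition key value
  • Sort key conditions support =, <, <=, >, >=, BETWEEN, and begins_with
  • Results are paginated with ExclusiveStartKey / LastEvaluatedKey (each page holds at most 1 MB of data)
  • Results can be returned in ascending or descending sort key order (ScanIndexForward)
  • Much cheaper than Scan because it is a bounded, targeted read

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("sessions")  # hypothetical table name
response = table.query(
    KeyConditionExpression=Key("PK").eq("SESSION#abc") & Key("SK").begins_with("TURN#"),
    ScanIndexForward=False,   # descending sort order (latest turns first)
    Limit=10                  # evaluate at most 10 items
)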
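
Pagination can be wrapped in a small generator. The sketch below assumes `table` behaves like a boto3 `Table` resource, whose `query` call returns `Items` and, when more data remains, `LastEvaluatedKey`; the name `query_all_pages` is hypothetical.

```python
def query_all_pages(table, **query_kwargs):
    """Yield every item from a Query, following LastEvaluatedKey between pages."""
    kwargs = dict(query_kwargs)  # copy so the caller's arguments are not mutated
    while True:
        page = table.query(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            break  # no more pages
        kwargs["ExclusiveStartKey"] = last_key
```

Note that `Limit` applies per request, so passing `Limit=10` to this helper changes the page size rather than capping the total number of items yielded.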

6.4 Scan

Reads every item in a table or index sequentially.

  • No key requirement — reads everything
  • Very expensive at scale; avoid in hot paths
  • Can be parallelized using Segment and TotalSegments for bulk export jobs
  • Add a FilterExpression to limit returned data, but filtered items still consume RCU

# Parallel scan across 4 workers; each worker passes its own worker_id (0 to 3)
response = table.scan(TotalSegments=4, Segment=worker_id)
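
A fuller version of the parallel scan, with each worker paginating its own segment, could be sketched as below. It assumes `table` behaves like a boto3 `Table` resource; the function names are hypothetical. boto3 resource objects are not documented as thread-safe, so production code may prefer one session and table object per worker.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(table, segment, total_segments):
    """Read one segment to completion, following LastEvaluatedKey."""
    items = []
    kwargs = {"Segment": segment, "TotalSegments": total_segments}
    while True:
        page = table.scan(**kwargs)
        items.extend(page.get("Items", []))
        if "LastEvaluatedKey" not in page:
            return items
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]


def parallel_scan(table, total_segments=4):
    """Scan all segments concurrently and return the combined item list."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [pool.submit(scan_segment, table, s, total_segments)
                   for s in range(total_segments)]
        return [item for future in futures for item in future.result()]
```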

6.5 Filter Expression vs Key Condition Expression

| Concept | Evaluated At | Affects RCU Cost |
| --- | --- | --- |
| KeyConditionExpression | Before reading | Yes: limits items fetched |
| FilterExpression | After reading | No: items are already read and billed |

This is a critical distinction. A FilterExpression that discards 90% of results still bills you for reading 100% of them. Design key patterns so the key condition does most of the filtering.
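
A rough worked example makes the difference concrete. The numbers are hypothetical (1,000 items of 400 bytes under one partition key, of which 100 match), and `query_rcu` is an illustrative helper that applies the 4 KB rounding to the total bytes a Query reads.

```python
import math


def query_rcu(bytes_read: int, consistent: bool = False) -> float:
    """RCU for one Query page: total bytes read, rounded up to 4 KB blocks."""
    blocks = max(1, math.ceil(bytes_read / 4096))
    return blocks if consistent else blocks / 2


# Hypothetical partition: 1,000 items of 400 bytes each; only 100 match.
all_items_bytes = 1000 * 400
matching_bytes = 100 * 400

# FilterExpression: all 1,000 items are read, then 900 are discarded.
cost_with_filter = query_rcu(all_items_bytes)

# Key condition: only the 100 matching items are read.
cost_with_key_condition = query_rcu(matching_bytes)
```

Under these assumptions the filtered query costs 49 RCU and the key-condition query costs 5 RCU, even though both return the same 100 items.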


7. Writes in Depth

7.1 Conditional Writes

A conditional write proceeds only if its condition evaluates to true. If the condition fails, DynamoDB rejects the write and returns ConditionalCheckFailedException.

# Only insert if the item does not already exist
table.put_item(
    Item={"PK": "SESSION#abc", "SK": "META", "status": "active"},
    ConditionExpression="attribute_not_exists(PK)"
)

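# Atomic counter increment, applied only while the counter is below the limit.
# "count" is a DynamoDB reserved word, so it is referenced through the #count placeholder.
table.update_item(
    Key={"PK": "COUNTER#daily", "SK": "2024-03-28"},
    UpdateExpression="SET #count = #count + :inc",
    ConditionExpression="#count < :limit",
    ExpressionAttributeNames={"#count": "count"},
    ExpressionAttributeValues={":inc": 1, ":limit": 1000}
)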

7.2 UpdateExpression Operators

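| Operator | Purpose | Example |
| --- | --- | --- |
| SET | Set or overwrite an attribute | SET #status = :val |
| REMOVE | Delete an attribute from an item | REMOVE page_context |
| ADD | Increment a number or add to a set | ADD view_count :one |
| DELETE | Remove values from a set attribute | DELETE tags :old_tag |

Note: status is a DynamoDB reserved word, so the SET example refers to it through a #status placeholder defined in ExpressionAttributeNames.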

Multiple clauses can be combined: SET updated_at = :ts REMOVE old_field ADD view_count :one

7.3 Transactions

TransactWriteItems and TransactGetItems provide all-or-nothing semantics across up to 100 items (and up to 4 MB of data in total) per request.

import boto3

# Low-level client: every attribute value needs an explicit type tag ("S", "N", ...)
client = boto3.client("dynamodb")

client.transact_write_items(
    TransactItems=[
        {
            "Put": {
                "TableName": "orders",
                "Item": {"PK": {"S": "ORDER#999"}, "status": {"S": "placed"}, "amount": {"N": "99"}},
                "ConditionExpression": "attribute_not_exists(PK)"
            }
        },
        {
            "Update": {
                "TableName": "inventory",
                "Key": {"PK": {"S": "SKU#X1"}},
                "UpdateExpression": "SET stock = stock - :one",
                "ConditionExpression": "stock > :zero",
                "ExpressionAttributeValues": {":one": {"N": "1"}, ":zero": {"N": "0"}}
            }
        }
    ]
)

Transaction costs: Each item in a transaction consumes 2× the normal RCU or WCU. Use transactions only when atomicity genuinely matters.


8. TTL (Time to Live)

DynamoDB TTL automatically deletes expired items without consuming WCU.

| Property | Detail |
| --- | --- |
| Attribute type | Number (Unix epoch seconds) |
| Deletion lag | Items are typically deleted within a few days of expiry (older documentation cited 48 hours); the timing is not guaranteed |
| Cost | No WCU consumed for TTL deletions |
| Streams behavior | TTL deletions do appear in DynamoDB Streams as REMOVE events |
| Read behavior | Expired items can still be returned in reads before physical deletion |

Best practice: Always enforce expiry in application logic by treating any item whose ttl value is earlier than the current time as expired. Do not rely solely on DynamoDB's physical deletion timing.

import time

def is_expired(item: dict) -> bool:
    ttl_value = item.get("ttl")
    return ttl_value is not None and ttl_value < int(time.time())
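
On the write side, the expiry timestamp has to be computed by the application. The helper below is a small sketch; `with_ttl` is a hypothetical name, and it assumes the table's TTL attribute is called `ttl`, matching `is_expired` above (the actual attribute name is whatever was configured on the table).

```python
import time


def with_ttl(item: dict, lifetime_seconds: int, attribute: str = "ttl", now=None) -> dict:
    """Return a copy of item with an expiry timestamp in Unix epoch seconds."""
    current = int(time.time()) if now is None else int(now)
    return {**item, attribute: current + lifetime_seconds}
```

For example, `with_ttl({"PK": "SESSION#abc", "SK": "META"}, 3600)` produces an item that expires one hour after it is written. The `now` parameter exists only to make the function easy to test.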

9. DynamoDB Streams

Streams capture a time-ordered sequence of item-level changes (inserts, updates, deletes) with a 24-hour retention window.

Stream View Types

| View Type | What Is Captured |
| --- | --- |
| KEYS_ONLY | Only the primary key attributes |
| NEW_IMAGE | The entire item after the change |
| OLD_IMAGE | The entire item before the change |
| NEW_AND_OLD_IMAGES | Both before and after; most complete but most data |

Common Use Cases

  • Trigger downstream Lambda processing on item changes
  • Replicate data to Elasticsearch / OpenSearch for full-text search
  • Invalidate caches when source data changes
  • Build event sourcing pipelines
  • Drive analytics aggregation in real time
  • Maintain materialized views across tables

DynamoDB item change
  → Stream record (change event)
    → Lambda trigger (reads stream in batches)
      → Process: update search index, publish to SNS, write to S3, etc.
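
The Lambda step in this flow can be sketched as a handler that walks the batch of stream records. The record fields used here (`eventName`, `dynamodb.Keys`, `dynamodb.NewImage`, `userIdentity`) follow the stream event shape Lambda receives; the summary dictionary it returns is purely illustrative, and real code would replace it with the downstream action.

```python
def handler(event, context=None):
    """Summarize each stream record: event type, keys, new image, TTL flag."""
    summaries = []
    for record in event.get("Records", []):
        change = record.get("dynamodb", {})
        identity = record.get("userIdentity") or {}
        summaries.append({
            "event": record.get("eventName"),      # INSERT, MODIFY, or REMOVE
            "keys": change.get("Keys"),            # always present
            "new_image": change.get("NewImage"),   # None unless the view type includes it
            # TTL deletions are performed by the DynamoDB service principal.
            "ttl_delete": identity.get("principalId") == "dynamodb.amazonaws.com",
        })
    return summaries
```

Attribute values in stream records arrive in the typed wire format (for example `{"S": "abc"}`), so a real handler usually deserializes them before use.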

10. DAX (DynamoDB Accelerator)

DAX is an in-memory, read-through and write-through cache for DynamoDB that can serve cached reads with microsecond response times.

| Aspect | DynamoDB | DAX |
| --- | --- | --- |
| Latency | Single-digit milliseconds | Microseconds (cache hits) |
| API compatibility | DynamoDB API | Same DynamoDB API (drop-in) |
| Consistency | Strong or eventual | Cached reads are eventually consistent; strongly consistent reads pass through to DynamoDB and are not cached |
| Write behavior | Writes go to DynamoDB | Write-through: DAX writes to both cache and DynamoDB |
| Best for | General purpose | Hot read keys, repeated identical queries |
| Not suitable for | n/a | Write-heavy tables, strong consistency requirements, Scan-heavy patterns |

DAX has two internal caches:

  • Item cache: Caches individual GetItem results
  • Query cache: Caches Query and Scan result sets by parameter hash


11. Consistency Models

Eventually Consistent Reads

  • Returns a response that may not reflect a very recent write (typically within a second)
  • Half the RCU cost of strongly consistent reads
  • Default for all read operations (GetItem, Query, and Scan)
  • Suitable for most read workloads where slight staleness is acceptable

Strongly Consistent Reads

  • Always reflects the most recent successfully completed write
  • Full RCU cost
  • Available only on base table reads and LSI reads (not GSI)
  • Use when correctness is critical: financial balances, inventory counts, idempotency checks

12. Single-Table Design Principles

Single-table design is the practice of storing multiple entity types in one DynamoDB table, differentiated by key patterns.

Why Do It?

  • Reduces the number of network round trips (fetch related entities in one Query)
  • Avoids cross-table joins at the application layer
  • Leads to better cost and latency characteristics when entities are accessed together

How to Differentiate Entity Types

| Technique | Example |
| --- | --- |
| PK prefix | USER#abc, ORDER#xyz, SESSION#123 |
| SK type marker | SK = META, SK = PROFILE, SK = TURN#ts |
| entity_type attribute | entity_type = "order" on every item |

Access Pattern Mapping Template

Before building the schema, enumerate every access pattern:

| # | Access Pattern | Operation | Key Used |
| --- | --- | --- | --- |
| 1 | Get user by ID | GetItem | PK = USER#id |
| 2 | Get all orders for a user | Query | PK = USER#id, SK begins_with ORDER# |
| 3 | Get order by ID | GetItem | PK = ORDER#id |
| 4 | Get recent sessions | Query GSI | GSI1PK = customer_id, SK desc |

Map every access pattern before writing a single line of code. DynamoDB schemas are hard to migrate later.
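
Once the patterns are listed, it helps to centralize key construction in a few helpers so the prefixes are never typed by hand. The sketch below covers patterns 1 and 2 only; the function names are hypothetical, and placing the user item under SK = META is an assumption borrowed from the type-marker row above.

```python
def user_key(user_id: str) -> dict:
    """Pattern 1: primary key of the user item (SK = META is an assumption)."""
    return {"PK": f"USER#{user_id}", "SK": "META"}


def order_key(user_id: str, order_id: str) -> dict:
    """Primary key of an order stored inside the user's item collection."""
    return {"PK": f"USER#{user_id}", "SK": f"ORDER#{order_id}"}


def orders_for_user_query(user_id: str) -> dict:
    """Pattern 2: Query arguments that select every ORDER# item for a user."""
    return {
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :prefix)",
        "ExpressionAttributeValues": {":pk": f"USER#{user_id}", ":prefix": "ORDER#"},
    }
```

With a boto3 `Table` resource, these would be used as `table.get_item(Key=user_key("abc"))` and `table.query(**orders_for_user_query("abc"))`.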


13. Common Pitfalls

| Pitfall | Root Cause | Solution |
| --- | --- | --- |
| Scan in production hot path | Schema did not anticipate the access pattern | Add a GSI for the query; never use Scan on active traffic |
| Hot partition throttling | Low-cardinality or celebrity partition key | Write sharding, bucketing, or DAX in front of hot reads |
| 400 KB item size limit | Storing large documents per item | Decompose into multiple items per entity (e.g., per-turn model) |
| GSI lag causing stale reads | GSI replication is eventually consistent | Read from base table when strong consistency is needed |
| Missing UnprocessedItems retry | Batch operations can partially fail | Always loop on UnprocessedItems / UnprocessedKeys |
| TTL expiry gap | Relying on physical deletion for correctness | Enforce expiry at read time in application code |
| Over-relying on FilterExpression | Filtering after reads does not save RCU | Push filter logic into key conditions; design keys around your queries |
| Transaction overuse | Using transactions for every write | Only use transactions when multi-item atomicity is strictly required; 2× cost |
| No exponential backoff | Burst writes exceed provisioned or adaptive capacity | Use AWS SDK retry config with exponential backoff and jitter |
| Large Scan for bulk export | A full table scan competes with live traffic for read capacity | Use parallel scan with segments and schedule it off-peak, or use DynamoDB Export to S3, which does not consume table read capacity (AWS Data Pipeline is an older alternative) |
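
For the backoff row, the AWS SDKs ship their own retry logic, but the idea is easy to see in isolation. The generator below is a minimal sketch of capped exponential backoff with "full jitter"; the name `backoff_delays` and the default values are illustrative assumptions, not SDK settings.

```python
import random


def backoff_delays(max_attempts=5, base=0.05, cap=2.0, rng=random.random):
    """Yield one sleep duration (seconds) per retry.

    The ceiling doubles each attempt up to `cap`; the actual delay is a
    random fraction of that ceiling so that clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling
```

A caller would loop over `backoff_delays()`, retry the throttled request, and `time.sleep(delay)` between attempts until the request succeeds or the generator is exhausted.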

14. DynamoDB vs Other Databases — Quick Reference

| Dimension | DynamoDB | PostgreSQL / Aurora | MongoDB | Cassandra |
| --- | --- | --- | --- | --- |
| Data model | Key-value + document | Relational (tables, rows, joins) | Document (JSON/BSON) | Wide-column (partition + clustering keys) |
| Schema | Flexible per item | Rigid (migrations required) | Flexible | Semi-rigid (schema per column family) |
| Joins | None (by design) | Full JOIN support | Limited ($lookup) | None |
| Query flexibility | Access-pattern-driven | Ad hoc SQL | Flexible JSON queries | Access-pattern-driven |
| Scaling model | Horizontal, automatic | Vertical primary + read replicas | Horizontal sharding | Horizontal, manual ring management |
| Consistency | Eventual or strong per read | ACID transactions | Tunable per query | Tunable (eventual to quorum) |
| Ideal workload | OLTP, session state, event streams | Complex queries, reporting, transactions | Flexible schemas, document storage | High-write time-series, IoT, logs |
| Ops burden | Near-zero (fully managed) | Medium (patching, vacuuming, connections) | Medium (sharding, index management) | High (ring topology, compaction tuning) |

15. Key Vocabulary for Interviews

| Term | One-Line Definition |
| --- | --- |
| Partition key | The value that determines which physical shard stores an item |
| Sort key | Secondary key that orders items within one partition |
| GSI | An index with its own partition key for alternate access patterns, created at any time |
| LSI | An index on the same partition key with a different sort key, defined at table creation only |
| WCU | One write of up to 1 KB per second in provisioned mode |
| RCU | One strongly consistent read of up to 4 KB per second (0.5 for eventual) |
| On-demand mode | Pay-per-request; no capacity planning; absorbs most bursts, but can throttle above double the previous peak |
| Adaptive capacity | DynamoDB automatically reallocates capacity to hot partitions |
| Hot partition | A partition receiving disproportionate traffic, causing throttling |
| Write sharding | Distributing writes across N virtual partitions to avoid hot spots |
| Single-table design | Storing multiple entity types in one table, differentiated by key patterns |
| DAX | In-memory write-through cache in front of DynamoDB; drops cached-read latency to microseconds |
| TTL | Automatic item expiry based on a Unix epoch attribute; no WCU cost |
| Streams | Ordered change log of DynamoDB mutations; 24-hour retention; triggers Lambda |
| Conditional write | Write succeeds only if a specified condition on the item is true |
| Transaction | All-or-nothing atomic operation across up to 100 items; costs 2× |
| FilterExpression | Server-side filter evaluated after items are read; does not reduce RCU cost |
| KeyConditionExpression | Condition applied before reading; bounds the read and reduces RCU cost |