DynamoDB Basics — Comprehensive Reference
Audience: Developers, architects, and interview candidates who need a complete grounding in DynamoDB from first principles through production-ready patterns.
1. What Is DynamoDB?
Amazon DynamoDB is a fully managed, serverless, key-value and document NoSQL database built for any scale. It delivers single-digit millisecond latency at any throughput level, without manual sharding, schema migrations, or capacity tuning by default.
| Property | Value |
|---|---|
| Type | Key-value + document NoSQL |
| Managed by | AWS (no servers to operate) |
| Latency target | Single-digit milliseconds |
| Durability | Multi-AZ replication; 99.99% availability SLA (99.999% for global tables) |
| Consistency options | Eventually consistent or strongly consistent |
| Scaling model | On-demand or provisioned with auto-scaling |
| Max item size | 400 KB |
| Max partitions | Unlimited (managed automatically) |
2. Core Concepts
2.1 Table
A DynamoDB table is the top-level container for data. Unlike a relational table, it does not enforce a fixed schema across all items. Every item in the same table can have different attributes, except for the primary key.
Table: customer-orders
Item → { order_id: "ORD-001", customer: "alice", amount: 59.99 }
Item → { order_id: "ORD-002", customer: "bob", amount: 12.00, promo: "SAVE10" }
Item → { order_id: "ORD-003", customer: "alice", note: "gift", amount: 200.00 }
2.2 Item
An item is a single record in a table. It is a collection of attributes. Each item must include the primary key attributes. All other attributes are optional and can vary per item.
2.3 Attribute
An attribute is a named value stored in an item. Think of it as a field in a JSON object.
Supported Data Types
| Category | Types | Notes |
|---|---|---|
| Scalar | S (String), N (Number), B (Binary), BOOL (Boolean), NULL | Most common building blocks |
| Set | SS (String Set), NS (Number Set), BS (Binary Set) | Unordered; no duplicates allowed |
| Document | M (Map), L (List) | Nested structures, similar to JSON objects and arrays |
Number storage note: Numbers are transmitted as strings over the wire to preserve precision (up to 38 significant digits). Numeric comparisons and sorting still behave as numbers.
3. Primary Keys
The primary key is the single most important design decision in DynamoDB. It determines how data is partitioned, how it is retrieved, and what access patterns are efficient.
3.1 Simple Primary Key (Partition Key Only)
Only a partition key is defined. Each item must have a unique partition key value.
PK = user_id
"user123" → { name: "Alice", email: "alice@example.com" }
"user456" → { name: "Bob", email: "bob@example.com" }
Good for: Lookups where you always know the exact key. GetItem by user ID is the canonical example.
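A minimal lookup helper sketches this access pattern. The table name `users` and the boto3 usage are illustrative; the helper itself only assumes an object exposing `get_item`:

```python
def get_user(table, user_id):
    """Exact-key lookup: returns the item dict, or None when no item matches."""
    response = table.get_item(Key={"user_id": user_id})
    return response.get("Item")  # "Item" is absent from the response when nothing matched

# Usage (requires boto3 and AWS credentials; table name is illustrative):
# import boto3
# table = boto3.resource("dynamodb").Table("users")
# user = get_user(table, "user123")
```

Note that a missing item is not an error: `GetItem` returns a response without an `Item` key, which is why the helper uses `.get()`.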
3.2 Composite Primary Key (Partition Key + Sort Key)
Both a partition key and a sort key are defined. Items with the same partition key are grouped together and sorted by the sort key. This is the most powerful and most commonly used key design.
PK = user_id | SK = order_date
"user123" | "2024-01-01" → first order
"user123" | "2024-03-15" → second order
"user456" | "2024-02-10" → another user's order
Good for: Time-series data, hierarchical relationships, one-to-many relationships, ordered retrieval within a group.
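A composite-key table like the one above could be declared as follows. This is a sketch: the table name matches the earlier `customer-orders` example, while the on-demand billing mode is an assumption.

```python
def orders_table_definition():
    """CreateTable parameters for a composite-key table (partition key +
    sort key). Billing mode is illustrative."""
    return {
        "TableName": "customer-orders",
        "KeySchema": [
            {"AttributeName": "user_id", "KeyType": "HASH"},      # partition key
            {"AttributeName": "order_date", "KeyType": "RANGE"},  # sort key
        ],
        # Only key attributes are declared; all other attributes are schemaless
        "AttributeDefinitions": [
            {"AttributeName": "user_id", "AttributeType": "S"},
            {"AttributeName": "order_date", "AttributeType": "S"},
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }

# Usage: boto3.client("dynamodb").create_table(**orders_table_definition())
```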
3.3 Partition Key Design Rules
| Rule | Why It Matters |
|---|---|
| High cardinality | Low-cardinality keys (like status = "active") concentrate traffic on one partition |
| Write distribution | All writes to the same key go to the same partition shard |
| Avoid hot keys | A single very popular partition key limits throughput to ~1000 WCU and ~3000 RCU per partition |
| Prefix patterns | Using USER#<id> or ORDER#<id> prefixes prevents cross-entity key collisions in single-table designs |
3.4 Hot Partition Mitigation
If your workload puts too much traffic on a single partition key:
- Write sharding: Append a random suffix (user_id#1, user_id#2, ..., user_id#N) and scatter writes across N logical partitions. Reads must aggregate across shards.
- Attribute bucketing: Use a time bucket as part of the partition key (user_id#2024-03) so traffic is distributed by time.
- DAX in front: Cache hot reads to reduce RCU pressure on a single partition.
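Write sharding can be sketched with two small key helpers. The shard count of 8 is an arbitrary assumption to tune against your write rate:

```python
import random

NUM_SHARDS = 8  # N logical partitions; tune to your peak write throughput

def sharded_pk(user_id):
    """Partition key for a write: scatter writes uniformly across N shards."""
    return f"{user_id}#{random.randrange(NUM_SHARDS)}"

def all_shard_pks(user_id):
    """Reads must fan out: query every shard key and merge the results client-side."""
    return [f"{user_id}#{i}" for i in range(NUM_SHARDS)]
```

The trade-off is explicit here: writes get N times the partition throughput, but every read becomes N queries plus a client-side merge.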
4. Indexes
4.1 Local Secondary Index (LSI)
An LSI shares the partition key with the base table but uses a different sort key. It gives you an alternate ordering of items within the same partition.
| Property | Value |
|---|---|
| Partition key | Same as base table |
| Sort key | Different attribute |
| Creation time | Must be defined at table creation — cannot add later |
| Consistency | Supports strongly consistent reads |
| Limit | Up to 5 per table |
| Storage | Shares partition storage with base table |
Base table: PK = user_id, SK = created_at
LSI: PK = user_id, SK = status
→ Now you can query all items for a user filtered by status within that partition
4.2 Global Secondary Index (GSI)
A GSI has its own partition key and optional sort key, completely independent of the base table. It lets you query the same data from a different angle.
| Property | Value |
|---|---|
| Partition key | Any attribute (different from base table) |
| Sort key | Optional, any attribute |
| Creation time | Can be added or deleted at any time |
| Consistency | Eventually consistent only |
| Limit | Up to 20 per table (default quota, can be raised) |
| Throughput | Has its own separate WCU/RCU allocation |
| Projection | ALL, KEYS_ONLY, or INCLUDE (specific attributes) |
Base table: PK = order_id
GSI: PK = customer_id, SK = order_date
→ Now you can query all orders for a customer sorted by date
GSI projection strategy:
- KEYS_ONLY: only index keys are stored in the index — cheapest
- INCLUDE: explicitly named attributes are included — balance between cost and query needs
- ALL: every attribute is replicated — most flexible but most expensive
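Querying the GSI above differs from a base-table query only in the `IndexName` parameter. A sketch using the string expression form (the index name `customer-date-index` is hypothetical):

```python
def gsi_query_params(customer_id, limit=20):
    """Query parameters for a hypothetical GSI keyed on customer_id/order_date.
    String-form expressions avoid needing boto3's condition builders here."""
    return {
        "IndexName": "customer-date-index",           # hypothetical GSI name
        "KeyConditionExpression": "customer_id = :cid",
        "ExpressionAttributeValues": {":cid": customer_id},
        "ScanIndexForward": False,                    # newest orders first
        "Limit": limit,
    }

# Usage (requires boto3 and a real table with this GSI):
# items = table.query(**gsi_query_params("cust-42"))["Items"]
```

Remember that this read is eventually consistent, and any attribute not in the index projection will be missing from the results.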
4.3 LSI vs GSI Comparison
| Dimension | LSI | GSI |
|---|---|---|
| Partition key | Must match base table | Independent |
| Consistency | Strong or eventual | Eventual only |
| Added after creation | No | Yes |
| Item collection size limit | 10 GB per partition key value | No such limit |
| Can span across partitions | No | Yes |
5. Capacity Modes
5.1 On-Demand Mode
DynamoDB automatically scales to accommodate any request rate. You pay per read/write request consumed.
| Aspect | Detail |
|---|---|
| Scaling | Instant, no capacity planning |
| Cost model | Pay per request (WRU/RRU) |
| Best for | Unpredictable traffic, new applications, traffic spikes |
| Pricing (2024, us-east-1) | ~$1.25 per million write request units, ~$0.25 per million read request units |
5.2 Provisioned Mode
You specify RCU and WCU targets in advance. You pay for the provisioned capacity whether it is used or not.
| Aspect | Detail |
|---|---|
| Scaling | Manual or auto-scaling (with min/max targets) |
| Cost model | Pay for provisioned throughput by the hour |
| Best for | Predictable steady traffic where cost optimization matters |
| Auto-scaling | CloudWatch-driven, reactive to actual usage with target utilization setting |
5.3 Capacity Unit Math
| Unit | Rule |
|---|---|
| 1 WCU | 1 write of up to 1 KB per second |
| 1 RCU (strong) | 1 read of up to 4 KB per second |
| 1 RCU (eventual) | 2 reads of up to 4 KB per second (half the cost) |
| Transactional WCU | 2× the standard WCU cost |
| Transactional RCU | 2× the standard RCU cost |
Size rounding: Items are rounded up to the nearest 1 KB for writes, and to the nearest 4 KB for reads. A 1.5 KB item counts as 2 WCU.
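The rounding rules above can be captured in two small helper functions, useful for capacity estimates (standard reads/writes only; transactional operations would double these numbers):

```python
import math

def wcu_for_item(size_kb):
    """Standard write cost: item size rounded up to the nearest 1 KB."""
    return max(1, math.ceil(size_kb))

def rcu_for_item(size_kb, strong=True):
    """Read cost: size rounded up to the nearest 4 KB; eventual consistency halves it."""
    units = max(1, math.ceil(size_kb / 4))
    return units if strong else units / 2

# A 1.5 KB item: 2 WCU to write, 1 RCU to read strongly, 0.5 RCU eventually.
```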
6. Read and Write Operations
6.1 Single-Item Operations
| API | What It Does |
|---|---|
| PutItem | Creates or fully replaces an item identified by its primary key |
| GetItem | Retrieves a single item by its exact primary key |
| UpdateItem | Creates or patches an item; can increment/append/remove individual attributes |
| DeleteItem | Removes an item by its primary key |
6.2 Batch Operations
| API | What It Does | Limit |
|---|---|---|
| BatchGetItem | Retrieves up to 100 items across one or more tables | 100 items, 16 MB per call |
| BatchWriteItem | Puts or deletes up to 25 items per call | 25 items, 16 MB per call |
Important: Batch operations are not atomic. Individual operations within a batch can fail independently. You must handle UnprocessedItems / UnprocessedKeys in your retry logic.
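The retry loop for `UnprocessedItems` can be sketched as follows. This assumes a low-level boto3 client and items already serialized as DynamoDB JSON; the backoff constants are illustrative:

```python
import time

def batch_write_with_retry(client, table_name, write_requests, max_retries=5):
    """Re-submit UnprocessedItems with exponential backoff until the batch drains."""
    request_items = {table_name: write_requests}
    for attempt in range(max_retries):
        response = client.batch_write_item(RequestItems=request_items)
        unprocessed = response.get("UnprocessedItems", {})
        if not unprocessed:
            return  # everything was written
        time.sleep(0.1 * (2 ** attempt))  # exponential backoff before retrying
        request_items = unprocessed        # retry only what failed
    raise RuntimeError("batch not fully processed after retries")
```

A production version would also add jitter to the sleep, as recommended for any retry loop against DynamoDB.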
6.3 Query
Reads all items that share a partition key, with optional sort key filtering.
- Always requires a partition key value
- Sort key can be filtered with =, <, >, BETWEEN, begins_with
- Results can be paginated with ExclusiveStartKey / LastEvaluatedKey
- Can be scanned in forward or reverse sort order
- Much cheaper than Scan because it is a bounded, targeted read
from boto3.dynamodb.conditions import Key

response = table.query(
    KeyConditionExpression=Key("PK").eq("SESSION#abc") & Key("SK").begins_with("TURN#"),
    ScanIndexForward=False,  # descending sort order (latest turns first)
    Limit=10
)
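Since a single Query call returns at most 1 MB of data, pagination is routine. A small helper sketch that follows `LastEvaluatedKey` until the result set is exhausted (it only assumes an object exposing `query`):

```python
def query_all_pages(table, **query_kwargs):
    """Accumulate every page of a Query by chaining LastEvaluatedKey
    into ExclusiveStartKey. Use with care: unbounded result sets can be large."""
    items = []
    kwargs = dict(query_kwargs)
    while True:
        page = table.query(**kwargs)
        items.extend(page["Items"])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return items  # no more pages
        kwargs["ExclusiveStartKey"] = last_key
```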
6.4 Scan
Reads every item in a table or index sequentially.
- No key requirement — reads everything
- Very expensive at scale; avoid in hot paths
- Can be parallelized using Segment and TotalSegments for bulk export jobs
- Add a FilterExpression to limit returned data, but filtered items still consume RCU
# Parallel scan across 4 workers
response = table.scan(TotalSegments=4, Segment=worker_id)
6.5 Filter Expression vs Key Condition Expression
| Concept | Evaluated At | Affects RCU Cost |
|---|---|---|
| KeyConditionExpression | Before reading | Yes — limits items fetched |
| FilterExpression | After reading | No — items are already read and billed |
This is a critical distinction. A FilterExpression that discards 90% of results still bills you for reading 100% of them. Design key patterns so the key condition does most of the filtering.
7. Writes in Depth
7.1 Conditional Writes
Write only proceeds if a condition is true. If the condition fails, DynamoDB returns ConditionalCheckFailedException.
# Only insert if the item does not already exist
table.put_item(
    Item={"PK": "SESSION#abc", "SK": "META", "status": "active"},
    ConditionExpression="attribute_not_exists(PK)"
)

# Atomic counter increment — only if counter is below limit.
# "count" is a DynamoDB reserved word, so it must be aliased
# through ExpressionAttributeNames.
table.update_item(
    Key={"PK": "COUNTER#daily", "SK": "2024-03-28"},
    UpdateExpression="SET #c = #c + :inc",
    ConditionExpression="#c < :limit",
    ExpressionAttributeNames={"#c": "count"},
    ExpressionAttributeValues={":inc": 1, ":limit": 1000}
)
7.2 UpdateExpression Operators
| Operator | Purpose | Example |
|---|---|---|
| SET | Set or overwrite an attribute | SET status = :val |
| REMOVE | Delete an attribute from an item | REMOVE page_context |
| ADD | Increment a number or add to a set | ADD view_count :one |
| DELETE | Remove values from a set attribute | DELETE tags :old_tag |
Multiple clauses can be combined: SET updated_at = :ts REMOVE old_field ADD view_count :one
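A combined-clause update might look like the sketch below. The attribute names (`updated_at`, `old_field`, `view_count`) and key shape are illustrative, taken from the example clause above:

```python
def build_update(ts):
    """UpdateItem kwargs combining SET, REMOVE, and ADD in one expression."""
    return {
        "UpdateExpression": "SET updated_at = :ts REMOVE old_field ADD view_count :one",
        "ExpressionAttributeValues": {":ts": ts, ":one": 1},
    }

# Usage (requires boto3; key shape is illustrative):
# table.update_item(Key={"PK": "USER#1", "SK": "META"}, **build_update("2024-03-28T00:00:00Z"))
```

Note there is no separator punctuation between clauses: each clause keyword (`SET`, `REMOVE`, `ADD`, `DELETE`) appears at most once, followed by a comma-separated list of actions.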
7.3 Transactions
TransactWriteItems and TransactGetItems provide all-or-nothing semantics across up to 100 items.
import boto3

client = boto3.client("dynamodb")

# The low-level client requires DynamoDB-JSON type descriptors ({"S": ...}, {"N": ...})
client.transact_write_items(
    TransactItems=[
        {
            "Put": {
                "TableName": "orders",
                "Item": {"PK": {"S": "ORDER#999"}, "status": {"S": "placed"}, "amount": {"N": "99"}},
                "ConditionExpression": "attribute_not_exists(PK)"
            }
        },
        {
            "Update": {
                "TableName": "inventory",
                "Key": {"PK": {"S": "SKU#X1"}},
                "UpdateExpression": "SET stock = stock - :one",
                "ConditionExpression": "stock > :zero",
                "ExpressionAttributeValues": {":one": {"N": "1"}, ":zero": {"N": "0"}}
            }
        }
    ]
)
Transaction costs: Each item in a transaction consumes 2× the normal RCU or WCU. Use transactions only when atomicity genuinely matters.
8. TTL (Time to Live)
DynamoDB TTL automatically deletes expired items without consuming WCU.
| Property | Detail |
|---|---|
| Attribute type | Number (Unix epoch seconds) |
| Deletion lag | Items are typically deleted within 48 hours of expiry |
| Cost | No WCU consumed for TTL deletions |
| Streams behavior | TTL deletions do appear in DynamoDB Streams as REMOVE events |
| Read behavior | Expired items can still be returned in reads before physical deletion |
Best practice: Always enforce expiry in application logic by checking ttl < time.time() on reads. Do not rely solely on DynamoDB's physical deletion timing.
import time

def is_expired(item: dict) -> bool:
    ttl_value = item.get("ttl")
    return ttl_value is not None and ttl_value < int(time.time())
9. DynamoDB Streams
Streams capture a time-ordered sequence of item-level changes (inserts, updates, deletes) with a 24-hour retention window.
Stream View Types
| View Type | What Is Captured |
|---|---|
| KEYS_ONLY | Only the primary key attributes |
| NEW_IMAGE | The entire item after the change |
| OLD_IMAGE | The entire item before the change |
| NEW_AND_OLD_IMAGES | Both before and after — most complete but most data |
Common Use Cases
- Trigger downstream Lambda processing on item changes
- Replicate data to Elasticsearch / OpenSearch for full-text search
- Invalidate caches when source data changes
- Build event sourcing pipelines
- Drive analytics aggregation in real time
- Maintain materialized views across tables
DynamoDB item change
→ Stream record (change event)
→ Lambda trigger (reads stream in batches)
→ Process: update search index, publish to SNS, write to S3, etc.
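The pipeline above can be sketched as a Lambda-style handler. The event shape follows the DynamoDB Streams event format delivered to Lambda; the `PK` attribute name and the cache-invalidation use case are illustrative assumptions:

```python
def handle_stream_batch(event):
    """Process one batch of stream records; return the partition keys of
    REMOVE events (e.g., for cache invalidation)."""
    removed = []
    for record in event.get("Records", []):
        if record.get("eventName") == "REMOVE":
            # Stream records carry DynamoDB-JSON typed values
            keys = record["dynamodb"]["Keys"]
            removed.append(keys["PK"]["S"])
    return removed
```

A real handler would also branch on `INSERT` and `MODIFY`, and should be idempotent, since Lambda may deliver a batch more than once.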
10. DAX (DynamoDB Accelerator)
DAX is an in-memory read/write-through cache for DynamoDB that delivers microsecond response times.
| Aspect | DynamoDB | DAX |
|---|---|---|
| Latency | Single-digit milliseconds | Microseconds |
| API compatibility | DynamoDB API | Same DynamoDB API (drop-in) |
| Consistency | Strong or eventual | Eventually consistent (item and query caches); strongly consistent reads pass through to DynamoDB |
| Write behavior | Writes go to DynamoDB | Write-through: DAX writes to both cache and DynamoDB |
| Best for | General purpose | Hot read keys, repeated identical queries |
| Not suitable for | — | Write-heavy tables, strong consistency requirements, Scan-heavy patterns |
DAX has two internal caches:
- Item cache: Caches individual GetItem results
- Query cache: Caches Query and Scan result sets by parameter hash
11. Consistency Models
Eventually Consistent Reads
- Returns a response that may not reflect a very recent write (typically caught up within a second)
- Half the RCU cost of strongly consistent reads
- The default for GetItem, Query, and Scan
- Suitable for most read workloads where slight staleness is acceptable
Strongly Consistent Reads
- Always reflects the most recent successfully completed write
- Full RCU cost
- Available only on base table reads and LSI reads (not GSI)
- Use when correctness is critical: financial balances, inventory counts, idempotency checks
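Opting into strong consistency is a single flag on the read. A sketch for a balance lookup (the key shape and attribute names are illustrative; the helper only assumes an object exposing `get_item`):

```python
def get_balance(table, account_id):
    """Read-your-writes lookup: ConsistentRead=True doubles the RCU cost but
    always reflects the latest committed write (base table and LSI reads only)."""
    response = table.get_item(
        Key={"PK": f"ACCOUNT#{account_id}"},  # key shape is illustrative
        ConsistentRead=True,
    )
    return response.get("Item")
```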
12. Single-Table Design Principles
Single-table design is the practice of storing multiple entity types in one DynamoDB table, differentiated by key patterns.
Why Do It?
- Reduces the number of network round trips (fetch related entities in one Query)
- Avoids cross-table joins at the application layer
- Leads to better cost and latency characteristics when entities are accessed together
How to Differentiate Entity Types
| Technique | Example |
|---|---|
| PK prefix | USER#abc, ORDER#xyz, SESSION#123 |
| SK type marker | SK = META, SK = PROFILE, SK = TURN#ts |
| entity_type attribute | entity_type = "order" on every item |
Access Pattern Mapping Template
Before building the schema, enumerate every access pattern:
| # | Access Pattern | Operation | Key Used |
|---|---|---|---|
| 1 | Get user by ID | GetItem | PK = USER#id |
| 2 | Get all orders for a user | Query | PK = USER#id, SK begins_with ORDER# |
| 3 | Get order by ID | GetItem | PK = ORDER#id |
| 4 | Get recent sessions | Query GSI | GSI1PK = customer_id, SK desc |
Map every access pattern before writing a single line of code. DynamoDB schemas are hard to migrate later.
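With the key patterns above, one Query on `PK = USER#<id>` returns the whole item collection (the META record plus every ORDER# item), and the split happens client-side. A sketch, assuming the SK markers shown earlier:

```python
def split_collection(items):
    """Separate a user's META item from its ORDER# items after a
    single-partition Query (SK conventions from the table above)."""
    meta = next((i for i in items if i.get("SK") == "META"), None)
    orders = [i for i in items if str(i.get("SK", "")).startswith("ORDER#")]
    return meta, orders

# Usage (requires boto3):
# items = table.query(
#     KeyConditionExpression="PK = :pk",
#     ExpressionAttributeValues={":pk": "USER#abc"},
# )["Items"]
# user, orders = split_collection(items)
```

This is the payoff of single-table design: one round trip replaces a user lookup plus a per-order fan-out.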
13. Common Pitfalls
| Pitfall | Root Cause | Solution |
|---|---|---|
| Scan in production hot path | Schema did not anticipate the access pattern | Add a GSI for the query; never use Scan on active traffic |
| Hot partition throttling | Low-cardinality or celebrity partition key | Write sharding, bucketing, or DAX in front of hot reads |
| 400 KB item size limit | Storing large documents per item | Decompose into multiple items per entity (e.g., per-turn model) |
| GSI lag causing stale reads | GSI replication is eventually consistent | Read from base table when strong consistency is needed |
| Missing UnprocessedItems retry | Batch operations can partially fail | Always loop on UnprocessedItems / UnprocessedKeys |
| TTL expiry gap | Relying on physical deletion for correctness | Enforce expiry at read time in application code |
| Over-relying on FilterExpression | Filtering after reads does not save RCU | Push filter logic into key conditions; design keys around your queries |
| Transaction overuse | Using transactions for every write | Only use transactions when multi-item atomicity is strictly required; 2× cost |
| No exponential backoff | Burst writes exceed provisioned or adaptive capacity | Use AWS SDK retry config with exponential backoff and jitter |
| Large Scan for bulk export | Full table scan blocks other traffic | Use parallel scan with segments; schedule during off-peak; or use DynamoDB's native Export to S3 |
14. DynamoDB vs Other Databases — Quick Reference
| Dimension | DynamoDB | PostgreSQL / Aurora | MongoDB | Cassandra |
|---|---|---|---|---|
| Data model | Key-value + document | Relational (tables, rows, joins) | Document (JSON/BSON) | Wide-column (partition + clustering keys) |
| Schema | Flexible per item | Rigid (migrations required) | Flexible | Semi-rigid (schema per column family) |
| Joins | None (by design) | Full JOIN support | Limited ($lookup) | None |
| Query flexibility | Access-pattern-driven | Ad hoc SQL | Flexible JSON queries | Access-pattern-driven |
| Scaling model | Horizontal, automatic | Vertical primary + read replicas | Horizontal sharding | Horizontal, manual ring management |
| Consistency | Eventual or strong per read | ACID transactions | Tunable per query | Tunable (eventual to quorum) |
| Ideal workload | OLTP, session state, event streams | Complex queries, reporting, transactions | Flexible schemas, document storage | High-write time-series, IoT, logs |
| Ops burden | Near-zero (fully managed) | Medium (patching, vacuuming, connections) | Medium (sharding, index management) | High (ring topology, compaction tuning) |
15. Key Vocabulary for Interviews
| Term | One-Line Definition |
|---|---|
| Partition key | The value that determines which physical shard stores an item |
| Sort key | Secondary key that orders items within one partition |
| GSI | An index with its own partition key for alternate access patterns, created at any time |
| LSI | An index on the same partition key with a different sort key, defined at table creation only |
| WCU | One write of up to 1 KB per second in provisioned mode |
| RCU | One strongly-consistent read of up to 4 KB per second (0.5 for eventual) |
| On-demand mode | Pay-per-request; no capacity planning; absorbs any burst |
| Adaptive capacity | DynamoDB automatically reallocates capacity to hot partitions |
| Hot partition | A partition receiving disproportionate traffic causing throttling |
| Write sharding | Distributing writes across N virtual partitions to avoid hot spots |
| Single-table design | Storing multiple entity types in one table, differentiated by key patterns |
| DAX | In-memory write-through cache in front of DynamoDB; drops latency to microseconds |
| TTL | Automatic item expiry based on a Unix epoch attribute; no WCU cost |
| Streams | Ordered change log of DynamoDB mutations; 24-hour retention; triggers Lambda |
| Conditional write | Write succeeds only if a specified condition on the item is true |
| Transaction | All-or-nothing atomic operation across up to 100 items; costs 2× |
| FilterExpression | Server-side filter evaluated after items are read; does not reduce RCU cost |
| KeyConditionExpression | Filter applied before reading; bounds the read and reduces RCU cost |