Scenario File 1 — Conversation Memory
Context in the Architecture
The Conversation Memory store is read and written on every single user message.
- Read path: The Orchestrator loads the META item plus the latest turns before calling the Intent Classifier.
- Write path: After the response is delivered, a new TURN item is written and META is updated.
- Access pattern: Key-value lookup by `SESSION#<session_id>`, range query by timestamp for the last N turns.
- Volume: At 500 concurrent sessions with a 3 req/s burst, this is ~1,500 writes/sec and ~1,500 reads/sec sustained.
- Constraints: 24-hour TTL per session, 400 KB DynamoDB item limit mitigated by separate turn items, session isolation (no cross-session queries needed).
Current Choice: DynamoDB
Why it was chosen: Single-digit millisecond reads at any scale, serverless (no cluster management), native TTL, per-item writes avoid re-writing the whole conversation on each turn, and on-demand capacity handles burst traffic from manga release days without pre-provisioning.
Schema recap:
```text
PK  = SESSION#<session_id>
SK  = META | TURN#<epoch_ms> | SUMMARY#<window_id>
TTL on all items = epoch + 86400
```
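A minimal sketch of the per-turn write and read paths under this schema, using boto3. The table name `conversation_memory` and the helper names are illustrative, not part of the original design:

```python
import time
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("conversation_memory")  # hypothetical name

def append_turn(session_id: str, role: str, content: str) -> None:
    now_ms = int(time.time() * 1000)
    ttl = int(time.time()) + 86400  # 24-hour TTL per session
    # One new TURN item per message: avoids re-writing the whole
    # conversation and keeps every item far below the 400 KB limit.
    table.put_item(Item={
        "PK": f"SESSION#{session_id}",
        "SK": f"TURN#{now_ms}",
        "role": role,
        "content": content,
        "ttl": ttl,
    })
    # META is updated in place: bump turn_count and refresh the TTL.
    # "ttl" is a DynamoDB reserved word, hence the name placeholder.
    table.update_item(
        Key={"PK": f"SESSION#{session_id}", "SK": "META"},
        UpdateExpression="ADD turn_count :one SET #t = :ttl",
        ExpressionAttributeNames={"#t": "ttl"},
        ExpressionAttributeValues={":one": 1, ":ttl": ttl},
    )

def last_turns(session_id: str, n: int = 20) -> list:
    # Range query on the sort key, newest first.
    resp = table.query(
        KeyConditionExpression=Key("PK").eq(f"SESSION#{session_id}")
        & Key("SK").begins_with("TURN#"),
        ScanIndexForward=False,  # descending by epoch_ms in the SK
        Limit=n,
    )
    return resp["Items"]
```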
Alternative 1: Redis (ElastiCache) as Conversation Store
What Changes
Instead of DynamoDB turn items, each session is a Redis Hash or Redis List. Turns are written as LPUSH chat:sess:abc123 <turn_json> and the latest 20 fetched with LRANGE chat:sess:abc123 0 19.
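A sketch of that write/read path with redis-py; the turn cap in `LTRIM` is an added safety assumption, not part of the scenario:

```python
import json
import redis

r = redis.Redis(host="my-cache.use1.cache.amazonaws.com")  # hypothetical endpoint

def append_turn(session_id: str, turn: dict) -> None:
    key = f"chat:sess:{session_id}"
    pipe = r.pipeline()
    pipe.lpush(key, json.dumps(turn))  # newest turn at the head of the list
    pipe.ltrim(key, 0, 199)            # assumption: cap each session at 200 turns
    pipe.expire(key, 86400)            # 24-hour session TTL
    pipe.execute()

def last_turns(session_id: str, n: int = 20) -> list:
    raw = r.lrange(f"chat:sess:{session_id}", 0, n - 1)
    return [json.loads(t) for t in raw]
```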
Best Case
- Sub-millisecond reads (<1ms) vs DynamoDB's ~5ms. If the latency budget is extremely tight and every ms counts for the first-token latency, Redis wins on raw speed.
- Simpler serialization: no composite key design needed, just serialize the turn as JSON and push.
- Session TTL is trivial: `EXPIRE chat:sess:abc123 86400`.
Failure Scenario — The Flash Sale Memory Wipe
What happens: A major manga title (Chainsaw Man Vol 20) launches. 80,000 concurrent sessions spike in 15 minutes. Each session is an in-memory Redis List. The ElastiCache cluster memory fills to 95%.
Redis eviction policy is allkeys-lru. Redis starts evicting the least recently used sessions. Users mid-conversation suddenly lose all context — the chatbot responds as if they're starting fresh.
User: "What about the second one you mentioned?" Chatbot: "I'm sorry, what are you referring to?" (session was evicted)
The silent killer: There is no error logged. The chatbot simply cannot find the session and initializes a new one. The user thinks the bot is broken. CSAT drops. Escalation rate spikes.
Amplifier: Memory pressure also causes Redis replication lag. The replica falls behind. Reads hit the primary, doubling read latency and causing the very bottleneck Redis was supposed to avoid.
How you detect it: Session length distributions in your analytics dashboard show a bimodal pattern — most sessions show 8+ turns, but 30% show exactly 1 turn. That 1-turn spike on flash sale day is sessions that lost memory.
Grilling Questions
- You set `maxmemory-policy noeviction` to prevent eviction. Now what happens when memory fills? Redis starts refusing all writes. The chatbot catches the Redis error and falls back to stateless mode (see the sketch below). How do you prevent this from cascading into Bedrock call failures?
- A Redis primary node fails during the flash sale. Failover takes 30-60 seconds. How many sessions lose their in-progress context? What is the user-visible surface of this failure?
- Redis Cluster mode shards sessions by key. If two consecutive requests for the same session land on different shards due to a resharding event, what happens?
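A sketch of the stateless fallback the first question describes, reusing `last_turns` from the sketch above. The exception classes are redis-py's; the metrics client is hypothetical:

```python
import redis

def load_context(session_id: str) -> list:
    """Return recent turns, or an empty context if Redis refuses."""
    try:
        return last_turns(session_id)
    except (redis.ConnectionError, redis.ResponseError):
        # With noeviction, a full cluster rejects writes (OOM errors
        # surface as ResponseError); node failures surface as
        # ConnectionError. Count these explicitly -- otherwise this is
        # the "silent killer": no error surfaces, the bot just forgets.
        metrics.increment("memory.fallback.stateless")  # hypothetical metrics client
        return []
```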
Decision Heuristic
Pick Redis as conversation memory only if your session count is low (<10,000 concurrent) AND you can tolerate volatile memory (sessions can be lost on failover). For a chatbot at Amazon scale, DynamoDB's durability is non-negotiable.
Alternative 2: PostgreSQL (Aurora) as Conversation Store
What Changes
Sessions and turns are stored in two relational tables:
```sql
CREATE TABLE sessions (
    session_id   VARCHAR(64) PRIMARY KEY,
    customer_id  VARCHAR(64),
    created_at   TIMESTAMP,
    updated_at   TIMESTAMP,
    page_context JSONB,
    turn_count   INT,
    expires_at   TIMESTAMP
);

CREATE TABLE turns (
    turn_id    BIGSERIAL PRIMARY KEY,
    session_id VARCHAR(64) REFERENCES sessions(session_id),
    role       VARCHAR(16),
    content    TEXT,
    intent     VARCHAR(32),
    created_at TIMESTAMP
);

CREATE INDEX idx_turns_session_time ON turns (session_id, created_at DESC);
```
Best Case
- Rich query support: you can JOIN sessions and turns to answer questions like "what are the top 5 most common intents in sessions that escalated?" without a separate analytics system.
- Strong consistency without conditional writes or read-your-writes complexity.
- If you already run Aurora, zero additional managed services.
Failure Scenario — Connection Pool Exhaustion During Traffic Spike
What happens: Aurora PostgreSQL has a connection limit. The DBA sets max_connections = 500 for the db.r6g.large instance.
The Orchestrator runs on ECS Fargate with 50 tasks. Each task uses a connection pool with max_pool_size=20. At full scale: 50 × 20 = 1,000 connections attempted. RDS Proxy is not configured.
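The pool math is visible in the configuration itself. A sketch with SQLAlchemy (DSN and options are illustrative):

```python
from sqlalchemy import create_engine

# Per Fargate task: up to 20 pooled connections, no overflow.
# 50 tasks x 20 = 1,000 potential connections against
# max_connections = 500 -- exhaustion is guaranteed at full scale
# unless RDS Proxy or pgBouncer multiplexes in between.
engine = create_engine(
    "postgresql+psycopg2://chatbot@aurora-endpoint/manga",  # hypothetical DSN
    pool_size=20,
    max_overflow=0,
    pool_timeout=2,  # fail fast rather than queue behind a dead pool
)
```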
During a traffic spike, connection establishment fails. The Orchestrator receives `FATAL: remaining connection slots are reserved for non-replication superuser connections` on every new request. The chatbot returns 503 to every user.
Amplifier: The retry logic in the Orchestrator uses exponential backoff. Retries pile up. The connection queue depth grows. Even after traffic normalizes, the retries keep the connection count pegged high for 3 additional minutes.
What makes this different from DynamoDB: DynamoDB has no connection concept. It handles 100,000+ concurrent requests without a connection pool. Aurora's connection limit is a hard architectural ceiling that requires RDS Proxy, pgBouncer, or aggressive pool sizing to manage.
Failure Scenario 2 — The N+1 Write Problem
For every user message, the code does:
```sql
UPDATE sessions
   SET updated_at = now(), turn_count = turn_count + 1
 WHERE session_id = :session_id;

INSERT INTO turns (session_id, role, content, intent)
VALUES (:session_id, :role, :content, :intent);
```
At 1,500 messages/sec, that is 3,000 SQL statements/sec. With the UPDATE, the INSERT, and the COMMIT each costing a ~3ms round-trip, total write cost is ~9ms per message just for two sequential SQL writes vs. DynamoDB's single conditional write completing in ~5ms.
Over a 6-turn conversation: PostgreSQL adds 54ms of cumulative write overhead vs. ~30ms for DynamoDB.
Grilling Questions
- You batch turn writes with a 100ms buffer to reduce SQL statement count. Now a Fargate task crashes mid-buffer. How many turns are lost? How do you reconcile the session state?
- A VACUUM ANALYZE job runs on `turns` during peak traffic. What happens to query latency? How do you prevent this?
- The `turns` table grows to 500M rows over 6 months. A query for "all sessions with unresolved escalation intent in the last 24 hours" runs for 45 seconds. How do you partition this table to avoid this?
Decision Heuristic
Pick Aurora/PostgreSQL when you need rich cross-session querying (analytics on session data) AND you run RDS Proxy or pgBouncer for connection management AND session counts stay below 50K concurrent. At Amazon chatbot scale, this is almost never the right choice for the hot write path.
Alternative 3: MongoDB (DocumentDB) as Conversation Store
What Changes
Each session becomes one document with a nested array of turns:
```json
{
  "_id": "sess_abc123",
  "customer_id": "C123",
  "page_context": { "asin": "B08X1YRSTR" },
  "turns": [
    { "role": "user", "content": "Show me horror manga", "intent": "product_discovery", "ts": 1700000001 },
    { "role": "assistant", "content": "Here are 3 horror manga...", "ts": 1700000003 }
  ],
  "created_at": 1700000000,
  "ttl": 1700086400
}
```
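A sketch of the single-document append with pymongo; database and collection names are illustrative:

```python
from pymongo import MongoClient

sessions = MongoClient()["manga"]["sessions"]  # hypothetical DB/collection

def append_turn(session_id: str, turn: dict) -> None:
    # One updateOne per message: push the turn onto the nested array
    # and refresh bookkeeping atomically within the session document.
    sessions.update_one(
        {"_id": session_id},
        {"$push": {"turns": turn}, "$set": {"updated_at": turn["ts"]}},
        upsert=True,
    )
```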
Best Case
- Schema-less: adding a new field to a turn (e.g., `guardrail_flags`) requires no migration.
- Single document read/write per session: one `findOne` for the whole session, one `updateOne` to append a turn.
- Flexible querying with aggregation pipelines if session analytics are needed.
Failure Scenario — The 16MB Document Bomb
What happens: MongoDB has a hard 16MB document size limit. A power user opens a chat session and asks 200 questions about manga (testing the chatbot, reviewing content, etc.). Each turn averages 800 bytes, so those 400 turns total 320KB, which is safe. But when turn content includes LLM-generated product cards and follow-up suggestions serialized as JSON, each turn is closer to 2KB, and the document grows without bound as the session continues.
In practice, at 8,000 of those 2KB turns the document hits 16MB and the next write fails with:
```text
MongoError: BSONObj size: 16777222 (0x1000006) is invalid.
Size must be between 0 and 16793600(16MB)
```
Every subsequent append to the session fails. The chatbot crashes with an unhandled exception. The user's marathon research session is lost.
The insidious part: This never happens in testing (short sessions), never happens in staging (max 20 turns per test). It only hits a specific class of power user in production.
Mitigation (and why it creates new problems): You split turns into separate documents per window of 20 turns. Now you have the same N+1 read problem you were trying to avoid — fetching the latest 20 turns requires reading the most recent turn-window document AND checking if the active window has rolled over.
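A sketch of that windowed layout and the two-document read it forces, reusing the `sessions` handle from the earlier sketch; the window arithmetic is illustrative:

```python
WINDOW_SIZE = 20

def window_key(session_id: str, turn_index: int) -> str:
    # Turns 0-19 live in window 0, turns 20-39 in window 1, and so on.
    return f"{session_id}#w{turn_index // WINDOW_SIZE}"

def last_turns(session_id: str, turn_count: int, n: int = 20) -> list:
    # The latest n turns can straddle a window boundary, so "read the
    # last 20" may need two documents -- the extra read the
    # single-document design was meant to avoid.
    newest = turn_count - 1
    keys = {window_key(session_id, newest),
            window_key(session_id, max(newest - n + 1, 0))}
    docs = sessions.find({"_id": {"$in": list(keys)}})
    turns = [t for d in docs for t in d["turns"]]
    return sorted(turns, key=lambda t: t["ts"])[-n:]
```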
Grilling Questions
- A MongoDB replica set primary election happens while a turn write is in-flight. Write concern `w:1` means the write was acknowledged by the primary before it crashed. Is that turn durable? What write concern do you need, and what does it cost in p99 latency?
- You use `$push` with `$slice` to keep only the last 20 turns in the document (auto-truncation). A summary of turns 1-20 should be generated before they're sliced. How do you guarantee the summary always exists before the slice operation?
- TTL indexes in MongoDB run every 60 seconds on a background thread. During a traffic spike, TTL cleanup lags by 120 seconds. What is the operational risk?
Decision Heuristic
Pick MongoDB when your turn structure is highly heterogeneous (different intents produce wildly different turn schemas) AND session length is bounded (max 50 turns by design). For unbounded conversation length, the document size limit is a ticking bomb.
Alternative 4: DynamoDB + DAX (Accelerator) for Conversation Memory
What Changes
DAX is an in-memory cache layer in front of DynamoDB that is API-compatible — zero code change. Reads that would take 5ms from DynamoDB take <1ms from DAX's cache.
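"Zero code change" in practice means swapping the client object. A sketch with the amazon-dax-client package; the endpoint URL is a placeholder:

```python
import boto3
from amazondax import AmazonDaxClient  # pip install amazon-dax-client

# Plain DynamoDB resource: every read pays the ~5ms request latency.
ddb = boto3.resource("dynamodb")

# The DAX resource exposes the same Table/get_item/query surface, but
# reads are served from the cluster's cache when the key is hot.
dax = AmazonDaxClient.resource(
    endpoint_url="daxs://my-dax.abc123.dax-clusters.us-east-1.amazonaws.com")

table = dax.Table("conversation_memory")  # drop-in for ddb.Table(...)
```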
Best Case
- For repeated reads of the same session within a short window (e.g., mobile client that pre-fetches session state and re-fetches on reconnect), DAX eliminates redundant DynamoDB reads.
- Useful if your read-to-write ratio is >5:1 per session.
Failure Scenario — The Write-Through Cache Miss on Every Turn
What happens: Each new TURN write must pass through the DAX cache. DAX is write-through: the write is committed to DynamoDB and the item cache is updated with the new value.
But in MangaAssist, the session access pattern is: write once (new turn), read once (next turn). Every turn is written, then the next message reads it. The DAX cache hit rate for turn items approaches 0% because each turn is unique and freshly written.
Result: a DAX cluster costs ~$0.25/node-hour (the recommended 3-node production cluster = $0.75/hr). You pay for DAX, but every read still goes to DynamoDB because turns are never hot in the cache. DAX accelerates the META item (which is read every turn), but META is a single item per session, and that DynamoDB read is already <5ms.
The lesson: DAX optimizes the read-heavy, stable-key pattern (e.g., same product detail page read by thousands of users). Conversation turns are write-once and never re-read by multiple users. DAX is the wrong optimization target here.
Grilling Questions
- You do get one meaningful DAX hit: the META item (session metadata) is read on every turn write to check `turn_count`. Should you put DAX in the path for just the META item? What is the code complexity cost?
- A DAX node fails. The client falls back to DynamoDB. How does this compare to the Redis failover scenario?
- DAX is only available within the same VPC. If you ever need cross-region replication for conversation memory, does DAX block that?
Decision Heuristic
Add DAX only when DynamoDB reads for the same key are repeated within seconds by multiple clients. For per-session, per-turn unique keys, DAX adds cost and complexity with near-zero cache hit rate.
Master Summary Table
| Choice | Read Latency | Write Latency | Durability | Scale Ceiling | Key Failure Risk |
|---|---|---|---|---|---|
| DynamoDB (current) | ~5ms | ~5ms | High (multi-AZ) | Virtually unlimited | Hot partition if key design fails |
| Redis | <1ms | <1ms | Low (volatile) | Bounded by RAM | Memory eviction wipes sessions |
| Aurora PostgreSQL | 3-10ms | 6-15ms (3 round-trips) | High | Connection pool ceiling | Connection exhaustion at scale |
| MongoDB | 5-15ms | 5-15ms | Medium (w:1) | 16MB doc limit | Document size bomb on long sessions |
| DynamoDB + DAX | <1ms (hit) / ~5ms (miss) | ~5ms | High | DAX cluster size | Zero cache hit rate for turn pattern |