5. Incident Response and Security Forensics
Incident response for an LLM product is not just "watch for 500s and roll back bad code." MangaAssist can fail in ways that are subtle, distributed, and partially probabilistic:
- a prompt change can open a new path for PII generation
- a retrieval bug can surface an internal document without any infrastructure breach
- a model regression can change behavior without a code diff
- a state isolation bug can leak one user's data into another user's session
- an anomaly can look like exfiltration but be a legitimate user asking for detail
For that reason, the incident program needs three things at the same time:
- very fast containment
- defensible forensic evidence
- enough telemetry to separate model behavior from application bugs, retrieval bugs, and infrastructure issues
This chapter expands the original material into a full operating model: lifecycle, HLD, LLD, data flow, evidence design, runbooks, scenario walkthroughs, and follow-up questions with deep-dive answers.
Why Incident Response Is Harder for LLM Systems
| Failure Mode | Why It Is Hard to Detect | Why It Is Hard to Prove | Evidence Required |
|---|---|---|---|
| Prompt-induced PII generation | It may affect only a narrow slice of prompts | The unsafe output may be generated, blocked, or redacted before users see it | Prompt version, FM output hash, guardrail decisions, redaction events |
| Cross-session leakage | It can happen only on warm containers or rare cache paths | User report may be the first signal | Container ID, session IDs, prompt assembly payload, memory source trace |
| Retrieval ACL failure | Output may look factually correct, just over-authorized | Need to prove which chunk was retrieved and why | Retrieval document IDs, metadata, index snapshot |
| Model behavior shift | Same code, same prompt, different generation behavior | Often mistaken for application regressions | Model ID, region, model version, shadow outputs, canary traces |
| Suspicious long responses | Looks like exfiltration or dumping | Could be valid detailed help | Intent, retrieval chunk count, response token count, policy source list |
The key design principle is this:
Every customer-visible response must be reconstructable as a chain of evidence: user request -> auth context -> routing -> retrieval -> prompt assembly -> model output -> guardrail actions -> delivered response.
Without that chain, you can detect incidents, but you cannot investigate them rigorously.
Incident Response Objectives
| Objective | What Good Looks Like | Why It Matters |
|---|---|---|
| Fast containment | Kill switch or config rollback in under 15 minutes for SEV-1 | User harm grows while unsafe traffic continues |
| Evidence preservation | Relevant artifacts snapshotted before mutation or expiry | Prompt versions, KB state, and logs can change fast |
| Accurate classification | Real breach vs false positive vs quality issue is separated quickly | Overreacting causes avoidable outages; underreacting causes harm |
| Blast-radius assessment | Affected sessions, intents, and data elements can be enumerated | Notification, remediation, and legal review depend on scope |
| Systemic prevention | Root cause leads to code, process, and test changes | Repeating the same incident is an engineering failure |
Target operating thresholds:
- SEV-1: contain in under 15 minutes
- SEV-2: contain in under 60 minutes
- SEV-3: triage and begin evidence preservation in under 4 hours
- SEV-4: investigate in business hours unless a trigger escalates
Lifecycle and Command Model
Incident Lifecycle
stateDiagram-v2
[*] --> Detect: Alert, user report, audit finding
Detect --> Declare: Confirm incident or security anomaly
Declare --> Contain: Kill switch, rollback, traffic shaping
Declare --> Preserve: Snapshot evidence immediately
Contain --> Investigate
Preserve --> Investigate
Investigate --> Eradicate: Remove root cause
Eradicate --> Recover: Restore traffic safely
Recover --> Monitor: Verify no recurrence
Monitor --> Review: Post-incident review
Review --> [*]: Action items tracked to closure
Parallel Workstreams During an Incident
In a serious incident, these happen in parallel, not sequentially:
- Containment track: stop further harm
- Forensics track: preserve evidence before it disappears
- Comms track: notify on-call, security, legal, privacy, and support
- Decision track: decide whether this is a breach, quality issue, abuse attempt, or false alarm
Response Roles
| Role | Primary Responsibility | Typical Owner |
|---|---|---|
| Incident Commander | Owns severity, priorities, timeline, and recovery decision | On-call engineering lead |
| Security Lead | Owns breach assessment and evidence integrity | Security engineer |
| Service Owner | Owns technical diagnosis and remediation | MangaAssist backend lead |
| Data Protection / Privacy | Owns legal notification requirements | Privacy or compliance lead |
| Scribe | Maintains decision log and exact timeline | Secondary on-call |
| Support Liaison | Coordinates user-facing support impact | Customer support lead |
The most common anti-pattern is having everyone investigate while nobody owns containment. MangaAssist explicitly assigns containment ownership to the Incident Commander from minute one.
Severity Matrix and Escalation Rules
| Severity | Definition | Containment SLA | Typical Examples |
|---|---|---|---|
| SEV-1 | Confirmed or strongly suspected cross-user data exposure, active breach, or unsafe output at scale | < 15 min | Cross-session order leak, internal policy exposure with sensitive thresholds, mass PII leakage |
| SEV-2 | Production security regression with limited scope or partial exposure | < 60 min | Prompt canary leaking obfuscated email patterns, guardrail bypass for one intent |
| SEV-3 | Security anomaly needing investigation, unclear exploitability | < 4 h to declare path | Response-size anomaly, injection spike, elevated block rate |
| SEV-4 | Low-impact or single-session issue with no evidence of scale | < 72 h | One odd transcript, isolated policy mismatch, monitoring false positive |
Escalation Triggers
A lower severity becomes higher if any of the following is true:
- multiple independent sessions show the same pattern
- a user-visible leak involves another customer's data
- the issue is trending upward after a rollout
- the unsafe behavior affects a high-risk intent like order tracking or returns
- external disclosure occurs on social media, support escalations, or bug bounty channels
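A minimal sketch of how these triggers could be applied mechanically rather than by judgment is shown below. The trigger field names, the HIGH_RISK_INTENTS set, and the one-level-per-trigger rule are illustrative assumptions, not the exact MangaAssist policy.

HIGH_RISK_INTENTS = {"order_tracking", "return_request"}
SEVERITY_ORDER = ["SEV-4", "SEV-3", "SEV-2", "SEV-1"]

def escalate(base_severity: str, signal: dict) -> str:
    """Return the post-escalation severity for an incoming signal."""
    # User-visible exposure of another customer's data is SEV-1 outright.
    if signal.get("other_customer_data_visible", False):
        return "SEV-1"
    triggers = [
        signal.get("independent_sessions", 1) > 1,    # same pattern in multiple sessions
        signal.get("trend", "flat") == "increasing",  # trending upward after a rollout
        signal.get("intent") in HIGH_RISK_INTENTS,    # high-risk intent affected
        signal.get("external_disclosure", False),     # social media, support, bug bounty
    ]
    level = min(SEVERITY_ORDER.index(base_severity) + sum(triggers), len(SEVERITY_ORDER) - 1)
    return SEVERITY_ORDER[level]

# Example: a SEV-3 anomaly on return_request that is trending upward becomes SEV-1.
# escalate("SEV-3", {"intent": "return_request", "trend": "increasing"}) -> "SEV-1"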
HLD: Incident Detection, Containment, and Forensics Plane
flowchart TB
subgraph Runtime["MangaAssist Runtime"]
U[User]
G[API Gateway]
O[Orchestrator]
R[Retriever / KB]
L[LLM on Bedrock]
GR[Guardrails Pipeline]
M[Conversation Memory]
B[Backend Services<br/>Orders, Returns, Catalog]
U --> G --> O
O --> R
O --> M
O --> B
O --> L
L --> GR
GR --> G
end
subgraph Observability["Observability and Control"]
Logs[CloudWatch Logs]
Metrics[CloudWatch Metrics / Alarms]
Trail[CloudTrail]
Events[EventBridge]
Config[AWS AppConfig]
Pager[SNS / PagerDuty / Slack]
end
subgraph Forensics["Incident Response and Forensics"]
Router[Incident Router]
SFN[Step Functions War Room Workflow]
Evidence[Evidence Collector]
Index[DynamoDB Incident Index]
Audit[S3 Audit Bucket<br/>Object Lock + SSE-KMS]
Athena[Athena / Logs Insights]
end
O --> Logs
R --> Logs
L --> Logs
GR --> Logs
G --> Metrics
O --> Metrics
Trail --> Events
Logs --> Events
Metrics --> Events
Events --> Router --> SFN
SFN --> Pager
SFN --> Config
SFN --> Evidence
Evidence --> Index
Evidence --> Audit
Athena --> Audit
HLD Principles
- Runtime and response plane are separate. Incidents must still be manageable when the application plane is degraded.
- Containment is config-first. AppConfig kill switches are faster and safer than emergency redeploys.
- Evidence is immutable. The investigation uses append-only artifacts, not mutable app logs alone.
- Every incident gets a data package. Even false alarms produce a minimal evidence package so tuning is auditable.
End-to-End Incident Data Flow
sequenceDiagram
participant User
participant Gateway as API Gateway
participant Orch as Orchestrator
participant Guard as Guardrails
participant Obs as Logs and Metrics
participant EB as EventBridge
participant IR as Incident Workflow
participant CFG as AppConfig
participant EV as Evidence Collector
participant S3 as Immutable Audit Bucket
participant IC as Incident Commander
User->>Gateway: Chat request
Gateway->>Orch: Authenticated message + session context
Orch->>Guard: Candidate response + metadata
Guard-->>Orch: pass / modify / block
Orch->>Obs: Structured events, hashes, latency, versions
Obs->>EB: Alarm or anomaly event
EB->>IR: Create incident workflow
IR->>IC: Page on-call and create incident channel
IR->>CFG: Optional automated containment
IR->>EV: Snapshot prompt version, retrieval docs, transcripts, config
EV->>S3: Store immutable evidence bundle
EV-->>IC: Evidence manifest and first triage summary
Important detail: evidence preservation begins as soon as an incident is declared, even before root cause is known. That prevents losing volatile artifacts such as canary prompt versions, temporary feature flags, or short-lived transcripts.
LLD: Core Response Components
flowchart LR
subgraph Detection["Detection Layer"]
A1[CloudWatch Alarm<br/>threshold or anomaly]
A2[Custom Detector Lambda]
A3[Support Ticket Ingest]
A4[GuardDuty / CloudTrail Findings]
end
subgraph Routing["Incident Routing"]
B1[EventBridge Rules]
B2[Incident Router Lambda]
B3[Severity Calculator]
end
subgraph Actions["Containment and Preservation"]
C1[AppConfig Kill Switch API]
C2[Evidence Collector Lambda]
C3[Timeline Builder]
C4[Pager and Slack Notifier]
end
subgraph Storage["Forensic Storage"]
D1[DynamoDB Incident Table]
D2[S3 Evidence Bucket<br/>Object Lock]
D3[CloudWatch Logs]
D4[CloudTrail Archive]
end
subgraph Query["Investigation"]
E1[CloudWatch Logs Insights]
E2[Athena]
E3[Security Dashboard]
end
A1 --> B1
A2 --> B1
A3 --> B1
A4 --> B1
B1 --> B2 --> B3
B3 --> C1
B3 --> C2
B3 --> C3
B3 --> C4
C2 --> D1
C2 --> D2
C3 --> D1
D3 --> E1
D2 --> E2
D1 --> E3
D4 --> E2
LLD Component Table
| Component | Implementation | What It Stores or Does | Failure It Helps Diagnose |
|---|---|---|---|
| Incident Router | Lambda behind EventBridge | Normalizes alerts into one incident envelope | Duplicate or fragmented alerting |
| Severity Calculator | Deterministic rules + overrides | Maps signal type, intent, and data sensitivity to severity | Slow or inconsistent triage |
| Kill Switch Controller | AppConfig update API | Disable intents, swap model tiers, force static fallback | Delayed containment |
| Evidence Collector | Lambda + Step Functions | Pulls transcripts, prompt versions, KB docs, config, hashes | Missing volatile evidence |
| Timeline Builder | Lambda over logs | Builds event-by-event chronology from correlation IDs | Confusing incident timelines |
| Incident Table | DynamoDB | Incident metadata, manifest, owners, actions | Lack of central status |
| Immutable Evidence Bucket | S3 with Object Lock + KMS | Forensic artifacts and signed manifests | Tampering or accidental deletion |
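To make the Timeline Builder concrete, here is a minimal sketch of the chronology-assembly step. It operates on structured events shaped like the PII-safe security event schema shown later in this chapter; the one-line rendering format is an illustrative assumption.

from collections import defaultdict

def build_timeline(events: list[dict], correlation_ids: set[str]) -> dict[str, list[str]]:
    """Return an ordered list of one-line timeline entries per correlation ID."""
    by_corr: dict[str, list[dict]] = defaultdict(list)
    for event in events:
        if event.get("correlation_id") in correlation_ids:
            by_corr[event["correlation_id"]].append(event)
    timeline = {}
    for corr_id, corr_events in by_corr.items():
        corr_events.sort(key=lambda e: e["timestamp"])  # ISO-8601 strings sort chronologically
        timeline[corr_id] = [
            f'{e["timestamp"]} {e["event_type"]} deployment={e.get("deployment_id", "?")}'
            for e in corr_events
        ]
    return timeline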
Evidence Model and Chain of Custody
What Must Be Captured for Every Serious Incident
For SEV-1 and SEV-2, MangaAssist preserves:
- user message hash and response hash
- raw transcript in restricted evidence storage
- correlation_id, session_id, and customer_id_hash
- prompt template version and resolved prompt text
- retrieval document IDs and metadata
- model ID, region, inference timestamp, and feature-flag state
- guardrail stage outcomes and any redaction or modification metadata
- backend call summaries for order, return, and catalog services
- deployment version, Lambda container ID, and request ID where applicable
Correlation Keys
| Field | Purpose | Notes |
|---|---|---|
| correlation_id | Ties the end-to-end request together | Generated at API edge and propagated everywhere |
| session_id | Groups multi-turn history | Stable across a conversation |
| request_id | Per-service request trace | Useful when correlation propagation breaks |
| deployment_id | Identifies code or prompt rollout version | Critical for rollback analysis |
| retrieval_snapshot_id | Identifies the exact KB or index snapshot | Critical in retrieval leaks |
| container_id | Ties events to warm runtime state | Important for Lambda state contamination bugs |
PII-Safe Security Event Schema
{
"timestamp": "2026-03-24T18:04:51.221Z",
"event_type": "guardrail_decision",
"severity_hint": "SEV-2",
"correlation_id": "corr_7c22f2d9",
"session_id": "sess_91fd7e",
"request_id": "req_0be44d",
"deployment_id": "prompt-canary-2026-03-24-01",
"customer_id_hash": "sha256:91c2...",
"intent": "recommendation",
"model": {
"provider": "bedrock",
"model_id": "anthropic.claude-3-5-sonnet",
"region": "us-east-1"
},
"guardrails": [
{
"stage": "pii_filter",
"action": "modify",
"reason_code": "OBFUSCATED_EMAIL_PATTERN",
"latency_ms": 4.2
}
],
"request_hash": "sha256:8aa1...",
"response_hash": "sha256:f102...",
"feature_flags": {
"order_intent_enabled": true,
"model_tier": "primary",
"prompt_version": "rec-v37"
}
}
Restricted Evidence Manifest
{
"incident_id": "inc_2026_03_24_017",
"created_at": "2026-03-24T18:09:00Z",
"severity": "SEV-1",
"artifacts": [
{
"type": "transcript",
"s3_key": "evidence/inc_2026_03_24_017/transcript_corr_7c22f2d9.json",
"sha256": "96c0..."
},
{
"type": "prompt_bundle",
"s3_key": "evidence/inc_2026_03_24_017/prompt_bundle.json",
"sha256": "d487..."
},
{
"type": "retrieval_snapshot",
"s3_key": "evidence/inc_2026_03_24_017/retrieval_docs.json",
"sha256": "ef34..."
}
],
"approved_access": [
"security_lead",
"privacy_officer"
]
}
Chain of Custody
flowchart TD
A[Alert or user report] --> B[Incident declared]
B --> C[Evidence collector snapshots artifacts]
C --> D[Hash each artifact]
D --> E[Write to S3 evidence bucket with Object Lock]
E --> F[Store manifest in DynamoDB]
F --> G[Restricted access via IAM role + MFA]
G --> H[Every evidence read is logged to CloudTrail]
This design matters because incident response is often challenged later by legal, compliance, or postmortem review. If evidence can be edited after the fact, the investigation is not trustworthy.
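Because artifacts are hashed at collection time, reviewers can re-verify integrity later in the investigation. A minimal verification sketch, assuming an evidence bucket name and the manifest shape shown above:

import hashlib

import boto3

s3 = boto3.client("s3")
EVIDENCE_BUCKET = "mangaassist-evidence"  # assumed bucket name for illustration

def verify_manifest(manifest: dict) -> list[str]:
    """Return the S3 keys of artifacts whose current hash no longer matches the manifest."""
    mismatches = []
    for artifact in manifest["artifacts"]:
        obj = s3.get_object(Bucket=EVIDENCE_BUCKET, Key=artifact["s3_key"])
        digest = hashlib.sha256(obj["Body"].read()).hexdigest()
        if digest != artifact["sha256"]:
            mismatches.append(artifact["s3_key"])
    return mismatches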
Implementation Details
1. Correlation ID Propagation
The API edge generates a correlation_id once. Every downstream service receives it and writes it to logs and traces.
import hashlib
import json
import logging
import os
import uuid
from datetime import datetime, timezone

logger = logging.getLogger("mangaassist.security")

# Helpers assumed elsewhere in the original snippet; minimal illustrative versions shown here.
def new_correlation_id() -> str:
    return "corr_" + uuid.uuid4().hex[:8]

def sha256(value: str) -> str:
    return "sha256:" + hashlib.sha256(value.encode("utf-8")).hexdigest()

def utc_now_iso() -> str:
    return datetime.now(timezone.utc).isoformat()

def build_request_context(event: dict) -> dict:
    # Reuse the edge-generated correlation ID when present; mint one otherwise.
    headers = event.get("headers", {})
    correlation_id = headers.get("x-correlation-id") or new_correlation_id()
    return {
        "correlation_id": correlation_id,
        "session_id": event["session_id"],
        "customer_id_hash": sha256(event["customer_id"]),
        "deployment_id": os.environ["DEPLOYMENT_ID"],
        "prompt_version": os.environ["PROMPT_VERSION"],
        "container_id": os.environ.get("AWS_LAMBDA_LOG_STREAM_NAME", "unknown"),
    }

def emit_security_event(event_type: str, ctx: dict, detail: dict) -> None:
    # One structured, PII-safe event per decision point, keyed by the correlation fields.
    payload = {
        "timestamp": utc_now_iso(),
        "event_type": event_type,
        "correlation_id": ctx["correlation_id"],
        "session_id": ctx["session_id"],
        "customer_id_hash": ctx["customer_id_hash"],
        "deployment_id": ctx["deployment_id"],
        "detail": detail,
    }
    logger.info(json.dumps(payload))
2. Automated Incident Creation
CloudWatch alarms and custom detectors publish a normalized event to EventBridge:
{
"source": "mangaassist.security",
"detail-type": "security-anomaly",
"detail": {
"signal": "pii-in-response-rate",
"severity_hint": "SEV-2",
"intent": "recommendation",
"threshold": 0.005,
"observed": 0.0081,
"deployment_id": "prompt-canary-2026-03-24-01"
}
}
The Incident Router Lambda then:
- deduplicates similar alerts
- enriches with deployment and intent metadata
- computes initial severity
- starts a Step Functions workflow
- optionally triggers automated containment for known playbooks
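A minimal sketch of that router logic follows. The DynamoDB dedup key scheme, the environment variable names, the KNOWN_PLAYBOOKS set, and the compute_severity / trigger_containment helpers are assumptions standing in for the severity calculator and kill-switch controller described above.

import json
import os
import time

import boto3

dynamodb = boto3.client("dynamodb")
stepfunctions = boto3.client("stepfunctions")

# Signals with a pre-approved automated containment playbook (illustrative).
KNOWN_PLAYBOOKS = {"pii-in-response-rate", "cross-session-leak-suspected"}

def handler(event, context):
    detail = event["detail"]

    # Deduplicate: at most one open incident per signal + deployment.
    dedup_key = detail["signal"] + "#" + detail.get("deployment_id", "none")
    try:
        dynamodb.put_item(
            TableName=os.environ["INCIDENT_TABLE"],
            Item={"dedup_key": {"S": dedup_key}, "created_at": {"N": str(int(time.time()))}},
            ConditionExpression="attribute_not_exists(dedup_key)",
        )
    except dynamodb.exceptions.ConditionalCheckFailedException:
        return {"status": "duplicate", "dedup_key": dedup_key}

    # Enrichment with deployment and intent metadata is omitted for brevity.
    severity = compute_severity(detail)  # deterministic rules + overrides
    stepfunctions.start_execution(
        stateMachineArn=os.environ["WAR_ROOM_STATE_MACHINE_ARN"],
        input=json.dumps({"severity": severity, "detail": detail}),
    )

    # Automated containment only for signals with a pre-approved playbook.
    if severity in ("SEV-1", "SEV-2") and detail["signal"] in KNOWN_PLAYBOOKS:
        trigger_containment(detail)  # AppConfig kill switch, see the next section

    return {"status": "created", "severity": severity}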
3. Containment Through Config, Not Code
Emergency containment is pre-wired in AppConfig:
{
"intent_controls": {
"order_tracking": { "enabled": true, "mode": "dynamic" },
"return_request": { "enabled": true, "mode": "dynamic" },
"recommendation": { "enabled": true, "mode": "dynamic" }
},
"emergency_controls": {
"global_static_fallback": false,
"force_safe_model_tier": false,
"disable_personalized_context": false
}
}
Typical containment actions:
- disable one intent
- disable personalized context injection
- force static fallback for high-risk routes
- switch from canary prompt to stable prompt
- switch from primary model tier to safer fallback tier
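One of these actions, disabling a single intent, can be sketched as an AppConfig hosted-configuration update followed by an immediate deployment. The application, profile, environment, and deployment-strategy IDs are placeholders; the JSON shape matches the emergency-controls document above.

import json

import boto3

appconfig = boto3.client("appconfig")

def disable_intent(current_config: dict, intent: str, app_id: str,
                   profile_id: str, env_id: str, strategy_id: str) -> str:
    """Disable one intent and start an immediate AppConfig deployment."""
    current_config["intent_controls"][intent] = {"enabled": False, "mode": "static_fallback"}
    version = appconfig.create_hosted_configuration_version(
        ApplicationId=app_id,
        ConfigurationProfileId=profile_id,
        Content=json.dumps(current_config).encode("utf-8"),
        ContentType="application/json",
    )
    deployment = appconfig.start_deployment(
        ApplicationId=app_id,
        EnvironmentId=env_id,
        DeploymentStrategyId=strategy_id,  # e.g. an all-at-once strategy reserved for emergencies
        ConfigurationProfileId=profile_id,
        ConfigurationVersion=str(version["VersionNumber"]),
        Description="Emergency containment: disable intent " + intent,
    )
    return str(deployment["DeploymentNumber"])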
4. Evidence Collector
def preserve_incident_evidence(incident: dict) -> dict:
    """Snapshot volatile artifacts for an incident and record an immutable manifest.
    The fetch_* helpers read from logs, the prompt registry, and AppConfig;
    write_locked_artifact (sketched below) writes each artifact to the evidence bucket."""
    correlation_ids = incident["correlation_ids"]
    artifacts = []

    # Raw transcripts for the affected correlation IDs.
    transcripts = fetch_raw_transcripts(correlation_ids)
    artifacts.append(write_locked_artifact(incident, "transcripts.json", transcripts))

    # Exact prompt templates and resolved prompt text for the deployment.
    prompt_bundle = fetch_prompt_bundle(incident["deployment_id"])
    artifacts.append(write_locked_artifact(incident, "prompt_bundle.json", prompt_bundle))

    # Retrieved document IDs and content as served to the model.
    retrieval_docs = fetch_retrieval_documents(correlation_ids)
    artifacts.append(write_locked_artifact(incident, "retrieval_docs.json", retrieval_docs))

    # Feature flags and kill-switch state at the time of collection.
    config_snapshot = fetch_appconfig_snapshot()
    artifacts.append(write_locked_artifact(incident, "config_snapshot.json", config_snapshot))

    manifest = {
        "incident_id": incident["incident_id"],
        "created_at": utc_now_iso(),
        "artifacts": artifacts,
    }
    put_manifest(manifest)
    return manifest
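The write_locked_artifact helper referenced above could take roughly this shape: hash the artifact, write it to the evidence bucket with KMS encryption and compliance-mode Object Lock retention, and return the manifest entry. The bucket name, KMS key alias, and 365-day retention are illustrative assumptions.

import hashlib
import json
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")
EVIDENCE_BUCKET = "mangaassist-evidence"          # assumed
EVIDENCE_KMS_KEY = "alias/mangaassist-evidence"   # assumed

def write_locked_artifact(incident: dict, name: str, payload: dict) -> dict:
    """Write one evidence artifact with Object Lock and return its manifest entry."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(body).hexdigest()
    key = f'evidence/{incident["incident_id"]}/{name}'
    s3.put_object(
        Bucket=EVIDENCE_BUCKET,
        Key=key,
        Body=body,
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId=EVIDENCE_KMS_KEY,
        ObjectLockMode="COMPLIANCE",
        ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=365),
    )
    return {"type": name.removesuffix(".json"), "s3_key": key, "sha256": digest}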
5. Investigation Queries
CloudWatch Logs Insights example for a prompt canary incident:
fields @timestamp, correlation_id, detail.reason_code, deployment_id, intent
| filter event_type = "guardrail_decision"
| filter detail.reason_code = "OBFUSCATED_EMAIL_PATTERN"
| stats count() as hits by deployment_id, intent, bin(5m) as window
| sort window desc
Athena example for blast radius estimation:
SELECT
deployment_id,
count(DISTINCT session_id) AS affected_sessions,
count(*) AS affected_events
FROM forensic_events
WHERE event_date BETWEEN DATE '2026-03-21' AND DATE '2026-03-24'
AND event_type = 'cross_session_leak_suspected'
GROUP BY deployment_id;
Investigation Decision Tree
flowchart TD
A[Unsafe or suspicious response observed] --> B{Was the bad content in delivered response?}
B -->|No| C[Near miss only<br/>guardrail caught or modified it]
B -->|Yes| D[User-visible incident]
C --> E{Where was it introduced?}
D --> E
E -->|In retrieved chunk| F[Retrieval or ACL issue]
E -->|In backend payload| G[Service authorization or data bug]
E -->|In prompt history| H[Memory or session isolation issue]
E -->|Only in model output| I[FM hallucination or prompt-induced generation]
E -->|Not reproducible| J[Need wider sampling and shadow replay]
F --> K[Check document IDs, metadata, index snapshot, ACL tags]
G --> L[Check service auth, caller identity, upstream payloads]
H --> M[Check session boundaries, cache keys, container state]
I --> N[Check prompt version, model version, guardrail thresholds]
This decision tree prevents teams from blaming the model too early. In practice, a large fraction of "LLM incidents" are application-state or retrieval-governance issues.
Detailed Scenarios
The scenarios below are written the way a strong incident responder would explain them in design review or interviews:
- what happened
- how it was detected
- how containment was performed
- what evidence proved root cause
- what changed afterward
- what follow-up questions usually come next
Scenario 1: Prompt Canary Introduced an Obfuscated PII Leak
Context
The recommendation prompt was updated to sound more community-aware and conversational. The change went through offline evaluation and a 10 percent canary, but it created a subtle failure mode:
- the new prompt encouraged the FM to mention how readers could "follow creators"
- the model sometimes responded with invented contact details
- direct email patterns were mostly blocked
- obfuscated formats like author name [at] publisher [dot] com slipped through
Detection
- the pii-in-response-rate alarm crossed the 0.5% threshold for canary traffic
- pii_near_miss_rate also spiked, which meant the FM was generating more PII-like content even when guardrails caught some of it
- the spike was isolated to prompt_version = rec-v37-canary
Sequence of Events
sequenceDiagram
participant User
participant Orch as Orchestrator
participant Prompt as Prompt v37 Canary
participant LLM as Bedrock Model
participant PII as PII Filter
participant Metrics as CloudWatch
participant Oncall as On-call Engineer
participant Config as AppConfig
User->>Orch: "Recommend manga by indie creators"
Orch->>Prompt: Build prompt with canary version
Prompt->>LLM: Prompt asks for richer creator context
LLM-->>PII: Response includes obfuscated contact pattern
PII-->>Metrics: Partial misses accumulate
Metrics-->>Oncall: PII rate alarm fires
Oncall->>Config: Roll back prompt canary
Config-->>Orch: Stable prompt restored
Containment
Containment took 17 minutes:
- rolled back the canary prompt through AppConfig
- verified pii-in-response-rate returned to baseline within two metric windows
- disabled the specific creator-community enrichment clause while investigation continued
This is exactly why prompt versions are treated as deployable, reversible artifacts.
Forensic Investigation
The investigation used four evidence sources:
- Prompt diff: rec-v36 vs rec-v37
- Canary segmentation: proved the issue was isolated to the new prompt
- Transcript replay: confirmed reproducibility against the same prompt bundle
Key forensic clue:
- the unsafe string was not present in retrieval data
- it was not present in any backend payload
- it appeared only in FM output after the prompt wording changed
That ruled out retrieval leaks and upstream data exposure.
Root Cause
The prompt added an instruction that increased the probability of the FM generating contact-like information. The PII filter was tuned for direct email syntax, not obfuscated variants.
Remediation
- removed the risky prompt clause
- expanded the PII detector with obfuscated email regexes
- added adversarial tests specifically for creator-contact generation
- added a pre-prod review question: "Does this prompt create a new path for generating sensitive information?"
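For illustration, the kind of obfuscated-email patterns added to the detector might look like the following. The exact production regex set is broader; these two patterns are assumptions covering the [at] / [dot] family that slipped through in this incident.

import re

OBFUSCATED_EMAIL_PATTERNS = [
    # "name [at] domain [dot] com", with bracket or parenthesis variants and loose spacing
    re.compile(r"\b[\w.+-]+\s*[\[\(]\s*at\s*[\]\)]\s*[\w-]+\s*[\[\(]\s*dot\s*[\]\)]\s*\w{2,}\b",
               re.IGNORECASE),
    # "name at domain dot com" written out in words
    re.compile(r"\b[\w.+-]+\s+at\s+[\w-]+\s+dot\s+\w{2,}\b", re.IGNORECASE),
]

def contains_obfuscated_email(text: str) -> bool:
    return any(pattern.search(text) for pattern in OBFUSCATED_EMAIL_PATTERNS)

# Example: both of these are now flagged as near misses.
assert contains_obfuscated_email("reach the author at mika [at] publisher [dot] com")
assert contains_obfuscated_email("contact mika at publisher dot com")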
Verification
- replayed the affected transcripts against stable and fixed prompt versions
- ran adversarial suite with direct and obfuscated PII patterns
- monitored pii_near_miss_rate for 24 hours
Follow-Up Questions and Deep-Dive Answers
Q1. Why did offline testing miss this?
Offline tests only checked whether direct PII leaked in final responses. They did not test induction risk: whether a prompt instruction could cause the model to invent sensitive-looking content that the downstream detector might miss. After this incident, the prompt evaluation suite added:
- induced PII generation probes
- obfuscated PII variants
- near-miss metrics, not just delivered-leak metrics
Q2. How did you prove it was the prompt and not a model update?
The strongest evidence was segmentation:
- only rec-v37-canary traffic showed the spike
- same model, same region, same backend context under rec-v36 did not
- replaying identical transcripts with the old prompt removed the issue
That is the difference between correlation and causation in an incident review.
Q3. Why not block every response that contains anything PII-like?
Because the recommendation flow still needs to be usable. Blanket blocking would create high false positives and degrade customer experience. The right policy is layered:
- redact when the content is clearly removable
- block when the response is unsafe or materially untrustworthy
- track near misses because they are leading indicators of prompt risk
Scenario 2: Response-Size Spike Looked Like Exfiltration but Was Legitimate
Context
A CloudWatch anomaly detector flagged that FAQ responses were about 3x longer than baseline. Long responses are suspicious because they can indicate:
- prompt injection causing policy dumps
- retrieval over-sharing
- internal document leakage
- scripted exfiltration attempts
But long responses alone do not prove malicious behavior.
Sequence of Events
sequenceDiagram
participant Detector as Response Size Detector
participant Router as Incident Router
participant Analyst as On-call Analyst
participant Logs as Logs Insights
participant KB as Knowledge Base
participant User
Detector->>Router: FAQ length anomaly
Router->>Analyst: Create SEV-3 investigation
Analyst->>Logs: Pull top long responses
Analyst->>KB: Inspect retrieved source docs
Logs-->>Analyst: One user, repeated detailed policy questions
KB-->>Analyst: Only customer-facing policy docs retrieved
Analyst-->>Router: Downgrade to SEV-4 false alarm
Investigation Details
The analyst checked:
- Which intents were affected? Only FAQ, not recommendations or orders.
- Which users were involved? One authenticated long-tenure customer.
- Which documents were retrieved? Only customer-facing return policy documents.
- Were internal-only chunks retrieved? No.
- Was the user escalating question specificity? Yes, each question added more edge-case detail.
The response growth was explained by the RAG system doing exactly what it was designed to do: return more policy detail as the question became more specific.
Why This Still Mattered
Even though it was not a breach, the alert was useful because it showed a blind spot:
- anomaly detection was not intent-aware enough
- FAQ traffic has a very different length distribution from recommendation traffic
- response length without source-scope context generates too many false alarms
Fixes
- changed the detector to baseline by intent
- incorporated retrieved document count and source audience into the anomaly score
- capped FAQ response length at 300 tokens and linked out for exhaustive detail
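A minimal sketch of the intent-aware detector shape that replaced the single global length threshold is shown below. The baselines, weights, and scoring formula are illustrative assumptions; the point is that length is judged per intent and combined with retrieval-scope context.

INTENT_LENGTH_BASELINES = {   # assumed per-intent baseline token counts
    "faq": 220,
    "recommendation": 380,
    "order_tracking": 150,
}

def response_anomaly_score(intent: str, response_tokens: int,
                           retrieved_chunks: int, source_audiences: set[str]) -> float:
    """Higher score means more suspicious; anything over ~1.0 opens an investigation."""
    baseline = INTENT_LENGTH_BASELINES.get(intent, 300)
    length_factor = max(0.0, response_tokens / baseline - 1.0)          # only excess length counts
    chunk_factor = 0.2 * max(0, retrieved_chunks - 5)                   # unusually wide retrieval
    audience_factor = 1.0 if source_audiences - {"customer"} else 0.0   # non-customer sources present
    return length_factor + chunk_factor + audience_factor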
Follow-Up Questions and Deep-Dive Answers
Q1. How do you avoid alert fatigue without missing real exfiltration?
You do not remove the alert. You improve its context. MangaAssist changed from a one-dimensional metric to a richer detector:
- response length by intent
- retrieved chunk count
- source audience labels
- per-user burst behavior
That preserved sensitivity while reducing noise.
Q2. Why not classify this as not-an-incident immediately?
Because long output is a real exfiltration signal. The right posture is "investigate first, downgrade with evidence." For security anomalies, false positives are acceptable; false dismissals are not.
Q3. What evidence would have escalated this to a true breach?
Any of the following:
- internal-only document IDs in retrieval logs
- multiple sessions showing the same dump pattern
- prompt text indicating successful instruction override
- backend payloads containing over-authorized data
Scenario 3: Cross-Session Order Data Leak Caused by Lambda Warm-Container State
Context
A user reported that the bot mentioned someone else's order. This was treated as SEV-1 immediately because cross-user data exposure is a breach until disproven.
Sequence of Events
sequenceDiagram
participant UserA
participant UserB
participant Lambda as Warm Lambda Container
participant Memory as In-Memory Global State
participant Order as Order Service
participant LLM
participant Support
participant IC as Incident Commander
UserA->>Lambda: Prior session request
Lambda->>Memory: Store summarized turn in global list
UserB->>Lambda: New order question on reused container
Lambda->>Order: Fetch UserB order
Lambda->>LLM: Prompt includes UserB order + stale UserA summary
LLM-->>UserB: Mixed response with foreign order data
UserB->>Support: "Bot told me about someone else's order"
Support->>IC: Escalate immediately
Immediate Containment
Within 8 minutes:
- disabled all order-related intents through AppConfig
- forced a static fallback to the existing "Your Orders" page
- opened a SEV-1 incident channel and pulled privacy and security in immediately
This was a textbook example of why high-risk intents need pre-built kill switches.
Forensic Investigation
The investigation did not start by blaming the model. It traced prompt assembly.
Evidence chain:
- Delivered transcript contained an extra order reference.
- Order service logs showed the backend returned only the correct user's order.
- Prompt assembly snapshot showed an extra conversation-history summary.
- Container ID analysis showed that the same Lambda container handled another customer's prior session.
- Code inspection found mutable global state used for conversation history.
Problematic pattern:
# Bug: mutable global state survives warm Lambda invocations.
conversation_history = []
def handler(event, context):
conversation_history.append(event["new_turn"])
prompt = build_prompt(conversation_history)
return generate_response(prompt)
Corrected pattern:
def handler(event, context):
conversation_history = load_history_for_session(event["session_id"])
conversation_history.append(event["new_turn"])
prompt = build_prompt(conversation_history)
return generate_response(prompt)
Root Cause
This was not a model issue. It was a session-isolation bug caused by warm-container reuse combined with mutable global state.
Blast Radius Analysis
Blast radius could not rely on the single reported session. The team queried:
- all invocations sharing the affected container pattern
- all prompts assembled with mixed customer_id_hash evidence
- all order intents during the last three days
The result identified 23 sessions with potential contamination.
Remediation
- fixed the handler to make state request-scoped
- scanned all Lambda functions for mutable globals
- added CI linting to block mutable module-level state in request handlers
- added a two-user integration test that forces warm container reuse patterns
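The two-user test can be sketched as below: invoke the same handler twice in one process to mimic a warm container, then assert the second user's output contains nothing from the first session. The handler import path, event shape, and the debug echo of the assembled prompt are assumptions for illustration.

from mangaassist.chat import handler  # assumed module path

def test_warm_container_does_not_mix_sessions():
    event_a = {"session_id": "sess_a", "customer_id": "cust_a",
               "new_turn": "Where is order A-1001?"}
    event_b = {"session_id": "sess_b", "customer_id": "cust_b",
               "new_turn": "Recommend a manga"}

    # Same process, two invocations: this is the warm-container reuse pattern.
    handler(event_a, context=None)
    response_b = handler(event_b, context=None)

    # The second response and its assembled prompt must not reference user A.
    assert "A-1001" not in response_b["text"]
    assert "cust_a" not in response_b["debug"]["prompt"]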
Verification
- replayed dual-session tests
- verified no cross-user content in sampled prompt assemblies
- re-enabled order intents only after the evidence review passed
Follow-Up Questions and Deep-Dive Answers
Q1. How did you prove the order service was not the source of the leak?
By comparing three artifacts:
- upstream order-service response
- prompt assembly payload
- delivered FM output
The extra order reference appeared in prompt history but not in the authorized service response. That is conclusive evidence that the leak happened in the application layer before generation.
Q2. How did you estimate blast radius if only one user reported it?
We looked for the signature of the failure, not just more complaints:
- repeated container IDs
- mismatched customer_id_hash values inside one prompt assembly
- order intents executed on warm containers in the affected deployment window
That is the right way to quantify rare but serious isolation bugs.
Q3. Why disable all order intents instead of just one endpoint?
Because the trust boundary was unclear at first. When customer data separation is in doubt, containment should be broader than the suspected code path. Once isolation was re-verified, traffic could be restored safely.
Scenario 4: Internal Returns Policy Was Exposed Through a Retrieval ACL Bug
Context
A support agent escalated a transcript where the chatbot explained internal fraud-review thresholds that customers should never see. This looked like a model hallucination at first because the answer was fluent and specific. It turned out to be a retrieval governance bug.
Sequence of Events
sequenceDiagram
participant User
participant Orch as Orchestrator
participant Search as RAG Search
participant KB as OpenSearch Index
participant LLM
participant Guard as Guardrails
participant Support
participant Sec as Security Lead
User->>Orch: "Why was my return denied?"
Orch->>Search: Query KB with return-policy intent
Search->>KB: Retrieve top documents
KB-->>Search: Includes internal SOP chunk tagged incorrectly
Search-->>LLM: Customer docs + internal chunk
LLM-->>Guard: Response includes internal thresholds
Guard-->>User: Passes because content is not toxic or PII
User->>Support: Questions internal rules shown by bot
Support->>Sec: Escalate retrieval leak
Why This Incident Is Important
This is a classic LLM-era incident:
- infrastructure is healthy
- the model is technically grounded in retrieved data
- the answer is factually correct
- the failure is authorization, not generation quality
Traditional app monitoring often misses this because nothing "breaks."
Forensic Investigation
The team checked:
- retrieval document IDs from the transcript's correlation_id
- index metadata for each chunk
- source S3 object metadata and ingestion logs
- access-control tags applied during chunking
They found:
- an internal returns SOP was ingested into the customer-facing index
- the audience=internal metadata tag was missing on a batch of chunks
- the reranker boosted the internal chunk because it contained precise denial criteria
Root Cause
The ingestion job treated missing audience metadata as public by default. That was a governance flaw. Missing classification should have failed closed, not open.
Containment
- removed the affected chunk IDs from the index
- disabled returns-policy retrieval while rebuilding the filtered snapshot
- patched ingestion so unclassified documents are quarantined instead of indexed
Remediation
- changed metadata policy from default-public to default-quarantine
- added a pre-index validator requiring audience, data_classification, and owner
- added a retrieval guardrail that blocks non-customer audiences at serve time
- added canary tests that intentionally probe for internal policy exposure
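A minimal sketch of the fail-closed pre-index validator described above: documents missing any required metadata field, or not explicitly marked for the customer audience, are quarantined instead of indexed. The field names match the remediation list; the quarantine reason strings are illustrative.

REQUIRED_METADATA = ("audience", "data_classification", "owner")

def validate_for_customer_index(doc_metadata: dict) -> tuple[bool, str]:
    """Return (indexable, reason); missing classification fails closed."""
    missing = [field for field in REQUIRED_METADATA if not doc_metadata.get(field)]
    if missing:
        return False, f"quarantine: missing metadata {missing}"
    if doc_metadata["audience"] != "customer":
        return False, f"quarantine: audience={doc_metadata['audience']} is not customer-facing"
    return True, "ok"

# The internal SOP from this incident would now be rejected at ingest:
# validate_for_customer_index({"owner": "returns-team"}) ->
#   (False, "quarantine: missing metadata ['audience', 'data_classification']")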
Verification
- rebuilt the index from clean source manifests
- replayed the affected query set
- sampled retrieval results for all returns-related intents
Follow-Up Questions and Deep-Dive Answers
Q1. How did you know this was retrieval leakage and not model memorization?
Because the exact internal terminology was present in the retrieved chunks tied to the incident's correlation_id. If the content is in the retrieval trace, you have concrete provenance. Model memorization is a fallback hypothesis only when the content is absent from prompt, retrieval, and backend data.
Q2. Why did the guardrails not stop this?
The existing guardrails focused on toxicity, competitor mentions, pricing, PII, and scope. They did not enforce document audience labels. This incident showed that content safety and authorization safety are different layers.
Q3. What is the long-term fix: better prompts or better retrieval governance?
Retrieval governance. Prompts can reduce risk, but they should not be the primary control for authorization. The durable fix is:
- fail-closed metadata handling
- index-time validation
- serve-time audience enforcement
- forensic retrieval traces for every answer
First 15 Minutes: Practical Runbook
When a real incident hits, the first mistake teams make is doing too much analysis before containment. MangaAssist uses the following first-15-minute checklist.
| Minute Window | Action | Goal |
|---|---|---|
| 0-5 | Confirm signal, assign Incident Commander, open incident channel | Avoid ownership confusion |
| 0-5 | Snapshot volatile evidence immediately | Preserve prompt versions, config, transcripts |
| 5-10 | Trigger kill switch or rollback if impact could be user-visible | Stop ongoing harm |
| 5-10 | Classify severity conservatively | Under-classification is more dangerous |
| 10-15 | Pull initial blast-radius query and identify affected intent or deployment | Bound scope early |
Minimal First-15-Minute Questions
- Is this user-visible?
- Is another user's data involved?
- Did a recent prompt, model, or config change correlate with onset?
- Can we contain by config instead of code?
- What evidence might disappear if we wait?
First Hour: Forensic Checklist
Within the first hour, the team should answer:
- Where was the bad content introduced? Prompt, retrieval, backend payload, conversation memory, or model-only generation.
- What is the blast radius? How many sessions, intents, users, and data elements are potentially affected.
- What version boundary explains the incident? Deployment ID, prompt version, KB snapshot, model tier, or region.
- What prevents recurrence right now? Temporary block, rollback, stricter guardrail, index quarantine, or route disablement.
- What external obligations exist? Privacy notification, customer support messaging, or legal review.
Post-Incident Review Template
## Incident Review: [TITLE]
### 1. Executive Summary
- Severity:
- Start time:
- End time:
- User impact:
- Data impact:
- Detection source:
### 2. Exact Timeline
| Time | Event | Actor | Evidence |
|---|---|---|---|
| T+0 | Alert or report received | Detector / user | Incident envelope |
| T+Xm | Containment started | Incident Commander | AppConfig change ID |
| T+Xm | Evidence snapshotted | Security lead | Manifest ID |
| T+Xh | Root cause identified | Service owner | Query / code diff |
| T+Xh | Fix deployed | Engineering | Deployment ID |
| T+Xh | Recovery approved | Incident Commander | Verification report |
### 3. Blast Radius
- Sessions affected:
- Users affected:
- Intents affected:
- Data classes affected:
- Confidence level of estimate:
### 4. Technical Root Cause
- Immediate cause:
- Contributing factors:
- Why existing controls failed:
### 5. Detection Analysis
- What signal fired:
- What did not fire but should have:
- Mean time to detect:
- Mean time to contain:
### 6. Corrective Actions
| Action | Owner | Due Date | Status |
|---|---|---|---|
| Add regression test | | | |
| Add detector or guardrail | | | |
| Update runbook | | | |
| Review architecture decision | | | |
### 7. Evidence Manifest
- S3 manifest path:
- Query notebooks:
- Relevant transcript IDs:
- Config snapshot:
### 8. Lessons Learned
- What changed in engineering practice:
- What changed in architecture:
- What changed in monitoring:
Architecture Decisions and Tradeoffs
| Decision | Choice | Why | Tradeoff |
|---|---|---|---|
| Raw transcript handling | Raw content only in restricted evidence bucket | Enables deep investigations without exposing raw PII in general logs | More operational complexity |
| Evidence storage | S3 Object Lock + KMS | Tamper-resistant and compliance-friendly | Harder to correct logging mistakes |
| Containment path | AppConfig kill switches | Fast, low-risk rollback | Requires up-front design for each intent |
| Retrieval governance | Fail closed on missing metadata | Prevents accidental public exposure | More ingest rejects and operational overhead |
| Blast-radius analysis | Correlation-ID centric traces | Faster root cause and user impact analysis | Requires disciplined propagation across services |
| False-positive posture | Investigate first, downgrade with evidence | Safer for security anomalies | More analyst time spent on benign cases |
Key Lessons
- Prompts are production code from a security perspective. If a prompt can change the probability of unsafe generation, it belongs in the same rollback and evidence system as application code.
- Most serious LLM incidents are cross-layer incidents. Root cause often sits in the seams between orchestration, retrieval, memory, and model behavior.
- Forensics requires provenance, not just logs. Knowing the final answer is not enough. You need prompt, retrieval, backend, and guardrail lineage.
- Containment speed depends on design done before the incident. Teams that need to invent a kill switch during a SEV-1 are already late.
- Authorization bugs in RAG systems are security incidents even when the model is factually correct. Correct content can still be unauthorized content.
- Near misses are leading indicators. A detector that catches generated PII before delivery is telling you the system is drifting toward a real incident.
Cross-References
- Prompt injection incidents: 01-prompt-injection-defense.md
- PII detection and protection: 02-pii-protection-data-privacy.md
- Guardrail architecture and failure modes: 03-guardrails-pipeline-deep-dive.md
- Abuse detection and moderation: 04-content-moderation-abuse-prevention.md
- Encryption and key management: 08-encryption-key-management.md
- Debugging playbooks: ../Debugging/03-debugging-scenarios.md
- Application logging architecture: ../Debugging/02-application-logging.md