
5. Incident Response and Security Forensics

Incident response for an LLM product is not just "watch for 500s and roll back bad code." MangaAssist can fail in ways that are subtle, distributed, and partially probabilistic:

  • a prompt change can open a new path for PII generation
  • a retrieval bug can surface an internal document without any infrastructure breach
  • a model regression can change behavior without a code diff
  • a state isolation bug can leak one user's data into another user's session
  • an anomaly can look like exfiltration but be a legitimate user asking for detail

For that reason, the incident program needs three things at the same time:

  1. very fast containment
  2. defensible forensic evidence
  3. enough telemetry to separate model behavior from application bugs, retrieval bugs, and infrastructure issues

This chapter expands the original material into a full operating model: lifecycle, HLD, LLD, data flow, evidence design, runbooks, scenario walkthroughs, and follow-up questions with deep-dive answers.


Why Incident Response Is Harder for LLM Systems

| Failure Mode | Why It Is Hard to Detect | Why It Is Hard to Prove | Evidence Required |
|---|---|---|---|
| Prompt-induced PII generation | It may affect only a narrow slice of prompts | The unsafe output may be generated, blocked, or redacted before users see it | Prompt version, FM output hash, guardrail decisions, redaction events |
| Cross-session leakage | It can happen only on warm containers or rare cache paths | A user report may be the first signal | Container ID, session IDs, prompt assembly payload, memory source trace |
| Retrieval ACL failure | Output may look factually correct, just over-authorized | Need to prove which chunk was retrieved and why | Retrieval document IDs, metadata, index snapshot |
| Model behavior shift | Same code, same prompt, different generation behavior | Often mistaken for application regressions | Model ID, region, model version, shadow outputs, canary traces |
| Suspicious long responses | Looks like exfiltration or dumping | Could be valid detailed help | Intent, retrieval chunk count, response token count, policy source list |

The key design principle is this:

Every customer-visible response must be reconstructable as a chain of evidence: user request -> auth context -> routing -> retrieval -> prompt assembly -> model output -> guardrail actions -> delivered response.

Without that chain, you can detect incidents, but you cannot investigate them rigorously.
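
In code terms, the chain is simply a linked set of records sharing one correlation ID. A minimal illustrative sketch follows; field names and values are examples, not a fixed schema:

# Illustrative only: one reconstructable evidence chain for a single response.
evidence_chain = {
    "request":         {"correlation_id": "corr_7c22f2d9", "request_hash": "sha256:8aa1..."},
    "auth_context":    {"customer_id_hash": "sha256:91c2..."},
    "routing":         {"intent": "recommendation", "deployment_id": "prompt-canary-2026-03-24-01"},
    "retrieval":       {"retrieval_snapshot_id": "kb-snap-0118", "document_ids": ["doc-442", "doc-519"]},
    "prompt_assembly": {"prompt_version": "rec-v37", "history_turns": 4},
    "model_output":    {"model_id": "anthropic.claude-3-5-sonnet", "region": "us-east-1"},
    "guardrails":      [{"stage": "pii_filter", "action": "modify"}],
    "delivered":       {"response_hash": "sha256:f102..."},
}

If any link is missing, the investigation has a gap it cannot close after the fact.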


Incident Response Objectives

| Objective | What Good Looks Like | Why It Matters |
|---|---|---|
| Fast containment | Kill switch or config rollback in under 15 minutes for SEV-1 | User harm grows while unsafe traffic continues |
| Evidence preservation | Relevant artifacts snapshotted before mutation or expiry | Prompt versions, KB state, and logs can change fast |
| Accurate classification | Real breach vs false positive vs quality issue is separated quickly | Overreacting causes avoidable outages; underreacting causes harm |
| Blast-radius assessment | Affected sessions, intents, and data elements can be enumerated | Notification, remediation, and legal review depend on scope |
| Systemic prevention | Root cause leads to code, process, and test changes | Repeating the same incident is an engineering failure |

Target operating thresholds:

  • SEV-1: contain in under 15 minutes
  • SEV-2: contain in under 60 minutes
  • SEV-3: triage and begin evidence preservation in under 4 hours
  • SEV-4: investigate in business hours unless a trigger escalates

Lifecycle and Command Model

Incident Lifecycle

stateDiagram-v2
    [*] --> Detect: Alert, user report, audit finding
    Detect --> Declare: Confirm incident or security anomaly
    Declare --> Contain: Kill switch, rollback, traffic shaping
    Declare --> Preserve: Snapshot evidence immediately
    Contain --> Investigate
    Preserve --> Investigate
    Investigate --> Eradicate: Remove root cause
    Eradicate --> Recover: Restore traffic safely
    Recover --> Monitor: Verify no recurrence
    Monitor --> Review: Post-incident review
    Review --> [*]: Action items tracked to closure

Parallel Workstreams During an Incident

In a serious incident, these happen in parallel, not sequentially:

  • Containment track: stop further harm
  • Forensics track: preserve evidence before it disappears
  • Comms track: notify on-call, security, legal, privacy, and support
  • Decision track: decide whether this is a breach, quality issue, abuse attempt, or false alarm

Response Roles

| Role | Primary Responsibility | Typical Owner |
|---|---|---|
| Incident Commander | Owns severity, priorities, timeline, and recovery decision | On-call engineering lead |
| Security Lead | Owns breach assessment and evidence integrity | Security engineer |
| Service Owner | Owns technical diagnosis and remediation | MangaAssist backend lead |
| Data Protection / Privacy | Owns legal notification requirements | Privacy or compliance lead |
| Scribe | Maintains decision log and exact timeline | Secondary on-call |
| Support Liaison | Coordinates user-facing support impact | Customer support lead |

The most common anti-pattern is having everyone investigate while nobody owns containment. MangaAssist explicitly assigns containment ownership to the Incident Commander from minute one.


Severity Matrix and Escalation Rules

| Severity | Definition | Containment SLA | Typical Examples |
|---|---|---|---|
| SEV-1 | Confirmed or strongly suspected cross-user data exposure, active breach, or unsafe output at scale | < 15 min | Cross-session order leak, internal policy exposure with sensitive thresholds, mass PII leakage |
| SEV-2 | Production security regression with limited scope or partial exposure | < 60 min | Prompt canary leaking obfuscated email patterns, guardrail bypass for one intent |
| SEV-3 | Security anomaly needing investigation, unclear exploitability | < 4 h to triage | Response-size anomaly, injection spike, elevated block rate |
| SEV-4 | Low-impact or single-session issue with no evidence of scale | < 72 h | One odd transcript, isolated policy mismatch, monitoring false positive |

Escalation Triggers

A lower severity becomes higher if any of the following is true (see the code sketch after this list):

  • multiple independent sessions show the same pattern
  • a user-visible leak involves another customer's data
  • the issue is trending upward after a rollout
  • the unsafe behavior affects a high-risk intent like order tracking or returns
  • external disclosure occurs on social media, support escalations, or bug bounty channels
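
These triggers are deterministic enough to encode directly. A minimal sketch, assuming an incident envelope with the fields shown (all names are illustrative):

HIGH_RISK_INTENTS = {"order_tracking", "return_request"}


def should_escalate(incident: dict) -> bool:
    # Any single trigger is sufficient to raise severity.
    return any([
        incident.get("independent_sessions", 0) > 1,
        incident.get("other_customer_data_visible", False),
        incident.get("trending_up_after_rollout", False),
        incident.get("intent") in HIGH_RISK_INTENTS,
        incident.get("external_disclosure", False),
    ])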

HLD: Incident Detection, Containment, and Forensics Plane

flowchart TB
    subgraph Runtime["MangaAssist Runtime"]
        U[User]
        G[API Gateway]
        O[Orchestrator]
        R[Retriever / KB]
        L[LLM on Bedrock]
        GR[Guardrails Pipeline]
        M[Conversation Memory]
        B[Backend Services<br/>Orders, Returns, Catalog]
        U --> G --> O
        O --> R
        O --> M
        O --> B
        O --> L
        L --> GR
        GR --> G
    end

    subgraph Observability["Observability and Control"]
        Logs[CloudWatch Logs]
        Metrics[CloudWatch Metrics / Alarms]
        Trail[CloudTrail]
        Events[EventBridge]
        Config[AWS AppConfig]
        Pager[SNS / PagerDuty / Slack]
    end

    subgraph Forensics["Incident Response and Forensics"]
        Router[Incident Router]
        SFN[Step Functions War Room Workflow]
        Evidence[Evidence Collector]
        Index[DynamoDB Incident Index]
        Audit[S3 Audit Bucket<br/>Object Lock + SSE-KMS]
        Athena[Athena / Logs Insights]
    end

    O --> Logs
    R --> Logs
    L --> Logs
    GR --> Logs
    G --> Metrics
    O --> Metrics
    Trail --> Events
    Logs --> Events
    Metrics --> Events
    Events --> Router --> SFN
    SFN --> Pager
    SFN --> Config
    SFN --> Evidence
    Evidence --> Index
    Evidence --> Audit
    Athena --> Audit

HLD Principles

  1. Runtime and response plane are separate. Incidents must still be manageable when the application plane is degraded.
  2. Containment is config-first. AppConfig kill switches are faster and safer than emergency redeploys.
  3. Evidence is immutable. The investigation uses append-only artifacts, not mutable app logs alone.
  4. Every incident gets a data package. Even false alarms produce a minimal evidence package so tuning is auditable.

End-to-End Incident Data Flow

sequenceDiagram
    participant User
    participant Gateway as API Gateway
    participant Orch as Orchestrator
    participant Guard as Guardrails
    participant Obs as Logs and Metrics
    participant EB as EventBridge
    participant IR as Incident Workflow
    participant CFG as AppConfig
    participant EV as Evidence Collector
    participant S3 as Immutable Audit Bucket
    participant IC as Incident Commander

    User->>Gateway: Chat request
    Gateway->>Orch: Authenticated message + session context
    Orch->>Guard: Candidate response + metadata
    Guard-->>Orch: pass / modify / block
    Orch->>Obs: Structured events, hashes, latency, versions
    Obs->>EB: Alarm or anomaly event
    EB->>IR: Create incident workflow
    IR->>IC: Page on-call and create incident channel
    IR->>CFG: Optional automated containment
    IR->>EV: Snapshot prompt version, retrieval docs, transcripts, config
    EV->>S3: Store immutable evidence bundle
    EV-->>IC: Evidence manifest and first triage summary

Important detail: evidence preservation begins as soon as an incident is declared, even before root cause is known. That prevents losing volatile artifacts such as canary prompt versions, temporary feature flags, or short-lived transcripts.


LLD: Core Response Components

flowchart LR
    subgraph Detection["Detection Layer"]
        A1[CloudWatch Alarm<br/>threshold or anomaly]
        A2[Custom Detector Lambda]
        A3[Support Ticket Ingest]
        A4[GuardDuty / CloudTrail Findings]
    end

    subgraph Routing["Incident Routing"]
        B1[EventBridge Rules]
        B2[Incident Router Lambda]
        B3[Severity Calculator]
    end

    subgraph Actions["Containment and Preservation"]
        C1[AppConfig Kill Switch API]
        C2[Evidence Collector Lambda]
        C3[Timeline Builder]
        C4[Pager and Slack Notifier]
    end

    subgraph Storage["Forensic Storage"]
        D1[DynamoDB Incident Table]
        D2[S3 Evidence Bucket<br/>Object Lock]
        D3[CloudWatch Logs]
        D4[CloudTrail Archive]
    end

    subgraph Query["Investigation"]
        E1[CloudWatch Logs Insights]
        E2[Athena]
        E3[Security Dashboard]
    end

    A1 --> B1
    A2 --> B1
    A3 --> B1
    A4 --> B1
    B1 --> B2 --> B3
    B3 --> C1
    B3 --> C2
    B3 --> C3
    B3 --> C4
    C2 --> D1
    C2 --> D2
    C3 --> D1
    D3 --> E1
    D2 --> E2
    D1 --> E3
    D4 --> E2

LLD Component Table

| Component | Implementation | What It Stores or Does | Failure It Helps Diagnose |
|---|---|---|---|
| Incident Router | Lambda behind EventBridge | Normalizes alerts into one incident envelope | Duplicate or fragmented alerting |
| Severity Calculator | Deterministic rules + overrides | Maps signal type, intent, and data sensitivity to severity | Slow or inconsistent triage |
| Kill Switch Controller | AppConfig update API | Disables intents, swaps model tiers, forces static fallback | Delayed containment |
| Evidence Collector | Lambda + Step Functions | Pulls transcripts, prompt versions, KB docs, config, hashes | Missing volatile evidence |
| Timeline Builder | Lambda over logs | Builds event-by-event chronology from correlation IDs | Confusing incident timelines |
| Incident Table | DynamoDB | Incident metadata, manifest, owners, actions | Lack of central status |
| Immutable Evidence Bucket | S3 with Object Lock + KMS | Forensic artifacts and signed manifests | Tampering or accidental deletion |
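
To make the Severity Calculator concrete, here is a minimal sketch of its deterministic first pass, reusing signal names from this chapter (rules and thresholds are illustrative, and humans can always override):

def calculate_severity(signal: str, data_sensitivity: str, confirmed_cross_user: bool) -> str:
    # Deterministic first pass; the on-call reviewer can override in either direction.
    if confirmed_cross_user:
        return "SEV-1"
    if data_sensitivity == "pii" and signal in {"pii-in-response-rate", "guardrail-bypass"}:
        return "SEV-2"
    if signal in {"response-size-anomaly", "injection-spike", "elevated-block-rate"}:
        return "SEV-3"
    return "SEV-4"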

Evidence Model and Chain of Custody

What Must Be Captured for Every Serious Incident

For SEV-1 and SEV-2, MangaAssist preserves:

  • user message hash and response hash
  • raw transcript in restricted evidence storage
  • correlation_id, session_id, customer_id_hash
  • prompt template version and resolved prompt text
  • retrieval document IDs and metadata
  • model ID, region, inference timestamp, and feature-flag state
  • guardrail stage outcomes and any redaction or modification metadata
  • backend call summaries for order, return, and catalog services
  • deployment version, Lambda container ID, and request ID where applicable

Correlation Keys

| Field | Purpose | Notes |
|---|---|---|
| correlation_id | Ties the end-to-end request together | Generated at the API edge and propagated everywhere |
| session_id | Groups multi-turn history | Stable across a conversation |
| request_id | Per-service request trace | Useful when correlation propagation breaks |
| deployment_id | Identifies code or prompt rollout version | Critical for rollback analysis |
| retrieval_snapshot_id | Identifies the exact KB or index snapshot | Critical in retrieval leaks |
| container_id | Ties events to warm runtime state | Important for Lambda state-contamination bugs |

PII-Safe Security Event Schema

{
  "timestamp": "2026-03-24T18:04:51.221Z",
  "event_type": "guardrail_decision",
  "severity_hint": "SEV-2",
  "correlation_id": "corr_7c22f2d9",
  "session_id": "sess_91fd7e",
  "request_id": "req_0be44d",
  "deployment_id": "prompt-canary-2026-03-24-01",
  "customer_id_hash": "sha256:91c2...",
  "intent": "recommendation",
  "model": {
    "provider": "bedrock",
    "model_id": "anthropic.claude-3-5-sonnet",
    "region": "us-east-1"
  },
  "guardrails": [
    {
      "stage": "pii_filter",
      "action": "modify",
      "reason_code": "OBFUSCATED_EMAIL_PATTERN",
      "latency_ms": 4.2
    }
  ],
  "request_hash": "sha256:8aa1...",
  "response_hash": "sha256:f102...",
  "feature_flags": {
    "order_intent_enabled": true,
    "model_tier": "primary",
    "prompt_version": "rec-v37"
  }
}

Restricted Evidence Manifest

{
  "incident_id": "inc_2026_03_24_017",
  "created_at": "2026-03-24T18:09:00Z",
  "severity": "SEV-1",
  "artifacts": [
    {
      "type": "transcript",
      "s3_key": "evidence/inc_2026_03_24_017/transcript_corr_7c22f2d9.json",
      "sha256": "96c0..."
    },
    {
      "type": "prompt_bundle",
      "s3_key": "evidence/inc_2026_03_24_017/prompt_bundle.json",
      "sha256": "d487..."
    },
    {
      "type": "retrieval_snapshot",
      "s3_key": "evidence/inc_2026_03_24_017/retrieval_docs.json",
      "sha256": "ef34..."
    }
  ],
  "approved_access": [
    "security_lead",
    "privacy_officer"
  ]
}

Chain of Custody

flowchart TD
    A[Alert or user report] --> B[Incident declared]
    B --> C[Evidence collector snapshots artifacts]
    C --> D[Hash each artifact]
    D --> E[Write to S3 evidence bucket with Object Lock]
    E --> F[Store manifest in DynamoDB]
    F --> G[Restricted access via IAM role + MFA]
    G --> H[Every evidence read is logged to CloudTrail]

This design matters because incident response is often challenged later by legal, compliance, or postmortem review. If evidence can be edited after the fact, the investigation is not trustworthy.
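
A hedged sketch of the hash-then-lock write path, assuming boto3, an Object Lock-enabled bucket, and a one-year retention window (the bucket name and retention period are placeholders):

import hashlib
import json
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")
EVIDENCE_BUCKET = "mangaassist-evidence"  # placeholder; must have Object Lock enabled


def write_locked_artifact(incident: dict, name: str, payload: dict) -> dict:
    # Hash first, then write immutably, so the manifest can prove integrity later.
    body = json.dumps(payload, sort_keys=True).encode()
    digest = hashlib.sha256(body).hexdigest()
    key = f"evidence/{incident['incident_id']}/{name}"
    s3.put_object(
        Bucket=EVIDENCE_BUCKET,
        Key=key,
        Body=body,
        ServerSideEncryption="aws:kms",
        ObjectLockMode="COMPLIANCE",
        ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=365),
    )
    return {"type": name.removesuffix(".json"), "s3_key": key, "sha256": digest}

The Evidence Collector shown later relies on a helper of exactly this shape for every artifact it snapshots.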


Implementation Details

1. Correlation ID Propagation

The API edge generates a correlation_id once. Every downstream service receives it and writes it to logs and traces.

import hashlib
import json
import logging
import os
import uuid
from datetime import datetime, timezone

logger = logging.getLogger(__name__)


def new_correlation_id() -> str:
    return "corr_" + uuid.uuid4().hex[:8]


def sha256(value: str) -> str:
    return "sha256:" + hashlib.sha256(value.encode()).hexdigest()


def utc_now_iso() -> str:
    return datetime.now(timezone.utc).isoformat()


def build_request_context(event: dict) -> dict:
    # Reuse the edge-generated correlation ID if present; mint one otherwise.
    headers = event.get("headers", {})
    correlation_id = headers.get("x-correlation-id") or new_correlation_id()
    return {
        "correlation_id": correlation_id,
        "session_id": event["session_id"],
        "customer_id_hash": sha256(event["customer_id"]),
        "deployment_id": os.environ["DEPLOYMENT_ID"],
        "prompt_version": os.environ["PROMPT_VERSION"],
        "container_id": os.environ.get("AWS_LAMBDA_LOG_STREAM_NAME", "unknown"),
    }


def emit_security_event(event_type: str, ctx: dict, detail: dict) -> None:
    # detail must already be PII-safe: hashes and reason codes only, never raw text.
    payload = {
        "timestamp": utc_now_iso(),
        "event_type": event_type,
        "correlation_id": ctx["correlation_id"],
        "session_id": ctx["session_id"],
        "customer_id_hash": ctx["customer_id_hash"],
        "deployment_id": ctx["deployment_id"],
        "detail": detail,
    }
    logger.info(json.dumps(payload))

2. Automated Incident Creation

CloudWatch alarms and custom detectors publish a normalized event to EventBridge:

{
  "source": "mangaassist.security",
  "detail-type": "security-anomaly",
  "detail": {
    "signal": "pii-in-response-rate",
    "severity_hint": "SEV-2",
    "intent": "recommendation",
    "threshold": 0.005,
    "observed": 0.0081,
    "deployment_id": "prompt-canary-2026-03-24-01"
  }
}

The Incident Router Lambda then (see the sketch after this list):

  1. deduplicates similar alerts
  2. enriches with deployment and intent metadata
  3. computes initial severity
  4. starts a Step Functions workflow
  5. optionally triggers automated containment for known playbooks
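
A sketch of steps 1 and 4, assuming a DynamoDB dedup table and a Step Functions state machine ARN supplied via environment variables (both names are placeholders):

import json
import os

import boto3

dynamodb = boto3.resource("dynamodb")
sfn = boto3.client("stepfunctions")
incidents = dynamodb.Table(os.environ["INCIDENT_TABLE"])


def handle_anomaly(event: dict, context) -> None:
    detail = event["detail"]
    # Dedup: at most one open incident per (signal, deployment) pair.
    dedup_key = f"{detail['signal']}#{detail.get('deployment_id', 'unknown')}"
    try:
        incidents.put_item(
            Item={"dedup_key": dedup_key, "status": "open", "detail": json.dumps(detail)},
            ConditionExpression="attribute_not_exists(dedup_key)",
        )
    except incidents.meta.client.exceptions.ConditionalCheckFailedException:
        return  # An open incident already covers this signal.
    sfn.start_execution(
        stateMachineArn=os.environ["WAR_ROOM_STATE_MACHINE_ARN"],
        input=json.dumps({"dedup_key": dedup_key, "detail": detail}),
    )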

3. Containment Through Config, Not Code

Emergency containment is pre-wired in AppConfig:

{
  "intent_controls": {
    "order_tracking": { "enabled": true, "mode": "dynamic" },
    "return_request": { "enabled": true, "mode": "dynamic" },
    "recommendation": { "enabled": true, "mode": "dynamic" }
  },
  "emergency_controls": {
    "global_static_fallback": false,
    "force_safe_model_tier": false,
    "disable_personalized_context": false
  }
}

Typical containment actions (one is sketched in code after this list):

  • disable one intent
  • disable personalized context injection
  • force static fallback for high-risk routes
  • switch from canary prompt to stable prompt
  • switch from primary model tier to safer fallback tier
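
A sketch of the kill-switch call path, assuming the AppConfig application, environment, profile, and deployment-strategy IDs were provisioned ahead of time (all IDs below are placeholders):

import json

import boto3

appconfig = boto3.client("appconfig")

# Placeholder IDs; the real values exist before any incident occurs.
APP_ID, ENV_ID, PROFILE_ID, STRATEGY_ID = "app-1", "env-prod", "prof-1", "strat-fast"


def disable_intent(config: dict, intent: str) -> None:
    # Push a new config version with one intent disabled, then deploy immediately.
    config["intent_controls"][intent]["enabled"] = False
    version = appconfig.create_hosted_configuration_version(
        ApplicationId=APP_ID,
        ConfigurationProfileId=PROFILE_ID,
        Content=json.dumps(config).encode(),
        ContentType="application/json",
    )
    appconfig.start_deployment(
        ApplicationId=APP_ID,
        EnvironmentId=ENV_ID,
        ConfigurationProfileId=PROFILE_ID,
        ConfigurationVersion=str(version["VersionNumber"]),
        DeploymentStrategyId=STRATEGY_ID,
    )

Assuming the orchestrator already polls AppConfig for its flags, no redeploy is needed for the change to take effect.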

4. Evidence Collector

def preserve_incident_evidence(incident: dict) -> dict:
    """Snapshot volatile artifacts the moment an incident is declared.

    The fetch_* helpers wrap Logs Insights, prompt-store, and AppConfig reads;
    write_locked_artifact hashes each payload and stores it under Object Lock,
    returning a manifest entry (see the chain-of-custody sketch above).
    """
    correlation_ids = incident["correlation_ids"]
    artifacts = []

    # Raw transcripts are volatile: retention policies can expire them quickly.
    transcripts = fetch_raw_transcripts(correlation_ids)
    artifacts.append(write_locked_artifact(incident, "transcripts.json", transcripts))

    # The exact prompt bundle for the deployment under suspicion.
    prompt_bundle = fetch_prompt_bundle(incident["deployment_id"])
    artifacts.append(write_locked_artifact(incident, "prompt_bundle.json", prompt_bundle))

    # The retrieval documents actually used, keyed by correlation ID.
    retrieval_docs = fetch_retrieval_documents(correlation_ids)
    artifacts.append(write_locked_artifact(incident, "retrieval_docs.json", retrieval_docs))

    # Feature flags and kill-switch state at the time of declaration.
    config_snapshot = fetch_appconfig_snapshot()
    artifacts.append(write_locked_artifact(incident, "config_snapshot.json", config_snapshot))

    manifest = {
        "incident_id": incident["incident_id"],
        "created_at": utc_now_iso(),
        "artifacts": artifacts,
    }
    put_manifest(manifest)
    return manifest

5. Investigation Queries

CloudWatch Logs Insights example for a prompt canary incident:

fields @timestamp, correlation_id, detail.reason_code, deployment_id, intent
| filter event_type = "guardrail_decision"
| filter detail.reason_code = "OBFUSCATED_EMAIL_PATTERN"
| stats count(*) as hits by deployment_id, intent, bin(5m) as window
| sort window desc

Athena example for blast radius estimation:

SELECT
  deployment_id,
  count(DISTINCT session_id) AS affected_sessions,
  count(*) AS affected_events
FROM forensic_events
WHERE event_date BETWEEN DATE '2026-03-21' AND DATE '2026-03-24'
  AND event_type = 'cross_session_leak_suspected'
GROUP BY deployment_id;

Investigation Decision Tree

flowchart TD
    A[Unsafe or suspicious response observed] --> B{Was the bad content in delivered response?}
    B -->|No| C[Near miss only<br/>guardrail caught or modified it]
    B -->|Yes| D[User-visible incident]

    C --> E{Where was it introduced?}
    D --> E

    E -->|In retrieved chunk| F[Retrieval or ACL issue]
    E -->|In backend payload| G[Service authorization or data bug]
    E -->|In prompt history| H[Memory or session isolation issue]
    E -->|Only in model output| I[FM hallucination or prompt-induced generation]
    E -->|Not reproducible| J[Need wider sampling and shadow replay]

    F --> K[Check document IDs, metadata, index snapshot, ACL tags]
    G --> L[Check service auth, caller identity, upstream payloads]
    H --> M[Check session boundaries, cache keys, container state]
    I --> N[Check prompt version, model version, guardrail thresholds]

This decision tree prevents teams from blaming the model too early. In practice, a large fraction of "LLM incidents" are application-state or retrieval-governance issues.


Detailed Scenarios

The scenarios below are written the way a strong incident responder would explain them in design review or interviews:

  • what happened
  • how it was detected
  • how containment was performed
  • what evidence proved root cause
  • what changed afterward
  • what follow-up questions usually come next

Scenario 1: Prompt Canary Introduced an Obfuscated PII Leak

Context

The recommendation prompt was updated to sound more community-aware and conversational. The change went through offline evaluation and a 10 percent canary, but it created a subtle failure mode:

  • the new prompt encouraged the FM to mention how readers could "follow creators"
  • the model sometimes responded with invented contact details
  • direct email patterns were mostly blocked
  • obfuscated formats like author name [at] publisher [dot] com slipped through

Detection

  • pii-in-response-rate alarm crossed the 0.5% threshold for canary traffic
  • pii_near_miss_rate also spiked, which meant the FM was generating more PII-like content even when guardrails caught some of it
  • the spike was isolated to prompt_version = rec-v37-canary

Sequence of Events

sequenceDiagram
    participant User
    participant Orch as Orchestrator
    participant Prompt as Prompt v37 Canary
    participant LLM as Bedrock Model
    participant PII as PII Filter
    participant Metrics as CloudWatch
    participant Oncall as On-call Engineer
    participant Config as AppConfig

    User->>Orch: "Recommend manga by indie creators"
    Orch->>Prompt: Build prompt with canary version
    Prompt->>LLM: Prompt asks for richer creator context
    LLM-->>PII: Response includes obfuscated contact pattern
    PII-->>Metrics: Partial misses accumulate
    Metrics-->>Oncall: PII rate alarm fires
    Oncall->>Config: Roll back prompt canary
    Config-->>Orch: Stable prompt restored

Containment

Containment took 17 minutes:

  1. rolled back the canary prompt through AppConfig
  2. verified pii-in-response-rate returned to baseline within two metric windows
  3. disabled the specific creator-community enrichment clause while investigation continued

This is exactly why prompt versions are treated as deployable, reversible artifacts.

Forensic Investigation

The investigation used four evidence sources:

  1. Prompt diff: rec-v36 vs rec-v37
  2. Guardrail decision logs: showed the miss pattern only on obfuscated formats
  3. Canary segmentation: proved the issue was isolated to the new prompt
  4. Transcript replay: confirmed reproducibility against the same prompt bundle

Key forensic clue:

  • the unsafe string was not present in retrieval data
  • it was not present in any backend payload
  • it appeared only in FM output after the prompt wording changed

That ruled out retrieval leaks and upstream data exposure.

Root Cause

The prompt added an instruction that increased the probability of the FM generating contact-like information. The PII filter was tuned for direct email syntax, not obfuscated variants.

Remediation

  1. removed the risky prompt clause
  2. expanded the PII detector with obfuscated email regexes (see the sketch after this list)
  3. added adversarial tests specifically for creator-contact generation
  4. added a pre-prod review question: "Does this prompt create a new path for generating sensitive information?"
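
A hedged example of the kind of pattern added in step 2; a real detector needs a broader family of patterns plus benchmark tests:

import re

# Matches obfuscated email shapes like "name [at] host [dot] com".
# Illustrative only; production detectors need much wider coverage.
OBFUSCATED_EMAIL = re.compile(
    r"\b[\w.+-]+\s*(?:@|\[\s*at\s*\]|\(\s*at\s*\))\s*[\w-]+"
    r"\s*(?:\.|\[\s*dot\s*\]|\(\s*dot\s*\))\s*[a-z]{2,}\b",
    re.IGNORECASE,
)

assert OBFUSCATED_EMAIL.search("author name [at] publisher [dot] com")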

Verification

  • replayed the affected transcripts against stable and fixed prompt versions
  • ran adversarial suite with direct and obfuscated PII patterns
  • monitored pii_near_miss_rate for 24 hours

Follow-Up Questions and Deep-Dive Answers

Q1. Why did offline testing miss this?

Offline tests only checked whether direct PII leaked in final responses. They did not test induction risk: whether a prompt instruction could cause the model to invent sensitive-looking content that the downstream detector might miss. After this incident, the prompt evaluation suite added:

  • induced PII generation probes
  • obfuscated PII variants
  • near-miss metrics, not just delivered-leak metrics

Q2. How did you prove it was the prompt and not a model update?

The strongest evidence was segmentation:

  • only rec-v37-canary traffic showed the spike
  • same model, same region, same backend context under rec-v36 did not
  • replaying identical transcripts with the old prompt removed the issue

That is the difference between correlation and causation in an incident review.

Q3. Why not block every response that contains anything PII-like?

Because the recommendation flow still needs to be usable. Blanket blocking would create high false positives and degrade customer experience. The right policy is layered (sketched after this list):

  • redact when the content is clearly removable
  • block when the response is unsafe or materially untrustworthy
  • track near misses because they are leading indicators of prompt risk
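
A minimal sketch of that layering as a decision function; the field names and confidence threshold are illustrative:

def pii_policy_action(finding: dict) -> str:
    # Low-confidence detections are recorded as near misses rather than acted on.
    if finding["confidence"] < 0.8:
        return "track_near_miss"
    # High-confidence content that can be cleanly removed is redacted in place.
    if finding["removable"]:
        return "redact"
    # Otherwise the whole response is unsafe to deliver.
    return "block"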

Scenario 2: Response-Size Spike Looked Like Exfiltration but Was Legitimate

Context

A CloudWatch anomaly detector flagged that FAQ responses were about 3x longer than baseline. Long responses are suspicious because they can indicate:

  • prompt injection causing policy dumps
  • retrieval over-sharing
  • internal document leakage
  • scripted exfiltration attempts

But long responses alone do not prove malicious behavior.

Sequence of Events

sequenceDiagram
    participant Detector as Response Size Detector
    participant Router as Incident Router
    participant Analyst as On-call Analyst
    participant Logs as Logs Insights
    participant KB as Knowledge Base
    participant User

    Detector->>Router: FAQ length anomaly
    Router->>Analyst: Create SEV-3 investigation
    Analyst->>Logs: Pull top long responses
    Analyst->>KB: Inspect retrieved source docs
    Logs-->>Analyst: One user, repeated detailed policy questions
    KB-->>Analyst: Only customer-facing policy docs retrieved
    Analyst-->>Router: Downgrade to SEV-4 false alarm

Investigation Details

The analyst checked:

  1. Which intents were affected? Only FAQ, not recommendations or orders.
  2. Which users were involved? One authenticated long-tenure customer.
  3. Which documents were retrieved? Only customer-facing return policy documents.
  4. Were internal-only chunks retrieved? No.
  5. Was the user escalating question specificity? Yes, each question added more edge-case detail.

The response growth was explained by the RAG system doing exactly what it was designed to do: return more policy detail as the question became more specific.

Why This Still Mattered

Even though it was not a breach, the alert was useful because it showed a blind spot:

  • anomaly detection was not intent-aware enough
  • FAQ traffic has a very different length distribution from recommendation traffic
  • response length without source-scope context generates too many false alarms

Fixes

  1. changed the detector to baseline by intent
  2. incorporated retrieved document count and source audience into the anomaly score (see the sketch after this list)
  3. capped FAQ response length at 300 tokens and linked out for exhaustive detail
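
A sketch of the improved detector, assuming per-intent length baselines and the event fields shown (weights are illustrative):

from statistics import mean, stdev


def anomaly_score(event: dict, baseline_lengths_by_intent: dict) -> float:
    # Compare against the intent's own length distribution, not a global one.
    lengths = baseline_lengths_by_intent[event["intent"]]
    z = (event["response_tokens"] - mean(lengths)) / (stdev(lengths) or 1.0)
    score = max(z, 0.0)
    if event["retrieved_chunks"] > 8:            # unusually broad retrieval
        score *= 1.5
    if "internal" in event["source_audiences"]:  # never valid for customer traffic
        score += 10.0
    return score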

Follow-Up Questions and Deep-Dive Answers

Q1. How do you avoid alert fatigue without missing real exfiltration?

You do not remove the alert. You improve its context. MangaAssist changed from a one-dimensional metric to a richer detector:

  • response length by intent
  • retrieved chunk count
  • source audience labels
  • per-user burst behavior

That preserved sensitivity while reducing noise.

Q2. Why not classify this as not-an-incident immediately?

Because long output is a real exfiltration signal. The right posture is "investigate first, downgrade with evidence." For security anomalies, false positives are acceptable; false dismissals are not.

Q3. What evidence would have escalated this to a true breach?

Any of the following:

  • internal-only document IDs in retrieval logs
  • multiple sessions showing the same dump pattern
  • prompt text indicating successful instruction override
  • backend payloads containing over-authorized data

Scenario 3: Cross-Session Order Data Leak Caused by Lambda Warm-Container State

Context

A user reported that the bot mentioned someone else's order. This was treated as SEV-1 immediately because cross-user data exposure is a breach until disproven.

Sequence of Events

sequenceDiagram
    participant UserA
    participant UserB
    participant Lambda as Warm Lambda Container
    participant Memory as In-Memory Global State
    participant Order as Order Service
    participant LLM
    participant Support
    participant IC as Incident Commander

    UserA->>Lambda: Prior session request
    Lambda->>Memory: Store summarized turn in global list
    UserB->>Lambda: New order question on reused container
    Lambda->>Order: Fetch UserB order
    Lambda->>LLM: Prompt includes UserB order + stale UserA summary
    LLM-->>UserB: Mixed response with foreign order data
    UserB->>Support: "Bot told me about someone else's order"
    Support->>IC: Escalate immediately

Immediate Containment

Within 8 minutes:

  1. disabled all order-related intents through AppConfig
  2. forced a static fallback to the existing "Your Orders" page
  3. opened a SEV-1 incident channel and pulled privacy and security in immediately

This was a textbook example of why high-risk intents need pre-built kill switches.

Forensic Investigation

The investigation did not start by blaming the model. It traced prompt assembly.

Evidence chain:

  1. Delivered transcript contained an extra order reference.
  2. Order service logs showed the backend returned only the correct user's order.
  3. Prompt assembly snapshot showed an extra conversation-history summary.
  4. Container ID analysis showed that the same Lambda container handled another customer's prior session.
  5. Code inspection found mutable global state used for conversation history.

Problematic pattern:

# Bug: mutable global state survives warm Lambda invocations.
conversation_history = []


def handler(event, context):
    conversation_history.append(event["new_turn"])
    prompt = build_prompt(conversation_history)
    return generate_response(prompt)

Corrected pattern:

# Fix: history is request-scoped and keyed by session_id, so nothing
# survives warm-container reuse across different users.
def handler(event, context):
    conversation_history = load_history_for_session(event["session_id"])
    conversation_history.append(event["new_turn"])
    prompt = build_prompt(conversation_history)
    return generate_response(prompt)

Root Cause

This was not a model issue. It was a session-isolation bug caused by warm-container reuse combined with mutable global state.

Blast Radius Analysis

Blast radius could not rely on the single reported session. The team queried:

  • all invocations sharing the affected container pattern
  • all prompts assembled with mixed customer_id_hash evidence
  • all order intents during the last three days

The result identified 23 sessions with potential contamination.
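
The mixed-identity signature can be queried directly. A hedged Logs Insights sketch, assuming the prompt-assembly step emits one event per contributed memory source, each carrying that source's customer_id_hash:

fields container_id, correlation_id, customer_id_hash
| filter event_type = "prompt_assembly"
| stats count_distinct(customer_id_hash) as distinct_customers by container_id, correlation_id
| sort distinct_customers desc

Any row with distinct_customers above 1 is a candidate contaminated session.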

Remediation

  1. fixed the handler to make state request-scoped
  2. scanned all Lambda functions for mutable globals
  3. added CI linting to block mutable module-level state in request handlers
  4. added a two-user integration test that forces warm container reuse patterns (sketched below)
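
A hedged pytest sketch of that test; chat_handler is an assumed module holding the fixed handler, and its collaborators are patched so the test can inspect exactly what each user's prompt contained:

import chat_handler  # assumed module containing handler, build_prompt, etc.


def test_warm_container_does_not_leak_history(monkeypatch):
    prompts, store = [], {"sess_A": [], "sess_B": []}
    monkeypatch.setattr(chat_handler, "load_history_for_session", lambda sid: store[sid])
    monkeypatch.setattr(chat_handler, "build_prompt",
                        lambda history: prompts.append(list(history)) or "stub-prompt")
    monkeypatch.setattr(chat_handler, "generate_response", lambda prompt: {"ok": True})

    # Same process simulates a warm container; two different sessions back to back.
    chat_handler.handler({"session_id": "sess_A", "new_turn": "Where is order 111?"}, None)
    chat_handler.handler({"session_id": "sess_B", "new_turn": "Where is my order?"}, None)

    # Session B's assembled history must contain nothing from session A.
    assert not any("111" in turn for turn in prompts[1])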

Verification

  • replayed dual-session tests
  • verified no cross-user content in sampled prompt assemblies
  • re-enabled order intents only after the evidence review passed

Follow-Up Questions and Deep-Dive Answers

Q1. How did you prove the order service was not the source of the leak?

By comparing three artifacts:

  • upstream order-service response
  • prompt assembly payload
  • delivered FM output

The extra order reference appeared in prompt history but not in the authorized service response. That is conclusive evidence that the leak happened in the application layer before generation.

Q2. How did you estimate blast radius if only one user reported it?

We looked for the signature of the failure, not just more complaints:

  • repeated container IDs
  • mismatched customer_id_hash values inside one prompt assembly
  • order intents executed on warm containers in the affected deployment window

That is the right way to quantify rare but serious isolation bugs.

Q3. Why disable all order intents instead of just one endpoint?

Because the trust boundary was unclear at first. When customer data separation is in doubt, containment should be broader than the suspected code path. Once isolation was re-verified, traffic could be restored safely.


Scenario 4: Internal Returns Policy Was Exposed Through a Retrieval ACL Bug

Context

A support agent escalated a transcript where the chatbot explained internal fraud-review thresholds that customers should never see. This looked like a model hallucination at first because the answer was fluent and specific. It turned out to be a retrieval governance bug.

Sequence of Events

sequenceDiagram
    participant User
    participant Orch as Orchestrator
    participant Search as RAG Search
    participant KB as OpenSearch Index
    participant LLM
    participant Guard as Guardrails
    participant Support
    participant Sec as Security Lead

    User->>Orch: "Why was my return denied?"
    Orch->>Search: Query KB with return-policy intent
    Search->>KB: Retrieve top documents
    KB-->>Search: Includes internal SOP chunk tagged incorrectly
    Search-->>LLM: Customer docs + internal chunk
    LLM-->>Guard: Response includes internal thresholds
    Guard-->>User: Passes because content is not toxic or PII
    User->>Support: Questions internal rules shown by bot
    Support->>Sec: Escalate retrieval leak

Why This Incident Is Important

This is a classic LLM-era incident:

  • infrastructure is healthy
  • the model is technically grounded in retrieved data
  • the answer is factually correct
  • the failure is authorization, not generation quality

Traditional app monitoring often misses this because nothing "breaks."

Forensic Investigation

The team checked:

  1. retrieval document IDs from the transcript's correlation_id
  2. index metadata for each chunk
  3. source S3 object metadata and ingestion logs
  4. access-control tags applied during chunking

They found:

  • an internal returns SOP was ingested into the customer-facing index
  • the audience=internal metadata tag was missing on a batch of chunks
  • the reranker boosted the internal chunk because it contained precise denial criteria

Root Cause

The ingestion job treated missing audience metadata as public by default. That was a governance flaw. Missing classification should have failed closed, not open.

Containment

  1. removed the affected chunk IDs from the index
  2. disabled returns-policy retrieval while rebuilding the filtered snapshot
  3. patched ingestion so unclassified documents are quarantined instead of indexed

Remediation

  1. changed metadata policy from default-public to default-quarantine
  2. added a pre-index validator requiring audience, data_classification, and owner (sketched below)
  3. added a retrieval guardrail that blocks non-customer audiences at serve time
  4. added canary tests that intentionally probe for internal policy exposure
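
A minimal sketch of the fail-closed validator from steps 1 and 2; field names follow the incident description, and the return values are illustrative:

REQUIRED_METADATA = ("audience", "data_classification", "owner")


def validate_for_indexing(doc: dict) -> str:
    # Fail closed: anything unclassified is quarantined, never indexed.
    metadata = doc.get("metadata", {})
    missing = [key for key in REQUIRED_METADATA if not metadata.get(key)]
    if missing:
        return "quarantine"
    # Only customer-audience documents may enter the customer-facing index.
    if metadata["audience"] != "customer":
        return "reject"
    return "index"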

Verification

  • rebuilt the index from clean source manifests
  • replayed the affected query set
  • sampled retrieval results for all returns-related intents

Follow-Up Questions and Deep-Dive Answers

Q1. How did you know this was retrieval leakage and not model memorization?

Because the exact internal terminology was present in the retrieved chunk IDs tied to the incident's correlation_id. If the content is in the retrieval trace, you have concrete provenance. Model memorization is a fallback hypothesis only when the content is absent from prompt, retrieval, and backend data.

Q2. Why did the guardrails not stop this?

The existing guardrails focused on toxicity, competitor mentions, pricing, PII, and scope. They did not enforce document audience labels. This incident showed that content safety and authorization safety are different layers.

Q3. What is the long-term fix: better prompts or better retrieval governance?

Retrieval governance. Prompts can reduce risk, but they should not be the primary control for authorization. The durable fix is:

  • fail-closed metadata handling
  • index-time validation
  • serve-time audience enforcement
  • forensic retrieval traces for every answer

First 15 Minutes: Practical Runbook

When a real incident hits, the first mistake teams make is doing too much analysis before containment. MangaAssist uses the following first-15-minute checklist.

| Minute Window | Action | Goal |
|---|---|---|
| 0-5 | Confirm signal, assign Incident Commander, open incident channel | Avoid ownership confusion |
| 0-5 | Snapshot volatile evidence immediately | Preserve prompt versions, config, transcripts |
| 5-10 | Trigger kill switch or rollback if impact could be user-visible | Stop ongoing harm |
| 5-10 | Classify severity conservatively | Under-classification is more dangerous |
| 10-15 | Pull initial blast-radius query and identify affected intent or deployment | Bound scope early |

Minimal First-15-Minute Questions

  • Is this user-visible?
  • Is another user's data involved?
  • Did a recent prompt, model, or config change correlate with onset?
  • Can we contain by config instead of code?
  • What evidence might disappear if we wait?

First Hour: Forensic Checklist

Within the first hour, the team should answer:

  1. Where was the bad content introduced? Prompt, retrieval, backend payload, conversation memory, or model-only generation.

  2. What is the blast radius? How many sessions, intents, users, and data elements are potentially affected.

  3. What version boundary explains the incident? Deployment ID, prompt version, KB snapshot, model tier, or region.

  4. What prevents recurrence right now? Temporary block, rollback, stricter guardrail, index quarantine, or route disablement.

  5. What external obligations exist? Privacy notification, customer support messaging, or legal review.


Post-Incident Review Template

## Incident Review: [TITLE]

### 1. Executive Summary
- Severity:
- Start time:
- End time:
- User impact:
- Data impact:
- Detection source:

### 2. Exact Timeline
| Time | Event | Actor | Evidence |
|---|---|---|---|
| T+0 | Alert or report received | Detector / user | Incident envelope |
| T+Xm | Containment started | Incident Commander | AppConfig change ID |
| T+Xm | Evidence snapshotted | Security lead | Manifest ID |
| T+Xh | Root cause identified | Service owner | Query / code diff |
| T+Xh | Fix deployed | Engineering | Deployment ID |
| T+Xh | Recovery approved | Incident Commander | Verification report |

### 3. Blast Radius
- Sessions affected:
- Users affected:
- Intents affected:
- Data classes affected:
- Confidence level of estimate:

### 4. Technical Root Cause
- Immediate cause:
- Contributing factors:
- Why existing controls failed:

### 5. Detection Analysis
- What signal fired:
- What did not fire but should have:
- Mean time to detect:
- Mean time to contain:

### 6. Corrective Actions
| Action | Owner | Due Date | Status |
|---|---|---|---|
| Add regression test |  |  |  |
| Add detector or guardrail |  |  |  |
| Update runbook |  |  |  |
| Review architecture decision |  |  |  |

### 7. Evidence Manifest
- S3 manifest path:
- Query notebooks:
- Relevant transcript IDs:
- Config snapshot:

### 8. Lessons Learned
- What changed in engineering practice:
- What changed in architecture:
- What changed in monitoring:

Architecture Decisions and Tradeoffs

| Decision | Choice | Why | Tradeoff |
|---|---|---|---|
| Raw transcript handling | Raw content only in restricted evidence bucket | Enables deep investigation without exposing raw PII in general logs | More operational complexity |
| Evidence storage | S3 Object Lock + KMS | Tamper-resistant and compliance-friendly | Harder to correct logging mistakes |
| Containment path | AppConfig kill switches | Fast, low-risk rollback | Requires up-front design for each intent |
| Retrieval governance | Fail closed on missing metadata | Prevents accidental public exposure | More ingest rejects and operational overhead |
| Blast-radius analysis | Correlation-ID-centric traces | Faster root cause and user-impact analysis | Requires disciplined propagation across services |
| False-positive posture | Investigate first, downgrade with evidence | Safer for security anomalies | More analyst time spent on benign cases |

Key Lessons

  1. Prompts are production code from a security perspective. If a prompt can change the probability of unsafe generation, it belongs in the same rollback and evidence system as application code.

  2. Most serious LLM incidents are cross-layer incidents. Root cause often sits in the seams between orchestration, retrieval, memory, and model behavior.

  3. Forensics requires provenance, not just logs. Knowing the final answer is not enough. You need prompt, retrieval, backend, and guardrail lineage.

  4. Containment speed depends on design done before the incident. Teams that need to invent a kill switch during a SEV-1 are already late.

  5. Authorization bugs in RAG systems are security incidents even when the model is factually correct. Correct content can still be unauthorized content.

  6. Near misses are leading indicators. A detector that catches generated PII before delivery is telling you the system is drifting toward a real incident.


Cross-References