
5. Incident Response and Security Forensics

Incident response for an LLM product is not just "watch for 500s and roll back bad code." MangaAssist can fail in ways that are subtle, distributed, and partially probabilistic:

  • a prompt change can open a new path for PII generation
  • a retrieval bug can surface an internal document without any infrastructure breach
  • a model regression can change behavior without a code diff
  • a state isolation bug can leak one user's data into another user's session
  • an anomaly can look like exfiltration but be a legitimate user asking for detail

For that reason, the incident program needs three things at the same time:

  1. very fast containment
  2. defensible forensic evidence
  3. enough telemetry to separate model behavior from application bugs, retrieval bugs, and infrastructure issues

This chapter expands the original material into a full operating model: lifecycle, HLD, LLD, data flow, evidence design, runbooks, scenario walkthroughs, and follow-up questions with deep-dive answers.


Why Incident Response Is Harder for LLM Systems

| Failure Mode | Why It Is Hard to Detect | Why It Is Hard to Prove | Evidence Required |
|---|---|---|---|
| Prompt-induced PII generation | It may affect only a narrow slice of prompts | The unsafe output may be generated, blocked, or redacted before users see it | Prompt version, FM output hash, guardrail decisions, redaction events |
| Cross-session leakage | It can happen only on warm containers or rare cache paths | A user report may be the first signal | Container ID, session IDs, prompt assembly payload, memory source trace |
| Retrieval ACL failure | Output may look factually correct, just over-authorized | Need to prove which chunk was retrieved and why | Retrieval document IDs, metadata, index snapshot |
| Model behavior shift | Same code, same prompt, different generation behavior | Often mistaken for application regressions | Model ID, region, model version, shadow outputs, canary traces |
| Suspicious long responses | Looks like exfiltration or dumping | Could be valid detailed help | Intent, retrieval chunk count, response token count, policy source list |

The key design principle is this:

Every customer-visible response must be reconstructable as a chain of evidence: user request -> auth context -> routing -> retrieval -> prompt assembly -> model output -> guardrail actions -> delivered response.

Without that chain, you can detect incidents, but you cannot investigate them rigorously.
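
In code terms, the chain is simply a linked set of records sharing one correlation ID. A minimal illustrative sketch follows; field names and values are examples, not a fixed schema:

# Illustrative only: one reconstructable evidence chain for a single response.
evidence_chain = {
    "request":         {"correlation_id": "corr_7c22f2d9", "request_hash": "sha256:8aa1..."},
    "auth_context":    {"customer_id_hash": "sha256:91c2..."},
    "routing":         {"intent": "recommendation", "deployment_id": "prompt-canary-2026-03-24-01"},
    "retrieval":       {"retrieval_snapshot_id": "kb-snap-0118", "document_ids": ["doc-442", "doc-519"]},
    "prompt_assembly": {"prompt_version": "rec-v37", "history_turns": 4},
    "model_output":    {"model_id": "anthropic.claude-3-5-sonnet", "region": "us-east-1"},
    "guardrails":      [{"stage": "pii_filter", "action": "modify"}],
    "delivered":       {"response_hash": "sha256:f102..."},
}

If any link is missing, the investigation has a gap it cannot close after the fact.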


Incident Response Objectives

| Objective | What Good Looks Like | Why It Matters |
|---|---|---|
| Fast containment | Kill switch or config rollback in under 15 minutes for SEV-1 | User harm grows while unsafe traffic continues |
| Evidence preservation | Relevant artifacts snapshotted before mutation or expiry | Prompt versions, KB state, and logs can change fast |
| Accurate classification | Real breach vs false positive vs quality issue is separated quickly | Overreacting causes avoidable outages; underreacting causes harm |
| Blast-radius assessment | Affected sessions, intents, and data elements can be enumerated | Notification, remediation, and legal review depend on scope |
| Systemic prevention | Root cause leads to code, process, and test changes | Repeating the same incident is an engineering failure |

Target operating thresholds:

  • SEV-1: contain in under 15 minutes
  • SEV-2: contain in under 60 minutes
  • SEV-3: triage and begin evidence preservation in under 4 hours
  • SEV-4: investigate in business hours unless a trigger escalates

Lifecycle and Command Model

Incident Lifecycle

stateDiagram-v2
    [*] --> Detect: Alert, user report, audit finding
    Detect --> Declare: Confirm incident or security anomaly
    Declare --> Contain: Kill switch, rollback, traffic shaping
    Declare --> Preserve: Snapshot evidence immediately
    Contain --> Investigate
    Preserve --> Investigate
    Investigate --> Eradicate: Remove root cause
    Eradicate --> Recover: Restore traffic safely
    Recover --> Monitor: Verify no recurrence
    Monitor --> Review: Post-incident review
    Review --> [*]: Action items tracked to closure

Parallel Workstreams During an Incident

In a serious incident, these happen in parallel, not sequentially:

  • Containment track: stop further harm
  • Forensics track: preserve evidence before it disappears
  • Comms track: notify on-call, security, legal, privacy, and support
  • Decision track: decide whether this is a breach, quality issue, abuse attempt, or false alarm

Response Roles

| Role | Primary Responsibility | Typical Owner |
|---|---|---|
| Incident Commander | Owns severity, priorities, timeline, and recovery decision | On-call engineering lead |
| Security Lead | Owns breach assessment and evidence integrity | Security engineer |
| Service Owner | Owns technical diagnosis and remediation | MangaAssist backend lead |
| Data Protection / Privacy | Owns legal notification requirements | Privacy or compliance lead |
| Scribe | Maintains decision log and exact timeline | Secondary on-call |
| Support Liaison | Coordinates user-facing support impact | Customer support lead |

The most common anti-pattern is having everyone investigate while nobody owns containment. MangaAssist explicitly assigns containment ownership to the Incident Commander from minute one.


Severity Matrix and Escalation Rules

| Severity | Definition | Containment SLA | Typical Examples |
|---|---|---|---|
| SEV-1 | Confirmed or strongly suspected cross-user data exposure, active breach, or unsafe output at scale | < 15 min | Cross-session order leak, internal policy exposure with sensitive thresholds, mass PII leakage |
| SEV-2 | Production security regression with limited scope or partial exposure | < 60 min | Prompt canary leaking obfuscated email patterns, guardrail bypass for one intent |
| SEV-3 | Security anomaly needing investigation, unclear exploitability | < 4 h to triage | Response-size anomaly, injection spike, elevated block rate |
| SEV-4 | Low-impact or single-session issue with no evidence of scale | < 72 h | One odd transcript, isolated policy mismatch, monitoring false positive |

Escalation Triggers

A lower severity becomes higher if any of the following is true (see the code sketch after this list):

  • multiple independent sessions show the same pattern
  • a user-visible leak involves another customer's data
  • the issue is trending upward after a rollout
  • the unsafe behavior affects a high-risk intent like order tracking or returns
  • external disclosure occurs on social media, support escalations, or bug bounty channels
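
These triggers are deterministic enough to encode directly. A minimal sketch, assuming an incident envelope with the fields shown (all names are illustrative):

HIGH_RISK_INTENTS = {"order_tracking", "return_request"}


def should_escalate(incident: dict) -> bool:
    # Any single trigger is sufficient to raise severity.
    return any([
        incident.get("independent_sessions", 0) > 1,
        incident.get("other_customer_data_visible", False),
        incident.get("trending_up_after_rollout", False),
        incident.get("intent") in HIGH_RISK_INTENTS,
        incident.get("external_disclosure", False),
    ])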

HLD: Incident Detection, Containment, and Forensics Plane

flowchart TB
    subgraph Runtime["MangaAssist Runtime"]
        U[User]
        G[API Gateway]
        O[Orchestrator]
        R[Retriever / KB]
        L[LLM on Bedrock]
        GR[Guardrails Pipeline]
        M[Conversation Memory]
        B[Backend Services<br/>Orders, Returns, Catalog]
        U --> G --> O
        O --> R
        O --> M
        O --> B
        O --> L
        L --> GR
        GR --> G
    end

    subgraph Observability["Observability and Control"]
        Logs[CloudWatch Logs]
        Metrics[CloudWatch Metrics / Alarms]
        Trail[CloudTrail]
        Events[EventBridge]
        Config[AWS AppConfig]
        Pager[SNS / PagerDuty / Slack]
    end

    subgraph Forensics["Incident Response and Forensics"]
        Router[Incident Router]
        SFN[Step Functions War Room Workflow]
        Evidence[Evidence Collector]
        Index[DynamoDB Incident Index]
        Audit[S3 Audit Bucket<br/>Object Lock + SSE-KMS]
        Athena[Athena / Logs Insights]
    end

    O --> Logs
    R --> Logs
    L --> Logs
    GR --> Logs
    G --> Metrics
    O --> Metrics
    Trail --> Events
    Logs --> Events
    Metrics --> Events
    Events --> Router --> SFN
    SFN --> Pager
    SFN --> Config
    SFN --> Evidence
    Evidence --> Index
    Evidence --> Audit
    Athena --> Audit

HLD Principles

  1. Runtime and response plane are separate. Incidents must still be manageable when the application plane is degraded.
  2. Containment is config-first. AppConfig kill switches are faster and safer than emergency redeploys.
  3. Evidence is immutable. The investigation uses append-only artifacts, not mutable app logs alone.
  4. Every incident gets a data package. Even false alarms produce a minimal evidence package so tuning is auditable.

End-to-End Incident Data Flow

sequenceDiagram
    participant User
    participant Gateway as API Gateway
    participant Orch as Orchestrator
    participant Guard as Guardrails
    participant Obs as Logs and Metrics
    participant EB as EventBridge
    participant IR as Incident Workflow
    participant CFG as AppConfig
    participant EV as Evidence Collector
    participant S3 as Immutable Audit Bucket
    participant IC as Incident Commander

    User->>Gateway: Chat request
    Gateway->>Orch: Authenticated message + session context
    Orch->>Guard: Candidate response + metadata
    Guard-->>Orch: pass / modify / block
    Orch->>Obs: Structured events, hashes, latency, versions
    Obs->>EB: Alarm or anomaly event
    EB->>IR: Create incident workflow
    IR->>IC: Page on-call and create incident channel
    IR->>CFG: Optional automated containment
    IR->>EV: Snapshot prompt version, retrieval docs, transcripts, config
    EV->>S3: Store immutable evidence bundle
    EV-->>IC: Evidence manifest and first triage summary

Important detail: evidence preservation begins as soon as an incident is declared, even before root cause is known. That prevents losing volatile artifacts such as canary prompt versions, temporary feature flags, or short-lived transcripts.


LLD: Core Response Components

flowchart LR
    subgraph Detection["Detection Layer"]
        A1[CloudWatch Alarm<br/>threshold or anomaly]
        A2[Custom Detector Lambda]
        A3[Support Ticket Ingest]
        A4[GuardDuty / CloudTrail Findings]
    end

    subgraph Routing["Incident Routing"]
        B1[EventBridge Rules]
        B2[Incident Router Lambda]
        B3[Severity Calculator]
    end

    subgraph Actions["Containment and Preservation"]
        C1[AppConfig Kill Switch API]
        C2[Evidence Collector Lambda]
        C3[Timeline Builder]
        C4[Pager and Slack Notifier]
    end

    subgraph Storage["Forensic Storage"]
        D1[DynamoDB Incident Table]
        D2[S3 Evidence Bucket<br/>Object Lock]
        D3[CloudWatch Logs]
        D4[CloudTrail Archive]
    end

    subgraph Query["Investigation"]
        E1[CloudWatch Logs Insights]
        E2[Athena]
        E3[Security Dashboard]
    end

    A1 --> B1
    A2 --> B1
    A3 --> B1
    A4 --> B1
    B1 --> B2 --> B3
    B3 --> C1
    B3 --> C2
    B3 --> C3
    B3 --> C4
    C2 --> D1
    C2 --> D2
    C3 --> D1
    D3 --> E1
    D2 --> E2
    D1 --> E3
    D4 --> E2

LLD Component Table

| Component | Implementation | What It Stores or Does | Failure It Helps Diagnose |
|---|---|---|---|
| Incident Router | Lambda behind EventBridge | Normalizes alerts into one incident envelope | Duplicate or fragmented alerting |
| Severity Calculator | Deterministic rules + overrides | Maps signal type, intent, and data sensitivity to severity | Slow or inconsistent triage |
| Kill Switch Controller | AppConfig update API | Disables intents, swaps model tiers, forces static fallback | Delayed containment |
| Evidence Collector | Lambda + Step Functions | Pulls transcripts, prompt versions, KB docs, config, hashes | Missing volatile evidence |
| Timeline Builder | Lambda over logs | Builds event-by-event chronology from correlation IDs | Confusing incident timelines |
| Incident Table | DynamoDB | Incident metadata, manifest, owners, actions | Lack of central status |
| Immutable Evidence Bucket | S3 with Object Lock + KMS | Forensic artifacts and signed manifests | Tampering or accidental deletion |
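
To make the Severity Calculator concrete, here is a minimal sketch of its deterministic first pass, reusing signal names from this chapter (rules and thresholds are illustrative, and humans can always override):

def calculate_severity(signal: str, data_sensitivity: str, confirmed_cross_user: bool) -> str:
    # Deterministic first pass; the on-call reviewer can override in either direction.
    if confirmed_cross_user:
        return "SEV-1"
    if data_sensitivity == "pii" and signal in {"pii-in-response-rate", "guardrail-bypass"}:
        return "SEV-2"
    if signal in {"response-size-anomaly", "injection-spike", "elevated-block-rate"}:
        return "SEV-3"
    return "SEV-4"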

Evidence Model and Chain of Custody

What Must Be Captured for Every Serious Incident

For SEV-1 and SEV-2, MangaAssist preserves:

  • user message hash and response hash
  • raw transcript in restricted evidence storage
  • correlation_id, session_id, customer_id_hash
  • prompt template version and resolved prompt text
  • retrieval document IDs and metadata
  • model ID, region, inference timestamp, and feature-flag state
  • guardrail stage outcomes and any redaction or modification metadata
  • backend call summaries for order, return, and catalog services
  • deployment version, Lambda container ID, and request ID where applicable

Correlation Keys

| Field | Purpose | Notes |
|---|---|---|
| correlation_id | Ties the end-to-end request together | Generated at the API edge and propagated everywhere |
| session_id | Groups multi-turn history | Stable across a conversation |
| request_id | Per-service request trace | Useful when correlation propagation breaks |
| deployment_id | Identifies code or prompt rollout version | Critical for rollback analysis |
| retrieval_snapshot_id | Identifies the exact KB or index snapshot | Critical in retrieval leaks |
| container_id | Ties events to warm runtime state | Important for Lambda state-contamination bugs |

PII-Safe Security Event Schema

{
  "timestamp": "2026-03-24T18:04:51.221Z",
  "event_type": "guardrail_decision",
  "severity_hint": "SEV-2",
  "correlation_id": "corr_7c22f2d9",
  "session_id": "sess_91fd7e",
  "request_id": "req_0be44d",
  "deployment_id": "prompt-canary-2026-03-24-01",
  "customer_id_hash": "sha256:91c2...",
  "intent": "recommendation",
  "model": {
    "provider": "bedrock",
    "model_id": "anthropic.claude-3-5-sonnet",
    "region": "us-east-1"
  },
  "guardrails": [
    {
      "stage": "pii_filter",
      "action": "modify",
      "reason_code": "OBFUSCATED_EMAIL_PATTERN",
      "latency_ms": 4.2
    }
  ],
  "request_hash": "sha256:8aa1...",
  "response_hash": "sha256:f102...",
  "feature_flags": {
    "order_intent_enabled": true,
    "model_tier": "primary",
    "prompt_version": "rec-v37"
  }
}

Restricted Evidence Manifest

{
  "incident_id": "inc_2026_03_24_017",
  "created_at": "2026-03-24T18:09:00Z",
  "severity": "SEV-1",
  "artifacts": [
    {
      "type": "transcript",
      "s3_key": "evidence/inc_2026_03_24_017/transcript_corr_7c22f2d9.json",
      "sha256": "96c0..."
    },
    {
      "type": "prompt_bundle",
      "s3_key": "evidence/inc_2026_03_24_017/prompt_bundle.json",
      "sha256": "d487..."
    },
    {
      "type": "retrieval_snapshot",
      "s3_key": "evidence/inc_2026_03_24_017/retrieval_docs.json",
      "sha256": "ef34..."
    }
  ],
  "approved_access": [
    "security_lead",
    "privacy_officer"
  ]
}

Chain of Custody

flowchart TD
    A[Alert or user report] --> B[Incident declared]
    B --> C[Evidence collector snapshots artifacts]
    C --> D[Hash each artifact]
    D --> E[Write to S3 evidence bucket with Object Lock]
    E --> F[Store manifest in DynamoDB]
    F --> G[Restricted access via IAM role + MFA]
    G --> H[Every evidence read is logged to CloudTrail]

This design matters because incident response is often challenged later by legal, compliance, or postmortem review. If evidence can be edited after the fact, the investigation is not trustworthy.
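
A hedged sketch of the hash-then-lock write path, assuming boto3, an Object Lock-enabled bucket, and a one-year retention window (the bucket name and retention period are placeholders):

import hashlib
import json
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")
EVIDENCE_BUCKET = "mangaassist-evidence"  # placeholder; must have Object Lock enabled


def write_locked_artifact(incident: dict, name: str, payload: dict) -> dict:
    # Hash first, then write immutably, so the manifest can prove integrity later.
    body = json.dumps(payload, sort_keys=True).encode()
    digest = hashlib.sha256(body).hexdigest()
    key = f"evidence/{incident['incident_id']}/{name}"
    s3.put_object(
        Bucket=EVIDENCE_BUCKET,
        Key=key,
        Body=body,
        ServerSideEncryption="aws:kms",
        ObjectLockMode="COMPLIANCE",
        ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=365),
    )
    return {"type": name.removesuffix(".json"), "s3_key": key, "sha256": digest}

The Evidence Collector shown later relies on a helper of exactly this shape for every artifact it snapshots.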


Implementation Details

1. Correlation ID Propagation

The API edge generates a correlation_id once. Every downstream service receives it and writes it to logs and traces.

import hashlib
import json
import logging
import os
import uuid
from datetime import datetime, timezone

logger = logging.getLogger(__name__)


def new_correlation_id() -> str:
    return "corr_" + uuid.uuid4().hex[:8]


def sha256(value: str) -> str:
    return "sha256:" + hashlib.sha256(value.encode()).hexdigest()


def utc_now_iso() -> str:
    return datetime.now(timezone.utc).isoformat()


def build_request_context(event: dict) -> dict:
    # Reuse the edge-generated correlation ID if present; mint one otherwise.
    headers = event.get("headers", {})
    correlation_id = headers.get("x-correlation-id") or new_correlation_id()
    return {
        "correlation_id": correlation_id,
        "session_id": event["session_id"],
        "customer_id_hash": sha256(event["customer_id"]),
        "deployment_id": os.environ["DEPLOYMENT_ID"],
        "prompt_version": os.environ["PROMPT_VERSION"],
        "container_id": os.environ.get("AWS_LAMBDA_LOG_STREAM_NAME", "unknown"),
    }


def emit_security_event(event_type: str, ctx: dict, detail: dict) -> None:
    # detail must already be PII-safe: hashes and reason codes only, never raw text.
    payload = {
        "timestamp": utc_now_iso(),
        "event_type": event_type,
        "correlation_id": ctx["correlation_id"],
        "session_id": ctx["session_id"],
        "customer_id_hash": ctx["customer_id_hash"],
        "deployment_id": ctx["deployment_id"],
        "detail": detail,
    }
    logger.info(json.dumps(payload))

2. Automated Incident Creation

CloudWatch alarms and custom detectors publish a normalized event to EventBridge:

{
  "source": "mangaassist.security",
  "detail-type": "security-anomaly",
  "detail": {
    "signal": "pii-in-response-rate",
    "severity_hint": "SEV-2",
    "intent": "recommendation",
    "threshold": 0.005,
    "observed": 0.0081,
    "deployment_id": "prompt-canary-2026-03-24-01"
  }
}

The Incident Router Lambda then (see the sketch after this list):

  1. deduplicates similar alerts
  2. enriches with deployment and intent metadata
  3. computes initial severity
  4. starts a Step Functions workflow
  5. optionally triggers automated containment for known playbooks
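
A sketch of steps 1 and 4, assuming a DynamoDB dedup table and a Step Functions state machine ARN supplied via environment variables (both names are placeholders):

import json
import os

import boto3

dynamodb = boto3.resource("dynamodb")
sfn = boto3.client("stepfunctions")
incidents = dynamodb.Table(os.environ["INCIDENT_TABLE"])


def handle_anomaly(event: dict, context) -> None:
    detail = event["detail"]
    # Dedup: at most one open incident per (signal, deployment) pair.
    dedup_key = f"{detail['signal']}#{detail.get('deployment_id', 'unknown')}"
    try:
        incidents.put_item(
            Item={"dedup_key": dedup_key, "status": "open", "detail": json.dumps(detail)},
            ConditionExpression="attribute_not_exists(dedup_key)",
        )
    except incidents.meta.client.exceptions.ConditionalCheckFailedException:
        return  # An open incident already covers this signal.
    sfn.start_execution(
        stateMachineArn=os.environ["WAR_ROOM_STATE_MACHINE_ARN"],
        input=json.dumps({"dedup_key": dedup_key, "detail": detail}),
    )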

3. Containment Through Config, Not Code

Emergency containment is pre-wired in AppConfig:

{
  "intent_controls": {
    "order_tracking": { "enabled": true, "mode": "dynamic" },
    "return_request": { "enabled": true, "mode": "dynamic" },
    "recommendation": { "enabled": true, "mode": "dynamic" }
  },
  "emergency_controls": {
    "global_static_fallback": false,
    "force_safe_model_tier": false,
    "disable_personalized_context": false
  }
}

Typical containment actions (one is sketched in code after this list):

  • disable one intent
  • disable personalized context injection
  • force static fallback for high-risk routes
  • switch from canary prompt to stable prompt
  • switch from primary model tier to safer fallback tier
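
A sketch of the kill-switch call path, assuming the AppConfig application, environment, profile, and deployment-strategy IDs were provisioned ahead of time (all IDs below are placeholders):

import json

import boto3

appconfig = boto3.client("appconfig")

# Placeholder IDs; the real values exist before any incident occurs.
APP_ID, ENV_ID, PROFILE_ID, STRATEGY_ID = "app-1", "env-prod", "prof-1", "strat-fast"


def disable_intent(config: dict, intent: str) -> None:
    # Push a new config version with one intent disabled, then deploy immediately.
    config["intent_controls"][intent]["enabled"] = False
    version = appconfig.create_hosted_configuration_version(
        ApplicationId=APP_ID,
        ConfigurationProfileId=PROFILE_ID,
        Content=json.dumps(config).encode(),
        ContentType="application/json",
    )
    appconfig.start_deployment(
        ApplicationId=APP_ID,
        EnvironmentId=ENV_ID,
        ConfigurationProfileId=PROFILE_ID,
        ConfigurationVersion=str(version["VersionNumber"]),
        DeploymentStrategyId=STRATEGY_ID,
    )

Assuming the orchestrator already polls AppConfig for its flags, no redeploy is needed for the change to take effect.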

4. Evidence Collector

def preserve_incident_evidence(incident: dict) -> dict:
    """Snapshot volatile artifacts the moment an incident is declared.

    The fetch_* helpers wrap Logs Insights, prompt-store, and AppConfig reads;
    write_locked_artifact hashes each payload and stores it under Object Lock,
    returning a manifest entry (see the chain-of-custody sketch above).
    """
    correlation_ids = incident["correlation_ids"]
    artifacts = []

    # Raw transcripts are volatile: retention policies can expire them quickly.
    transcripts = fetch_raw_transcripts(correlation_ids)
    artifacts.append(write_locked_artifact(incident, "transcripts.json", transcripts))

    # The exact prompt bundle for the deployment under suspicion.
    prompt_bundle = fetch_prompt_bundle(incident["deployment_id"])
    artifacts.append(write_locked_artifact(incident, "prompt_bundle.json", prompt_bundle))

    # The retrieval documents actually used, keyed by correlation ID.
    retrieval_docs = fetch_retrieval_documents(correlation_ids)
    artifacts.append(write_locked_artifact(incident, "retrieval_docs.json", retrieval_docs))

    # Feature flags and kill-switch state at the time of declaration.
    config_snapshot = fetch_appconfig_snapshot()
    artifacts.append(write_locked_artifact(incident, "config_snapshot.json", config_snapshot))

    manifest = {
        "incident_id": incident["incident_id"],
        "created_at": utc_now_iso(),
        "artifacts": artifacts,
    }
    put_manifest(manifest)
    return manifest

5. Investigation Queries

CloudWatch Logs Insights example for a prompt canary incident:

fields @timestamp, correlation_id, detail.reason_code, deployment_id, intent
| filter event_type = "guardrail_decision"
| filter detail.reason_code = "OBFUSCATED_EMAIL_PATTERN"
| stats count(*) as hits by deployment_id, intent, bin(5m) as window
| sort window desc

Athena example for blast radius estimation:

SELECT
  deployment_id,
  count(DISTINCT session_id) AS affected_sessions,
  count(*) AS affected_events
FROM forensic_events
WHERE event_date BETWEEN DATE '2026-03-21' AND DATE '2026-03-24'
  AND event_type = 'cross_session_leak_suspected'
GROUP BY deployment_id;

Investigation Decision Tree

flowchart TD
    A[Unsafe or suspicious response observed] --> B{Was the bad content in delivered response?}
    B -->|No| C[Near miss only<br/>guardrail caught or modified it]
    B -->|Yes| D[User-visible incident]

    C --> E{Where was it introduced?}
    D --> E

    E -->|In retrieved chunk| F[Retrieval or ACL issue]
    E -->|In backend payload| G[Service authorization or data bug]
    E -->|In prompt history| H[Memory or session isolation issue]
    E -->|Only in model output| I[FM hallucination or prompt-induced generation]
    E -->|Not reproducible| J[Need wider sampling and shadow replay]

    F --> K[Check document IDs, metadata, index snapshot, ACL tags]
    G --> L[Check service auth, caller identity, upstream payloads]
    H --> M[Check session boundaries, cache keys, container state]
    I --> N[Check prompt version, model version, guardrail thresholds]

This decision tree prevents teams from blaming the model too early. In practice, a large fraction of "LLM incidents" are application-state or retrieval-governance issues.


Detailed Scenarios

The scenarios below are written the way a strong incident responder would explain them in design review or interviews:

  • what happened
  • how it was detected
  • how containment was performed
  • what evidence proved root cause
  • what changed afterward
  • what follow-up questions usually come next

Scenario 1: Prompt Canary Introduced an Obfuscated PII Leak

Context

The recommendation prompt was updated to sound more community-aware and conversational. The change went through offline evaluation and a 10 percent canary, but it created a subtle failure mode:

  • the new prompt encouraged the FM to mention how readers could "follow creators"
  • the model sometimes responded with invented contact details
  • direct email patterns were mostly blocked
  • obfuscated formats like author name [at] publisher [dot] com slipped through

Detection

  • pii-in-response-rate alarm crossed the 0.5% threshold for canary traffic
  • pii_near_miss_rate also spiked, which meant the FM was generating more PII-like content even when guardrails caught some of it
  • the spike was isolated to prompt_version = rec-v37-canary

Sequence of Events

sequenceDiagram
    participant User
    participant Orch as Orchestrator
    participant Prompt as Prompt v37 Canary
    participant LLM as Bedrock Model
    participant PII as PII Filter
    participant Metrics as CloudWatch
    participant Oncall as On-call Engineer
    participant Config as AppConfig

    User->>Orch: "Recommend manga by indie creators"
    Orch->>Prompt: Build prompt with canary version
    Prompt->>LLM: Prompt asks for richer creator context
    LLM-->>PII: Response includes obfuscated contact pattern
    PII-->>Metrics: Partial misses accumulate
    Metrics-->>Oncall: PII rate alarm fires
    Oncall->>Config: Roll back prompt canary
    Config-->>Orch: Stable prompt restored

Containment

Containment took 17 minutes:

  1. rolled back the canary prompt through AppConfig
  2. verified pii-in-response-rate returned to baseline within two metric windows
  3. disabled the specific creator-community enrichment clause while investigation continued

This is exactly why prompt versions are treated as deployable, reversible artifacts.

Forensic Investigation

The investigation used four evidence sources:

  1. Prompt diff: rec-v36 vs rec-v37
  2. Guardrail decision logs: showed the miss pattern only on obfuscated formats
  3. Canary segmentation: proved the issue was isolated to the new prompt
  4. Transcript replay: confirmed reproducibility against the same prompt bundle

Key forensic clue:

  • the unsafe string was not present in retrieval data
  • it was not present in any backend payload
  • it appeared only in FM output after the prompt wording changed

That ruled out retrieval leaks and upstream data exposure.

Root Cause

The prompt added an instruction that increased the probability of the FM generating contact-like information. The PII filter was tuned for direct email syntax, not obfuscated variants.

Remediation

  1. removed the risky prompt clause
  2. expanded the PII detector with obfuscated email regexes (see the sketch after this list)
  3. added adversarial tests specifically for creator-contact generation
  4. added a pre-prod review question: "Does this prompt create a new path for generating sensitive information?"
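
A hedged example of the kind of pattern added in step 2; a real detector needs a broader family of patterns plus benchmark tests:

import re

# Matches obfuscated email shapes like "name [at] host [dot] com".
# Illustrative only; production detectors need much wider coverage.
OBFUSCATED_EMAIL = re.compile(
    r"\b[\w.+-]+\s*(?:@|\[\s*at\s*\]|\(\s*at\s*\))\s*[\w-]+"
    r"\s*(?:\.|\[\s*dot\s*\]|\(\s*dot\s*\))\s*[a-z]{2,}\b",
    re.IGNORECASE,
)

assert OBFUSCATED_EMAIL.search("author name [at] publisher [dot] com")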

Verification

  • replayed the affected transcripts against stable and fixed prompt versions
  • ran adversarial suite with direct and obfuscated PII patterns
  • monitored pii_near_miss_rate for 24 hours

Follow-Up Questions and Deep-Dive Answers

Q1. Why did offline testing miss this?

Offline tests only checked whether direct PII leaked in final responses. They did not test induction risk: whether a prompt instruction could cause the model to invent sensitive-looking content that the downstream detector might miss. After this incident, the prompt evaluation suite added:

  • induced PII generation probes
  • obfuscated PII variants
  • near-miss metrics, not just delivered-leak metrics

Q2. How did you prove it was the prompt and not a model update?

The strongest evidence was segmentation:

  • only rec-v37-canary traffic showed the spike
  • same model, same region, same backend context under rec-v36 did not
  • replaying identical transcripts with the old prompt removed the issue

That is the difference between correlation and causation in an incident review.

Q3. Why not block every response that contains anything PII-like?

Because the recommendation flow still needs to be usable. Blanket blocking would create high false positives and degrade customer experience. The right policy is layered (sketched after this list):

  • redact when the content is clearly removable
  • block when the response is unsafe or materially untrustworthy
  • track near misses because they are leading indicators of prompt risk
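
A minimal sketch of that layering as a decision function; the field names and confidence threshold are illustrative:

def pii_policy_action(finding: dict) -> str:
    # Low-confidence detections are recorded as near misses rather than acted on.
    if finding["confidence"] < 0.8:
        return "track_near_miss"
    # High-confidence content that can be cleanly removed is redacted in place.
    if finding["removable"]:
        return "redact"
    # Otherwise the whole response is unsafe to deliver.
    return "block"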

Scenario 2: Response-Size Spike Looked Like Exfiltration but Was Legitimate

Context

A CloudWatch anomaly detector flagged that FAQ responses were about 3x longer than baseline. Long responses are suspicious because they can indicate:

  • prompt injection causing policy dumps
  • retrieval over-sharing
  • internal document leakage
  • scripted exfiltration attempts

But long responses alone do not prove malicious behavior.

Sequence of Events

sequenceDiagram
    participant Detector as Response Size Detector
    participant Router as Incident Router
    participant Analyst as On-call Analyst
    participant Logs as Logs Insights
    participant KB as Knowledge Base
    participant User

    Detector->>Router: FAQ length anomaly
    Router->>Analyst: Create SEV-3 investigation
    Analyst->>Logs: Pull top long responses
    Analyst->>KB: Inspect retrieved source docs
    Logs-->>Analyst: One user, repeated detailed policy questions
    KB-->>Analyst: Only customer-facing policy docs retrieved
    Analyst-->>Router: Downgrade to SEV-4 false alarm

Investigation Details

The analyst checked:

  1. Which intents were affected? Only FAQ, not recommendations or orders.
  2. Which users were involved? One authenticated long-tenure customer.
  3. Which documents were retrieved? Only customer-facing return policy documents.
  4. Were internal-only chunks retrieved? No.
  5. Was the user escalating question specificity? Yes, each question added more edge-case detail.

The response growth was explained by the RAG system doing exactly what it was designed to do: return more policy detail as the question became more specific.

Why This Still Mattered

Even though it was not a breach, the alert was useful because it showed a blind spot:

  • anomaly detection was not intent-aware enough
  • FAQ traffic has a very different length distribution from recommendation traffic
  • response length without source-scope context generates too many false alarms

Fixes

  1. changed the detector to baseline by intent
  2. incorporated retrieved document count and source audience into the anomaly score (see the sketch after this list)
  3. capped FAQ response length at 300 tokens and linked out for exhaustive detail
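
A sketch of the improved detector, assuming per-intent length baselines and the event fields shown (weights are illustrative):

from statistics import mean, stdev


def anomaly_score(event: dict, baseline_lengths_by_intent: dict) -> float:
    # Compare against the intent's own length distribution, not a global one.
    lengths = baseline_lengths_by_intent[event["intent"]]
    z = (event["response_tokens"] - mean(lengths)) / (stdev(lengths) or 1.0)
    score = max(z, 0.0)
    if event["retrieved_chunks"] > 8:            # unusually broad retrieval
        score *= 1.5
    if "internal" in event["source_audiences"]:  # never valid for customer traffic
        score += 10.0
    return score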

Follow-Up Questions and Deep-Dive Answers

Q1. How do you avoid alert fatigue without missing real exfiltration?

You do not remove the alert. You improve its context. MangaAssist changed from a one-dimensional metric to a richer detector:

  • response length by intent
  • retrieved chunk count
  • source audience labels
  • per-user burst behavior

That preserved sensitivity while reducing noise.

Q2. Why not classify this as not-an-incident immediately?

Because long output is a real exfiltration signal. The right posture is "investigate first, downgrade with evidence." For security anomalies, false positives are acceptable; false dismissals are not.

Q3. What evidence would have escalated this to a true breach?

Any of the following:

  • internal-only document IDs in retrieval logs
  • multiple sessions showing the same dump pattern
  • prompt text indicating successful instruction override
  • backend payloads containing over-authorized data

Scenario 3: Cross-Session Order Data Leak Caused by Lambda Warm-Container State

Context

A user reported that the bot mentioned someone else's order. This was treated as SEV-1 immediately because cross-user data exposure is a breach until disproven.

Sequence of Events

sequenceDiagram
    participant UserA
    participant UserB
    participant Lambda as Warm Lambda Container
    participant Memory as In-Memory Global State
    participant Order as Order Service
    participant LLM
    participant Support
    participant IC as Incident Commander

    UserA->>Lambda: Prior session request
    Lambda->>Memory: Store summarized turn in global list
    UserB->>Lambda: New order question on reused container
    Lambda->>Order: Fetch UserB order
    Lambda->>LLM: Prompt includes UserB order + stale UserA summary
    LLM-->>UserB: Mixed response with foreign order data
    UserB->>Support: "Bot told me about someone else's order"
    Support->>IC: Escalate immediately

Immediate Containment

Within 8 minutes:

  1. disabled all order-related intents through AppConfig
  2. forced a static fallback to the existing "Your Orders" page
  3. opened a SEV-1 incident channel and pulled privacy and security in immediately

This was a textbook example of why high-risk intents need pre-built kill switches.

Forensic Investigation

The investigation did not start by blaming the model. It traced prompt assembly.

Evidence chain:

  1. Delivered transcript contained an extra order reference.
  2. Order service logs showed the backend returned only the correct user's order.
  3. Prompt assembly snapshot showed an extra conversation-history summary.
  4. Container ID analysis showed that the same Lambda container handled another customer's prior session.
  5. Code inspection found mutable global state used for conversation history.

Problematic pattern:

# Bug: mutable global state survives warm Lambda invocations.
conversation_history = []


def handler(event, context):
    conversation_history.append(event["new_turn"])
    prompt = build_prompt(conversation_history)
    return generate_response(prompt)

Corrected pattern:

# Fix: history is request-scoped and keyed by session_id, so nothing
# survives warm-container reuse across different users.
def handler(event, context):
    conversation_history = load_history_for_session(event["session_id"])
    conversation_history.append(event["new_turn"])
    prompt = build_prompt(conversation_history)
    return generate_response(prompt)

Root Cause

This was not a model issue. It was a session-isolation bug caused by warm-container reuse combined with mutable global state.

Blast Radius Analysis

Blast radius could not rely on the single reported session. The team queried:

  • all invocations sharing the affected container pattern
  • all prompts assembled with mixed customer_id_hash evidence
  • all order intents during the last three days

The result identified 23 sessions with potential contamination.
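
The mixed-identity signature can be queried directly. A hedged Logs Insights sketch, assuming the prompt-assembly step emits one event per contributed memory source, each carrying that source's customer_id_hash:

fields container_id, correlation_id, customer_id_hash
| filter event_type = "prompt_assembly"
| stats count_distinct(customer_id_hash) as distinct_customers by container_id, correlation_id
| sort distinct_customers desc

Any row with distinct_customers above 1 is a candidate contaminated session.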

Remediation

  1. fixed the handler to make state request-scoped
  2. scanned all Lambda functions for mutable globals
  3. added CI linting to block mutable module-level state in request handlers
  4. added a two-user integration test that forces warm container reuse patterns (sketched below)
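
A hedged pytest sketch of that test; chat_handler is an assumed module holding the fixed handler, and its collaborators are patched so the test can inspect exactly what each user's prompt contained:

import chat_handler  # assumed module containing handler, build_prompt, etc.


def test_warm_container_does_not_leak_history(monkeypatch):
    prompts, store = [], {"sess_A": [], "sess_B": []}
    monkeypatch.setattr(chat_handler, "load_history_for_session", lambda sid: store[sid])
    monkeypatch.setattr(chat_handler, "build_prompt",
                        lambda history: prompts.append(list(history)) or "stub-prompt")
    monkeypatch.setattr(chat_handler, "generate_response", lambda prompt: {"ok": True})

    # Same process simulates a warm container; two different sessions back to back.
    chat_handler.handler({"session_id": "sess_A", "new_turn": "Where is order 111?"}, None)
    chat_handler.handler({"session_id": "sess_B", "new_turn": "Where is my order?"}, None)

    # Session B's assembled history must contain nothing from session A.
    assert not any("111" in turn for turn in prompts[1])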

Verification

  • replayed dual-session tests
  • verified no cross-user content in sampled prompt assemblies
  • re-enabled order intents only after the evidence review passed

Follow-Up Questions and Deep-Dive Answers

Q1. How did you prove the order service was not the source of the leak?

By comparing three artifacts:

  • upstream order-service response
  • prompt assembly payload
  • delivered FM output

The extra order reference appeared in prompt history but not in the authorized service response. That is conclusive evidence that the leak happened in the application layer before generation.

Q2. How did you estimate blast radius if only one user reported it?

We looked for the signature of the failure, not just more complaints:

  • repeated container IDs
  • mismatched customer_id_hash values inside one prompt assembly
  • order intents executed on warm containers in the affected deployment window

That is the right way to quantify rare but serious isolation bugs.

Q3. Why disable all order intents instead of just one endpoint?

Because the trust boundary was unclear at first. When customer data separation is in doubt, containment should be broader than the suspected code path. Once isolation was re-verified, traffic could be restored safely.


Scenario 4: Internal Returns Policy Was Exposed Through a Retrieval ACL Bug

Context

A support agent escalated a transcript where the chatbot explained internal fraud-review thresholds that customers should never see. This looked like a model hallucination at first because the answer was fluent and specific. It turned out to be a retrieval governance bug.

Sequence of Events

sequenceDiagram
    participant User
    participant Orch as Orchestrator
    participant Search as RAG Search
    participant KB as OpenSearch Index
    participant LLM
    participant Guard as Guardrails
    participant Support
    participant Sec as Security Lead

    User->>Orch: "Why was my return denied?"
    Orch->>Search: Query KB with return-policy intent
    Search->>KB: Retrieve top documents
    KB-->>Search: Includes internal SOP chunk tagged incorrectly
    Search-->>LLM: Customer docs + internal chunk
    LLM-->>Guard: Response includes internal thresholds
    Guard-->>User: Passes because content is not toxic or PII
    User->>Support: Questions internal rules shown by bot
    Support->>Sec: Escalate retrieval leak

Why This Incident Is Important

This is a classic LLM-era incident:

  • infrastructure is healthy
  • the model is technically grounded in retrieved data
  • the answer is factually correct
  • the failure is authorization, not generation quality

Traditional app monitoring often misses this because nothing "breaks."

Forensic Investigation

The team checked:

  1. retrieval document IDs from the transcript's correlation_id
  2. index metadata for each chunk
  3. source S3 object metadata and ingestion logs
  4. access-control tags applied during chunking

They found:

  • an internal returns SOP was ingested into the customer-facing index
  • the audience=internal metadata tag was missing on a batch of chunks
  • the reranker boosted the internal chunk because it contained precise denial criteria

Root Cause

The ingestion job treated missing audience metadata as public by default. That was a governance flaw. Missing classification should have failed closed, not open.

Containment

  1. removed the affected chunk IDs from the index
  2. disabled returns-policy retrieval while rebuilding the filtered snapshot
  3. patched ingestion so unclassified documents are quarantined instead of indexed

Remediation

  1. changed metadata policy from default-public to default-quarantine
  2. added a pre-index validator requiring audience, data_classification, and owner (sketched below)
  3. added a retrieval guardrail that blocks non-customer audiences at serve time
  4. added canary tests that intentionally probe for internal policy exposure
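
A minimal sketch of the fail-closed validator from steps 1 and 2; field names follow the incident description, and the return values are illustrative:

REQUIRED_METADATA = ("audience", "data_classification", "owner")


def validate_for_indexing(doc: dict) -> str:
    # Fail closed: anything unclassified is quarantined, never indexed.
    metadata = doc.get("metadata", {})
    missing = [key for key in REQUIRED_METADATA if not metadata.get(key)]
    if missing:
        return "quarantine"
    # Only customer-audience documents may enter the customer-facing index.
    if metadata["audience"] != "customer":
        return "reject"
    return "index"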

Verification

  • rebuilt the index from clean source manifests
  • replayed the affected query set
  • sampled retrieval results for all returns-related intents

Follow-Up Questions and Deep-Dive Answers

Q1. How did you know this was retrieval leakage and not model memorization?

Because the exact internal terminology was present in the retrieved chunk IDs tied to the incident's correlation_id. If the content is in the retrieval trace, you have concrete provenance. Model memorization is a fallback hypothesis only when the content is absent from prompt, retrieval, and backend data.

Q2. Why did the guardrails not stop this?

The existing guardrails focused on toxicity, competitor mentions, pricing, PII, and scope. They did not enforce document audience labels. This incident showed that content safety and authorization safety are different layers.

Q3. What is the long-term fix: better prompts or better retrieval governance?

Retrieval governance. Prompts can reduce risk, but they should not be the primary control for authorization. The durable fix is:

  • fail-closed metadata handling
  • index-time validation
  • serve-time audience enforcement
  • forensic retrieval traces for every answer

First 15 Minutes: Practical Runbook

When a real incident hits, the first mistake teams make is doing too much analysis before containment. MangaAssist uses the following first-15-minute checklist.

| Minute Window | Action | Goal |
|---|---|---|
| 0-5 | Confirm signal, assign Incident Commander, open incident channel | Avoid ownership confusion |
| 0-5 | Snapshot volatile evidence immediately | Preserve prompt versions, config, transcripts |
| 5-10 | Trigger kill switch or rollback if impact could be user-visible | Stop ongoing harm |
| 5-10 | Classify severity conservatively | Under-classification is more dangerous |
| 10-15 | Pull initial blast-radius query and identify affected intent or deployment | Bound scope early |

Minimal First-15-Minute Questions

  • Is this user-visible?
  • Is another user's data involved?
  • Did a recent prompt, model, or config change correlate with onset?
  • Can we contain by config instead of code?
  • What evidence might disappear if we wait?

First Hour: Forensic Checklist

Within the first hour, the team should answer:

  1. Where was the bad content introduced? Prompt, retrieval, backend payload, conversation memory, or model-only generation.

  2. What is the blast radius? How many sessions, intents, users, and data elements are potentially affected.

  3. What version boundary explains the incident? Deployment ID, prompt version, KB snapshot, model tier, or region.

  4. What prevents recurrence right now? Temporary block, rollback, stricter guardrail, index quarantine, or route disablement.

  5. What external obligations exist? Privacy notification, customer support messaging, or legal review.


Post-Incident Review Template

## Incident Review: [TITLE]

### 1. Executive Summary
- Severity:
- Start time:
- End time:
- User impact:
- Data impact:
- Detection source:

### 2. Exact Timeline
| Time | Event | Actor | Evidence |
|---|---|---|---|
| T+0 | Alert or report received | Detector / user | Incident envelope |
| T+Xm | Containment started | Incident Commander | AppConfig change ID |
| T+Xm | Evidence snapshotted | Security lead | Manifest ID |
| T+Xh | Root cause identified | Service owner | Query / code diff |
| T+Xh | Fix deployed | Engineering | Deployment ID |
| T+Xh | Recovery approved | Incident Commander | Verification report |

### 3. Blast Radius
- Sessions affected:
- Users affected:
- Intents affected:
- Data classes affected:
- Confidence level of estimate:

### 4. Technical Root Cause
- Immediate cause:
- Contributing factors:
- Why existing controls failed:

### 5. Detection Analysis
- What signal fired:
- What did not fire but should have:
- Mean time to detect:
- Mean time to contain:

### 6. Corrective Actions
| Action | Owner | Due Date | Status |
|---|---|---|---|
| Add regression test |  |  |  |
| Add detector or guardrail |  |  |  |
| Update runbook |  |  |  |
| Review architecture decision |  |  |  |

### 7. Evidence Manifest
- S3 manifest path:
- Query notebooks:
- Relevant transcript IDs:
- Config snapshot:

### 8. Lessons Learned
- What changed in engineering practice:
- What changed in architecture:
- What changed in monitoring:

Architecture Decisions and Tradeoffs

| Decision | Choice | Why | Tradeoff |
|---|---|---|---|
| Raw transcript handling | Raw content only in restricted evidence bucket | Enables deep investigation without exposing raw PII in general logs | More operational complexity |
| Evidence storage | S3 Object Lock + KMS | Tamper-resistant and compliance-friendly | Harder to correct logging mistakes |
| Containment path | AppConfig kill switches | Fast, low-risk rollback | Requires up-front design for each intent |
| Retrieval governance | Fail closed on missing metadata | Prevents accidental public exposure | More ingest rejects and operational overhead |
| Blast-radius analysis | Correlation-ID-centric traces | Faster root cause and user-impact analysis | Requires disciplined propagation across services |
| False-positive posture | Investigate first, downgrade with evidence | Safer for security anomalies | More analyst time spent on benign cases |

Key Lessons

  1. Prompts are production code from a security perspective. If a prompt can change the probability of unsafe generation, it belongs in the same rollback and evidence system as application code.

  2. Most serious LLM incidents are cross-layer incidents. Root cause often sits in the seams between orchestration, retrieval, memory, and model behavior.

  3. Forensics requires provenance, not just logs. Knowing the final answer is not enough. You need prompt, retrieval, backend, and guardrail lineage.

  4. Containment speed depends on design done before the incident. Teams that need to invent a kill switch during a SEV-1 are already late.

  5. Authorization bugs in RAG systems are security incidents even when the model is factually correct. Correct content can still be unauthorized content.

  6. Near misses are leading indicators. A detector that catches generated PII before delivery is telling you the system is drifting toward a real incident.


Cross-References