
10. Storytelling Guide — Security Scenarios for Interviews & Documents

Why Storytelling Matters for Security Topics

Security conversations in interviews often go one of two ways:

  1. Generic checkbox answers: "We use encryption, we have guardrails, we follow OWASP." The interviewer nods and moves on — you didn't differentiate.
  2. War stories that demonstrate judgment: "We discovered that a prompt change created a PII generation pathway that our test suite didn't cover. Here's how we detected it, contained it in 17 minutes, and changed our process so it couldn't happen again." The interviewer remembers you.

The difference is storytelling structure — taking real scenarios and presenting them so the listener understands the stakes, the constraints, and the decisions.


The STAR-D Framework (STAR + Decision)

Standard STAR (Situation, Task, Action, Result) works, but security scenarios need an extra element: Decision — the tradeoff you made and why.

| Element | What It Covers | Time Allocation |
|---|---|---|
| Situation | What was the system, what happened, why it mattered | 15-20 seconds |
| Task | What was your responsibility, what was the constraint | 10-15 seconds |
| Action | What you specifically did (technical details) | 45-60 seconds |
| Result | Measurable outcome (metrics, timeline, impact) | 15-20 seconds |
| Decision | The tradeoff you considered and why you chose this path | 15-20 seconds |

Total: 2-3 minutes per scenario. If you're going longer, you're losing the interviewer.


Opening Hooks — First 10 Seconds

The opening line determines whether the interviewer leans in or zones out.

Weak Openings (Don't Do This)

| Weak Opening | Why It Fails |
|---|---|
| "So, in my project we had security..." | Vague, no stakes |
| "I'll tell you about our guardrail pipeline..." | Technical without context — interviewer can't engage yet |
| "We used AWS KMS for encryption..." | Jumping to solution without establishing the problem |

Strong Openings (Do This)

| Strong Opening | Why It Works |
|---|---|
| "A user reported that our chatbot told them about someone else's order." | Immediately establishes stakes — data breach |
| "We shipped a prompt change that accidentally taught our model to hallucinate email addresses." | Specific, surprising, reveals a non-obvious failure mode |
| "Our toxicity filter was blocking 8% of legitimate questions about manga." | Quantified problem, implies real user impact |
| "We discovered that 847 sessions from 23 IPs were systematically extracting our product catalog." | Numbers + adversarial intent = compelling |

Pattern: Start with the consequence or the surprising discovery, not with the system description.


Worked Examples: Weak vs. Strong

Example 1: PII Detection (from Doc 02)

Weak version:

"In our chatbot project, we implemented PII detection. We used regex patterns for email and phone numbers, and also an NER model. One challenge was that manga character names got flagged as PII. We fixed it by adding an allowlist. The false positive rate went from high to low."

Problems:
  - No opening hook
  - "High to low" — unquantified
  - First-person plural ("we implemented") — doesn't show your specific role
  - No tradeoff discussed

Strong version:

"Our NER model was flagging 'Gojo Satoru' as a real person's name and redacting it from every Jujutsu Kaisen response. That's a false positive rate of 8% on manga-related queries — and users were seeing responses like 'I recommend [REDACTED] for fans of supernatural action.' [SITUATION]

I owned the PII detection pipeline and needed to fix this without weakening protection for actual PII. [TASK]

I built a two-layer approach: the NER model runs first, then a domain-aware override checks detections against a curated allowlist of 5,000 manga character names. If the NER flags a PERSON entity that's on the allowlist, we downgrade the confidence score instead of redacting. I also fine-tuned the NER on 2,000 labeled examples of character names versus real names. [ACTION]

False positive rate dropped from 8% to 0.4%. PII detection precision stayed above 99.5%. [RESULT]

The tradeoff was maintaining an allowlist that grows with every new manga series. I chose this over lowering the NER sensitivity globally because lowering sensitivity would have missed real PII. The override is about 200 lines of code plus a 5,000-entry data file — the maintenance cost is worth the precision. [DECISION]"
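
If the interviewer asks for a Tier 3 deep dive on that answer, a small sketch of the override logic helps. This is a minimal illustration, not the project's actual code; `CHARACTER_ALLOWLIST`, the detection dict shape, and both thresholds are assumptions:

```python
# Minimal sketch of the domain-aware override described above. The allowlist
# contents, detection format, and thresholds are illustrative assumptions.

CHARACTER_ALLOWLIST = {"gojo satoru", "megumi fushiguro", "nobara kugisaki"}

REDACTION_THRESHOLD = 0.85   # redact only above this confidence
ALLOWLIST_DOWNGRADE = 0.3    # multiply confidence instead of deleting the hit

def adjust_pii_detections(detections):
    """Downgrade PERSON detections that match known fictional characters."""
    adjusted = []
    for det in detections:  # det: {"text": str, "label": str, "score": float}
        if det["label"] == "PERSON" and det["text"].lower() in CHARACTER_ALLOWLIST:
            det = {**det, "score": det["score"] * ALLOWLIST_DOWNGRADE}
        adjusted.append(det)
    return adjusted

def redact(text, detections):
    """Apply redaction only to detections still above the threshold."""
    for det in adjust_pii_detections(detections):
        if det["score"] >= REDACTION_THRESHOLD:
            text = text.replace(det["text"], "[REDACTED]")
    return text
```

Downgrading the score instead of dropping the detection preserves the audit trail: the NER hit is still logged, it just no longer triggers redaction.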


Example 2: Incident Response (from Doc 05)

Weak version:

"We had a security incident where users could see other users' data. It was caused by a bug in Lambda. We fixed it by changing the code. We also notified the affected users."

Problems:
  - Missing all numbers and timelines
  - "A bug in Lambda" — doesn't demonstrate understanding
  - No containment story — went straight from detection to fix
  - No process improvement

Strong version:

"A user reported that our chatbot mentioned someone else's order — a potential cross-session data leak. [HOOK]

I triaged it as SEV-1 immediately. Within 8 minutes, I disabled all order-related intents via our feature flag system — no code deploy, just an AppConfig toggle. [CONTAINMENT]

Investigation revealed that a global Python list for conversation history wasn't being reset between Lambda invocations. When AWS reused a warm container for a different user, Session B inherited Session A's history. The model then included that stale data in its response. [ROOT CAUSE]

I fixed the immediate bug — moved the list initialization inside the handler function — and then scanned all 12 Lambda functions in our stack for the same pattern, finding 2 more instances. We added a CI check that flags global mutable state in Lambda handlers. [SYSTEMIC FIX]

Blast radius: 23 sessions over 3 days, ASIN and order status only — no addresses, no payment data. All 23 users were notified within 72 hours per GDPR requirements. [RESULT]

The key tradeoff was speed vs. coverage during containment. I chose to disable specific intents rather than taking the entire chatbot offline because only order-related intents had cross-session risk. This kept 70% of functionality available while we investigated. [DECISION]"
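
In a deep dive, the root cause takes only a few lines to show. A minimal sketch of the pattern, with the hypothetical `respond` standing in for the model call:

```python
# Why warm Lambda containers leak state: anything at module scope survives
# across invocations, and potentially across users. `respond` is a stand-in.

def respond(history):
    return {"reply": f"({len(history)} turns in context)"}

# Buggy: module-scope mutable state is shared by every request this
# container serves, so Session B inherits Session A's history.
conversation_history = []

def handler_buggy(event, context):
    conversation_history.append(event["message"])
    return respond(conversation_history)

# Fixed: initialize state inside the handler, once per invocation. Durable
# per-session history belongs in an external store keyed by session ID.
def handler_fixed(event, context):
    history = [event["message"]]
    return respond(history)
```

The CI check mentioned in the systemic fix can be as simple as a lint rule that flags module-level mutable assignments in any file that defines a Lambda handler.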


Example 3: Guardrail Overblocking (from Doc 03)

Weak version:

"Each team tightened their own guardrail stage independently. This caused too many blocks. We fixed it by coordinating the thresholds."

Strong version:

"Our fallback rate jumped from 3% to 14% overnight — one in seven users was getting 'I can't help with that' instead of an answer. [HOOK]

Three teams had independently tightened their guardrail stages the same week: toxicity, competitor filtering, and scope check. Each change was individually reasonable — maybe 2% more blocks. But the stages are serial: a query has to pass every stage, so block probabilities compound, and a query that's borderline on two stages will usually be caught by one of them even though each stage alone would probably have passed it. [ROOT CAUSE]

My fix was an intent-aware guardrail configuration system. Instead of one global threshold per stage, each intent type has its own guardrail profile. A recommendation query runs all 6 stages. A simple FAQ runs PII + scope only. A greeting runs scope only. [ACTION]

Fallback rate dropped from 14% to 2.8%. More importantly, we now have a coordination protocol: guardrail threshold changes require cross-team review with a projected compound block rate calculation. [RESULT]

The tradeoff: intent-aware configuration adds complexity — we went from 6 threshold values to 30+ (6 stages × 5+ intents). I chose this over simply loosening thresholds because the overblocking was caused by the interaction between stages, not by any one stage being too strict. [DECISION]"
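
The compounding is worth being able to quantify on the spot. A back-of-envelope sketch, assuming independent stages with illustrative per-stage rates; the intent profiles at the end are likewise hypothetical:

```python
# Why serial guardrail stages compound: a query must pass every stage,
# so P(blocked) = 1 - product of per-stage pass rates. Rates here are
# illustrative, and independence is assumed; correlated borderline
# queries push the real number higher (toward the observed 14%).

def compound_block_rate(stage_block_rates):
    pass_rate = 1.0
    for p in stage_block_rates:
        pass_rate *= 1.0 - p
    return 1.0 - pass_rate

before = [0.010, 0.005, 0.010, 0.002, 0.003, 0.001]  # six stages
after  = [0.030, 0.025, 0.030, 0.002, 0.003, 0.001]  # three stages tightened "a little"

print(f"before: {compound_block_rate(before):.1%}")  # ~3.1%
print(f"after:  {compound_block_rate(after):.1%}")   # ~8.8%, before correlation effects

# Intent-aware profiles: run only the stages relevant to each intent.
GUARDRAIL_PROFILES = {
    "recommendation": ["pii", "toxicity", "competitor", "scope", "injection", "brand"],
    "faq":            ["pii", "scope"],
    "greeting":       ["scope"],
}
```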


Pacing and Depth Control

Read the Interviewer

| Interviewer Signal | What It Means | How to Adjust |
|---|---|---|
| Nodding, leaning forward | They're engaged — continue at current depth | Keep going, add one more specific detail |
| "Can you go deeper on..." | They want technical details | Shift to architecture/code level |
| Glancing at notes, preparing next question | They've heard enough | Wrap up with result + decision, stop |
| "Why did you choose X over Y?" | They care about judgment | Emphasize the tradeoff, mention what you rejected and why |
| "How would you do it differently now?" | They want self-awareness | Be honest — "I'd add [X] metric earlier" or "I'd automate [Y] instead of doing it manually" |

Depth Tiers

Tier 1 — Summary (30 seconds): Use when the interviewer hasn't asked for depth.

"We had a cross-session data leak caused by Lambda container reuse. I contained it in 8 minutes by disabling affected intents, fixed the root cause (global mutable state), and added CI checks to prevent it. 23 users affected, all notified within GDPR timelines."

Tier 2 — Standard (2-3 minutes): Use for most interview answers. Full STAR-D as shown in the worked examples above.

Tier 3 — Deep dive (5-7 minutes): Use only when explicitly asked. Add:
  - Architecture diagrams (describe verbally: "Picture a flow where...")
  - Code-level details ("The fix was literally moving one line from module scope to function scope")
  - Comparison with alternative approaches
  - Long-term systemic changes


Audience Framing

Different Interviewers Care About Different Things

| Interviewer Role | What They Care About | Emphasize |
|---|---|---|
| Security Engineer | Technical defense mechanisms, attack/defense asymmetry | Specific patterns, detection logic, false positive/negative rates |
| Engineering Manager | Team coordination, process improvement, timeline | Containment speed, cross-team protocols, post-incident reviews |
| VP/Director | Business impact, risk management, resource allocation | User impact numbers, cost of security vs. cost of breach, MVP vs. production security |
| Product Manager | User experience impact, feature tradeoffs | "This guardrail blocks 8% of questions" vs. "This guardrail prevents harmful content" |
| Data Scientist / ML Engineer | Model behavior, bias, evaluation methodology | Bias metrics, evaluation suites, model drift detection |
| SRE / DevOps | Detection, monitoring, incident response speed | Alert thresholds, containment playbooks, MTTR |
| Compliance / Legal | Regulatory compliance, audit evidence, documentation | GDPR timelines, audit log immutability, breach notification process |

Reframing the Same Scenario for Different Audiences

Scenario: Cross-session data leak

To a Security Engineer:

"Lambda warm container reuse caused global mutable state to persist across invocations. The conversation_history list accumulated turns from Session A and injected them into Session B's FM context. Detection was via user report — I've since added a cross-session correlation checker that samples 100 responses/hour."

To an Engineering Manager:

"A data isolation bug affected 23 users over 3 days. We contained it in 8 minutes via feature flags, identified root cause in 1.5 hours, fixed and deployed in 2 hours, and completed full blast radius assessment in 4 hours. Post-incident review led to a new CI check and an audit of all 12 Lambda functions."

To a VP:

"We had a limited-scope data leak — order IDs visible across sessions, no payment or address data. 23 users affected, contained in under 10 minutes, all notified within GDPR's 72-hour window. The root cause was a common serverless pitfall. We've since added automated prevention that catches this class of bug at build time."


Document Narrative Structure

When writing design documents (not just speaking in interviews), the same storytelling principles apply:

The "Problem → Constraint → Decision → Evidence" Pattern

Every section of a design document should follow this structure:

  1. Problem: What specific challenge does this address? (Not "we need encryption" but "PII fields in DynamoDB are accessible to any Lambda with the table read permission")
  2. Constraint: What limits the solution space? (Performance budget, cost, team size, regulatory requirements)
  3. Decision: What we chose and what we rejected (table format works well)
  4. Evidence: Metrics, benchmarks, or incident data showing the decision worked

Example: Architecture Decision Section

| | Weak | Strong |
|---|---|---|
| Problem | "We need to encrypt PII" | "Any Lambda with DynamoDB read access can see plaintext PII fields — a single compromised function exposes all customer data" |
| Constraint | "Encryption adds latency" | "Guardrail pipeline budget is 30ms total; direct KMS encryption adds 35ms per PII field" |
| Decision | "We use envelope encryption" | "Envelope encryption with cached data keys: KMS generates the key once per hour, local encryption uses it for up to 10,000 messages. Alternative considered: direct KMS per field (35ms vs. 2ms)" |
| Evidence | "It works well" | "Pipeline P95 latency: 18ms → 19ms after adding encrypted PII logging (vs. 34ms with direct KMS)" |

Common Anti-Patterns to Avoid

1. The "We Did Everything Right" Story

Problem: You describe a situation with no challenges, no tradeoffs, no mistakes.
Reality: Interviewers know real engineering involves mistakes and tradeoffs. A story without them sounds rehearsed or dishonest.
Fix: Include what surprised you, what you got wrong initially, what you'd do differently.

2. The "Technology List" Story

Problem: "We used KMS for encryption, WAF for protection, CloudWatch for monitoring, GuardDuty for threats..." Reality: Listing technologies doesn't demonstrate engineering judgment. Fix: For each technology, explain the decision: why this one, what was rejected, what tradeoff you accepted.

3. The "I Was There" Story

Problem: Using "we" for everything, making it unclear what YOU specifically did. Reality: Interviewers are evaluating you, not your team. Fix: Use "I" for your actions: "I triaged the incident as SEV-1", "I designed the detection pipeline", "I made the call to disable intents."

4. The "Premature Deep Dive" Story

Problem: Starting with low-level technical details before establishing context.
Reality: "I changed the Lambda handler to initialize the list inside the function" means nothing without the story of why.
Fix: Always establish stakes before details. Why does this matter? What would happen if we didn't fix it?

5. The "No Numbers" Story

Problem: "We reduced false positives significantly" or "We improved latency." Reality: "Significantly" is subjective. Numbers make stories credible. Fix: "8% → 0.4%", "35ms → 2ms", "23 users affected", "contained in 17 minutes."


Quick Reference: Scenario → Opening Hook

| Scenario | Opening Hook |
|---|---|
| Prompt injection defense | "An attacker discovered that 12 turns of casual conversation could gradually rewire our chatbot's personality." |
| PII false positives | "Our bot was redacting the name of one of the most popular anime characters in the world." |
| Guardrail overblocking | "Overnight, one in seven users started getting 'I can't help with that' — and nothing had been deployed." |
| Content moderation scraping | "During a new manga release week, we detected 847 sessions from 23 IPs systematically extracting our catalog." |
| Cross-session data leak | "A customer reported that our chatbot mentioned someone else's order." |
| Prompt-induced PII generation | "A prompt change designed to improve recommendations taught our model to hallucinate email addresses." |
| Bedrock regional outage | "30% of our users were getting fallback responses for 45 minutes, and we couldn't switch regions because we hadn't pre-provisioned capacity in a second region." |
| LangChain breaking change | "Our 20-turn conversation test started failing with 'context length exceeded' — and we hadn't changed any code." |
| Encryption performance | "After enabling field-level encryption, our guardrail pipeline doubled in latency." |
| Unauthorized KMS access | "CloudTrail flagged decrypt attempts against our PII encryption key from a role that shouldn't have access." |

Cross-References