Encryption and Key Management - Ignore All Previous Instructions Follow-Up Questions
Source document: 08-encryption-key-management.md Reference scenario: 01-prompt-injection-defense.md -> Scenario 1: Ignore All Previous Instructions
Scenario lens: Direct instruction override after several benign turns, with prompt dilution and instruction hierarchy as the main risks. Document lens: encryption, key isolation, secrets rotation, and cryptographic access control.
Use these prompts to push past the base scenario and explore deeper design, operational, interview, or storytelling tradeoffs.
Answer document: ANSWERS.md
Easy
- How would a direct instruction-override attempt surface inside encryption, key isolation, secrets rotation, and cryptographic access control, and which control should engage first: CMK separation, envelope encryption, key rotation, secret rotation, and VPC endpoints?
- What is the first metric or log signal you would inspect to confirm the attack changed behavior rather than just wording: unauthorized decrypt attempts, rotation success rate, encryption latency, and secret age?
Medium
- Where is the trust boundary in this design, and how do you keep attacker text from being interpreted as a higher-priority instruction?
- If a legitimate shopper says something like
ignore the spoiler warningsorforget the earlier recommendation, how would you tune the detector to reduce false positives without weakening the defense?
Hard
- How would you redesign the prompt assembly or control flow so protecting sensitive data without turning encryption into a latency bottleneck cannot silently weaken direct-injection resistance over long conversations?
- What negative tests would you add to CI so a future config, prompt, or dependency change cannot regress the protections described in 08-encryption-key-management.md?
Very Hard
- Assume the attacker adaptively probes the system for detector blind spots. How would you use KMS audit logs, key policy diffs, and latency traces to distinguish real resilience gains from attackers merely changing surface wording?
- What failure mode would still remain even after implementing CMK separation, envelope encryption, key rotation, secret rotation, and VPC endpoints, and what second-order mitigation would you add without blowing up latency or blocking legitimate shopping queries?