ML-Specific Threats and Adversarial AI - Multi-Turn Context Poisoning in Long Sessions Follow-Up Answers
Question document: README.md
Source document: 06-ml-specific-threats.md
Reference scenario: 01-prompt-injection-defense.md -> Scenario 4: Multi-Turn Context Poisoning in Long Sessions
Scenario lens: Gradual scope drift across long sessions where no single turn is clearly malicious, but the accumulated context becomes unsafe. Document lens: ML-Specific Threats and Adversarial AI.
Use this file as the answer key for the follow-up questions in README.md.
Easy
Q1
Question: What signals would tell you a conversation is slowly drifting from legitimate use into the kind of multi-turn poisoning risk that matters for ML-specific threats such as extraction, poisoning, inversion, and adversarial evasion?
Answer: The early signal is a gradual change in topic, requested capability, or confidence over time rather than a single overtly malicious prompt. Watch for sessions that start as normal help and drift toward internal behavior, sensitive data, or privileged operations.
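A minimal sketch of how such drift signals could be scored, assuming keyword-set overlap as a cheap stand-in for real embedding distance, and an assumed single-word restricted vocabulary (`SENSITIVE_TERMS`); any real deployment would tune both.

```python
# Assumed restricted vocabulary for illustration only.
SENSITIVE_TERMS = {"credentials", "internal", "privileged", "bypass", "override"}

def turn_drift_signal(opening_terms: set[str], turn_text: str) -> dict:
    """Per-turn signal: how far this turn moved from the opening topic,
    and how much restricted vocabulary it touched."""
    terms = set(turn_text.lower().split())
    overlap = len(opening_terms & terms) / max(len(opening_terms), 1)
    return {"topic_overlap": overlap, "sensitive_hits": len(SENSITIVE_TERMS & terms)}

def session_drift(turns: list[str]) -> float:
    """Accumulated drift: rises as later turns share less with the opening
    topic and touch more restricted vocabulary."""
    opening = set(turns[0].lower().split())
    score = 0.0
    for t in turns[1:]:
        sig = turn_drift_signal(opening, t)
        score += (1 - sig["topic_overlap"]) * 0.5 + sig["sensitive_hits"] * 1.0
    return score
```

The point of the sketch is that no single turn needs to trip a filter; the score grows with the arc of the session.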
Q2
Question: At what point would you summarize, reset, or narrow context rather than letting the thread accumulate more state?
Answer: Reset or summarize once the session crosses a turn budget or drift threshold, especially before sensitive tools or contexts are invoked. The aim is to preserve the user task while dropping attacker priming.
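This decision rule can be sketched as a small policy function; the budget and threshold values below are assumptions for illustration, not recommendations.

```python
TURN_BUDGET = 10       # assumed: max raw turns kept before a summary pass
DRIFT_THRESHOLD = 3.0  # assumed: drift score that forces a reset

def next_action(turn_count: int, drift_score: float, invoking_sensitive_tool: bool) -> str:
    """Decide whether to continue, summarize, or reset session context."""
    if invoking_sensitive_tool:
        return "summarize"   # rebuild minimal context before privileged operations
    if drift_score >= DRIFT_THRESHOLD:
        return "reset"       # drop the thread, keep only the stated task
    if turn_count >= TURN_BUDGET:
        return "summarize"   # compress history, discard accumulated framing
    return "continue"
```

Summarizing before every sensitive invocation is the key ordering choice: it guarantees attacker priming never rides along into a privileged call, regardless of how benign the session looked so far.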
Medium
Q1
Question: How would you represent session memory so the assistant keeps necessary user context without carrying forward attacker priming?
Answer: Session memory should store compact, typed state such as confirmed facts and current intent, not raw transcript text. That keeps subtle framing from being carried forward as if it were trusted context.
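One way to sketch this typed-state idea, assuming facts only enter memory through an explicit confirmation step (the field names are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class SessionMemory:
    """Compact, typed session state: no raw transcript text is retained,
    so free-form framing from earlier turns never becomes trusted context."""
    current_intent: str = ""
    confirmed_facts: dict[str, str] = field(default_factory=dict)

    def confirm_fact(self, key: str, value: str) -> None:
        # Facts enter memory only via this explicit confirmation step.
        self.confirmed_facts[key] = value

    def update_intent(self, intent: str) -> None:
        self.current_intent = intent

    def to_context(self) -> str:
        """Render the minimal context string passed to the next turn."""
        facts = "; ".join(f"{k}={v}" for k, v in self.confirmed_facts.items())
        return f"intent: {self.current_intent} | facts: {facts}"
```

Because `to_context` is rebuilt from structured fields each turn, an attacker's narrative phrasing has nowhere to accumulate.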
Q2
Question: What dashboard, alert, or review queue would you build to surface gradual drift that per-turn checks miss?
Answer: Expose a drift score, restricted-topic ratio, and alert trail on the dashboard, then route high-risk sessions to review or tighter policy. Per-turn metrics alone miss this attack class because each message still looks reasonable.
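A sketch of the restricted-topic ratio and routing rule, with an assumed topic list and an assumed alert threshold; both would be policy-owned in practice.

```python
RESTRICTED_TOPICS = {"credentials", "internal", "exfiltrate", "bypass"}  # assumed list

def session_metrics(turns: list[str]) -> dict:
    """Session-level metrics for the dashboard: fraction of turns that
    touch a restricted topic, plus the raw turn count."""
    restricted = sum(1 for t in turns if RESTRICTED_TOPICS & set(t.lower().split()))
    return {"restricted_topic_ratio": restricted / len(turns), "turns": len(turns)}

def route(metrics: dict, ratio_alert: float = 0.3) -> str:
    """Per-turn checks would pass each message; the session-level ratio
    is what catches the arc and routes it to human review."""
    return "review_queue" if metrics["restricted_topic_ratio"] >= ratio_alert else "ok"
```

The ratio is deliberately computed over the whole session rather than per message, which is exactly the aggregation that per-turn checks lack.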
Hard
Q1
Question: How would you test long-session resilience when the attack path depends on eight to twelve individually benign turns?
Answer: Test with scripted long conversations where each step is individually benign but the whole arc becomes unsafe. The key measurement is whether, and at which turn, the system redirects, re-summarizes, or begins to leak as the turn count rises.
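A test-harness sketch under an assumed interface: the assistant is any callable returning a coarse outcome label per turn (`"ok"`, `"redirect"`, `"summarize"`, or `"leak"`); the toy factory below simulates a vulnerable assistant for demonstration.

```python
def run_long_session(assistant, scripted_turns: list[str]) -> dict:
    """Replay a scripted multi-turn conversation and record at which turn
    the assistant defends (redirect/summarize) or leaks."""
    for i, turn in enumerate(scripted_turns, start=1):
        reply = assistant(turn)
        if reply in ("redirect", "summarize"):
            return {"defended_at_turn": i, "leaked": False}
        if reply == "leak":
            return {"defended_at_turn": None, "leaked": True, "leak_turn": i}
    return {"defended_at_turn": None, "leaked": False}

def toy_assistant_factory(leak_after: int):
    """Toy model of a vulnerable assistant: behaves normally until enough
    individually benign turns accumulate, then leaks."""
    state = {"turns": 0}
    def assistant(turn: str) -> str:
        state["turns"] += 1
        return "leak" if state["turns"] >= leak_after else "ok"
    return assistant
```

Running the harness with 8-12 scripted turns makes the failure turn an explicit, regression-testable number rather than an anecdote.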
Q2
Question: What tradeoff would you make between personalization and security if session-level controls start truncating useful context or increasing refusals?
Answer: The tradeoff is continuity versus safety. Preserve facts that support the task, but compress or drop speculative framing and role narratives that add privilege without helping the user objective.
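A minimal sketch of that compression rule, assuming context is already structured by key; the allow-list of task-supporting keys is hypothetical.

```python
TASK_KEYS = {"goal", "environment", "error_message"}  # assumed task-supporting facts

def compress_context(context: dict[str, str]) -> dict[str, str]:
    """Keep only facts that serve the task; anything else (role narratives,
    assumed permissions, speculative framing) is dropped rather than carried."""
    return {k: v for k, v in context.items() if k in TASK_KEYS}
```

An allow-list is the safer default here: unknown keys, including novel privilege framing an attacker invents, fail closed instead of persisting.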
Very Hard
Q1
Question: How would you distinguish malicious scope drift from a legitimate advanced user who naturally asks deeper operational questions over time?
Answer: Differentiate malicious drift from legitimate expertise by looking at transition speed, repeated probing after refusal, and how often the user steers toward disallowed internal detail. One deep technical conversation is fine; persistent boundary testing is the signal.
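A heuristic sketch combining those three signals into one score; the time cutoff, weights, and threshold are all assumed values for illustration.

```python
def classify_drift(turn_gaps_s: list[float],
                   probes_after_refusal: int,
                   disallowed_steers: int) -> str:
    """Score the three signals from the answer above: fast transitions,
    repeated probing after refusal, and steering toward disallowed detail."""
    fast_transitions = sum(1 for g in turn_gaps_s if g < 5.0)  # assumed cutoff
    score = fast_transitions + 2 * probes_after_refusal + 2 * disallowed_steers
    return "suspicious" if score >= 4 else "likely_legitimate"  # assumed threshold
```

Probing after refusal and disallowed steering are weighted higher than speed alone, since a fast but compliant expert should not trip the classifier.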
Q2
Question: If a distributed attacker spreads the poisoning pattern across many sessions and identities, what cross-session signals or offline analyses would you rely on to detect the campaign?
Answer: Cross-session detection needs clustering over identities, IPs, time windows, and prompt similarity, plus offline review of attack paths. Campaigns often become obvious only when many almost-benign sessions are viewed together.
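An offline-analysis sketch of the prompt-similarity leg, using Jaccard overlap on token sets as a stand-in for embedding-based clustering; a real pipeline would also join on IP ranges and time windows as the answer notes.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set similarity between two prompts (stand-in for embeddings)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def cluster_sessions(sessions: list[dict], sim_threshold: float = 0.5) -> list[list[int]]:
    """Greedy single-link clustering: a session joins the first cluster
    containing any sufficiently similar prompt, else starts a new one."""
    clusters: list[list[int]] = []
    for i, s in enumerate(sessions):
        for cluster in clusters:
            if any(jaccard(s["prompt"], sessions[j]["prompt"]) >= sim_threshold
                   for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Each session in a campaign can look almost benign on its own; the cluster size and shared phrasing are what surface the coordinated pattern.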