2. PII Protection & Data Privacy
MangaAssist sits in an awkward part of the risk surface: it is both a shopping assistant and a conversational system. Users voluntarily paste names, phone numbers, addresses, order IDs, gift card codes, and account details into natural-language chat. At the same time, the system itself can generate or echo sensitive data in responses. That means privacy cannot be treated as a storage-only problem or a prompt-only problem. It has to be enforced across ingestion, orchestration, model prompting, response filtering, logging, analytics, retention, and deletion.
This document goes deeper than the base privacy narrative and answers five practical engineering questions:
- Where can PII enter the system and where can it leak?
- How is PII detected with enough recall without breaking manga-domain queries?
- How is the confidence score calculated and used for routing?
- How do the storage and deletion flows prove privacy controls beyond the hot path?
- What follow-up questions should you expect in a design review or interview?
Why This Matters for MangaAssist
MangaAssist processes sensitive data across nearly every user journey:
| Data Type | Where It Appears | Why It Matters |
|---|---|---|
| Customer name | Account help, order lookup, shipping status | Identity exposure, unwanted profiling |
| Email address | Order support, account recovery questions | Phishing, spam targeting |
| Shipping address | Delivery estimate, returns, account help | Physical safety, direct privacy risk |
| Phone number | Tracking and customer support | SIM-swapping and unwanted contact |
| Amazon customer ID | Session context, account support | Account takeover vector |
| Order history | Recommendations, return support | Behavioral profiling |
| Payment references | Last-4 digits, gift card references | Fraud enablement |
| Browsing behavior | Personalization and ranking | Preference profiling |
The hard part is not only detection. The hard part is that the same string can be either sensitive or harmless depending on context:
- `Gojo Satoru` can be a fictional character or a real person name
- `123-4567` can be a postal fragment, phone fragment, or noise
- `B0XXXXXXX` looks like an ASIN and should not be redacted as PII
- `TBA123456789012` is not a person identifier, but it is still sensitive operational data
The privacy system therefore needs:
- strong structured detection
- domain-aware unstructured detection
- policy-aware routing
- data minimization at every persistence boundary
- deletion that covers both primary and derived data
Design Goals
- Detect sensitive data before any non-essential persistence.
- Separate detection from authorization: something can still be PII even if a downstream service is allowed to see it.
- Keep the hot path deterministic and low latency.
- Preserve shopping utility by avoiding false positives on manga titles, character names, ASINs, and catalog terms.
- Make privacy controls auditable: what was detected, what was redacted, what was stored, what was deleted, and when.
- Treat guest sessions more conservatively because their data is harder to attribute later.
Threat Model and Trust Boundaries
Main Privacy Failure Modes
| Failure Mode | Example | Impact | First Control |
|---|---|---|---|
| User volunteers PII in chat | "My email is alex@example.com" | Raw PII reaches logs, history, analytics | Pre-logging PII scan |
| Model echoes sensitive data | FM repeats shipping address from context | Unauthorized disclosure in response | Post-generation PII response filter |
| Model hallucinates plausible PII | Fake email or phone number in prose | User sees fabricated personal data | Response-side regex + NER |
| False positive on manga entities | Gojo Satoru becomes `[NAME_REDACTED]` | Broken recommendations and trust loss | Allowlist + context scorer |
| Locale miss | JP or DE formats not caught | PII leak due to detector blind spot | Locale-aware regex + multilingual NER |
| Derived data not deleted | Training export still contains `customer_id` | GDPR/CCPA non-compliance | Data lineage map + deletion workflow |
| Guest data retention too long | Unattributable guest PII stored for 24h | Higher privacy risk, weak deletion path | Lower threshold + shorter TTL |
Trust Boundary View
flowchart TB
subgraph Untrusted["Untrusted / User-Controlled"]
User[User message]
History[Conversation text]
APIText[API payload text fields]
end
subgraph Controlled["Controlled Decision Layer"]
Normalize[Normalizer + locale resolver]
Detect[PII detection pipeline]
Policy[Policy and authorization engine]
Redact[Redaction and masking engine]
end
subgraph Sensitive["Sensitive Runtime Zone"]
Orch[Orchestrator request context]
FM[Foundation model prompt]
Ship[Shipping API]
Order[Order API]
end
subgraph Persistent["Persistent Stores"]
Logs[Redacted logs]
DDB[Redacted session history]
Analytics[Anonymized analytics]
S3[Archives and audit evidence]
end
User --> Normalize
History --> Normalize
APIText --> Normalize
Normalize --> Detect
Detect --> Policy
Policy --> Redact
Policy --> Orch
Orch --> FM
Orch --> Ship
Orch --> Order
Redact --> Logs
Redact --> DDB
Redact --> Analytics
DDB --> S3
Key rule: raw user text is allowed to exist in memory for the minimum time needed to fulfill the request, but persistent systems should see only the redacted or policy-approved representation.
High-Level Design (HLD)
System Overview
flowchart LR
User[Web / mobile client] --> Gateway[API Gateway + auth]
Gateway --> Orch[Chat orchestrator]
subgraph PII["PII Protection Layer"]
Normalize[Text normalizer]
Locale[Locale resolver]
Regex[Regex scanner]
NER[NER endpoint]
Custom[Custom detectors]
Merge[Overlap merger]
Score[Confidence scorer]
Policy[Authorization and action router]
Redact[Redaction engine]
end
Orch --> Normalize
Normalize --> Locale
Locale --> Regex
Locale --> NER
Locale --> Custom
Regex --> Merge
NER --> Merge
Custom --> Merge
Merge --> Score
Score --> Policy
Policy --> Redact
Policy -->|Ephemeral approved fields only| FM[Foundation model]
Policy -->|Ephemeral approved fields only| Shipping[Shipping API]
Policy -->|Ephemeral approved fields only| Orders[Order API]
Redact --> DDB[DynamoDB session history]
Redact --> Logs[CloudWatch application logs]
Redact --> Analytics[Analytics stream]
FM --> ResponsePII[Response PII filter]
Shipping --> ResponsePII
Orders --> ResponsePII
ResponsePII --> UserResponse[User-visible response]
HLD Principles
- Detection runs before persistence.
- Authorization is explicit and separate from confidence.
- Raw PII is never forwarded by default.
- Response-side filtering exists because the FM can still generate sensitive data even if the input path is clean.
- Audit and deletion are first-class parts of the design, not cleanup tasks.
End-to-End Dataflow
Inbound Request Dataflow
sequenceDiagram
participant User
participant Gateway as API Gateway
participant Orch as Orchestrator
participant Norm as Normalizer
participant Scan as PII Scanner
participant Policy as Policy Engine
participant DDB as DynamoDB
participant Log as Logs
participant FM as Foundation Model
participant API as Order/Shipping API
User->>Gateway: Chat message with possible PII
Gateway->>Orch: Auth context + message
Orch->>Norm: Normalize for detection
Norm->>Scan: Locale-aware text + offset map
Scan->>Policy: Findings with confidence
Policy->>Log: Persist redacted copy only
Policy->>DDB: Store redacted conversation history
Policy->>FM: Send minimized prompt
Policy->>API: Send approved PII fields only if required
FM-->>Orch: Response candidate
API-->>Orch: Structured data
Outbound Response Dataflow
flowchart LR
Candidate[FM or API-composed response] --> RespPII[Response-side PII scan]
RespPII --> Guardrails[Remaining guardrails]
Guardrails --> Deliver[Deliver to user]
RespPII --> Audit[PII near-miss audit event]
Important point: input-side privacy controls do not eliminate the need for output-side privacy controls. The model can hallucinate emails, phone numbers, or addresses that were never present in the input.
Hot Path vs Async Path
flowchart TD
subgraph HotPath["Synchronous hot path"]
A[Normalize] --> B[Detect]
B --> C[Score]
C --> D[Policy route]
D --> E[Redact and store safe copy]
D --> F[Pass ephemeral approved fields]
end
subgraph AsyncPath["Asynchronous follow-up"]
G[Review queue]
H[Analytics anonymization]
I[Audit aggregation]
J[Allowlist refresh]
K[Deletion evidence generation]
end
C --> G
E --> H
E --> I
D --> K
J --> B
This split matters because privacy logic that must prevent leakage belongs in the synchronous path. Optimization, aggregation, and human review belong in the async path.
PII Detection Architecture Deep Dive
Layer 0: Text Normalization and Locale Resolution
Before detection, the system normalizes text for scanners without losing the ability to redact the original string correctly.
The implementation detail that matters is the offset map:
- normalize Unicode for detectors
- remove zero-width characters and repeated spacing
- standardize separators where safe
- preserve a mapping from normalized spans back to original spans
Without this, you can detect on normalized text but redact the wrong characters in the original message.
import unicodedata

ZERO_WIDTH_CHARS = {"\u200b", "\u200c", "\u200d", "\ufeff"}

def normalize_char(char: str) -> str:
    # NFKC fold plus zero-width removal; a fuller version would also
    # collapse repeated spacing and standardize separators
    if char in ZERO_WIDTH_CHARS:
        return ""
    return unicodedata.normalize("NFKC", char)

def normalize_for_detection(text: str) -> tuple[str, list[int]]:
    """
    Returns normalized text and an offset map where offset_map[i]
    points to the original character index for normalized position i.
    """
    normalized = []
    offset_map = []
    for index, char in enumerate(text):
        transformed = normalize_char(char)
        for out_char in transformed:
            normalized.append(out_char)
            offset_map.append(index)
    return "".join(normalized), offset_map
Locale resolution uses:
- authenticated user marketplace or country
- shipping country if available
- UI locale
- detector hints from the text itself
If locale is ambiguous, the pipeline runs the safe union of the relevant locale patterns.
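To make the "safe union" concrete, here is a minimal sketch; the locale keys and pattern names are illustrative assumptions that mirror the regex patterns used elsewhere in this document, not the production registry.

```python
# Illustrative locale-to-pattern routing. Keys and pattern names are
# assumptions for the demo, not the production registry.
LOCALE_PATTERN_SETS = {
    "US": {"us_phone", "us_zip", "ssn"},
    "JP": {"jp_phone", "jp_postal"},
    "DE": {"de_phone", "de_postal"},
}

def patterns_for(locale_candidates: list[str]) -> set[str]:
    # Ambiguous locale: run the union of every candidate's patterns
    # rather than betting on a single locale and missing a format.
    selected: set[str] = set()
    for locale in locale_candidates:
        selected |= LOCALE_PATTERN_SETS.get(locale, set())
    return selected

ambiguous = patterns_for(["US", "JP"])
```

The union trades a few extra regex passes for zero locale-based blind spots, which is the right trade on a privacy path.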
Layer 1: Regex-Based Pattern Detection
Regex is still the cheapest and most reliable detector for strongly structured fields.
import re

PII_PATTERNS = {
    "email": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
    "obfuscated_email": r"\b[A-Za-z0-9._%+-]+\s*(?:@|\[at\]|\(at\))\s*[A-Za-z0-9.-]+\s*(?:\.|\[dot\]|\(dot\))\s*[A-Za-z]{2,}\b",
    "us_phone": r"\b(?:\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b",
    "jp_phone": r"\b0\d{1,4}[-.\s]?\d{1,4}[-.\s]?\d{4}\b",
    "ssn": r"\b\d{3}[-.\s]?\d{2}[-.\s]?\d{4}\b",
    "credit_card": r"\b(?:\d{4}[-.\s]?){3}\d{4}\b",
    "us_zip": r"\b\d{5}(?:-\d{4})?\b",
    "jp_postal": r"\b\d{3}-\d{4}\b",
    "amazon_order_id": r"\b\d{3}-\d{7}-\d{7}\b",
}

def scan_regex(text: str) -> list[dict]:
    findings = []
    for pii_type, pattern in PII_PATTERNS.items():
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            findings.append({
                "type": pii_type,
                "value": match.group(),
                "start": match.start(),
                "end": match.end(),
                "base_confidence": 0.95,
                "source": "regex",
            })
    return findings
Regex strengths:
- sub-millisecond
- deterministic
- auditable
- ideal for hot-path filtering
Regex weaknesses:
- poor for names and free-form addresses
- brittle against novel obfuscation unless continuously refreshed
- no semantic understanding
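The obfuscation case is worth a concrete check. The self-contained snippet below uses two of the patterns above to show that the plain email pattern misses a `[at]`/`[dot]` address while the obfuscation-aware pattern catches it.

```python
import re

# Two of the structured patterns from the table above, side by side.
PATTERNS = {
    "email": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
    "obfuscated_email": r"\b[A-Za-z0-9._%+-]+\s*(?:@|\[at\]|\(at\))\s*[A-Za-z0-9.-]+\s*(?:\.|\[dot\]|\(dot\))\s*[A-Za-z]{2,}\b",
}

def scan(text: str) -> list[tuple[str, str]]:
    # Return (pattern name, matched text) pairs, case-insensitively.
    return [
        (name, match.group())
        for name, pattern in PATTERNS.items()
        for match in re.finditer(pattern, text, flags=re.IGNORECASE)
    ]

hits = scan("reach me at alex [at] example [dot] com")
```

Only the obfuscation-aware pattern fires here, which is exactly why both variants run on the hot path.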
Layer 2: NER-Based Entity Detection
NER catches the messy cases regex cannot handle:
- names
- addresses
- organizations
- location mentions that become sensitive in account context
import json

import boto3

sagemaker_runtime = boto3.client("sagemaker-runtime")

NER_ENTITY_MAP = {
    "PERSON": "person_name",
    "ADDRESS": "address",
    "GPE": "location",
    "LOC": "location",
    "ORG": "organization",
    "JAPANESE_NAME": "person_name",
}

def scan_ner(text: str, locale: str) -> list[dict]:
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName="pii-ner-endpoint",
        Body=json.dumps({"text": text, "locale": locale}),
        ContentType="application/json",
    )
    entities = json.loads(response["Body"].read())
    findings = []
    for entity in entities:
        if entity["label"] not in NER_ENTITY_MAP:
            continue
        if entity["score"] <= 0.70:  # drop low-confidence entities early
            continue
        findings.append({
            "type": NER_ENTITY_MAP[entity["label"]],
            "value": entity["text"],
            "start": entity["start"],
            "end": entity["end"],
            "base_confidence": entity["score"],
            "source": "ner",
        })
    return findings
Domain-specific requirement: a generic NER model will over-flag manga character names as real people. Fine-tuning reduces this, but in production it is still not enough by itself. The right solution is layered:
- NER fine-tuning on manga/e-commerce data
- character allowlist
- context-aware score adjustments
- monitoring of mid-confidence `person_name` findings
Layer 3: Custom Detectors
Some sensitive strings are domain-specific and need business logic, not just generic NLP.
CUSTOM_PATTERNS = {
    "amazon_customer_id": r"\b[A-Z0-9]{13,14}\b",
    "tracking_number": r"\b(?:1Z[A-Z0-9]{16}|TBA\d{12,})\b",
    "gift_card_code": r"\b[A-Z0-9]{4}-[A-Z0-9]{6}-[A-Z0-9]{4}\b",
    "asin": r"\bB0[A-Z0-9]{8}\b",
}

ACCOUNT_CONTEXT_KEYWORDS = {"account", "customer id", "order", "my account", "profile"}

def scan_custom(text: str) -> list[dict]:
    findings = []
    text_lower = text.lower()
    for pii_type, pattern in CUSTOM_PATTERNS.items():
        for match in re.finditer(pattern, text):
            base_confidence = 0.90
            if pii_type == "amazon_customer_id":
                if not any(keyword in text_lower for keyword in ACCOUNT_CONTEXT_KEYWORDS):
                    base_confidence = 0.40
            findings.append({
                "type": pii_type,
                "value": match.group(),
                "start": match.start(),
                "end": match.end(),
                "base_confidence": base_confidence,
                "source": "custom",
            })
    return findings
This is where many privacy systems fail. They either redact too broadly and destroy utility, or they under-classify business-specific identifiers because they do not look like classic PII.
Layer 4: Overlap Merging and Canonical Findings
The same span may be detected by multiple scanners. If we do not merge, we create double-redaction bugs and inconsistent routing.
def merge_overlapping_findings(findings: list[dict]) -> list[dict]:
    findings = sorted(findings, key=lambda item: (item["start"], item["end"]))
    merged = []
    for finding in findings:
        # end offsets are exclusive, so a span starting exactly at the
        # previous end is adjacent, not overlapping
        if not merged or finding["start"] >= merged[-1]["end"]:
            merged.append({
                **finding,
                "sources": {finding["source"]},
            })
            continue
        previous = merged[-1]
        previous["end"] = max(previous["end"], finding["end"])
        previous["sources"].add(finding["source"])
        previous["base_confidence"] = max(previous["base_confidence"], finding["base_confidence"])
    return merged
Merge rule: keep the most conservative type and the highest base confidence, then let the scorer apply agreement bonuses.
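A worked example makes the merge behavior concrete. The sketch below restates the merger in compact form (with an exclusive-end comparison so adjacent spans stay separate) and runs it on a regex finding and an NER finding over the same span.

```python
def merge_overlapping_findings(findings: list[dict]) -> list[dict]:
    # Compact restatement of the merger: sort by span, fold overlaps,
    # keep the highest base confidence and the union of sources.
    findings = sorted(findings, key=lambda item: (item["start"], item["end"]))
    merged: list[dict] = []
    for finding in findings:
        if not merged or finding["start"] >= merged[-1]["end"]:
            merged.append({**finding, "sources": {finding["source"]}})
            continue
        previous = merged[-1]
        previous["end"] = max(previous["end"], finding["end"])
        previous["sources"].add(finding["source"])
        previous["base_confidence"] = max(previous["base_confidence"], finding["base_confidence"])
    return merged

# Regex and NER flag overlapping spans over the same email; the merge
# yields one finding carrying both sources and the higher score.
merged = merge_overlapping_findings([
    {"type": "email", "start": 12, "end": 28, "base_confidence": 0.95, "source": "regex"},
    {"type": "person_name", "start": 12, "end": 30, "base_confidence": 0.81, "source": "ner"},
])
```

One finding comes out instead of two, so downstream routing sees a single decision point and redaction touches the span exactly once.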
Layer 5: Confidence Score Calculation
The score is not a single ML probability. It is a composite routing score:
final_confidence = clamp(
    base_confidence
    + detector_agreement_bonus
    + pii_context_bonus
    - non_pii_context_penalty
    + locale_support_bonus
    + manual_override,
    0.0,
    1.0,
)
Base Score Sources
| Source | Base Score |
|---|---|
| Regex exact match | 0.95 |
| NER detection | model score |
| Custom detector | usually 0.90, reduced to 0.40 for weak account context |
Scoring Adjustments
| Adjustment | Example | Delta |
|---|---|---|
| Multi-detector agreement | Regex and NER both support same span | +0.05 |
| Strong PII context | `my name is`, `ship to`, `email me`, `account` | +0.10 |
| Non-PII manga context | `recommend`, `series`, `volumes`, `character` | -0.20 for `person_name` findings |
| Locale support bonus | Locale and format strongly align | +0.03 to +0.05 |
| Character allowlist override | `Gojo Satoru` from catalog list | Force to 0.10 |
Reference Implementation
PII_CONTEXT_BOOST = {"my name is", "ship to", "email me", "phone number", "account", "order"}
NON_PII_CONTEXT_HINTS = {"recommend", "series", "volumes", "character", "manga", "anime"}

def calculate_confidence(finding: dict, text: str, locale: str) -> float:
    score = finding["base_confidence"]
    text_lower = text.lower()
    if len(finding.get("sources", set())) >= 2:
        score += 0.05
    if any(keyword in text_lower for keyword in PII_CONTEXT_BOOST):
        score += 0.10
    if finding["type"] == "person_name" and any(keyword in text_lower for keyword in NON_PII_CONTEXT_HINTS):
        score -= 0.20
    if locale_supports_finding(locale, finding):
        score += 0.03
    if finding["type"] == "person_name" and is_character_name(finding["value"]):
        score = 0.10
    return max(0.0, min(score, 1.0))
Scoring Examples
| Input | Detector Path | Final Score | Why |
|---|---|---|---|
| `alex@example.com` | Regex | 0.95 | Exact pattern match |
| `Ship to 123 Oak Street` | NER address 0.84 + context | 0.94 | Delivery context boosts address confidence |
| `Gojo Satoru` in recommendation query | NER 0.86 - manga hint - allowlist | 0.10 | Known character name, not customer PII |
| 14-char account-like string without account context | Custom detector | 0.40 | Too ambiguous to redact inline |
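These rows can be reproduced with the scorer. The self-contained sketch below restates `calculate_confidence` with stubbed helpers; the allowlist contents and the disabled locale bonus are demo assumptions, not production values.

```python
PII_CONTEXT_BOOST = {"my name is", "ship to", "email me", "phone number", "account", "order"}
NON_PII_CONTEXT_HINTS = {"recommend", "series", "volumes", "character", "manga", "anime"}
CHARACTER_ALLOWLIST = {"gojo satoru", "levi ackerman"}  # demo assumption

def is_character_name(value: str) -> bool:
    return value.lower() in CHARACTER_ALLOWLIST

def locale_supports_finding(locale: str, finding: dict) -> bool:
    return False  # stubbed: no locale bonus in this demo

def calculate_confidence(finding: dict, text: str, locale: str) -> float:
    score = finding["base_confidence"]
    text_lower = text.lower()
    if len(finding.get("sources", set())) >= 2:
        score += 0.05
    if any(keyword in text_lower for keyword in PII_CONTEXT_BOOST):
        score += 0.10
    if finding["type"] == "person_name" and any(keyword in text_lower for keyword in NON_PII_CONTEXT_HINTS):
        score -= 0.20
    if locale_supports_finding(locale, finding):
        score += 0.03
    if finding["type"] == "person_name" and is_character_name(finding["value"]):
        score = 0.10
    return max(0.0, min(score, 1.0))

# Character-name row: NER 0.86, manga-context penalty, allowlist override.
gojo = {"type": "person_name", "value": "Gojo Satoru", "base_confidence": 0.86, "sources": {"ner"}}
gojo_score = calculate_confidence(gojo, "Recommend manga with Gojo Satoru in it", "US")

# Email row: regex base 0.95, no adjustments apply.
email = {"type": "email", "value": "alex@example.com", "base_confidence": 0.95, "sources": {"regex"}}
email_score = calculate_confidence(email, "My email is alex@example.com", "US")
```

The character name lands at 0.10 via the allowlist override even though the raw NER score was 0.86, which is the whole point of the layered design.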
Important Distinction: Score vs Policy
The score answers "How likely is this span to be sensitive?".
Policy answers "What are we allowed to do with it?".
Those are different questions.
Example: a shipping address can have a confidence of 0.94 and still be allowed to reach the shipping API ephemerally. The score remains high because it is still PII. Policy decides that only one downstream component is allowed to see it.
Layer 6: Routing and Actions
flowchart TD
Findings[Canonical findings] --> Score[Calculate final confidence]
Score --> Auth{Authorized for this downstream use?}
Auth -->|No| Route1{Score range}
Auth -->|Yes| Route2{Score range}
Route1 -->|>= 0.9| Redact[Full redact]
Route1 -->|0.7 - 0.89| Mask[Mask + review queue]
Route1 -->|0.5 - 0.69| Monitor[Log only]
Route1 -->|< 0.5| Pass1[Pass]
Route2 -->|>= 0.9| Ephemeral[Ephemeral pass-through to approved service only]
Route2 -->|0.7 - 0.89| Mask2[Mask for storage, pass minimal approved field]
Route2 -->|< 0.7| Pass2[Pass to approved service]
Default Routing Table
| Final Score | Authenticated User | Guest User | Notes |
|---|---|---|---|
| >= 0.9 | Redact in persistence, allow ephemeral approved use only | Redact everywhere | Highest certainty |
| 0.7 - 0.89 | Mask for storage, review if ambiguous | Redact everywhere | Likely PII |
| 0.5 - 0.69 | Monitor only unless policy requires stronger handling | Redact for guest | Borderline |
| < 0.5 | Pass | Pass | Likely false positive |
Guest policy is stricter because unattributable guest PII is harder to govern later.
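The table distills into a small routing function. This is a hypothetical sketch; the action names are illustrative, not the production enum.

```python
def route_finding(score: float, authorized: bool, guest: bool) -> str:
    # Illustrative routing per the table above; action names are
    # demo choices, not the production policy engine's vocabulary.
    if guest:
        # Guests get the stricter policy: redact from 0.5 upward.
        return "redact" if score >= 0.5 else "pass"
    if authorized:
        if score >= 0.9:
            return "ephemeral_pass"  # approved downstream only
        if score >= 0.7:
            return "mask_store_pass_minimal"
        return "pass"
    if score >= 0.9:
        return "redact"
    if score >= 0.7:
        return "mask_and_review"
    if score >= 0.5:
        return "monitor"
    return "pass"
```

Note that the guest branch is checked first: for guests, authorization never relaxes the outcome, which encodes the "redact everywhere" column directly.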
Low-Level Design (LLD)
Component Breakdown
classDiagram
class PIIFinding {
+str type
+str value
+int start
+int end
+str source
+float base_confidence
+float final_confidence
+bool authorized
+str action
}
class TextNormalizer {
+normalize_for_detection(text)
}
class LocaleResolver {
+resolve(session, text)
}
class RegexScanner {
+scan(text, locale)
}
class NERClient {
+scan(text, locale)
}
class CustomDetector {
+scan(text, session)
}
class FindingMerger {
+merge(findings)
}
class ConfidenceScorer {
+score(finding, text, locale)
}
class PolicyEngine {
+route(findings, session, intent)
}
class RedactionEngine {
+apply(text, findings, offset_map)
}
class AuditPublisher {
+publish(findings, decisions)
}
TextNormalizer --> LocaleResolver
LocaleResolver --> RegexScanner
LocaleResolver --> NERClient
LocaleResolver --> CustomDetector
RegexScanner --> FindingMerger
NERClient --> FindingMerger
CustomDetector --> FindingMerger
FindingMerger --> ConfidenceScorer
ConfidenceScorer --> PolicyEngine
PolicyEngine --> RedactionEngine
PolicyEngine --> AuditPublisher
Data Contracts
from dataclasses import dataclass, field

@dataclass
class PIIFinding:
    type: str
    value: str
    start: int
    end: int
    source: str
    base_confidence: float
    final_confidence: float = 0.0
    locale: str = "unknown"
    authorized: bool = False
    action: str = "pass"
    sources: set[str] = field(default_factory=set)

@dataclass
class PIIDecision:
    redacted_text: str
    approved_fields: dict
    findings: list[PIIFinding]
    audit_metadata: dict
Request-Side Orchestration
def process_inbound_message(
    raw_text: str,
    session: dict,
    intent: str,
    approved_fields_for_intent: set[str],
) -> PIIDecision:
    normalized_text, offset_map = normalize_for_detection(raw_text)
    locale = resolve_locale(session, normalized_text)
    findings = []
    findings.extend(scan_regex(normalized_text))
    findings.extend(scan_ner(normalized_text, locale))
    findings.extend(scan_custom(normalized_text))
    merged = merge_overlapping_findings(findings)
    for finding in merged:
        finding["final_confidence"] = calculate_confidence(finding, normalized_text, locale)
        finding["authorized"] = finding["type"] in approved_fields_for_intent
    routed = route_findings(merged, session, intent)
    redacted_text = apply_redactions(raw_text, routed, offset_map)
    store_redacted_history(session, redacted_text)
    emit_pii_audit_event(session, routed)
    return PIIDecision(
        redacted_text=redacted_text,
        approved_fields=extract_ephemeral_approved_fields(raw_text, routed, offset_map),
        findings=routed,
        audit_metadata=build_audit_metadata(routed),
    )
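The redaction step deserves a concrete sketch, because it is where the offset map from Layer 0 earns its keep: finding spans are in normalized coordinates and must be mapped back to raw-text positions. This is a hypothetical implementation, not the production engine.

```python
def apply_redactions(raw_text: str, findings: list[dict], offset_map: list[int]) -> str:
    # Hypothetical sketch: spans arrive in normalized coordinates, the
    # offset map translates them to raw positions, and replacements run
    # right to left so earlier offsets stay valid after each splice.
    redacted = raw_text
    for finding in sorted(findings, key=lambda f: f["start"], reverse=True):
        if finding.get("action") not in {"redact", "mask"}:
            continue
        start = offset_map[finding["start"]]
        end = offset_map[finding["end"] - 1] + 1  # end offsets are exclusive
        redacted = redacted[:start] + f"[{finding['type'].upper()}_REDACTED]" + redacted[end:]
    return redacted

raw = "My email is alex@example.com"
identity_map = list(range(len(raw)))  # demo: normalization changed nothing
findings = [{"type": "email", "start": 12, "end": 28, "action": "redact"}]
safe = apply_redactions(raw, findings, identity_map)
```

With a real offset map the same code still lands on the correct raw characters even when normalization inserted or removed characters.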
Storage Schema
Session History Record
| Field | Example | Notes |
|---|---|---|
| `session_id` | `sess_123` | Partition key |
| `turn_id` | `17` | Sort key |
| `actor` | `user` or `assistant` | Who produced the turn |
| `redacted_text` | `Ship to [ADDRESS_REDACTED]` | Stored form |
| `pii_types_detected` | `["address"]` | Audit-friendly metadata |
| `pii_confidence_max` | `0.94` | Highest confidence in the turn |
| `guest_mode` | `true` | Drives TTL and policy |
| `expires_at` | epoch time | DynamoDB TTL |
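A record matching this schema can be built as below; the helper name is an illustration, and the TTL durations follow the 24h authenticated / 2h guest policy described later in this document.

```python
import time

GUEST_TTL_SECONDS = 2 * 3600   # guest sessions expire after 2 hours
AUTH_TTL_SECONDS = 24 * 3600   # authenticated sessions after 24 hours

def build_history_record(session_id, turn_id, actor, redacted_text,
                         pii_types, max_confidence, guest_mode, now=None):
    # Illustrative record builder; field names match the schema table.
    now = int(time.time() if now is None else now)
    ttl = GUEST_TTL_SECONDS if guest_mode else AUTH_TTL_SECONDS
    return {
        "session_id": session_id,            # partition key
        "turn_id": turn_id,                  # sort key
        "actor": actor,
        "redacted_text": redacted_text,      # never the raw text
        "pii_types_detected": pii_types,
        "pii_confidence_max": max_confidence,
        "guest_mode": guest_mode,
        "expires_at": now + ttl,             # DynamoDB TTL attribute
    }

record = build_history_record("sess_123", 17, "user",
                              "Ship to [ADDRESS_REDACTED]",
                              ["address"], 0.94, guest_mode=True,
                              now=1_700_000_000)
```

Because `expires_at` is computed at write time, guest and authenticated turns get different lifetimes from the same code path.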
PII Audit Event
| Field | Example | Notes |
|---|---|---|
| `request_id` | `req_abc` | Trace correlation |
| `session_id_hash` | hash | PII-safe join key |
| `pii_findings_count` | `2` | Aggregate, not raw PII |
| `pii_types` | `["email", "address"]` | Audit taxonomy |
| `max_confidence` | `0.95` | Detection signal |
| `actions` | `["redact", "ephemeral_pass"]` | What happened |
| `authorized_fields` | `["address"]` | Approved downstream use |
| `model_version` | `pii-ner-v5` | Reproducibility |
Latency Budget by Module
| Module | Target Latency | Notes |
|---|---|---|
| Normalizer + locale resolver | <1ms | Pure in-process |
| Regex scanner | <1ms | Cheap deterministic rules |
| Custom detectors | <1ms | Mostly regex + context |
| NER endpoint | 3-8ms | Cached real-time endpoint |
| Merger + scorer + routing | <1ms | In-process |
| Redaction | <1ms | Offset-map-aware substring replacement |
| Total request-side PII stage | 5-12ms | Fits inside the chat latency budget |
Implementation Components and Tools
| Component / Tool | Why It Exists | Typical Use in This Design |
|---|---|---|
| Python `re` | Deterministic pattern matching | Email, phone, postal code, order ID, obfuscated email detection |
| SageMaker-hosted NER model | Low-latency entity extraction | Names, addresses, multilingual entities |
| Catalog-backed character allowlist | Domain false-positive control | Prevent character names from being redacted as customer names |
| DynamoDB | Short-lived session store | Redacted conversation history with TTL |
| CloudWatch Logs | Operational logging | Store redacted application events only |
| Analytics stream + warehouse | Product analytics | Consume anonymized events, never raw PII |
| S3 + lifecycle policy | Archive and evidence store | Encrypted archives, audit evidence, deletion artifacts |
| KMS envelope encryption | Field-level encryption where PII must be stored | Protect approved stored sensitive fields |
| Step Functions or equivalent workflow engine | Multi-system deletion orchestration | GDPR/CCPA erasure workflow with evidence chain |
| Async review queue | Human review of ambiguous detections | Medium-confidence or policy-sensitive cases |
Scenario Deep Dives
Scenario 1: Manga Character Names Triggering PII Redaction
Context
After the NER-based detector went live, recommendation quality dropped because fictional character names were being classified as real-person PII.
User message:
I want manga with Gojo Satoru and Levi Ackerman in it
Observed failure:
- NER labeled both names as `PERSON`
- confidence landed in the `0.82 - 0.91` range
- character names were redacted before recommendation retrieval
- retrieval lost the most important entity
Failure Path
flowchart LR
Query[Recommendation query with character names] --> NER[Generic PERSON detection]
NER --> Score[High person-name confidence]
Score --> Redact[Redact names]
Redact --> Retrieve[Retriever sees degraded query]
Retrieve --> BadRecs[Generic or irrelevant recommendations]
Root Cause
The model was right syntactically and wrong semantically. Manga character names look like real names, especially Japanese names. The detector lacked:
- negative examples for fictional names
- domain context such as `recommend`, `series`, `volumes`
- a catalog-derived allowlist
Improved Design
flowchart LR
Query[Recommendation query] --> NER[NER person-name detection]
Query --> Context[Intent and context scorer]
Query --> Allowlist[Character allowlist lookup]
NER --> Merge[Merge]
Context --> Merge
Allowlist --> Merge
Merge --> FinalScore[Final confidence]
FinalScore -->|0.10 after override| Pass[Keep term in query]
Pass --> Retrieve[Retriever sees original character name]
Implementation Changes
- Added a catalog-derived `CHARACTER_NAMES` allowlist refreshed daily.
- Added a negative penalty for recommendation-style context.
- Fine-tuned the NER model on manga-domain negative examples.
- Added a weekly audit for `person_name` findings in the `0.70 - 0.90` band.
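The allowlist lookup itself is simple; the subtlety is normalization before membership testing. The sketch below is illustrative: the names shown are examples, and the real set is catalog-derived and refreshed daily.

```python
# Example allowlist; production entries come from the catalog refresh.
CHARACTER_NAMES = {"gojo satoru", "satoru gojo", "levi ackerman"}

def is_character_name(value: str) -> bool:
    # Case-insensitive, whitespace-collapsed lookup so casing and
    # spacing variants still hit the allowlist.
    return " ".join(value.lower().split()) in CHARACTER_NAMES
```

Storing both name orders (`gojo satoru`, `satoru gojo`) covers the common Japanese-name ordering ambiguity without any model involvement.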
Why the Hybrid Approach Won
Pure model fine-tuning helps but reacts slowly. Pure allowlists help quickly but are brittle. The hybrid approach:
- gives immediate mitigation
- reduces hot-path false positives
- keeps improving as the model is retrained
- creates explicit observability for uncertain names
Metric Signal
- false positive rate on character-name queries: `40% -> 3%`
- recommendation relevance on character-name queries: `+22%`
Scenario 2: User Pasting Full Address for Delivery Estimate
Context
Users often paste their full address directly into chat instead of going through account settings.
Example:
Can you deliver to 123 Oak Street, Apt 4B, Springfield IL 62704?
The address is needed to compute delivery estimates, but it is not needed in:
- logs
- analytics
- conversation history
- model training exports
Incorrect Dataflow
flowchart LR
User[User address in chat] --> Store1[Store raw message]
Store1 --> Log[Logs]
Store1 --> DDB[Session history]
Store1 --> Analytics[Analytics]
Store1 --> PII[PII detection later]
PII --> Redact[Redact after the fact]
This is architecturally wrong because the leak already happened before redaction.
Correct Dataflow
sequenceDiagram
participant User
participant Gateway as API Gateway
participant Scan as PII scanner
participant Policy as Policy engine
participant Log as Logs
participant DDB as DynamoDB
participant Orch as Orchestrator
participant Ship as Shipping API
User->>Gateway: "Deliver to 123 Oak Street, Springfield IL 62704?"
Gateway->>Scan: Scan before any persistence
Scan->>Policy: Address finding, confidence 0.94
Policy->>Log: Write redacted text only
Policy->>DDB: Store redacted text only
Policy->>Orch: Pass original address ephemerally
Orch->>Ship: Request delivery estimate with real address
Ship-->>Orch: Delivery estimate
Orch-->>User: "Estimated delivery is March 28"
Implementation Details That Matter
- The original address only exists in orchestrator memory.
- It is never written to logs, history, or analytics.
- The session history stores the redacted version and the resulting answer.
- Any asynchronous debug trace stores only the redacted text plus metadata such as `pii_types=["address"]`.
Why Not Store the Encrypted Address for Convenience?
Because convenience is not a valid reason to expand the data footprint. Encryption reduces exposure after storage; it does not make unnecessary storage acceptable. The privacy-first design is to avoid storing it at all unless there is a clear product need.
Metric Signal
- address leakage to non-essential systems: `100% -> 0%`
- no meaningful latency increase, because the scan already existed; only the ordering changed
Scenario 3: GDPR Right-to-Deletion Request
Context
A user requests erasure under GDPR Article 17. The hard part is not deleting the primary conversation row. The hard part is deleting or anonymizing every copy and derived form:
- live session store
- archives
- logs
- analytics
- training exports
- downstream evidence of deletion
HLD for Deletion Workflow
stateDiagram-v2
[*] --> RequestReceived
RequestReceived --> IdentityValidated
IdentityValidated --> RegistryUpdated
RegistryUpdated --> DeleteSessions
RegistryUpdated --> DeleteArchives
RegistryUpdated --> AnonymizeAnalytics
RegistryUpdated --> PurgeExports
RegistryUpdated --> ConfirmLogRetention
DeleteSessions --> Evidence
DeleteArchives --> Evidence
AnonymizeAnalytics --> Evidence
PurgeExports --> Evidence
ConfirmLogRetention --> Evidence
Evidence --> UserConfirmation
UserConfirmation --> [*]
Detailed Dataflow
sequenceDiagram
participant Support
participant Workflow as Deletion orchestrator
participant Registry as Deletion registry
participant DDB as DynamoDB
participant S3 as S3 archives
participant WH as Analytics warehouse
participant Export as Training export pipeline
participant Audit as Audit bucket
Support->>Workflow: Validated deletion request
Workflow->>Registry: Write deletion request and request_id
Workflow->>DDB: Delete conversation records by customer_id
Workflow->>S3: Delete archived objects by tag or inventory lookup
Workflow->>WH: Anonymize or delete attributable rows
Workflow->>Export: Add customer_id to denylist for future exports
Workflow->>Audit: Write evidence artifacts per step
Workflow-->>Support: Completion summary with evidence IDs
Low-Level Implementation Notes
- Deletion must be idempotent. Re-running the workflow should not fail if records are already gone.
- Derived data should either be deleted or anonymized with a documented policy.
- Backups need an explicit position:
  - short-lived immutable backups may be exempt from immediate mutation
  - restore workflows must replay the deletion registry before data becomes active
- The deletion registry exists to prove compliance and prevent reintroduction into later training exports.
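Idempotency plus evidence can be modeled in a few lines. In this toy sketch, plain dicts stand in for the session store and the evidence bucket, and the step name is illustrative; the point is that deleting an absent record still succeeds and that retries overwrite their own evidence entry.

```python
def delete_sessions_step(store: dict, evidence: dict, customer_id: str, request_id: str) -> str:
    # Idempotent: deleting records that are already gone is success.
    removed = store.pop(customer_id, None)
    # Evidence is keyed by (request_id, step), so a retry overwrites
    # its own entry instead of duplicating it.
    evidence[(request_id, "delete_sessions")] = {
        "records_removed": 0 if removed is None else len(removed),
        "status": "complete",
    }
    return "complete"

store = {"cust_1": ["turn-1", "turn-2"]}
evidence: dict = {}
first = delete_sessions_step(store, evidence, "cust_1", "req_42")
second = delete_sessions_step(store, evidence, "cust_1", "req_42")  # re-run succeeds
```

The same shape maps onto a Step Functions task: each state writes its evidence artifact under a deterministic key and reports success even when there is nothing left to delete.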
Why the Data Lineage Map Matters
If a system stores user-linked data and is missing from the lineage map, deletion is already broken. The lineage map is not documentation overhead; it is the control surface for legal erasure.
Metric Signal
- deletion fulfillment time: `~3 days manual -> <4 hours automated`
- systems covered: `5/5`
- dry-run audit pass rate: `100%`
Scenario 4: Guest User PII Boundary Enforcement
Context
Guest users should not need to share PII, but many still paste it into chat:
My email is alex@example.com, can you check my gift card status?
The system cannot safely treat guest PII like authenticated PII because:
- there is no durable `customer_id`
- there is no strong identity binding
- later deletion is harder or impossible
Policy Comparison
flowchart TD
Start[Incoming message] --> Mode{Authenticated?}
Mode -->|Yes| Auth[Standard privacy policy]
Mode -->|No| Guest[Guest privacy policy]
Auth --> AuthScore[Use standard thresholds]
Auth --> AuthTTL[24h session TTL]
Auth --> AuthUse[Allow approved ephemeral PII use]
Guest --> GuestScore[Lower redaction threshold]
Guest --> GuestTTL[2h session TTL]
Guest --> GuestUse[Never pass account-related PII through]
Guest --> GuestPrompt[Prompt user to sign in]
Guest-Specific Rules
- Redact from `>= 0.5`, not only from `>= 0.9`.
- Never pass account-recovery-like fields to downstream account tools.
- Keep guest TTL at 2 hours, not 24 hours.
- Emit a user-facing message explaining that account assistance requires sign-in.
Why Not Block Any Guest Message That Contains PII?
Because some guest flows are still useful even when a small amount of PII appears accidentally. The better balance is:
- redact aggressively
- minimize retention
- nudge toward authentication for account-specific help
That protects privacy without turning the guest experience into a wall of rejections.
Metric Signal
- guest sessions containing stored PII: `~18% -> <1%`
- guest-to-auth conversion: `+12%`
Data Retention, Deletion, and Evidence Chain
Retention Architecture
```mermaid
flowchart TB
    subgraph SessionData["Session and interaction data"]
        Conv[Conversation history<br/>24h auth / 2h guest]
        Meta[Session metadata<br/>24h]
        Logs[Application logs<br/>30 days]
        Audit[Audit evidence<br/>1 year]
    end
    subgraph AnalyticsData["Derived data"]
        Events[Identifiable analytics<br/>90 days]
        Agg[Aggregated analytics<br/>Long-lived]
        Exports[Training exports<br/>Rolling export cycle]
    end
    subgraph Controls["Controls"]
        TTL[DynamoDB TTL]
        LC[S3 lifecycle rules]
        Anon[Warehouse anonymization job]
        Registry[Deletion registry]
    end
    Conv --> TTL
    Meta --> TTL
    Logs --> LC
    Audit --> LC
    Events --> Anon
    Exports --> Registry
```
Retention Table
| Store | Retention | Deletion Mechanism | Notes |
|---|---|---|---|
| DynamoDB conversation history | 24h authenticated, 2h guest | TTL + explicit delete | Stores redacted text only |
| Session metadata | 24h | TTL | No raw PII by default |
| CloudWatch logs | 30 days | Retention policy | Logs must already be redacted |
| S3 archives | 90 days then lifecycle delete | Lifecycle + explicit delete by tag | Encrypted |
| Analytics warehouse | 90 days identifiable then anonymized | ETL anonymization | Keep aggregate trends only |
| Training exports | Rolling cycle | Export filter + deletion registry | Deleted users excluded from future exports |
| Audit evidence | 1 year | Lifecycle retention | Keeps proof, not raw deleted data |
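The DynamoDB rows in this table expire via TTL, which expects an epoch-seconds number attribute on each item. A minimal sketch of computing that attribute per store, with the retention windows taken from the table above; the attribute name `expires_at` and the store keys are assumptions.

```python
import time
from typing import Optional

# Retention windows from the table above, in seconds.
RETENTION_SECONDS = {
    "conversation_auth": 24 * 3600,
    "conversation_guest": 2 * 3600,
    "session_metadata": 24 * 3600,
}

def ttl_epoch(store: str, now: Optional[float] = None) -> int:
    """DynamoDB TTL removes items (eventually, not instantly) once the
    epoch-seconds value in the designated attribute has passed."""
    now = time.time() if now is None else now
    return int(now) + RETENTION_SECONDS[store]

# An item written with this attribute needs no application-side sweeper.
item = {
    "session_id": "s-123",  # hypothetical key
    "text_redacted": "My email is [EMAIL_REDACTED]",
    "expires_at": ttl_epoch("conversation_guest"),
}
```

Note that TTL deletion is lazy (typically within days of expiry), which is why the explicit-delete path in the table still exists for erasure requests.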
Evidence Chain for Deletion
The system should retain proof of action, not the deleted data itself. Evidence artifacts usually contain:
- request ID
- customer ID hash
- systems targeted
- timestamp per step
- success or retry status
- operator or workflow identity
That gives auditors a durable trail without preserving the original PII.
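A minimal sketch of such an evidence artifact, assuming a keyed hash for the customer identifier so the trail never stores the raw ID. The class and field names (`DeletionEvidence`, `record_step`, the salt string) are illustrative, not the actual schema.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone

def customer_hash(customer_id: str, salt: str = "deletion-registry-v1") -> str:
    """Keyed hash so evidence can be matched to a request without retaining
    the raw identifier. In production the salt would be a managed secret."""
    return hashlib.sha256(f"{salt}:{customer_id}".encode()).hexdigest()

@dataclass
class DeletionEvidence:
    request_id: str
    customer_id_hash: str
    systems_targeted: list
    step_timestamps: dict = field(default_factory=dict)
    status: str = "pending"  # pending | success | retrying
    actor: str = "workflow:deletion-orchestrator"  # operator or workflow identity

    def record_step(self, system: str, ok: bool) -> None:
        """Timestamp each per-system step; a failed step flips to retrying."""
        self.step_timestamps[system] = datetime.now(timezone.utc).isoformat()
        if not ok:
            self.status = "retrying"

evidence = DeletionEvidence(
    request_id="req-001",
    customer_id_hash=customer_hash("cust-42"),
    systems_targeted=["dynamodb", "s3", "warehouse", "training-exports", "logs"],
)
evidence.record_step("dynamodb", ok=True)
```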
Monitoring, Alerting, and Testing
Key Metrics
| Metric | Why It Matters | Example Alert |
|---|---|---|
| `pii_in_response_rate` | Sensitive data still reaching users | >0.5% of responses |
| `pii_near_miss_rate` | Model or tools are generating data guardrails must catch | Sudden spike over baseline |
| `character_name_false_positive_rate` | Domain utility regression | >5% |
| `guest_pii_persistence_rate` | Guest privacy boundary drift | Any sustained increase |
| `deletion_sla_hours` | Compliance risk | >72h or internal target breach |
| `pii_stage_p95_ms` | Hot-path latency regression | >15ms |
| `mid_confidence_review_volume` | Ambiguity trend | Unusual jump suggests drift |
Alert Design
Use alerts that distinguish true privacy incidents from noisy detector activity:
- high severity: PII in final response, cross-user leakage, failed deletion workflow
- medium severity: detector recall drop, guest-policy leakage, audit evidence gaps
- low severity: rising false positives, latency regression, review backlog growth
Test Strategy
| Test Layer | What It Covers | Example Cases |
|---|---|---|
| Unit tests | regex, scorer, routing | email, phone, character-name overrides |
| Integration tests | dataflow correctness | address should not appear in logs or history |
| Adversarial tests | obfuscation and prompt-induced PII | john [at] gmail [dot] com, zero-width chars |
| Multi-locale tests | locale-specific coverage | JP postal, DE address, US SSN |
| Regression tests | domain false positives | Gojo Satoru, Attack on Titan, ASINs |
| Canary monitoring | live safety before full rollout | compare near-miss rate and block rate |
Deep-Dive Test Cases Worth Having
- Obfuscated PII: `alex [at] example [dot] com`
- Mixed-script text: full-width digits, Japanese address fragments
- Character name vs real customer name in similar syntax
- Same PII passed through authorized and unauthorized intents
- FM-generated fake contact details in response prose
- Deletion request for a user with data across every storage layer
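The obfuscated-PII case in that list is worth a concrete unit test. A minimal sketch of detector-side de-obfuscation for bracketed email spellings; the patterns shown here are illustrative, not the full production rule set, and canonicalization applies only on the detection path, never to the user-visible text.

```python
import re

# Detector-side canonicalization of common email obfuscations.
_OBFUSCATED = [
    (re.compile(r"\s*[\[\(]\s*at\s*[\]\)]\s*", re.I), "@"),
    (re.compile(r"\s*[\[\(]\s*dot\s*[\]\)]\s*", re.I), "."),
]
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def contains_email(text: str) -> bool:
    """Collapse '[at]' / '(dot)' style spellings before matching, so
    'alex [at] example [dot] com' is caught alongside 'alex@example.com'."""
    canon = text
    for pattern, repl in _OBFUSCATED:
        canon = pattern.sub(repl, canon)
    return EMAIL_RE.search(canon) is not None
```

Benign manga queries should pass through this check untouched, which is exactly what the regression-test layer above verifies.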
Failure Modes and Tradeoffs
| Decision | What We Chose | Alternative | Upside | Downside |
|---|---|---|---|---|
| Detection timing | Scan before persistence | Scrub after storage | Prevents initial leak | Synchronous hot-path work |
| Confidence routing | Multi-tier thresholds | Single redact threshold | Better precision/recall balance | More policy complexity |
| Name handling | Allowlist + context + NER tuning | NER only | Fast false-positive reduction | Allowlist maintenance |
| Guest policy | Lower threshold + short TTL | Same as authenticated | Better privacy for unattributable data | Some guest flows become more limited |
| Address handling | Ephemeral pass-through only | Store encrypted for reuse | Strong minimization | User may need to re-enter later |
| Deletion workflow | Central orchestrator + registry | Manual deletions | Audit-ready and repeatable | Engineering investment |
Residual Risks
Even after all of the above, some risks remain:
- semantic PII that looks harmless to detectors
- future locale formats not yet covered
- prompt changes that induce the FM to generate contact-like strings
- derived datasets accidentally created outside the lineage map
The mitigation pattern is not "build one better detector." It is:
- layered controls
- observability
- explicit ownership of every persistence boundary
- continuous negative testing
Follow-Up Questions and Deep-Dive Answers
These are the questions a strong interviewer, reviewer, or principal engineer will ask after the base privacy design looks reasonable.
Q1. Why is authorization kept separate from the confidence score instead of blending them into one number?
Answer: Because they represent different facts. Confidence is a classification estimate: "How likely is this span to be sensitive?" Authorization is a policy decision: "Is this component allowed to see that sensitive field for this use case?" If you blend them, you create ambiguous behavior. A shipping address used for a delivery estimate might be highly sensitive with confidence 0.94, but still authorized for one API call. Lowering the score just because it is authorized would hide the fact that it is sensitive, break audit quality, and make downstream analysis harder. The clean design is to keep the high score, route storage through redaction, and allow only one ephemeral approved use.
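The separation can be made concrete with two independent inputs to one routing decision. A minimal sketch, assuming a hypothetical authorization matrix keyed by (field, component); the names `Finding`, `AUTHORIZED_USES`, and the 0.9 threshold are illustrative.

```python
from typing import NamedTuple

class Finding(NamedTuple):
    pii_type: str
    confidence: float  # classification estimate: how likely is this sensitive?

# Hypothetical authorization matrix: policy decides *who* may see *what*,
# for which use case. Kept entirely separate from the confidence score.
AUTHORIZED_USES = {
    ("shipping_address", "delivery_estimate_api"): True,
    ("shipping_address", "analytics_pipeline"): False,
}

def may_pass_through(finding: Finding, component: str) -> bool:
    """Confidence stays high even for authorized uses; authorization only
    gates one ephemeral pass-through. Storage is still routed to redaction."""
    is_sensitive = finding.confidence >= 0.9
    authorized = AUTHORIZED_USES.get((finding.pii_type, component), False)
    return (not is_sensitive) or authorized

addr = Finding("shipping_address", 0.94)
```

Because neither number mutates the other, the audit log records both the true sensitivity and the policy decision that permitted the use.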
Q2. Why not replace regex, custom rules, and allowlists with one stronger LLM or one larger NER model?
Answer: Because the job is not only recognition accuracy. It is low-latency, deterministic enforcement. Regex and rules are cheap, explainable, and reliable for structured patterns. The NER model handles names and addresses, but it is probabilistic and domain-sensitive. Allowlists handle known fictional entities that would otherwise create repeated false positives. A single-model design looks elegant but performs poorly on three dimensions that matter in production: latency, debuggability, and exact control over business-specific identifiers. Layering is not technical debt here. It is the operationally correct architecture.
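The layering argument can be shown in miniature. A sketch under stated assumptions: the structured patterns, confidence values, and allowlist entries are illustrative, and `ner_names` stands in for the output of a real NER model that is out of scope here.

```python
import re

# Layer 1: cheap, deterministic, explainable patterns for structured PII.
STRUCTURED = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

# Layer 3: catalog-derived fictional names that NER would repeatedly flag.
CHARACTER_ALLOWLIST = {"gojo satoru", "monkey d. luffy"}  # illustrative

def detect(text: str, ner_names=()):
    """Return (pii_type, value, confidence) findings from all layers.
    ner_names simulates layer 2, the probabilistic person_name detector."""
    findings = []
    for pii_type, pattern in STRUCTURED.items():
        for m in pattern.finditer(text):
            findings.append((pii_type, m.group(), 0.95))
    for name in ner_names:
        if name.lower() in CHARACTER_ALLOWLIST:
            continue  # known fictional entity: suppress this false positive
        findings.append(("person_name", name, 0.7))
    return findings
```

Each layer is independently testable and debuggable, which is the operational property a single large model cannot offer.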
Q3. How do you prevent overlap bugs such as double redaction or conflicting actions for the same text span?
Answer: The pipeline must canonicalize findings before routing. That means sorting findings by span, merging overlapping ranges, carrying forward the strongest conservative type, and recording all supporting sources for agreement bonuses and later audit. Routing decisions happen only on canonical findings, never on raw detector outputs. Without this step, one scanner can say mask, another can say redact, and the rendering layer ends up corrupting the message or applying inconsistent policy. The overlap merger is a low-complexity component, but it is essential.
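The canonicalization step described above is small enough to sketch in full. Assumptions: findings are `(start, end, pii_type, confidence, source)` tuples, and "most conservative" is approximated here by highest confidence; real systems may rank types by severity instead.

```python
def merge_findings(findings):
    """Canonicalize raw detector output: sort by span, merge overlapping
    ranges, keep the strongest type, and record every contributing source
    for agreement bonuses and audit. Routing runs only on this output."""
    ordered = sorted(findings, key=lambda f: (f[0], f[1]))
    merged = []
    for start, end, ptype, conf, source in ordered:
        if merged and start < merged[-1]["end"]:  # overlaps previous span
            prev = merged[-1]
            prev["end"] = max(prev["end"], end)
            prev["sources"].append(source)
            if conf > prev["confidence"]:  # carry forward the stronger call
                prev["pii_type"], prev["confidence"] = ptype, conf
        else:
            merged.append({"start": start, "end": end, "pii_type": ptype,
                           "confidence": conf, "sources": [source]})
    return merged
```

With this in place, a regex `redact` verdict and an NER `mask` verdict on overlapping text collapse into one unambiguous action over one span.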
Q4. How do you scale NER without turning privacy into the latency bottleneck?
Answer: Keep as much as possible in-process and deterministic, and reserve the model for what only the model can do. Regex and custom rules should handle all structured cases. NER should run on a warm real-time endpoint with strict entity filtering and a narrow label set. In addition, monitor the distribution of text lengths and consider early exits for short strings that only contain structured patterns already handled by regex. The latency budget should be explicit and enforced, because privacy that only works at low traffic is not production privacy.
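The early-exit idea can be sketched as a cheap gate in front of the NER call. The length threshold and patterns here are illustrative assumptions, not tuned production values.

```python
import re

LETTER_RUN = re.compile(r"[A-Za-z]{2,}")
# Structured patterns already handled deterministically by the regex layer.
STRUCTURED = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"
                        r"|\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

def needs_ner(text: str) -> bool:
    """Skip the warm NER endpoint for short strings whose content is fully
    explained by structured matches; call it only when free text with
    letter runs remains after those matches are removed."""
    if len(text) < 8:
        return False  # too short to contain a name or address
    stripped = STRUCTURED.sub(" ", text)
    return LETTER_RUN.search(stripped) is not None
```

A gate like this keeps the p95 latency metric tracked above dominated by texts that genuinely need the model.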
Q5. How do you know a privacy regression is real and not just a change in user behavior?
Answer: You need both rate metrics and trace-level evidence. A metric such as pii_near_miss_rate tells you that the model or downstream tools are generating more sensitive content. But that by itself does not prove leakage. Pair it with trace sampling: inspect what was detected, where it was blocked, and whether the final delivered response still contained sensitive strings. For false positives, inspect mid-confidence review volume and domain-specific error slices such as character-name queries. The combination of aggregate metrics and trace slices is what distinguishes a real regression from traffic mix noise.
Q6. How would you test obfuscated PII and multi-locale inputs without breaking legitimate Japanese text?
Answer: Normalize for detectors, not for the final user-visible text. That means the detector path can collapse zero-width characters, standardize separators, and map homoglyphs where appropriate, while the original text and an offset map are preserved for precise redaction. The test suite should include multilingual examples, full-width digits, Japanese addresses, German street formats, obfuscated emails, and benign manga terms that should survive intact. The goal is not to normalize everything globally. The goal is to normalize enough for classification while preserving user fidelity and exact span control.
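The offset-map technique is the key piece, so here is a minimal sketch covering only zero-width characters; homoglyph mapping and separator standardization would extend the same structure.

```python
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}

def normalize_for_detection(text: str):
    """Build a detector-side view with zero-width characters dropped, plus
    an offset map from normalized indices back to original indices, so a
    redaction span lands on the exact original characters."""
    normalized, offsets = [], []
    for i, ch in enumerate(text):
        if ch in ZERO_WIDTH:
            continue
        normalized.append(ch)
        offsets.append(i)
    return "".join(normalized), offsets

def to_original_span(offsets, start, end):
    """Map a [start, end) span found on the normalized text back onto the
    original string, zero-width characters and all."""
    return offsets[start], offsets[end - 1] + 1
```

The original text is never modified; only the detector sees the canonical view, so legitimate Japanese content survives intact.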
Q7. How do you delete data from analytics or training systems that are not simple key-value stores?
Answer: You need a documented policy per system. For analytics, row-level deletion is ideal when practical; otherwise anonymization can be acceptable if it truly breaks attribution and is documented in policy. For training exports, the deletion registry is critical. Once a user is deleted, future export jobs must exclude that identifier. If the training artifact has already been produced, you need a purge or regeneration rule. The key is to decide this upfront and encode it into the lineage map. If a system cannot answer "How do we delete or de-identify this user's data?" it is not production-ready.
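The export-filter half of this answer can be sketched directly. Assumptions: the registry stores only keyed hashes (never raw IDs), and export records carry a `customer_id` field; all names here are illustrative.

```python
import hashlib

def _h(customer_id: str) -> str:
    # Same keyed-hash scheme as the registry; salt name is illustrative.
    return hashlib.sha256(f"registry-v1:{customer_id}".encode()).hexdigest()

class DeletionRegistry:
    """Minimal registry: holds hashes of deleted users, never raw IDs."""
    def __init__(self):
        self._deleted = set()

    def mark_deleted(self, customer_id: str) -> None:
        self._deleted.add(_h(customer_id))

    def is_deleted(self, customer_id: str) -> bool:
        return _h(customer_id) in self._deleted

def filter_training_export(records, registry):
    """Every export cycle excludes deleted users, so one deletion request
    propagates into all future training artifacts automatically."""
    return [r for r in records if not registry.is_deleted(r["customer_id"])]

registry = DeletionRegistry()
registry.mark_deleted("cust-42")
kept = filter_training_export(
    [{"customer_id": "cust-42"}, {"customer_id": "cust-7"}], registry)
```

Already-produced artifacts still need the purge-or-regenerate rule the answer describes; the filter only guarantees forward exclusion.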
Q8. Why keep a deletion registry for one year if the user asked to be deleted?
Answer: Because the registry is not the user's data in the product sense; it is compliance control metadata. Its purpose is to prevent reintroduction and to prove that a request was processed. The registry should contain the minimum necessary form, typically a hashed customer identifier, request metadata, timestamps, and outcome status. It should not contain the deleted content itself. This is an example of the difference between retaining business data and retaining control-plane evidence.
Q9. Why not forbid all guest messages that contain PII instead of using a lower threshold and shorter TTL?
Answer: Because privacy and usability both matter. Guests still ask legitimate shopping questions, and many paste unnecessary personal details accidentally. A hard-block-only design causes friction without improving security proportionally. The better design is to redact aggressively, keep retention minimal, restrict downstream account actions, and steer the user toward sign-in when account-specific help is needed. That preserves privacy while still allowing useful guest interactions like catalog discovery or general shipping policy questions.
Q10. What prevents the character allowlist from becoming a bypass where attackers choose names that look like allowed content?
Answer: The allowlist should be scoped narrowly and used only as one signal, not as unconditional trust. It should be derived from the catalog, refreshed automatically, and applied primarily to person_name findings in recommendation-style contexts. If the same string appears in account or shipping context, the context boost should outweigh the content hint and the finding should stay sensitive. Also monitor collisions: if a real customer name overlaps a popular character name, the audit and review path should surface that pattern. The point is to reduce domain false positives, not to create a universal privacy exemption.
Q11. If MangaAssist adds voice input with streaming transcripts, what changes in the privacy architecture?
Answer: The main change is granularity. Detection can no longer wait for the full message; it has to operate on partial transcript windows while still supporting correction as ASR hypotheses stabilize. That means temporary findings may need to be revised, and the UI should avoid exposing raw partial transcripts to logs. The architecture remains the same conceptually: detection before persistence, ephemeral approved use, redacted storage, response-side filtering. But the implementation gets more complex because timing and transcript revision become part of the privacy surface.
Q12. What is the hardest residual failure mode even after implementing all of this?
Answer: The hardest residual risk is semantically sensitive data that does not look like classic PII and only becomes risky when combined across turns or systems. For example, a sequence of benign-seeming utterances can reveal enough to identify a person or infer account ownership. That is why the long-term control is not only span detection. It is also session-level policy, bounded context windows, strong access control to downstream tools, and careful decisions about what history is retained at all.
Key Lessons
- Privacy controls must run before persistence, not after.
- Confidence-based routing is useful only if it is separated from authorization policy.
- Domain false positives are a product risk, not just an ML annoyance.
- Output-side privacy filtering is mandatory because the FM can still generate sensitive content.
- Deletion is an architecture capability, not a support-team procedure.
- Guest sessions deserve stricter defaults because their data is harder to govern later.
- The strongest privacy systems are layered, observable, and explicit about every persistence boundary.
Cross-References
- PII classification tables: 12-security-privacy.md
- Guardrails pipeline and response filtering: 03-guardrails-pipeline-deep-dive.md
- Encryption and field-level protection: 08-encryption-key-management.md
- Incident response and forensic handling: 05-incident-response-forensics.md
- Security interview follow-up bank: 09-interview-scenarios.md
- Prompt-level hardening: Prompt-Engineering/05-guardrails-and-prompt-hardening.md
- Scenario follow-up packs for this document: 02-pii-protection-data-privacy/README.md