
2. PII Protection & Data Privacy

MangaAssist sits in an awkward part of the risk surface: it is both a shopping assistant and a conversational system. Users voluntarily paste names, phone numbers, addresses, order IDs, gift card codes, and account details into natural-language chat. At the same time, the system itself can generate or echo sensitive data in responses. That means privacy cannot be treated as a storage-only problem or a prompt-only problem. It has to be enforced across ingestion, orchestration, model prompting, response filtering, logging, analytics, retention, and deletion.

This document goes deeper than the base privacy narrative and answers five practical engineering questions:

  1. Where can PII enter the system and where can it leak?
  2. How is PII detected with enough recall without breaking manga-domain queries?
  3. How is the confidence score calculated and used for routing?
  4. How do the storage and deletion flows prove privacy controls beyond the hot path?
  5. What follow-up questions should you expect in a design review or interview?

Why This Matters for MangaAssist

MangaAssist processes sensitive data across nearly every user journey:

| Data Type | Where It Appears | Why It Matters |
|---|---|---|
| Customer name | Account help, order lookup, shipping status | Identity exposure, unwanted profiling |
| Email address | Order support, account recovery questions | Phishing, spam targeting |
| Shipping address | Delivery estimate, returns, account help | Physical safety, direct privacy risk |
| Phone number | Tracking and customer support | SIM-swapping and unwanted contact |
| Amazon customer ID | Session context, account support | Account takeover vector |
| Order history | Recommendations, return support | Behavioral profiling |
| Payment references | Last-4 digits, gift card references | Fraud enablement |
| Browsing behavior | Personalization and ranking | Preference profiling |

The hard part is not only detection. The hard part is that the same string can be either sensitive or harmless depending on context:

  • Gojo Satoru can be a fictional character or a real person name
  • 123-4567 can be a postal fragment, phone fragment, or noise
  • B0XXXXXXXX looks like a 10-character ASIN and should not be redacted as PII
  • TBA123456789012 is not a person identifier, but it is still sensitive operational data

The privacy system therefore needs:

  • strong structured detection
  • domain-aware unstructured detection
  • policy-aware routing
  • data minimization at every persistence boundary
  • deletion that covers both primary and derived data

Design Goals

  1. Detect sensitive data before any non-essential persistence.
  2. Separate detection from authorization: something can still be PII even if a downstream service is allowed to see it.
  3. Keep the hot path deterministic and low latency.
  4. Preserve shopping utility by avoiding false positives on manga titles, character names, ASINs, and catalog terms.
  5. Make privacy controls auditable: what was detected, what was redacted, what was stored, what was deleted, and when.
  6. Treat guest sessions more conservatively because their data is harder to attribute later.

Threat Model and Trust Boundaries

Main Privacy Failure Modes

| Failure Mode | Example | Impact | First Control |
|---|---|---|---|
| User volunteers PII in chat | "My email is alex@example.com" | Raw PII reaches logs, history, analytics | Pre-logging PII scan |
| Model echoes sensitive data | FM repeats shipping address from context | Unauthorized disclosure in response | Post-generation PII response filter |
| Model hallucinates plausible PII | Fake email or phone number in prose | User sees fabricated personal data | Response-side regex + NER |
| False positive on manga entities | Gojo Satoru becomes [NAME_REDACTED] | Broken recommendations and trust loss | Allowlist + context scorer |
| Locale miss | JP or DE formats not caught | PII leak due to detector blind spot | Locale-aware regex + multilingual NER |
| Derived data not deleted | Training export still contains customer_id | GDPR/CCPA non-compliance | Data lineage map + deletion workflow |
| Guest data retention too long | Unattributable guest PII stored for 24h | Higher privacy risk, weak deletion path | Lower threshold + shorter TTL |

Trust Boundary View

flowchart TB
    subgraph Untrusted["Untrusted / User-Controlled"]
        User[User message]
        History[Conversation text]
        APIText[API payload text fields]
    end

    subgraph Controlled["Controlled Decision Layer"]
        Normalize[Normalizer + locale resolver]
        Detect[PII detection pipeline]
        Policy[Policy and authorization engine]
        Redact[Redaction and masking engine]
    end

    subgraph Sensitive["Sensitive Runtime Zone"]
        Orch[Orchestrator request context]
        FM[Foundation model prompt]
        Ship[Shipping API]
        Order[Order API]
    end

    subgraph Persistent["Persistent Stores"]
        Logs[Redacted logs]
        DDB[Redacted session history]
        Analytics[Anonymized analytics]
        S3[Archives and audit evidence]
    end

    User --> Normalize
    History --> Normalize
    APIText --> Normalize
    Normalize --> Detect
    Detect --> Policy
    Policy --> Redact
    Policy --> Orch
    Orch --> FM
    Orch --> Ship
    Orch --> Order
    Redact --> Logs
    Redact --> DDB
    Redact --> Analytics
    DDB --> S3

Key rule: raw user text is allowed to exist in memory for the minimum time needed to fulfill the request, but persistent systems should see only the redacted or policy-approved representation.


High-Level Design (HLD)

System Overview

flowchart LR
    User[Web / mobile client] --> Gateway[API Gateway + auth]
    Gateway --> Orch[Chat orchestrator]

    subgraph PII["PII Protection Layer"]
        Normalize[Text normalizer]
        Locale[Locale resolver]
        Regex[Regex scanner]
        NER[NER endpoint]
        Custom[Custom detectors]
        Merge[Overlap merger]
        Score[Confidence scorer]
        Policy[Authorization and action router]
        Redact[Redaction engine]
    end

    Orch --> Normalize
    Normalize --> Locale
    Locale --> Regex
    Locale --> NER
    Locale --> Custom
    Regex --> Merge
    NER --> Merge
    Custom --> Merge
    Merge --> Score
    Score --> Policy
    Policy --> Redact

    Policy -->|Ephemeral approved fields only| FM[Foundation model]
    Policy -->|Ephemeral approved fields only| Shipping[Shipping API]
    Policy -->|Ephemeral approved fields only| Orders[Order API]

    Redact --> DDB[DynamoDB session history]
    Redact --> Logs[CloudWatch application logs]
    Redact --> Analytics[Analytics stream]

    FM --> ResponsePII[Response PII filter]
    Shipping --> ResponsePII
    Orders --> ResponsePII
    ResponsePII --> UserResponse[User-visible response]

HLD Principles

  1. Detection runs before persistence.
  2. Authorization is explicit and separate from confidence.
  3. Raw PII is never forwarded by default.
  4. Response-side filtering exists because the FM can still generate sensitive data even if the input path is clean.
  5. Audit and deletion are first-class parts of the design, not cleanup tasks.

End-to-End Dataflow

Inbound Request Dataflow

sequenceDiagram
    participant User
    participant Gateway as API Gateway
    participant Orch as Orchestrator
    participant Norm as Normalizer
    participant Scan as PII Scanner
    participant Policy as Policy Engine
    participant DDB as DynamoDB
    participant Log as Logs
    participant FM as Foundation Model
    participant API as Order/Shipping API

    User->>Gateway: Chat message with possible PII
    Gateway->>Orch: Auth context + message
    Orch->>Norm: Normalize for detection
    Norm->>Scan: Locale-aware text + offset map
    Scan->>Policy: Findings with confidence
    Policy->>Log: Persist redacted copy only
    Policy->>DDB: Store redacted conversation history
    Policy->>FM: Send minimized prompt
    Policy->>API: Send approved PII fields only if required
    FM-->>Orch: Response candidate
    API-->>Orch: Structured data

Outbound Response Dataflow

flowchart LR
    Candidate[FM or API-composed response] --> RespPII[Response-side PII scan]
    RespPII --> Guardrails[Remaining guardrails]
    Guardrails --> Deliver[Deliver to user]
    RespPII --> Audit[PII near-miss audit event]

Important point: input-side privacy controls do not eliminate the need for output-side privacy controls. The model can hallucinate emails, phone numbers, or addresses that were never present in the input.
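
A minimal sketch of that response-side pass, reusing the detectors and the offset-map redaction helper defined in the deep dive below. The sensitive-type set is an assumption, chosen so catalog identifiers like ASINs survive; NER can be added where the latency budget allows:

RESPONSE_REDACT_TYPES = {
    "email", "obfuscated_email", "us_phone", "jp_phone",
    "ssn", "credit_card", "gift_card_code", "address",
}

def filter_response(candidate: str) -> str:
    normalized, offset_map = normalize_for_detection(candidate)
    findings = merge_overlapping_findings(
        scan_regex(normalized) + scan_custom(normalized)
    )
    for finding in findings:
        # Output path: no downstream component is authorized, so anything
        # in the sensitive set is redacted before delivery; catalog
        # identifiers such as ASINs pass through untouched.
        finding["action"] = (
            "redact" if finding["type"] in RESPONSE_REDACT_TYPES else "pass"
        )
    return apply_redactions(candidate, findings, offset_map)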

Hot Path vs Async Path

flowchart TD
    subgraph HotPath["Synchronous hot path"]
        A[Normalize] --> B[Detect]
        B --> C[Score]
        C --> D[Policy route]
        D --> E[Redact and store safe copy]
        D --> F[Pass ephemeral approved fields]
    end

    subgraph AsyncPath["Asynchronous follow-up"]
        G[Review queue]
        H[Analytics anonymization]
        I[Audit aggregation]
        J[Allowlist refresh]
        K[Deletion evidence generation]
    end

    C --> G
    E --> H
    E --> I
    D --> K
    J --> B

This split matters because privacy logic that must prevent leakage belongs in the synchronous path. Optimization, aggregation, and human review belong in the async path.


PII Detection Architecture Deep Dive

Layer 0: Text Normalization and Locale Resolution

Before detection, the system normalizes text for scanners without losing the ability to redact the original string correctly.

The implementation detail that matters is the offset map:

  • normalize Unicode for detectors
  • remove zero-width characters and repeated spacing
  • standardize separators where safe
  • preserve a mapping from normalized spans back to original spans

Without this, you can detect on normalized text but redact the wrong characters in the original message.

import unicodedata

# Zero-width and BOM code points commonly used to split tokens past detectors
_ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

def normalize_char(char: str) -> str:
    """NFKC-fold one character; zero-width characters are dropped entirely."""
    if char in _ZERO_WIDTH:
        return ""
    return unicodedata.normalize("NFKC", char)

def normalize_for_detection(text: str) -> tuple[str, list[int]]:
    """
    Returns normalized text and an offset map where offset_map[i]
    points to the original character index for normalized position i.
    """
    normalized = []
    offset_map = []

    for index, char in enumerate(text):
        transformed = normalize_char(char)  # may yield zero or more characters
        for out_char in transformed:
            normalized.append(out_char)
            offset_map.append(index)

    return "".join(normalized), offset_map

Locale resolution uses:

  • authenticated user marketplace or country
  • shipping country if available
  • UI locale
  • detector hints from the text itself

If locale is ambiguous, the pipeline runs the safe union of the relevant locale patterns.
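
A minimal sketch of that priority order. The session keys (marketplace_country, shipping_country, ui_locale) are assumptions about the session object, and text-derived hints are omitted for brevity:

def resolve_locale(session: dict, text: str) -> str:
    """Return the strongest available locale signal, or 'union' when ambiguous."""
    for key in ("marketplace_country", "shipping_country", "ui_locale"):
        value = session.get(key)
        if value:
            return value
    # No reliable signal: downstream scanners run the safe union of patterns.
    return "union"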

Layer 1: Regex-Based Pattern Detection

Regex is still the cheapest and most reliable detector for strongly structured fields.

import re

PII_PATTERNS = {
    "email": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
    "obfuscated_email": r"\b[A-Za-z0-9._%+-]+\s*(?:@|\[at\]|\(at\))\s*[A-Za-z0-9.-]+\s*(?:\.|\[dot\]|\(dot\))\s*[A-Za-z]{2,}\b",
    "us_phone": r"\b(?:\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b",
    "jp_phone": r"\b0\d{1,4}[-.\s]?\d{1,4}[-.\s]?\d{4}\b",
    "ssn": r"\b\d{3}[-.\s]?\d{2}[-.\s]?\d{4}\b",
    "credit_card": r"\b(?:\d{4}[-.\s]?){3}\d{4}\b",
    "us_zip": r"\b\d{5}(?:-\d{4})?\b",
    "jp_postal": r"\b\d{3}-\d{4}\b",
    "amazon_order_id": r"\b\d{3}-\d{7}-\d{7}\b",
}

def scan_regex(text: str) -> list[dict]:
    findings = []
    for pii_type, pattern in PII_PATTERNS.items():
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            findings.append({
                "type": pii_type,
                "value": match.group(),
                "start": match.start(),
                "end": match.end(),
                "base_confidence": 0.95,
                "source": "regex",
            })
    return findings

Regex strengths:

  • sub-millisecond
  • deterministic
  • auditable
  • ideal for hot-path filtering

Regex weaknesses:

  • poor for names and free-form addresses
  • brittle against novel obfuscation unless continuously refreshed
  • no semantic understanding

Layer 2: NER-Based Entity Detection

NER catches the messy cases regex cannot handle:

  • names
  • addresses
  • organizations
  • location mentions that become sensitive in account context

import json

import boto3

# Warm, reused SageMaker runtime client; creating one per request would
# add connection-setup latency to the hot path.
sagemaker_runtime = boto3.client("sagemaker-runtime")

NER_ENTITY_MAP = {
    "PERSON": "person_name",
    "ADDRESS": "address",
    "GPE": "location",
    "LOC": "location",
    "ORG": "organization",
    "JAPANESE_NAME": "person_name",
}

def scan_ner(text: str, locale: str) -> list[dict]:
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName="pii-ner-endpoint",
        Body=json.dumps({"text": text, "locale": locale}),
        ContentType="application/json",
    )
    entities = json.loads(response["Body"].read())
    findings = []

    for entity in entities:
        if entity["label"] not in NER_ENTITY_MAP:
            continue
        if entity["score"] <= 0.70:
            continue

        findings.append({
            "type": NER_ENTITY_MAP[entity["label"]],
            "value": entity["text"],
            "start": entity["start"],
            "end": entity["end"],
            "base_confidence": entity["score"],
            "source": "ner",
        })
    return findings

Domain-specific requirement: a generic NER model will over-flag manga character names as real people. Fine-tuning reduces this, but in production it is still not enough by itself. The right solution is layered:

  • NER fine-tuning on manga/e-commerce data
  • character allowlist
  • context-aware score adjustments
  • monitoring of mid-confidence person_name findings

Layer 3: Custom Detectors

Some sensitive strings are domain-specific and need business logic, not just generic NLP.

CUSTOM_PATTERNS = {
    "amazon_customer_id": r"\b[A-Z0-9]{13,14}\b",
    "tracking_number": r"\b(?:1Z[A-Z0-9]{16}|TBA\d{12,})\b",
    "gift_card_code": r"\b[A-Z0-9]{4}-[A-Z0-9]{6}-[A-Z0-9]{4}\b",
    "asin": r"\bB0[A-Z0-9]{8}\b",
}

ACCOUNT_CONTEXT_KEYWORDS = {"account", "customer id", "order", "my account", "profile"}

def scan_custom(text: str) -> list[dict]:
    findings = []
    text_lower = text.lower()

    for pii_type, pattern in CUSTOM_PATTERNS.items():
        for match in re.finditer(pattern, text):
            base_confidence = 0.90

            if pii_type == "amazon_customer_id":
                if not any(keyword in text_lower for keyword in ACCOUNT_CONTEXT_KEYWORDS):
                    base_confidence = 0.40

            findings.append({
                "type": pii_type,
                "value": match.group(),
                "start": match.start(),
                "end": match.end(),
                "base_confidence": base_confidence,
                "source": "custom",
            })

    return findings

This is where many privacy systems fail. They either redact too broadly and destroy utility, or they under-classify business-specific identifiers because they do not look like classic PII.

Layer 4: Overlap Merging and Canonical Findings

The same span may be detected by multiple scanners. If we do not merge, we create double-redaction bugs and inconsistent routing.

def merge_overlapping_findings(findings: list[dict]) -> list[dict]:
    findings = sorted(findings, key=lambda item: (item["start"], item["end"]))
    merged = []

    for finding in findings:
        # Spans are half-open [start, end); use >= so adjacent but
        # non-overlapping spans are not merged into one finding.
        if not merged or finding["start"] >= merged[-1]["end"]:
            merged.append({
                **finding,
                "sources": {finding["source"]},
            })
            continue

        previous = merged[-1]
        previous["end"] = max(previous["end"], finding["end"])
        previous["sources"].add(finding["source"])
        previous["base_confidence"] = max(previous["base_confidence"], finding["base_confidence"])

    return merged

Merge rule: keep the most conservative type and the highest base confidence, then let the scorer apply agreement bonuses.

Layer 5: Confidence Score Calculation

The score is not a single ML probability. It is a composite routing score:

final_confidence =
    clamp(
        base_confidence
        + detector_agreement_bonus
        + pii_context_bonus
        - non_pii_context_penalty
        + locale_support_bonus
        + manual_override
    )

Base Score Sources

| Source | Base Score |
|---|---|
| Regex exact match | 0.95 |
| NER detection | model score |
| Custom detector | usually 0.90, reduced to 0.40 for weak account context |

Scoring Adjustments

| Adjustment | Example | Delta |
|---|---|---|
| Multi-detector agreement | Regex and NER both support same span | +0.05 |
| Strong PII context | my name is, ship to, email me, account | +0.10 |
| Non-PII manga context | recommend, series, volumes, character | -0.20 for person_name findings |
| Locale support bonus | Locale and format strongly align | +0.03 to +0.05 |
| Character allowlist override | Gojo Satoru from catalog list | force to 0.10 |

Reference Implementation

PII_CONTEXT_BOOST = {"my name is", "ship to", "email me", "phone number", "account", "order"}
NON_PII_CONTEXT_HINTS = {"recommend", "series", "volumes", "character", "manga", "anime"}

def calculate_confidence(finding: dict, text: str, locale: str) -> float:
    score = finding["base_confidence"]
    text_lower = text.lower()

    if len(finding.get("sources", set())) >= 2:
        score += 0.05

    if any(keyword in text_lower for keyword in PII_CONTEXT_BOOST):
        score += 0.10

    if finding["type"] == "person_name" and any(keyword in text_lower for keyword in NON_PII_CONTEXT_HINTS):
        score -= 0.20

    if locale_supports_finding(locale, finding):
        score += 0.03

    if finding["type"] == "person_name" and is_character_name(finding["value"]):
        score = 0.10

    return max(0.0, min(score, 1.0))

Scoring Examples

| Input | Detector Path | Final Score | Why |
|---|---|---|---|
| alex@example.com | Regex | 0.95 | Exact pattern match |
| Ship to 123 Oak Street | NER address 0.84 + context | 0.94 | Delivery context boosts address confidence |
| Gojo Satoru in recommendation query | NER 0.86 - manga hint - allowlist | 0.10 | Known character name, not customer PII |
| 14-char account-like string without account context | Custom detector | 0.40 | Too ambiguous to redact inline |

Important Distinction: Score vs Policy

The score answers "How likely is this span to be sensitive?".

Policy answers "What are we allowed to do with it?".

Those are different questions.

Example: a shipping address can have a confidence of 0.94 and still be allowed to reach the shipping API ephemerally. The score remains high because it is still PII. Policy decides that only one downstream component is allowed to see it.

Layer 6: Routing and Actions

flowchart TD
    Findings[Canonical findings] --> Score[Calculate final confidence]
    Score --> Auth{Authorized for this downstream use?}

    Auth -->|No| Route1{Score range}
    Auth -->|Yes| Route2{Score range}

    Route1 -->|>= 0.9| Redact[Full redact]
    Route1 -->|0.7 - 0.89| Mask[Mask + review queue]
    Route1 -->|0.5 - 0.69| Monitor[Log only]
    Route1 -->|< 0.5| Pass1[Pass]

    Route2 -->|>= 0.9| Ephemeral[Ephemeral pass-through to approved service only]
    Route2 -->|0.7 - 0.89| Mask2[Mask for storage, pass minimal approved field]
    Route2 -->|< 0.7| Pass2[Pass to approved service]

Default Routing Table

| Final Score | Authenticated User | Guest User | Interpretation |
|---|---|---|---|
| >= 0.9 | Redact in persistence, allow ephemeral approved use only | Redact everywhere | Highest certainty |
| 0.7 - 0.89 | Mask for storage, review if ambiguous | Redact everywhere | Likely PII |
| 0.5 - 0.69 | Monitor only unless policy requires stronger handling | Redact for guest | Borderline |
| < 0.5 | Pass | Pass | Likely false positive |

Guest policy is stricter because unattributable guest PII is harder to govern later.
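
A minimal route_findings sketch that mirrors the two score ladders and the guest override above. The action strings and the guest_mode session flag are assumptions about the surrounding code:

def route_findings(findings: list[dict], session: dict, intent: str) -> list[dict]:
    guest = session.get("guest_mode", False)

    for finding in findings:
        score = finding["final_confidence"]

        if guest:
            # Guests: redact anything at or above the borderline band.
            finding["action"] = "redact" if score >= 0.5 else "pass"
        elif not finding.get("authorized", False):
            if score >= 0.9:
                finding["action"] = "redact"
            elif score >= 0.7:
                finding["action"] = "mask"      # plus async review queue
            elif score >= 0.5:
                finding["action"] = "monitor"   # log-only, no text rewrite
            else:
                finding["action"] = "pass"
        else:
            # Authorized: storage still sees the masked or redacted text;
            # only the approved downstream service receives the raw field,
            # and only ephemerally.
            if score >= 0.9:
                finding["action"] = "ephemeral_pass"
            elif score >= 0.7:
                finding["action"] = "mask"
            else:
                finding["action"] = "pass"

    return findings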


Low-Level Design (LLD)

Component Breakdown

classDiagram
    class PIIFinding {
        +str type
        +str value
        +int start
        +int end
        +str source
        +float base_confidence
        +float final_confidence
        +bool authorized
        +str action
    }

    class TextNormalizer {
        +normalize_for_detection(text)
    }

    class LocaleResolver {
        +resolve(session, text)
    }

    class RegexScanner {
        +scan(text, locale)
    }

    class NERClient {
        +scan(text, locale)
    }

    class CustomDetector {
        +scan(text, session)
    }

    class FindingMerger {
        +merge(findings)
    }

    class ConfidenceScorer {
        +score(finding, text, locale)
    }

    class PolicyEngine {
        +route(findings, session, intent)
    }

    class RedactionEngine {
        +apply(text, findings, offset_map)
    }

    class AuditPublisher {
        +publish(findings, decisions)
    }

    TextNormalizer --> LocaleResolver
    LocaleResolver --> RegexScanner
    LocaleResolver --> NERClient
    LocaleResolver --> CustomDetector
    RegexScanner --> FindingMerger
    NERClient --> FindingMerger
    CustomDetector --> FindingMerger
    FindingMerger --> ConfidenceScorer
    ConfidenceScorer --> PolicyEngine
    PolicyEngine --> RedactionEngine
    PolicyEngine --> AuditPublisher

Data Contracts

from dataclasses import dataclass, field

@dataclass
class PIIFinding:
    type: str
    value: str
    start: int
    end: int
    source: str
    base_confidence: float
    final_confidence: float = 0.0
    locale: str = "unknown"
    authorized: bool = False
    action: str = "pass"
    sources: set[str] = field(default_factory=set)

@dataclass
class PIIDecision:
    redacted_text: str
    approved_fields: dict
    findings: list[PIIFinding]
    audit_metadata: dict

Request-Side Orchestration

def process_inbound_message(
    raw_text: str,
    session: dict,
    intent: str,
    approved_fields_for_intent: set[str],
) -> PIIDecision:
    normalized_text, offset_map = normalize_for_detection(raw_text)
    locale = resolve_locale(session, normalized_text)

    findings = []
    findings.extend(scan_regex(normalized_text))
    findings.extend(scan_ner(normalized_text, locale))
    findings.extend(scan_custom(normalized_text))

    merged = merge_overlapping_findings(findings)

    for finding in merged:
        finding["final_confidence"] = calculate_confidence(finding, normalized_text, locale)
        finding["authorized"] = finding["type"] in approved_fields_for_intent

    routed = route_findings(merged, session, intent)
    redacted_text = apply_redactions(raw_text, routed, offset_map)

    store_redacted_history(session, redacted_text)
    emit_pii_audit_event(session, routed)

    return PIIDecision(
        redacted_text=redacted_text,
        approved_fields=extract_ephemeral_approved_fields(raw_text, routed, offset_map),
        findings=routed,
        audit_metadata=build_audit_metadata(routed),
    )
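
The apply_redactions helper called above is where the offset map earns its keep: findings carry normalized-text offsets, and the map translates them back to the original string. A minimal sketch, assuming routing has already set each finding's action (masking is collapsed into the same placeholder for brevity):

def apply_redactions(original_text: str, findings: list[dict],
                     offset_map: list[int]) -> str:
    # Substitute right-to-left so earlier character offsets stay valid
    # after each replacement; overlaps were already merged upstream.
    result = original_text
    for finding in sorted(findings, key=lambda f: f["start"], reverse=True):
        if finding.get("action") not in {"redact", "mask"}:
            continue
        orig_start = offset_map[finding["start"]]
        orig_end = offset_map[finding["end"] - 1] + 1  # end is exclusive
        placeholder = f"[{finding['type'].upper()}_REDACTED]"
        result = result[:orig_start] + placeholder + result[orig_end:]
    return result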

Storage Schema

Session History Record

| Field | Example | Notes |
|---|---|---|
| session_id | sess_123 | Partition key |
| turn_id | 17 | Sort key |
| actor | user or assistant | Who produced the turn |
| redacted_text | Ship to [ADDRESS_REDACTED] | Stored form |
| pii_types_detected | ["address"] | Audit-friendly metadata |
| pii_confidence_max | 0.94 | Highest confidence in turn |
| guest_mode | true | Drives TTL and policy |
| expires_at | epoch time | DynamoDB TTL |
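
A minimal persistence sketch against that schema, assuming a table named manga-assist-sessions and a session dict carrying session_id, next_turn_id, and guest_mode (all hypothetical names); the audit-metadata columns are omitted for brevity:

import time

import boto3

dynamodb = boto3.resource("dynamodb")
sessions_table = dynamodb.Table("manga-assist-sessions")  # hypothetical name

def store_redacted_history(session: dict, redacted_text: str) -> None:
    # Guest turns expire after 2h; authenticated turns after 24h.
    ttl_seconds = 2 * 3600 if session.get("guest_mode") else 24 * 3600
    sessions_table.put_item(Item={
        "session_id": session["session_id"],
        "turn_id": session["next_turn_id"],
        "actor": "user",
        "redacted_text": redacted_text,
        "guest_mode": bool(session.get("guest_mode")),
        "expires_at": int(time.time()) + ttl_seconds,  # DynamoDB TTL attribute
    })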

PII Audit Event

| Field | Example | Notes |
|---|---|---|
| request_id | req_abc | Trace correlation |
| session_id_hash | hash | PII-safe join key |
| pii_findings_count | 2 | Aggregate, not raw PII |
| pii_types | ["email", "address"] | Audit taxonomy |
| max_confidence | 0.95 | Detection signal |
| actions | ["redact", "ephemeral_pass"] | What happened |
| authorized_fields | ["address"] | Approved downstream use |
| model_version | pii-ner-v5 | Reproducibility |

Latency Budget by Module

| Module | Target Latency | Notes |
|---|---|---|
| Normalizer + locale resolver | <1ms | Pure in-process |
| Regex scanner | <1ms | Cheap deterministic rules |
| Custom detectors | <1ms | Mostly regex + context |
| NER endpoint | 3-8ms | Cached real-time endpoint |
| Merger + scorer + routing | <1ms | In-process |
| Redaction | <1ms | Offset-map aware substring replacement |
| Total request-side PII stage | 5-12ms | Fits inside chat latency budget |

Implementation Components and Tools

| Component / Tool | Why It Exists | Typical Use in This Design |
|---|---|---|
| Python re | Deterministic pattern matching | Email, phone, postal code, order ID, obfuscated email detection |
| SageMaker-hosted NER model | Low-latency entity extraction | Names, addresses, multilingual entities |
| Catalog-backed character allowlist | Domain false-positive control | Prevent character names from being redacted as customer names |
| DynamoDB | Short-lived session store | Redacted conversation history with TTL |
| CloudWatch Logs | Operational logging | Store redacted application events only |
| Analytics stream + warehouse | Product analytics | Consume anonymized events, never raw PII |
| S3 + lifecycle policy | Archive and evidence store | Encrypted archives, audit evidence, deletion artifacts |
| KMS envelope encryption | Field-level encryption where PII must be stored | Protect approved stored sensitive fields |
| Step Functions or equivalent workflow engine | Multi-system deletion orchestration | GDPR/CCPA erasure workflow with evidence chain |
| Async review queue | Human review of ambiguous detections | Medium-confidence or policy-sensitive cases |

Scenario Deep Dives

Scenario 1: Manga Character Names Triggering PII Redaction

Context

After the NER-based detector went live, recommendation quality dropped because fictional character names were being classified as real-person PII.

User message:

I want manga with Gojo Satoru and Levi Ackerman in it

Observed failure:

  • NER labeled both names as PERSON
  • confidence landed in the 0.82 - 0.91 range
  • character names were redacted before recommendation retrieval
  • retrieval lost the most important entity

Failure Path

flowchart LR
    Query[Recommendation query with character names] --> NER[Generic PERSON detection]
    NER --> Score[High person-name confidence]
    Score --> Redact[Redact names]
    Redact --> Retrieve[Retriever sees degraded query]
    Retrieve --> BadRecs[Generic or irrelevant recommendations]

Root Cause

The model was right syntactically and wrong semantically. Manga character names look like real names, especially Japanese names. The detector lacked:

  • negative examples for fictional names
  • domain context such as recommend, series, volumes
  • a catalog-derived allowlist

Improved Design

flowchart LR
    Query[Recommendation query] --> NER[NER person-name detection]
    Query --> Context[Intent and context scorer]
    Query --> Allowlist[Character allowlist lookup]
    NER --> Merge[Merge]
    Context --> Merge
    Allowlist --> Merge
    Merge --> FinalScore[Final confidence]
    FinalScore -->|0.10 after override| Pass[Keep term in query]
    Pass --> Retrieve[Retriever sees original character name]

Implementation Changes

  1. Added a catalog-derived CHARACTER_NAMES allowlist refreshed daily.
  2. Added a negative penalty for recommendation-style context.
  3. Fine-tuned the NER model on manga-domain negative examples.
  4. Added a weekly audit for person_name findings in the 0.70 - 0.90 band.
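
A minimal sketch of the allowlist lookup used by the scorer's is_character_name override; the catalog feed shape is an assumption:

# Refreshed daily from the catalog; entries stored lowercase, single-spaced
CHARACTER_NAMES: set[str] = set()

def _canonical(name: str) -> str:
    return " ".join(name.lower().split())

def is_character_name(value: str) -> bool:
    # Normalized lookup so "gojo  satoru" still matches "Gojo Satoru".
    return _canonical(value) in CHARACTER_NAMES

def refresh_character_allowlist(catalog_rows: list[dict]) -> None:
    CHARACTER_NAMES.clear()
    CHARACTER_NAMES.update(
        _canonical(row["character_name"])
        for row in catalog_rows
        if row.get("character_name")
    )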

Why the Hybrid Approach Won

Pure model fine-tuning helps but reacts slowly. Pure allowlists help quickly but are brittle. The hybrid approach:

  • gives immediate mitigation
  • reduces hot-path false positives
  • keeps improving as the model is retrained
  • creates explicit observability for uncertain names

Metric Signal

  • false positive rate on character-name queries: 40% -> 3%
  • recommendation relevance on character-name queries: +22%

Scenario 2: User Pasting Full Address for Delivery Estimate

Context

Users often paste their full address directly into chat instead of going through account settings.

Example:

Can you deliver to 123 Oak Street, Apt 4B, Springfield IL 62704?

The address is needed to compute delivery estimates, but it is not needed in:

  • logs
  • analytics
  • conversation history
  • model training exports

Incorrect Dataflow

flowchart LR
    User[User address in chat] --> Store1[Store raw message]
    Store1 --> Log[Logs]
    Store1 --> DDB[Session history]
    Store1 --> Analytics[Analytics]
    Store1 --> PII[PII detection later]
    PII --> Redact[Redact after the fact]

This is architecturally wrong because the leak already happened before redaction.

Correct Dataflow

sequenceDiagram
    participant User
    participant Gateway as API Gateway
    participant Scan as PII scanner
    participant Policy as Policy engine
    participant Log as Logs
    participant DDB as DynamoDB
    participant Orch as Orchestrator
    participant Ship as Shipping API

    User->>Gateway: "Deliver to 123 Oak Street, Springfield IL 62704?"
    Gateway->>Scan: Scan before any persistence
    Scan->>Policy: Address finding, confidence 0.94
    Policy->>Log: Write redacted text only
    Policy->>DDB: Store redacted text only
    Policy->>Orch: Pass original address ephemerally
    Orch->>Ship: Request delivery estimate with real address
    Ship-->>Orch: Delivery estimate
    Orch-->>User: "Estimated delivery is March 28"

Implementation Details That Matter

  1. The original address only exists in orchestrator memory.
  2. It is never written to logs, history, or analytics.
  3. The session history stores the redacted version and the resulting answer.
  4. Any asynchronous debug trace stores only the redacted text plus metadata such as pii_types=["address"].

Why Not Store the Encrypted Address for Convenience?

Because convenience is not a valid reason to expand the data footprint. Encryption reduces exposure after storage; it does not make unnecessary storage acceptable. The privacy-first design is to avoid storing it at all unless there is a clear product need.

Metric Signal

  • address leakage to non-essential systems: 100% -> 0%
  • no meaningful latency increase because the scan already existed; only the ordering changed

Scenario 3: GDPR Right-to-Deletion Request

Context

A user requests erasure under GDPR Article 17. The hard part is not deleting the primary conversation row. The hard part is deleting or anonymizing every copy and derived form:

  • live session store
  • archives
  • logs
  • analytics
  • training exports
  • downstream evidence of deletion

HLD for Deletion Workflow

stateDiagram-v2
    [*] --> RequestReceived
    RequestReceived --> IdentityValidated
    IdentityValidated --> RegistryUpdated
    RegistryUpdated --> DeleteSessions
    RegistryUpdated --> DeleteArchives
    RegistryUpdated --> AnonymizeAnalytics
    RegistryUpdated --> PurgeExports
    RegistryUpdated --> ConfirmLogRetention
    DeleteSessions --> Evidence
    DeleteArchives --> Evidence
    AnonymizeAnalytics --> Evidence
    PurgeExports --> Evidence
    ConfirmLogRetention --> Evidence
    Evidence --> UserConfirmation
    UserConfirmation --> [*]

Detailed Dataflow

sequenceDiagram
    participant Support
    participant Workflow as Deletion orchestrator
    participant Registry as Deletion registry
    participant DDB as DynamoDB
    participant S3 as S3 archives
    participant WH as Analytics warehouse
    participant Export as Training export pipeline
    participant Audit as Audit bucket

    Support->>Workflow: Validated deletion request
    Workflow->>Registry: Write deletion request and request_id
    Workflow->>DDB: Delete conversation records by customer_id
    Workflow->>S3: Delete archived objects by tag or inventory lookup
    Workflow->>WH: Anonymize or delete attributable rows
    Workflow->>Export: Add customer_id to denylist for future exports
    Workflow->>Audit: Write evidence artifacts per step
    Workflow-->>Support: Completion summary with evidence IDs

Low-Level Implementation Notes

  1. Deletion must be idempotent. Re-running the workflow should not fail if records are already gone.
  2. Derived data should either be deleted or anonymized with a documented policy.
  3. Backups need an explicit position:
       • short-lived immutable backups may be exempt from immediate mutation
       • restore workflows must replay the deletion registry before data becomes active
  4. The deletion registry exists to prove compliance and prevent reintroduction into later training exports.
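
A sketch of the idempotent DynamoDB deletion step, assuming a hypothetical by_customer GSI on the conversation table:

from boto3.dynamodb.conditions import Key

def delete_conversation_records(table, customer_id: str) -> int:
    """Delete all conversation rows for a customer; safe to re-run."""
    deleted = 0
    kwargs = {
        "IndexName": "by_customer",  # hypothetical GSI name
        "KeyConditionExpression": Key("customer_id").eq(customer_id),
    }
    while True:
        page = table.query(**kwargs)
        for item in page["Items"]:
            # delete_item succeeds whether or not the row still exists,
            # so re-running the workflow never fails on missing records.
            table.delete_item(Key={
                "session_id": item["session_id"],
                "turn_id": item["turn_id"],
            })
            deleted += 1
        if "LastEvaluatedKey" not in page:
            return deleted
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]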

Why the Data Lineage Map Matters

If a system stores user-linked data and is missing from the lineage map, deletion is already broken. The lineage map is not documentation overhead; it is the control surface for legal erasure.

Metric Signal

  • deletion fulfillment time: ~3 days manual -> <4 hours automated
  • systems covered: 5/5
  • dry-run audit pass rate: 100%

Scenario 4: Guest User PII Boundary Enforcement

Context

Guest users should not need to share PII, but many still paste it into chat:

My email is alex@example.com, can you check my gift card status?

The system cannot safely treat guest PII like authenticated PII because:

  • there is no durable customer_id
  • there is no strong identity binding
  • later deletion is harder or impossible

Policy Comparison

flowchart TD
    Start[Incoming message] --> Mode{Authenticated?}

    Mode -->|Yes| Auth[Standard privacy policy]
    Mode -->|No| Guest[Guest privacy policy]

    Auth --> AuthScore[Use standard thresholds]
    Auth --> AuthTTL[24h session TTL]
    Auth --> AuthUse[Allow approved ephemeral PII use]

    Guest --> GuestScore[Lower redaction threshold]
    Guest --> GuestTTL[2h session TTL]
    Guest --> GuestUse[Never pass account-related PII through]
    Guest --> GuestPrompt[Prompt user to sign in]

Guest-Specific Rules

  1. Redact from >= 0.5, not only from >= 0.9.
  2. Never pass account-recovery-like fields to downstream account tools.
  3. Keep guest TTL at 2 hours, not 24 hours.
  4. Emit a user-facing message explaining that account assistance requires sign-in.

Why Not Block Any Guest Message That Contains PII?

Because some guest flows are still useful even when a small amount of PII appears accidentally. The better balance is:

  • redact aggressively
  • minimize retention
  • nudge toward authentication for account-specific help

That protects privacy without turning the guest experience into a wall of rejections.

Metric Signal

  • guest sessions containing stored PII: ~18% -> <1%
  • guest-to-auth conversion: +12%

Data Retention, Deletion, and Evidence Chain

Retention Architecture

flowchart TB
    subgraph SessionData["Session and interaction data"]
        Conv[Conversation history<br/>24h auth / 2h guest]
        Meta[Session metadata<br/>24h]
        Logs[Application logs<br/>30 days]
        Audit[Audit evidence<br/>1 year]
    end

    subgraph AnalyticsData["Derived data"]
        Events[Identifiable analytics<br/>90 days]
        Agg[Aggregated analytics<br/>Long-lived]
        Exports[Training exports<br/>Rolling export cycle]
    end

    subgraph Controls["Controls"]
        TTL[DynamoDB TTL]
        LC[S3 lifecycle rules]
        Anon[Warehouse anonymization job]
        Registry[Deletion registry]
    end

    Conv --> TTL
    Meta --> TTL
    Logs --> LC
    Audit --> LC
    Events --> Anon
    Exports --> Registry

Retention Table

| Store | Retention | Deletion Mechanism | Notes |
|---|---|---|---|
| DynamoDB conversation history | 24h authenticated, 2h guest | TTL + explicit delete | Stores redacted text only |
| Session metadata | 24h | TTL | No raw PII by default |
| CloudWatch logs | 30 days | Retention policy | Logs must already be redacted |
| S3 archives | 90 days then lifecycle delete | Lifecycle + explicit delete by tag | Encrypted |
| Analytics warehouse | 90 days identifiable then anonymized | ETL anonymization | Keep aggregate trends only |
| Training exports | Rolling cycle | Export filter + deletion registry | Deleted users excluded from future exports |
| Audit evidence | 1 year | Lifecycle retention | Keeps proof, not raw deleted data |

Evidence Chain for Deletion

The system should retain proof of action, not the deleted data itself. Evidence artifacts usually contain:

  • request ID
  • customer ID hash
  • systems targeted
  • timestamp per step
  • success or retry status
  • operator or workflow identity

That gives auditors a durable trail without preserving the original PII.
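
For concreteness, a representative artifact as it might be serialized into the audit bucket; every field name and value here is illustrative, not a fixed schema:

evidence_artifact = {
    "request_id": "req_abc",
    "customer_id_hash": "sha256:4f2a…",   # hashed, never the raw identifier
    "system": "dynamodb_session_history",
    "action": "delete",
    "records_affected": 42,
    "completed_at": "2025-03-28T14:02:11Z",
    "status": "success",
    "actor": "deletion-workflow",
}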


Monitoring, Alerting, and Testing

Key Metrics

| Metric | Why It Matters | Example Alert |
|---|---|---|
| pii_in_response_rate | Sensitive data still reaching users | >0.5% of responses |
| pii_near_miss_rate | Model or tools are generating data that guardrails must catch | sudden spike over baseline |
| character_name_false_positive_rate | Domain utility regression | >5% |
| guest_pii_persistence_rate | Guest privacy boundary drift | any sustained increase |
| deletion_sla_hours | Compliance risk | >72h or internal target breach |
| pii_stage_p95_ms | Hot-path latency regression | >15ms |
| mid_confidence_review_volume | Ambiguity trend | unusual jump suggests drift |
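
The rates in the table are typically derived from raw counts. A minimal emission sketch using CloudWatch; the namespace and metric names are illustrative:

import boto3

cloudwatch = boto3.client("cloudwatch")

def emit_pii_stage_metrics(findings: list[dict], stage_ms: float, blocked: int) -> None:
    cloudwatch.put_metric_data(
        Namespace="MangaAssist/PII",  # hypothetical namespace
        MetricData=[
            {"MetricName": "pii_findings_count", "Value": len(findings), "Unit": "Count"},
            {"MetricName": "pii_blocked_count", "Value": blocked, "Unit": "Count"},
            {"MetricName": "pii_stage_latency", "Value": stage_ms, "Unit": "Milliseconds"},
        ],
    )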

Alert Design

Use alerts that distinguish true privacy incidents from noisy detector activity:

  • high severity: PII in final response, cross-user leakage, failed deletion workflow
  • medium severity: detector recall drop, guest-policy leakage, audit evidence gaps
  • low severity: rising false positives, latency regression, review backlog growth

Test Strategy

| Test Layer | What It Covers | Example Cases |
|---|---|---|
| Unit tests | regex, scorer, routing | email, phone, character-name overrides |
| Integration tests | dataflow correctness | address should not appear in logs or history |
| Adversarial tests | obfuscation and prompt-induced PII | john [at] gmail [dot] com, zero-width chars |
| Multi-locale tests | locale-specific coverage | JP postal, DE address, US SSN |
| Regression tests | domain false positives | Gojo Satoru, Attack on Titan, ASINs |
| Canary monitoring | live safety before full rollout | compare near-miss rate and block rate |

Deep-Dive Test Cases Worth Having

  1. Obfuscated PII: alex [at] example [dot] com
  2. Mixed script text: full-width digits, Japanese address fragments
  3. Character name vs real customer name in similar syntax
  4. Same PII passed through authorized and unauthorized intents
  5. FM-generated fake contact details in response prose
  6. Deletion request for a user with data across every storage layer
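
Several of these translate directly into unit tests. Minimal pytest-style sketches against the scanners defined earlier; the allowlist assumption is noted inline:

def test_obfuscated_email_is_detected():
    findings = scan_regex("reach me at alex [at] example [dot] com")
    assert any(f["type"] == "obfuscated_email" for f in findings)

def test_zero_width_characters_cannot_hide_an_email():
    # The zero-width space is stripped during normalization, so the
    # plain email pattern fires on the normalized text.
    normalized, _ = normalize_for_detection("alex\u200b@example.com")
    findings = scan_regex(normalized)
    assert any(f["type"] == "email" for f in findings)

def test_character_allowlist_overrides_person_name_score():
    finding = {"type": "person_name", "value": "Gojo Satoru",
               "base_confidence": 0.86, "sources": {"ner"}}
    text = "recommend manga with Gojo Satoru"
    # Assumes the catalog allowlist contains the character name,
    # forcing the final score down to the 0.10 override.
    assert calculate_confidence(finding, text, "en-US") <= 0.10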

Failure Modes and Tradeoffs

| Decision | What We Chose | Alternative | Upside | Downside |
|---|---|---|---|---|
| Detection timing | Scan before persistence | Scrub after storage | Prevents initial leak | Synchronous hot-path work |
| Confidence routing | Multi-tier thresholds | Single redact threshold | Better precision/recall balance | More policy complexity |
| Name handling | Allowlist + context + NER tuning | NER only | Fast false-positive reduction | Allowlist maintenance |
| Guest policy | Lower threshold + short TTL | Same as authenticated | Better privacy for unattributable data | Some guest flows become more limited |
| Address handling | Ephemeral pass-through only | Store encrypted for reuse | Strong minimization | User may need to re-enter later |
| Deletion workflow | Central orchestrator + registry | Manual deletions | Audit-ready and repeatable | Engineering investment |

Residual Risks

Even after all of the above, some risks remain:

  • semantic PII that looks harmless to detectors
  • future locale formats not yet covered
  • prompt changes that induce the FM to generate contact-like strings
  • derived datasets accidentally created outside the lineage map

The mitigation pattern is not "build one better detector." It is:

  • layered controls
  • observability
  • explicit ownership of every persistence boundary
  • continuous negative testing

Follow-Up Questions and Deep-Dive Answers

These are the questions a strong interviewer, reviewer, or principal engineer will ask after the base privacy design looks reasonable.

Q1. Why is authorization kept separate from the confidence score instead of blending them into one number?

Answer: Because they represent different facts. Confidence is a classification estimate: "How likely is this span to be sensitive?" Authorization is a policy decision: "Is this component allowed to see that sensitive field for this use case?" If you blend them, you create ambiguous behavior. A shipping address used for a delivery estimate might be highly sensitive with confidence 0.94, but still authorized for one API call. Lowering the score just because it is authorized would hide the fact that it is sensitive, break audit quality, and make downstream analysis harder. The clean design is to keep the high score, route storage through redaction, and allow only one ephemeral approved use.

Q2. Why not replace regex, custom rules, and allowlists with one stronger LLM or one larger NER model?

Answer: Because the job is not only recognition accuracy. It is low-latency, deterministic enforcement. Regex and rules are cheap, explainable, and reliable for structured patterns. The NER model handles names and addresses, but it is probabilistic and domain-sensitive. Allowlists handle known fictional entities that would otherwise create repeated false positives. A single-model design looks elegant but performs poorly on three dimensions that matter in production: latency, debuggability, and exact control over business-specific identifiers. Layering is not technical debt here. It is the operationally correct architecture.

Q3. How do you prevent overlap bugs such as double redaction or conflicting actions for the same text span?

Answer: The pipeline must canonicalize findings before routing. That means sorting findings by span, merging overlapping ranges, carrying forward the strongest conservative type, and recording all supporting sources for agreement bonuses and later audit. Routing decisions happen only on canonical findings, never on raw detector outputs. Without this step, one scanner can say mask, another can say redact, and the rendering layer ends up corrupting the message or applying inconsistent policy. The overlap merger is a low-complexity component, but it is essential.

Q4. How do you scale NER without turning privacy into the latency bottleneck?

Answer: Keep as much as possible in-process and deterministic, and reserve the model for what only the model can do. Regex and custom rules should handle all structured cases. NER should run on a warm real-time endpoint with strict entity filtering and a narrow label set. In addition, monitor the distribution of text lengths and consider early exits for short strings that only contain structured patterns already handled by regex. The latency budget should be explicit and enforced, because privacy that only works at low traffic is not production privacy.

Q5. How do you know a privacy regression is real and not just a change in user behavior?

Answer: You need both rate metrics and trace-level evidence. A metric such as pii_near_miss_rate tells you that the model or downstream tools are generating more sensitive content. But that by itself does not prove leakage. Pair it with trace sampling: inspect what was detected, where it was blocked, and whether the final delivered response still contained sensitive strings. For false positives, inspect mid-confidence review volume and domain-specific error slices such as character-name queries. The combination of aggregate metrics and trace slices is what distinguishes a real regression from traffic mix noise.

Q6. How would you test obfuscated PII and multi-locale inputs without breaking legitimate Japanese text?

Answer: Normalize for detectors, not for the final user-visible text. That means the detector path can collapse zero-width characters, standardize separators, and map homoglyphs where appropriate, while the original text and an offset map are preserved for precise redaction. The test suite should include multilingual examples, full-width digits, Japanese addresses, German street formats, obfuscated emails, and benign manga terms that should survive intact. The goal is not to normalize everything globally. The goal is to normalize enough for classification while preserving user fidelity and exact span control.

Q7. How do you delete data from analytics or training systems that are not simple key-value stores?

Answer: You need a documented policy per system. For analytics, row-level deletion is ideal when practical; otherwise anonymization can be acceptable if it truly breaks attribution and is documented in policy. For training exports, the deletion registry is critical. Once a user is deleted, future export jobs must exclude that identifier. If the training artifact has already been produced, you need a purge or regeneration rule. The key is to decide this upfront and encode it into the lineage map. If a system cannot answer "How do we delete or de-identify this user's data?" it is not production-ready.

Q8. Why keep a deletion registry for one year if the user asked to be deleted?

Answer: Because the registry is not the user's data in the product sense; it is compliance control metadata. Its purpose is to prevent reintroduction and to prove that a request was processed. The registry should contain the minimum necessary form, typically a hashed customer identifier, request metadata, timestamps, and outcome status. It should not contain the deleted content itself. This is an example of the difference between retaining business data and retaining control-plane evidence.

Q9. Why not forbid all guest messages that contain PII instead of using a lower threshold and shorter TTL?

Answer: Because privacy and usability both matter. Guests still ask legitimate shopping questions, and many paste unnecessary personal details accidentally. A hard-block-only design causes friction without improving security proportionally. The better design is to redact aggressively, keep retention minimal, restrict downstream account actions, and steer the user toward sign-in when account-specific help is needed. That preserves privacy while still allowing useful guest interactions like catalog discovery or general shipping policy questions.

Q10. What prevents the character allowlist from becoming a bypass where attackers choose names that look like allowed content?

Answer: The allowlist should be scoped narrowly and used only as one signal, not as unconditional trust. It should be derived from the catalog, refreshed automatically, and applied primarily to person_name findings in recommendation-style contexts. If the same string appears in account or shipping context, the context boost should outweigh the content hint and the finding should stay sensitive. Also monitor collisions: if a real customer name overlaps a popular character name, the audit and review path should surface that pattern. The point is to reduce domain false positives, not to create a universal privacy exemption.

Q11. If MangaAssist adds voice input with streaming transcripts, what changes in the privacy architecture?

Answer: The main change is granularity. Detection can no longer wait for the full message; it has to operate on partial transcript windows while still supporting correction as ASR hypotheses stabilize. That means temporary findings may need to be revised, and the UI should avoid exposing raw partial transcripts to logs. The architecture remains the same conceptually: detection before persistence, ephemeral approved use, redacted storage, response-side filtering. But the implementation gets more complex because timing and transcript revision become part of the privacy surface.

Q12. What is the hardest residual failure mode even after implementing all of this?

Answer: The hardest residual risk is semantically sensitive data that does not look like classic PII and only becomes risky when combined across turns or systems. For example, a sequence of benign-seeming utterances can reveal enough to identify a person or infer account ownership. That is why the long-term control is not only span detection. It is also session-level policy, bounded context windows, strong access control to downstream tools, and careful decisions about what history is retained at all.


Key Lessons

  1. Privacy controls must run before persistence, not after.
  2. Confidence-based routing is useful only if it is separated from authorization policy.
  3. Domain false positives are a product risk, not just an ML annoyance.
  4. Output-side privacy filtering is mandatory because the FM can still generate sensitive content.
  5. Deletion is an architecture capability, not a support-team procedure.
  6. Guest sessions deserve stricter defaults because their data is harder to govern later.
  7. The strongest privacy systems are layered, observable, and explicit about every persistence boundary.

Cross-References