
# 12. Security and Privacy

## Security Architecture

```mermaid
graph TD
    subgraph "Perimeter"
        A[WAF] --> B[TLS 1.3]
        B --> C[API Gateway]
    end

    subgraph "Application"
        C --> D[IAM Roles<br>Least privilege]
        D --> E[Input Sanitization<br>Prompt injection defense]
        E --> F[Output Filtering<br>PII redaction]
    end

    subgraph "Data"
        F --> G[Encryption at Rest<br>KMS managed keys]
        G --> H[VPC Isolation<br>Private subnets]
        H --> I[Audit Logging<br>CloudTrail]
    end
```

## User Data Protection

### Data Classification

| Data Type | Classification | Handling |
|---|---|---|
| User message text | Confidential | Encrypted in transit and at rest; PII-scrubbed before analytics |
| Customer ID | Confidential | Never logged in plaintext; hashed in analytics |
| Order details | Confidential | Accessed only when the user requests it; never cached long-term |
| Browsing history | Internal | Session-scoped; not stored beyond the chat session |
| Product catalog data | Public | Freely used in responses |
| Chat analytics (aggregated) | Internal | PII removed; used for metrics |
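The classification table can be enforced in code as a gate in front of logging and analytics sinks. The sketch below is illustrative: the field names, enum, and `may_log_plaintext` helper are assumptions, not part of the service.

```python
from enum import Enum

class Classification(Enum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3

# Field names are illustrative; levels mirror the classification table.
FIELD_CLASSIFICATION = {
    "user_message": Classification.CONFIDENTIAL,
    "customer_id": Classification.CONFIDENTIAL,
    "order_details": Classification.CONFIDENTIAL,
    "browsing_history": Classification.INTERNAL,
    "product_catalog": Classification.PUBLIC,
    "chat_analytics": Classification.INTERNAL,
}

def may_log_plaintext(field: str) -> bool:
    """Only Public data may appear in plaintext logs; everything else
    must be hashed, scrubbed, or dropped before it reaches a sink."""
    return FIELD_CLASSIFICATION[field] is Classification.PUBLIC
```

Centralizing the lookup means a new data field fails closed: an unknown field raises a `KeyError` instead of silently being treated as loggable.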

### Data Retention

| Data | Retention | Reason |
|---|---|---|
| Conversation memory (DynamoDB) | 24 hours (TTL) | Multi-turn context only needed during session |
| Analytics events (Redshift) | 90 days | Metrics and improvement |
| Feedback data | 1 year | Model fine-tuning |
| Raw user messages | Not stored beyond session | Privacy by design |
| LLM prompts and responses | 30 days, encrypted | Debugging and quality review |
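The 24-hour DynamoDB retention is enforced with a TTL attribute: DynamoDB deletes an item automatically once its epoch-seconds TTL attribute is in the past. A minimal sketch, assuming the attribute names (`session_id`, `context`, `expires_at`) are illustrative:

```python
import time
from typing import Optional

SESSION_TTL_SECONDS = 24 * 60 * 60  # 24-hour retention per the table above

def session_item(session_id: str, context: dict,
                 now: Optional[float] = None) -> dict:
    """Build a conversation-memory item whose `expires_at` attribute is the
    table's configured TTL attribute (an epoch-seconds number), so DynamoDB
    removes the item roughly 24 hours after it is written."""
    now = time.time() if now is None else now
    return {
        "session_id": session_id,
        "context": context,
        "expires_at": int(now) + SESSION_TTL_SECONDS,
    }
```

Note that TTL deletion is best-effort (items can linger briefly past expiry), so readers should still filter out expired items on access.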

## Access Control

Every service component has its own IAM role with minimum required permissions.

**Orchestrator Role**
- `dynamodb:GetItem` and `dynamodb:PutItem` on the chatbot sessions table
- `bedrock:InvokeModel` on specific model ARNs
- `sagemaker:InvokeEndpoint` on the intent classifier
- `opensearch:ESHttpGet` on the knowledge base index
- No access to payment systems
- No access to account settings
- No write access to order or catalog systems
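The orchestrator's permissions can be sketched as an IAM policy document. Resource ARNs, Sids, and names below are placeholders, not the real resources; one detail worth noting is that IAM expresses OpenSearch domain HTTP actions under the `es:` prefix.

```python
import json

# Illustrative least-privilege policy for the orchestrator role.
ORCHESTRATOR_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "SessionTable",
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:PutItem"],
            "Resource": "arn:aws:dynamodb:*:*:table/chatbot-sessions",
        },
        {
            "Sid": "ModelInvocation",
            "Effect": "Allow",
            "Action": ["bedrock:InvokeModel"],
            "Resource": ["arn:aws:bedrock:*::foundation-model/example-model"],
        },
        {
            "Sid": "IntentClassifier",
            "Effect": "Allow",
            "Action": ["sagemaker:InvokeEndpoint"],
            "Resource": "arn:aws:sagemaker:*:*:endpoint/intent-classifier",
        },
        {
            "Sid": "KnowledgeBase",
            # IAM uses the es: prefix for OpenSearch domain HTTP actions.
            "Effect": "Allow",
            "Action": ["es:ESHttpGet"],
            "Resource": "arn:aws:es:*:*:domain/knowledge-base/*",
        },
    ],
}
policy_json = json.dumps(ORCHESTRATOR_POLICY, indent=2)
```

No statement mentions payment, account-settings, or order/catalog write actions, so IAM's default-deny covers the "no access" bullets without any explicit Deny.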

**Frontend**
- Can invoke API Gateway endpoints only
- No direct access to backend services

### Authentication Flow

```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant Gateway
    participant Auth

    User->>Frontend: Opens chat
    Frontend->>Gateway: Request with Amazon session cookie
    Gateway->>Auth: Validate session token
    Auth-->>Gateway: {customer_id: C123, is_prime: true}
    Gateway->>Gateway: Attach customer_id to request context
    Gateway-->>Frontend: WebSocket established
```

Guest users receive a temporary session ID. They can use product discovery, FAQ, and generic recommendations. Personalized recommendations and order tracking require authentication.
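The guest/authenticated split amounts to two capability sets keyed off whether the gateway attached a `customer_id`. A minimal sketch; the capability names and `create_session` helper are assumptions for illustration:

```python
import uuid
from typing import Optional

# Capability names are illustrative; the split mirrors the policy above.
GUEST_CAPABILITIES = {"product_discovery", "faq", "generic_recommendations"}
AUTHENTICATED_ONLY = {"personalized_recommendations", "order_tracking"}

def create_session(customer_id: Optional[str] = None) -> dict:
    """Guests get a temporary session ID and the reduced capability set;
    authenticated users additionally get personalization and order tracking."""
    if customer_id is None:
        return {
            "session_id": f"guest-{uuid.uuid4().hex}",
            "capabilities": set(GUEST_CAPABILITIES),
        }
    return {
        "session_id": customer_id,
        "capabilities": GUEST_CAPABILITIES | AUTHENTICATED_ONLY,
    }
```

Keeping the check on the server side (rather than hiding UI elements in the frontend) ensures a guest cannot reach order data by crafting requests directly.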

## PII Handling

### Input Sanitization

Before any user message reaches the LLM or analytics pipeline:

```mermaid
graph LR
    A[User Message] --> B[PII Detector]
    B --> C{PII found?}
    C -->|Email| D["Replace with [EMAIL]"]
    C -->|Phone| E["Replace with [PHONE]"]
    C -->|SSN| F["Replace with [REDACTED]"]
    C -->|Credit Card| G["Replace with [REDACTED]"]
    C -->|Address| H["Replace with [ADDRESS]"]
    C -->|No PII| I[Pass through]
    D --> J[Sanitized Message]
    E --> J
    F --> J
    G --> J
    H --> J
    I --> J
```

This prevents personal data from reaching model logs or analytics.
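A first-pass scrubber for the pattern-detectable categories can be regex-based. The sketch below is deliberately simple: the patterns are examples, addresses usually need an NER-based detector (e.g. Amazon Comprehend's PII detection) rather than regexes, and a production detector would handle many more formats.

```python
import re

# Ordered list of (pattern, replacement token); patterns are simplified
# examples, not production-grade detectors.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED]"),        # SSN
    (re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "[REDACTED]"),       # card number
]

def sanitize(message: str) -> str:
    """Replace detectable PII with placeholder tokens before the message
    reaches the LLM prompt, logs, or the analytics pipeline."""
    for pattern, token in PII_PATTERNS:
        message = pattern.sub(token, message)
    return message
```

Pattern order matters: the SSN pattern runs before the looser phone pattern so a 3-2-4 digit group is redacted rather than misclassified.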

### Output Filtering

If the LLM generates PII from conversation context, the guardrails pipeline redacts it before the response is shown to the user.

### Secure Logging

| What's Logged | Where | PII Handling |
|---|---|---|
| Request metadata | CloudWatch Logs | No PII present |
| User messages for quality review | Encrypted S3 bucket | PII-scrubbed; access requires security approval |
| LLM prompts and responses | Encrypted S3; 30-day retention | PII-scrubbed from the user message portion |
| Analytics events | Kinesis -> Redshift | Customer ID hashed; messages PII-scrubbed |
| Error logs | CloudWatch Logs | Stack traces only |

Access to logs containing user messages requires:

1. Security team approval.
2. Time-limited access.
3. An audit trail in CloudTrail.
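For the "Customer ID hashed" row above, a keyed hash (HMAC) is preferable to a bare hash: customer IDs come from a small, enumerable space, so an unkeyed SHA-256 could be reversed by brute force. A sketch, assuming the key is fetched from a secrets store rather than hard-coded:

```python
import hashlib
import hmac

def hash_customer_id(customer_id: str, key: bytes) -> str:
    """Return a stable pseudonymous ID for analytics. The same customer
    always maps to the same digest (so joins still work), but the raw ID
    cannot be recovered without the key."""
    return hmac.new(key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()
```

Rotating the key would break joins across the rotation boundary, which is sometimes a feature: it caps how long pseudonymous analytics records remain linkable.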

## Compliance Considerations

| Regulation | How We Comply |
|---|---|
| GDPR | Data minimization, right to deletion, no cross-border transfer without safeguards |
| CCPA | Users can request data deletion and opt out of data collection |
| COPPA | No age data collection; no personalization for unauthenticated users who might be minors |
| PCI-DSS | The chatbot never handles payment data |
| Amazon internal policies | Data classification, access controls, encryption standards |

## Model Safety and Content Moderation

### Prompt Injection Defense

```mermaid
graph TD
    A[User Message] --> B[Input Validator]
    B --> C{Contains injection patterns?}
    C -->|Ignore previous instructions| D[Block and Log]
    C -->|You are now...| D
    C -->|Encoded or obfuscated attempts| D
    C -->|Clean| E[Process normally]
    D --> F[Return generic response]
```
Defenses:

1. The system prompt is clearly separated from user input.
2. Input is scanned for known injection patterns.
3. The LLM is instructed never to follow instructions embedded in user messages that contradict the system prompt.
4. Output guardrails catch responses that deviate from expected behavior.
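The pattern-scanning defense can be sketched as a pre-filter in front of the LLM. The patterns below are examples only; a pattern list is inherently incomplete, which is why an LLM-based classifier and output guardrails back it up for paraphrased or obfuscated attacks.

```python
import re

# Known injection phrasings; case-insensitive. This list is illustrative
# and would be maintained alongside observed attack traffic.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(?:all\s+)?(?:previous|prior)\s+instructions", re.I),
    re.compile(r"\byou\s+are\s+now\b", re.I),
    re.compile(r"\b(?:reveal|print|repeat)\b.{0,40}\bsystem\s+prompt\b", re.I),
]

def looks_like_injection(message: str) -> bool:
    """First-pass check: True if the message matches a known injection
    pattern, in which case the request is blocked, logged, and answered
    with a generic response."""
    return any(p.search(message) for p in INJECTION_PATTERNS)
```

Matching messages are blocked before any model call, so even a successful prompt-extraction phrasing never reaches the LLM.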

### Content Moderation

| Risk | Mitigation |
|---|---|
| Offensive user content | Toxicity filter on input; respond with a neutral redirect |
| Inappropriate LLM output | Bedrock Guardrails with toxicity, sexual content, and violence filters |
| Off-topic discussion | Scope guardrail keeps the chatbot focused on manga shopping |
| System prompt extraction | Prompt leakage detection in the guardrail pipeline |

### Abuse Prevention

| Abuse Type | Detection | Response |
|---|---|---|
| Spam or flooding | Rate limiter | 429 response and temporary cooldown |
| Prompt injection | Pattern matching + LLM-based detection | Block, log, return safe response |
| Data scraping | Session limits and CAPTCHA after suspicious patterns | Block session |
| Social engineering | Prompt hardening and policy constraints | Guardrail + safe redirect |
| Fake escalations | Limit escalations per session | Offer self-service alternatives |
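The rate limiter in the first row can be a per-session token bucket: each message spends a token, tokens refill at a steady rate, and an empty bucket maps to a 429 plus cooldown. A minimal sketch; the capacity and refill values are illustrative, not the service's real limits:

```python
import time
from typing import Optional

class TokenBucket:
    """Per-session token bucket. allow() returning False corresponds to
    sending HTTP 429 and applying a temporary cooldown."""

    def __init__(self, capacity: float = 10.0, refill_per_sec: float = 1.0,
                 now: Optional[float] = None):
        self.capacity = capacity
        self.refill = refill_per_sec
        self.tokens = capacity
        self.last = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A bucket per session ID (guest or authenticated) lets bursts through up to the capacity while holding the sustained rate to `refill_per_sec` messages per second.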