
Scenarios and Runbooks — Flexible Model Interaction

MangaAssist context: JP Manga store chatbot on AWS — Bedrock Claude 3 (Sonnet at $3/$15 per 1M tokens input/output, Haiku at $0.25/$1.25), OpenSearch Serverless (vector store), DynamoDB (sessions/products), ECS Fargate (orchestrator), API Gateway WebSocket, ElastiCache Redis. Target: useful answer in under 3 seconds, 1M messages/day scale.


Skill Mapping

| Dimension | Value |
|---|---|
| Certification | AWS Certified AI Practitioner — Specialty (AIP-C01) |
| Task | 2.4 — Select and implement FM API integration patterns |
| Skill | 2.4.1 — Flexible Model Interaction |
| This File | 03 — Scenarios and Runbooks (sync timeout, SQS backlog, JP text validation, Lambda throttling, async callback failure) |

Skill Scope

This file provides five production incident scenarios that exercise the synchronous/asynchronous Bedrock integration patterns, API Gateway validation, and SQS processing pipeline. Each scenario includes the problem statement, a detection flow diagram, root cause analysis, resolution code, and preventive measures.


Scenario 1: Synchronous Timeout — 504 Gateway Timeout

Problem

During evening peak hours (19:00-22:00 JST) when manga readers are most active, approximately 15% of chat requests return HTTP 504 Gateway Timeout. Users see a blank response after waiting the full 29 seconds. The issue correlates with Sonnet model invocations for complex queries (multi-turn conversations with 8+ turns of context).

Detection

flowchart TD
    A[CloudWatch Alarm:<br/>5XX rate > 5%] --> B[Check API Gateway<br/>Latency metrics]
    B --> C{IntegrationLatency<br/>> 29000ms?}
    C -->|Yes| D[Backend timeout<br/>confirmed]
    C -->|No| E[Check ECS task<br/>health]

    D --> F[Check Bedrock<br/>InvocationLatency metric]
    F --> G{Bedrock latency<br/>> 25000ms?}
    G -->|Yes| H[Model inference<br/>is the bottleneck]
    G -->|No| I[Check DynamoDB/<br/>Redis latency]

    H --> J[Check input<br/>token count]
    J --> K{Input tokens<br/>> 8000?}
    K -->|Yes| L[Root Cause:<br/>Oversized context window]
    K -->|No| M[Root Cause:<br/>Bedrock service latency]

    style L fill:#ff6b6b
    style M fill:#ffa07a

Root Cause

Long conversation histories accumulate in DynamoDB sessions. When the orchestrator loads all turns and includes them in the prompt, the input token count exceeds 8,000 tokens. Claude 3 Sonnet's inference time scales with input length — at 8K+ input tokens with a 4K max output, the model needs 25-30 seconds, exceeding the API Gateway 29-second timeout.

Resolution

"""
Fix: Context window management with token budgeting.
Truncates conversation history to fit within a token budget
while preserving the most recent and most relevant turns.
"""
import re
import logging
from typing import List, Dict

logger = logging.getLogger(__name__)

# Budget: 25s timeout means ~6000 input tokens max for Sonnet
MAX_INPUT_TOKENS = 6000
SYSTEM_PROMPT_TOKENS = 200  # Approximate system prompt size
SAFETY_MARGIN = 500


def estimate_tokens(text: str) -> int:
    """Estimate token count — Japanese chars ~1.5 tokens each."""
    jp_chars = len(re.findall(r"[\u3000-\u9fff]", text))
    other = len(text) - jp_chars
    return int(jp_chars * 1.5 + other * 0.3)


def truncate_conversation_history(
    history: List[Dict],
    current_message: str,
    max_tokens: int = MAX_INPUT_TOKENS,
) -> List[Dict]:
    """
    Truncate conversation history to fit token budget.

    Strategy:
    1. Always keep the first turn (establishes topic)
    2. Always keep the last 3 turns (recent context)
    3. Summarize middle turns if needed
    4. Drop oldest middle turns first
    """
    budget = max_tokens - SYSTEM_PROMPT_TOKENS - SAFETY_MARGIN
    current_tokens = estimate_tokens(current_message)
    budget -= current_tokens

    if not history:
        return []

    # Always keep first and last 3 turns
    if len(history) <= 4:
        return history

    first_turn = [history[0]]
    last_turns = history[-3:]
    middle_turns = history[1:-3]

    # Calculate token usage of required turns
    required_tokens = sum(estimate_tokens(t["content"]) for t in first_turn + last_turns)

    if required_tokens > budget:
        # Even required turns exceed budget — keep only last 2
        logger.warning(
            "Context severely over budget | required=%d | budget=%d",
            required_tokens, budget,
        )
        return history[-2:]

    remaining_budget = budget - required_tokens
    kept_middle = []

    # Add middle turns from most recent to oldest
    for turn in reversed(middle_turns):
        turn_tokens = estimate_tokens(turn["content"])
        if remaining_budget >= turn_tokens:
            kept_middle.insert(0, turn)
            remaining_budget -= turn_tokens
        else:
            break

    result = first_turn + kept_middle + last_turns
    total_tokens = sum(estimate_tokens(t["content"]) for t in result) + current_tokens

    logger.info(
        "Context truncated | original=%d turns | kept=%d turns | tokens=%d/%d",
        len(history), len(result), total_tokens, max_tokens,
    )
    return result

Prevention

| Measure | Implementation |
|---|---|
| Token budget enforcement | Always compute estimated tokens before calling Bedrock; reject or truncate if over 6000 |
| Conversation summarization | After 5 turns, use Haiku to summarize older context into a single compact turn |
| Adaptive timeout | Set Bedrock read_timeout proportional to estimated input tokens: min(25, 10 + tokens/1000) seconds |
| Session TTL | DynamoDB TTL expires sessions after 30 minutes of inactivity, preventing unbounded growth |
| CloudWatch alarm | Alert when p95 InvocationLatency exceeds 20 seconds — investigate before it hits 29s |
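The adaptive-timeout rule above (min(25, 10 + tokens/1000)) can be sketched as a pure function; the formula comes from this runbook, while the function name is illustrative:

```python
def adaptive_read_timeout(estimated_input_tokens: int) -> float:
    """Bedrock read timeout in seconds: min(25, 10 + tokens/1000)."""
    return min(25.0, 10.0 + estimated_input_tokens / 1000.0)

# A short prompt gets a tight timeout; a near-budget prompt gets the cap:
# adaptive_read_timeout(1000)  -> 11.0
# adaptive_read_timeout(20000) -> 25.0
```

The returned value would be passed to the client via botocore's Config(read_timeout=...) so that slow invocations fail fast instead of consuming the full 29-second API Gateway window.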

Scenario 2: SQS Backlog Growing — Catalog Enrichment Queue Depth Alarm

Problem

The manga-enrichment-queue.fifo queue depth grows steadily from 0 to 50,000+ messages over 6 hours. The Lambda consumer is running but not keeping pace. The admin dashboard shows enrichment jobs submitted at 500/minute but only 50/minute are completing. New manga catalog uploads are stacking up without synopses or tags.

Detection

flowchart TD
    A[CloudWatch Alarm:<br/>ApproximateNumberOfMessages > 10000] --> B[Check SQS Metrics]
    B --> C[NumberOfMessagesSent:<br/>500/min]
    B --> D[NumberOfMessagesReceived:<br/>60/min]
    B --> E[ApproximateAgeOfOldest:<br/>4 hours]

    D --> F[Check Lambda<br/>Concurrency]
    F --> G{ConcurrentExecutions<br/>= ReservedConcurrency?}
    G -->|Yes = 5| H[Lambda at<br/>concurrency cap]
    G -->|No| I[Check Lambda<br/>Duration]

    H --> J[Check Lambda<br/>batch processing time]
    J --> K{Average Duration<br/>> 60s per batch?}
    K -->|Yes| L[Root Cause:<br/>Bedrock latency per item<br/>× batch size = timeout]

    style L fill:#ff6b6b

Root Cause

The Lambda consumer has ReservedConcurrency=5 and BatchSize=10. Each enrichment job calls Bedrock Haiku sequentially, taking ~2 seconds per call. With 10 items per batch processed sequentially: 10 items x 2s = 20s per batch. At 5 concurrent Lambdas, the theoretical ceiling is 5 x 10 / 20s = 2.5 messages/second = 150/minute — and observed throughput is lower still (50-60/minute) once DynamoDB writes and polling overhead are included. The producer submits 500/minute, so the consumer runs at least a 3.3x deficit.
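The deficit arithmetic can be captured in a small helper (the function name is hypothetical), handy for checking any proposed fix before deploying it:

```python
def sqs_drain_rate_per_min(concurrency: int, batch_size: int, batch_duration_s: float) -> float:
    """Messages per minute a Lambda SQS consumer can drain:
    concurrent invocations x messages per batch / seconds per batch, scaled to a minute."""
    return concurrency * batch_size / batch_duration_s * 60

# Sequential batches: sqs_drain_rate_per_min(5, 10, 20) -> 150.0 msg/min (deficit vs 500/min)
# Parallel batches:   sqs_drain_rate_per_min(10, 10, 3) -> ~2000 msg/min (headroom)
```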

Resolution

"""
Fix: Parallel Bedrock invocation within Lambda batch + concurrency increase.
Uses asyncio to process batch items concurrently instead of sequentially.
"""
import json
import time
import asyncio
import logging
from typing import Dict, Any, List
from concurrent.futures import ThreadPoolExecutor

import boto3

logger = logging.getLogger(__name__)

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
dynamodb = boto3.resource("dynamodb").Table("manga-catalog")

# Thread pool for concurrent Bedrock calls within a single Lambda invocation
_executor = ThreadPoolExecutor(max_workers=10)


def _invoke_bedrock_sync(manga: Dict, enrichment_type: str) -> Dict:
    """Single synchronous Bedrock invocation (runs in thread pool)."""
    prompt = f"Generate a {enrichment_type} for manga: {manga.get('title', '')} ({manga.get('titleJp', '')})"

    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        contentType="application/json",
        accept="application/json",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "temperature": 0.5,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    result = json.loads(response["body"].read())
    return {
        "manga_id": manga["id"],
        "enrichment_type": enrichment_type,
        "data": result["content"][0]["text"],
    }


async def process_batch_parallel(records: List[Dict]) -> List[str]:
    """
    Process all batch records in parallel using thread pool.

    Before fix:  10 items x 2s sequential = 20s
    After fix:   10 items x 2s parallel  =  ~3s (with 10 threads)
    """
    loop = asyncio.get_running_loop()  # get_event_loop() is deprecated inside coroutines
    failed_ids = []
    tasks = []

    for record in records:
        body = json.loads(record["body"])
        manga = body["manga"]
        enrichment_type = body.get("enrichmentType", "synopsis")

        task = loop.run_in_executor(
            _executor,
            _invoke_bedrock_sync,
            manga,
            enrichment_type,
        )
        tasks.append((record["messageId"], manga["id"], task))

    for message_id, manga_id, task in tasks:
        try:
            result = await task
            # Store in DynamoDB
            dynamodb.update_item(
                Key={"mangaId": result["manga_id"]},
                UpdateExpression="SET enrichment.#t = :d",
                ExpressionAttributeNames={"#t": result["enrichment_type"]},
                ExpressionAttributeValues={":d": result["data"]},
            )
        except Exception as exc:
            logger.error("Enrichment failed | manga=%s | error=%s", manga_id, str(exc))
            failed_ids.append(message_id)

    return failed_ids


def lambda_handler(event: Dict[str, Any], context) -> Dict:
    """
    Optimized Lambda handler with parallel processing.

    Throughput improvement:
    - Before: 5 concurrency x 10 batch / 20s = 150 msg/min
    - After:  10 concurrency x 10 batch / 3s  = 2000 msg/min
    """
    records = event["Records"]
    logger.info("Processing batch | size=%d", len(records))

    # Run parallel processing; asyncio.run() creates and closes the loop for us
    failed_ids = asyncio.run(process_batch_parallel(records))

    return {
        "batchItemFailures": [{"itemIdentifier": mid} for mid in failed_ids]
    }

Prevention

| Measure | Implementation |
|---|---|
| Right-size concurrency | Set Lambda reserved concurrency based on: production_rate / (batch_size / avg_batch_duration) |
| Parallel within batch | Always process Bedrock calls concurrently within a batch, never sequentially |
| Auto-scaling alarm | CloudWatch alarm when queue age > 15 minutes triggers Lambda concurrency increase via Application Auto Scaling |
| Backpressure on producer | Rate-limit the admin API to prevent submitting faster than the consumer can process |
| Queue depth dashboard | Real-time dashboard showing messages in flight, age of oldest, and consumer throughput |
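The right-size concurrency formula above can be made concrete with a small sizing helper; the function name and the 2x headroom default are illustrative choices, not prescribed values:

```python
import math


def required_concurrency(
    production_per_min: float,
    batch_size: int,
    batch_duration_s: float,
    headroom: float = 2.0,
) -> int:
    """Reserved concurrency needed for the consumer to keep pace with the producer,
    with a headroom multiplier for bursts."""
    drained_per_lambda_per_min = batch_size / batch_duration_s * 60
    return math.ceil(production_per_min / drained_per_lambda_per_min * headroom)

# 500 msg/min producer, 10-item batches taking 4s each:
# required_concurrency(500, 10, 4)               -> 7 (with 2x headroom)
# required_concurrency(500, 10, 4, headroom=1.0) -> 4 (bare minimum)
```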

Scenario 3: Japanese Text Validation Rejection — Valid Messages Blocked

Problem

Japanese-speaking users report that some messages are rejected with a 400 Bad Request error from API Gateway. The rejected messages contain valid Japanese text — specifically, messages with emoji (commonly used in Japanese digital communication) and special kanji characters. Approximately 8% of Japanese messages are being incorrectly rejected.

Detection

flowchart TD
    A[User reports:<br/>メッセージが送れません<br/>cannot send messages] --> B[Check API GW<br/>4XX metrics]
    B --> C{400 rate > normal<br/>baseline?}
    C -->|Yes, 8% vs 1%| D[Pull API GW<br/>execution logs]

    D --> E[Examine rejected<br/>request bodies]
    E --> F{Pattern in<br/>rejected messages?}
    F -->|Emoji present| G[Check JSON Schema<br/>maxLength validation]
    F -->|Special kanji| H[Check encoding<br/>handling]

    G --> I[maxLength counts<br/>UTF-16 surrogate pairs<br/>as 2 chars]
    H --> J[Some validators<br/>reject chars > BMP]

    I --> K[Root Cause:<br/>API GW JSON Schema<br/>counts string length<br/>in UTF-16 code units,<br/>not characters]

    style K fill:#ff6b6b

Root Cause

API Gateway's JSON Schema validation applies the maxLength keyword to string length measured in UTF-16 code units (a consequence of its Java-based implementation; the JSON Schema spec itself defines string length in Unicode code points). Emoji like 🎉 and supplementary kanji like 𠮷 fall outside the Basic Multilingual Plane and are encoded as UTF-16 surrogate pairs (2 code units each). A message with 3,950 visible characters that includes 100 emoji therefore counts as 4,050 code units, tripping the maxLength: 4000 limit even though the user sees fewer than 4,000 characters.
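The mismatch is easy to reproduce in a few lines. The helper below (name illustrative) counts code units by encoding to UTF-16, which is how a Java-style validator measures length, while Python's len() counts code points:

```python
def utf16_code_units(text: str) -> int:
    """String length as a UTF-16-based validator (e.g. Java's String.length()) sees it."""
    return len(text.encode("utf-16-le")) // 2

# BMP characters count as 1 unit; emoji and supplementary kanji count as 2:
# utf16_code_units("あ") -> 1
# utf16_code_units("🎉") -> 2
# utf16_code_units("𠮷") -> 2
# len("🎉")              -> 1 (Python counts code points, not code units)
```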

Resolution

"""
Fix: Two-layer validation — relaxed API GW schema + accurate ECS validation.

1. Increase API GW maxLength to 8000 (2x buffer for surrogate pairs)
2. Add byte-level validation in ECS that correctly counts characters
"""
import json
import logging
import unicodedata

logger = logging.getLogger(__name__)

# Updated API Gateway JSON Schema (deployed via CDK/CloudFormation)
UPDATED_API_GW_SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "title": "MangaAssistChatRequest",
    "type": "object",
    "required": ["action", "data"],
    "properties": {
        "action": {
            "type": "string",
            "enum": ["sendMessage", "getHistory", "clearSession"],
        },
        "data": {
            "type": "object",
            "required": ["message"],
            "properties": {
                "message": {
                    "type": "string",
                    "minLength": 1,
                    # Increased from 4000 to 8000 to account for
                    # UTF-16 surrogate pairs in emoji/supplementary kanji
                    "maxLength": 8000,
                },
            },
        },
    },
}


def validate_message_length_accurate(message: str, max_chars: int = 4000) -> tuple:
    """
    Accurately validate message length counting Unicode grapheme clusters.
    This handles emoji, combining characters, and supplementary kanji correctly.

    A single visible character like 👨‍👩‍👧‍👦 (family emoji) is:
    - 1 grapheme cluster (what users see)
    - 7 Unicode code points
    - 11 UTF-16 code units
    - 25 UTF-8 bytes

    We count grapheme clusters — what the user perceives as characters.
    """
    # Count grapheme clusters (visual characters)
    grapheme_count = count_grapheme_clusters(message)

    if grapheme_count > max_chars:
        return False, f"Message too long: {grapheme_count} characters (max {max_chars})"

    # Also enforce byte limit for backend safety
    byte_len = len(message.encode("utf-8"))
    max_bytes = max_chars * 4  # Worst case: 4 bytes per grapheme
    if byte_len > max_bytes:
        return False, f"Message too large: {byte_len} bytes (max {max_bytes})"

    return True, None


def count_grapheme_clusters(text: str) -> int:
    """
    Count user-perceived characters (grapheme clusters).
    Simple approximation that handles most JP text correctly.
    """
    count = 0
    i = 0
    while i < len(text):
        count += 1
        i += 1
        # Skip combining marks and variation selectors
        while i < len(text) and unicodedata.category(text[i]).startswith(("M", "Sk")):
            i += 1
        # Skip zero-width joiners (for compound emoji)
        while i < len(text) and text[i] == "\u200d":
            i += 1
            if i < len(text):
                i += 1  # Skip the joined character
                while i < len(text) and unicodedata.category(text[i]).startswith("M"):
                    i += 1
    return count


# Test cases for JP text with emoji
def test_validation():
    """Verify validation handles Japanese text correctly."""
    test_cases = [
        ("こんにちは", True),                           # Basic hiragana
        ("新刊マンガ🎉おすすめ!", True),              # With emoji
        ("𠮷田さんのマンガ", True),                     # Supplementary kanji
        ("A" * 4001, False),                            # Over limit
        ("あ" * 4000, True),                            # Exactly at limit
        ("🎉" * 2001, False),                           # Emoji at limit
    ]
    for text, expected_valid in test_cases:
        is_valid, reason = validate_message_length_accurate(text)
        status = "PASS" if is_valid == expected_valid else "FAIL"
        print(f"{status}: len={len(text)} graphemes={count_grapheme_clusters(text)} valid={is_valid}")

Prevention

| Measure | Implementation |
|---|---|
| API GW schema as coarse filter | Set maxLength to 2x the actual limit to catch only extreme cases |
| Accurate server-side validation | Count grapheme clusters in ECS, not UTF-16 code units |
| Test with real JP data | Include emoji, supplementary kanji, and combining characters in validation test suite |
| Error message localization | Return Japanese error messages for JP users: メッセージが長すぎます (最大4000文字) — "message too long (max 4,000 characters)" |
| Monitoring | Track 400 rate by preferredLanguage dimension to detect language-specific issues |

Scenario 4: Lambda Throttling on Enrichment Consumer

Problem

The catalog enrichment Lambda consumer starts returning TooManyRequestsException errors. The SQS event source mapping stops invoking new Lambda instances. The enrichment queue depth spikes from 1,000 to 80,000 in 30 minutes. The issue starts at 10:00 JST when a scheduled EventBridge rule triggers bulk enrichment for the weekly new releases (2,000 manga).

Detection

flowchart TD
    A[CloudWatch Alarm:<br/>Lambda Throttles > 0] --> B[Check Lambda<br/>ConcurrentExecutions]
    B --> C{At account<br/>concurrency limit?}
    C -->|Yes, 1000/1000| D[Account-level<br/>throttling]
    C -->|No| E{At reserved<br/>concurrency limit?}
    E -->|Yes, 5/5| F[Function-level<br/>throttling]

    D --> G[Check what else<br/>is consuming concurrency]
    G --> H[Other Lambda functions<br/>using 950 concurrent]
    H --> I[Root Cause:<br/>Shared account concurrency<br/>exhausted by unrelated<br/>image processing Lambda]

    F --> J[Root Cause:<br/>Reserved concurrency<br/>too low for burst]

    style I fill:#ff6b6b
    style J fill:#ffa07a

Root Cause

The AWS account has the default 1,000 concurrent Lambda execution limit. An unrelated image processing pipeline in the same account is consuming ~950 concurrent executions, leaving almost nothing in the unreserved pool. The enrichment Lambda's reserved concurrency of 5 guarantees it those 5 slots, but it also caps the function at 5: when the 10:00 bulk job lands, the SQS event source mapping tries to scale the consumer well beyond 5, and every additional invocation is throttled with TooManyRequestsException. The reserved allocation cannot simply be raised, because the unreserved pool is already exhausted by the image processing workload — hence the need for an account limit increase and explicit allocations for every function.
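The account-level arithmetic behind this incident can be sanity-checked with a small helper (names illustrative). Note that Lambda also requires a minimum of 100 concurrency to remain unreserved, so allocations that leave less than that are rejected:

```python
def unreserved_pool(account_limit: int, reserved_allocations: dict) -> int:
    """Concurrency left over for functions that have no reserved concurrency."""
    remaining = account_limit - sum(reserved_allocations.values())
    # AWS keeps a minimum of 100 unreserved; reserving past that is rejected
    if remaining < 100:
        raise ValueError(f"allocations leave only {remaining} unreserved (minimum is 100)")
    return remaining

# MangaAssist allocations from the resolution below:
mangaassist = {"enrichment": 25, "moderation": 10, "websocket": 50, "deferred": 15}
# unreserved_pool(3000, mangaassist) -> 2900
```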

Resolution

"""
Fix: Isolate Lambda concurrency + implement SQS-based backpressure.

1. Request account concurrency limit increase to 3000
2. Set reserved concurrency for ALL Lambda functions
3. Add SQS-based backpressure to prevent overwhelming Bedrock
"""
import json
import logging
from typing import Dict, Any

import boto3

logger = logging.getLogger(__name__)


class LambdaConcurrencyManager:
    """Manage Lambda concurrency settings for MangaAssist functions."""

    def __init__(self, region: str = "us-east-1"):
        self.lambda_client = boto3.client("lambda", region_name=region)

    def configure_concurrency(self):
        """
        Set reserved concurrency for all MangaAssist Lambda functions.
        This isolates our functions from other account workloads.

        Account limit: 3000 (after increase request)
        MangaAssist allocation:
          - enrichment-consumer:     25 (handles burst of 2000 items)
          - moderation-consumer:     10
          - websocket-handler:       50 (real-time chat)
          - deferred-processor:      15
        Total reserved: 100
        Remaining unreserved: 2900 (for other account workloads)
        """
        allocations = {
            "manga-enrichment-consumer": 25,
            "manga-moderation-consumer": 10,
            "manga-websocket-handler": 50,
            "manga-deferred-processor": 15,
        }

        for function_name, concurrency in allocations.items():
            try:
                self.lambda_client.put_function_concurrency(
                    FunctionName=function_name,
                    ReservedConcurrentExecutions=concurrency,
                )
                logger.info(
                    "Set concurrency | function=%s | reserved=%d",
                    function_name, concurrency,
                )
            except Exception as exc:
                logger.error(
                    "Failed to set concurrency | function=%s | error=%s",
                    function_name, str(exc),
                )

    def configure_event_source_mapping(self):
        """
        Update SQS event source mapping with proper scaling config.

        ScalingConfig.MaximumConcurrency limits how many concurrent
        Lambda instances the SQS poller will invoke, preventing
        thundering herd when backlog clears.
        """
        # Note: event source mappings are managed through the Lambda API,
        # even though they poll SQS
        response = self.lambda_client.list_event_source_mappings(
            FunctionName="manga-enrichment-consumer",
        )

        for mapping in response["EventSourceMappings"]:
            self.lambda_client.update_event_source_mapping(
                UUID=mapping["UUID"],
                BatchSize=10,
                MaximumBatchingWindowInSeconds=30,
                FunctionResponseTypes=["ReportBatchItemFailures"],
                ScalingConfig={
                    "MaximumConcurrency": 20,  # Cap concurrent pollers
                },
            )
            logger.info("Updated event source mapping | uuid=%s", mapping["UUID"])

Prevention

| Measure | Implementation |
|---|---|
| Reserved concurrency for ALL functions | Prevents any single function from consuming the entire account limit |
| Account limit increase | Request 3000+ from AWS Support before production launch |
| SQS ScalingConfig.MaximumConcurrency | Cap Lambda scaling from SQS to prevent thundering herd |
| Separate accounts | Run unrelated workloads (image processing) in a different AWS account |
| Concurrency alarm | Alert when any function's concurrent executions exceed 80% of its reserved allocation |

Scenario 5: Async Callback Failure — Enrichment Results Never Delivered

Problem

The admin dashboard shows enrichment jobs as "in progress" indefinitely. The SQS queue processes messages successfully (queue depth stays near zero), but the catalog database never receives the enrichment results. The Lambda consumer logs show successful Bedrock invocations but the DynamoDB writes silently fail. No errors appear in CloudWatch Logs.

Detection

flowchart TD
    A[Admin report:<br/>Enrichment stuck] --> B[Check SQS metrics]
    B --> C[Queue depth: 0<br/>Messages processed: OK]

    C --> D[Check Lambda<br/>CloudWatch Logs]
    D --> E[Bedrock calls:<br/>All successful]

    E --> F[Check DynamoDB<br/>writes]
    F --> G{ConsumedWriteCapacity<br/>normal?}
    G -->|Very low| H[Writes not<br/>reaching DDB]

    H --> I[Check Lambda<br/>IAM role]
    I --> J{Has dynamodb:UpdateItem<br/>on catalog table?}
    J -->|No| K[Root Cause:<br/>IAM policy missing<br/>after table rename]

    K --> L[Table renamed from<br/>manga-products to<br/>manga-catalog but<br/>Lambda role still<br/>references old name]

    style L fill:#ff6b6b

Root Cause

The DynamoDB table was renamed from manga-products to manga-catalog during a recent infrastructure update. The Lambda function's code was updated to reference the new table name, but the IAM execution role's policy still grants dynamodb:UpdateItem only on arn:aws:dynamodb:*:*:table/manga-products. Every write therefore fails with a ClientError carrying the code AccessDeniedException, but the Lambda's broad except Exception clause catches it and quietly reports the message as a batch item failure (an error line is logged, but at DEBUG level, which production log filtering discards).
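One way to make the fix systematic is a small classifier that decides, per DynamoDB error code, whether to retry via SQS or page a human. The code sets and routing labels below are illustrative, not exhaustive:

```python
# Retryable: transient capacity/availability issues — re-raising lets SQS redeliver
RETRYABLE = {"ProvisionedThroughputExceededException", "ThrottlingException", "InternalServerError"}
# Configuration errors: retrying will never succeed — page on-call instead
CONFIG_ERRORS = {"AccessDeniedException", "ResourceNotFoundException"}


def classify_dynamodb_error(error_code: str) -> str:
    """Map a botocore error code to a handling strategy."""
    if error_code in RETRYABLE:
        return "retry"        # re-raise so SQS redelivers the message
    if error_code in CONFIG_ERRORS:
        return "alert"        # IAM/table misconfiguration: log CRITICAL and page
    return "investigate"      # unknown: log at ERROR with exc_info and re-raise
```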

Resolution

"""
Fix: Correct IAM policy + improve error handling to surface DDB failures.
"""
import json
import logging
from typing import Dict, Any

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)

# Fix 1: IAM policy update (via CDK/CloudFormation)
CORRECTED_IAM_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:UpdateItem",
                "dynamodb:PutItem",
                "dynamodb:GetItem",
            ],
            "Resource": [
                # Use the CORRECT table name
                "arn:aws:dynamodb:us-east-1:123456789012:table/manga-catalog",
                "arn:aws:dynamodb:us-east-1:123456789012:table/manga-catalog/index/*",
            ],
        },
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
            ],
            "Resource": [
                "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-*",
            ],
        },
    ],
}


def store_enrichment_result(manga_id: str, enrichment_type: str, data: str) -> bool:
    """
    Store enrichment result with explicit error handling.

    Fix 2: Never swallow DynamoDB errors silently.
    Distinguish between retryable and non-retryable errors.
    """
    import time  # stdlib; imported locally to keep this snippet self-contained

    dynamodb = boto3.resource("dynamodb").Table("manga-catalog")

    try:
        dynamodb.update_item(
            Key={"mangaId": manga_id},
            UpdateExpression="SET enrichment.#t = :d, enrichment.lastUpdated = :ts",
            ExpressionAttributeNames={"#t": enrichment_type},
            ExpressionAttributeValues={
                ":d": data,
                ":ts": int(time.time()),
            },
        )
        logger.info("Enrichment stored | manga=%s | type=%s", manga_id, enrichment_type)
        return True

    except ClientError as exc:
        error_code = exc.response["Error"]["Code"]

        if error_code == "AccessDeniedException":
            # CRITICAL: This means IAM is misconfigured — raise loudly
            logger.critical(
                "DynamoDB ACCESS DENIED | manga=%s | table=manga-catalog | "
                "Check Lambda execution role IAM policy!",
                manga_id,
            )
            raise  # Do NOT swallow — let it surface as a batch failure

        elif error_code in ("ProvisionedThroughputExceededException", "ThrottlingException"):
            # Retryable — let SQS retry this message
            logger.warning(
                "DynamoDB throttled | manga=%s | will retry via SQS",
                manga_id,
            )
            raise

        elif error_code == "ResourceNotFoundException":
            logger.critical(
                "DynamoDB table not found! Expected: manga-catalog | "
                "This indicates a table rename or deletion.",
            )
            raise

        else:
            logger.error(
                "DynamoDB unexpected error | manga=%s | code=%s | msg=%s",
                manga_id, error_code, exc.response["Error"]["Message"],
            )
            raise

    except Exception as exc:
        # Catch-all with EXPLICIT logging at ERROR level
        logger.error(
            "Unexpected error storing enrichment | manga=%s | type=%s | error=%s",
            manga_id, enrichment_type, str(exc),
            exc_info=True,  # Include stack trace
        )
        raise  # Always re-raise — never swallow


def lambda_handler(event: Dict[str, Any], context) -> Dict:
    """Improved handler with proper error classification."""
    failed_items = []
    # Create the client once, outside the per-record loop
    bedrock = boto3.client("bedrock-runtime")

    for record in event["Records"]:
        try:
            body = json.loads(record["body"])
            manga = body["manga"]
            enrichment_type = body.get("enrichmentType", "synopsis")

            # Invoke Bedrock (existing code)
            response = bedrock.invoke_model(
                modelId="anthropic.claude-3-haiku-20240307-v1:0",
                contentType="application/json",
                accept="application/json",
                body=json.dumps({
                    "anthropic_version": "bedrock-2023-05-31",
                    "max_tokens": 1024,
                    "messages": [{"role": "user", "content": f"Enrich: {manga['title']}"}],
                }),
            )
            result = json.loads(response["body"].read())

            # Store with improved error handling
            store_enrichment_result(
                manga_id=manga["id"],
                enrichment_type=enrichment_type,
                data=result["content"][0]["text"],
            )

        except Exception as exc:
            logger.error(
                "Record failed | msgId=%s | error=%s",
                record["messageId"], str(exc),
                exc_info=True,
            )
            failed_items.append({"itemIdentifier": record["messageId"]})

    return {"batchItemFailures": failed_items}

Prevention

| Measure | Implementation |
|---|---|
| IAM policy references via CDK/CloudFormation | Use table.grant_read_write_data(lambda_fn) — CDK auto-updates ARNs when table name changes |
| Never swallow exceptions silently | Use logger.error(..., exc_info=True) and always re-raise DynamoDB errors |
| Integration test on deploy | Post-deploy Lambda runs a test write to DynamoDB and alerts if it fails |
| DynamoDB write metrics | CloudWatch alarm on ConsumedWriteCapacityUnits = 0 when enrichment queue has messages |
| Structured logging | Use JSON structured logs so AccessDeniedException errors surface in CloudWatch Insights queries |
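The structured-logging measure can be sketched with a minimal stdlib-only JSON formatter. This is a sketch; production Lambda code would more likely use a library such as aws-lambda-powertools, which provides an equivalent logger out of the box:

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line so CloudWatch Insights can filter on fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)
```

With this formatter attached, a query like `filter level = "ERROR" and message like /AccessDeniedException/` in CloudWatch Insights matches structured fields instead of scraping free text.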

Key Takeaways

| # | Takeaway |
|---|---|
| 1 | Token budgets prevent timeouts — estimate input tokens before calling Bedrock and truncate conversation history to stay within the time budget. |
| 2 | Parallel batch processing transforms Lambda throughput — processing 10 Bedrock calls concurrently instead of sequentially improves throughput by ~7x. |
| 3 | UTF-16 surrogate pairs cause silent validation failures for Japanese emoji — set API Gateway maxLength to 2x and validate grapheme clusters server-side. |
| 4 | Reserved concurrency isolation prevents unrelated Lambda functions from starving your consumer. |
| 5 | Never swallow exceptions — except Exception: pass is the root cause of "silent failure" incidents. Always log at ERROR with exc_info=True and re-raise. |