02: FM Integration Troubleshooting
AIP-C01 Mapping
Task 5.2 → Skill 5.2.2: Diagnose and resolve FM integration issues to identify and fix API integration problems specific to GenAI services (error logging, request validation, response analysis).
User Story
As a backend engineer on the MangaAssist team, I want to systematically diagnose and resolve Foundation Model API integration failures, So that the chatbot maintains reliable communication with Amazon Bedrock, handles transient and systemic failures gracefully, and provides degraded but useful responses when the FM is unavailable.
Acceptance Criteria
- Every Bedrock API call is logged with correlation ID, request size, response latency, and status code
- Request validation catches payload issues (missing fields, oversized input, invalid parameters) before the API call
- Response analysis detects and handles malformed JSON, incomplete streaming, and unexpected content
- Circuit breaker prevents cascading failures when Bedrock is degraded or throttled
- Retry strategy uses exponential backoff with jitter; max 2 retries for non-idempotent calls
- Error rate > 5% triggers automatic fallback to template-based responses for known intents
- Mean time to detect (MTTD) for FM integration failures < 2 minutes via CloudWatch alarms
High-Level Design
FM Integration Failure Taxonomy
graph TD
A[FM Integration<br>Failure] --> B[Request Failures]
A --> C[Response Failures]
A --> D[Streaming Failures]
A --> E[Systemic Failures]
B --> B1[Payload validation error<br>400 Bad Request]
B --> B2[Token limit exceeded<br>413 / validation error]
B --> B3[Model not found<br>404]
B --> B4[Missing permissions<br>403]
C --> C1[Malformed JSON output]
C --> C2[Incomplete response<br>cutoff mid-sentence]
C --> C3[Hallucinated structure<br>wrong schema]
C --> C4[Empty response body]
D --> D1[Stream timeout<br>no chunks received]
D --> D2[Stream interrupted<br>partial delivery]
D --> D3[WebSocket drop<br>during streaming]
D --> D4[Chunk ordering error]
E --> E1[Throttling<br>429 Too Many Requests]
E --> E2[Service outage<br>500/503]
E --> E3[Model deprecation<br>endpoint removed]
E --> E4[Region failover<br>needed]
Error Handling Architecture
flowchart TD
A[Chat Request] --> B[Request Validator]
B -->|Invalid| C[Return 400 with<br>specific error]
B -->|Valid| D{Circuit Breaker<br>State?}
D -->|OPEN| E[Return cached/template<br>response immediately]
D -->|HALF_OPEN| F[Allow single probe<br>request through]
D -->|CLOSED| G[Call Bedrock API]
G --> H{Response<br>received?}
H -->|Timeout| I{Retry count<br>< max?}
H -->|Error 429| J[Backoff + Retry<br>with jitter]
H -->|Error 5xx| I
H -->|Success| K[Response Analyzer]
I -->|Yes| G
I -->|No| L[Record failure<br>in circuit breaker]
J --> G
K --> M{Response<br>valid?}
M -->|Malformed JSON| N[Attempt repair<br>+ re-validate]
M -->|Incomplete| O[Log + return<br>partial if usable]
M -->|Valid| P[Return to<br>Orchestrator]
L --> E
N -->|Repaired| P
N -->|Unrepairable| E
F --> H
Bedrock Error Code Reference
| HTTP Status | Error Code | Cause | MangaAssist Impact | Resolution |
|---|---|---|---|---|
| 400 | ValidationException | Malformed request body, invalid model parameters | Single request fails | Fix payload; log + alert if recurring |
| 408 | ModelTimeoutException | Input too large or model overloaded | Response delayed or lost | Reduce input size; retry once |
| 403 | AccessDeniedException | IAM role lacks bedrock:InvokeModel | All requests fail | Fix IAM policy; critical alert |
| 404 | ResourceNotFoundException | Model ID wrong or deprecated | All requests fail | Update model ID; critical alert |
| 429 | ThrottlingException | Exceeded provisioned throughput or account limit | Requests queued or dropped | Backoff + jitter; request quota increase |
| 500 | InternalServerException | Bedrock service error | Intermittent failures | Retry with backoff; open circuit if persistent |
| 503 | ServiceUnavailableException | Bedrock capacity issue | All requests fail | Circuit breaker opens; template fallback |
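The table implies a mechanical classification the client can apply before deciding whether to retry. A minimal sketch (the `classify` helper and strategy names are illustrative, not part of any AWS SDK):

```python
# Retryable vs. fatal split for Bedrock error codes, mirroring the table above.
RETRYABLE = {
    "ThrottlingException",
    "InternalServerException",
    "ServiceUnavailableException",
    "ModelTimeoutException",
}
FATAL = {
    "ValidationException",
    "AccessDeniedException",
    "ResourceNotFoundException",
}

def classify(error_code: str) -> str:
    """Map a Bedrock error code to a retry strategy."""
    if error_code in RETRYABLE:
        return "retry_with_backoff"
    # Fatal and unknown codes alike: do not retry, alert instead
    return "fail_fast_and_alert"
```

Treating unknown codes as non-retryable is the conservative default: a retry against a permanent failure only adds latency and cost.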
Low-Level Design
1. Structured Error Logging
Every Bedrock interaction gets a standardized log entry. This is the single most important troubleshooting investment because it makes every failure diagnosable after the fact.
import json
import time
import uuid
import logging
from dataclasses import dataclass, field, asdict
from typing import Optional
logger = logging.getLogger("mangaassist.fm_integration")
@dataclass
class BedrockCallLog:
"""Structured log entry for every Bedrock API call."""
correlation_id: str
session_id: str
intent: str
model_id: str
request_timestamp: float
response_timestamp: Optional[float] = None
# Request metrics
input_tokens: int = 0
max_output_tokens: int = 0
temperature: float = 0.0
prompt_sections: dict = field(default_factory=dict) # section -> token count
# Response metrics
output_tokens: int = 0
latency_ms: float = 0.0
time_to_first_token_ms: Optional[float] = None
# Status
status: str = "pending" # pending, success, error, timeout, throttled
error_code: Optional[str] = None
error_message: Optional[str] = None
retry_count: int = 0
# Quality signals
response_valid_json: Optional[bool] = None
response_has_products: Optional[bool] = None
guardrail_blocked: bool = False
def to_log_dict(self) -> dict:
"""Produce a structured log dictionary safe for CloudWatch Logs."""
d = asdict(self)
d["log_type"] = "bedrock_call"
d["latency_ms"] = round(self.latency_ms, 2)
if self.time_to_first_token_ms is not None:
d["time_to_first_token_ms"] = round(self.time_to_first_token_ms, 2)
return d
def finalize(self, status: str, output_tokens: int = 0, error_code: str = None, error_message: str = None):
self.response_timestamp = time.time()
self.latency_ms = (self.response_timestamp - self.request_timestamp) * 1000
self.status = status
self.output_tokens = output_tokens
self.error_code = error_code
self.error_message = error_message
logger.info(json.dumps(self.to_log_dict()))
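As a condensed, runnable stand-in for the class above, the lifecycle is: create the entry when the call starts, `finalize` when it ends, and emit exactly one JSON object per call so Logs Insights can filter on `log_type`:

```python
import json
import time
import uuid
from dataclasses import dataclass, asdict

@dataclass
class CallLog:
    """Minimal stand-in for BedrockCallLog, showing the finalize pattern."""
    correlation_id: str
    request_timestamp: float
    status: str = "pending"
    latency_ms: float = 0.0

    def finalize(self, status: str) -> dict:
        self.latency_ms = (time.time() - self.request_timestamp) * 1000
        self.status = status
        entry = asdict(self)
        entry["log_type"] = "bedrock_call"  # lets Logs Insights filter these lines
        return entry

log = CallLog(correlation_id=str(uuid.uuid4()), request_timestamp=time.time())
entry = log.finalize("success")
print(json.dumps(entry))  # one structured JSON object per Bedrock call
```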
2. Request Validator
Catches bad requests before they hit Bedrock. Every rejected request is cheaper, faster, and more diagnosable than a Bedrock 400 error.
from dataclasses import dataclass
from typing import Optional
@dataclass
class ValidationResult:
valid: bool
error_code: Optional[str] = None
error_message: Optional[str] = None
field: Optional[str] = None
class BedrockRequestValidator:
"""Validates Bedrock invoke_model payloads before submission.
Catches:
- Missing required fields
- Token counts exceeding model limits
- Invalid parameter values (temperature, top_p, etc.)
- Empty or whitespace-only prompts
"""
MODEL_LIMITS = {
"anthropic.claude-3-5-sonnet-20241022-v2:0": {
"max_input_tokens": 200_000,
"max_output_tokens": 8_192,
"temperature_range": (0.0, 1.0),
"top_p_range": (0.0, 1.0),
},
"anthropic.claude-3-haiku-20240307-v1:0": {
"max_input_tokens": 200_000,
"max_output_tokens": 4_096,
"temperature_range": (0.0, 1.0),
"top_p_range": (0.0, 1.0),
},
}
# Practical budget — not the model limit, but what we allow to keep cost/latency in check
PRACTICAL_INPUT_LIMIT = 5_000
PRACTICAL_OUTPUT_LIMIT = 1_500
def validate(self, model_id: str, messages: list, params: dict) -> ValidationResult:
"""Validate a Bedrock Messages API request."""
# 1. Model exists and is supported
if model_id not in self.MODEL_LIMITS:
return ValidationResult(
valid=False,
error_code="INVALID_MODEL",
error_message=f"Model '{model_id}' is not configured. Available: {list(self.MODEL_LIMITS.keys())}",
field="model_id",
)
limits = self.MODEL_LIMITS[model_id]
# 2. Messages list is not empty
if not messages or len(messages) == 0:
return ValidationResult(
valid=False,
error_code="EMPTY_MESSAGES",
error_message="Messages list is empty",
field="messages",
)
# 3. Last message is from user
if messages[-1].get("role") != "user":
return ValidationResult(
valid=False,
error_code="INVALID_TURN_ORDER",
error_message="Last message must be from 'user' role",
field="messages[-1].role",
)
# 4. No empty content
for i, msg in enumerate(messages):
content = msg.get("content", "")
if isinstance(content, str) and not content.strip():
return ValidationResult(
valid=False,
error_code="EMPTY_CONTENT",
error_message=f"Message at index {i} has empty content",
field=f"messages[{i}].content",
)
# 5. Token count within practical limits
total_input_tokens = self._estimate_tokens(messages)
if total_input_tokens > self.PRACTICAL_INPUT_LIMIT:
return ValidationResult(
valid=False,
error_code="INPUT_TOO_LARGE",
error_message=(
f"Estimated input tokens ({total_input_tokens}) exceed practical limit "
f"({self.PRACTICAL_INPUT_LIMIT}). Compress prompt before submission."
),
field="messages",
)
# 6. Hard model limit check
if total_input_tokens > limits["max_input_tokens"]:
return ValidationResult(
valid=False,
error_code="INPUT_EXCEEDS_MODEL_LIMIT",
error_message=f"Input tokens ({total_input_tokens}) exceed model limit ({limits['max_input_tokens']})",
field="messages",
)
# 7. Parameter validation
temperature = params.get("temperature", 0.0)
min_t, max_t = limits["temperature_range"]
if not (min_t <= temperature <= max_t):
return ValidationResult(
valid=False,
error_code="INVALID_TEMPERATURE",
error_message=f"Temperature {temperature} outside range [{min_t}, {max_t}]",
field="temperature",
)
max_tokens = params.get("max_tokens", 1000)
if max_tokens > limits["max_output_tokens"]:
return ValidationResult(
valid=False,
error_code="MAX_TOKENS_EXCEEDED",
error_message=f"max_tokens ({max_tokens}) exceeds model limit ({limits['max_output_tokens']})",
field="max_tokens",
)
return ValidationResult(valid=True)
def _estimate_tokens(self, messages: list) -> int:
total = 0
for msg in messages:
content = msg.get("content", "")
if isinstance(content, str):
total += len(content) // 4
elif isinstance(content, list):
for block in content:
if isinstance(block, dict) and "text" in block:
total += len(block["text"]) // 4
return total
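The `_estimate_tokens` heuristic is a deliberate approximation: roughly four characters per token for English text. Japanese manga titles tokenize closer to one token per character, so the estimate can run low; treat it as a budget guard, not an exact count. A standalone check of the heuristic:

```python
def estimate_tokens(messages: list) -> int:
    """Same ~4-characters-per-token heuristic as _estimate_tokens above."""
    total = 0
    for msg in messages:
        content = msg.get("content", "")
        if isinstance(content, str):
            total += len(content) // 4
        elif isinstance(content, list):
            total += sum(
                len(block["text"]) // 4
                for block in content
                if isinstance(block, dict) and "text" in block
            )
    return total

msgs = [
    {"role": "user", "content": "a" * 200},
    {"role": "user", "content": [{"type": "text", "text": "b" * 100}]},
]
print(estimate_tokens(msgs))  # → 75  (200 // 4 + 100 // 4)
```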
3. Bedrock Client with Retry and Circuit Breaker
import time
import uuid
import random
import json
import re
import logging
import boto3
from dataclasses import dataclass, field
from typing import Optional, AsyncIterator
from enum import Enum
logger = logging.getLogger("mangaassist.fm_integration")
class CircuitState(Enum):
CLOSED = "closed" # Normal operation
OPEN = "open" # Failing — reject requests immediately
HALF_OPEN = "half_open" # Testing — allow one probe request
@dataclass
class CircuitBreaker:
"""Circuit breaker for Bedrock API calls.
Opens after `failure_threshold` consecutive failures.
Half-opens after `recovery_timeout_seconds`.
Closes after a successful probe in HALF_OPEN state.
"""
failure_threshold: int = 5
recovery_timeout_seconds: float = 30.0
state: CircuitState = CircuitState.CLOSED
failure_count: int = 0
last_failure_time: float = 0.0
last_state_change: float = field(default_factory=time.time)
def record_success(self):
self.failure_count = 0
if self.state == CircuitState.HALF_OPEN:
self._transition(CircuitState.CLOSED)
logger.info("Circuit breaker CLOSED — Bedrock recovered")
def record_failure(self):
self.failure_count += 1
self.last_failure_time = time.time()
if self.state == CircuitState.HALF_OPEN:
self._transition(CircuitState.OPEN)
logger.warning("Circuit breaker OPEN (probe failed) — Bedrock still degraded")
elif self.failure_count >= self.failure_threshold:
self._transition(CircuitState.OPEN)
logger.warning(
"Circuit breaker OPEN — %d consecutive failures",
self.failure_count,
)
def allow_request(self) -> bool:
if self.state == CircuitState.CLOSED:
return True
if self.state == CircuitState.OPEN:
if time.time() - self.last_failure_time >= self.recovery_timeout_seconds:
self._transition(CircuitState.HALF_OPEN)
logger.info("Circuit breaker HALF_OPEN — allowing probe request")
return True
return False
if self.state == CircuitState.HALF_OPEN:
return True # Allow the single probe
return False
def _transition(self, new_state: CircuitState):
old_state = self.state
self.state = new_state
self.last_state_change = time.time()
logger.info(
"Circuit breaker transition: %s -> %s",
old_state.value, new_state.value,
)
class BedrockError(Exception):
"""Structured exception for Bedrock API failures."""
def __init__(self, error_code: str, message: str, retryable: bool, status_code: int = 0):
super().__init__(message)
self.error_code = error_code
self.retryable = retryable
self.status_code = status_code
@dataclass
class BedrockClientWrapper:
"""Production Bedrock client with validation, retry, circuit breaking, and structured logging."""
model_id: str = "anthropic.claude-3-5-sonnet-20241022-v2:0"
region: str = "ap-northeast-1"
max_retries: int = 2
base_backoff_seconds: float = 0.5
max_backoff_seconds: float = 8.0
def __post_init__(self):
self.client = boto3.client("bedrock-runtime", region_name=self.region)
self.validator = BedrockRequestValidator()
self.circuit_breaker = CircuitBreaker()
self._call_log: Optional[BedrockCallLog] = None
def invoke(
self,
messages: list,
system_prompt: str,
session_id: str,
intent: str,
temperature: float = 0.0,
max_tokens: int = 1000,
) -> dict:
"""Synchronous Bedrock invocation with full error handling pipeline."""
correlation_id = str(uuid.uuid4())
params = {"temperature": temperature, "max_tokens": max_tokens}
# Initialize structured log
self._call_log = BedrockCallLog(
correlation_id=correlation_id,
session_id=session_id,
intent=intent,
model_id=self.model_id,
request_timestamp=time.time(),
input_tokens=self.validator._estimate_tokens(messages),
max_output_tokens=max_tokens,
temperature=temperature,
)
# Step 1: Validate request
validation = self.validator.validate(self.model_id, messages, params)
if not validation.valid:
self._call_log.finalize(
status="validation_error",
error_code=validation.error_code,
error_message=validation.error_message,
)
raise BedrockError(
error_code=validation.error_code,
message=validation.error_message,
retryable=False,
)
# Step 2: Check circuit breaker
if not self.circuit_breaker.allow_request():
self._call_log.finalize(
status="circuit_open",
error_code="CIRCUIT_OPEN",
error_message="Circuit breaker is open — Bedrock degraded",
)
raise BedrockError(
error_code="CIRCUIT_OPEN",
message="Bedrock circuit breaker is open. Use template fallback.",
retryable=False,
)
# Step 3: Call Bedrock with retry
last_error = None
for attempt in range(1 + self.max_retries):
try:
response = self._invoke_bedrock(messages, system_prompt, params)
self.circuit_breaker.record_success()
# Step 4: Analyze response
analyzed = self._analyze_response(response)
                # Set quality fields and retry count before finalize() emits the log line
                self._call_log.response_valid_json = analyzed.get("valid_json", False)
                self._call_log.response_has_products = analyzed.get("has_products", False)
                self._call_log.retry_count = attempt
                self._call_log.finalize(
                    status="success",
                    output_tokens=analyzed.get("output_tokens", 0),
                )
return analyzed
            except self.client.exceptions.ThrottlingException as e:
                last_error = e
                self._call_log.retry_count = attempt
                if attempt < self.max_retries:
                    backoff = self._calculate_backoff(attempt)
                    logger.warning(
                        "Bedrock throttled (attempt %d/%d), backing off %.1fs",
                        attempt + 1, 1 + self.max_retries, backoff,
                        extra={"correlation_id": correlation_id, "backoff_seconds": backoff},
                    )
                    time.sleep(backoff)  # No sleep after the final attempt
except (
                self.client.exceptions.InternalServerException,
self.client.exceptions.ServiceUnavailableException,
) as e:
last_error = e
self.circuit_breaker.record_failure()
if attempt < self.max_retries:
backoff = self._calculate_backoff(attempt)
logger.warning(
"Bedrock server error (attempt %d/%d), retrying in %.1fs",
attempt + 1, 1 + self.max_retries, backoff,
extra={"correlation_id": correlation_id},
)
time.sleep(backoff)
except Exception as e:
# Non-retryable error
self.circuit_breaker.record_failure()
self._call_log.finalize(
status="error",
error_code=type(e).__name__,
error_message=str(e)[:500],
)
raise BedrockError(
error_code=type(e).__name__,
message=str(e),
retryable=False,
)
# All retries exhausted
self.circuit_breaker.record_failure()
self._call_log.finalize(
status="retries_exhausted",
error_code=type(last_error).__name__,
error_message=str(last_error)[:500],
)
raise BedrockError(
error_code="RETRIES_EXHAUSTED",
message=f"Bedrock call failed after {1 + self.max_retries} attempts: {last_error}",
retryable=False,
)
def _invoke_bedrock(self, messages: list, system_prompt: str, params: dict) -> dict:
"""Raw Bedrock API call."""
body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": params["max_tokens"],
"temperature": params["temperature"],
"system": system_prompt,
"messages": messages,
}
response = self.client.invoke_model(
modelId=self.model_id,
contentType="application/json",
accept="application/json",
body=json.dumps(body),
)
return json.loads(response["body"].read())
def _calculate_backoff(self, attempt: int) -> float:
"""Exponential backoff with full jitter."""
base = self.base_backoff_seconds * (2 ** attempt)
capped = min(base, self.max_backoff_seconds)
return random.uniform(0, capped)
def _analyze_response(self, response: dict) -> dict:
"""Analyze Bedrock response for quality signals."""
result = {
"raw_response": response,
"output_tokens": response.get("usage", {}).get("output_tokens", 0),
"stop_reason": response.get("stop_reason", "unknown"),
"valid_json": False,
"has_products": False,
"text": "",
}
# Extract text content
content_blocks = response.get("content", [])
text_parts = []
for block in content_blocks:
if block.get("type") == "text":
text_parts.append(block["text"])
result["text"] = "\n".join(text_parts)
# Check if response contains valid JSON (for structured outputs)
try:
# Try to parse the entire text as JSON
json.loads(result["text"])
result["valid_json"] = True
except (json.JSONDecodeError, TypeError):
# Try to find JSON within the text (common with markdown fences)
import re
json_match = re.search(r'```json\s*(.*?)\s*```', result["text"], re.DOTALL)
if json_match:
try:
json.loads(json_match.group(1))
result["valid_json"] = True
except (json.JSONDecodeError, TypeError):
pass
# Check for product data
result["has_products"] = bool(re.search(r'B0[A-Z0-9]{8}', result["text"]))
# Check for incomplete response
if response.get("stop_reason") == "max_tokens":
logger.warning(
"Response truncated at max_tokens — consider increasing output budget",
extra={"output_tokens": result["output_tokens"]},
)
result["was_truncated"] = True
return result
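The full-jitter strategy in `_calculate_backoff` can be sanity-checked in isolation. Spreading each retry uniformly over `[0, cap]` rather than sleeping a fixed interval is what keeps a fleet of throttled Lambdas from retrying in lockstep:

```python
import random

def backoff_with_jitter(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Full jitter: uniform over [0, min(base * 2**attempt, cap)].
    Mirrors _calculate_backoff above."""
    return random.uniform(0.0, min(base * (2 ** attempt), cap))

# Upper bounds grow 0.5s, 1.0s, 2.0s, 4.0s, then stay capped at 8.0s
for attempt in range(6):
    delay = backoff_with_jitter(attempt)
    assert 0.0 <= delay <= min(0.5 * 2 ** attempt, 8.0)
```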
4. Response Analyzer — Repair and Validation
import json
import re
import logging
from dataclasses import dataclass
from typing import Optional
logger = logging.getLogger("mangaassist.fm_integration")
@dataclass
class ResponseAnalysis:
is_valid: bool
response_text: str
parsed_json: Optional[dict] = None
repair_applied: bool = False
repair_type: Optional[str] = None
issues: list = None
def __post_init__(self):
if self.issues is None:
self.issues = []
class ResponseAnalyzer:
"""Validates and repairs FM responses.
Common issues in production:
1. JSON with trailing commas (Claude sometimes does this)
2. JSON wrapped in markdown code fences
3. Incomplete JSON due to output token limit
4. Mixed text + JSON when structured output was expected
5. Missing required fields in structured response
"""
REQUIRED_RESPONSE_FIELDS = ["response_text"]
OPTIONAL_RESPONSE_FIELDS = ["products", "actions", "follow_up_suggestions"]
def analyze(self, raw_text: str, expected_format: str = "text") -> ResponseAnalysis:
"""Analyze and optionally repair the FM response."""
issues = []
if not raw_text or not raw_text.strip():
return ResponseAnalysis(
is_valid=False,
response_text="",
issues=["Empty response from FM"],
)
if expected_format == "json":
return self._analyze_json_response(raw_text)
elif expected_format == "structured":
return self._analyze_structured_response(raw_text)
else:
return ResponseAnalysis(is_valid=True, response_text=raw_text)
def _analyze_json_response(self, raw_text: str) -> ResponseAnalysis:
"""Try to extract valid JSON from the response."""
issues = []
# Attempt 1: Direct parse
try:
parsed = json.loads(raw_text)
return ResponseAnalysis(
is_valid=True, response_text=raw_text, parsed_json=parsed,
)
except json.JSONDecodeError:
issues.append("Direct JSON parse failed")
# Attempt 2: Extract from markdown code fence
json_match = re.search(r'```(?:json)?\s*([\s\S]*?)\s*```', raw_text)
if json_match:
try:
parsed = json.loads(json_match.group(1))
return ResponseAnalysis(
is_valid=True,
response_text=json_match.group(1),
parsed_json=parsed,
repair_applied=True,
repair_type="extracted_from_code_fence",
issues=issues,
)
except json.JSONDecodeError:
issues.append("JSON in code fence also invalid")
# Attempt 3: Remove trailing commas (common FM mistake)
cleaned = re.sub(r',\s*([}\]])', r'\1', raw_text)
try:
parsed = json.loads(cleaned)
return ResponseAnalysis(
is_valid=True,
response_text=cleaned,
parsed_json=parsed,
repair_applied=True,
repair_type="removed_trailing_commas",
issues=issues,
)
except json.JSONDecodeError:
issues.append("Trailing comma removal did not fix JSON")
# Attempt 4: Try to close unclosed braces/brackets (truncated response)
repaired = self._try_close_json(raw_text)
if repaired:
try:
parsed = json.loads(repaired)
logger.warning(
"Repaired truncated JSON by closing brackets",
extra={"repair_type": "closed_brackets"},
)
return ResponseAnalysis(
is_valid=True,
response_text=repaired,
parsed_json=parsed,
repair_applied=True,
repair_type="closed_truncated_json",
issues=issues + ["WARNING: JSON was truncated and auto-closed"],
)
except json.JSONDecodeError:
issues.append("Bracket closure repair failed")
# All repair attempts failed
logger.error(
"Unrepairable JSON response",
extra={"issues": issues, "raw_text_preview": raw_text[:200]},
)
return ResponseAnalysis(
is_valid=False,
response_text=raw_text,
issues=issues + ["All JSON repair attempts failed"],
)
def _analyze_structured_response(self, raw_text: str) -> ResponseAnalysis:
"""Validate structured response against expected MangaAssist schema."""
analysis = self._analyze_json_response(raw_text)
if not analysis.is_valid or analysis.parsed_json is None:
return analysis
# Check required fields
missing_required = [
f for f in self.REQUIRED_RESPONSE_FIELDS
if f not in analysis.parsed_json
]
if missing_required:
analysis.issues.append(f"Missing required fields: {missing_required}")
analysis.is_valid = False
# Validate product ASINs if present
products = analysis.parsed_json.get("products", [])
for i, product in enumerate(products):
asin = product.get("asin", "")
if asin and not re.match(r'^B0[A-Z0-9]{8}$', asin):
analysis.issues.append(f"Invalid ASIN format in product[{i}]: {asin}")
return analysis
def _try_close_json(self, text: str) -> Optional[str]:
"""Try to close truncated JSON by counting open/close brackets."""
open_braces = text.count('{') - text.count('}')
open_brackets = text.count('[') - text.count(']')
if open_braces <= 0 and open_brackets <= 0:
return None # Not a bracket issue
# Remove any trailing partial key-value pair
text = re.sub(r',\s*"[^"]*$', '', text) # Remove trailing partial key
text = re.sub(r',\s*$', '', text) # Remove trailing comma
# Close brackets
text += ']' * open_brackets + '}' * open_braces
return text
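The repair ladder above can be condensed into a standalone helper for experimentation. This simplified sketch skips the code-fence extraction step and the partial-key cleanup, keeping only direct parse, trailing-comma removal, and bracket closure:

```python
import json
import re

def repair_json(raw: str):
    """Condensed repair ladder: direct parse, trailing-comma removal,
    then closing truncated brackets (mirrors ResponseAnalyzer)."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    cleaned = re.sub(r',\s*([}\]])', r'\1', raw)   # strip trailing commas
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass
    # Close unbalanced brackets, as _try_close_json does for truncated output
    closed = cleaned + "]" * max(0, cleaned.count("[") - cleaned.count("]"))
    closed += "}" * max(0, cleaned.count("{") - cleaned.count("}"))
    try:
        return json.loads(closed)
    except json.JSONDecodeError:
        return None

print(repair_json('{"titles": ["One Piece", "Naruto",]}'))
# → {'titles': ['One Piece', 'Naruto']}
```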
5. MangaAssist Scenarios
Scenario A: Prime Day Throttling Cascade
Context: Prime Day traffic increases MangaAssist usage 5x. Bedrock starts returning 429 ThrottlingException for 15% of requests.
Symptom: Users see "Sorry, I'm having trouble responding right now" messages. Response latency P95 jumps from 2.5s to 12s due to retries.
Detection: CloudWatch alarm fires on BedrockThrottleRate > 5%. Structured logs show status: "retries_exhausted" with error_code: "ThrottlingException" and retry_count: 2.
Resolution:
1. Circuit breaker opens after five consecutive failed calls → template responses served instantly
2. Backoff with jitter prevents retry storms from making the throttling worse
3. Ops team requests a Bedrock provisioned throughput increase for the event

Prevention:
- Pre-provision Bedrock throughput before known traffic events
- Configure a lower-cost model fallback (Claude Haiku) for simple intents during peak
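The Haiku fallback can be a one-line routing decision. A sketch, assuming illustrative intent names and the model IDs used elsewhere in this file:

```python
# Intents simple enough for the cheaper model; the names are illustrative
SIMPLE_INTENTS = {"greeting", "order_status", "store_hours"}

SONNET = "anthropic.claude-3-5-sonnet-20241022-v2:0"
HAIKU = "anthropic.claude-3-haiku-20240307-v1:0"

def pick_model(intent: str, sonnet_throttled: bool) -> str:
    """Route simple intents (or all traffic while Sonnet is throttled) to Haiku."""
    if intent in SIMPLE_INTENTS or sonnet_throttled:
        return HAIKU
    return SONNET

print(pick_model("recommend_manga", sonnet_throttled=True))  # Haiku during throttling
```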
Scenario B: Model Version Deprecation
Context: Model ID anthropic.claude-3-sonnet-20240229-v1:0 is deprecated. The MangaAssist staging environment starts failing with 404 ResourceNotFoundException.
Detection: Staging requests fail immediately. Structured logs show error_code: "ResourceNotFoundException" for 100% of requests, and the BedrockErrorRate alarm pages within two minutes.
Resolution: Update model_id configuration to anthropic.claude-3-5-sonnet-20241022-v2:0. Run prompt regression tests (see file 03) to verify output quality with the new model version.
Prevention: Subscribe to AWS Bedrock model lifecycle notifications. Add model ID to externalized configuration (not hardcoded).
Scenario C: Streaming Response Interruption
Context: WebSocket connection drops mid-stream for ~3% of users on mobile networks. The user sees a partial response that cuts off mid-sentence.
Detection: CloudWatch metric StreamInterruptionRate (emitted by WebSocket handler) > 2%. Structured logs show stop_reason: "connection_closed" instead of "end_turn".
Resolution:
1. Implement response buffering in the orchestrator — buffer the full response before streaming to the WebSocket
2. Add a GET /chat/message/{response_id} fallback endpoint so clients can fetch the complete response after reconnection
3. Client-side retry: on WebSocket reconnect, fetch the buffered response for any in-flight response IDs
6. CloudWatch Dashboard and Alerts
Key Metrics
| Metric | Alarm Threshold | Action |
|---|---|---|
| BedrockLatencyP95 | > 3,000 ms | Warn: check model throughput |
| BedrockErrorRate | > 5% for 2 min | Page: circuit breaker likely open |
| BedrockThrottleRate | > 2% for 5 min | Warn: request throughput increase |
| CircuitBreakerState | OPEN for > 1 min | Page: all requests on template fallback |
| ResponseRepairRate | > 10% | Warn: FM output quality degraded |
| ValidationRejectionRate | > 1% | Warn: upstream sending bad payloads |
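One low-friction way to emit these metrics from a Lambda handler is the CloudWatch Embedded Metric Format: a specially shaped JSON log line that CloudWatch converts into metrics with no `put_metric_data` call. A sketch (the namespace and dimension names are assumptions, not from this design):

```python
import json
import time

def emf_line(intent: str, latency_ms: float, error: bool) -> str:
    """Build an Embedded Metric Format log line; printed to stdout in Lambda,
    CloudWatch extracts BedrockLatencyMs / BedrockErrors as metrics."""
    return json.dumps({
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "MangaAssist/FMIntegration",  # illustrative namespace
                "Dimensions": [["Intent"]],
                "Metrics": [
                    {"Name": "BedrockLatencyMs", "Unit": "Milliseconds"},
                    {"Name": "BedrockErrors", "Unit": "Count"},
                ],
            }],
        },
        "Intent": intent,
        "BedrockLatencyMs": round(latency_ms, 2),
        "BedrockErrors": 1 if error else 0,
    })

print(emf_line("recommend_manga", 1843.2, error=False))
```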
CloudWatch Logs Insights Queries
Bedrock error breakdown by type:
fields @timestamp, error_code, intent, latency_ms
| filter log_type = "bedrock_call" and status != "success"
| stats count(*) as error_count by error_code, intent
| sort error_count desc
Retry effectiveness — how often do retries succeed?
fields @timestamp, correlation_id, retry_count, status
| filter log_type = "bedrock_call" and retry_count > 0
| stats count(*) as total,
    sum(strcontains(status, "success")) as retry_successes
| fields total, retry_successes, retry_successes * 100.0 / total as retry_success_pct
Circuit breaker state changes:
fields @timestamp, @message
| filter @message like /Circuit breaker transition/
| sort @timestamp desc
| limit 50
Latency percentiles by intent:
fields @timestamp, intent, latency_ms
| filter log_type = "bedrock_call" and status = "success"
| stats avg(latency_ms) as avg_ms,
    pct(latency_ms, 50) as p50,
    pct(latency_ms, 95) as p95,
    pct(latency_ms, 99) as p99
    by intent
| sort p95 desc
Intuition Gained
What Mental Model You Build
FM integration troubleshooting teaches you to read API failures like a story. Every failure has a character (the error code), a plot (the sequence of events leading to it), and a moral (the systemic fix that prevents recurrence).
You develop three core instincts:
1. The Transient vs. Systemic Instinct: A single 429 is traffic noise. Five consecutive 429s is a capacity problem. A 404 is never transient — it means something changed permanently. You learn to classify failures instantly and respond appropriately: retry transient errors, escalate systemic ones, and never retry permanent failures.
2. The Graceful Degradation Instinct: You stop thinking in terms of "works" vs. "broken" and start thinking in degradation levels. The chatbot has five modes: (1) full FM response, (2) FM response with repaired JSON, (3) FM response from backup model (Haiku), (4) template response for known intent, (5) apologetic message with link to human agent. You design systems with all five modes built in.
3. The Instrumentation-First Instinct: You learn that the cost of adding a structured log entry is near-zero, but the cost of debugging without one is hours. Every Bedrock call gets a correlation ID, input token count, output token count, latency, status, and retry count — because the one field you skip is the one you need at 2 AM during an incident.
How This Intuition Guides Future Decisions
- When integrating a new FM provider: You immediately ask "What are the error codes? What's the retry policy? What's the rate limit?" before writing any prompt logic. You build the circuit breaker and fallback path before the happy path.
- When evaluating FM reliability: You look beyond uptime SLAs. You ask about throttling behavior under load, error response consistency, and streaming reliability. A service with 99.9% uptime but unpredictable throttling is harder to integrate than one with 99.5% uptime and clean error codes.
- When designing multi-model architectures: You know that model tiering (Sonnet for complex, Haiku for simple) is not just a cost optimization — it is a reliability pattern. If Sonnet is throttled, Haiku might not be. If one model version is deprecated, the others keep running.
- When debugging user-reported issues: You trace the correlation ID through logs before looking at the prompt or model output. Eight times out of ten, the issue is in the integration layer (timeout, truncation, throttling), not in the model itself.