02: Task 1.2 Select and Configure Foundation Models
AIP-C01 Mapping
Content Domain 1: Foundation Model Integration, Data Management, and Compliance Task 1.2: Select and configure FMs.
Task Goal
Choose the right model strategy for each business capability, decouple the application from any one provider, and ensure model operations remain resilient even when traffic spikes, regions fail, or customized models need to be replaced.
Task User Story
As a GenAI platform owner, I want to evaluate, route, customize, and operate foundation models through policy-driven architecture, So that the business gets the best balance of quality, speed, cost, resilience, and maintainability.
Task Architecture View
```mermaid
graph TD
    A[Application Request] --> B[Model Policy Router]
    B --> C[AppConfig Rules]
    C --> D[Primary FM on Bedrock]
    C --> E[Fallback FM]
    C --> F[Cross-Region Inference]
    C --> G[Customized Model Endpoint]
    D --> H[Telemetry and Quality Tracking]
    E --> H
    F --> H
    G --> H
    H --> I[Model Registry and Lifecycle]
    I --> J[CI/CD Promotion and Rollback]
```
Skill 1.2.1: Assess and Choose FMs
User Story
As a GenAI architect, I want to evaluate candidate foundation models against the real needs of each workload, So that model choice becomes a disciplined engineering decision rather than a popularity contest.
Deep Dive
The right model depends on the task. The best general model is often the wrong operational choice.
| Evaluation Axis | Why It Matters | Example Questions |
|---|---|---|
| Capability fit | Determines whether the model can actually perform the task | Does it handle long-context reasoning, tool use, or multimodal inputs? |
| Latency | Affects user experience and concurrency planning | Can it stay within the session SLA under peak traffic? |
| Cost | Impacts sustainability of the use case | Is the quality gain worth the additional token or endpoint cost? |
| Safety and control | Matters for regulated and customer-facing workflows | Can we apply guardrails and trace decisions cleanly? |
| Operational availability | Shapes reliability | Is the model available in the required regions and quotas? |
For MangaAssist, a sensible split might be:
- Haiku-class model for classification, drafting, and simple FAQ responses
- Sonnet-class model for grounded multi-turn support and recommendation synthesis
- An embedding model optimized separately for search quality, rather than chosen to match the brand of the text-generation models
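The evaluation axes above can be captured as a weighted scorecard so that model choice is a documented, repeatable decision. This is a minimal sketch; the weights, model names, and 1-5 ratings are illustrative placeholders, not real benchmark data.

```python
# Weighted scorecard for model selection. All weights and ratings
# below are illustrative, not real benchmark results.

CRITERIA_WEIGHTS = {
    "capability_fit": 0.35,
    "latency": 0.20,
    "cost": 0.20,
    "safety_control": 0.15,
    "availability": 0.10,
}

# Hypothetical 1-5 ratings per candidate model for one workload.
CANDIDATES = {
    "haiku-class": {"capability_fit": 3, "latency": 5, "cost": 5,
                    "safety_control": 4, "availability": 5},
    "sonnet-class": {"capability_fit": 5, "latency": 3, "cost": 3,
                     "safety_control": 4, "availability": 5},
}

def weighted_score(ratings: dict) -> float:
    """Combine per-axis ratings into a single comparable score."""
    return round(sum(CRITERIA_WEIGHTS[axis] * value
                     for axis, value in ratings.items()), 3)
```

Ranking candidates this way makes the tradeoff explicit: for a classification-heavy workload with these weights, the cheaper, faster model can outscore the more capable one.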
Acceptance Signals
- Model-selection decisions include benchmarks, tradeoffs, and business implications
- Each model has a defined role instead of being used everywhere
- Selection criteria include both qualitative review and measurable performance data
- Limitations are documented up front, including known weak intents or languages
Skill 1.2.2: Create Flexible Architecture Patterns for Dynamic Model Selection
User Story
As a platform engineer, I want to switch models or providers through configuration rather than code edits, So that the system can adapt quickly to pricing, outages, policy changes, or quality findings.
Deep Dive
The application should depend on a stable capability contract, not on a provider-specific request body.
| Layer | Responsibility | Good Pattern |
|---|---|---|
| API Gateway | Stable client entry point | Keeps callers insulated from backend FM changes |
| Lambda or orchestration layer | Normalize request schema and route by policy | Maps use case to provider/model dynamically |
| AppConfig | Holds routing rules, thresholds, and feature flags | Lets teams change default models without deployment |
| Adapter layer | Converts standard request to provider-specific payload | Avoids vendor-specific code spreading through the app |
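The adapter layer in the table above can be sketched as a set of payload builders behind one neutral request type. The field names and provider keys here are hypothetical, not real provider schemas.

```python
# Adapter layer sketch: one neutral request shape, per-provider payload
# builders. Field names are illustrative, not real provider schemas.
from dataclasses import dataclass

@dataclass
class NeutralRequest:
    prompt: str
    max_tokens: int
    temperature: float

def to_provider_a(req: NeutralRequest) -> dict:
    # Hypothetical provider A expects a chat-style body.
    return {
        "messages": [{"role": "user", "content": req.prompt}],
        "max_tokens": req.max_tokens,
        "temperature": req.temperature,
    }

def to_provider_b(req: NeutralRequest) -> dict:
    # Hypothetical provider B expects a flat completion body.
    return {
        "input_text": req.prompt,
        "maxTokenCount": req.max_tokens,
        "temperature": req.temperature,
    }

ADAPTERS = {"provider-a": to_provider_a, "provider-b": to_provider_b}

def build_payload(provider: str, req: NeutralRequest) -> dict:
    """Keep vendor-specific request shapes out of application code."""
    return ADAPTERS[provider](req)
```

Because only the adapter functions know provider-specific field names, adding or swapping a provider means adding one builder, not touching every caller.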
Design Pattern
- Define a common request contract: `task_type`, `latency_tier`, `safety_level`, `context_window`, `response_format`
- Resolve model choice through config at runtime
- Support canary traffic percentages per model
- Log routing decisions so quality and cost can be attributed later
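The steps above can be sketched as a small config-driven router with per-model canary weights. The config shape and model IDs are illustrative; in production this structure would be fetched from a store such as AWS AppConfig rather than hardcoded.

```python
# Config-driven routing sketch with canary traffic percentages.
# Model IDs and config shape are illustrative placeholders.
import random

ROUTING_CONFIG = {
    "faq": {
        "primary": {"model_id": "provider-a.haiku", "weight": 0.9},
        "canary": {"model_id": "provider-a.haiku-next", "weight": 0.1},
    },
    "support_chat": {
        "primary": {"model_id": "provider-a.sonnet", "weight": 1.0},
    },
}

def resolve_model(task_type: str, rng=random.random) -> str:
    """Pick a model ID for a task type, honoring canary weights."""
    rules = ROUTING_CONFIG[task_type]
    roll, cumulative = rng(), 0.0
    for route in rules.values():
        cumulative += route["weight"]
        if roll < cumulative:
            return route["model_id"]
    # Defensive default if weights do not sum to 1.0.
    return rules["primary"]["model_id"]
```

Because the decision is a pure function of config plus a random roll, shifting canary traffic from 10% to 50% is a config change, not a deployment, and each routing decision can be logged for cost and quality attribution.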
Acceptance Signals
- A model swap can happen through configuration or deployment metadata
- Callers do not need code changes to switch providers or versions
- Routing decisions are observable and reversible
- The abstraction preserves important controls such as temperature, max tokens, and schema constraints
Skill 1.2.3: Design Resilient AI Systems for Continuous Operation
User Story
As a reliability-focused architect, I want to design model-serving paths that continue operating during provider, region, or quota disruptions, So that user-facing experiences degrade gracefully instead of failing completely.
Deep Dive
GenAI resilience is not just retry logic. It is a fallback ladder.
```mermaid
graph LR
    A[Primary Model Call] --> B{Success?}
    B -->|Yes| C[Return Result]
    B -->|No| D[Circuit Breaker]
    D --> E[Cross-Region Inference]
    E --> F{Recovered?}
    F -->|Yes| C
    F -->|No| G[Fallback Model]
    G --> H{Still Failing?}
    H -->|No| C
    H -->|Yes| I[Graceful Degradation]
```
Resilience Techniques
- Circuit breaker around repeated provider failures
- Cross-Region Inference in Bedrock to route around limited regional capacity or availability
- Secondary model for lower-tier answers or summarization-only mode
- Cached responses, templates, or human escalation for the last-resort path
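The techniques above can be combined into a minimal fallback ladder, sketched here with a consecutive-failure circuit breaker per route. The caller functions stand in for real inference clients; production code would also handle timeouts, half-open probing, and cost-aware retry budgets.

```python
# Fallback ladder sketch: primary -> cross-region -> fallback model ->
# graceful degradation. Callers are stand-ins for real inference clients.

class CircuitBreaker:
    """Opens after `threshold` consecutive failures, skipping the route."""
    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.threshold

    def record(self, success: bool) -> None:
        self.failures = 0 if success else self.failures + 1


def call_with_fallbacks(prompt, routes, degraded="Please try again shortly."):
    """Walk the ladder: skip open circuits, return the first success."""
    for name, caller, breaker in routes:
        if breaker.is_open:
            continue
        try:
            result = caller(prompt)
            breaker.record(True)
            return name, result
        except Exception:
            breaker.record(False)
    # Last-resort path: cached response, template, or human escalation.
    return "degraded", degraded
```

Returning the route name alongside the result is what lets monitoring distinguish "primary served this" from "we degraded gracefully," which the acceptance signals below depend on.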
Acceptance Signals
- The system defines primary, secondary, and degraded response modes
- Retry behavior avoids uncontrolled cost and latency explosion
- Cross-region or alternate-provider fallback is tested, not just diagrammed
- Monitoring distinguishes provider outage, quota exhaustion, and application bugs
Skill 1.2.4: Implement FM Customization Deployment and Lifecycle Management
User Story
As a model platform owner, I want to manage customized foundation models from adaptation to retirement, So that domain-specific improvements can be delivered safely with version control, rollback, and governance.
Deep Dive
Customized models should be treated like productized assets, not one-time experiments.
| Lifecycle Stage | What Must Happen | Typical AWS Support |
|---|---|---|
| Experimentation | Compare base model vs tuned variant | SageMaker training jobs, offline evaluation |
| Registration | Store version, lineage, metrics, and approval status | SageMaker Model Registry |
| Deployment | Promote to endpoint or managed serving path | SageMaker endpoints, pipelines |
| Release control | Gradual rollout and rollback | CI/CD, AppConfig, blue-green or shadow deployment |
| Retirement | Replace stale or weak models cleanly | Version policy, archive and decommission workflow |
Customization choices should be proportional to the problem:
- Use prompting or retrieval first if they solve the issue
- Use LoRA or adapters when domain lift is needed without retraining everything
- Use heavier fine-tuning only when repeated evidence shows prompting and RAG are not enough
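The lifecycle stages in the table above can be illustrated with a toy registry that tracks versions, the serving slot, and one-step rollback. In AWS practice this role is played by SageMaker Model Registry plus CI/CD; the plain-Python sketch below only demonstrates the state transitions a real registry must support.

```python
# Toy model-version registry: register -> promote -> rollback.
# Illustrates lifecycle state only; not a real registry client.

class ModelRegistry:
    def __init__(self):
        self.versions = {}   # version -> lineage, metrics, status
        self.serving = None  # currently promoted version
        self.previous = None # enables one-step rollback

    def register(self, version, lineage, eval_metrics):
        """Every version carries training lineage and evaluation evidence."""
        self.versions[version] = {
            "lineage": lineage,
            "metrics": eval_metrics,
            "status": "registered",
        }

    def promote(self, version):
        """Move a registered version into the serving slot."""
        if version not in self.versions:
            raise KeyError(f"unregistered version: {version}")
        self.previous, self.serving = self.serving, version
        self.versions[version]["status"] = "serving"

    def rollback(self):
        """Restore the previously serving version."""
        if self.previous is None:
            raise RuntimeError("no prior version to roll back to")
        self.serving, self.previous = self.previous, None
        self.versions[self.serving]["status"] = "serving"
```

The point of the sketch is the invariant: promotion is only possible for versions with recorded lineage and metrics, and rollback is a stored state transition rather than a redeploy scramble.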
Acceptance Signals
- Every customized model version has training lineage and evaluation evidence
- Promotion and rollback are automated
- The serving layer can run old and new versions side by side
- Retirement criteria are explicit, including stale data, lower accuracy, or unacceptable cost
Intuition Gained After Task 1.2
Task 1.2 teaches that model choice is really portfolio management. Different workloads deserve different models, and the routing policy is often more important than the individual model itself.
You also build the instinct that resilience must be designed at the FM layer, not bolted on later. A GenAI system without region, provider, or quality fallback paths is operationally fragile even if its application code is clean.
Finally, customization is only valuable when the organization can operate it. A tuned model without versioning, promotion gates, and rollback is not an asset. It is a hidden reliability risk.