
02: Task 1.2 Select and Configure Foundation Models

AIP-C01 Mapping

Content Domain 1: Foundation Model Integration, Data Management, and Compliance
Task 1.2: Select and configure FMs.


Task Goal

Choose the right model strategy for each business capability, decouple the application from any one provider, and ensure model operations remain resilient even when traffic spikes, regions fail, or customized models need to be replaced.


Task User Story

As a GenAI platform owner, I want to evaluate, route, customize, and operate foundation models through a policy-driven architecture, So that the business gets the best balance of quality, speed, cost, resilience, and maintainability.


Task Architecture View

graph TD
    A[Application Request] --> B[Model Policy Router]
    B --> C[AppConfig Rules]
    C --> D[Primary FM on Bedrock]
    C --> E[Fallback FM]
    C --> F[Cross-Region Inference]
    C --> G[Customized Model Endpoint]

    D --> H[Telemetry and Quality Tracking]
    E --> H
    F --> H
    G --> H

    H --> I[Model Registry and Lifecycle]
    I --> J[CI/CD Promotion and Rollback]

Skill 1.2.1: Assess and Choose FMs

User Story

As a GenAI architect, I want to evaluate candidate foundation models against the real needs of each workload, So that model choice becomes a disciplined engineering decision rather than a popularity contest.

Deep Dive

The right model depends on the task. The best general model is often the wrong operational choice.

Evaluation Axis | Why It Matters | Example Questions
Capability fit | Determines whether the model can actually perform the task | Does it handle long-context reasoning, tool use, or multimodal inputs?
Latency | Affects user experience and concurrency planning | Can it stay within the session SLA under peak traffic?
Cost | Impacts sustainability of the use case | Is the quality gain worth the additional token or endpoint cost?
Safety and control | Matters for regulated and customer-facing workflows | Can we apply guardrails and trace decisions cleanly?
Operational availability | Shapes reliability | Is the model available in the required regions, with sufficient quotas?
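
One way to keep this comparison disciplined is a weighted scorecard. The sketch below is purely illustrative: the axis weights, candidate labels, and 1-5 scores are hypothetical placeholders that a real evaluation would replace with benchmark results and telemetry data.

# Illustrative weighted scorecard for comparing candidate FMs.
# All weights, candidate labels, and scores are hypothetical.
WEIGHTS = {
    "capability_fit": 0.30,
    "latency": 0.20,
    "cost": 0.20,
    "safety_control": 0.15,
    "availability": 0.15,
}

candidates = {
    "haiku-class": {"capability_fit": 3, "latency": 5, "cost": 5,
                    "safety_control": 4, "availability": 5},
    "sonnet-class": {"capability_fit": 5, "latency": 3, "cost": 3,
                     "safety_control": 4, "availability": 4},
}

def score(axis_scores: dict) -> float:
    """Weighted sum of 1-5 axis scores."""
    return sum(WEIGHTS[axis] * value for axis, value in axis_scores.items())

for name, axes in candidates.items():
    print(f"{name}: {score(axes):.2f}")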

For MangaAssist, a sensible split might be:

  • Haiku-class model for classification, drafting, and simple FAQ responses
  • Sonnet-class model for grounded multi-turn support and recommendation synthesis
  • Embedding model chosen separately for search quality, not for brand alignment with the text-generation model (a routing sketch follows this list)
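
A minimal expression of that split is a task-to-model routing table. The model identifiers below are placeholders, not real Bedrock model IDs.

# Hypothetical task-to-model portfolio for MangaAssist.
# Model identifiers are placeholders, not real Bedrock model IDs.
MODEL_PORTFOLIO = {
    "classification": "haiku-class-model",
    "draft_reply": "haiku-class-model",
    "faq": "haiku-class-model",
    "grounded_support": "sonnet-class-model",
    "recommendation": "sonnet-class-model",
    "embedding": "embedding-model-tuned-for-search",
}

def model_for(task_type: str) -> str:
    # Defaulting to the cheaper tier keeps unknown task types from
    # silently landing on the most expensive model.
    return MODEL_PORTFOLIO.get(task_type, "haiku-class-model")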

Acceptance Signals

  • Model-selection decisions include benchmarks, tradeoffs, and business implications
  • Each model has a defined role instead of being used everywhere
  • Selection criteria include both qualitative review and measurable performance data
  • Limitations are documented up front, including known weak intents or languages

Skill 1.2.2: Create Flexible Architecture Patterns for Dynamic Model Selection

User Story

As a platform engineer, I want to switch models or providers through configuration rather than code edits, So that the system can adapt quickly to pricing, outages, policy changes, or quality findings.

Deep Dive

The application should depend on a stable capability contract, not on a provider-specific request body.

Layer | Responsibility | Good Pattern
API Gateway | Stable client entry point | Keeps callers insulated from backend FM changes
Lambda or orchestration layer | Normalize request schema and route by policy | Maps use case to provider/model dynamically
AppConfig | Holds routing rules, thresholds, and feature flags | Lets teams change default models without deployment
Adapter layer | Converts standard request to provider-specific payload | Avoids vendor-specific code spreading through the app
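
Below is a minimal adapter sketch, assuming boto3 and the Bedrock Converse API. The standard_request shape, default temperature, and token cap are this guide's illustrative contract, not an AWS-defined type; error handling and streaming are omitted.

# Minimal adapter sketch: convert a provider-neutral request into a
# Bedrock Converse call. The standard_request keys are illustrative.
import boto3

bedrock = boto3.client("bedrock-runtime")

def invoke(standard_request: dict, model_id: str) -> str:
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user",
                   "content": [{"text": standard_request["prompt"]}]}],
        inferenceConfig={
            "temperature": standard_request.get("temperature", 0.2),
            "maxTokens": standard_request.get("max_tokens", 512),
        },
    )
    # Converse responses nest the text under output.message.content.
    return response["output"]["message"]["content"][0]["text"]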

Design Pattern

  • Define a common request contract: task_type, latency_tier, safety_level, context_window, response_format
  • Resolve model choice through config at runtime
  • Support canary traffic percentages per model
  • Log routing decisions so quality and cost can be attributed later (see the sketch after this list)
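
A minimal sketch of that pattern follows, assuming routing rules stored as JSON in AWS AppConfig. The application, environment, and profile identifiers, along with the rule key shape, are hypothetical; a real service would cache the session token and poll on an interval rather than start a session per request.

# Sketch of runtime model resolution from AWS AppConfig.
from dataclasses import dataclass
import json
import boto3

appconfig = boto3.client("appconfigdata")

@dataclass
class ModelRequest:
    # The common request contract described above.
    task_type: str
    latency_tier: str
    safety_level: str
    context_window: int
    response_format: str

def load_routing_rules(app: str, env: str, profile: str) -> dict:
    """Fetch the latest routing-rules JSON from AppConfig."""
    session = appconfig.start_configuration_session(
        ApplicationIdentifier=app,
        EnvironmentIdentifier=env,
        ConfigurationProfileIdentifier=profile,
    )
    latest = appconfig.get_latest_configuration(
        ConfigurationToken=session["InitialConfigurationToken"]
    )
    return json.loads(latest["Configuration"].read())

def resolve_model(request: ModelRequest, rules: dict) -> str:
    # Hypothetical rule shape: keys like "faq:fast" mapped to model IDs,
    # plus a "default_model" catch-all. Canary percentages and
    # safety_level checks could hook in here.
    return rules.get(f"{request.task_type}:{request.latency_tier}",
                     rules["default_model"])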

Acceptance Signals

  • A model swap can happen through configuration or deployment metadata
  • Callers do not need code changes to switch providers or versions
  • Routing decisions are observable and reversible
  • The abstraction preserves important controls such as temperature, max tokens, and schema constraints

Skill 1.2.3: Design Resilient AI Systems for Continuous Operation

User Story

As a reliability-focused architect, I want to design model-serving paths that continue operating during provider, region, or quota disruptions, So that user-facing experiences degrade gracefully instead of failing completely.

Deep Dive

GenAI resilience is not just retry logic. It is a fallback ladder.

graph LR
    A[Primary Model Call] --> B{Success?}
    B -->|Yes| C[Return Result]
    B -->|No| D[Circuit Breaker]
    D --> E[Cross-Region Inference]
    E --> F{Recovered?}
    F -->|Yes| C
    F -->|No| G[Fallback Model]
    G --> H{Still Failing?}
    H -->|No| C
    H -->|Yes| I[Graceful Degradation]

Resilience Techniques

  • Circuit breaker around repeated provider failures
  • Cross-Region Inference in Bedrock to route around regional capacity and availability limits
  • Secondary model for lower-tier answers or summarization-only mode
  • Cached responses, templates, or human escalation for the last-resort path (a fallback-ladder sketch follows this list)
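
A compact sketch of the ladder, with the circuit breaker guarding only the primary path. The thresholds and the three call functions are illustrative stand-ins for real primary, cross-region, and fallback model invocations.

# Fallback-ladder sketch matching the diagram above.
import time

class CircuitBreaker:
    def __init__(self, max_failures: int = 3, cooldown_s: float = 30.0):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = 0.0

    def is_open(self) -> bool:
        # Open (skipping the primary) until the cooldown elapses.
        return (self.failures >= self.max_failures
                and time.monotonic() - self.opened_at < self.cooldown_s)

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()

    def record_success(self) -> None:
        self.failures = 0

def answer(prompt, primary, cross_region, fallback, breaker):
    # Skip the primary while its breaker is open, then walk the ladder.
    ladder = ([] if breaker.is_open() else [("primary", primary)])
    ladder += [("cross_region", cross_region), ("fallback", fallback)]
    for name, call in ladder:
        try:
            result = call(prompt)
            if name == "primary":
                breaker.record_success()
            return result
        except Exception:
            if name == "primary":
                breaker.record_failure()
    # Last resort: graceful degradation instead of a hard failure.
    return "Service is degraded; a support agent will follow up."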

Acceptance Signals

  • The system defines primary, secondary, and degraded response modes
  • Retry behavior avoids uncontrolled cost and latency explosion
  • Cross-region or alternate-provider fallback is tested, not just diagrammed
  • Monitoring distinguishes provider outage, quota exhaustion, and application bugs

Skill 1.2.4: Implement FM Customization Deployment and Lifecycle Management

User Story

As a model platform owner, I want to manage customized foundation models from adaptation to retirement, So that domain-specific improvements can be delivered safely with version control, rollback, and governance.

Deep Dive

Customized models should be treated like productized assets, not one-time experiments.

Lifecycle Stage | What Must Happen | Typical AWS Support
Experimentation | Compare base model vs tuned variant | SageMaker training jobs, offline evaluation
Registration | Store version, lineage, metrics, and approval status | SageMaker Model Registry
Deployment | Promote to endpoint or managed serving path | SageMaker endpoints, pipelines
Release control | Gradual rollout and rollback | CI/CD, AppConfig, blue-green or shadow deployment
Retirement | Replace stale or weak models cleanly | Version policy, archive and decommission workflow
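
A sketch of the registration and promotion gate, assuming the SageMaker Model Registry via boto3. The group name, description, image URI, and artifact path are placeholders, and a real pipeline would attach evaluation metrics and lineage before flipping the approval status.

# Registry sketch: register a tuned variant, then gate promotion.
import boto3

sm = boto3.client("sagemaker")

package = sm.create_model_package(
    ModelPackageGroupName="manga-assist-support-fm",  # placeholder group
    ModelPackageDescription="Tuned variant, offline eval attached",
    ModelApprovalStatus="PendingManualApproval",
    InferenceSpecification={
        "Containers": [{
            "Image": "<inference-image-uri>",          # placeholder
            "ModelDataUrl": "s3://<bucket>/model.tar.gz",
        }],
        "SupportedContentTypes": ["application/json"],
        "SupportedResponseMIMETypes": ["application/json"],
    },
)

# Promotion gate: approve only after evaluation evidence is reviewed.
sm.update_model_package(
    ModelPackageArn=package["ModelPackageArn"],
    ModelApprovalStatus="Approved",
)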

Customization choices should be proportional to the problem:

  • Use prompting or retrieval first if they solve the issue
  • Use LoRA or adapters when domain lift is needed without retraining everything
  • Use heavier fine-tuning only when repeated evidence shows prompting and RAG are not enough

Acceptance Signals

  • Every customized model version has training lineage and evaluation evidence
  • Promotion and rollback are automated
  • The serving layer can run old and new versions side by side
  • Retirement criteria are explicit, including stale data, lower accuracy, or unacceptable cost

Intuition Gained After Task 1.2

Task 1.2 teaches that model choice is really portfolio management. Different workloads deserve different models, and the routing policy is often more important than the individual model itself.

You also build the instinct that resilience must be designed at the FM layer, not bolted on later. A GenAI system without region, provider, or quality fallback paths is operationally fragile even if its application code is clean.

Finally, customization is only valuable when the organization can operate it. A tuned model without versioning, promotion gates, and rollback is not an asset. It is a hidden reliability risk.

