02: Task 1.2 Select and Configure Foundation Models
AIP-C01 Mapping
Content Domain 1: Foundation Model Integration, Data Management, and Compliance Task 1.2: Select and configure FMs.
Task Goal
Choose the right model strategy for each business capability, decouple the application from any one provider, and ensure model operations remain resilient even when traffic spikes, regions fail, or customized models need to be replaced.
Task User Story
As a GenAI platform owner, I want to evaluate, route, customize, and operate foundation models through policy-driven architecture, So that the business gets the best balance of quality, speed, cost, resilience, and maintainability.
Task Architecture View
```mermaid
graph TD
    A[Application Request] --> B[Model Policy Router]
    B --> C[AppConfig Rules]
    C --> D[Primary FM on Bedrock]
    C --> E[Fallback FM]
    C --> F[Cross-Region Inference]
    C --> G[Customized Model Endpoint]
    D --> H[Telemetry and Quality Tracking]
    E --> H
    F --> H
    G --> H
    H --> I[Model Registry and Lifecycle]
    I --> J[CI/CD Promotion and Rollback]
```
Skill 1.2.1: Assess and Choose FMs
User Story
As a GenAI architect, I want to evaluate candidate foundation models against the real needs of each workload, So that model choice becomes a disciplined engineering decision rather than a popularity contest.
Deep Dive
The right model depends on the task. The best general model is often the wrong operational choice.
| Evaluation Axis | Why It Matters | Example Questions |
|---|---|---|
| Capability fit | Determines whether the model can actually perform the task | Does it handle long-context reasoning, tool use, or multimodal inputs? |
| Latency | Affects user experience and concurrency planning | Can it stay within the session SLA under peak traffic? |
| Cost | Impacts sustainability of the use case | Is the quality gain worth the additional token or endpoint cost? |
| Safety and control | Matters for regulated and customer-facing workflows | Can we apply guardrails and trace decisions cleanly? |
| Operational availability | Shapes reliability | Is the model available in the required regions and quotas? |
For MangaAssist, a sensible split might be:
- Haiku-class model for classification, drafting, and simple FAQ responses
- Sonnet-class model for grounded multi-turn support and recommendation synthesis
- An embedding model optimized separately for search quality, rather than chosen to match the brand of the text-generation models
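The evaluation axes above can be captured as a weighted scorecard so that model choice is a documented, repeatable decision. This is a minimal sketch; the weights, model names, and 1-5 ratings are illustrative placeholders, not real benchmark data.

```python
# Weighted scorecard for model selection. All weights and ratings
# below are illustrative, not real benchmark results.

CRITERIA_WEIGHTS = {
    "capability_fit": 0.35,
    "latency": 0.20,
    "cost": 0.20,
    "safety_control": 0.15,
    "availability": 0.10,
}

# Hypothetical 1-5 ratings per candidate model for one workload.
CANDIDATES = {
    "haiku-class": {"capability_fit": 3, "latency": 5, "cost": 5,
                    "safety_control": 4, "availability": 5},
    "sonnet-class": {"capability_fit": 5, "latency": 3, "cost": 3,
                     "safety_control": 4, "availability": 5},
}

def weighted_score(ratings: dict) -> float:
    """Combine per-axis ratings into a single comparable score."""
    return round(sum(CRITERIA_WEIGHTS[axis] * value
                     for axis, value in ratings.items()), 3)
```

Ranking candidates this way makes the tradeoff explicit: for a classification-heavy workload with these weights, the cheaper, faster model can outscore the more capable one.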
Acceptance Signals
- Model-selection decisions include benchmarks, tradeoffs, and business implications
- Each model has a defined role instead of being used everywhere
- Selection criteria include both qualitative review and measurable performance data
- Limitations are documented up front, including known weak intents or languages
Skill 1.2.2: Create Flexible Architecture Patterns for Dynamic Model Selection
User Story
As a platform engineer, I want to switch models or providers through configuration rather than code edits, So that the system can adapt quickly to pricing, outages, policy changes, or quality findings.
Deep Dive
The application should depend on a stable capability contract, not on a provider-specific request body.
| Layer | Responsibility | Good Pattern |
|---|---|---|
| API Gateway | Stable client entry point | Keeps callers insulated from backend FM changes |
| Lambda or orchestration layer | Normalize request schema and route by policy | Maps use case to provider/model dynamically |
| AppConfig | Holds routing rules, thresholds, and feature flags | Lets teams change default models without deployment |
| Adapter layer | Converts standard request to provider-specific payload | Avoids vendor-specific code spreading through the app |
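The adapter layer in the table above can be sketched as a set of payload builders behind one neutral request type. The field names and provider keys here are hypothetical, not real provider schemas.

```python
# Adapter layer sketch: one neutral request shape, per-provider payload
# builders. Field names are illustrative, not real provider schemas.
from dataclasses import dataclass

@dataclass
class NeutralRequest:
    prompt: str
    max_tokens: int
    temperature: float

def to_provider_a(req: NeutralRequest) -> dict:
    # Hypothetical provider A expects a chat-style body.
    return {
        "messages": [{"role": "user", "content": req.prompt}],
        "max_tokens": req.max_tokens,
        "temperature": req.temperature,
    }

def to_provider_b(req: NeutralRequest) -> dict:
    # Hypothetical provider B expects a flat completion body.
    return {
        "input_text": req.prompt,
        "maxTokenCount": req.max_tokens,
        "temperature": req.temperature,
    }

ADAPTERS = {"provider-a": to_provider_a, "provider-b": to_provider_b}

def build_payload(provider: str, req: NeutralRequest) -> dict:
    """Keep vendor-specific request shapes out of application code."""
    return ADAPTERS[provider](req)
```

Because only the adapter functions know provider-specific field names, adding or swapping a provider means adding one builder, not touching every caller.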
Design Pattern
- Define a common request contract: `task_type`, `latency_tier`, `safety_level`, `context_window`, `response_format`
- Resolve model choice through config at runtime
- Support canary traffic percentages per model
- Log routing decisions so quality and cost can be attributed later
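The steps above can be sketched as a small config-driven router with per-model canary weights. The config shape and model IDs are illustrative; in production this structure would be fetched from a store such as AWS AppConfig rather than hardcoded.

```python
# Config-driven routing sketch with canary traffic percentages.
# Model IDs and config shape are illustrative placeholders.
import random

ROUTING_CONFIG = {
    "faq": {
        "primary": {"model_id": "provider-a.haiku", "weight": 0.9},
        "canary": {"model_id": "provider-a.haiku-next", "weight": 0.1},
    },
    "support_chat": {
        "primary": {"model_id": "provider-a.sonnet", "weight": 1.0},
    },
}

def resolve_model(task_type: str, rng=random.random) -> str:
    """Pick a model ID for a task type, honoring canary weights."""
    rules = ROUTING_CONFIG[task_type]
    roll, cumulative = rng(), 0.0
    for route in rules.values():
        cumulative += route["weight"]
        if roll < cumulative:
            return route["model_id"]
    # Defensive default if weights do not sum to 1.0.
    return rules["primary"]["model_id"]
```

Because the decision is a pure function of config plus a random roll, shifting canary traffic from 10% to 50% is a config change, not a deployment, and each routing decision can be logged for cost and quality attribution.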
Acceptance Signals
- A model swap can happen through configuration or deployment metadata
- Callers do not need code changes to switch providers or versions
- Routing decisions are observable and reversible
- The abstraction preserves important controls such as temperature, max tokens, and schema constraints
Skill 1.2.3: Design Resilient AI Systems for Continuous Operation
User Story
As a reliability-focused architect, I want to design model-serving paths that continue operating during provider, region, or quota disruptions, So that user-facing experiences degrade gracefully instead of failing completely.
Deep Dive
GenAI resilience is not just retry logic. It is a fallback ladder.
```mermaid
graph LR
    A[Primary Model Call] --> B{Success?}
    B -->|Yes| C[Return Result]
    B -->|No| D[Circuit Breaker]
    D --> E[Cross-Region Inference]
    E --> F{Recovered?}
    F -->|Yes| C
    F -->|No| G[Fallback Model]
    G --> H{Still Failing?}
    H -->|No| C
    H -->|Yes| I[Graceful Degradation]
```
Resilience Techniques
- Circuit breaker around repeated provider failures
- Cross-Region Inference in Bedrock to route around limited regional capacity or availability
- Secondary model for lower-tier answers or summarization-only mode
- Cached responses, templates, or human escalation for the last-resort path
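The techniques above can be combined into a minimal fallback ladder, sketched here with a consecutive-failure circuit breaker per route. The caller functions stand in for real inference clients; production code would also handle timeouts, half-open probing, and cost-aware retry budgets.

```python
# Fallback ladder sketch: primary -> cross-region -> fallback model ->
# graceful degradation. Callers are stand-ins for real inference clients.

class CircuitBreaker:
    """Opens after `threshold` consecutive failures, skipping the route."""
    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.threshold

    def record(self, success: bool) -> None:
        self.failures = 0 if success else self.failures + 1


def call_with_fallbacks(prompt, routes, degraded="Please try again shortly."):
    """Walk the ladder: skip open circuits, return the first success."""
    for name, caller, breaker in routes:
        if breaker.is_open:
            continue
        try:
            result = caller(prompt)
            breaker.record(True)
            return name, result
        except Exception:
            breaker.record(False)
    # Last-resort path: cached response, template, or human escalation.
    return "degraded", degraded
```

Returning the route name alongside the result is what lets monitoring distinguish "primary served this" from "we degraded gracefully," which the acceptance signals below depend on.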
Acceptance Signals
- The system defines primary, secondary, and degraded response modes
- Retry behavior avoids uncontrolled cost and latency explosion
- Cross-region or alternate-provider fallback is tested, not just diagrammed
- Monitoring distinguishes provider outage, quota exhaustion, and application bugs
Skill 1.2.4: Implement FM Customization Deployment and Lifecycle Management
User Story
As a model platform owner, I want to manage customized foundation models from adaptation to retirement, So that domain-specific improvements can be delivered safely with version control, rollback, and governance.
Deep Dive
Customized models should be treated like productized assets, not one-time experiments.
| Lifecycle Stage | What Must Happen | Typical AWS Support |
|---|---|---|
| Experimentation | Compare base model vs tuned variant | SageMaker training jobs, offline evaluation |
| Registration | Store version, lineage, metrics, and approval status | SageMaker Model Registry |
| Deployment | Promote to endpoint or managed serving path | SageMaker endpoints, pipelines |
| Release control | Gradual rollout and rollback | CI/CD, AppConfig, blue-green or shadow deployment |
| Retirement | Replace stale or weak models cleanly | Version policy, archive and decommission workflow |
Customization choices should be proportional to the problem:
- Use prompting or retrieval first if they solve the issue
- Use LoRA or adapters when domain lift is needed without retraining everything
- Use heavier fine-tuning only when repeated evidence shows prompting and RAG are not enough
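The lifecycle stages in the table above can be illustrated with a toy registry that tracks versions, the serving slot, and one-step rollback. In AWS practice this role is played by SageMaker Model Registry plus CI/CD; the plain-Python sketch below only demonstrates the state transitions a real registry must support.

```python
# Toy model-version registry: register -> promote -> rollback.
# Illustrates lifecycle state only; not a real registry client.

class ModelRegistry:
    def __init__(self):
        self.versions = {}   # version -> lineage, metrics, status
        self.serving = None  # currently promoted version
        self.previous = None # enables one-step rollback

    def register(self, version, lineage, eval_metrics):
        """Every version carries training lineage and evaluation evidence."""
        self.versions[version] = {
            "lineage": lineage,
            "metrics": eval_metrics,
            "status": "registered",
        }

    def promote(self, version):
        """Move a registered version into the serving slot."""
        if version not in self.versions:
            raise KeyError(f"unregistered version: {version}")
        self.previous, self.serving = self.serving, version
        self.versions[version]["status"] = "serving"

    def rollback(self):
        """Restore the previously serving version."""
        if self.previous is None:
            raise RuntimeError("no prior version to roll back to")
        self.serving, self.previous = self.previous, None
        self.versions[self.serving]["status"] = "serving"
```

The point of the sketch is the invariant: promotion is only possible for versions with recorded lineage and metrics, and rollback is a stored state transition rather than a redeploy scramble.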
Acceptance Signals
- Every customized model version has training lineage and evaluation evidence
- Promotion and rollback are automated
- The serving layer can run old and new versions side by side
- Retirement criteria are explicit, including stale data, lower accuracy, or unacceptable cost
Intuition Gained After Task 1.2
Task 1.2 teaches that model choice is really portfolio management. Different workloads deserve different models, and the routing policy is often more important than the individual model itself.
You also build the instinct that resilience must be designed at the FM layer, not bolted on later. A GenAI system without region, provider, or quality fallback paths is operationally fragile even if its application code is clean.
Finally, customization is only valuable when the organization can operate it. A tuned model without versioning, promotion gates, and rollback is not an asset. It is a hidden reliability risk.