The Three Pillars of AI-First Production Engineering
Every user story in this folder maps back to one of three pillars. Knowing which pillar a problem lives in tells you which playbook to reach for.
Pillar 1 — AI Workflow Design
What it is: How execution flows through the agent, how skills are composed, how sub-agents coordinate, how evals close the loop.
This is the part most teams skip. They write a prompt, wrap it in a Lambda, and ship. Then prod hits them with: "what happens when tool 3 of 5 fails halfway?", "how do you trace a request that fanned out to 6 sub-agents?", "how do you eval a workflow that takes 90 seconds end-to-end?"
Sub-concepts
- Execution graph — DAG vs state machine vs free-form ReAct loop. (See User-Stories/01-execution-flow-design.md.)
- Skill composition — How tools/skills are registered, discovered, invoked, and chained. (See 02-skill-composition-and-invocation.md.)
- Sub-agent orchestration — Parent–child agent contracts, context handoff, blast radius isolation. (See 03-sub-agent-orchestration.md.)
- Active + passive evals — Online judges, shadow runs, replay harnesses, drift detection. (See 04-active-passive-evals.md.)
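To make "execution graph" concrete, here is a minimal sketch of a turn modeled as an explicit state machine rather than a free-form loop. Every name in it (`Step`, `TurnState`, `catalog_search`) is hypothetical illustration, not the code behind the user stories:

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable

class Step(Enum):
    PLAN = auto()
    CALL_TOOL = auto()
    SYNTHESIZE = auto()
    DONE = auto()

@dataclass
class TurnState:
    user_msg: str
    tool_results: list = field(default_factory=list)
    answer: str | None = None

# Each handler does one unit of work and names the next step explicitly,
# so the graph is inspectable (and checkpointable) instead of an opaque loop.
Handler = Callable[[TurnState], Step]

def plan(state: TurnState) -> Step:
    # Decide which tools the turn needs (a trivial single-tool plan here).
    return Step.CALL_TOOL

def call_tool(state: TurnState) -> Step:
    state.tool_results.append({"tool": "catalog_search", "ok": True})
    return Step.SYNTHESIZE

def synthesize(state: TurnState) -> Step:
    state.answer = f"answer built from {len(state.tool_results)} tool result(s)"
    return Step.DONE

HANDLERS: dict[Step, Handler] = {
    Step.PLAN: plan,
    Step.CALL_TOOL: call_tool,
    Step.SYNTHESIZE: synthesize,
}

def run_turn(state: TurnState) -> TurnState:
    step = Step.PLAN
    while step is not Step.DONE:
        step = HANDLERS[step](state)  # explicit transition = traceable edge
    return state
```

The payoff: every transition is a named edge you can trace, checkpoint, and eval against — which is exactly what the harness in Pillar 2 needs from you.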
Pillar 2 — System Design (the harness)
What it is: The infrastructure that makes long-running, externally dependent, multi-step LLM workflows reliable at scale.
This is where AI-first looks the most like classic distributed systems — but with new twists. A single user turn in MangaAssist may invoke 8 tools, span 4 regions, write to 3 datastores, and take 14 seconds. Standard request-response thinking does not survive contact with that.
Sub-concepts
- Pause / resume workflows — Long-tail tool calls (catalog reindex check, human-in-the-loop approvals). (See 05-pause-resume-workflows.md.)
- Checkpointing + serving — Durable state for in-flight conversations across deploys, instance failures, model swaps. (See 06-checkpointing-and-serving.md.)
- Sync vs async invocation — When the user waits, when the system queues, how streaming bridges them. (See 07-sync-async-invocation.md.)
- Observability, cost tracking, versioning, rate limiting, quotas — All operational concerns rolled into one cross-cutting story. (See 08-observability-cost-versioning-ratelimits.md.)
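A minimal sketch of the pause/resume + checkpointing idea, assuming a hypothetical file-backed store standing in for something durable (DynamoDB, S3, a workflow engine). None of these names come from the user stories:

```python
import json
import uuid
from pathlib import Path

CHECKPOINT_DIR = Path("/tmp/checkpoints")  # stand-in for a durable store

def checkpoint(workflow_id: str, state: dict) -> None:
    # Persist the full turn state before yielding control, so a deploy or
    # instance failure mid-turn costs a resume, not a lost conversation.
    CHECKPOINT_DIR.mkdir(parents=True, exist_ok=True)
    (CHECKPOINT_DIR / f"{workflow_id}.json").write_text(json.dumps(state))

def pause_for_approval(state: dict) -> str:
    workflow_id = state.get("workflow_id") or str(uuid.uuid4())
    state["workflow_id"] = workflow_id
    state["status"] = "waiting_for_human"
    checkpoint(workflow_id, state)
    return workflow_id  # handed to the approval system as a resume token

def resume(workflow_id: str, approved: bool) -> dict:
    state = json.loads((CHECKPOINT_DIR / f"{workflow_id}.json").read_text())
    state["status"] = "approved" if approved else "rejected"
    return state  # picked up by the executor at the next graph step
```

The design choice worth noticing: the resume token is the only thing the external system holds, so the harness stays free to redeploy, rebalance, or swap models while the turn is parked.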
Pillar 3 — Low-Level Design (extensibility & boundaries)
What it is: The interfaces, contracts, and abstractions that decide whether the system can grow from 7 tools to 70 without rewriting itself.
LLD becomes life-and-death in AI systems because the surface area expands fast: new tool, new model, new locale, new safety check, new fallback. Over-abstraction kills you (rigid framework that can't model the new tool); under-abstraction kills you (every new tool needs a new branch in 14 files).
Sub-concepts
- Tool/skill interface contract — Inputs, outputs, errors, idempotency, timeouts.
- Provider adapter pattern — Bedrock, OpenAI, Anthropic direct, in-house FM behind one shape.
- Fallback chains — Primary → secondary → cached → rule-based, each with its own SLO.
- Capability flags — Which tools/models are allowed for which user tier / locale / experiment.
These are woven throughout the user stories rather than getting their own file — they show up as "the LLD that makes this story implementable."
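Here is a sketch of what three of those pieces can look like when reduced to code. Everything in it (`Skill`, `FallbackChain`, `allowed_skills`) is a hypothetical shape, not a prescribed framework:

```python
from typing import Any, Protocol

class Skill(Protocol):
    """The contract every tool/skill must satisfy: typed I/O, a declared
    timeout, and an idempotency hint the harness can use on retries."""
    name: str
    timeout_s: float
    idempotent: bool

    def invoke(self, args: dict[str, Any]) -> dict[str, Any]: ...

class FallbackChain:
    """Try links in order (primary -> secondary -> cached/rule-based).
    Every link exposes the same invoke() signature, so callers never
    know — or care — which one actually fired."""
    def __init__(self, *links: Skill) -> None:
        self.links = links

    def invoke(self, args: dict[str, Any]) -> dict[str, Any]:
        last_err: Exception | None = None
        for skill in self.links:
            try:
                return skill.invoke(args)
            except Exception as err:  # each link maps its own error space
                last_err = err
        raise RuntimeError("all fallbacks exhausted") from last_err

def allowed_skills(registry: dict[str, Skill], flags: set[str]) -> dict[str, Skill]:
    # Capability flags: filter the registry per user tier / locale / experiment.
    return {name: s for name, s in registry.items() if name in flags}
```

Note how the fallback chain and the capability filter both lean on the same contract: that single `Skill` shape is what lets the system grow from 7 tools to 70 without adding a branch in 14 files.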
How the pillars interact
```mermaid
flowchart TB
  subgraph P1[Pillar 1 - AI Workflow]
    EG[Execution Graph]
    SK[Skill Composition]
    SA[Sub-Agent Orchestration]
    EV[Evals]
  end
  subgraph P2[Pillar 2 - System Design / Harness]
    PR[Pause-Resume]
    CP[Checkpoints]
    SYNC[Sync-Async]
    OB[Observability + Cost + Quota]
  end
  subgraph P3[Pillar 3 - Low-Level Design]
    TC[Tool Contracts]
    AD[Provider Adapters]
    FB[Fallback Chains]
    CF[Capability Flags]
  end
  EG --> PR
  SK --> TC
  SA --> CP
  EV --> OB
  TC --> AD
  AD --> FB
  PR --> SYNC
  CP --> SYNC
  OB --> CF
  CF --> SK
```
Read the arrows as: a decision in one pillar forces decisions in the others. The execution graph (P1) defines what state the harness (P2) has to checkpoint, which in turn constrains which tool contracts (P3) have to be durable.
The "AI-pilled" gradient
The user message that prompted this folder said "I am not as AI-pilled as some of the other folks in the org, but I am getting there." That's worth naming directly. The gradient looks like this:
| Stage | Mindset | Symptom |
|---|---|---|
| 0. API-first | "LLM is a function call" | Prompt-in, string-out, no harness |
| 1. Workflow-aware | "LLM has multi-step turns" | Adds retries, tool calls, simple state |
| 2. Eval-aware | "We have to measure quality" | Builds offline eval set, ships canary |
| 3. Harness-aware | "Long-running is the default" | Pause/resume, checkpoints, fallback chains |
| 4. AI-first | "The harness IS the product" | Treats prompts as low-stakes config; treats orchestration as the hard part |
This folder is written to push you from Stage 2 → Stage 4. Each user story names which stage it unlocks.
How constraints change everything (preview)
The second half of this folder (Changing-Constraints-Scenarios/) takes each pillar and asks: what happens when the ground shifts?
| Constraint change | Pillar most disrupted |
|---|---|
| 10× user surge overnight | P2 — harness must absorb burst |
| FM deprecated by provider | P3 — adapter layer is put to the test |
| Cost budget halved | All three — need workflow, harness, and LLD pivots |
| Latency SLA tightened | P1 + P2 — graph reshape + serving rework |
| New compliance rule | P1 (eval) + P3 (capability flags) |
| Tool count 7 → 70 | P3 dominates — LLD is the bottleneck |
| Provider quota revoked | P3 (adapters) + P2 (rate limit / quota mgr) |
| New locale launched in 2 weeks | All three — ground truth, infra, contracts |
The intuition you build by walking these scenarios is: architecture is a stance under uncertainty, not a static drawing.