The Three Pillars of AI-First Production Engineering
Every user story in this folder maps back to one of three pillars. Knowing which pillar a problem lives in tells you which playbook to reach for.
Pillar 1 — AI Workflow Design
What it is: How execution flows through the agent, how skills are composed, how sub-agents coordinate, how evals close the loop.
This is the part most teams skip. They write a prompt, wrap it in a Lambda, and ship. Then prod hits them with: "what happens when tool 3 of 5 fails halfway?", "how do you trace a request that fanned out to 6 sub-agents?", "how do you eval a workflow that takes 90 seconds end-to-end?"
Sub-concepts
- Execution graph — DAG vs state machine vs free-form ReAct loop. (See User-Stories/01-execution-flow-design.md.)
- Skill composition — How tools/skills are registered, discovered, invoked, and chained. (See 02-skill-composition-and-invocation.md.)
- Sub-agent orchestration — Parent–child agent contracts, context handoff, blast radius isolation. (See 03-sub-agent-orchestration.md.)
- Active + passive evals — Online judges, shadow runs, replay harnesses, drift detection. (See 04-active-passive-evals.md.)
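To make "execution graph" concrete, here is a minimal sketch of a turn modeled as an explicit state machine rather than a free-form loop. Every name in it (`Step`, `TurnState`, `catalog_search`) is hypothetical illustration, not the code behind the user stories:

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable

class Step(Enum):
    PLAN = auto()
    CALL_TOOL = auto()
    SYNTHESIZE = auto()
    DONE = auto()

@dataclass
class TurnState:
    user_msg: str
    tool_results: list = field(default_factory=list)
    answer: str | None = None

# Each handler does one unit of work and names the next step explicitly,
# so the graph is inspectable (and checkpointable) instead of an opaque loop.
Handler = Callable[[TurnState], Step]

def plan(state: TurnState) -> Step:
    # Decide which tools the turn needs (a trivial single-tool plan here).
    return Step.CALL_TOOL

def call_tool(state: TurnState) -> Step:
    state.tool_results.append({"tool": "catalog_search", "ok": True})
    return Step.SYNTHESIZE

def synthesize(state: TurnState) -> Step:
    state.answer = f"answer built from {len(state.tool_results)} tool result(s)"
    return Step.DONE

HANDLERS: dict[Step, Handler] = {
    Step.PLAN: plan,
    Step.CALL_TOOL: call_tool,
    Step.SYNTHESIZE: synthesize,
}

def run_turn(state: TurnState) -> TurnState:
    step = Step.PLAN
    while step is not Step.DONE:
        step = HANDLERS[step](state)  # explicit transition = traceable edge
    return state
```

The payoff: every transition is a named edge you can trace, checkpoint, and eval against — which is exactly what the harness in Pillar 2 needs from you.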
Pillar 2 — System Design (the harness)
What it is: The infrastructure that makes long-running, externally dependent, multi-step LLM workflows reliable at scale.
This is where AI-first looks the most like classic distributed systems — but with new twists. A single user turn in MangaAssist may invoke 8 tools, span 4 regions, write to 3 datastores, and take 14 seconds. Standard request-response thinking does not survive contact with that.
Sub-concepts
- Pause / resume workflows — Long-tail tool calls (catalog reindex check, human-in-the-loop approvals). (See 05-pause-resume-workflows.md.)
- Checkpointing + serving — Durable state for in-flight conversations across deploys, instance failures, model swaps. (See 06-checkpointing-and-serving.md.)
- Sync vs async invocation — When the user waits, when the system queues, how streaming bridges them. (See 07-sync-async-invocation.md.)
- Observability, cost tracking, versioning, rate limiting, quotas — All operational concerns rolled into one cross-cutting story. (See 08-observability-cost-versioning-ratelimits.md.)
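A minimal sketch of the pause/resume + checkpointing idea, assuming a hypothetical file-backed store standing in for something durable (DynamoDB, S3, a workflow engine). None of these names come from the user stories:

```python
import json
import uuid
from pathlib import Path

CHECKPOINT_DIR = Path("/tmp/checkpoints")  # stand-in for a durable store

def checkpoint(workflow_id: str, state: dict) -> None:
    # Persist the full turn state before yielding control, so a deploy or
    # instance failure mid-turn costs a resume, not a lost conversation.
    CHECKPOINT_DIR.mkdir(parents=True, exist_ok=True)
    (CHECKPOINT_DIR / f"{workflow_id}.json").write_text(json.dumps(state))

def pause_for_approval(state: dict) -> str:
    workflow_id = state.get("workflow_id") or str(uuid.uuid4())
    state["workflow_id"] = workflow_id
    state["status"] = "waiting_for_human"
    checkpoint(workflow_id, state)
    return workflow_id  # handed to the approval system as a resume token

def resume(workflow_id: str, approved: bool) -> dict:
    state = json.loads((CHECKPOINT_DIR / f"{workflow_id}.json").read_text())
    state["status"] = "approved" if approved else "rejected"
    return state  # picked up by the executor at the next graph step
```

The design choice worth noticing: the resume token is the only thing the external system holds, so the harness stays free to redeploy, rebalance, or swap models while the turn is parked.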
Pillar 3 — Low-Level Design (extensibility & boundaries)
What it is: The interfaces, contracts, and abstractions that decide whether the system can grow from 7 tools to 70 without rewriting itself.
LLD becomes life-and-death in AI systems because the surface area expands fast: new tool, new model, new locale, new safety check, new fallback. Over-abstraction kills you (rigid framework that can't model the new tool); under-abstraction kills you (every new tool needs a new branch in 14 files).
Sub-concepts
- Tool/skill interface contract — Inputs, outputs, errors, idempotency, timeouts.
- Provider adapter pattern — Bedrock, OpenAI, Anthropic direct, in-house FM behind one shape.
- Fallback chains — Primary → secondary → cached → rule-based, each with its own SLO.
- Capability flags — Which tools/models are allowed for which user tier / locale / experiment.
These are woven throughout the user stories rather than getting their own file — they show up as "the LLD that makes this story implementable."
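Here is a sketch of what three of those pieces can look like when reduced to code. Everything in it (`Skill`, `FallbackChain`, `allowed_skills`) is a hypothetical shape, not a prescribed framework:

```python
from typing import Any, Protocol

class Skill(Protocol):
    """The contract every tool/skill must satisfy: typed I/O, a declared
    timeout, and an idempotency hint the harness can use on retries."""
    name: str
    timeout_s: float
    idempotent: bool

    def invoke(self, args: dict[str, Any]) -> dict[str, Any]: ...

class FallbackChain:
    """Try links in order (primary -> secondary -> cached/rule-based).
    Every link exposes the same invoke() signature, so callers never
    know — or care — which one actually fired."""
    def __init__(self, *links: Skill) -> None:
        self.links = links

    def invoke(self, args: dict[str, Any]) -> dict[str, Any]:
        last_err: Exception | None = None
        for skill in self.links:
            try:
                return skill.invoke(args)
            except Exception as err:  # each link maps its own error space
                last_err = err
        raise RuntimeError("all fallbacks exhausted") from last_err

def allowed_skills(registry: dict[str, Skill], flags: set[str]) -> dict[str, Skill]:
    # Capability flags: filter the registry per user tier / locale / experiment.
    return {name: s for name, s in registry.items() if name in flags}
```

Note how the fallback chain and the capability filter both lean on the same contract: that single `Skill` shape is what lets the system grow from 7 tools to 70 without adding a branch in 14 files.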
How the pillars interact
```mermaid
flowchart TB
  subgraph P1[Pillar 1 - AI Workflow]
    EG[Execution Graph]
    SK[Skill Composition]
    SA[Sub-Agent Orchestration]
    EV[Evals]
  end
  subgraph P2[Pillar 2 - System Design / Harness]
    PR[Pause-Resume]
    CP[Checkpoints]
    SYNC[Sync-Async]
    OB[Observability + Cost + Quota]
  end
  subgraph P3[Pillar 3 - Low-Level Design]
    TC[Tool Contracts]
    AD[Provider Adapters]
    FB[Fallback Chains]
    CF[Capability Flags]
  end
  EG --> PR
  SK --> TC
  SA --> CP
  EV --> OB
  TC --> AD
  AD --> FB
  PR --> SYNC
  CP --> SYNC
  OB --> CF
  CF --> SK
```
Read the arrows as: a decision in one pillar forces decisions in the others. The execution graph (P1) defines what state the harness (P2) has to checkpoint, which in turn constrains which tool contracts (P3) have to be durable.
The "AI-pilled" gradient
The user message that prompted this folder said "I am not as AI-pilled as some of the other folks in the org, but I am getting there." That's worth naming directly. The gradient looks like this:
| Stage | Mindset | Symptom |
|---|---|---|
| 0. API-first | "LLM is a function call" | Prompt-in, string-out, no harness |
| 1. Workflow-aware | "LLM has multi-step turns" | Adds retries, tool calls, simple state |
| 2. Eval-aware | "We have to measure quality" | Builds offline eval set, ships canary |
| 3. Harness-aware | "Long-running is the default" | Pause/resume, checkpoints, fallback chains |
| 4. AI-first | "The harness IS the product" | Treats prompts as low-stakes config; treats orchestration as the hard part |
This folder is written to push you from Stage 2 → Stage 4. Each user story names which stage it unlocks.
How constraints change everything (preview)
The second half of this folder (Changing-Constraints-Scenarios/) takes each pillar and asks: what happens when the ground shifts?
| Constraint change | Pillar most disrupted |
|---|---|
| 10× user surge overnight | P2 — harness must absorb burst |
| FM deprecated by provider | P3 — adapter layer is put to the test |
| Cost budget halved | All three — need workflow, harness, and LLD pivots |
| Latency SLA tightened | P1 + P2 — graph reshape + serving rework |
| New compliance rule | P1 (eval) + P3 (capability flags) |
| Tool count 7 → 70 | P3 dominates — LLD is the bottleneck |
| Provider quota revoked | P3 (adapters) + P2 (rate limit / quota mgr) |
| New locale launched in 2 weeks | All three — ground truth, infra, contracts |
The intuition you build by walking these scenarios is: architecture is a stance under uncertainty, not a static drawing.