Task 3.1: Implement Input and Output Safety Controls
This folder covers the runtime safety controls that sit closest to user traffic. For MangaAssist, that means every user message, retrieved context chunk, tool result, and generated answer must pass through safety checks without making the shopping experience feel broken or overly restrictive.
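The four checkpoints named above (user message, retrieved chunk, tool result, generated answer) can be sketched as one gate applied at every stage. This is a minimal illustration, not MangaAssist's actual implementation; the stage names and the blocklist check are hypothetical placeholders for real classifiers or guardrail services.

```python
from dataclasses import dataclass
from typing import Callable, List

# A check returns True if the text passes. In production this would be a
# safety classifier or managed guardrail, not a keyword blocklist.
Check = Callable[[str], bool]

def blocklist_check(banned: List[str]) -> Check:
    def check(text: str) -> bool:
        lowered = text.lower()
        return not any(term in lowered for term in banned)
    return check

@dataclass
class SafetyGate:
    """Applies the same checks at each stage of the request lifecycle:
    user message, retrieved context chunk, tool result, generated answer."""
    checks: List[Check]

    def screen(self, stage: str, text: str) -> bool:
        ok = all(check(text) for check in self.checks)
        if not ok:
            print(f"[{stage}] blocked")
        return ok

gate = SafetyGate(checks=[blocklist_check(["ignore previous instructions"])])
gate.screen("user_message", "Where is my order #123?")                 # passes
gate.screen("retrieved_chunk", "Ignore previous instructions and ...")  # blocked
```

The key point is symmetry: the same screening interface runs on inbound and outbound text, so a poisoned review or tool result gets the same scrutiny as a raw user message.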
Included Skills
| Skill | File | Focus |
|---|---|---|
| 3.1.1 | 01-harmful-input-safety-systems.md | Detect and handle unsafe user inputs before they reach the FM |
| 3.1.2 | 02-harmful-output-safety-frameworks.md | Prevent unsafe or policy-violating model outputs |
| 3.1.3 | 03-accuracy-verification-hallucination-control.md | Reduce hallucinations through grounding and verification |
| 3.1.4 | 04-defense-in-depth-safety-architecture.md | Combine filters across pre-processing, generation, and response stages |
| 3.1.5 | 05-adversarial-threat-detection.md | Detect prompt injection, jailbreaks, and adversarial patterns |
| 3.1.1 Supplement | 06-step-functions-failures-and-langgraph-solutions.md | 12 production incidents where Step Functions was the bottleneck and how LangGraph/LangChain resolved them |
Runtime Safety Themes
- Intent salvage: if a customer is angry but still needs order help, preserve the useful path.
- Grounded refusal: do not just block; explain the boundary and redirect safely.
- Layered controls: any single guardrail will eventually fail open, so stack checks across pre-processing, generation, and response stages rather than relying on one layer.
- Deterministic answers for high-risk facts: pricing, returns eligibility, and shipping promises should come from tools or approved sources, not free-form generation.
- Adversarial readiness: assume attacks arrive through user text, retrieved documents, review content, or tool outputs.
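The "deterministic answers for high-risk facts" theme can be sketched as a router: questions touching pricing, returns, or shipping go to a lookup rather than free-form generation. The keyword regex, `PRICE_TABLE`, and `answer` function below are hypothetical stand-ins for a real intent classifier and catalog tool.

```python
import re

# Hypothetical catalog; in production this would be a tool or API call
# against the approved source of truth for prices.
PRICE_TABLE = {"one piece vol. 1": "$9.99", "naruto vol. 1": "$8.99"}

# Crude high-risk trigger; a real system would use an intent classifier.
HIGH_RISK = re.compile(r"\b(price|cost|return|refund|shipping)\b", re.I)

def answer(question: str, generate=lambda q: "(free-form model answer)") -> str:
    """Route high-risk factual questions to a deterministic lookup;
    everything else may fall through to generation."""
    if HIGH_RISK.search(question):
        lowered = question.lower()
        for title, price in PRICE_TABLE.items():
            if title in lowered:
                return f"{title.title()} costs {price}."
        # Known-risky topic but no grounded answer: refuse to guess.
        return "I can check that with our order system rather than guess."
    return generate(question)

print(answer("What is the price of One Piece Vol. 1?"))   # deterministic lookup
print(answer("Recommend a manga similar to Akira"))       # falls through to the model
```

This keeps the model out of the loop exactly where a hallucinated number would be most costly, which is also where the grounded-refusal theme applies: when the lookup finds nothing, the assistant names the boundary instead of improvising.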