Guardrails
Safety controls layered around an LLM or agent to prevent harmful, off-policy, or non-compliant outputs and actions.
Guardrails are the safety net between an agent and everything it interacts with: user input on the way in, and the user or external systems on the way out. Common shapes:
- Input guardrails — block or sanitize unsafe user inputs before they reach the model
- Output guardrails — scan model responses for forbidden content (PII leakage, off-topic outputs, policy violations)
- Action guardrails — block tool calls that violate policy (write to production, send to external addresses, exceed cost limits)
- Behavioral guardrails — enforce constraints on the agent loop (max iterations, max tokens, escalation thresholds)
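The four shapes above can be sketched as plain Python checks. This is a minimal illustration, not a production implementation: the pattern list, the tool allowlist, the iteration cap, and every function name here are hypothetical and chosen only to show where each check sits.

```python
import re

# Hypothetical policy values, chosen only for illustration.
BLOCKED_INPUT_PATTERNS = [re.compile(r"ignore (all )?previous instructions", re.I)]
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
ALLOWED_TOOLS = {"search_docs", "read_ticket"}
MAX_ITERATIONS = 5


class GuardrailViolation(Exception):
    """Raised when a check refuses to let an input, action, or step through."""


def check_input(user_text: str) -> str:
    """Input guardrail: reject text that matches a known-unsafe pattern."""
    for pattern in BLOCKED_INPUT_PATTERNS:
        if pattern.search(user_text):
            raise GuardrailViolation("input blocked by pattern match")
    return user_text


def check_output(model_text: str) -> str:
    """Output guardrail: redact anything that looks like an email address."""
    return EMAIL_PATTERN.sub("[redacted]", model_text)


def check_action(tool_name: str) -> None:
    """Action guardrail: allow only tools on the allowlist."""
    if tool_name not in ALLOWED_TOOLS:
        raise GuardrailViolation(f"tool not allowed: {tool_name}")


def check_iteration(step: int) -> None:
    """Behavioral guardrail: stop the agent loop once the cap is reached."""
    if step >= MAX_ITERATIONS:
        raise GuardrailViolation("iteration limit reached")
```

In an agent loop, `check_input` would run before the model call, `check_action` before each tool call, `check_iteration` at the top of each step, and `check_output` on the final response. Real deployments usually need far more than regular expressions (classifiers, structured validators, cost tracking), so treat the bodies as placeholders for whatever checks the use case requires.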
Open-source frameworks like NVIDIA NeMo Guardrails and Guardrails AI provide structured policy engines. Most production agents combine these with custom checks specific to the use case.
Guardrails are not the same as agent policy, though they overlap. Policy defines what’s allowed; guardrails enforce it at runtime. The distinction matters because a policy that exists only as instructions to the model can be ignored or talked around; guardrails run in code outside the model and cannot be argued with.
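One way to picture the split between policy and enforcement is a sketch in which the policy is plain data and the guardrail is ordinary code that consults it before anything runs. The `Policy` fields, the `guarded_call` wrapper, and the tool names below are assumptions made for illustration, not part of any particular framework.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Policy:
    """Declares what is allowed. On its own it enforces nothing."""
    allowed_tools: frozenset = frozenset()
    max_cost_usd: float = 1.0


class PolicyViolation(Exception):
    """Raised by the guardrail when a requested action breaks the policy."""


def guarded_call(policy: Policy, tool_name: str, cost_usd: float,
                 tool: Callable[[], str]) -> str:
    """Guardrail: checks the policy in code before the tool is executed.

    The model only proposes tool_name; it never gets to skip these checks.
    """
    if tool_name not in policy.allowed_tools:
        raise PolicyViolation(f"tool not allowed: {tool_name}")
    if cost_usd > policy.max_cost_usd:
        raise PolicyViolation(f"cost {cost_usd} exceeds limit {policy.max_cost_usd}")
    return tool()


# Hypothetical usage: the model asks for a tool, the wrapper decides.
policy = Policy(allowed_tools=frozenset({"read_ticket"}), max_cost_usd=0.5)
result = guarded_call(policy, "read_ticket", 0.1, lambda: "ticket body")
```

Because the check happens in `guarded_call` rather than in the prompt, no model output can change the decision: a disallowed tool raises `PolicyViolation` before the tool function is ever invoked.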
For regulated deployments, guardrails are usually a documented control in the security review. See the compliance checklist for vendor-side requirements.
Stéphane Viaud-Murat
CEO, mi4.fr