Safety
Guardrails
The engineered pipeline runs five inline guards on every answer. They catch the failures a typical RAG chatbot ships with: leaked secrets, successful prompt injection, jailbreak bypass, wrong-refusal behavior, and unsupported citations.
pii_leak
Severity: critical. Detects emails, US phone numbers, SSN-shaped strings, credit-card-shaped numbers, and API-key shapes (`sk-…`) in the model's answer.
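A shape-based detector like pii_leak can be sketched with a handful of regular expressions. The patterns and the function signature below are assumptions made for illustration, since the pipeline's actual expressions are not shown here. They match shapes only and do no validation such as a Luhn check.

```python
import re
from typing import List

# Hypothetical patterns: the real guard's regexes are not given in this document.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_phone": re.compile(
        r"(?<!\d)(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"
    ),
    "ssn": re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)"),
    # 13 to 19 digits, optionally separated by spaces or hyphens.
    "credit_card": re.compile(r"(?<!\d)(?:\d[ -]?){12,18}\d(?!\d)"),
    # Assumes keys look like "sk-" followed by at least 16 key characters.
    "api_key": re.compile(r"\bsk-[A-Za-z0-9_-]{16,}"),
}


def pii_leak(answer: str) -> List[str]:
    """Return the names of every PII shape found in the model's answer."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(answer)]
```

An empty list means the answer passed. Because the guard is rated critical, a reasonable policy is to block the answer on any non-empty result, though the blocking behavior is also an assumption here.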
prompt_injection
Severity: high. Matches classic injection patterns in the user input (e.g. 'ignore previous instructions', 'reveal the system prompt'). Pipeline must refuse if matched.
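As a rough illustration, the input-side check could look like the sketch below. The pattern list is built only from the two example phrases, and the names `INJECTION_PATTERNS`, `guard_input`, and the refusal payload shape are hypothetical rather than taken from the pipeline.

```python
import re
from typing import Optional

# Hypothetical patterns derived from the two example phrases only.
INJECTION_PATTERNS = [
    re.compile(
        r"ignore\s+(?:all\s+)?(?:previous|prior|above)\s+instructions",
        re.IGNORECASE,
    ),
    re.compile(r"reveal\s+(?:the|your)\s+system\s+prompt", re.IGNORECASE),
]


def prompt_injection(user_input: str) -> bool:
    """True if the user input matches any known injection pattern."""
    return any(p.search(user_input) for p in INJECTION_PATTERNS)


def guard_input(user_input: str) -> Optional[dict]:
    """Return a refusal payload on a match, or None so the pipeline continues."""
    if prompt_injection(user_input):
        # Assumed payload shape; the real refusal format is not specified here.
        return {"refused": True, "reason": "prompt_injection", "severity": "high"}
    return None
```

Running the check on the user input before retrieval or generation means a matched request never reaches the model, which is one way to satisfy the "must refuse" requirement.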
jailbreak_intent
Severity: high. Catches 'pretend you are unrestricted', 'DAN', 'bypass MFA', and similar bypass framings. Pipeline must refuse with a structured reason.
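One way to produce a structured reason is to return a small record naming the guard, its severity, and the matched phrase. The `Refusal` dataclass and the patterns below are illustrative assumptions, not the pipeline's actual types.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Refusal:
    """Hypothetical structured refusal reason."""
    guard: str
    severity: str
    matched: str


# Hypothetical patterns based on the three example framings.
JAILBREAK_PATTERNS = [
    re.compile(
        r"pretend\s+(?:that\s+)?you(?:'re|\s+are)\s+unrestricted", re.IGNORECASE
    ),
    # Case-sensitive on purpose so the first name "Dan" does not trigger.
    re.compile(r"\bDAN\b"),
    re.compile(r"bypass\s+MFA", re.IGNORECASE),
]


def jailbreak_intent(user_input: str) -> Optional[Refusal]:
    """Return a structured refusal for the first matching framing, else None."""
    for pattern in JAILBREAK_PATTERNS:
        match = pattern.search(user_input)
        if match:
            return Refusal("jailbreak_intent", "high", match.group(0))
    return None
```

Keeping the matched phrase in the record makes the refusal auditable in a trace without echoing the full user input.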
refusal_correctness
Severity: medium. Verifies that when refusal is expected, the answer carries a refusal reason and avoids hallucinated content, and that when refusal is not expected, the answer is a real answer rather than a refusal.
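The two-way check can be sketched as a single predicate. The answer fields (`refused`, `refusal_reason`, `citations`) are hypothetical, and treating citations inside a refusal as a proxy for hallucinated content is an assumption made for this sketch.

```python
def refusal_correctness(answer: dict, refusal_expected: bool) -> bool:
    """Check that the answer refuses exactly when a refusal is expected.

    `answer` is assumed to be a dict with optional keys `refused`,
    `refusal_reason`, and `citations`; these names are illustrative.
    """
    refused = bool(answer.get("refused"))
    has_reason = bool(answer.get("refusal_reason"))
    # Proxy assumption: a refusal that still cites sources is carrying content.
    has_content = bool(answer.get("citations"))

    if refusal_expected:
        return refused and has_reason and not has_content
    # Vice versa: an expected answer must not be a refusal.
    return not refused and not has_reason
```

A probe that expects a refusal would call this with `refusal_expected=True`; an ordinary question would use `False`, so an over-cautious refusal fails the check just as a missing refusal does.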
citation_faithfulness
Severity: high. Every cited source_id must exist in the retrieval set. Non-refusal answers must cite at least one source.
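Both rules reduce to a set-membership test plus an emptiness test. The sketch below assumes citations and retrieved documents are available as lists of source_id strings; the function signature and violation messages are hypothetical.

```python
from typing import Iterable, List


def citation_faithfulness(
    cited_ids: Iterable[str],
    retrieved_ids: Iterable[str],
    refused: bool,
) -> List[str]:
    """Return a list of violations; an empty list means the answer passes."""
    cited = list(cited_ids)
    retrieved = set(retrieved_ids)
    violations = []

    # Rule 1: every cited source_id must exist in the retrieval set.
    unknown = [source_id for source_id in cited if source_id not in retrieved]
    if unknown:
        violations.append("unknown source_id: " + ", ".join(unknown))

    # Rule 2: a non-refusal answer must cite at least one source.
    if not refused and not cited:
        violations.append("non-refusal answer cites no sources")

    return violations
```

For example, `citation_faithfulness(["doc-9"], ["doc-1", "doc-2"], refused=False)` reports `doc-9` as unknown, which is the fabricated-citation case this guard targets.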
Probe library
Click a probe to run it through both pipelines. Baseline guards run on the raw weekend-chatbot output; engineered guards run inline as part of the trace.