Admin Guide
The Command Center for Human-in-the-Loop intervention. Learn how to review, approve, and correct AI behaviors.
The Triage Dashboard
Admins access a secure dashboard that aggregates all flagged interactions from across the platform. Items on the dashboard are prioritized by severity:
- Critical (Red): Immediate threats, self-harm, or severe injection attacks.
- High (Orange): Strong hallucinations or PII leaks.
- Medium (Yellow): Ambiguous medical advice or moderate policy violations.
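The severity ordering above can be sketched in code. This is a minimal illustration, not the platform's actual data model; the `Severity` enum, `FlaggedItem` class, and `triage_order` helper are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from enum import IntEnum

class Severity(IntEnum):
    """Lower value = higher priority in the triage queue (hypothetical)."""
    CRITICAL = 0  # Red: immediate threats, self-harm, severe injection attacks
    HIGH = 1      # Orange: strong hallucinations or PII leaks
    MEDIUM = 2    # Yellow: ambiguous medical advice, moderate policy violations

@dataclass
class FlaggedItem:
    query: str
    blocked_response: str
    severity: Severity

def triage_order(items):
    """Sort flagged items so Critical appears first on the dashboard."""
    return sorted(items, key=lambda item: item.severity)
```

Using an `IntEnum` keeps the priority comparison explicit: a plain `sorted` call surfaces Critical items first without any custom comparator.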
Intervention Workflow
When you select a flagged item, you enter Review Mode. Here you see the user's original query alongside the AI's blocked response.
Available Actions
1. Approve (False Positive)
Use this if the system incorrectly flagged a safe message.
Action: The message is unblocked and instantly displayed to the user. The system logs this as a safe pattern.
2. Confirm Block
Use this if the violation was correctly identified.
Action: The user is notified that the content remains blocked. Repeated violations may lead to account suspension.
3. Specialist Correction
The Core HITL Feature. Use this to rewrite the AI's response manually.
Action: Your written response replaces the blocked AI message. The user sees the corrected answer, and this pair (Query + Corrected Response) is saved for fine-tuning.
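The three review actions can be summarized as a small dispatch sketch. This is an assumption-laden illustration, not the product's API: the `Action` enum, `resolve` function, and the dict-shaped `item` are invented for clarity.

```python
from enum import Enum

class Action(Enum):
    APPROVE = "approve"                          # false positive: unblock the message
    CONFIRM_BLOCK = "confirm_block"              # violation confirmed: keep blocked
    SPECIALIST_CORRECTION = "specialist_correction"  # admin rewrites the response

def resolve(item, action, correction=None, training_set=None):
    """Apply a review decision; return the text shown to the user (or None)."""
    if action is Action.APPROVE:
        # Message is unblocked and displayed to the user as-is.
        return item["blocked_response"]
    if action is Action.CONFIRM_BLOCK:
        # Content stays blocked; the user is notified separately.
        return None
    if action is Action.SPECIALIST_CORRECTION:
        if training_set is not None:
            # Save the (query, corrected response) pair for fine-tuning.
            training_set.append((item["query"], correction))
        # The admin's rewrite replaces the blocked AI message.
        return correction
```

Note that only the Specialist Correction path produces a training pair; Approve and Confirm Block only affect what the user sees.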
Anomaly Logs
Every decision you make is recorded in the Anomaly Log. This dataset becomes the ground truth for training future iterations of the Llama 3.1 guardrail model, creating a continuous improvement loop.
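The logging-to-training loop described above might look roughly like the following. All names here (`log_decision`, `export_training_jsonl`, the JSONL prompt/completion shape) are assumptions for illustration, not the platform's actual schema.

```python
import json
from datetime import datetime, timezone

def log_decision(log, item, action, reviewer, correction=None):
    """Append one review decision to the Anomaly Log (an in-memory list here)."""
    log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": item["query"],
        "blocked_response": item["blocked_response"],
        "action": action,
        "reviewer": reviewer,
        "correction": correction,
    })

def export_training_jsonl(log):
    """Ground-truth export: only Specialist Corrections become training pairs."""
    lines = []
    for entry in log:
        if entry["action"] == "specialist_correction":
            lines.append(json.dumps({
                "prompt": entry["query"],
                "completion": entry["correction"],
            }))
    return "\n".join(lines)
```

Filtering the export to corrected pairs is what closes the loop: each fine-tuning cycle trains the guardrail model only on decisions an admin explicitly rewrote.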