Guides

Admin Guide

The Command Center for Human-in-the-Loop intervention. Learn how to review, approve, and correct AI behaviors.

The Triage Dashboard

Admins access a secure dashboard that aggregates all flagged interactions from across the platform. The dashboard is prioritized by Severity.

  • Critical (Red): Immediate threats, self-harm, or severe injection attacks.
  • High (Orange): Strong hallucinations or PII leaks.
  • Medium (Yellow): Ambiguous medical advice or moderate policy violations.

Intervention Workflow

When you select a flaged item, you enter the Review Mode. Here you see the user's original query and the AI's blocked response.

Available Actions

1. Approve (False Positive)

Use this if the system incorrect flagged a safe message.
Action: The message is unblocked and instantly displayed to the user. The system logs this as a safe pattern.

2. Confirm Block

Use this if the violation was correctly identified.
Action: The user is notified that the content remains blocked. The user may be suspended if repeated.

3. Specialist Correction

The Core HITL Feature. Use this to rewrite the AI's response manually.
Action: Your written response replaces the blocked AI message. The user sees the corrected answer, and this pair (Query + Corrected Response) is saved for fine-tuning.

Anomaly Logs

Every decision you make is recorded in the Anomaly Log. This dataset becomes the ground truth for training future iterations of the Llama 3.1 guardrail model, creating a continuous improvement loop.