Tracking the risk of a deployed model and detecting harmful distribution shifts
3 Pith papers cite this work. Polarity classification is still being indexed.
Citation roles: method (1)
Citation polarities: use method (1)
Verdicts: unverdicted (3)
Citing papers
- Online Shift Detection and Conformal Adaptation for Deployed Safety Classifiers
  An online KS-statistic monitor detects shifts in deployed safety classifiers with an 86.6% valid detection rate, exposes conformal prediction collapse in high-dimensional embeddings, and derives a confidence-gated security boundary against adaptive attackers.
- A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification
- Online Safety Monitoring for LLMs
  Simple thresholding on an external verifier signal, calibrated by risk control, performs competitively with sequential hypothesis testing monitors on math reasoning and red-teaming datasets.
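To give a rough idea of what an online KS-statistic monitor like the one summarized for "Online Shift Detection and Conformal Adaptation for Deployed Safety Classifiers" can look like, here is a minimal sketch. It assumes the monitor compares a sliding window of recent classifier scores against a fixed reference sample using the two-sample Kolmogorov-Smirnov statistic. The names `OnlineKSMonitor`, `window_size`, `threshold`, and `alpha` are hypothetical, and this is not the paper's implementation, since its window and threshold choices are not described here. The default threshold is the asymptotic one-shot KS critical value. Because it ignores the fact that the test is repeated on every new score, the false-alarm rate over a long stream will exceed `alpha`.

```python
from bisect import bisect_right
from collections import deque
import math


def ks_statistic(reference, window):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_ref(x) - F_win(x)|."""
    ref = sorted(reference)
    win = sorted(window)
    n, m = len(ref), len(win)
    d = 0.0
    # Both empirical CDFs are step functions, so the supremum of their
    # difference is attained at one of the pooled sample points.
    for x in ref + win:
        f_ref = bisect_right(ref, x) / n
        f_win = bisect_right(win, x) / m
        d = max(d, abs(f_ref - f_win))
    return d


def ks_critical_value(n, m, alpha):
    """Asymptotic one-shot critical value of the two-sample KS test."""
    c = math.sqrt(-math.log(alpha / 2) / 2)
    return c * math.sqrt((n + m) / (n * m))


class OnlineKSMonitor:
    """Hypothetical monitor: compares a sliding window of recent classifier
    scores with a fixed reference sample and alarms when the KS statistic
    exceeds `threshold`."""

    def __init__(self, reference, window_size=50, threshold=None, alpha=0.05):
        self.reference = list(reference)
        self.window = deque(maxlen=window_size)
        # Assumed default: one-shot critical value (ignores repeated testing).
        if threshold is None:
            threshold = ks_critical_value(len(self.reference), window_size, alpha)
        self.threshold = threshold

    def update(self, score):
        """Add one score; return (statistic, alarm)."""
        self.window.append(score)
        if len(self.window) < self.window.maxlen:
            return 0.0, False  # not enough recent data yet
        d = ks_statistic(self.reference, self.window)
        return d, d > self.threshold


# Usage with made-up data: reference scores on a grid, then a stream
# shifted upward by 0.5, which should eventually trigger an alarm.
reference = [i / 100 for i in range(100)]
monitor = OnlineKSMonitor(reference, window_size=20)
for s in [x + 0.5 for x in reference[::5]]:
    stat, alarm = monitor.update(s)
```

A sliding window keeps the monitor responsive to recent behaviour, at the cost of noisier statistics when the window is small. Choosing `window_size` and the threshold is the main design trade-off between detection delay and false alarms.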
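For "Online Safety Monitoring for LLMs", "simple thresholding on an external verifier signal, calibrated by risk control" can be illustrated with a minimal sketch. It makes three assumptions. Higher verifier scores mean "more likely unsafe". An input is flagged when its score reaches a threshold. The controlled risk is the miss rate on harmful inputs. With that 0/1 miss loss, conformal risk control reduces to taking a conformal quantile of the harmful calibration scores, which gives a miss probability of at most `alpha` for a new exchangeable harmful input. The functions `calibrate_threshold` and `flag` are hypothetical names. The paper's actual loss and calibration procedure may differ, and the sketch does not implement the sequential hypothesis testing monitors it is compared against.

```python
import math


def calibrate_threshold(harmful_scores, alpha):
    """Hypothetical calibration: pick the largest threshold t such that
    flagging `score >= t` misses at most an alpha fraction of harmful
    inputs, with a finite-sample correction.

    With a 0/1 miss loss, conformal risk control reduces to taking the
    k-th smallest calibration score, k = floor(alpha * (n + 1)).
    Returns -inf (flag everything) when n is too small for the target alpha.
    """
    scores = sorted(harmful_scores)
    n = len(scores)
    k = math.floor(alpha * (n + 1))
    if k < 1:
        return -math.inf
    return scores[k - 1]  # k-th smallest score (1-indexed)


def flag(score, threshold):
    """Flag an input as unsafe when the verifier score reaches the threshold."""
    return score >= threshold


# Usage with made-up verifier scores on inputs known to be harmful.
calibration = [0.2, 0.35, 0.4, 0.55, 0.6, 0.7, 0.75, 0.8, 0.9, 0.95]
t = calibrate_threshold(calibration, alpha=0.2)
decisions = [flag(s, t) for s in [0.1, 0.5, 0.99]]
```

Raising `alpha` raises the threshold, which reduces false alarms on benign inputs at the cost of more missed harmful ones. The `(n + 1)` correction is what makes the guarantee hold in finite samples rather than only asymptotically.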