DriftGuard introduces multi-monitor safety-aware drift detection paired with hard-mix selective adaptation, reporting toxic recall gains to 0.8777 on Civil Comments and 0.8523 on DynaHate under temporal and cross-dataset shifts.
A comprehensive anal- ysis of indicator effect on llm performance,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
DriftGuard: Safety-Aware Multi-Monitor Detection and Selective Adaptation for Evolving Toxicity Moderation
DriftGuard introduces multi-monitor safety-aware drift detection paired with hard-mix selective adaptation, reporting toxic recall gains to 0.8777 on Civil Comments and 0.8523 on DynaHate under temporal and cross-dataset shifts.