LLM safety judges resist adjusting evaluations when given contradictory context or new safety definitions, despite some ability to learn from new information.
arXiv preprint arXiv:2512.22712
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
GRPO reinforcement learning on the new PolyFact dataset outperforms SFT and CPT for cross-lingual factual consistency in Qwen-2.5-7B and OLMo-2-7B by reducing language specialization in MLP and attention layers.
citing papers explorer
- Safety is Contextual, LLM-Judges Are Not: Navigating the Rigid Priors of Evaluators