Measurement and Fairness
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Context specification is a process that turns diffuse stakeholder perspectives into explicit definitions of properties, behaviors, and outcomes to guide context-aware AI evaluations.
citing papers explorer
- When Behavioral Safety Evaluation Fails: A Representation-Level Perspective
  Behavioral safety metrics for LLMs are insufficient because models can maintain safe outputs while remaining vulnerable to latent-space interventions, as shown via dissociated models and the new Latent Vulnerability Score.
- Making AI Evaluation Deployment Relevant Through Context Specification
  Context specification is a process that turns diffuse stakeholder perspectives into explicit definitions of properties, behaviors, and outcomes to guide context-aware AI evaluations.