Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency

AI Alignment at Your Discretion · 2025

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Understanding Annotator Safety Policy with Interpretability

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

Annotator Policy Models learn safety policies from labeling behavior alone, accurately predicting responses and revealing sources of disagreement like policy ambiguity and value pluralism.

citing papers explorer

Showing 1 of 1 citing paper.

Understanding Annotator Safety Policy with Interpretability cs.AI · 2026-05-06 · unverdicted · none · ref 10
