hub Mixed citations

OR-Bench: An over-refusal benchmark for large language models

Mixed citation behavior. Most common role is background (60%).

32 Pith papers citing it
Background 60% of classified citations

citation-role summary

background 4 · dataset 1

years

2026: 29 · 2025: 3

representative citing papers

Taxonomy and Consistency Analysis of Safety Benchmarks for AI Agents

cs.CY · 2026-04-11 · accept · novelty 8.0

This paper delivers the first systematic taxonomy and cross-benchmark consistency analysis of 40 agent safety benchmarks, finding broad but shallow risk coverage, no ranking concordance across evaluations, and that benchmark choice systematically alters reported safety.

Triaging Threats to Specialized Guardrails

cs.CR · 2026-05-29 · unverdicted · novelty 6.0

Introduces the GuardZoo benchmark and the RouteGuard router-expert system, showing that monolithic guardrails suffer from task interference while specialized routing improves threat detection and generalization.

citing papers explorer

Showing 8 of 8 citing papers after filters.