hub Mixed citations

OR-Bench: An over-refusal benchmark for large language models

Mixed citation behavior. Most common role is background (60%).

19 Pith papers citing it
Background 60% of classified citations

citation-role summary

background 4 · dataset 1

years

2026: 16 · 2025: 3

representative citing papers

Taxonomy and Consistency Analysis of Safety Benchmarks for AI Agents

cs.CY · 2026-04-11 · accept · novelty 8.0

This paper delivers the first systematic taxonomy and cross-benchmark consistency analysis of 40 agent safety benchmarks, finding broad but shallow risk coverage, no ranking concordance across evaluations, and that benchmark choice systematically alters reported safety.

Triaging Threats to Specialized Guardrails

cs.CR · 2026-05-29 · unverdicted · novelty 6.0

Introduces the GuardZoo benchmark and the RouteGuard router-expert system, showing that monolithic guardrails suffer from task interference while specialized routing improves threat detection and generalization.

Knowledge Distillation Must Account for What It Loses

cs.LG · 2026-04-28 · unverdicted · novelty 4.0 · 2 refs

Knowledge distillation evaluations must report lost teacher capabilities via a Distillation Loss Statement rather than relying solely on task scores.

LLM-Safety Evaluations Lack Robustness

cs.CR · 2025-03-04 · unverdicted · novelty 4.0

LLM safety evaluations are hindered by noise in dataset curation, automated red-teaming, response generation, and LLM-judge evaluation, making fair comparisons difficult and slowing progress.

citing papers explorer

Showing 19 of 19 citing papers.