Title resolution pending

Justus Adam, Yuchen Lu, Deepti Raghavan, Malte Schwarzkopf, Nikos Vasilakis · 2026 · arXiv 5621.380764

1 Pith paper cites this work. Polarity classification is still indexing.


Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

What Benchmarks Don't Measure: The Case for Evaluating Abstention Competence in Autonomous Agents

cs.AI · 2026-06-01 · conditional · novelty 8.0

Current benchmarks overlook abstention competence in agents due to compliance bias; a new three-gap taxonomy and metrics (Safety Rate, Usability Rate, Informed Refusal Rate) demonstrate tunable safety-usability tradeoffs in preliminary tests across five model families.

citing papers explorer

Showing 1 of 1 citing paper.

What Benchmarks Don't Measure: The Case for Evaluating Abstention Competence in Autonomous Agents

cs.AI · 2026-06-01 · conditional · none · ref 1
