Partial Evidence Bench is a deterministic benchmark that measures agent correctness, completeness awareness, gap-report quality, and unsafe overclaiming in authorization-constrained evidence environments.
Valencia et al.How Do LLMs Fail In Agentic Scenarios? A Qualitative Analysis of Success and Failure Scenarios of Various LLMs in Agentic Simulations
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Partial Evidence Bench: Benchmarking Authorization-Limited Evidence in Agentic Systems
Partial Evidence Bench is a deterministic benchmark that measures agent correctness, completeness awareness, gap-report quality, and unsafe overclaiming in authorization-constrained evidence environments.