Jimenez et al. SWE-Bench: Can Language Models Resolve Real-World GitHub Issues? International Conference on Learning Representations · 2024

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Partial Evidence Bench: Benchmarking Authorization-Limited Evidence in Agentic Systems

cs.AI · 2026-05-06 · unverdicted · novelty 7.0 · ref 16

Partial Evidence Bench is a deterministic benchmark that measures agent correctness, completeness awareness, gap-report quality, and unsafe overclaiming in authorization-constrained evidence environments.