The Twelfth International Conference on Learning Representations

2 Pith papers cite this work. Polarity classification is still indexing.

Fields: cs.SE (2)

Years: 2026 (2)

representative citing papers

ContractBench: Can LLM Agents Preserve Observation Contracts?

cs.SE · 2026-05-17 · conditional · novelty 7.0

ContractBench shows that LLM agents frequently violate observation contracts by using expired artifacts or corrupting their byte integrity, with no model exceeding 80% success and notable scaling irregularities across families.

An Executable Benchmarking Suite for Tool-Using Agents

cs.SE · 2026-05-10 · unverdicted · novelty 5.0

The paper delivers a unified executable benchmarking suite for tool-using agents that enforces a shared evidence-admission contract across web, code, and micro-task environments.

citing papers explorer

Showing 2 of 2 citing papers.

  • ContractBench: Can LLM Agents Preserve Observation Contracts? cs.SE · 2026-05-17 · conditional · none · ref 17

  • An Executable Benchmarking Suite for Tool-Using Agents cs.SE · 2026-05-10 · unverdicted · none · ref 3
