ACPBench: Reasoning about Action, Change, and Planning

Harsha Kokel, Michael Katz, Kavitha Srinivas, Shirin Sohrabi · 2026 · DOI 10.1609/aaai.v39i25.34857

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

HalluWorld: A Controlled Benchmark for Hallucination via Reference World Models

cs.CL · 2026-05-19 · conditional · novelty 8.0

HalluWorld is a controlled benchmark using explicit reference world models to automatically label and disentangle hallucinations in LLMs across synthetic environments with varying complexity and observability.

When AI Says It Feels

cs.AI · 2026-06-04 · unverdicted · novelty 5.0

Training LLMs via rubric-based self-rewarding RL with GRPO enhanced their feeling expression and sycophancy robustness but degraded their truthful QA performance.

