LongRLVR: Long-context reinforcement learning requires verifiable context rewards

Guanzheng Chen, Michael Qizhe Shieh, Lidong Bing · 2026 · arXiv 2603.02146

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Supersede: Diagnosing and Training the Memory-Update Gap in LLM Agents

cs.CL · 2026-06-25 · unverdicted · novelty 7.0

The supersession gap in LLM agents (failing to use current facts and to discard superseded ones) is a distinct failure mode that is not fixed by scale or memory size but can be improved by RL training on a new environment.

citing papers explorer


Supersede: Diagnosing and Training the Memory-Update Gap in LLM Agents

cs.CL · 2026-06-25 · unverdicted · none · ref 23
