pith. machine review for the scientific record.

hub

The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning

12 Pith papers cite this work. Polarity classification is still indexing.

hub tools

years: 2026 (12)

verdicts: unverdicted (12)

representative citing papers

The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping

cs.LG · 2026-04-13 · unverdicted · novelty 6.0

MEDS improves LLM RL performance by up to 4.13 points in pass@1 and 4.37 points in pass@128. It does so by dynamically penalizing rollouts that match prevalent historical error clusters, which it identifies using memory-stored representations and density clustering.

citing papers explorer

Showing 12 of 12 citing papers.