hub Canonical reference

The surprising effectiveness of negative reinforcement in LLM reasoning. arXiv preprint arXiv:2506.01347

Canonical reference. 80% of citing Pith papers cite this work as background.

18 Pith papers citing it
Background 80% of classified citations

citation-role summary

background 5

citation-polarity summary

years

2026 15 2025 3

roles

background 5

polarities

background 4 unclear 1

representative citing papers

The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping

cs.LG · 2026-04-13 · unverdicted · novelty 6.0

MEDS improves LLM RL performance by up to 4.13 pass@1 and 4.37 pass@128 points by dynamically penalizing rollouts matching prevalent historical error clusters identified via memory-stored representations and density clustering.

What Is Preference Optimization Doing, and Why?

cs.LG · 2025-11-30 · unverdicted · novelty 5.0

Gradient analysis and ablations show DPO and PPO have different target directions and component roles in preference optimization for LLMs.
