pith. sign in

DSDR: Dual-scale diversity regularization for exploration in LLM reasoning.arXiv preprint arXiv:2602.19895

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

citation-role summary

background 1 other 1

citation-polarity summary

years

2026 5

polarities

background 1 unclear 1

representative citing papers

EVE-Agent: Evidence-Verifiable Self-Evolving Agents

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

EVE-Agent adds an evidence verifier to the proposer-solver loop that rewards spans by marginal accuracy gain, producing self-generated but inspectable training examples for search agents.

The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping

cs.LG · 2026-04-13 · unverdicted · novelty 6.0

MEDS improves LLM RL performance by up to 4.13 pass@1 and 4.37 pass@128 points by dynamically penalizing rollouts matching prevalent historical error clusters identified via memory-stored representations and density clustering.

citing papers explorer

Showing 5 of 5 citing papers.