DSDR: Dual-Scale Diversity Regularization for Exploration in LLM Reasoning. arXiv preprint arXiv:2602.19895, 2026.

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · other: 1

citation-polarity summary

background: 1 · unclear: 1

representative citing papers

The Ringelmann Effect in Multi-Agent LLM Systems: A Scaling Law for Effective Team Size

physics.soc-ph · 2026-05-31 · conditional · novelty 7.0

A derived scaling law R(N) = 1/(1 + c(N-1)N^{-β}) fits answer diversity and correctness across 44 LLM multi-agent conditions with R² > 0.99, classifying regimes by β and showing only heterogeneous teams escape hard-ceiling saturation.

Rebellious Student: Reversing Teacher Signals for Reasoning Exploration with Self-Distilled RLVR

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

RLRT augments GRPO by reinforcing tokens on correct student rollouts that the teacher would not have predicted, outperforming standard self-distillation and exploration baselines on Qwen3 models.

Where Rollouts Begin: Low-Load, High-Leverage First-Token Diversification for RLVR

cs.AI · 2026-05-27 · unverdicted · novelty 6.0

REFT improves Pass@1/8/64 in RLVR by uniform first-token sampling from top-N candidates across 0.5B-7B models and multiple difficulty levels.

EVE-Agent: Evidence-Verifiable Self-Evolving Agents

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

EVE-Agent adds an evidence verifier to the proposer-solver loop that rewards spans by marginal accuracy gain, producing self-generated but inspectable training examples for search agents.

The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping

cs.LG · 2026-04-13 · unverdicted · novelty 6.0

MEDS improves LLM RL performance by up to 4.13 pass@1 and 4.37 pass@128 points by dynamically penalizing rollouts matching prevalent historical error clusters identified via memory-stored representations and density clustering.

citing papers explorer

Showing 5 of 5 citing papers.

The Ringelmann Effect in Multi-Agent LLM Systems: A Scaling Law for Effective Team Size · physics.soc-ph · 2026-05-31 · conditional · none · ref 55
Rebellious Student: Reversing Teacher Signals for Reasoning Exploration with Self-Distilled RLVR · cs.LG · 2026-05-11 · unverdicted · none · ref 23
Where Rollouts Begin: Low-Load, High-Leverage First-Token Diversification for RLVR · cs.AI · 2026-05-27 · unverdicted · none · ref 31
EVE-Agent: Evidence-Verifiable Self-Evolving Agents · cs.AI · 2026-05-21 · unverdicted · none · ref 11
The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping · cs.LG · 2026-04-13 · unverdicted · none · ref 12
