DASD improves math reasoning in LLMs by adaptively directing self-distillation based on per-token entropy to balance exploration and step accuracy, outperforming prior self-distillation and RLVR baselines on six benchmarks.
Reinforcement learn- ing meets large language models: A survey of ad- vancements and applications across the llm lifecycle
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 4roles
background 2polarities
background 2representative citing papers
StaRPO improves LLM reasoning by adding autocorrelation function and path efficiency stability metrics to RL policy optimization, yielding higher accuracy and fewer logic errors on reasoning benchmarks.
This survey organizes RL for LLM multi-agent systems into reward families, credit units, and five orchestration sub-decisions, notes the absence of explicit stopping-decision training in its paper pool, and releases a tagged corpus.
A survey of physical AI that distinguishes theoretical physics reasoning from applied understanding and synthesizes advances in symbolic reasoning, embodied systems, and generative models to advocate for physics-grounded world models.
citing papers explorer
-
Tailoring Teaching to Aptitude: Direction-Adaptive Self-Distillation for LLM Reasoning
DASD improves math reasoning in LLMs by adaptively directing self-distillation based on per-token entropy to balance exploration and step accuracy, outperforming prior self-distillation and RLVR baselines on six benchmarks.
-
StaRPO: Stability-Augmented Reinforcement Policy Optimization
StaRPO improves LLM reasoning by adding autocorrelation function and path efficiency stability metrics to RL policy optimization, yielding higher accuracy and fewer logic errors on reasoning benchmarks.
-
Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traces
This survey organizes RL for LLM multi-agent systems into reward families, credit units, and five orchestration sub-decisions, notes the absence of explicit stopping-decision training in its paper pool, and releases a tagged corpus.
-
Aligning Perception, Reasoning, Modeling and Interaction: A Survey on Physical AI
A survey of physical AI that distinguishes theoretical physics reasoning from applied understanding and synthesizes advances in symbolic reasoning, embodied systems, and generative models to advocate for physics-grounded world models.