Measuring mathematical problem solving with the math dataset

Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, Jacob Steinhardt · 2021

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

representative citing papers

Not How Many, But Which: Parameter Placement in Low-Rank Adaptation

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

Gradient-informed placement of LoRA parameters recovers full performance under GRPO while random placement does not, due to differences in gradient rank and stability across training regimes.

Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

TABOM models inference unmasking preferences as a Boltzmann distribution over predictive entropies and derives a ranking loss to align DLM training with observed trajectories, yielding gains in new domains and reduced catastrophic forgetting versus standard SFT.

Valid Best-Model Identification for LLM Evaluation via Low-Rank Factorization

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

Doubly robust estimators that incorporate low-rank predictions enable valid finite-sample confidence intervals for best-model identification under adaptive sampling and without-replacement example selection in LLM evaluation.

ReAD: Reinforcement-Guided Capability Distillation for Large Language Models

cs.CL · 2026-05-11 · unverdicted · novelty 5.0

ReAD applies a contextual bandit to allocate fixed-token distillation budget across interdependent LLM capabilities, yielding higher task utility and fewer negative spillovers than standard methods.

citing papers explorer

Showing 4 of 4 citing papers.

Not How Many, But Which: Parameter Placement in Low-Rank Adaptation cs.LG · 2026-05-12 · unverdicted · none · ref 38
Gradient-informed placement of LoRA parameters recovers full performance under GRPO while random placement does not, due to differences in gradient rank and stability across training regimes.
Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models cs.CL · 2026-05-12 · unverdicted · none · ref 24
TABOM models inference unmasking preferences as a Boltzmann distribution over predictive entropies and derives a ranking loss to align DLM training with observed trajectories, yielding gains in new domains and reduced catastrophic forgetting versus standard SFT.
Valid Best-Model Identification for LLM Evaluation via Low-Rank Factorization cs.LG · 2026-05-11 · unverdicted · none · ref 5
Doubly robust estimators that incorporate low-rank predictions enable valid finite-sample confidence intervals for best-model identification under adaptive sampling and without-replacement example selection in LLM evaluation.
ReAD: Reinforcement-Guided Capability Distillation for Large Language Models cs.CL · 2026-05-11 · unverdicted · none · ref 11
ReAD applies a contextual bandit to allocate fixed-token distillation budget across interdependent LLM capabilities, yielding higher task utility and fewer negative spillovers than standard methods.

Measuring mathematical problem solving with the math dataset

fields

years

verdicts

representative citing papers

citing papers explorer