Openmathinstruct-2: Accelerating ai for math with massive open-source instruction data

Shubham Toshniwal, Wei Du, Ivan Moshkov, Branislav Kisacanin, Alexan Ayrapetyan, Igor Gitman · 2024 · arXiv 2410.01560

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

representative citing papers

Instruction Tuning Changes How Upstream State Conditions Late Readout: A Cross-Patching Diagnostic

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Instruction tuning makes late-layer computation depend more on the model's own post-trained upstream state than on base-model upstream state, producing a consistent +1.68 logit interaction effect across five model families.

CoTEvol: Self-Evolving Chain-of-Thoughts for Data Synthesis in Mathematical Reasoning

cs.AI · 2026-04-16 · unverdicted · novelty 7.0

CoTEvol evolves CoT trajectories via reflective crossover and uncertainty-guided mutation to synthesize more accurate and diverse math reasoning data, outperforming distillation and search-based methods.

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

cs.LG · 2025-02-07 · unverdicted · novelty 7.0

A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.

RAG over Thinking Traces Can Improve Reasoning Tasks

cs.IR · 2026-05-05 · unverdicted · novelty 6.0

RAG over structured thinking traces boosts LLM reasoning on AIME, LiveCodeBench, and GPQA, with relative gains up to 56% and little added cost.

Train Separately, Merge Together: Modular Post-Training with Mixture-of-Experts

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

BAR trains independent domain experts via separate mid-training, SFT, and RL pipelines then composes them with a MoE router to match monolithic retraining performance at lower cost and without catastrophic forgetting.

HintMR: Eliciting Stronger Mathematical Reasoning in Small Language Models

cs.AI · 2026-04-14 · unverdicted · novelty 6.0

A cooperative system with one SLM distilling stepwise hints from a large model to guide another SLM's math reasoning yields consistent accuracy gains on benchmarks.

Process Reinforcement through Implicit Rewards

cs.LG · 2025-02-03 · conditional · novelty 6.0

PRIME enables online process reward model updates in LLM RL using implicit rewards from rollouts and outcome labels, yielding 15.1% average gains on reasoning benchmarks and surpassing a stronger instruct model with 10% of the data.

LACE: Lattice Attention for Cross-thread Exploration

cs.AI · 2026-04-16 · unverdicted · novelty 5.0 · 3 refs

LACE enables concurrent reasoning paths in LLMs to interact via lattice attention and a synthetic training pipeline, raising accuracy more than 7 points over independent parallel search.

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

cs.CV · 2024-12-13 · accept · novelty 5.0

DeepSeek-VL2 is a series of MoE vision-language models using dynamic tiling and latent attention that reach competitive or state-of-the-art results on VQA, OCR, document understanding and grounding with 1.0B to 4.5B activated parameters.

citing papers explorer

Showing 9 of 9 citing papers.

Instruction Tuning Changes How Upstream State Conditions Late Readout: A Cross-Patching Diagnostic cs.LG · 2026-05-08 · unverdicted · none · ref 24
Instruction tuning makes late-layer computation depend more on the model's own post-trained upstream state than on base-model upstream state, producing a consistent +1.68 logit interaction effect across five model families.
CoTEvol: Self-Evolving Chain-of-Thoughts for Data Synthesis in Mathematical Reasoning cs.AI · 2026-04-16 · unverdicted · none · ref 3
CoTEvol evolves CoT trajectories via reflective crossover and uncertainty-guided mutation to synthesize more accurate and diverse math reasoning data, outperforming distillation and search-based methods.
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach cs.LG · 2025-02-07 · unverdicted · none · ref 159
A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.
RAG over Thinking Traces Can Improve Reasoning Tasks cs.IR · 2026-05-05 · unverdicted · none · ref 34
RAG over structured thinking traces boosts LLM reasoning on AIME, LiveCodeBench, and GPQA, with relative gains up to 56% and little added cost.
Train Separately, Merge Together: Modular Post-Training with Mixture-of-Experts cs.LG · 2026-04-20 · unverdicted · none · ref 42
BAR trains independent domain experts via separate mid-training, SFT, and RL pipelines then composes them with a MoE router to match monolithic retraining performance at lower cost and without catastrophic forgetting.
HintMR: Eliciting Stronger Mathematical Reasoning in Small Language Models cs.AI · 2026-04-14 · unverdicted · none · ref 3
A cooperative system with one SLM distilling stepwise hints from a large model to guide another SLM's math reasoning yields consistent accuracy gains on benchmarks.
Process Reinforcement through Implicit Rewards cs.LG · 2025-02-03 · conditional · none · ref 50
PRIME enables online process reward model updates in LLM RL using implicit rewards from rollouts and outcome labels, yielding 15.1% average gains on reasoning benchmarks and surpassing a stronger instruct model with 10% of the data.
LACE: Lattice Attention for Cross-thread Exploration cs.AI · 2026-04-16 · unverdicted · none · ref 34 · 3 links
LACE enables concurrent reasoning paths in LLMs to interact via lattice attention and a synthetic training pipeline, raising accuracy more than 7 points over independent parallel search.
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding cs.CV · 2024-12-13 · accept · none · ref 85
DeepSeek-VL2 is a series of MoE vision-language models using dynamic tiling and latent attention that reach competitive or state-of-the-art results on VQA, OCR, document understanding and grounding with 1.0B to 4.5B activated parameters.

Openmathinstruct-2: Accelerating ai for math with massive open-source instruction data

fields

years

verdicts

representative citing papers

citing papers explorer