OpenCodeReasoning: Advancing Data Distillation for Competitive Coding. arXiv preprint arXiv:2504.01943
13 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: method (1). Citation polarities: extend (1).
Citing papers
- Primal Generation, Dual Judgment: Self-Training from Test-Time Scaling
DuST self-trains LLMs for code generation by ranking their own test-time samples via sandbox execution and applying GRPO, improving judgment by +6.2 NDCG and single-sample pass@1 by +3.1 on LiveCodeBench.
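The generate-execute-rank loop the summary describes can be sketched as follows. Everything here is an illustrative stand-in, not DuST's implementation: the "sandbox" is a toy `eval`, the reward is test pass rate, and the advantage is the standard group-relative normalization used in GRPO-style training.

```python
# Hedged sketch: execution-based ranking of sampled programs plus
# GRPO-style group-relative advantages. Reward design is an assumption.
from statistics import mean, stdev

def run_in_sandbox(program: str, tests: list) -> float:
    """Stand-in for sandboxed execution: fraction of unit tests passed."""
    passed = 0
    for inp, expected in tests:
        try:
            if eval(program)(inp) == expected:  # toy "sandbox" for lambdas
                passed += 1
        except Exception:
            pass  # crashing candidates simply earn no reward
    return passed / len(tests)

def grpo_advantages(rewards: list) -> list:
    """Normalize rewards within the sample group (mean-zero, unit-scale)."""
    mu = mean(rewards)
    sd = stdev(rewards) if len(rewards) > 1 else 1.0
    sd = sd or 1.0  # guard against an all-equal group
    return [(r - mu) / sd for r in rewards]

tests = [(2, 4), (3, 9)]  # pairs of (input, expected output)
candidates = ["lambda x: x * x", "lambda x: x + x", "lambda x: x ** 2"]
rewards = [run_in_sandbox(c, tests) for c in candidates]
adv = grpo_advantages(rewards)
```

The normalized advantages then weight the policy-gradient update, so samples that execute better than their group peers are reinforced.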
- Teaching Language Models to Think in Code
ThinC trains small models to reason primarily in code rather than natural language, outperforming tool-integrated baselines and even larger models on competition math benchmarks.
- Rethinking KV Cache Eviction via a Unified Information-Theoretic Objective
KV cache eviction is unified under an information capacity maximization principle derived from a linear-Gaussian attention surrogate, with CapKV proposed as a leverage-score based implementation that outperforms prior heuristics in experiments.
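A generic leverage-score eviction rule of the kind the summary points at might look like this; CapKV's actual objective and scoring details are not reproduced here, and the regularizer and budget policy below are assumptions.

```python
# Hedged sketch: keep the KV-cache entries whose keys have the highest
# statistical leverage. This is a generic recipe, not CapKV itself.
import numpy as np

def leverage_scores(K: np.ndarray) -> np.ndarray:
    """l_i = k_i (K^T K + eps I)^{-1} k_i^T for each cached key k_i."""
    d = K.shape[1]
    G_inv = np.linalg.inv(K.T @ K + 1e-6 * np.eye(d))
    return np.einsum("nd,de,ne->n", K, G_inv, K)

def evict(K: np.ndarray, V: np.ndarray, budget: int):
    """Retain the `budget` entries with the highest leverage scores."""
    keep = np.argsort(-leverage_scores(K))[:budget]
    keep.sort()  # preserve positional order of the surviving entries
    return K[keep], V[keep]

rng = np.random.default_rng(0)
K = rng.standard_normal((16, 4))  # 16 cached keys, head dim 4
V = rng.standard_normal((16, 4))
K2, V2 = evict(K, V, budget=8)
```

Leverage scores sum to (approximately) the key matrix's rank, so they naturally concentrate the budget on keys that span independent directions rather than redundant ones.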
- Rethinking the Comparison Unit in Sequence-Level Reinforcement Learning: An Equal-Length Paired Training Framework from Loss Correction to Sample Construction
EqLen is a sample-construction framework that builds equal-length paired segments via dual-track generation and masking for stable group-relative RL in sequences, reframing the length problem as a comparison-unit issue rather than loss scaling.
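As a minimal illustration of what an equal-length comparison unit means, one can pad the shorter member of a pair and mask the padding so both tracks contribute the same number of positions; EqLen's dual-track generation is more involved than this sketch, and the pad-and-mask scheme here is only an assumption.

```python
# Hedged sketch: build an equal-length pair by padding plus a validity mask.
PAD = 0  # assumed pad token id

def equal_length_pair(a: list, b: list):
    """Return (padded_a, padded_b, mask_a, mask_b) of one common length."""
    L = max(len(a), len(b))
    pa = a + [PAD] * (L - len(a))
    pb = b + [PAD] * (L - len(b))
    ma = [1] * len(a) + [0] * (L - len(a))  # 1 = real token, 0 = padding
    mb = [1] * len(b) + [0] * (L - len(b))
    return pa, pb, ma, mb

pa, pb, ma, mb = equal_length_pair([5, 6, 7], [8, 9])
```

A group-relative loss computed only over mask-1 positions then compares like with like, instead of correcting for length mismatches inside the loss.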
- TrigReason: Trigger-Based Collaboration between Small and Large Reasoning Models
TrigReason matches large reasoning model accuracy on math and science benchmarks by delegating most steps to small models and intervening selectively on three triggers, cutting latency by 43.9% and cost by 73.3%.
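The delegate-by-default, intervene-on-trigger pattern can be sketched like this. The three triggers used below (low confidence, overlong step, empty output) are placeholders of my own choosing, not TrigReason's published triggers.

```python
# Hedged sketch: small model drafts every step; a large model takes over
# only when one of three (assumed) triggers fires.
def solve(steps, small_step, large_step, conf_threshold=0.5, max_len=80):
    trace = []
    for s in steps:
        text, conf = small_step(s)
        triggered = (conf < conf_threshold      # trigger 1: low confidence
                     or len(text) > max_len     # trigger 2: overlong step
                     or text == "")             # trigger 3: stalled output
        trace.append(large_step(s) if triggered else text)
    return trace

# Toy models: the small model is only unsure on the "hard" step.
small = lambda s: (f"small:{s}", 0.9 if s != "hard" else 0.2)
large = lambda s: f"large:{s}"
out = solve(["a", "hard", "b"], small, large)
```

Because the large model runs on only a fraction of steps, latency and cost scale mostly with the small model's footprint.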
- SUPERNOVA: Eliciting General Reasoning in LLMs with Reinforcement Learning on Natural Instructions
SUPERNOVA adapts instruction-tuning data for RLVR and achieves up to 52.8% relative gains on general reasoning benchmarks like BBEH through targeted task selection and mixing.
- Think Anywhere in Code Generation
Think-Anywhere lets LLMs invoke on-demand reasoning at any token during code generation via cold-start imitation followed by outcome-based RL, reaching state-of-the-art results on LeetCode, LiveCodeBench, HumanEval, and MBPP.
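Mechanically, "reasoning at any token" amounts to a decoding stream in which a special token opens a scratchpad span that is later stripped from the emitted code. The token names `<think>`/`</think>` below are assumptions for illustration, not necessarily the paper's vocabulary.

```python
# Hedged sketch: strip inline reasoning spans from a generated token stream.
THINK, END = "<think>", "</think>"  # assumed special tokens

def strip_thoughts(tokens: list) -> list:
    """Drop everything between THINK and END; keep the rest as code."""
    out, thinking = [], False
    for t in tokens:
        if t == THINK:
            thinking = True
        elif t == END:
            thinking = False
        elif not thinking:
            out.append(t)
    return out

stream = ["def", "f(x):", THINK, "edge", "case?", END, "return", "x"]
code = strip_thoughts(stream)
```

Training then only needs outcome rewards on the final code, since the reasoning spans are free-form and never graded directly.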
- Scalable Token-Level Hallucination Detection in Large Language Models
TokenHD uses a scalable data synthesis engine and importance-weighted training to create token-level hallucination detectors that work on free-form text and scale from 0.6B to 8B parameters, outperforming larger reasoning models.
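A generic form of importance-weighted token-level training is a weighted binary cross-entropy in which hallucinated tokens carry extra weight; the specific weighting below is an assumption, not TokenHD's published recipe.

```python
# Hedged sketch: token-level BCE with extra weight on hallucinated tokens.
import math

def weighted_token_bce(probs, labels, pos_weight=3.0):
    """Weighted mean BCE; label 1 marks an (assumed) hallucinated token."""
    total, weight_sum = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = pos_weight if y == 1 else 1.0
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum

# Detector probabilities per token vs. (synthetic) hallucination labels.
loss = weighted_token_bce([0.9, 0.2, 0.8], [1, 0, 1])
```

Up-weighting the rare positive class keeps the detector from collapsing to "never hallucinated" on mostly-faithful text.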
- Train Separately, Merge Together: Modular Post-Training with Mixture-of-Experts
BAR trains independent domain experts via separate mid-training, SFT, and RL pipelines, then composes them with a MoE router to match monolithic retraining performance at lower cost and without catastrophic forgetting.
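The compose-behind-a-router idea can be shown in miniature. The router below is a toy keyword classifier and the experts are stubs; BAR's router is learned and its experts are full models, so every interface here is an assumption.

```python
# Hedged sketch: independently trained experts composed behind a router.
def make_router(experts: dict, classify):
    """Dispatch each query to the expert chosen by a domain classifier."""
    def answer(query: str) -> str:
        return experts[classify(query)](query)
    return answer

# Stub experts standing in for separately post-trained models.
experts = {
    "math": lambda q: f"math-expert:{q}",
    "code": lambda q: f"code-expert:{q}",
}
classify = lambda q: "code" if ("def " in q or "bug" in q) else "math"

route = make_router(experts, classify)
a = route("fix this bug")
b = route("integrate x^2")
```

Because each expert's weights are frozen after its own pipeline, adding a new domain cannot overwrite what the others learned, which is where the no-catastrophic-forgetting claim comes from.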
- Internalizing Outcome Supervision into Process Supervision: A New Paradigm for Reinforcement Learning for Reasoning
A new RL paradigm for reasoning where models generate their own internal process supervision from outcome feedback by recycling failed trajectories.
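One common way to mint step-level labels from outcome-only feedback is Monte Carlo value estimation over trajectory prefixes: a step whose prefix rarely leads to success is implicated as faulty. This illustrates the general idea only; the paper's exact recycling algorithm is not reproduced here.

```python
# Hedged sketch: estimate per-step values of a (failed) trajectory by
# resampling completions from each prefix and scoring the outcome.
import random

def step_values(steps, rollout_success, samples=200, seed=0):
    """Estimate P(success | first i steps) by Monte Carlo rollouts."""
    rng = random.Random(seed)
    values = []
    for i in range(1, len(steps) + 1):
        wins = sum(rollout_success(steps[:i], rng) for _ in range(samples))
        values.append(wins / samples)
    return values

# Toy environment: once the "bad" step is in the prefix, recovery is rare.
def rollout(prefix, rng):
    return int(rng.random() < (0.1 if "bad" in prefix else 0.8))

vals = step_values(["a", "bad", "c"], rollout)
```

The sharp drop in estimated value localizes the faulty step, turning an outcome signal into process supervision for the next round of training.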
- InCoder-32B-Thinking: Industrial Code World Model for Thinking
InCoder-32B-Thinking uses error-feedback synthesized thinking traces and a code world model to reach top open-source scores on general and industrial code benchmarks including 81.3% on LiveCodeBench and 84.0% on CAD-Coder.
- Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
A simple PPO-based RL training pipeline on base models scales reasoning performance and response length, outperforming prior work on math and science benchmarks with one-tenth the training steps.
- Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models
The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.