A deep dive into scaling RL for code generation with synthetic data and curricula

Cansu Sancaktar, David Zhang, Gabriel Synnaeve, Taco Cohen · 2026 · arXiv 2603.24202

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution

cs.SE · 2026-05-31 · unverdicted · novelty 6.0

BenchEvolver evolves coding problem solutions to generate harder, valid tasks, producing LiveCodeBench-Plus where frontier models score 27.5-62.6% and enabling RL gains on held-out tests.

AutoOR: Scalably Post-training LLMs to Autoformalize Operations Research Problems

cs.LG · 2026-04-18 · unverdicted · novelty 6.0

AutoOR uses synthetic data generation and RL post-training with solver feedback to enable 8B LLMs to autoformalize linear, mixed-integer, and non-linear OR problems, matching larger models on benchmarks.

citing papers explorer


AutoOR: Scalably Post-training LLMs to Autoformalize Operations Research Problems

cs.LG · 2026-04-18 · unverdicted · none · ref 71
