Title resolution pending

Aggregation: We compute the mean pass rate and the mean gap to VBS across all M = 1000 problems for each bootstrap iteration, then report the mean and standard deviation of these aggrega
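The aggregation step in the excerpt above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the excerpt is truncated and does not say what is resampled, so the sketch assumes problems are resampled with replacement. The function name `bootstrap_aggregate`, the parameters `n_boot` and `seed`, and the input layout (one pass indicator and one gap-to-VBS value per problem) are all hypothetical.

```python
import random
import statistics


def bootstrap_aggregate(passed, gap_to_vbs, n_boot=200, seed=0):
    """Return (mean, std) of per-iteration aggregates for both metrics.

    passed      -- list of 0/1 pass indicators, one per problem (assumed layout)
    gap_to_vbs  -- list of gaps to the virtual best solver, one per problem
    n_boot      -- number of bootstrap iterations (hypothetical default, >= 2)
    """
    if len(passed) != len(gap_to_vbs):
        raise ValueError("both metrics need one value per problem")
    rng = random.Random(seed)
    m = len(passed)  # the excerpt uses M = 1000 problems

    pass_rates, mean_gaps = [], []
    for _ in range(n_boot):
        # Assumption: resample problem indices with replacement.
        idx = [rng.randrange(m) for _ in range(m)]
        # Per-iteration aggregates over the resampled problems.
        pass_rates.append(sum(passed[i] for i in idx) / m)
        mean_gaps.append(sum(gap_to_vbs[i] for i in idx) / m)

    # Report the mean and sample standard deviation across iterations.
    return {
        "pass_rate": (statistics.mean(pass_rates), statistics.stdev(pass_rates)),
        "gap_to_vbs": (statistics.mean(mean_gaps), statistics.stdev(mean_gaps)),
    }
```

Calling `bootstrap_aggregate(passed, gap_to_vbs)` with two equal-length lists returns a dictionary mapping each metric to a `(mean, std)` pair. If the original procedure resamples something other than problems (for example, repeated runs per problem), only the index-sampling line would need to change.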

1 Pith paper cites this work. Polarity classification is still indexing.


Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Beyond Inference-Time Search: Reinforcement Learning Synthesizes Reusable Solvers

cs.LG · 2026-05-18 · conditional · novelty 6.0

RL fine-tuning of Qwen2.5-Coder-14B with GRPO and feasibility-gated reward produces reusable constraint-aware Simulated Annealing solvers for Synergistic Dependency Selection, reducing gap to virtual best solver from 28.7% to 5.0% at 91x lower cost.

citing papers explorer

Showing 1 of 1 citing paper.

Beyond Inference-Time Search: Reinforcement Learning Synthesizes Reusable Solvers · cs.LG · 2026-05-18 · conditional · none · ref 15
