MathConstraint generates scalable, automatically verifiable combinatorial problems where LLMs achieve 18.5-66.9% accuracy without tools but roughly double that with solver access.
Enigmata: Scaling logical reasoning in large language models with synthetic verifiable puzzles
10 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
representative citing papers
Introduces OPT* tasks and two training regimes (solver-guided online policy optimization with rank-based reward shaping and search-based offline RL) plus a theoretical link between search success and information extraction per budget unit, showing empirical gains in optimization-like reasoning.
The paper delivers the first systematic taxonomy and hierarchical framework for data-efficient reinforcement learning post-training of large language models across data-centric, training-centric, and framework-centric views.
NExt accelerates RLVR training for LLMs by nonlinearly extrapolating low-rank parameter trajectories extracted from LoRA runs.
HC-INR uses a hierarchical hypernetwork to warp input coordinates into a disentangled space, raising the representable frequency bound while cutting parameters by 30-60% and boosting fidelity up to 4x over prior INRs.
NSTR models space-varying frequency fields in implicit neural representations by learning a frequency transport PDE that modulates global sinusoids, achieving better accuracy-parameter trade-offs than SIREN or Instant-NGP on images, audio, and 3D tasks.
RL training compute for logical reasoning follows a power law with horizon depth whose exponent rises with logical expressiveness, yielding better downstream transfer when models train on richer logics.
SPHINX generates synthetic visual puzzles for benchmarking LVLMs, where GPT-5 scores 51.1% and RLVR training improves both in-domain and external visual reasoning performance.
The paper identifies four missing interfaces (data autolabelling, embodiment retargeting, physics-grounded world models, and video-based reward inference) as the central bottleneck beyond VLA scaling for robot intelligence.
A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.
citing papers explorer
-
Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration
NExt accelerates RLVR training for LLMs by nonlinearly extrapolating low-rank parameter trajectories extracted from LoRA runs.