Internalizing Geometric Law: Learning from Solver Residuals for Precision-Critical Generation

Pang Zixi; Rafael Cabral; Shen Xin; Ziyi Shou

arxiv: 2606.09278 · v1 · pith:QKC7HHHJnew · submitted 2026-06-08 · 💻 cs.LG · cs.AI


Pith reviewed 2026-06-27 17:00 UTC · model grok-4.3


keywords: geometric synthesis · reward design · outlier gradient masking · language models · constraint satisfaction · differentiable loss · PyGeoX-Bench


The pith

Saturating additive rewards prevent one violated constraint from erasing the learning signal across all others in geometric synthesis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models often fail to produce constructions that satisfy many interacting geometric constraints at once when trained with conventional global error rewards. The paper identifies Outlier Gradient Masking as the cause: a single badly violated constraint can dominate the total norm and wipe out the gradient signal for every other constraint. It therefore introduces Saturating Additive Rewards that replace the global norm with a sum of separate bounded terms, one per constraint. Each term saturates so that extreme violations do not cancel progress on the remaining constraints. On the released PyGeoX-Bench the new reward raises the hard-tier solving rate by a factor of 2.3 relative to MSE-based baselines, and the resulting 8B model matches much larger frontier systems.

Core claim

The central claim is that global-norm rewards such as exp(-MSE) allow a single outlier constraint to nullify the learning signal for all other constraints. Saturating Additive Rewards instead decompose the total into bounded per-constraint terms, which preserves partial progress and delivers consistent gradients even under severe violations. On PyGeoX-Bench this raises the hard-tier solving rate by a factor of 2.3 relative to MSE-based rewards.
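A short derivation may make the masking effect in this claim concrete. It is a sketch under assumptions that are not taken from the paper: $r_1, \dots, r_n$ denote the per-constraint residuals, the global-norm reward is written as $\exp(-\mathrm{MSE})$ with $\mathrm{MSE} = \frac{1}{n}\sum_i r_i^2$, and $\phi$ stands for some bounded saturating function whose exact form in the paper may differ.

```latex
\begin{align*}
R_{\text{global}} &= \exp\Big(-\tfrac{1}{n}\sum_{i=1}^{n} r_i^2\Big),
& \frac{\partial R_{\text{global}}}{\partial r_j}
  &= -\frac{2 r_j}{n}\,\exp\Big(-\tfrac{1}{n}\sum_{i=1}^{n} r_i^2\Big), \\
R_{\text{additive}} &= \frac{1}{n}\sum_{i=1}^{n} \phi(r_i),
& \frac{\partial R_{\text{additive}}}{\partial r_j}
  &= \frac{1}{n}\,\phi'(r_j).
\end{align*}
```

Under these assumptions every partial derivative of $R_{\text{global}}$ carries the same factor $\exp(-\mathrm{MSE})$, which tends to zero as any single $|r_k|$ grows, so the sensitivity to every other residual shrinks with it. The derivative of $R_{\text{additive}}$ with respect to $r_j$ depends on $r_j$ alone, so an outlier elsewhere leaves it unchanged.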

What carries the argument

Saturating Additive Rewards (SAR), a reward that sums individually bounded saturating functions of each constraint residual so that no single term can dominate the total.
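The contrast between the two reward shapes can be sketched numerically. The snippet below is illustrative only: the function names are hypothetical, and the per-constraint term `exp(-(r/scale)^2)` is an assumed saturating function, not necessarily the one the paper uses. It compares how much reward a model gains for fixing nine ordinary constraints while a tenth constraint stays badly violated.

```python
import math

def global_norm_reward(residuals):
    """exp(-MSE): one reward computed from a single aggregate of all residuals."""
    mse = sum(r * r for r in residuals) / len(residuals)
    return math.exp(-mse)

def saturating_additive_reward(residuals, scale=1.0):
    """Mean of bounded per-constraint terms, each in [0, 1].

    exp(-(r/scale)^2) is an assumed saturating term chosen for illustration;
    the paper's exact form may differ.
    """
    terms = [math.exp(-(r / scale) ** 2) for r in residuals]
    return sum(terms) / len(terms)

# Ten constraints; the last one is a badly violated outlier in both cases.
poor = [1.0] * 9 + [10.0]  # the nine ordinary constraints are all off by 1
good = [0.0] * 9 + [10.0]  # the nine ordinary constraints are all satisfied

rewards = {
    "global": global_norm_reward,
    "additive": saturating_additive_reward,
}

# Reward gained by fixing the nine ordinary constraints, outlier unchanged.
gains = {name: fn(good) - fn(poor) for name, fn in rewards.items()}

for name, fn in rewards.items():
    print(f"{name:8s} poor={fn(poor):.6f} good={fn(good):.6f} gain={gains[name]:.6f}")
```

With these inputs the global-norm gain stays below 1e-4, while the additive gain exceeds 0.5: the outlier flattens the first reward but leaves the second free to register progress on the other nine constraints.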

If this is right

Models trained with SAR solve a larger fraction of problems that contain dozens of simultaneously active constraints.
An 8B-parameter model trained under SAR reaches performance comparable to much larger frontier systems on the benchmark.
Per-constraint reward terms allow direct inspection of which individual constraints remain unsatisfied during generation.
The released PyGeoX DSL turns declarative geometric constraints into a differentiable loss that can be used for any downstream training method.
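The per-constraint inspection mentioned in the list above can be illustrated with a toy example. Everything here is hypothetical: the helper names, the constraint labels, the tolerance, and the `exp(-r^2)` term are assumptions for illustration, and none of it is the PyGeoX API.

```python
import math

# Hypothetical toy constraints on 2D points; this is NOT the PyGeoX API.
def distance_residual(p, q, target):
    """Signed difference between the distance |pq| and its target length."""
    return math.dist(p, q) - target

def perpendicular_residual(a, b, c):
    """Dot product of AB and AC; zero when the two segments are perpendicular."""
    return (b[0] - a[0]) * (c[0] - a[0]) + (b[1] - a[1]) * (c[1] - a[1])

def per_constraint_report(points, constraints, tol=1e-6):
    """Return {name: (residual, bounded_term, satisfied)} for each constraint."""
    report = {}
    for name, residual_fn in constraints.items():
        r = residual_fn(points)
        # exp(-r^2) is an assumed bounded term, used here only for display.
        report[name] = (r, math.exp(-r * r), abs(r) <= tol)
    return report

points = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.5, 2.0)}
constraints = {
    "|AB| = 1": lambda p: distance_residual(p["A"], p["B"], 1.0),
    "AB perp AC": lambda p: perpendicular_residual(p["A"], p["B"], p["C"]),
    "|AC| = 2": lambda p: distance_residual(p["A"], p["C"], 2.0),
}

report = per_constraint_report(points, constraints)
unsatisfied = [name for name, (_, _, ok) in report.items() if not ok]

for name, (r, term, ok) in report.items():
    print(f"{name:12s} residual={r:+.4f} term={term:.4f} satisfied={ok}")
```

Because each constraint keeps its own residual and bounded term, the report shows directly which conditions a candidate construction still violates, which a single aggregated norm would hide.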

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same per-component saturation principle could be tested in other multi-constraint domains such as logical theorem proving or physical simulation from text.
Future benchmarks could measure how the benefit of SAR scales when the typical number of constraints per problem increases beyond the current suite.
Reward design that isolates and bounds each constraint may matter more than the choice of base loss function for any task that requires simultaneous satisfaction of many independent conditions.

Load-bearing premise

The 300 problems and their constraint interactions in PyGeoX-Bench are representative of the precision-critical geometric synthesis tasks that arise in real technical diagramming and mechanical design.

What would settle it

An experiment that retrains the same model on a fresh collection of geometric problems drawn from actual engineering drawings and finds that SAR produces no improvement over MSE-based rewards would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.09278 by Pang Zixi, Rafael Cabral, Shen Xin, Ziyi Shou.

Figure 2: Reward signal analysis for Qwen3-8B on PyGeoX-Bench. (Left) The global-norm reward collapses to …

Figure 3: The PyGeoX Symbolic Representation of a geometric figure (a pentagon ABCDE with incircle O and …

Figure 4: Data generation pipeline. Seeds are randomly sampled from a weighted vocabulary of 35 object and 38 relationship types. The LLM expands these seeds into fully defined geometric specifications, which are subsequently validated against the PyGeoX numerical solver; any configuration that fails to converge or exhibits degeneracy is automatically discarded. Each training sample is a 4-tuple: (1) natural language descr…

Figure 5: Visual progression of the reward landscape for the diagram in Figure 3. The top row shows all …

Figure 6: Ground truth diagram image for the easy difficulty example generated by PyGeoX (left) and Qwen3- …

Figure 7: Ground truth diagram image for the medium difficulty example generated by PyGeoX (left) and …

Figure 8: Ground truth diagram image for the hard difficulty example generated by PyGeoX (left) and Qwen3- …

Original abstract

Large Language Models frequently hallucinate in precision-critical domains such as technical diagramming and mechanical design, where outputs must satisfy strict geometric constraints. We study open-ended geometric synthesis from natural language: translating free-form descriptions into precise constructions whose entities must simultaneously satisfy dozens of interacting constraints. To make this tractable, we release PyGeoX, a programmable geometric DSL that compiles declarative constraints into a differentiable loss, and PyGeoX-Bench, a stratified suite of 300 problems with per-constraint verifiable rewards. Using PyGeoX as a verifier, we identify a failure mode we call Outlier Gradient Masking: under global-norm rewards (any scheme that aggregates residuals through a single norm, for example, $\exp(-\mathrm{MSE})$), a single outlier constraint can nullify the learning signal across all others. To address this, we propose Saturating Additive Rewards (SAR), which decompose the reward into bounded per-constraint terms, preserving partial progress and ensuring consistent gradients even under severe violations. Against MSE-based rewards, the natural baseline for geometry solvers, SAR improves the hard-tier solving rate by $2.3\times$, and the resulting 8B model is competitive with much larger frontier systems on this benchmark. We release the engine, benchmark, and data at https://github.com/Huawei-AI4Math/PyGeoX.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The SAR reward, together with the released PyGeoX-Bench and code, makes the 2.3× hard-tier gain directly checkable.


The main takeaway is that the authors release a geometric DSL (PyGeoX) that compiles constraints into a differentiable loss, a 300-problem benchmark stratified by difficulty, and a reward scheme called SAR. SAR breaks the total reward into bounded per-constraint terms instead of a single global norm like exp(-MSE). This stops one badly violated constraint from wiping out the gradient for the rest, which they label outlier gradient masking. On their benchmark the change lifts the hard-tier solve rate by 2.3 times and lets an 8B model stay competitive with larger frontier systems.

The release of the compiler, the problem suite, and the training code is the part that actually matters. Anyone can rerun the comparison and see whether the numerical result holds, which is rarer than it should be.

The soft spot is how well the 300 problems stand in for real precision-critical tasks in diagramming or mechanical design. The paper presents them as representative, but the constraint density and interaction patterns could still be narrower than what shows up in production CAD work. If the benchmark problems are mostly loosely coupled, the reported lift may shrink on harder instances. The baselines are the obvious MSE variants, so that part is fair.

This is aimed at people working on constrained generation or reward design for structured outputs. A reader who needs verifiable geometric synthesis will get immediate use from the artifacts.

I would send it to peer review. The empirical claim rests on released material rather than hidden runs, so referees have something concrete to examine.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces PyGeoX, a programmable geometric DSL that compiles declarative constraints into a differentiable loss, and PyGeoX-Bench, a stratified 300-problem suite with per-constraint verifiable rewards. It diagnoses an 'Outlier Gradient Masking' failure mode under global-norm rewards (e.g., exp(-MSE)) and proposes Saturating Additive Rewards (SAR) that decompose the reward into bounded per-constraint terms. The central empirical claim is that SAR yields a 2.3× higher hard-tier solving rate than MSE-based rewards on the benchmark, with the resulting 8B model competitive with larger frontier systems; the DSL compiler, benchmark, and training code are released.

Significance. If the reported performance gains hold under detailed scrutiny, the work supplies a targeted reward-design technique for training LLMs on precision-critical geometric synthesis, directly addressing a diagnosed gradient issue in constraint aggregation. The explicit release of the engine, benchmark artifacts, and code constitutes a clear strength, supporting reproducibility and follow-on work in technical diagramming and mechanical design applications.

minor comments (3)

Abstract: the notation exp(-MSE) is used without an accompanying equation or explicit definition of the aggregation; a one-line formal statement of the baseline reward would improve immediate readability.
Abstract: the phrase 'hard-tier solving rate' is introduced without a parenthetical definition or reference to the stratification criteria used in PyGeoX-Bench; a brief clarification would help readers interpret the 2.3× figure.
The manuscript would benefit from an explicit statement in the experimental section (or a dedicated reproducibility paragraph) confirming that the released training code exactly reproduces the reported 2.3× delta under the same random seeds and hyper-parameters.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the accurate summary of our work and the recommendation for minor revision. The assessment correctly captures the PyGeoX DSL, the 300-problem benchmark, the identification of outlier gradient masking, and the SAR reward design. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity; empirical result on released benchmark


The paper's core contribution is an empirical comparison: SAR yields a 2.3× higher hard-tier solve rate than MSE-based rewards on the newly introduced, released PyGeoX-Bench. The method (per-constraint bounded rewards vs. global-norm aggregation) and the identified failure mode (Outlier Gradient Masking) are presented as observations from running the solver, not as derivations that reduce to fitted parameters or self-citations. No equations, uniqueness theorems, or ansatzes are invoked that loop back to the inputs by construction. The benchmark, DSL compiler, and training code are released, making the numerical claim directly falsifiable outside any internal loop. This is a standard non-circular empirical paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Abstract-only view limits visibility into internal assumptions; the central claim rests on the representativeness of the new benchmark and the transferability of the observed reward improvement.

axioms (1)

domain assumption: PyGeoX-Bench problems capture the interacting constraints typical of real precision-critical geometric tasks.
Invoked to support generalization of the 2.3× improvement beyond the benchmark.

invented entities (1)

Saturating Additive Rewards (SAR): no independent evidence
purpose: Decompose the global reward into bounded per-constraint terms to avoid outlier masking.
New method introduced to address the identified failure mode; no independent evidence outside the paper's experiments.

