pith. machine review for the scientific record.

GeometryZero: Advancing Geometry Solving via Group Contrastive Policy Optimization

2 Pith papers cite this work.

abstract

Recent progress in large language models (LLMs) has boosted mathematical reasoning, yet geometry remains challenging because auxiliary construction is often essential. Prior methods either underperform or depend on very large models (e.g., GPT-4o), making them costly. We argue that reinforcement learning with verifiable rewards (e.g., GRPO) can train smaller models to couple auxiliary construction with solid geometric reasoning. However, naively applying GRPO yields unconditional rewards for auxiliary construction, encouraging indiscriminate and sometimes harmful constructions. We propose Group Contrastive Policy Optimization (GCPO), an RL framework with two components: (1) Group Contrastive Masking, which assigns a positive or negative construction reward based on contextual utility, and (2) a Length Reward that encourages longer reasoning chains. On top of GCPO, we build GeometryZero, an affordable family of geometry reasoning models that use auxiliary construction selectively. Experiments on Geometry3K and MathVista show that GeometryZero consistently outperforms RL baselines (e.g., GRPO, ToRL). The code is available at https://github.com/ekonwang/GeometryZero.
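The abstract does not spell out GCPO's exact reward rules, so the following is only a minimal Python sketch of how Group Contrastive Masking and a Length Reward could sit on top of a GRPO-style group advantage. Everything except the group-standardized advantage is an assumption: the names `Rollout`, `construction_signal`, `rollout_rewards`, and `group_advantages`, the weights `w_construct`, `w_length`, and `max_tokens`, and the rule that compares mean accuracy of rollouts with and without auxiliary construction are hypothetical illustrations, not the authors' implementation (the linked repository is the authoritative source).

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Rollout:
    correct: bool            # final answer matches the reference (verifiable reward)
    uses_construction: bool  # reasoning introduces auxiliary points or lines
    num_tokens: int          # length of the reasoning chain in tokens


def construction_signal(group, margin=0.0):
    """Hypothetical group contrastive mask: +1, -1, or 0 for the whole group.

    Compares the mean accuracy of rollouts that use auxiliary construction
    against rollouts that do not, within the same sampled group.
    """
    with_c = [float(r.correct) for r in group if r.uses_construction]
    without_c = [float(r.correct) for r in group if not r.uses_construction]
    if not with_c or not without_c:
        return 0  # no contrast available: construction is neither rewarded nor penalized
    gap = mean(with_c) - mean(without_c)
    if gap > margin:
        return 1   # construction helped in this context
    if gap < -margin:
        return -1  # construction hurt in this context
    return 0


def rollout_rewards(group, w_construct=0.5, w_length=0.1, max_tokens=1024):
    """Accuracy + masked construction reward + capped length bonus (weights are assumptions)."""
    signal = construction_signal(group)
    rewards = []
    for r in group:
        reward = 1.0 if r.correct else 0.0
        if r.uses_construction:
            reward += w_construct * signal  # sign decided by the group contrast
        reward += w_length * min(r.num_tokens / max_tokens, 1.0)  # hypothetical linear, capped
        rewards.append(reward)
    return rewards


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(x - mu) / (sigma + eps) for x in rewards]


# Toy group of four sampled solutions to one geometry problem.
group = [
    Rollout(correct=True, uses_construction=True, num_tokens=512),
    Rollout(correct=True, uses_construction=True, num_tokens=256),
    Rollout(correct=False, uses_construction=False, num_tokens=256),
    Rollout(correct=True, uses_construction=False, num_tokens=128),
]
rewards = rollout_rewards(group)
advantages = group_advantages(rewards)
```

In this toy group, the rollouts that add a construction are correct more often than those that do not, so the construction term is a bonus; if the outcome were reversed, the same term would become a penalty, which mirrors the selective use of construction the abstract describes. Because advantages are standardized within the group, only rewards relative to sibling rollouts matter, as in GRPO.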

citation summary

fields: cs.CL (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
roles: method (1)
polarities: use method (1)
