Hilbert-Geo: Solving Solid Geometric Problems by Neural-Symbolic Reasoning

Bin Dong; Haoyu Cheng; Qiufeng Wang; Ruoran Xu

arxiv: 2605.16385 · v1 · pith:K5DCLXFTnew · submitted 2026-05-11 · 💻 cs.CV · cs.AI· cs.CL

Hilbert-Geo: Solving Solid Geometric Problems by Neural-Symbolic Reasoning

Ruoran Xu , Haoyu Cheng , Bin Dong , Qiufeng Wang This is my paper

Pith reviewed 2026-05-20 22:48 UTC · model grok-4.3

classification 💻 cs.CV cs.AIcs.CL

keywords solid geometryneural-symbolic reasoningformal languagepredicate librarytheorem bankgeometric problem solvingmultimodal reasoningParse2Reason

0 comments

The pith

Hilbert-Geo converts solid geometry text and diagrams into a predicate-based formal language and then applies theorems to produce verifiable step-by-step solutions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Hilbert-Geo as a unified formal language framework for solid geometry that includes a predicate library and theorem bank. It defines a Parse2Reason pipeline that first translates natural-language statements and 3D diagrams into conditional description language, then performs relational inference and algebraic computation. This produces strictly correct and human-readable reasoning traces that current multimodal models cannot match on the same tasks. The method also applies to plane geometry and comes with new expert-annotated datasets that include formal-language annotations. A sympathetic reader would care because the approach directly targets the spatial and relational failures that appear when large models confront three-dimensional figures.

Core claim

Hilbert-Geo is the first unified formal language framework for solid geometry, including an extensive predicate library and a dedicated theorem bank. Based on this framework, the Parse2Reason method first parses problem descriptions and diagrams into conditional description language (CDL) and then leverages the formal CDL together with the theorem bank to perform relational inference and algebraic computation, generating strictly correct, verifiable, and human-readable reasoning processes.

What carries the argument

Conditional Description Language (CDL), a formalized language of predicates that encodes geometric conditions from both natural text and visual diagrams so that symbolic reasoning can follow.

If this is right

The generated reasoning processes are strictly correct and human-readable because they rest on explicit theorem applications rather than statistical generation.
The same Parse2Reason pipeline reaches 80.2 percent accuracy on the plane-geometry dataset PlaneFGeo3k, showing that the framework is not limited to three-dimensional cases.
Performance reaches 77.3 percent on SolidFGeo2k and 84.1 percent on MathVerse-Solid, exceeding the results of Gemini-2.5-pro and GPT-5 on the same benchmarks.
The method supplies new expert-annotated datasets equipped with formal-language annotations, solutions, and answers that can support further development of geometric reasoners.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the predicate representation proves complete, the same parsing-plus-theorem structure could be reused for other 3D spatial reasoning tasks such as mechanical assembly or architectural verification.
The explicit theorem bank opens the possibility of combining the system with existing automated theorem provers to handle longer proof chains without manual intervention.
Because the output is a sequence of formal steps, it becomes feasible to check the reasoning against an independent symbolic engine and thereby measure the rate of hidden errors.

Load-bearing premise

The conditional description language and predicate library can represent both natural-language statements and solid geometric diagrams without loss of information or ambiguity that would break later reasoning.

What would settle it

A collection of solid-geometry problems whose diagrams contain spatial relations that cannot be expressed uniquely in the predicate library, resulting in either failed parses or inconsistent reasoning chains.

Figures

Figures reproduced from arXiv: 2605.16385 by Bin Dong, Haoyu Cheng, Qiufeng Wang, Ruoran Xu.

**Figure 2.** Figure 2: Error distribution in SolidFGeo2k of Gemini and GPT [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗

**Figure 3.** Figure 3: Visual Perception Error for the Problem in Fig. 4 [PITH_FULL_IMAGE:figures/full_fig_p002_3.png] view at source ↗

**Figure 4.** Figure 4: Overall process of parsing images and text into geometry condition description language (CDL) [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

**Figure 5.** Figure 5: Reasoning using geometric condition description language (CDL) [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗

**Figure 6.** Figure 6: Performances for Different Subjects (CSS, SMR, SSI, [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗

**Figure 7.** Figure 7: Reasoning performance of MLLMs under different num [PITH_FULL_IMAGE:figures/full_fig_p007_7.png] view at source ↗

**Figure 8.** Figure 8: Fuzzy Jaccard Similarity Score between Predicted CDL [PITH_FULL_IMAGE:figures/full_fig_p007_8.png] view at source ↗

**Figure 12.** Figure 12: example of hallucination and Visual Perception Error [PITH_FULL_IMAGE:figures/full_fig_p008_12.png] view at source ↗

**Figure 11.** Figure 11: Error distribution of Hilbert-Geo (Gemini 2.5 pro and [PITH_FULL_IMAGE:figures/full_fig_p008_11.png] view at source ↗

**Figure 15.** Figure 15: Examples from Mathverse-solid [PITH_FULL_IMAGE:figures/full_fig_p014_15.png] view at source ↗

**Figure 16.** Figure 16: Data Samples from Mathverse-solid and SolidFGeo2k [PITH_FULL_IMAGE:figures/full_fig_p014_16.png] view at source ↗

**Figure 17.** Figure 17: Comparison of Datasets for Solid Geometry Problems [PITH_FULL_IMAGE:figures/full_fig_p014_17.png] view at source ↗

**Figure 18.** Figure 18: Proportion of different subjects in SolidFGeo2k, note [PITH_FULL_IMAGE:figures/full_fig_p015_18.png] view at source ↗

**Figure 19.** Figure 19: Examples from different subjects [PITH_FULL_IMAGE:figures/full_fig_p015_19.png] view at source ↗

**Figure 20.** Figure 20: Construction Parsing Performance: Jaccard similar [PITH_FULL_IMAGE:figures/full_fig_p018_20.png] view at source ↗

**Figure 21.** Figure 21: Condition Parsing Performance: Jaccard similarity [PITH_FULL_IMAGE:figures/full_fig_p019_21.png] view at source ↗

**Figure 23.** Figure 23: Examples from designed samples The complete prompt template is shown below: You are an expert in geometry, logic, and computer science. Your task is to precisely convert a geometry problem (with natural language and an image) into a JSON object following the provided JSON Schema. You must strictly follow the schema and output a complete JSON object. Rule 0: Predicate Compliance (MOST IMPORTANT) - All CDL … view at source ↗

**Figure 25.** Figure 25: Plane and Solid differ greatly in all aspects, even if the [PITH_FULL_IMAGE:figures/full_fig_p022_25.png] view at source ↗

**Figure 24.** Figure 24: Theorem Search Tree and a inference demonstration is [PITH_FULL_IMAGE:figures/full_fig_p022_24.png] view at source ↗

**Figure 27.** Figure 27: A theorem (as illustrated in Fig. 25) that falls into set [PITH_FULL_IMAGE:figures/full_fig_p022_27.png] view at source ↗

**Figure 28.** Figure 28: one illustrative example for reasoning based on Hilbert-Geo [PITH_FULL_IMAGE:figures/full_fig_p023_28.png] view at source ↗

**Figure 29.** Figure 29: Reasoning performance of MLLMs under different [PITH_FULL_IMAGE:figures/full_fig_p024_29.png] view at source ↗

read the original abstract

Geometric problem solving, as a typical multimodal reasoning problem, has attracted much attention and made great progress recently, however most of works focus on plane geometry while usually fail in solid geometry due to 3D spatial diagrams and complex reasoning. To bridge this gap, we introduce Hilbert-Geo, the first unified formal language framework for solid geometry, including an extensive predicate library and a dedicated theorem bank. Based on this framework, we propose a Parse2Reason method containing two steps of first parsing then reasoning. In the parsing step, we utilize conditional description language (CDL), a formalized language composed of predicates specifically designed to construct geometric conditions, to represent both problem description (natural text) and solid diagrams (visual image). In the reasoning step, we leverage those formal CDL and the theorem bank to perform relational inference and algebraic computation, generating strictly correct, verifiable, and human-readable reasoning processes. Notably, our proposed Hilbert-Geo is also applicable to plane geometry. To advance geometric reasoning, we curate two expert-annotated dataset SolidFGeo2k and PlaneFGeo3k, which are furnished with geometric formal language annotations, solutions and answers. Extensive experiments show that our proposed method achieves the state-of-the-art (SOTA) performance 77.3% in SolidFGeo2k and 84.1% in MathVerse-Solid (one small subset in MathVerse dedicated to solid geometry), substantially outperforming leading MLLMs, such as Gemini-2.5-pro (54.2% on SolidFGeo2k) and GPT-5 (62.9% on MathVerse-Solid). In addition, our method achieves the SOTA accuracy 80.2% in PlaneFGeo3k, demonstrating the generality of the Hilbert-Geo in geometric reasoning. Our code and datasets will be publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Hilbert-Geo, a unified formal language framework for solid geometry consisting of an extensive predicate library and dedicated theorem bank. It introduces the Parse2Reason pipeline that first converts natural-language problem statements and solid diagrams into Conditional Description Language (CDL) predicates, then performs relational inference and algebraic computation over the theorem bank to produce strictly correct, verifiable reasoning traces. The method is evaluated on two new expert-annotated datasets (SolidFGeo2k and PlaneFGeo3k) plus the MathVerse-Solid subset, reporting SOTA accuracies of 77.3% on SolidFGeo2k and 84.1% on MathVerse-Solid, substantially above leading MLLMs such as Gemini-2.5-pro and GPT-5; the framework is also shown to generalize to plane geometry.

Significance. If the CDL predicate library can be shown to capture all required 3D spatial relations from 2D projections without loss or ambiguity, the neural-symbolic Parse2Reason approach would constitute a meaningful advance in verifiable multimodal geometric reasoning. The public release of the curated datasets with formal annotations and the code would provide a valuable resource for the community.

major comments (2)

[Parsing step / CDL definition] The SOTA claims on SolidFGeo2k rest on the unverified assumption that the CDL predicates faithfully encode all necessary 3D relations (depth ordering, hidden-line intersections, coplanarity, occlusion) from 2D diagram projections. The manuscript provides no explicit validation, completeness proof, or failure-case analysis demonstrating that the parsed CDL inputs are sufficient for the theorem bank to derive correct solutions in all cases.
[Experiments / Results] The evaluation reports aggregate accuracies but lacks an ablation study or error breakdown that isolates the contribution of the formal reasoning component versus the neural parsing component. Without this, it is difficult to confirm that the performance gains over pure MLLMs derive from the verifiable theorem-based reasoning rather than improved diagram parsing alone.

minor comments (2)

[Abstract / Datasets] Clarify the exact size, selection criteria, and annotation protocol for the MathVerse-Solid subset so readers can assess the scope and difficulty of that evaluation.
[Method] Add a short table or figure illustrating a concrete solid-geometry example with its CDL predicate list and the corresponding theorem-bank derivation to make the Parse2Reason pipeline more transparent.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We appreciate the acknowledgment of the potential significance of the CDL-based framework and Parse2Reason pipeline for verifiable solid geometry reasoning. Below we respond point-by-point to the major comments. We will revise the manuscript to strengthen the presentation of validation and experimental analysis.

read point-by-point responses

Referee: The SOTA claims on SolidFGeo2k rest on the unverified assumption that the CDL predicates faithfully encode all necessary 3D relations (depth ordering, hidden-line intersections, coplanarity, occlusion) from 2D diagram projections. The manuscript provides no explicit validation, completeness proof, or failure-case analysis demonstrating that the parsed CDL inputs are sufficient for the theorem bank to derive correct solutions in all cases.

Authors: We agree that an explicit validation section would improve clarity. The CDL predicate library (detailed in Section 3.1) was constructed in collaboration with geometry experts to cover depth ordering, occlusion, coplanarity, hidden-line intersections, and other 3D relations via predicates such as DepthOrder, OccludedBy, Coplanar, and IntersectHidden. The SolidFGeo2k dataset was annotated by experts using these predicates, and the theorem bank was verified to derive solutions from the resulting CDL representations. To directly address the concern, we will add a new subsection (e.g., 4.3) that includes (i) a coverage argument mapping common 3D spatial relations to CDL predicates, (ii) quantitative statistics on predicate usage in the dataset, and (iii) a failure-case analysis of parsing errors and their impact on downstream reasoning. This revision will be included in the next version. revision: yes
Referee: The evaluation reports aggregate accuracies but lacks an ablation study or error breakdown that isolates the contribution of the formal reasoning component versus the neural parsing component. Without this, it is difficult to confirm that the performance gains over pure MLLMs derive from the verifiable theorem-based reasoning rather than improved diagram parsing alone.

Authors: We concur that isolating the contributions would strengthen the experimental claims. The current results compare the full Parse2Reason pipeline against MLLMs that perform both parsing and reasoning in a single forward pass, thereby indirectly highlighting the benefit of the symbolic theorem bank. However, we acknowledge the value of a direct ablation. In the revised manuscript we will add an ablation study that (a) replaces the relational inference and algebraic computation steps with direct answer generation from the parsed CDL (to measure the isolated effect of the theorem bank), and (b) reports error breakdowns categorized by parsing failures versus reasoning failures. These additions will appear in Section 5 and the supplementary material. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical performance claims rest on held-out evaluation of an independent symbolic framework

full rationale

The paper defines a CDL predicate library and theorem bank as a formal representation layer, then applies Parse2Reason (parse diagrams/text to CDL, then relational/algebraic inference) to produce verifiable reasoning traces. Reported accuracies (77.3% SolidFGeo2k, 84.1% MathVerse-Solid, 80.2% PlaneFGeo3k) are measured on expert-annotated held-out test sets and compared against MLLM baselines; these numbers are not obtained by fitting parameters to the target metrics or by renaming any internal quantity as a prediction. No equation or step reduces the central result to its own inputs by construction, and the theorem bank is presented as an external, reusable resource rather than a self-citation chain that carries the performance claim.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The approach depends on the assumption that solid geometry problems can be completely captured by a finite predicate library and that the theorem bank contains all necessary inference rules; no numerical free parameters are introduced.

axioms (1)

domain assumption Standard Euclidean solid geometry theorems and spatial relations can be expressed as a finite set of predicates and inference rules.
The theorem bank and CDL are built on this premise to enable relational inference.

invented entities (1)

Conditional Description Language (CDL) no independent evidence
purpose: Formalized language to represent geometric conditions from both text and diagrams.
New predicate-based language introduced to bridge natural language, images, and symbolic reasoning.

pith-pipeline@v0.9.0 · 5878 in / 1372 out tokens · 54223 ms · 2026-05-20T22:48:47.645837+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

we introduce Hilbert-Geo, the first unified formal language framework for solid geometry, including an extensive predicate library and a dedicated theorem bank... CDL... to represent both problem description and solid diagrams

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 5 internal anchors

[1]

Claude 3.7 sonnet system card.https : //www.anthropic.com/claude- 3- 7- sonnet- system-card, 2025

Anthropic. Claude 3.7 sonnet system card.https : //www.anthropic.com/claude- 3- 7- sonnet- system-card, 2025. System card for Claude 3.7 Sonnet

work page 2025
[2]

Arnon, George E

Dennis S. Arnon, George E. Collins, and Scott McCallum. Cylindrical algebraic decomposition i: The basic algorithm. SIAM Journal on Computing, 13(4):865–877, 1984

work page 1984
[3]

Birkh ¨auser Basel, 2004

Lucian B ˘adescu.Projective Geometry and Formal Geome- try. Birkh ¨auser Basel, 2004

work page 2004
[4]

Qwen2.5-VL Technical Report

Shuai Bai, Keqin Chen, Xuejing Liu, et al. Qwen2.5-VL technical report.arXiv preprint arXiv:2502.13923, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[5]

Relational inductive biases, deep learning, and graph networks

Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, et al. Relational inductive biases, deep learning, and graph net- works.arXiv preprint arXiv:1806.01261, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[6]

Xing, and Liang Lin

Jiaqi Chen, Jianheng Tang, Jinghui Qin, Xiaodan Liang, Lingbo Liu, Eric P. Xing, and Liang Lin. GeoQA: A ge- ometric question answering benchmark towards multimodal numerical reasoning. InFindings of the Association for Com- putational Linguistics: ACL-IJCNLP 2021, pages 513–523, 2021

work page 2021
[7]

UniGeo: Unify- ing geometry logical reasoning via reformulating mathemat- ical expression

Jiaqi Chen, Tong Li, Jinghui Qin, et al. UniGeo: Unify- ing geometry logical reasoning via reformulating mathemat- ical expression. InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 3313–3323, 2022

work page 2022
[8]

Do NOT think that much for 2+3=? on the overthinking of long reasoning models

Xingyu Chen, Jiahao Xu, Tian Liang, et al. Do NOT think that much for 2+3=? on the overthinking of long reasoning models. InProceedings of the 42nd International Confer- ence on Machine Learning, pages 9487–9499. PMLR, 2025

work page 2025
[9]

A coefficient of agreement for nominal scales

Jacob Cohen. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1):37–46, 1960

work page 1960
[10]

DeepSeek-V3 Technical Report

DeepSeek-AI. DeepSeek-V3 technical report.arXiv preprint arXiv:2412.19437, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[11]

Gemini 2.5: Updates to our family of thinking mod- els.https://developers.googleblog.com/en/ gemini- 2- 5- thinking- model- updates/, 2025

Google. Gemini 2.5: Updates to our family of thinking mod- els.https://developers.googleblog.com/en/ gemini- 2- 5- thinking- model- updates/, 2025. Introduces Gemini 2.5 Pro and Gemini 2.5 Flash updates

work page 2025
[12]

Large language models: A comprehensive survey of its applications, challenges, limitations, and future prospects

Muhammad Usman Hadi, Qasem Al Tashi, Abbas Shah, et al. Large language models: A comprehensive survey of its applications, challenges, limitations, and future prospects. TechRxiv, 2024. Preprint, version 6

work page 2024
[13]

Formal verification meth- ods

Osman Hasan and Sofiene Tahar. Formal verification meth- ods. InEncyclopedia of Information Science and Technol- ogy, Third Edition, pages 7162–7170. IGI Global Scientific Publishing, 2015

work page 2015
[14]

Springer Tokyo, 2014

Takayuki Hibi, editor.Gr ¨obner Bases: Statistics and Soft- ware Systems. Springer Tokyo, 2014. Copyright 2013

work page 2014
[15]

Solving ge- ometry problems via feature learning and contrastive learn- ing of multimodal data.Computer Modeling in Engineering & Sciences, 136(2):1707–1728, 2023

Pengpeng Jian, Fucheng Guo, Yanli Wang, et al. Solving ge- ometry problems via feature learning and contrastive learn- ing of multimodal data.Computer Modeling in Engineering & Sciences, 136(2):1707–1728, 2023

work page 2023
[16]

A survey on large language models for code generation.ACM Transactions on Software Engineering and Methodology, 35(2):1–72, 2026

Juyong Jiang, Fan Wang, Jiasi Shen, et al. A survey on large language models for code generation.ACM Transactions on Software Engineering and Methodology, 35(2):1–72, 2026

work page 2026
[17]

Vidhalluc: Evaluating temporal hallucinations in multimodal large lan- guage models for video understanding

Chaoyu Li, Eun Woo Im, Pooyan Fazli, et al. Vidhalluc: Evaluating temporal hallucinations in multimodal large lan- guage models for video understanding. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13723–13733, 2025

work page 2025
[18]

A survey on deep learning for theorem proving

Zhaoyu Li, Jialiang Sun, Logan Murphy, et al. A survey on deep learning for theorem proving. InProceedings of the First Conference on Language Modeling, 2024

work page 2024
[19]

Inter-GPS: In- terpretable geometry problem solving with formal language and symbolic reasoning

Pan Lu, Ran Gong, Shibiao Jiang, et al. Inter-GPS: In- terpretable geometry problem solving with formal language and symbolic reasoning. InProceedings of the 59th An- nual Meeting of the Association for Computational Linguis- tics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 6774– 6786, Online, ...

work page 2021
[20]

MathVista: Evalu- ating mathematical reasoning of foundation models in visual contexts

Pan Lu, Hritik Bansal, Tony Xia, et al. MathVista: Evalu- ating mathematical reasoning of foundation models in visual contexts. InThe Twelfth International Conference on Learn- ing Representations, 2024. Oral presentation

work page 2024
[21]

Llama 3.3 model cards and prompt formats, 2024

Meta. Llama 3.3 model cards and prompt formats, 2024. Of- ficial Meta documentation for Llama 3.3, release date: De- cember 6, 2024

work page 2024
[22]

Autofor- malizing euclidean geometry

Logan Murphy, Kaiyu Yang, Jialiang Sun, et al. Autofor- malizing euclidean geometry. InProceedings of the 41st In- ternational Conference on Machine Learning, pages 36847– 36893. PMLR, 2024

work page 2024
[23]

A com- prehensive overview of large language models.ACM Trans- actions on Intelligent Systems and Technology, 16(5):1–72, 2025

Humza Naveed, Asad Ullah Khan, Shi Qiu, et al. A com- prehensive overview of large language models.ACM Trans- actions on Intelligent Systems and Technology, 16(5):1–72, 2025

work page 2025
[24]

A symbolic characters aware model for solving ge- ometry problems

Maizhen Ning, Qiu-Feng Wang, Kaizhu Huang, and Xiaowei Huang. A symbolic characters aware model for solving ge- ometry problems. InProceedings of the 31st ACM Inter- national Conference on Multimedia (MM ’23), pages 7767– 7775, New York, NY , USA, 2023. ACM

work page 2023
[25]

GNS: Solving plane geometry problems by neural-symbolic reasoning with multi-modal llms

Maizhen Ning, Zihao Zhou, Qiufeng Wang, Xiaowei Huang, and Kaizhu Huang. GNS: Solving plane geometry problems by neural-symbolic reasoning with multi-modal llms. InPro- ceedings of the AAAI Conference on Artificial Intelligence, pages 24957–24965, 2025

work page 2025
[26]

Hello gpt-4o.https : / / openai

OpenAI. Hello gpt-4o.https : / / openai . com / index/hello-gpt-4o/, 2024. OpenAI announcement

work page 2024
[27]

Introducing gpt-5.https://openai.com/ index/introducing- gpt- 5/, 2025

OpenAI. Introducing gpt-5.https://openai.com/ index/introducing- gpt- 5/, 2025. OpenAI an- nouncement

work page 2025
[28]

GPT-5 system card.https://openai.com/ index/gpt-5-system-card/, 2025

OpenAI. GPT-5 system card.https://openai.com/ index/gpt-5-system-card/, 2025. OpenAI system card

work page 2025
[29]

Pittalis and C

M. Pittalis and C. Christou. Types of reasoning in 3d geom- etry thinking and their relation with spatial ability.Educa- tional Studies in Mathematics, 75(2):191–212, 2010

work page 2010
[30]

Gemini: A Family of Highly Capable Multimodal Models

Gemini Team. Gemini: A family of highly capable multi- modal models.arXiv preprint arXiv:2312.11805, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[31]

Measuring multi- modal mathematical reasoning with math-vision dataset

Ke Wang, Junting Pan, Weikang Shi, et al. Measuring multi- modal mathematical reasoning with math-vision dataset. In NeurIPS 2024 Datasets and Benchmarks Track, 2024

work page 2024
[32]

SolidGeo: Measuring multimodal spatial math reasoning in solid ge- ometry

Peijie Wang, Chao Yang, Zhong-Zhi Li, et al. SolidGeo: Measuring multimodal spatial math reasoning in solid ge- ometry. InNeurIPS 2025 Datasets and Benchmarks Track,

work page 2025
[33]

Thoughts are all over the place: On the underthinking of o1-like llms.arXiv preprint arXiv:2501.18585, 2025

Yue Wang, Qiuzhi Liu, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He, Linfeng Song, Dian Yu, Juntao Li, Zhuosheng Zhang, et al. Thoughts are all over the place: On the under- thinking of o1-like llms.arXiv preprint arXiv:2501.18585, 2025

work page arXiv 2025
[34]

A survey on large language models for recommendation.World Wide Web, 27(5):60, 2024

Likang Wu, Zhi Zheng, Zhaopeng Qiu, et al. A survey on large language models for recommendation.World Wide Web, 27(5):60, 2024

work page 2024
[35]

NesyGeo: A neuro-symbolic framework for multimodal geometric reasoning data generation

Weiming Wu, Jiachen Ye, Zihao Wang, Ziyi Zhou, Yifan Li, and Luzhen Guo. Nesygeo: A neuro-symbolic framework for multimodal geometric reasoning data generation.arXiv preprint arXiv:2505.17121, 2025

work page arXiv 2025
[36]

GeoX: Ge- ometric problem solving through unified formalized vision- language pre-training

Renqiu Xia, Mingsheng Li, Hancheng Ye, et al. GeoX: Ge- ometric problem solving through unified formalized vision- language pre-training. InThe Thirteenth International Con- ference on Learning Representations, 2025

work page 2025
[37]

Math- Verse: Does your multi-modal llm truly see the diagrams in visual math problems? InEuropean Conference on Com- puter Vision, pages 169–186

Renrui Zhang, Dongzhi Jiang, Yichi Zhang, et al. Math- Verse: Does your multi-modal llm truly see the diagrams in visual math problems? InEuropean Conference on Com- puter Vision, pages 169–186. Springer, 2024

work page 2024
[38]

FormalGeo: An extensible formalized framework for olympiad geometric problem solving.arXiv preprint arXiv:2310.18021, 2023

Xiaokai Zhang, Na Zhu, Yiming He, et al. FormalGeo: An extensible formalized framework for olympiad geometric problem solving.arXiv preprint arXiv:2310.18021, 2023

work page arXiv 2023
[39]

Pi-GPS: Enhancing geometry problem solving by unleashing the power of diagrammatic information

Junbo Zhao, Ting Zhang, Jiayu Sun, Mi Tian, and Hua Huang. Pi-GPS: Enhancing geometry problem solving by unleashing the power of diagrammatic information. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 1526–1536, 2025

work page 2025
[40]

A Survey of Large Language Models

Wayne Xin Zhao, Kun Zhou, Junyi Li, et al. A survey of large language models.arXiv preprint arXiv:2303.18223, 2023. Hilbert-Geo: Solving Solid Geometric Problems by Neural-Symbolic Reasoning Supplementary Material A. Solid Geometry Formal Language A.1. Formal Geometry Representation In the domain of solid geometry, simple geometric bodies serve as fundame...

work page internal anchor Pith review Pith/arXiv arXiv 2023
[41]

LetR A andR B be face sets of two valid polyhedra

work page
[42]

Execute the operationR result =R A ⊕3D RB

work page
[43]

This operation removes the internal contact faces (S A andS B) and retains all external surfaces

work page
[44]

According to the generalization of Euler’s formula for manifolds, when two closed manifolds are glued along a simply connected face and that face is removed, the remaining surface still constitutes a closed 2-manifold (i.e., the boundary of the new polyhedron)

work page
[45]

∀RA, RB ∈S,(R A ⊕3D RB)∈S(5) Theorem A.2.R A ⊕3D RB =R B ⊕3D RA

Therefore,R result remains a set of faces describing a closed solid. ∀RA, RB ∈S,(R A ⊕3D RB)∈S(5) Theorem A.2.R A ⊕3D RB =R B ⊕3D RA. Proof.LetR A containmfaces andR B containnfaces. Based on Eq. 4: RA ⊕3D RB ={f|f∈R A ∪R B, f̸=S A, f̸=S B}(6) Now consider the reverse operationR B ⊕3D RA: RB ⊕3D RA = (R B \ {SB})∪(R A \ {SA})(7) According to set algebra, ...

work page
[46]

PolyhedraAandBshare interface faces(S AB, SBA)

work page
[47]

par- allel

PolyhedraBandCshare interface faces(S BC , SCB ). 3.R A, RB, RC are their respective face sets. Left Hand Side: LetR AB =R A ⊕3D RB. RAB = (R A ∪R B)\ {S AB, SBA}(10) Next, calculateR AB ⊕3D RC. The contact interface in- volvesBandC(i.e.,S BC andS CB ): (RA ⊕3D RB)⊕ 3D RC = (R AB ∪R C)\ {S BC , SCB } (11) = ((R A ∪R B)\ {S AB, SBA} ∪R C)\ {S BC , SCB } (1...

work page
[48]

- image_cdl MUST include only facts directly observable from the image (e.g., length labels, right-angle marks, shape recognition)

Information Source Separation: - text_cdl MUST include only facts extracted from the natural language description. - image_cdl MUST include only facts directly observable from the image (e.g., length labels, right-angle marks, shape recognition). - If a fact appears in both text and image, include it in both fields

work page
[49]

construction_cdl - Geometric construction predicates (IMPORTANT): construction_cdl defines basic construction for entities, and MUST include the following types where applicable: - Shape predicates: define edges/segments of shapes * For segments/edges: Shape(AB,BC,CD,DA) or Shape(OP,PO) or Shape(PQ,QP) * For points (spheres etc.): Shape(O) or Shape(P) * E...

work page
[50]

10", "36 *pi

Answer formatting: - problem_answer MUST be a pure number or expression (e.g., "10", "36 *pi"), and MUST NOT contain units or extra text

work page
[51]

Core predicate logic: - Length/Height: Equal(LengthOfLine(A,B),5), Equal(HeightOfCone(O,P),12) - Relations: PerpendicularBetweenLine(A,B,C,D), ParallelBetweenLine(A,B,C,D) - Goal: the requested quantity MUST be wrapped by Value(...)

work page
[52]

- Quantities allowed in CDL expressions are LIMITED to standard forms: VolumeOfCone, SurfaceAreaOfCylinder, AreaOfCircle, LengthOfLine, etc

Predicate and Operator Legality (CRITICAL): - Only reuse names from the official predicate list; DO NOT invent new construction predicates. - Quantities allowed in CDL expressions are LIMITED to standard forms: VolumeOfCone, SurfaceAreaOfCylinder, AreaOfCircle, LengthOfLine, etc. - Only the following algebraic operators are allowed: Value, Add, Sub, Mul, ...

work page
[53]

Important: Output Requirements

Completeness Checks: - Ensure every entity used by text_cdl/image_cdl exists in construction_cdl - Ensure the target entity in goal_cdl exists in the construction as well - Self-check after generation: verify all predicates/operators are allowed, no extra spaces, and no undeclared entities are referenced. Important: Output Requirements

work page
[54]

You MUST output a complete JSON object with all required fields

work page
[55]

All CDL fields MUST be arrays of strings

work page
[56]

Value(VolumeOfCone(O,P))

goal_cdl MUST be a string (e.g., "Value(VolumeOfCone(O,P))") C.4.2. Direct Problem Solving Prompt In addition to CDL generation, the system also supportsdi- rect problem solvingusing GPT models fortesting model accuracy. This approach bypasses formalization and di- rectly generates answers to geometry problems, providing a baseline for comparison with for...

work page
[57]

Carefully analyze the problem text and the accompanying image

work page
[58]

Show your reasoning process step by step

work page
[59]

At the end, provide your final answer in a clear format

work page
[60]

10", "5.5

**Your final answer should be ONLY a number or mathematical expression (like "10", "5.5", "12 *pi", "36 *pi"), without any units or text **

work page
[61]

FINAL ANSWER:

Put your final answer on a line starting with "FINAL ANSWER: " Example format: FINAL ANSWER: 10 or FINAL ANSWER: 36 *pi Now, please solve this problem: D. SGRE Supplementary Information D.1. Theorem Search Tree and Search Process Figure 24. Theorem Search Tree and a inference demonstration is shown in fig. 28 The search process involves constructing a sea...

work page

[1] [1]

Claude 3.7 sonnet system card.https : //www.anthropic.com/claude- 3- 7- sonnet- system-card, 2025

Anthropic. Claude 3.7 sonnet system card.https : //www.anthropic.com/claude- 3- 7- sonnet- system-card, 2025. System card for Claude 3.7 Sonnet

work page 2025

[2] [2]

Arnon, George E

Dennis S. Arnon, George E. Collins, and Scott McCallum. Cylindrical algebraic decomposition i: The basic algorithm. SIAM Journal on Computing, 13(4):865–877, 1984

work page 1984

[3] [3]

Birkh ¨auser Basel, 2004

Lucian B ˘adescu.Projective Geometry and Formal Geome- try. Birkh ¨auser Basel, 2004

work page 2004

[4] [4]

Qwen2.5-VL Technical Report

Shuai Bai, Keqin Chen, Xuejing Liu, et al. Qwen2.5-VL technical report.arXiv preprint arXiv:2502.13923, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[5] [5]

Relational inductive biases, deep learning, and graph networks

Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, et al. Relational inductive biases, deep learning, and graph net- works.arXiv preprint arXiv:1806.01261, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[6] [6]

Xing, and Liang Lin

Jiaqi Chen, Jianheng Tang, Jinghui Qin, Xiaodan Liang, Lingbo Liu, Eric P. Xing, and Liang Lin. GeoQA: A ge- ometric question answering benchmark towards multimodal numerical reasoning. InFindings of the Association for Com- putational Linguistics: ACL-IJCNLP 2021, pages 513–523, 2021

work page 2021

[7] [7]

UniGeo: Unify- ing geometry logical reasoning via reformulating mathemat- ical expression

Jiaqi Chen, Tong Li, Jinghui Qin, et al. UniGeo: Unify- ing geometry logical reasoning via reformulating mathemat- ical expression. InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 3313–3323, 2022

work page 2022

[8] [8]

Do NOT think that much for 2+3=? on the overthinking of long reasoning models

Xingyu Chen, Jiahao Xu, Tian Liang, et al. Do NOT think that much for 2+3=? on the overthinking of long reasoning models. InProceedings of the 42nd International Confer- ence on Machine Learning, pages 9487–9499. PMLR, 2025

work page 2025

[9] [9]

A coefficient of agreement for nominal scales

Jacob Cohen. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1):37–46, 1960

work page 1960

[10] [10]

DeepSeek-V3 Technical Report

DeepSeek-AI. DeepSeek-V3 technical report.arXiv preprint arXiv:2412.19437, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[11] [11]

Gemini 2.5: Updates to our family of thinking mod- els.https://developers.googleblog.com/en/ gemini- 2- 5- thinking- model- updates/, 2025

Google. Gemini 2.5: Updates to our family of thinking mod- els.https://developers.googleblog.com/en/ gemini- 2- 5- thinking- model- updates/, 2025. Introduces Gemini 2.5 Pro and Gemini 2.5 Flash updates

work page 2025

[12] [12]

Large language models: A comprehensive survey of its applications, challenges, limitations, and future prospects

Muhammad Usman Hadi, Qasem Al Tashi, Abbas Shah, et al. Large language models: A comprehensive survey of its applications, challenges, limitations, and future prospects. TechRxiv, 2024. Preprint, version 6

work page 2024

[13] [13]

Formal verification meth- ods

Osman Hasan and Sofiene Tahar. Formal verification meth- ods. InEncyclopedia of Information Science and Technol- ogy, Third Edition, pages 7162–7170. IGI Global Scientific Publishing, 2015

work page 2015

[14] [14]

Springer Tokyo, 2014

Takayuki Hibi, editor.Gr ¨obner Bases: Statistics and Soft- ware Systems. Springer Tokyo, 2014. Copyright 2013

work page 2014

[15] [15]

Solving ge- ometry problems via feature learning and contrastive learn- ing of multimodal data.Computer Modeling in Engineering & Sciences, 136(2):1707–1728, 2023

Pengpeng Jian, Fucheng Guo, Yanli Wang, et al. Solving ge- ometry problems via feature learning and contrastive learn- ing of multimodal data.Computer Modeling in Engineering & Sciences, 136(2):1707–1728, 2023

work page 2023

[16] [16]

A survey on large language models for code generation.ACM Transactions on Software Engineering and Methodology, 35(2):1–72, 2026

Juyong Jiang, Fan Wang, Jiasi Shen, et al. A survey on large language models for code generation.ACM Transactions on Software Engineering and Methodology, 35(2):1–72, 2026

work page 2026

[17] [17]

Vidhalluc: Evaluating temporal hallucinations in multimodal large lan- guage models for video understanding

Chaoyu Li, Eun Woo Im, Pooyan Fazli, et al. Vidhalluc: Evaluating temporal hallucinations in multimodal large lan- guage models for video understanding. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13723–13733, 2025

work page 2025

[18] [18]

A survey on deep learning for theorem proving

Zhaoyu Li, Jialiang Sun, Logan Murphy, et al. A survey on deep learning for theorem proving. InProceedings of the First Conference on Language Modeling, 2024

work page 2024

[19] [19]

Inter-GPS: In- terpretable geometry problem solving with formal language and symbolic reasoning

Pan Lu, Ran Gong, Shibiao Jiang, et al. Inter-GPS: In- terpretable geometry problem solving with formal language and symbolic reasoning. InProceedings of the 59th An- nual Meeting of the Association for Computational Linguis- tics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 6774– 6786, Online, ...

work page 2021

[20] [20]

MathVista: Evalu- ating mathematical reasoning of foundation models in visual contexts

Pan Lu, Hritik Bansal, Tony Xia, et al. MathVista: Evalu- ating mathematical reasoning of foundation models in visual contexts. InThe Twelfth International Conference on Learn- ing Representations, 2024. Oral presentation

work page 2024

[21] [21]

Llama 3.3 model cards and prompt formats, 2024

Meta. Llama 3.3 model cards and prompt formats, 2024. Of- ficial Meta documentation for Llama 3.3, release date: De- cember 6, 2024

work page 2024

[22] [22]

Autofor- malizing euclidean geometry

Logan Murphy, Kaiyu Yang, Jialiang Sun, et al. Autofor- malizing euclidean geometry. InProceedings of the 41st In- ternational Conference on Machine Learning, pages 36847– 36893. PMLR, 2024

work page 2024

[23] [23]

A com- prehensive overview of large language models.ACM Trans- actions on Intelligent Systems and Technology, 16(5):1–72, 2025

Humza Naveed, Asad Ullah Khan, Shi Qiu, et al. A com- prehensive overview of large language models.ACM Trans- actions on Intelligent Systems and Technology, 16(5):1–72, 2025

work page 2025

[24] [24]

A symbolic characters aware model for solving ge- ometry problems

Maizhen Ning, Qiu-Feng Wang, Kaizhu Huang, and Xiaowei Huang. A symbolic characters aware model for solving ge- ometry problems. InProceedings of the 31st ACM Inter- national Conference on Multimedia (MM ’23), pages 7767– 7775, New York, NY , USA, 2023. ACM

work page 2023

[25] [25]

GNS: Solving plane geometry problems by neural-symbolic reasoning with multi-modal llms

Maizhen Ning, Zihao Zhou, Qiufeng Wang, Xiaowei Huang, and Kaizhu Huang. GNS: Solving plane geometry problems by neural-symbolic reasoning with multi-modal llms. InPro- ceedings of the AAAI Conference on Artificial Intelligence, pages 24957–24965, 2025

work page 2025

[26] [26]

Hello gpt-4o.https : / / openai

OpenAI. Hello gpt-4o.https : / / openai . com / index/hello-gpt-4o/, 2024. OpenAI announcement

work page 2024

[27] [27]

Introducing gpt-5.https://openai.com/ index/introducing- gpt- 5/, 2025

OpenAI. Introducing gpt-5.https://openai.com/ index/introducing- gpt- 5/, 2025. OpenAI an- nouncement

work page 2025

[28] [28]

GPT-5 system card.https://openai.com/ index/gpt-5-system-card/, 2025

OpenAI. GPT-5 system card.https://openai.com/ index/gpt-5-system-card/, 2025. OpenAI system card

work page 2025

[29] [29]

Pittalis and C

M. Pittalis and C. Christou. Types of reasoning in 3d geom- etry thinking and their relation with spatial ability.Educa- tional Studies in Mathematics, 75(2):191–212, 2010

work page 2010

[30] [30]

Gemini: A Family of Highly Capable Multimodal Models

Gemini Team. Gemini: A family of highly capable multi- modal models.arXiv preprint arXiv:2312.11805, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[31] [31]

Measuring multi- modal mathematical reasoning with math-vision dataset

Ke Wang, Junting Pan, Weikang Shi, et al. Measuring multi- modal mathematical reasoning with math-vision dataset. In NeurIPS 2024 Datasets and Benchmarks Track, 2024

work page 2024

[32] [32]

SolidGeo: Measuring multimodal spatial math reasoning in solid ge- ometry

Peijie Wang, Chao Yang, Zhong-Zhi Li, et al. SolidGeo: Measuring multimodal spatial math reasoning in solid ge- ometry. InNeurIPS 2025 Datasets and Benchmarks Track,

work page 2025

[33] [33]

Thoughts are all over the place: On the underthinking of o1-like llms.arXiv preprint arXiv:2501.18585, 2025

Yue Wang, Qiuzhi Liu, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He, Linfeng Song, Dian Yu, Juntao Li, Zhuosheng Zhang, et al. Thoughts are all over the place: On the under- thinking of o1-like llms.arXiv preprint arXiv:2501.18585, 2025

work page arXiv 2025

[34] [34]

A survey on large language models for recommendation.World Wide Web, 27(5):60, 2024

Likang Wu, Zhi Zheng, Zhaopeng Qiu, et al. A survey on large language models for recommendation.World Wide Web, 27(5):60, 2024

work page 2024

[35] [35]

NesyGeo: A neuro-symbolic framework for multimodal geometric reasoning data generation

Weiming Wu, Jiachen Ye, Zihao Wang, Ziyi Zhou, Yifan Li, and Luzhen Guo. Nesygeo: A neuro-symbolic framework for multimodal geometric reasoning data generation.arXiv preprint arXiv:2505.17121, 2025

work page arXiv 2025

[36] [36]

GeoX: Ge- ometric problem solving through unified formalized vision- language pre-training

Renqiu Xia, Mingsheng Li, Hancheng Ye, et al. GeoX: Ge- ometric problem solving through unified formalized vision- language pre-training. InThe Thirteenth International Con- ference on Learning Representations, 2025

work page 2025

[37] [37]

Math- Verse: Does your multi-modal llm truly see the diagrams in visual math problems? InEuropean Conference on Com- puter Vision, pages 169–186

Renrui Zhang, Dongzhi Jiang, Yichi Zhang, et al. Math- Verse: Does your multi-modal llm truly see the diagrams in visual math problems? InEuropean Conference on Com- puter Vision, pages 169–186. Springer, 2024

work page 2024

[38] [38]

FormalGeo: An extensible formalized framework for olympiad geometric problem solving.arXiv preprint arXiv:2310.18021, 2023

Xiaokai Zhang, Na Zhu, Yiming He, et al. FormalGeo: An extensible formalized framework for olympiad geometric problem solving.arXiv preprint arXiv:2310.18021, 2023

work page arXiv 2023

[39] [39]

Pi-GPS: Enhancing geometry problem solving by unleashing the power of diagrammatic information

Junbo Zhao, Ting Zhang, Jiayu Sun, Mi Tian, and Hua Huang. Pi-GPS: Enhancing geometry problem solving by unleashing the power of diagrammatic information. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 1526–1536, 2025

work page 2025

[40] [40]

A Survey of Large Language Models

Wayne Xin Zhao, Kun Zhou, Junyi Li, et al. A survey of large language models.arXiv preprint arXiv:2303.18223, 2023. Hilbert-Geo: Solving Solid Geometric Problems by Neural-Symbolic Reasoning Supplementary Material A. Solid Geometry Formal Language A.1. Formal Geometry Representation In the domain of solid geometry, simple geometric bodies serve as fundame...

work page internal anchor Pith review Pith/arXiv arXiv 2023

[41] [41]

LetR A andR B be face sets of two valid polyhedra

work page

[42] [42]

Execute the operationR result =R A ⊕3D RB

work page

[43] [43]

This operation removes the internal contact faces (S A andS B) and retains all external surfaces

work page

[44] [44]

According to the generalization of Euler’s formula for manifolds, when two closed manifolds are glued along a simply connected face and that face is removed, the remaining surface still constitutes a closed 2-manifold (i.e., the boundary of the new polyhedron)

work page

[45] [45]

∀RA, RB ∈S,(R A ⊕3D RB)∈S(5) Theorem A.2.R A ⊕3D RB =R B ⊕3D RA

Therefore,R result remains a set of faces describing a closed solid. ∀RA, RB ∈S,(R A ⊕3D RB)∈S(5) Theorem A.2.R A ⊕3D RB =R B ⊕3D RA. Proof.LetR A containmfaces andR B containnfaces. Based on Eq. 4: RA ⊕3D RB ={f|f∈R A ∪R B, f̸=S A, f̸=S B}(6) Now consider the reverse operationR B ⊕3D RA: RB ⊕3D RA = (R B \ {SB})∪(R A \ {SA})(7) According to set algebra, ...

work page

[46] [46]

PolyhedraAandBshare interface faces(S AB, SBA)

work page

[47] [47]

par- allel

PolyhedraBandCshare interface faces(S BC , SCB ). 3.R A, RB, RC are their respective face sets. Left Hand Side: LetR AB =R A ⊕3D RB. RAB = (R A ∪R B)\ {S AB, SBA}(10) Next, calculateR AB ⊕3D RC. The contact interface in- volvesBandC(i.e.,S BC andS CB ): (RA ⊕3D RB)⊕ 3D RC = (R AB ∪R C)\ {S BC , SCB } (11) = ((R A ∪R B)\ {S AB, SBA} ∪R C)\ {S BC , SCB } (1...

work page

[48] [48]

- image_cdl MUST include only facts directly observable from the image (e.g., length labels, right-angle marks, shape recognition)

Information Source Separation: - text_cdl MUST include only facts extracted from the natural language description. - image_cdl MUST include only facts directly observable from the image (e.g., length labels, right-angle marks, shape recognition). - If a fact appears in both text and image, include it in both fields

work page

[49] [49]

construction_cdl - Geometric construction predicates (IMPORTANT): construction_cdl defines basic construction for entities, and MUST include the following types where applicable: - Shape predicates: define edges/segments of shapes * For segments/edges: Shape(AB,BC,CD,DA) or Shape(OP,PO) or Shape(PQ,QP) * For points (spheres etc.): Shape(O) or Shape(P) * E...

work page

[50] [50]

10", "36 *pi

Answer formatting: - problem_answer MUST be a pure number or expression (e.g., "10", "36 *pi"), and MUST NOT contain units or extra text

work page

[51] [51]

Core predicate logic: - Length/Height: Equal(LengthOfLine(A,B),5), Equal(HeightOfCone(O,P),12) - Relations: PerpendicularBetweenLine(A,B,C,D), ParallelBetweenLine(A,B,C,D) - Goal: the requested quantity MUST be wrapped by Value(...)

work page

[52] [52]

- Quantities allowed in CDL expressions are LIMITED to standard forms: VolumeOfCone, SurfaceAreaOfCylinder, AreaOfCircle, LengthOfLine, etc

Predicate and Operator Legality (CRITICAL): - Only reuse names from the official predicate list; DO NOT invent new construction predicates. - Quantities allowed in CDL expressions are LIMITED to standard forms: VolumeOfCone, SurfaceAreaOfCylinder, AreaOfCircle, LengthOfLine, etc. - Only the following algebraic operators are allowed: Value, Add, Sub, Mul, ...

work page

[53] [53]

Important: Output Requirements

Completeness Checks: - Ensure every entity used by text_cdl/image_cdl exists in construction_cdl - Ensure the target entity in goal_cdl exists in the construction as well - Self-check after generation: verify all predicates/operators are allowed, no extra spaces, and no undeclared entities are referenced. Important: Output Requirements

work page

[54] [54]

You MUST output a complete JSON object with all required fields

work page

[55] [55]

All CDL fields MUST be arrays of strings

work page

[56] [56]

Value(VolumeOfCone(O,P))

goal_cdl MUST be a string (e.g., "Value(VolumeOfCone(O,P))") C.4.2. Direct Problem Solving Prompt In addition to CDL generation, the system also supportsdi- rect problem solvingusing GPT models fortesting model accuracy. This approach bypasses formalization and di- rectly generates answers to geometry problems, providing a baseline for comparison with for...

work page

[57] [57]

Carefully analyze the problem text and the accompanying image

work page

[58] [58]

Show your reasoning process step by step

work page

[59] [59]

At the end, provide your final answer in a clear format

work page

[60] [60]

10", "5.5

**Your final answer should be ONLY a number or mathematical expression (like "10", "5.5", "12 *pi", "36 *pi"), without any units or text **

work page

[61] [61]

FINAL ANSWER:

Put your final answer on a line starting with "FINAL ANSWER: " Example format: FINAL ANSWER: 10 or FINAL ANSWER: 36 *pi Now, please solve this problem: D. SGRE Supplementary Information D.1. Theorem Search Tree and Search Process Figure 24. Theorem Search Tree and a inference demonstration is shown in fig. 28 The search process involves constructing a sea...

work page