pith. machine review for the scientific record.

arxiv: 2605.14068 · v1 · submitted 2026-05-13 · 💻 cs.CV · cs.LG

Recognition: 1 theorem link

· Lean Theorem

CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves

Authors on Pith · no claims yet

Pith reviewed 2026-05-15 05:30 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords CurveBench · Jordan curves · topological reasoning · containment tree · vision-language models · nested regions · benchmark · hierarchical structure

The pith

Vision-language models recover exact containment trees from nested Jordan curves at only 71 percent accuracy on easy cases and 19 percent on hard cases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CurveBench, a collection of 756 images showing non-intersecting Jordan curves in varied layouts, each annotated with a rooted tree that records which regions contain which others. It frames the task as full tree recovery from the image alone. Even the strongest tested model reaches only 71.1 percent tree-generation accuracy on the easy subset and falls to 19.1 percent on the hard subset. Fine-tuning an open 8B vision-language model lifts easy-set accuracy from 2.8 to 33.3 percent and surpasses several larger closed models, yet the gap on complex configurations remains large.

Core claim

CurveBench supplies images of pairwise non-intersecting Jordan curves together with ground-truth rooted trees that encode the full hierarchy of containment among the bounded planar regions. The central finding is that current vision-language models cannot reliably reconstruct these trees, with top accuracy at 71.1 percent on easy images and 19.1 percent on hard images; targeted fine-tuning improves results but leaves the capability far from solved.

What carries the argument

The rooted containment tree that encodes the complete hierarchy of nesting relations among the planar regions delimited by the curves.
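The tree-recovery target can be made concrete with a small sketch. Assuming each curve is approximated as a simple polygon (the paper does not specify its internal representation, so this is illustrative, not the authors' pipeline), the containment tree follows from even-odd ray casting plus an area ranking: a curve's parent is the smallest curve whose interior contains it.

```python
# Sketch (not the paper's code): recover the rooted containment tree for
# pairwise non-intersecting simple closed curves, approximated here as
# polygons given as lists of (x, y) vertices. The names `containment_tree`
# and `point_in_polygon` are illustrative.

def point_in_polygon(pt, poly):
    """Even-odd ray casting: is pt inside the polygon?"""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the horizontal ray
            xc = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < xc:
                inside = not inside
    return inside

def polygon_area(poly):
    """Absolute shoelace area, used to rank candidate ancestors by size."""
    a = 0.0
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        a += x1 * y2 - x2 * y1
    return abs(a) / 2.0

def containment_tree(polys):
    """Map each curve index to its parent index (-1 for the virtual root)."""
    parent = {}
    for i, p in enumerate(polys):
        # Curves never intersect, so testing a single vertex suffices.
        ancestors = [j for j, q in enumerate(polys)
                     if j != i and point_in_polygon(p[0], q)]
        parent[i] = min(ancestors, key=lambda j: polygon_area(polys[j]),
                        default=-1)
    return parent

# Two nested squares inside a big square, plus one disjoint square.
curves = [
    [(0, 0), (10, 0), (10, 10), (0, 10)],   # 0: outermost
    [(2, 2), (8, 2), (8, 8), (2, 8)],       # 1: inside 0
    [(3, 3), (5, 3), (5, 5), (3, 5)],       # 2: inside 1
    [(20, 0), (22, 0), (22, 2), (20, 2)],   # 3: disjoint
]
print(containment_tree(curves))  # {0: -1, 1: 0, 2: 1, 3: -1}
```

The quadratic pairwise scan is fine at benchmark scale; the point is that the ground truth is cheap to compute symbolically, while the models must infer the same structure from pixels alone.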

If this is right

  • Fine-tuning with RLVR-style methods raises tree-generation accuracy on the easy subset and can exceed some closed models.
  • Accuracy collapses on maze-like and dense counting configurations, exposing limits in handling complex nesting.
  • The benchmark supplies a concrete metric for measuring progress toward topology-aware visual reasoning.
  • A persistent performance shortfall after fine-tuning indicates the required capability is not yet present in current models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Topological containment appears to be a distinct visual reasoning skill that general training does not reliably produce.
  • The same tree-recovery format could be applied to test hierarchical understanding in maps, diagrams, or organizational charts.
  • Models that master this task may also improve on other spatial problems that require precise region relations rather than approximate visual matching.

Load-bearing premise

The generated images and their tree annotations isolate topological containment without supplying unintended low-level visual shortcuts or dataset-specific biases.

What would settle it

A model that consistently generates correct trees on the hard subset across new image styles and without prior exposure to similar curve data would show the reasoning gap has been closed.

Figures

Figures reproduced from arXiv: 2605.14068 by Amirreza Mohseni, Mona Mohammadi, Morteza Saghafian, Naser Talebizadeh Saradari.

Figure 1. Representative examples from each category within the CurveBench dataset.
Figure 2. Tree-reward learning dynamics for trained models. Left: training set. Right: evaluation set.
Figure 3. CurveBench dataset: hierarchical distribution. The released resources include CurveBench-Easy and the main CurveBench benchmark used for the harder evaluation setting.
Figure 4. Per-category success rates.
Figure 5. Stacked reward breakdown for CurveBench-Hard.
read the original abstract

We introduce CurveBench, a benchmark for hierarchical topological reasoning from visual input. CurveBench consists of 756 images of pairwise non-intersecting Jordan curves across easy, polygonal, topographic-inspired, maze-like, and dense counting configurations. Each image is annotated with a rooted tree encoding the containment relations between planar regions. We formulate the task as structured prediction: given an image, a model must recover the full rooted containment tree induced by the curves. Despite the visual simplicity of the task, the strongest evaluated model, Gemini 3.1 Pro, achieves only 71.1% tree-generation accuracy on CurveBench-Easy and 19.1% on CurveBench-Hard. We further demonstrate benchmark utility through RLVR-style fine-tuning of open-weight vision-language models. Our trained Qwen3-VL-8B model improves over Qwen-3-VL-8B-Thinking from 2.8% to 33.3% tree-generation accuracy on CurveBench-Easy, exceeding GPT-5.4 and Claude Opus 4.5 under our evaluation protocol. The remaining gap, especially on CurveBench-Hard, shows that exact topology-aware visual reasoning remains far from solved.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces CurveBench, a benchmark of 756 procedurally generated images depicting pairwise non-intersecting Jordan curves in easy, polygonal, topographic, maze-like, and dense configurations. Each image is annotated with a rooted tree encoding containment relations among planar regions. The task is formulated as structured prediction of the full containment tree from the image. The authors report that Gemini 3.1 Pro achieves 71.1% tree-generation accuracy on CurveBench-Easy and 19.1% on CurveBench-Hard; fine-tuning Qwen3-VL-8B raises its Easy accuracy from 2.8% to 33.3%, exceeding several closed models under the authors' protocol. The central conclusion is that exact topology-aware visual reasoning remains far from solved.

Significance. If the images and annotations isolate containment relations without exploitable low-level visual cues, the benchmark would provide a useful, falsifiable test of exact hierarchical topological reasoning in VLMs. The fine-tuning result would further demonstrate that targeted training can measurably improve performance on this task. Such a resource could help shift evaluation away from approximate or heuristic visual reasoning toward precise, verifiable topological competence.

major comments (2)
  1. [Abstract and dataset-construction section] The central claim that low accuracies demonstrate failure of exact topological reasoning rests on the assumption that the 756 images contain no unintended low-level signals (curve thickness, vertex density, shading, or positional statistics) correlated with nesting depth or region count. No ablation, correlation analysis, or independent symbolic verification of the tree annotations decoupled from pixel features is described, leaving open the possibility that models exploit rendering artifacts rather than winding-number or ray-casting reasoning.
  2. [Evaluation protocol] The reported tree-generation accuracies (71.1% Easy, 19.1% Hard) are presented without details on the exact prompting template, output parsing rules, or statistical significance testing across multiple runs, making it difficult to assess whether the performance gap is robust or sensitive to evaluation choices.
minor comments (2)
  1. The abstract states that the benchmark uses 'rooted-tree annotations' but does not specify the exact tree representation (e.g., parent-pointer list, nested parentheses) or how ties in containment are resolved; a brief formal definition would improve reproducibility.
  2. Figure captions and the description of the five configurations could include explicit counts of images per subset and average nesting depth to allow readers to gauge difficulty distribution.
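The referee's point about the tree representation can be illustrated with the two encodings it names. Both formats here are hypothetical (the paper does not publish its exact schema): a parent-pointer dict and a nested-parenthesis string for the same rooted containment tree.

```python
# Sketch of two interchangeable tree encodings (illustrative, not the
# paper's released format): a parent-pointer map and a nested-parenthesis
# serialization of the same rooted containment tree.

def to_parens(parent, node=-1):
    """Serialize the subtree under `node` as nested parentheses.

    `parent` maps child index -> parent index; -1 denotes the virtual
    root. Children are visited in sorted order so output is deterministic,
    which matters for exact-match scoring.
    """
    children = sorted(c for c, p in parent.items() if p == node)
    inner = "".join(f"{c}{to_parens(parent, c)}" for c in children)
    return f"({inner})"

# Curves 1 and 3 at top level; curve 2 nested inside curve 1.
parent = {1: -1, 2: 1, 3: -1}
print(to_parens(parent))  # (1(2())3())
```

Fixing one canonical serialization (including child ordering) is exactly the kind of detail the minor comment asks for, since two correct trees can otherwise serialize differently and fail a string-level comparison.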

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on dataset integrity and evaluation details. We address each major comment below and have revised the manuscript to incorporate additional verification and protocol specifications.

read point-by-point responses
  1. Referee: [Abstract and dataset-construction section] The central claim that low accuracies demonstrate failure of exact topological reasoning rests on the assumption that the 756 images contain no unintended low-level signals (curve thickness, vertex density, shading, or positional statistics) correlated with nesting depth or region count. No ablation, correlation analysis, or independent symbolic verification of the tree annotations decoupled from pixel features is described, leaving open the possibility that models exploit rendering artifacts rather than winding-number or ray-casting reasoning.

    Authors: We agree that explicit checks strengthen the claim. CurveBench images are generated via a procedural pipeline that independently samples curve parameters (control points, nesting depth, region count) from uniform distributions before rendering with fixed line width and no shading or texture. In the revision we add a dedicated verification subsection: (i) Pearson correlations between low-level image statistics (edge length, vertex density, bounding-box area) and tree properties (depth, node count) are all below 0.12; (ii) containment trees were recomputed from the underlying vector representations using an independent symbolic ray-casting routine, matching the released annotations at 100%. These results are now reported in the dataset-construction section. revision: yes

  2. Referee: [Evaluation protocol] The reported tree-generation accuracies (71.1% Easy, 19.1% Hard) are presented without details on the exact prompting template, output parsing rules, or statistical significance testing across multiple runs, making it difficult to assess whether the performance gap is robust or sensitive to evaluation choices.

    Authors: We have expanded the Evaluation Protocol section to include the complete prompting templates (system message plus user prompt with image placeholder) for every model family, the deterministic JSON schema validator used for output parsing, and the fallback rule-based extractor applied to the <5% of malformed outputs. All accuracies are now reported as means over three independent runs with distinct sampling seeds, accompanied by standard deviations and paired t-test p-values confirming significance of the reported gaps (p < 0.01). revision: yes
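The validation step the rebuttal describes can be sketched as follows. This is an assumed protocol, not the authors' released validator: parse the model's JSON output as a parent-pointer map, reject malformed trees (wrong node set, cycles, dangling pointers), then score by exact match against the ground-truth tree.

```python
# Sketch of a deterministic output validator and exact-match scorer
# (illustrative protocol; the paper's actual schema may differ).
import json

def parse_tree(text, n_nodes):
    """Return a child->parent dict, or None if the output is malformed."""
    try:
        tree = json.loads(text)
        parent = {int(k): int(v) for k, v in tree.items()}
    except (json.JSONDecodeError, AttributeError, ValueError, TypeError):
        return None
    if set(parent) != set(range(n_nodes)):
        return None                       # wrong node set
    for node in parent:                   # every chain must reach the root
        seen, cur = set(), node
        while cur != -1:
            if cur in seen or cur not in parent:
                return None               # cycle or dangling pointer
            seen.add(cur)
            cur = parent[cur]
    return parent

def exact_match(pred_text, gold):
    """Tree-generation accuracy credits only a fully correct tree."""
    pred = parse_tree(pred_text, len(gold))
    return pred is not None and pred == gold

gold = {0: -1, 1: 0, 2: 1}
print(exact_match('{"0": -1, "1": 0, "2": 1}', gold))  # True
print(exact_match('{"0": -1, "1": 2, "2": 1}', gold))  # False (cycle)
```

An all-or-nothing metric like this explains why accuracy collapses on the hard subset: a single misplaced edge in a deep tree zeroes the score for the whole image.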

Circularity Check

0 steps flagged

No circularity: empirical benchmark evaluation is self-contained

full rationale

The paper introduces CurveBench with 756 procedurally generated images and rooted-tree annotations for containment relations, then reports direct accuracy measurements (e.g., Gemini 3.1 Pro at 71.1% Easy / 19.1% Hard) and fine-tuning gains on external models. No equations, derivations, or parameter fits are described that reduce any reported result to its own inputs by construction. The central claims rest on dataset construction and model evaluation rather than any self-referential mathematical chain, satisfying the default expectation of no significant circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the standard mathematical definition of Jordan curves together with the creation and annotation of a new image dataset; no free parameters, invented entities, or ad-hoc axioms are introduced beyond these.

axioms (1)
  • standard math Jordan curves are simple closed curves in the plane that do not self-intersect and divide the plane into an interior and exterior region.
    Invoked to define the containment relations encoded in the rooted trees.

pith-pipeline@v0.9.0 · 5547 in / 1291 out tokens · 64878 ms · 2026-05-15T05:30:49.710142+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.


Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 8 internal anchors
