Geo-Strat-RL: Learning Geological Event Reasoning from Verifiable Tasks
Pith reviewed 2026-06-26 00:05 UTC · model grok-4.3
The pith
Reinforcement learning with verifiable rewards on synthetic stratigraphic tasks improves vision-language models' geological event reconstructions and transfers to seismic formats without domain-specific training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We show that RLVR improves geological reconstruction in vision-language models (VLMs), increasing geological content scores on held-out stratigraphic diagrams. We further evaluate the same held-out geological histories in a synthetic seismic observation domain by converting the generated scenes into acoustic-impedance-derived amplitude sections. In this controlled paired-renderer setting, we present evidence that geological reasoning learned from stratigraphic diagram-domain RLVR training transfers to synthetic seismic representations without seismic-specific training examples, supporting the hypothesis that RLVR can teach reusable geological reasoning concepts across related observation formats.
What carries the argument
Geo-Strat-RL, a synthetic environment that pairs a geological generator producing stratigraphic observations and compact event histories with an executable verifier scoring chronology, event identity, deposition, and structural relationships.
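The page does not define how the verifier turns a predicted history into a score (the referee notes that no reward definition is supplied). The sketch below is a hypothetical illustration of how an executable verifier could score the four named components. The `Event` type, the event vocabulary, the pairwise-order chronology metric, the F1 identity metric, the cross-cutting "cuts" pairs, and the equal weights are all assumptions, not the authors' design.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Event:
    kind: str  # hypothetical vocabulary: "deposit", "fault", "erosion", "intrusion"
    name: str  # e.g. "sandstone_A" or "F1"


def chronology_score(pred, truth):
    """Fraction of ground-truth event pairs (truth is oldest first) whose order is reproduced."""
    pos = {e: i for i, e in enumerate(pred)}
    pairs = list(combinations(truth, 2))  # in each pair the first event is older
    if not pairs:
        return 1.0
    ok = sum(1 for a, b in pairs if a in pos and b in pos and pos[a] < pos[b])
    return ok / len(pairs)


def identity_score(pred, truth):
    """F1 overlap between the predicted and true event sets."""
    p, t = set(pred), set(truth)
    tp = len(p & t)
    if tp == 0:
        return 1.0 if not p and not t else 0.0
    precision, recall = tp / len(p), tp / len(t)
    return 2 * precision * recall / (precision + recall)


def deposition_score(pred, truth):
    """Position-wise agreement of the depositional sequence (bottom to top)."""
    true_dep = [e for e in truth if e.kind == "deposit"]
    pred_dep = [e for e in pred if e.kind == "deposit"]
    if not true_dep:
        return 1.0
    return sum(a == b for a, b in zip(pred_dep, true_dep)) / len(true_dep)


def structural_score(pred_cuts, true_cuts):
    """Recall of (younger structure, older unit) cross-cutting relations."""
    if not true_cuts:
        return 1.0
    return len(set(pred_cuts) & set(true_cuts)) / len(true_cuts)


def verifier_reward(pred, truth, pred_cuts, true_cuts, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of the four component scores; equal weights are an assumption."""
    scores = (
        chronology_score(pred, truth),
        identity_score(pred, truth),
        deposition_score(pred, truth),
        structural_score(pred_cuts, true_cuts),
    )
    return sum(w * s for w, s in zip(weights, scores))


# Usage: a fault F1 cutting two deposited layers.
A, B, F = Event("deposit", "sandstone_A"), Event("deposit", "shale_B"), Event("fault", "F1")
truth = [A, B, F]
cuts = {("F1", "sandstone_A"), ("F1", "shale_B")}
print(verifier_reward([A, B, F], truth, cuts, cuts))  # → 1.0
```

Scoring components separately, rather than only checking exact equality with the true history, gives partial credit, which matters when the true history is only partly identifiable from the observation.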
If this is right
- RLVR raises geological content scores on held-out stratigraphic diagrams.
- Geological reasoning acquired in the diagram domain transfers to synthetic seismic sections without seismic-specific training.
- Verifiable rewards enable learning of reusable concepts that apply across observation formats sharing the same underlying geological structure.
- This approach offers a route to train models on tasks where ground-truth histories are not uniquely identifiable from observations alone.
Where Pith is reading between the lines
- Similar synthetic verifiable environments could be constructed for other scientific domains that involve ambiguous temporal or structural reasoning.
- The observed transfer suggests the method captures abstract geological relationships rather than surface visual patterns specific to one rendering style.
- A direct next test would apply the trained models to real field data to check whether simulator gains survive outside the controlled generator.
- If the transfer effect generalizes, the method could reduce the need for large labeled datasets when teaching scientific reasoning to AI systems.
Load-bearing premise
The synthetic generator and verifier produce observations and reward signals that faithfully encode real geological principles so that gains reflect genuine reasoning improvements rather than simulator artifacts.
What would settle it
No performance gain or outright degradation when the RLVR-trained model is applied to real stratigraphic diagrams or seismic sections whose event histories have been independently verified by field geologists.
Original abstract
To evaluate whether vision-language models can reason about geological histories, it is necessary to construct observations for which the underlying process history is known. Furthermore, reasoning over geological histories is not just a question of recognizing visual patterns, but also of understanding temporal and structural relationships that may be only indirectly visible or highly ambiguous. When ground-truth event histories are not uniquely identifiable or are unavailable, it remains an open challenge to teach models capable of visual reasoning to produce valid geological reconstructions that are consistent with both observed evidence and geological principles. We therefore investigate whether defining a verifiable geological reasoning task can improve geological event reconstruction across observation domains through reinforcement learning with verifiable rewards (RLVR). To this end, we present Geo-Strat-RL, a synthetic environment that generates stratigraphic observations and compact visible-evidence event histories. The environment combines a geological generator with an executable verifier that scores chronology, event identity, deposition, and structural relationships. We show that RLVR improves geological reconstruction in vision-language models (VLMs), increasing geological content scores on held-out stratigraphic diagrams. We further evaluate the same held-out geological histories in a synthetic seismic observation domain by converting the generated scenes into acoustic-impedance-derived amplitude sections. In this controlled paired-renderer setting, we present evidence that geological reasoning learned from stratigraphic diagram-domain RLVR training transfers to synthetic seismic representations without seismic-specific training examples, supporting the hypothesis that RLVR can teach reusable geological reasoning concepts across related observation formats.
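The abstract says the seismic sections are derived from acoustic impedance but gives no details. A standard way to do this, assumed here rather than taken from the paper, is the 1D convolutional model: impedance Z = ρv per sample, reflectivity r_i = (Z_i − Z_{i−1}) / (Z_i + Z_{i−1}) at each interface, and each trace convolved with a zero-phase Ricker wavelet. The 25 Hz peak frequency, 2 ms sample interval, and treating the vertical axis as time samples are illustrative assumptions.

```python
import numpy as np


def ricker(peak_hz=25.0, dt=0.002, length=0.128):
    """Zero-phase Ricker wavelet with an odd number of samples centred on t = 0."""
    n_half = int(round(length / (2 * dt)))
    t = np.arange(-n_half, n_half + 1) * dt
    a = (np.pi * peak_hz * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)


def impedance_to_amplitude(impedance, wavelet):
    """Convolutional model: impedance contrasts -> reflectivity -> band-limited amplitude.

    impedance: array of shape (n_samples, n_traces); the vertical axis is treated as
    two-way time. n_samples should be at least len(wavelet) so that mode="same"
    returns one output sample per input sample.
    """
    z = np.asarray(impedance, dtype=float)
    refl = np.zeros_like(z)
    refl[1:] = (z[1:] - z[:-1]) / (z[1:] + z[:-1])  # normal-incidence reflection coefficients
    return np.apply_along_axis(lambda tr: np.convolve(tr, wavelet, mode="same"), 0, refl)


# Usage: a two-layer model (density in kg/m^3 times velocity in m/s) with a harder lower unit.
imp = np.full((200, 3), 2300.0 * 2500.0)
imp[100:] = 2500.0 * 3500.0
amp = impedance_to_amplitude(imp, ricker())
print(amp.shape)  # → (200, 3)
```

Because the same impedance model feeds both renderers in a paired-renderer setup, any transfer measured this way is between two views of one scene, which is exactly the limitation the referee raises below.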
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Geo-Strat-RL, a synthetic environment that pairs a geological generator with an executable verifier to produce stratigraphic diagrams and compact event histories. It claims that reinforcement learning with verifiable rewards (RLVR) applied to vision-language models on this environment raises geological content scores on held-out stratigraphic diagrams and that the resulting reasoning transfers to synthetic seismic amplitude sections derived from the same underlying event histories, without any seismic-specific training examples.
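Neither the abstract nor the summary names the policy-optimization algorithm. One widely used RLVR recipe, GRPO-style group-relative advantages combined with a PPO-style clipped objective, is sketched below as an assumption, not as the authors' method. Each row of `rewards` holds verifier scores for several sampled reconstructions of the same observation; the loss uses sequence-level log-probabilities, a simplification of token-level implementations.

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize verifier rewards within each prompt's group of sampled answers.

    rewards: array of shape (n_prompts, group_size).
    """
    r = np.asarray(rewards, dtype=float)
    mean = r.mean(axis=1, keepdims=True)
    std = r.std(axis=1, keepdims=True)
    return (r - mean) / (std + eps)


def clipped_surrogate(logp_new, logp_old, adv, clip=0.2):
    """PPO-style clipped policy loss (to be minimized) on sequence log-probabilities."""
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip, 1.0 + clip) * adv
    return -np.mean(np.minimum(unclipped, clipped))


# Usage: four sampled reconstructions per diagram, scored by the verifier.
adv = group_relative_advantages([[1.0, 0.5, 0.0, 0.5], [0.2, 0.2, 0.2, 0.2]])
print(np.round(adv, 3))
```

Note the second row: when every sample gets the same verifier score, all advantages are zero and that prompt contributes no learning signal, so the difficulty of generated scenes directly affects how informative training is.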
Significance. If the transfer result holds under controls that isolate genuine geological principles from renderer-specific artifacts, the work would supply a concrete demonstration that RLVR can produce reusable, cross-domain scientific reasoning in VLMs. The use of an executable verifier and paired renderers is a methodological strength that could be extended to other domains where ground-truth process histories are synthetically controllable.
Major comments (3)
- [Abstract] The transfer claim rests on the assumption that the generator and verifier faithfully encode geological principles (chronology, deposition, structural relationships) rather than simulator-specific patterns. No ablation on verifier rules, no expert adjudication on real data, and no comparison against a fixed-event-sequence baseline are described that would distinguish these alternatives.
- [Abstract] No methods, model architecture, reward function definition, training hyperparameters, or quantitative tables (scores, baselines, statistical tests) are supplied, making it impossible to evaluate whether the reported improvements or transfer effect are supported by the data.
- [Abstract] The paired-renderer design means both observation domains are generated from identical event histories; without an external check (real seismic data or verifier-rule ablation), apparent transfer could arise from learning a shared mapping to the fixed history rather than domain-invariant geological concepts.
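The second comment asks for statistical tests. Because the base and RLVR-trained models are scored on the same held-out histories, a paired test is the natural choice. A paired bootstrap over per-history score differences is one option, sketched here under that assumption; the function name and inputs are hypothetical, and no scores from the paper are used.

```python
import numpy as np


def paired_bootstrap_ci(base, tuned, n_boot=10_000, alpha=0.05, seed=0):
    """Mean per-history improvement and a (1 - alpha) percentile bootstrap interval.

    base, tuned: per-history verifier scores for the same held-out histories, same order.
    """
    base = np.asarray(base, dtype=float)
    tuned = np.asarray(tuned, dtype=float)
    if base.shape != tuned.shape:
        raise ValueError("scores must be paired: one entry per held-out history")
    diff = tuned - base
    rng = np.random.default_rng(seed)
    # Resample histories (not individual scores) with replacement, keeping pairs together.
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))
    boot_means = diff[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return diff.mean(), lo, hi
```

The same function applies unchanged to the transfer question: pass base and tuned scores on the seismic renderings of the held-out histories. An interval that excludes zero is evidence of an improvement within the synthetic setting only.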
Simulated Author's Rebuttal
We thank the referee for the constructive comments highlighting key aspects of validating cross-domain transfer in our synthetic RLVR setup. We address each major comment point-by-point below, acknowledging limitations where the current design leaves alternative explanations open and indicating revisions to strengthen the manuscript.
Point-by-point responses
- Referee [Abstract]: The transfer claim rests on the assumption that the generator and verifier faithfully encode geological principles (chronology, deposition, structural relationships) rather than simulator-specific patterns. No ablation on verifier rules, no expert adjudication on real data, and no comparison against a fixed-event-sequence baseline are described that would distinguish these alternatives.
Authors: We agree these controls are needed to isolate geological principles from simulator artifacts. The verifier encodes standard stratigraphic rules (superposition, chronology, structural consistency), but without ablations or a fixed-sequence baseline the contribution of those rules versus other patterns cannot be fully separated. No expert review on real data is included because the work is scoped to synthetic verifiable tasks with known ground truth. We will add (i) an ablation removing or perturbing specific verifier rules and (ii) a fixed-event-sequence baseline in the revised manuscript; extension to real-data adjudication is noted as future work. Revision: yes.
- Referee [Abstract]: No methods, model architecture, reward function definition, training hyperparameters, or quantitative tables (scores, baselines, statistical tests) are supplied, making it impossible to evaluate whether the reported improvements or transfer effect are supported by the data.
Authors: The current manuscript version supplies only the high-level abstract and does not contain the requested details on architecture, reward definition, hyperparameters, or quantitative tables with baselines and tests. We will expand the manuscript to include these elements (VLM architecture, verifier-derived reward components, RLVR training settings, score tables, and statistical comparisons) so that the claimed improvements can be properly assessed. Revision: yes.
- Referee [Abstract]: The paired-renderer design means both observation domains are generated from identical event histories; without an external check (real seismic data or verifier-rule ablation), apparent transfer could arise from learning a shared mapping to the fixed history rather than domain-invariant geological concepts.
Authors: The paired-renderer design deliberately holds the underlying event history fixed to test whether reasoning learned in one observation domain transfers to another. This controlled setting is a methodological feature, yet we acknowledge it leaves open the possibility that the model learns a mapping to the shared history rather than domain-invariant concepts. Combined with the absence of verifier-rule ablations (addressed in comment 1), this is a genuine limitation of the current evidence. We will add explicit discussion of this alternative explanation and reference the planned ablations. Revision: partial.
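The two controls promised in the first response can be made concrete. The sketch below assumes the verifier exposes per-component scores (the component names are hypothetical) and that histories are ordered lists of event labels, oldest first. A rule ablation drops one component from the averaged reward; the fixed-event-sequence baseline always predicts the most frequent training history and is scored with the same pairwise-order metric. Neither function is from the paper.

```python
from collections import Counter
from itertools import combinations


def ablated_reward(component_scores, drop=None):
    """Mean of verifier component scores, optionally dropping one rule (an ablation)."""
    kept = {k: v for k, v in component_scores.items() if k != drop}
    return sum(kept.values()) / len(kept)


def fixed_sequence_baseline(train_histories):
    """The single most frequent training history, used as a constant prediction."""
    return Counter(tuple(h) for h in train_histories).most_common(1)[0][0]


def pairwise_order_agreement(pred, truth):
    """Fraction of ground-truth event pairs (oldest first) ordered the same way in pred."""
    pos = {e: i for i, e in enumerate(pred)}
    pairs = list(combinations(truth, 2))
    if not pairs:
        return 1.0
    return sum(a in pos and b in pos and pos[a] < pos[b] for a, b in pairs) / len(pairs)


def evaluate_constant(prediction, held_out):
    """Average score of one fixed prediction over all held-out histories."""
    return sum(pairwise_order_agreement(prediction, h) for h in held_out) / len(held_out)
```

If an RLVR-trained model barely beats `evaluate_constant(fixed_sequence_baseline(train), held_out)`, its gains could reflect the generator's event-order prior rather than reading the observation, which is the alternative the referee wants excluded.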
Circularity Check
No significant circularity; empirical results rely on external verifier
Full rationale
The paper presents an empirical RLVR training setup on synthetic stratigraphic diagrams generated with known event histories, using an executable verifier for rewards based on chronology and structural rules. Improvements are measured on held-out diagrams and transfer to a paired seismic renderer. No equations, fitted parameters, or self-citations are described that reduce the claimed gains or transfer to a tautology by construction. The verifier and generator are external rule-based components, and the held-out evaluation plus cross-domain test provide independent checks within the synthetic setting, making the derivation self-contained rather than circular.
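The independence of the held-out checks depends on how the split is made. A leakage-safe option, assumed here since the page does not describe the split, is to assign whole geological histories (not rendered images) to train or test, so that both the diagram and the seismic rendering of a test history are unseen during training. The hash-based assignment and the names below are hypothetical.

```python
import hashlib


def split_of(history_id, test_fraction=0.2):
    """Deterministically assign a history to 'train' or 'test' from a hash of its ID."""
    digest = hashlib.sha256(history_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    return "test" if bucket < test_fraction else "train"


def assign_splits(history_ids, renderers=("diagram", "seismic")):
    """Every rendering of a history inherits that history's split."""
    return {(h, r): split_of(h) for h in history_ids for r in renderers}
```

Under this scheme, diagram-domain training would draw only from histories whose split is "train", and seismic evaluation only from "test" histories, so a seismic test scene never shares an event history with any training diagram.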
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Geological principles can be encoded into an executable verifier that produces reliable scores for chronology, event identity, deposition, and structural relationships.
Invented entities (1)
- Geo-Strat-RL synthetic environment (no independent evidence)
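The domain assumption above can be illustrated with two textbook relative-dating rules: the law of superposition (in an undisturbed stack, lower layers are older) and cross-cutting relationships (a feature is younger than what it cuts). The sketch below is a hypothetical rule checker, not the Geo-Strat-RL verifier; the input format (event names, oldest first; an observed bottom-to-top stack; observed cutter/cut pairs) is assumed.

```python
def rule_violations(chronology, stack_bottom_to_top, cuts):
    """List violations of superposition and cross-cutting rules in a predicted chronology.

    chronology: predicted event names, oldest first
    stack_bottom_to_top: undisturbed layer names as observed, bottom first
    cuts: iterable of (cutter, cut) pairs observed in the section
    Events missing from the chronology are skipped here; completeness is a separate score.
    """
    pos = {name: i for i, name in enumerate(chronology)}
    violations = []
    # Law of superposition, checked on adjacent layers of the observed stack.
    for lower, upper in zip(stack_bottom_to_top, stack_bottom_to_top[1:]):
        if lower in pos and upper in pos and pos[lower] > pos[upper]:
            violations.append(f"superposition: {lower} should predate {upper}")
    # Cross-cutting relationships: the cutting feature must come later.
    for cutter, cut in cuts:
        if cutter in pos and cut in pos and pos[cutter] < pos[cut]:
            violations.append(f"cross-cutting: {cutter} should postdate {cut}")
    return violations


print(rule_violations(["sand", "shale", "F1"], ["sand", "shale"], [("F1", "shale")]))  # → []
```

Rules like these are what a verifier-rule ablation would remove or perturb; if scores and transfer are insensitive to dropping them, the gains are less likely to reflect the encoded geological principles.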