pith. machine review for the scientific record.

arxiv: 2604.21501 · v2 · submitted 2026-04-23 · 💻 cs.AI

Recognition: unknown

GeoMind: An Agentic Workflow for Lithology Classification with Reasoned Tool Invocation

Pith reviewed 2026-05-09 21:33 UTC · model grok-4.3

classification 💻 cs.AI
keywords lithology classification · well logs · agentic framework · tool invocation · geological constraints · process supervision · sequential reasoning

The pith

GeoMind models lithology classification as an adaptive sequence of tool-based reasoning steps rather than a single static mapping.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that standard well-log lithology classifiers neglect geological knowledge because they map data directly to rock-type labels in a single step. GeoMind instead builds an agent that first turns raw logs into semantic trends, then forms lithology hypotheses from combined evidence, and finally checks those hypotheses against stratigraphic rules. A global planner chooses the order and combination of these steps according to the input data. Training rewards correct intermediate steps, not only the final label, so that the whole chain stays logically sound and consistent with geology. On four benchmark well-log datasets, the authors report better classification performance than strong baselines, with every decision step open to inspection.
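The first of those steps, turning raw logs into semantic trends, can be pictured with a minimal sketch. The helper name `describe_trend`, the tolerance value, the three labels, and the gamma-ray readings are all assumptions made for illustration; the paper's actual perception tools are not described on this page.

```python
from statistics import mean

def describe_trend(values, flat_tol=0.05):
    """Label a window of log readings as 'rising', 'falling', or 'flat'.

    Hypothetical helper: compares the mean of the second half of the window
    with the mean of the first half, relative to the window's overall mean.
    """
    if len(values) < 2:
        raise ValueError("need at least two readings")
    half = len(values) // 2
    first, second = mean(values[:half]), mean(values[half:])
    scale = abs(mean(values)) or 1.0  # fall back to 1.0 if the mean is zero
    change = (second - first) / scale
    if change > flat_tol:
        return "rising"
    if change < -flat_tol:
        return "falling"
    return "flat"

# Made-up gamma-ray readings over a short depth window.
gamma_ray = [45.0, 48.0, 60.0, 85.0, 110.0, 120.0]
print(f"GR trend: {describe_trend(gamma_ray)}")  # prints "GR trend: rising"
```

A text label like this is the kind of intermediate evidence a language-model agent can reason over, which is presumably why the workflow starts with it.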

Core claim

GeoMind is a tool-augmented agentic framework that casts lithology classification as a sequential reasoning process. Tools are grouped into perception, reasoning, and analysis modules that a global planner coordinates adaptively; fine-grained supervision on each reasoning step enforces logical consistency and alignment with geological constraints.

What carries the argument

The global planner that adaptively selects and sequences tools from the perception, reasoning, and analysis modules to build evidence-grounded lithology predictions.
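To make the coordination pattern concrete, here is a toy sketch of a planner that sequences three modules and records each step. Every function, threshold, and rule below is a hypothetical stand-in, not GeoMind's implementation: the real modules are tool-calling components whose internals this page does not show.

```python
from dataclasses import dataclass, field

@dataclass
class Trace:
    """Records every step so the decision path can be inspected afterwards."""
    steps: list = field(default_factory=list)

    def log(self, module, output):
        self.steps.append((module, output))
        return output

def perception(logs):
    # Hypothetical: summarize gamma ray as 'high' or 'low' against a toy cutoff.
    gr = sum(logs["GR"]) / len(logs["GR"])
    return {"GR_level": "high" if gr > 75 else "low"}

def reasoning(evidence):
    # Hypothetical rule: high gamma ray suggests shale, low suggests sandstone.
    return "shale" if evidence["GR_level"] == "high" else "sandstone"

def analysis(hypothesis, previous_label, forbidden):
    # Hypothetical check: reject transitions listed as stratigraphically forbidden.
    return (previous_label, hypothesis) not in forbidden

def plan(logs, previous_label):
    # Toy planner: always perceive and reason; verify only when a layer above
    # exists to check the transition against. A real planner would also
    # inspect the logs themselves.
    steps = ["perception", "reasoning"]
    if previous_label is not None:
        steps.append("analysis")
    return steps

def classify(logs, previous_label=None, forbidden=frozenset()):
    trace = Trace()
    evidence = hypothesis = verified = None
    for module in plan(logs, previous_label):
        if module == "perception":
            evidence = trace.log(module, perception(logs))
        elif module == "reasoning":
            hypothesis = trace.log(module, reasoning(evidence))
        elif module == "analysis":
            verified = trace.log(module, analysis(hypothesis, previous_label, forbidden))
    return hypothesis, verified, trace.steps

label, ok, steps = classify({"GR": [90, 110, 105]}, previous_label="sandstone")
for module, output in steps:
    print(module, "->", output)
```

The point of the sketch is the shape: the planner returns a sequence, each module consumes the previous module's output, and the trace keeps the whole path available for review.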

If this is right

  • Classification accuracy rises above strong baselines across four well-log benchmark datasets.
  • Decision processes become transparent because each perception, hypothesis, and verification step is recorded.
  • Predictions respect stratigraphic constraints that single-step models frequently ignore.
  • Intermediate supervision improves consistency of the full reasoning trajectory.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same modular coordination pattern could be reused for other sequential geophysical interpretation tasks such as formation evaluation or log correlation.
  • Process-level supervision may limit geologically implausible outputs that appear when large models reason without explicit constraint checks.
  • Explicit module steps create natural points for human experts to review or correct only the parts of the workflow they care about most.

Load-bearing premise

The global planner can reliably choose module sequences that keep every reasoning step geologically valid and free of accumulated logical or domain errors.

What would settle it

A controlled run on any of the four benchmark datasets in which replacing the adaptive planner with fixed sequencing produces either lower accuracy or outputs that violate stratigraphic constraints the analysis module should have caught.
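Deciding whether one variant really has lower accuracy needs an uncertainty estimate, not just two point values. One common option, which is our suggestion and not something the paper is known to use, is a paired bootstrap over the test samples. The sketch below assumes per-sample predictions from both variants are available; the label lists are made up.

```python
import random

def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def paired_bootstrap_diff(preds_a, preds_b, labels, n_resamples=2000, seed=0):
    """Return the observed accuracy difference (A minus B) and a 95% percentile
    bootstrap interval, resampling the same indices for both variants."""
    if not (len(preds_a) == len(preds_b) == len(labels)) or not labels:
        raise ValueError("inputs must be non-empty and equally long")
    rng = random.Random(seed)
    n = len(labels)
    observed = accuracy(preds_a, labels) - accuracy(preds_b, labels)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        a = sum(preds_a[i] == labels[i] for i in idx) / n
        b = sum(preds_b[i] == labels[i] for i in idx) / n
        diffs.append(a - b)
    diffs.sort()
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    return observed, (lower, upper)

# Made-up labels and predictions for two hypothetical planner variants.
labels   = ["shale", "sand", "sand", "shale", "lime", "sand"]
adaptive = ["shale", "sand", "sand", "shale", "lime", "shale"]
fixed    = ["shale", "sand", "shale", "shale", "sand", "shale"]
diff, (lo, hi) = paired_bootstrap_diff(adaptive, fixed, labels)
print(f"accuracy difference {diff:+.3f}, 95% interval [{lo:+.3f}, {hi:+.3f}]")
```

Well-log samples from the same well are correlated along depth, so resampling individual depths as done here will tend to overstate precision; resampling whole wells or depth blocks would be the more careful choice.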

Figures

Figures reproduced from arXiv: 2604.21501 by Jiahao Wang, Mingyue Cheng, Qi Liu, Qingyang Mao, Yitong Zhou.

Figure 1: numerical models tend to overfit local noise and violate ...
Figure 2: Well-log curves from the facies dataset across depth
Figure 3: Overview of GeoMind for well-log lithology classification. GeoMind uses a Planner–Executor–Reflector workflow
Figure 4: MA-GRPO architecture with group sampling, module-specific process rewards, and KL-regularized policy update.
Figure 5: Comparison of actor gradient norm and trajectory return between MA-GRPO and GRPO. MA-GRPO demonstrates ...
Figure 6: Comparison of fragmentation rates across Fa...
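The Figure 4 caption names group sampling and module-specific process rewards. As a rough illustration of how those two ideas can combine, the sketch below standardizes rewards within a sampled group, as GRPO-style methods do, but does so separately for each module's reward. The module names, the reward values, and the per-module split are assumptions on our part; the KL-regularized policy update is not shown, and none of this is the paper's code.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one sampled group (GRPO-style)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def module_advantages(group):
    """group: list of trajectories, each a dict {module_name: process_reward}.

    Returns {module_name: [advantage per trajectory]}. Assumes every
    trajectory reports a reward for every module.
    """
    modules = group[0].keys()
    return {m: group_relative_advantages([traj[m] for traj in group]) for m in modules}

# Four sampled trajectories for one input, with made-up per-module rewards.
group = [
    {"trend": 1.0, "reasoning": 0.0, "reflector": 1.0},
    {"trend": 0.0, "reasoning": 0.0, "reflector": 1.0},
    {"trend": 1.0, "reasoning": 1.0, "reflector": 1.0},
    {"trend": 0.0, "reasoning": 1.0, "reflector": 1.0},
]
for module, adv in module_advantages(group).items():
    print(module, [round(a, 2) for a in adv])
```

A module whose reward is identical across the group gets zero advantage everywhere, so it contributes no learning signal for that input; only modules where the samples disagree are pushed in either direction.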
Original abstract

Lithology classification in well logs is a fundamental geoscience data mining task that aims to infer rock types from multi dimensional geophysical sequences. Despite recent progress, existing approaches typically formulate the problem as a static, single-step discriminative mapping. This static paradigm limits evidence-based diagnostic reasoning against geological standards, often yielding predictions that are detached from geological reality due to a lack of domain priors. In this work, we propose GeoMind, a tool-augmented agentic framework that models lithology classification as a sequential reasoning process. GeoMind organizes its toolkit into perception, reasoning, and analysis modules, which respectively translate raw logs into semantic trends, infer lithology hypotheses from multi-source evidence, and verify predictions against stratigraphic constraints. A global planner adaptively coordinates these modules based on input characteristics, enabling geologically plausible and evidence-grounded decisions. To guarantee the logical consistency of GeoMind, we introduce a fine-grained process supervision strategy. Unlike standard methods that focus solely on final outcomes, our approach optimizes intermediate reasoning steps, ensuring the validity of decision trajectories and alignment to geological constraints. Experiments on four benchmark well-log datasets demonstrate that GeoMind consistently outperforms strong baselines in classification performance while providing transparent and traceable decision-making processes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes GeoMind, a tool-augmented agentic framework for lithology classification from multi-dimensional well-log sequences. It reframes the task as a sequential reasoning process organized into perception (semantic trend extraction), reasoning (hypothesis inference from multi-source evidence), and analysis (verification against stratigraphic constraints) modules, coordinated by an adaptive global planner. Fine-grained process supervision is introduced to optimize intermediate reasoning steps for logical consistency and geological alignment, in contrast to outcome-only training. Experiments on four benchmark well-log datasets are claimed to show consistent outperformance over strong baselines together with transparent, traceable decision processes.

Significance. If the performance gains and geological plausibility claims are substantiated with appropriate controls, the work could meaningfully advance geoscience data mining by moving beyond static discriminative mappings toward dynamic, evidence-grounded agentic workflows that incorporate domain priors and produce interpretable trajectories.

major comments (3)
  1. [Abstract] Abstract: the central claim that GeoMind 'consistently outperforms strong baselines in classification performance' is load-bearing for the contribution yet is stated without any reported metrics, baseline specifications, dataset sizes, or statistical significance tests, preventing verification of whether the delta exceeds what richer prompting alone could achieve.
  2. [Abstract] Abstract (and implied experimental section): no ablation is described that removes the global planner or the fine-grained process supervision, so it remains unclear whether the reported gains derive from adaptive module coordination and trajectory optimization or simply from increased token budget and prompt complexity.
  3. [Abstract] Abstract: the assertion of 'geologically plausible and evidence-grounded decisions' and 'alignment to geological constraints' requires quantitative support (e.g., trajectory validity scores, stratigraphic constraint satisfaction rates, or expert review of intermediate steps), none of which is mentioned; final classification accuracy alone does not establish these properties.
minor comments (2)
  1. [Abstract] Abstract: the phrasing 'multi dimensional' should be hyphenated as 'multi-dimensional' for clarity.
  2. [Abstract] Abstract: the distinction between 'perception, reasoning, and analysis modules' would benefit from a brief parenthetical example of the specific tools or operations each module invokes.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and detailed feedback. The comments highlight important areas for strengthening the abstract and experimental validation. We will revise the manuscript accordingly to provide more concrete evidence and details while preserving the core contributions of the agentic workflow.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that GeoMind 'consistently outperforms strong baselines in classification performance' is load-bearing for the contribution yet is stated without any reported metrics, baseline specifications, dataset sizes, or statistical significance tests, preventing verification of whether the delta exceeds what richer prompting alone could achieve.

    Authors: We agree that the abstract would benefit from greater specificity to allow immediate assessment of the claims. In the revised version, we will update the abstract to include representative quantitative results (e.g., accuracy or macro-F1 improvements across the four datasets), name the primary baselines, note approximate dataset scales, and reference statistical significance testing. These additions will be drawn from the existing experimental results and will help distinguish the gains from those achievable by prompting alone. revision: yes

  2. Referee: [Abstract] Abstract (and implied experimental section): no ablation is described that removes the global planner or the fine-grained process supervision, so it remains unclear whether the reported gains derive from adaptive module coordination and trajectory optimization or simply from increased token budget and prompt complexity.

    Authors: We acknowledge that the abstract does not explicitly reference ablations. The full manuscript presents comparative results against static baselines, but dedicated ablations isolating the global planner and process supervision (with controls for token usage) are not currently detailed. We will add these ablation experiments in the revised manuscript, reporting performance drops when each component is removed while matching token budgets, to demonstrate that the gains stem from the proposed coordination and supervision mechanisms. revision: yes

  3. Referee: [Abstract] Abstract: the assertion of 'geologically plausible and evidence-grounded decisions' and 'alignment to geological constraints' requires quantitative support (e.g., trajectory validity scores, stratigraphic constraint satisfaction rates, or expert review of intermediate steps), none of which is mentioned; final classification accuracy alone does not establish these properties.

    Authors: We recognize that final accuracy alone is insufficient to substantiate geological plausibility. The current manuscript relies on the design of the analysis module and process supervision to enforce constraints, but does not report explicit quantitative metrics for trajectory validity or constraint satisfaction. In the revision, we will introduce and report such metrics (e.g., percentage of trajectories satisfying stratigraphic rules and automated validity scores on intermediate steps). We will also add a limited expert review of sampled trajectories if resources permit, or at minimum provide qualitative examples with constraint adherence rates. revision: yes
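The constraint-satisfaction metric promised in the third response could be computed along the following lines. This is a minimal sketch under our own assumptions: stratigraphic rules are represented as a set of forbidden (upper, lower) label pairs, and a trajectory counts as valid when its depth-ordered predictions contain none of them. The rule and the predictions are made up.

```python
def violates(sequence, forbidden):
    """True if any adjacent (upper, lower) pair in the depth-ordered label
    sequence is in the forbidden set."""
    return any(pair in forbidden for pair in zip(sequence, sequence[1:]))

def satisfaction_rate(trajectories, forbidden):
    """Fraction of predicted label sequences with no forbidden transition."""
    if not trajectories:
        raise ValueError("no trajectories to score")
    ok = sum(not violates(seq, forbidden) for seq in trajectories)
    return ok / len(trajectories)

# Made-up rule for illustration only, not a real stratigraphic constraint.
forbidden = {("coal", "limestone")}
predictions = [
    ["shale", "sandstone", "shale"],
    ["coal", "limestone", "shale"],
    ["sandstone", "sandstone", "coal"],
    ["shale", "coal", "limestone"],
]
print(f"satisfaction rate: {satisfaction_rate(predictions, forbidden):.2f}")  # prints 0.50
```

Reporting this rate next to accuracy for GeoMind and for each baseline would show whether the analysis module reduces violations, which accuracy alone cannot.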

Circularity Check

0 steps flagged

No circularity: framework is an independent architectural construction with no derivations or self-referential reductions.

Full rationale

The paper describes GeoMind as a tool-augmented agentic workflow with perception, reasoning, analysis modules, a global planner, and fine-grained process supervision. No equations, first-principles derivations, fitted parameters renamed as predictions, or self-citation chains appear in the abstract or described structure. The claimed performance gains are presented as empirical outcomes from the framework rather than results forced by definition or prior self-citations. The central claims rest on experimental benchmarks without reducing to input data by construction, making the work self-contained against external evaluation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only abstract available; ledger populated from high-level framework description. No explicit free parameters, axioms, or invented entities are quantified.

axioms (2)
  • domain assumption: A global planner can adaptively coordinate perception, reasoning, and analysis modules based on input characteristics to yield geologically plausible decisions.
    Invoked in the framework organization and coordination description.
  • domain assumption: Fine-grained process supervision on intermediate reasoning steps ensures logical consistency and alignment to geological constraints.
    Central to the supervision strategy claim.

pith-pipeline@v0.9.0 · 5521 in / 1264 out tokens · 23753 ms · 2026-05-09T21:33:11.198977+00:00 · methodology
