pith. machine review for the scientific record.

arXiv: 2605.02405 · v1 · submitted 2026-05-04 · 💻 cs.LG

Recognition: unknown

Closed-Loop CO2 Storage Control With History-Based Reinforcement Learning and Latent Model-Based Adaptation


Pith reviewed 2026-05-09 16:11 UTC · model grok-4.3

classification 💻 cs.LG
keywords: closed-loop CO2 storage, reinforcement learning, history-conditioned policies, latent model adaptation, reservoir simulation, partially observable control, well-level observations

The pith

History-conditioned reinforcement learning recovers nearly all privileged-state performance for CO2 storage control with only well-level data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper formulates CO2 injection and brine-production control as a partially observable sequential decision problem and trains deep reinforcement learning controllers on high-fidelity reservoir simulations. It compares privileged-state, well-only, history-conditioned, masking-curriculum, and asymmetric teacher-student policies to measure the benefit of temporal well-response information. History-conditioned policies recover nearly all privileged-state performance while depending solely on deployable well-level observations. A latent model-based adaptation pipeline reuses nominal latent dynamics to retune controllers under injector failure, leakage-induced shifts, and connectivity changes, and it outperforms direct model-free retuning when the same limited scenario-specific simulation budget is available. This supplies a simulator-budget-aware alternative to repeated online history matching and re-optimization.

Core claim

Closed-loop management of geological CO2 storage can be handled by history-conditioned deep reinforcement learning policies that recover nearly all of the privileged-state performance while using only deployable well-level information. For abnormal cases involving injector failure, leakage, and compartmentalized connectivity, a latent model-based adaptation pipeline that reuses nominal latent dynamics retunes controllers more effectively than direct model-free retuning under the same scenario-specific real-simulator budget.

What carries the argument

History-conditioned policies, combined with a latent model-based retuning pipeline that reuses nominal latent dynamics to adapt controllers to changed reservoir conditions using only realistic observations and limited additional simulations.
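
One way to picture a history-conditioned input is as a fixed-length rolling window of well-level observations handed to the policy at every step. The sketch below is illustrative only: the class name `WellHistory`, the window length, the observation layout, and the zero-padding convention are all assumptions and are not taken from the paper.

```python
from collections import deque


class WellHistory:
    """Rolling window of well-level observations (hypothetical helper)."""

    def __init__(self, obs_dim, window):
        self.obs_dim = obs_dim
        # Zero padding gives the policy a fixed-size input from the first step.
        self.buffer = deque([[0.0] * obs_dim for _ in range(window)], maxlen=window)

    def push(self, obs):
        """Append the newest observation; return the history, oldest first."""
        obs = [float(x) for x in obs]
        if len(obs) != self.obs_dim:
            raise ValueError(f"expected {self.obs_dim} values, got {len(obs)}")
        self.buffer.append(obs)  # maxlen drops the oldest entry automatically
        return [row[:] for row in self.buffer]


history = WellHistory(obs_dim=3, window=4)
for t in range(1, 6):
    # Placeholder readings standing in for per-well measurements.
    stacked = history.push([t, 10.0 * t, 0.1 * t])
```

After five pushes with a window of four, `stacked` holds steps 2 through 5. A recurrent or attention-based encoder could consume the same window; which encoder the authors use is not stated in this summary.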

Load-bearing premise

High-fidelity reservoir simulations accurately represent real-world reservoir behavior and the latent dynamics model captures the necessary changes under failures, leakage, and connectivity shifts.
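
The second half of this premise can be probed in simulation by checking how well a frozen nominal latent model predicts abnormal transitions before and after a small correction. The following is a minimal sketch under strong assumptions: a scalar linear nominal model, a bias-only residual, and synthetic transitions. None of these choices come from the paper, and every name in it (`nominal_step`, `fit_residual_bias`, `mean_abs_error`) is hypothetical.

```python
import random


def nominal_step(z, a):
    """Frozen nominal latent dynamics (hypothetical scalar linear model)."""
    return 0.9 * z + 0.5 * a


def fit_residual_bias(transitions):
    """Mean one-step error of the nominal model on abnormal transitions."""
    errors = [z_next - nominal_step(z, a) for z, a, z_next in transitions]
    return sum(errors) / len(errors)


def mean_abs_error(transitions, bias=0.0):
    """Average absolute one-step prediction error, with optional correction."""
    return sum(
        abs(z_next - (nominal_step(z, a) + bias)) for z, a, z_next in transitions
    ) / len(transitions)


rng = random.Random(0)


def abnormal_transition():
    # Synthetic shifted dynamics: nominal step plus a constant offset and noise.
    z, a = rng.uniform(-1, 1), rng.uniform(-1, 1)
    return z, a, nominal_step(z, a) - 0.3 + rng.gauss(0.0, 0.01)


adapt_set = [abnormal_transition() for _ in range(20)]  # small scenario budget
holdout_set = [abnormal_transition() for _ in range(200)]

bias = fit_residual_bias(adapt_set)
error_before = mean_abs_error(holdout_set)
error_after = mean_abs_error(holdout_set, bias)
```

If the held-out error does not drop after the correction, the shift is likely outside what the reused nominal dynamics plus a residual can express, which is the failure mode this premise points at.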

What would settle it

Deploy the history-conditioned and latent-adapted controllers on a real CO2 storage site or a physical laboratory analog under documented injector failure or leakage conditions and measure whether achieved storage efficiency and safety metrics match the simulation predictions.

Figures

Figures reproduced from arXiv: 2605.02405 by Sofianos Panagiotis Fotias, Vassilis Gaganis.

Figure 1: Closed-loop CCS control formulation considered in this work. The reservoir simulator evolves …
Figure 2: Training-deployment setting considered in this work. Policies are trained on ensembles of prior …
Figure 3: Baseline information regimes. The privileged-state benchmark uses dense simulator fields and …
Figure 4: History-conditioned model. Current well observations and a rolling history of public well …
Figure 5: Masked-critic curriculum. During training, the critic receives progressively masked spatial sim…
Figure 6: Asymmetric teacher-student model. During training, privileged teacher critics use dense spatial …
Figure 7: Model-based pipeline used in this work. Public observations are mapped to a deployable latent …
Figure 8: Scenario 1 methodology. The actor outputs an 11-dimensional nominal action, after which the …
Figure 9: Residual world-model adaptation for Scenarios 2 and 3. Abnormal latent transitions are encoded …
Figure 10: Test return as a function of training epoch for the five model-free variants. The well-only base…
Figure 11: Scenario 0: nominal model-based retention. Real-environment evaluation return as a function …
Figure 12: Scenario 1: adaptation under known injector failure. Real-environment evaluation return as a …
Figure 13: Scenario 2: adaptation under leakage-induced dynamics and reward shift. Real-environment …
Figure 14: Scenario 3: adaptation under compartmentalized connectivity shift. Real-environment eval…

Original abstract

Closed-loop management of geological CO2 storage requires control policies that adapt to uncertain reservoir behavior while relying on observations that are realistically available during operation. This work formulates CO2 injection and brine-production control as a partially observable sequential decision problem and studies deployable deep reinforcement-learning controllers trained with high-fidelity reservoir simulation. We first compare privileged-state, well-only, history-conditioned, masking-curriculum, and asymmetric teacher-student model-free policies in order to quantify the value of temporal well-response information and training-time privileged simulator states. We then evaluate a latent model-based adaptation pipeline that reuses nominal latent dynamics and retunes controllers under known injector failure, leakage-induced dynamics and reward shift, and compartmentalized reservoir connectivity. The results show that history-conditioned policies recover nearly all of the privileged-state performance while using only deployable well-level information, and that latent model-based retuning outperforms direct model-free retuning under the same scenario-specific real-simulator budget in the abnormal operating cases. The proposed framework therefore provides a simulator-budget-aware alternative to repeated online history matching and re-optimization for closed-loop CO2 storage control.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper formulates CO2 injection and brine-production control as a partially observable Markov decision process and trains deep RL policies on high-fidelity reservoir simulators. It compares privileged-state, well-only, history-conditioned, masking-curriculum, and asymmetric teacher-student policies, claiming that history-conditioned policies recover nearly all privileged-state performance using only deployable well-level observations. It further proposes a latent model-based adaptation pipeline that reuses nominal latent dynamics and retunes controllers for three known abnormality classes (injector failure, leakage-induced dynamics/reward shift, compartmentalized connectivity), reporting that this outperforms direct model-free retuning under a fixed scenario-specific simulator budget.

Significance. If the empirical claims hold under broader validation, the work provides a practical, simulator-budget-aware alternative to repeated online history matching for closed-loop CO2 storage. The demonstration that temporal well-response history suffices to approach privileged performance, together with the latent-adaptation results, directly addresses partial observability and model uncertainty in subsurface control. The explicit focus on deployable observations and limited retuning budgets is a concrete strength that could inform real-world deployment of RL in energy systems.

major comments (2)
  1. [§4, §5.2] §4 (Experimental Setup) and §5.2 (Abnormal Scenario Results): The headline claims that history-conditioned policies recover nearly all privileged performance and that latent retuning outperforms model-free retuning rest entirely on high-fidelity reservoir simulations for three known abnormality classes. No cross-validation against field data, out-of-distribution simulator variants, or unanticipated dynamics (e.g., fault reactivation or multiphase hysteresis) is reported; this is load-bearing because the latent model is reused from the nominal case and only retuned for the tested shifts.
  2. [§5.1, Table 2] §5.1 and Table 2 (Policy Comparison): Performance gains are reported without error bars, confidence intervals, or statistical significance tests across random seeds or reservoir realizations. This makes it impossible to determine whether the reported near-recovery of privileged performance is robust or within noise, directly affecting the central claim about the value of history conditioning.
minor comments (2)
  1. [§3.3] Notation for the latent dynamics model (e.g., how the encoder/decoder are trained and how retuning is performed) is introduced without a clear equation reference or pseudocode, making the adaptation pipeline hard to reproduce from the text alone.
  2. [Abstract, §1] The abstract and §1 claim 'nearly all' recovery but do not quantify the gap (e.g., percentage of cumulative reward or constraint violation) relative to privileged policies; adding these numbers would strengthen the comparison.
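
The quantification requested in the second minor comment could take the form of a normalized recovery score: the share of the gap between the well-only and privileged-state policies that the history-conditioned policy closes. This is a sketch of one possible normalization, not a metric defined in the paper. The function name and the numeric returns are placeholders.

```python
def recovered_fraction(history_return, well_only_return, privileged_return):
    """Share of the well-only-to-privileged gap closed by the history policy."""
    gap = privileged_return - well_only_return
    if gap <= 0:
        raise ValueError("privileged return must exceed the well-only return")
    return (history_return - well_only_return) / gap


# Placeholder returns for illustration only; not values reported in the paper.
fraction = recovered_fraction(
    history_return=95.0, well_only_return=60.0, privileged_return=100.0
)
print(f"recovered {fraction:.1%} of the privileged-state gap")
```

Normalizing by the gap rather than by the privileged return alone avoids overstating recovery when the well-only baseline is already close to the privileged benchmark.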

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below, indicating where we will revise the manuscript to strengthen the presentation and where we provide clarification on the scope of the study.

Point-by-point responses
  1. Referee: [§4, §5.2] §4 (Experimental Setup) and §5.2 (Abnormal Scenario Results): The headline claims that history-conditioned policies recover nearly all privileged performance and that latent retuning outperforms model-free retuning rest entirely on high-fidelity reservoir simulations for three known abnormality classes. No cross-validation against field data, out-of-distribution simulator variants, or unanticipated dynamics (e.g., fault reactivation or multiphase hysteresis) is reported; this is load-bearing because the latent model is reused from the nominal case and only retuned for the tested shifts.

    Authors: We agree that the evaluation relies on high-fidelity reservoir simulations for three representative abnormality classes (injector failure, leakage-induced shifts, and compartmentalization). This is standard practice in the field given the prohibitive cost and limited availability of real CO2 storage field data for controlled experimentation. The latent adaptation pipeline is explicitly designed to reuse nominal dynamics and retune only for known shift classes under a fixed simulator budget, which is the central methodological contribution. We will revise §5.2 and add a new limitations paragraph to explicitly state that results are conditioned on the tested shift classes, discuss the challenges of unanticipated dynamics (e.g., fault reactivation), and outline how online model adaptation could be extended in future work. No field-data cross-validation is feasible within the current scope. revision: partial

  2. Referee: [§5.1, Table 2] §5.1 and Table 2 (Policy Comparison): Performance gains are reported without error bars, confidence intervals, or statistical significance tests across random seeds or reservoir realizations. This makes it impossible to determine whether the reported near-recovery of privileged performance is robust or within noise, directly affecting the central claim about the value of history conditioning.

    Authors: We acknowledge the omission of variability measures. In the revised manuscript we will re-run all policy comparisons across at least five independent random seeds and multiple reservoir realizations (where the simulator permits), report mean performance with standard deviation or 95% confidence intervals in Table 2 and the associated figures, and include paired statistical significance tests (e.g., Wilcoxon or t-tests) between history-conditioned and baseline policies to substantiate the claim that history conditioning recovers nearly all privileged-state performance. revision: yes
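
The analysis promised in the second response could be sketched as follows. To stay dependency-free, this example swaps in an exact paired sign-flip permutation test in place of the Wilcoxon or t-tests named in the response. The per-seed returns are placeholders, and the function name is hypothetical.

```python
from itertools import product
from statistics import mean, stdev


def paired_sign_flip_pvalue(a, b):
    """Exact two-sided paired permutation test on the mean difference."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(mean(diffs))
    # Under the null hypothesis each paired difference is equally likely
    # to have either sign, so enumerate all 2**n sign assignments.
    hits = total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(mean(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            hits += 1
    return hits / total


# Placeholder per-seed returns (five seeds); not values from the paper.
history_policy = [96.1, 94.8, 95.5, 97.0, 95.2]
well_only = [61.3, 58.9, 60.4, 62.2, 59.7]

summary = (mean(history_policy), stdev(history_policy))
p_value = paired_sign_flip_pvalue(history_policy, well_only)
```

With five paired seeds there are only 32 sign assignments, so the smallest two-sided p-value this test can return is 2/32 = 0.0625. The exact Wilcoxon signed-rank test has the same floor at n = 5, so a seed count above five may be needed if a 0.05 threshold is intended.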

Circularity Check

0 steps flagged

No significant circularity; empirical RL results in simulators

full rationale

The paper's core claims rest on training and evaluating RL policies (privileged, history-conditioned, latent model-based adaptation) inside high-fidelity reservoir simulators for nominal and abnormal scenarios. Performance comparisons and adaptation advantages are obtained by direct simulation rollouts under fixed budgets, not by any self-referential definition, fitted parameter renamed as prediction, or load-bearing self-citation chain. The derivation chain consists of standard MDP formulation, policy optimization, and empirical benchmarking; no equation or result reduces to its inputs by construction. Minor self-citations, if present, are not load-bearing for the reported outcomes.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The work rests on standard RL training assumptions and the fidelity of reservoir simulators. No new physical entities are introduced.

free parameters (1)
  • RL training hyperparameters (learning rates, network architectures, reward weights)
    Typical in deep RL but unspecified in abstract; affect policy performance and adaptation results.
axioms (1)
  • domain assumption: High-fidelity reservoir simulations accurately capture real CO2 storage dynamics, including failures and leaks
    Invoked throughout training, evaluation, and adaptation pipeline.

pith-pipeline@v0.9.0 · 5500 in / 1295 out tokens · 49123 ms · 2026-05-09T16:11:36.032983+00:00 · methodology


Reference graph

Works this paper leans on

77 extracted references · 3 canonical work pages · 2 internal anchors

  1. [1]

    Technologies and infrastructures underpinning future co2 value chains: A comprehensive review and comparative analysis.Renewable and Sustainable Energy Re- views, 85:46–68, 2018

    Sean M Jarvis and Sheila Samsatli. Technologies and infrastructures underpinning future co2 value chains: A comprehensive review and comparative analysis.Renewable and Sustainable Energy Re- views, 85:46–68, 2018

  2. [2]

    Paolo Gabrielli, Matteo Gazzani, and Marco Mazzotti. The role of carbon capture and utilization, carbon capture and storage, and biomass to enable a net-zero-co2 emissions chemical industry.In- dustrial & Engineering Chemistry Research, 59(15):7033–7045, 2020. 25

  3. [3]

    The role of carbon capture and storage (ccs) technologies in a net-zero carbon future

    Mai Bui, Graeme Douglas Puxty, Matteo Gazzani, Salman Masoudi Soltani, and Carlos Pozo. The role of carbon capture and storage (ccs) technologies in a net-zero carbon future. 2021

  4. [4]

    Introduction to geological storage.Carbon Capture and Storage; Elsevier: Amsterdam, The Netherlands, pages 285–304, 2017

    SA Rackley and SA Rackley. Introduction to geological storage.Carbon Capture and Storage; Elsevier: Amsterdam, The Netherlands, pages 285–304, 2017

  5. [5]

    Criteria for co2 storage in geological formations.Podzemni radovi, (32):61–74, 2018

    Lola Tomi´ c, Vesna Karovi´ c Mariˇ ci´ c, Duˇ san Danilovi´ c, and Miroslav Crnogorac. Criteria for co2 storage in geological formations.Podzemni radovi, (32):61–74, 2018

  6. [6]

    Co2 storage in deep saline aquifers

    Xiaoyan Ji and Chen Zhu. Co2 storage in deep saline aquifers. InNovel materials for carbon dioxide mitigation technology, pages 299–332. Elsevier, 2015

  7. [7]

    Review of co2 storage efficiency in deep saline aquifers.International Journal of Greenhouse Gas Control, 40:188–202, 2015

    Stefan Bachu. Review of co2 storage efficiency in deep saline aquifers.International Journal of Greenhouse Gas Control, 40:188–202, 2015

  8. [8]

    Geological storage of co2 in saline aquifers—a review of the experience from existing storage operations.International journal of greenhouse gas control, 4(4):659–667, 2010

    Karsten Michael, Alexandra Golab, Valeriya Shulakova, Jonathan Ennis-King, Guy Allinson, Sandeep Sharma, and Toby Aiken. Geological storage of co2 in saline aquifers—a review of the experience from existing storage operations.International journal of greenhouse gas control, 4(4):659–667, 2010

  9. [9]

    Co2 storage in depleted or depleting oil and gas fields: what can we learn from existing projects?Energy Procedia, 114:5680–5690, 2017

    Sarah Hannis, Jiemin Lu, Andy Chadwick, Sue Hovorka, Karen Kirk, Katherine Romanak, and Jonathan Pearce. Co2 storage in depleted or depleting oil and gas fields: what can we learn from existing projects?Energy Procedia, 114:5680–5690, 2017

  10. [10]

    Co 2-eor/sequestration: Current trends and future horizons

    Erfan Mohammadian, Badrul Mohamed Jan, Amin Azdarpour, Hossein Hamidi, Nur Hidayati Binti Othman, Aqilah Dollah, Siti Nurliyana Binti Che Mohamed Hussein, and Rozana Azrina Binti Sazali. Co 2-eor/sequestration: Current trends and future horizons. InEnhanced Oil Recovery Processes-New Technologies. IntechOpen, 2019

  11. [11]

    Co2 sequestration in depleted oil and gas reservoirs—caprock characterization and storage capacity.Energy Conversion and Management, 47(11-12):1372–1382, 2006

    Zhaowen Li, Mingzhe Dong, Shuliang Li, and Sam Huang. Co2 sequestration in depleted oil and gas reservoirs—caprock characterization and storage capacity.Energy Conversion and Management, 47(11-12):1372–1382, 2006

  12. [12]

    Carbon capture, utilization, and storage in saline aquifers: Sub- surface policies, development plans, well control strategies and optimization approaches—a review

    Ismail Ismail and Vassilis Gaganis. Carbon capture, utilization, and storage in saline aquifers: Sub- surface policies, development plans, well control strategies and optimization approaches—a review. Clean Technologies, 5(2):609–637, 2023

  13. [13]

    Code intercomparison builds confidence in numerical simulation models for geologic disposal of co2.Energy, 29(9-10):1431–1444, 2004

    Karsten Pruess, Julio Garc´ ıa, Tony Kovscek, Curt Oldenburg, Jonny Rutqvist, Carl Steefel, and Tianfu Xu. Code intercomparison builds confidence in numerical simulation models for geologic disposal of co2.Energy, 29(9-10):1431–1444, 2004

  14. [14]

    A benchmark study on problems related to co 2 storage in geologic formations: summary and discussion of the results

    Holger Class, Anozie Ebigbo, Rainer Helmig, Helge K Dahle, Jan M Nordbotten, Michael A Celia, Pascal Audigane, Melanie Darcis, Jonathan Ennis-King, Yaqing Fan, et al. A benchmark study on problems related to co 2 storage in geologic formations: summary and discussion of the results. Computational geosciences, 13:409–434, 2009

  15. [15]

    Optimal well placement and brine extraction for pressure management during co2 sequestration.International Journal of Greenhouse Gas Control, 42:175–187, 2015

    Abdullah Cihan, Jens T Birkholzer, and Marco Bianchi. Optimal well placement and brine extraction for pressure management during co2 sequestration.International Journal of Greenhouse Gas Control, 42:175–187, 2015

  16. [16]

    Optimization of well placement, co2 injection rates, and brine cycling for geological carbon sequestration.International Journal of Greenhouse Gas Control, 10:100–112, 2012

    David A Cameron and Louis J Durlofsky. Optimization of well placement, co2 injection rates, and brine cycling for geological carbon sequestration.International Journal of Greenhouse Gas Control, 10:100–112, 2012

  17. [17]

    Co2 storage in geological media: Role, means, status and barriers to deployment

    Stefan Bachu. Co2 storage in geological media: Role, means, status and barriers to deployment. Progress in energy and combustion science, 34(2):254–273, 2008. 26

  18. [18]

    The acceptability of co2 capture and storage (ccs) in europe: An assessment of the key determining factors: Part 1

    Heleen De Coninck, Todd Flach, Paul Curnow, Peter Richardson, Jason Anderson, Simon Shackley, Gudmundur Sigurthorsson, and David Reiner. The acceptability of co2 capture and storage (ccs) in europe: An assessment of the key determining factors: Part 1. scientific, technical and economic dimensions.International Journal of Greenhouse Gas Control, 3(3):333–...

  19. [19]

    Active pressure management through brine production for basin-wide deployment of geologic carbon sequestration.International Journal of Greenhouse Gas Control, 61:155–167, 2017

    Karl W Bandilla and Michael A Celia. Active pressure management through brine production for basin-wide deployment of geologic carbon sequestration.International Journal of Greenhouse Gas Control, 61:155–167, 2017

  20. [20]

    Pre-injection brine production in co2 storage reservoirs: An approach to augment the development, operation, and performance of ccs while generating water

    Thomas A Buscheck, Jeffrey M Bielicki, Joshua A White, Yunwei Sun, Yue Hao, William L Bourcier, Susan A Carroll, and Roger D Aines. Pre-injection brine production in co2 storage reservoirs: An approach to augment the development, operation, and performance of ccs while generating water. International Journal of Greenhouse Gas Control, 54:499–512, 2016

  21. [21]

    Estimating the net costs of brine production and disposal to expand pressure-limited dynamic capacity for basin-scale co2 storage in a saline formation

    Steven T Anderson and Hossein Jahediesfanjani. Estimating the net costs of brine production and disposal to expand pressure-limited dynamic capacity for basin-scale co2 storage in a saline formation. International Journal of Greenhouse Gas Control, 102:103161, 2020

  22. [22]

    Investigation of co2 storage capacity in open saline aquifers with numerical models.Procedia Engineering, 31:886–892, 2012

    Yang Wang, Yaqin Xu, and Keni Zhang. Investigation of co2 storage capacity in open saline aquifers with numerical models.Procedia Engineering, 31:886–892, 2012

  23. [23]

    Multi-objective optimization

    Kalyanmoy Deb, Karthik Sindhya, and Jussi Hakanen. Multi-objective optimization. InDecision sciences, pages 161–200. CRC Press, 2016

  24. [24]

    A holistic review on artificial intelligence techniques for well placement optimization problem.Advances in engineering software, 141:102767, 2020

    Jahedul Islam, Pandian M Vasant, Berihun Mamo Negash, Moacyr Bartholomeu Laruccia, Myo Myint, and Junzo Watada. A holistic review on artificial intelligence techniques for well placement optimization problem.Advances in engineering software, 141:102767, 2020

  25. [25]

    Learning surrogate models for simulation- based optimization.AIChE Journal, 60(6):2211–2227, 2014

    Alison Cozad, Nikolaos V Sahinidis, and David C Miller. Learning surrogate models for simulation- based optimization.AIChE Journal, 60(6):2211–2227, 2014

  26. [26]

    A Tutorial on Bayesian Optimization

    Peter I Frazier. A tutorial on bayesian optimization.arXiv preprint arXiv:1807.02811, 2018

  27. [27]

    An introduction to continuity, extrema, and related topics for general gaussian processes

    Robert J Adler. An introduction to continuity, extrema, and related topics for general gaussian processes. IMS, 1990

  28. [28]

    Aleatory or epistemic? does it matter?Structural safety, 31(2):105–112, 2009

    Armen Der Kiureghian and Ove Ditlevsen. Aleatory or epistemic? does it matter?Structural safety, 31(2):105–112, 2009

  29. [29]

    Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods.Machine learning, 110(3):457–506, 2021

    Eyke H¨ ullermeier and Willem Waegeman. Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods.Machine learning, 110(3):457–506, 2021

  30. [30]

    Optimization of well placement in carbon capture and storage (ccs): Bayesian optimization framework under permutation invariance

    Sofianos Panagiotis Fotias, Ismail Ismail, and Vassilis Gaganis. Optimization of well placement in carbon capture and storage (ccs): Bayesian optimization framework under permutation invariance. Applied Sciences, 14(8):3528, 2024

  31. [31]

    Improved reservoir management through optimal control and continuous model updating

    DR Brouwer, G Nœvdal, JD Jansen, Erland H Vefring, and CPJW Van Kruijsdijk. Improved reservoir management through optimal control and continuous model updating. InSPE Annual Technical Conference and Exhibition?, pages SPE–90149. SPE, 2004

  32. [32]

    Optimizing the performance of smart wells in com- plex reservoirs using continuously updated geological models.Journal of Petroleum Science and Engineering, 48(3-4):254–264, 2005

    Inegbenose Aitokhuehi and Louis J Durlofsky. Optimizing the performance of smart wells in com- plex reservoirs using continuously updated geological models.Journal of Petroleum Science and Engineering, 48(3-4):254–264, 2005. 27

  33. [33]

    Efficient real-time reservoir manage- ment using adjoint-based optimal control and model updating.Computational Geosciences, 10:3–36, 2006

    Pallav Sarma, Louis J Durlofsky, Khalid Aziz, and Wen H Chen. Efficient real-time reservoir manage- ment using adjoint-based optimal control and model updating.Computational Geosciences, 10:3–36, 2006

  34. [34]

    Closed-loop reservoir management

    Jan-Dirk Jansen, SD Douma, Dr R Brouwer, PMJ Van den Hof, OH Bosgra, and AW Heemink. Closed-loop reservoir management. InSPE Reservoir Simulation Conference?, pages SPE–119098. SPE, 2009

  35. [35]

    Production optimization in closed-loop reservoir management.SPE journal, 14(03):506–523, 2009

    Chunhong Wang, Gaoming Li, and Albert C Reynolds. Production optimization in closed-loop reservoir management.SPE journal, 14(03):506–523, 2009

  36. [36]

    Comprehensive framework for gradient-based optimization in closed-loop reservoir management.Computational Geosciences, 19:877–897, 2015

    Vladislav Bukshtynov, Oleg Volkov, Louis J Durlofsky, and Khalid Aziz. Comprehensive framework for gradient-based optimization in closed-loop reservoir management.Computational Geosciences, 19:877–897, 2015

  37. [37]

    A derivative-free approach for the estimation of porosity and permeability using time-lapse seismic and production data.Journal of Geophysics and Engineering, 7(4):351–368, 2010

    Mohsen Dadashpour, David Echeverria Ciaurri, Tapan Mukerji, Jon Kleppe, and Martin Landrø. A derivative-free approach for the estimation of porosity and permeability using time-lapse seismic and production data.Journal of Geophysics and Engineering, 7(4):351–368, 2010

  38. [38]

    Multilevel strategies and geological parameterizations for history matching complex reservoir models.SPE Journal, 25(01):081–104, 2020

    Yimin Liu and Louis J Durlofsky. Multilevel strategies and geological parameterizations for history matching complex reservoir models.SPE Journal, 25(01):081–104, 2020

  39. [39]

    Data assimilation for transient flow in geologic formations via ensemble kalman filter.Advances in Water Resources, 29(8):1107–1122, 2006

    Yan Chen and Dongxiao Zhang. Data assimilation for transient flow in geologic formations via ensemble kalman filter.Advances in Water Resources, 29(8):1107–1122, 2006

  40. [40]

    Ensemble smoother with multiple data assimilation

    Alexandre A Emerick and Albert C Reynolds. Ensemble smoother with multiple data assimilation. Computers & Geosciences, 55:3–15, 2013

  41. [41]

    Training effective deep reinforcement learning agents for real-time life-cycle production optimization.Journal of Petroleum Science and Engineering, 208:109766, 2022

    Kai Zhang, Zhongzheng Wang, Guodong Chen, Liming Zhang, Yongfei Yang, Chuanjin Yao, Jian Wang, and Jun Yao. Training effective deep reinforcement learning agents for real-time life-cycle production optimization.Journal of Petroleum Science and Engineering, 208:109766, 2022

  42. [42]

    Deep reinforcement learning for optimal well control in subsurface systems with uncertain geology.Journal of Computational Physics, 477:111945, 2023

    Yusuf Nasir and Louis J Durlofsky. Deep reinforcement learning for optimal well control in subsurface systems with uncertain geology.Journal of Computational Physics, 477:111945, 2023

  43. [43]

    Stabilizing transformers for reinforcement learning

    Emilio Parisotto, Francis Song, Jack Rae, Razvan Pascanu, Caglar Gulcehre, Siddhant Jayakumar, Max Jaderberg, Raphael Lopez Kaufman, Aidan Clark, Seb Noury, et al. Stabilizing transformers for reinforcement learning. InInternational conference on machine learning, pages 7487–7498. PMLR, 2020

  44. [44]

    Well placement optimization with the covari- ance matrix adaptation evolution strategy and meta-models.Computational Geosciences, 16:75–92, 2012

    Zyed Bouzarkouna, Didier Yu Ding, and Anne Auger. Well placement optimization with the covari- ance matrix adaptation evolution strategy and meta-models.Computational Geosciences, 16:75–92, 2012

  45. [45]

    A derivative-free methodology with local and global search for the constrained joint optimization of well locations and controls

    Obiajulu J Isebor, Louis J Durlofsky, and David Echeverr´ ıa Ciaurri. A derivative-free methodology with local and global search for the constrained joint optimization of well locations and controls. Computational Geosciences, 18:463–482, 2014

  46. [46]

    Yusuf Nasir, Wei Yu, and Kamy Sepehrnoori. Hybrid derivative-free technique and effective machine learning surrogate for nonlinear constrained well placement and production optimization.Journal of Petroleum Science and Engineering, 186:106726, 2020

  47. [47]

    Application of a particle swarm optimization algorithm for determining optimum well location and type.Computational Geosciences, 14:183–198, 2010

    J´ erˆ ome E Onwunalu and Louis J Durlofsky. Application of a particle swarm optimization algorithm for determining optimum well location and type.Computational Geosciences, 14:183–198, 2010. 28

  48. [48]

    Optimal rate control under geologic uncertainty

    Ahmed H Alhuthali, Akhil Datta-Gupta, Bevan Yuen, and Jerry P Fontanilla. Optimal rate control under geologic uncertainty. InSPE Improved Oil Recovery Conference?, pages SPE–113628. SPE, 2008

  49. [49]

    Zhe Liu and Albert C Reynolds. A sequential-quadratic-programming-filter algorithm with a modified stochastic gradient for robust life-cycle optimization problems with nonlinear state constraints.SPE Journal, 25(04):1938–1963, 2020

  50. [50]

    Optimization of production operations in petroleum fields

    Pengju Wang, Michael Litvak, and Khalid Aziz. Optimization of production operations in petroleum fields. InSPE Annual Technical Conference and Exhibition?, pages SPE–77658. SPE, 2002

  51. [51]

    Ensemble-based multiobjective optimization of on/off control devices under geological uncertainty

    R-M-M Fonseca, Olwijn Leeuwenburgh, Ernesto Della Rossa, PM Van den Hof, and J-D-D Jansen. Ensemble-based multiobjective optimization of on/off control devices under geological uncertainty. SPE Reservoir Evaluation & Engineering, 18(04):554–563, 2015

  52. [52]

    RM Fonseca, Olwijn Leeuwenburgh, PMJ Van den Hof, and JD Jansen. Improving the ensemble-optimization method through covariance-matrix adaptation. SPE Journal, 20(01):155–168, 2015

  53. [53]

    KR Ramaswamy, RM Fonseca, Olwijn Leeuwenburgh, MM Siraj, and PMJ Van den Hof. Improved sampling strategies for ensemble-based optimization. Computational Geosciences, 24:1057–1069, 2020

  54. [54]

    Mathias C Bellout, David Echeverría Ciaurri, Louis J Durlofsky, Bjarne Foss, and Jon Kleppe. Joint optimization of oil well placement and controls. Computational Geosciences, 16:1061–1079, 2012

  55. [55]

    Lianlin Li, Behnam Jafarpour, and M Reza Mohammad-Khaninezhad. A simultaneous perturbation stochastic approximation algorithm for coupled well placement and control optimization under geologic uncertainty. Computational Geosciences, 17:167–188, 2013

  56. [56]

    Fahim Forouzanfar and Albert C Reynolds. Joint optimization of number of wells, well locations and controls using a gradient-based algorithm. Chemical Engineering Research and Design, 92(7):1315–1328, 2014

  57. [57]

    Mehrdad G Shirangi and Louis J Durlofsky. A general method to select representative models for decision making and optimization under uncertainty. Computers & Geosciences, 96:109–123, 2016

  58. [58]

    Mehrdad G Shirangi and Louis J Durlofsky. Closed-loop field development under uncertainty by use of optimization with sample validation. SPE Journal, 20(05):908–922, 2015

  59. [59]

    Dan Arnold, Vasily Demyanov, Mike Christie, Alexander Bakay, and Konstantin Gopa. Optimisation of decision making under uncertainty throughout field lifetime: A fractured reservoir example. Computers & Geosciences, 95:123–139, 2016

  60. [60]

    Junko Hutahaean, Vasily Demyanov, and Mike Christie. Reservoir development optimization under uncertainty for infill well placement in brownfield redevelopment. Journal of Petroleum Science and Engineering, 175:444–464, 2019

  61. [61]

    Malcolm Sambridge. Geophysical inversion with a neighbourhood algorithm—I. Searching a parameter space. Geophysical Journal International, 138(2):479–494, 1999

  62. [62]

    Malcolm Sambridge. Geophysical inversion with a neighbourhood algorithm—II. Appraising the ensemble. Geophysical Journal International, 138(3):727–746, 1999

  63. [63]

    Gavin A Rummery and Mahesan Niranjan. On-line Q-learning using connectionist systems, volume 37. University of Cambridge, Department of Engineering, Cambridge, UK, 1994

  64. [64]

    Christopher JCH Watkins and Peter Dayan. Q-learning. Machine Learning, 8:279–292, 1992

  65. [65]

    Farzad Hourfar, Hamed Jalaly Bidgoly, Behzad Moshiri, Karim Salahshoor, and Ali Elkamel. A reinforcement learning approach for waterflooding optimization in petroleum reservoirs. Engineering Applications of Artificial Intelligence, 77:98–116, 2019

  66. [66]

    Hongze Ma, Gaoming Yu, Yuehui She, and Yongan Gu. Waterflooding optimization under geological uncertainties by using deep reinforcement learning algorithms. In SPE Annual Technical Conference and Exhibition, page D031S043R001. SPE, 2019

  67. [67]

    Ruslan Miftakhov, Abdulaziz Al-Qasim, and Igor Efremov. Deep reinforcement learning: reservoir optimization from pixels. In International Petroleum Technology Conference, page D021S052R002. IPTC, 2020

  68. [68]

    John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017

  69. [69]

    Atish Dixit and Ahmed H ElSheikh. Stochastic optimal well control in subsurface reservoirs using reinforcement learning. Engineering Applications of Artificial Intelligence, 114:105106, 2022

  70. [70]

    Volodymyr Mnih. Asynchronous methods for deep reinforcement learning. arXiv preprint arXiv:1602.01783, 2016

  71. [71]

    Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning, pages 1861–1870. PMLR, 2018

  72. [72]

    Zhongzheng Wang, Kai Zhang, Jinding Zhang, Guodong Chen, Xiaopeng Ma, Guojing Xin, Jinzheng Kang, Hanjun Zhao, and Yongfei Yang. Deep reinforcement learning and adaptive policy transfer for generalizable well control optimization. Journal of Petroleum Science and Engineering, 217:110868, 2022

  73. [73]

    Zhong-Zheng Wang, Kai Zhang, Guo-Dong Chen, Jin-Ding Zhang, Wen-Dong Wang, Hao-Chen Wang, Li-Ming Zhang, Xia Yan, and Jun Yao. Evolutionary-assisted reinforcement learning for reservoir real-time production optimization under uncertainty. Petroleum Science, 20(1):261–276, 2023

  74. [74]

    Yusuf Nasir and Louis J Durlofsky. Practical closed-loop reservoir management using deep reinforcement learning. SPE Journal, 28(03):1135–1148, 2023

  75. [75]

    Yusuf Nasir and Louis J Durlofsky. Multi-asset closed-loop reservoir management using deep reinforcement learning. Computational Geosciences, 28(1):23–42, 2024

  76. [76]

    Jincong He, Meng Tang, Chaoshun Hu, Shusei Tanaka, Kainan Wang, Xian-Huan Wen, and Yusuf Nasir. Deep reinforcement learning for generalizable field development optimization. SPE Journal, 27(01):226–245, 2022

  77. [77]

    Yusuf Nasir, Jincong He, Chaoshun Hu, Shusei Tanaka, Kainan Wang, and Xian-Huan Wen. Deep reinforcement learning for constrained field development optimization in subsurface two-phase flow. Frontiers in Applied Mathematics and Statistics, 7:689934, 2021