
arxiv: 2504.09375 · v2 · pith:JFONOOIGnew · submitted 2025-04-12 · 🧮 math.OC

Efficient Gradient-Enhanced Bayesian Optimizer with Comparisons to Conjugate-Gradient and Quasi-Newton Optimizers for Unconstrained Local Optimization

Pith reviewed 2026-05-22 20:11 UTC · model grok-4.3

classification 🧮 math.OC
keywords Bayesian optimization · gradient-enhanced · local optimization · unconstrained optimization · Rosenbrock function · conjugate gradient · quasi-Newton · noisy gradients

The pith

A gradient-enhanced Bayesian optimizer reduces optimality as far as conjugate-gradient and quasi-Newton methods do, often with significantly fewer function evaluations, on unimodal problems with up to 40 dimensions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a local Bayesian optimization framework that incorporates gradients by building the surrogate from a selected subset of the evaluated points and minimizing the acquisition function inside a probabilistic trust region. The method is benchmarked against MATLAB and SciPy implementations of conjugate-gradient and quasi-Newton methods on unimodal test functions with 2 to 40 dimensions. The Bayesian approach reduces optimality as far as the classical methods and frequently needs fewer objective evaluations to do so. When gradients are noisy, the probabilistic surrogate lets the Bayesian optimizer keep improving for several orders of magnitude beyond the point where the classical methods stall; on the chaotic Lorenz 63 model with inaccurate gradients, it reaches a lower final objective than the SciPy quasi-Newton optimizer. The results indicate that Bayesian local optimization is competitive when gradients are accurate and evaluations are costly, and superior when derivative information is imperfect.

Core claim

A gradient-enhanced Bayesian optimizer that selects a subset of evaluation points for the surrogate and uses a probabilistic trust region to minimize the acquisition function reduces the optimality measure as far as conjugate-gradient and quasi-Newton optimizers while often requiring substantially fewer function evaluations. On the 40-dimensional Rosenbrock function, the Bayesian optimizer needs only half as many evaluations as the MATLAB and SciPy solvers to reduce optimality by ten orders of magnitude. With noisy gradients, the Bayesian method converges several additional orders of magnitude. On the Lorenz 63 system with inaccurate gradients, it attains a lower final objective value than the SciPy quasi-Newton optimizer from every tested starting point.

What carries the argument

Gradient-enhanced Bayesian optimizer that selects a subset of evaluation points to construct the surrogate and applies a probabilistic trust region when minimizing the acquisition function.
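For orientation, the sketch below shows one standard way a Gaussian-process surrogate can absorb gradient observations: function values and gradients share a single joint covariance built from a squared-exponential kernel and its derivatives. The kernel, the fixed hyperparameters (ell, sf2), the nugget, the constant mean fitted by generalized least squares, and the names joint_se_covariance and GradientEnhancedGP are assumptions for illustration. The paper's hyperparameter treatment, subset selection, and trust region are not reproduced.

```python
import numpy as np
from scipy.linalg import solve_triangular


def joint_se_covariance(A, B, ell, sf2):
    """Covariance of [f, grad f] at the rows of A against [f, grad f] at the rows of B.

    Kernel (assumed): k(x, x') = sf2 * exp(-|x - x'|^2 / (2 * ell^2)).
    Ordering: all function values first, then gradients point by point.
    """
    na, d = A.shape
    nb = B.shape[0]
    diff = A[:, None, :] - B[None, :, :]                          # x - x'
    k = sf2 * np.exp(-0.5 * np.sum(diff**2, axis=2) / ell**2)     # cov(f, f)
    k_fg = k[:, :, None] * diff / ell**2                          # cov(f(x), df/dx'_j)
    k_gf = -k[:, :, None] * diff / ell**2                         # cov(df/dx_i, f(x'))
    outer = diff[:, :, :, None] * diff[:, :, None, :]
    k_gg = k[:, :, None, None] * (np.eye(d) / ell**2 - outer / ell**4)
    K = np.empty((na * (d + 1), nb * (d + 1)))
    K[:na, :nb] = k
    K[:na, nb:] = k_fg.reshape(na, nb * d)
    K[na:, :nb] = k_gf.transpose(0, 2, 1).reshape(na * d, nb)
    K[na:, nb:] = k_gg.transpose(0, 2, 1, 3).reshape(na * d, nb * d)
    return K


class GradientEnhancedGP:
    """Hypothetical gradient-enhanced GP with fixed hyperparameters and a GLS constant mean."""

    def __init__(self, ell=1.0, sf2=1.0, nugget=1e-10):
        self.ell, self.sf2, self.nugget = ell, sf2, nugget

    def _solve(self, v):
        # Solve (K + nugget * I) z = v with the stored Cholesky factor.
        w = solve_triangular(self.L, v, lower=True)
        return solve_triangular(self.L.T, w, lower=False)

    def fit(self, X, f, G):
        self.X = np.atleast_2d(np.asarray(X, dtype=float))
        n, d = self.X.shape
        y = np.concatenate([np.asarray(f, dtype=float), np.asarray(G, dtype=float).ravel()])
        K = joint_se_covariance(self.X, self.X, self.ell, self.sf2)
        self.L = np.linalg.cholesky(K + self.nugget * np.eye(K.shape[0]))
        ones = np.concatenate([np.ones(n), np.zeros(n * d)])  # mean shifts values, not gradients
        self.beta = (ones @ self._solve(y)) / (ones @ self._solve(ones))
        self.alpha = self._solve(y - self.beta * ones)
        return self

    def predict(self, Xs):
        Xs = np.atleast_2d(np.asarray(Xs, dtype=float))
        m = Xs.shape[0]
        Ks = joint_se_covariance(Xs, self.X, self.ell, self.sf2)[:m]  # value rows only
        mean = self.beta + Ks @ self.alpha
        v = solve_triangular(self.L, Ks.T, lower=True)
        var = np.maximum(self.sf2 - np.sum(v**2, axis=0), 0.0)       # ignores uncertainty in beta
        return mean, var


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.5], [0.5, -1.0]])
    f = X[:, 0] ** 2 + 2.0 * X[:, 1] ** 2                  # toy quadratic
    G = np.column_stack([2.0 * X[:, 0], 4.0 * X[:, 1]])    # its exact gradient
    gp = GradientEnhancedGP(ell=1.0).fit(X, f, G)
    print(gp.predict([[0.3, 0.2]]))
```

Because each point contributes d + 1 rows, the covariance grows as n(d + 1) and becomes ill-conditioned as points cluster, which is one practical reason to build the surrogate from a subset of the evaluated points.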

If this is right

  • On problems with accurate gradients the Bayesian optimizer matches final convergence depth while using fewer evaluations.
  • When gradients are noisy the probabilistic surrogate permits convergence several orders of magnitude deeper than classical methods.
  • For the Lorenz 63 model with inaccurate gradients, the Bayesian optimizer reaches a lower final objective than the SciPy quasi-Newton optimizer from every tested starting point.
  • The framework remains effective across problem dimensions from 2 to 40 on unimodal landscapes.
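The Lorenz 63 case in the list above can be made concrete with a small script. The objective here, the long-time average of z as a function of the parameter rho, is a hypothetical stand-in because the summary does not state the paper's objective; sigma = 10 and beta = 8/3 are the classical constants. Central finite differences of this average illustrate why gradients of a chaotic system are hard to obtain accurately.

```python
import numpy as np


def lorenz63_rhs(state, sigma, rho, beta):
    """Right-hand side of the Lorenz 63 equations."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])


def time_averaged_z(rho, sigma=10.0, beta=8.0 / 3.0, t_final=50.0, dt=0.01,
                    t_spinup=5.0, state0=(1.0, 1.0, 1.0)):
    """Hypothetical objective: average of z over [t_spinup, t_final], classical RK4."""
    state = np.array(state0, dtype=float)
    n_steps = int(round(t_final / dt))
    n_spin = int(round(t_spinup / dt))
    z_sum, count = 0.0, 0
    for step in range(n_steps):
        k1 = lorenz63_rhs(state, sigma, rho, beta)
        k2 = lorenz63_rhs(state + 0.5 * dt * k1, sigma, rho, beta)
        k3 = lorenz63_rhs(state + 0.5 * dt * k2, sigma, rho, beta)
        k4 = lorenz63_rhs(state + dt * k3, sigma, rho, beta)
        state = state + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        if step >= n_spin:  # discard the transient before averaging
            z_sum += state[2]
            count += 1
    return z_sum / count


if __name__ == "__main__":
    for h in (1e-1, 1e-3, 1e-5):
        slope = (time_averaged_z(28.0 + h) - time_averaged_z(28.0 - h)) / (2.0 * h)
        print(f"h = {h:g}: finite-difference d<z>/drho = {slope:.3f}")
```

Shrinking h tends to make these estimates erratic rather than more accurate, because the perturbed trajectories decorrelate over the averaging window; this is the kind of gradient inaccuracy the Lorenz 63 comparison probes.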

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same subset-selection and probabilistic-trust-region construction may extend to modestly multimodal problems if the trust-region radius is allowed to adapt.
  • Engineering applications that rely on expensive simulations with approximate derivatives could adopt this approach to lower total simulation count.
  • Parallel evaluation of the selected subset points could further reduce wall-clock time without changing the serial convergence behavior.

Load-bearing premise

Selecting a subset of evaluation points combined with a probabilistic trust region produces an acquisition function whose minimization reliably drives local convergence without excessive overhead or failure to escape flat regions on the tested unimodal problems.
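One reading of this premise can be sketched under explicit assumptions: the subset rule (the k evaluated points nearest the incumbent), the acquisition (a lower confidence bound with weight kappa), and the "probabilistic trust region" (the set where the surrogate's posterior variance stays below var_max) are all hypothetical choices, not the paper's definitions. The surrogate is passed in as plain mean and variance callables so the step can be exercised with a toy model.

```python
import numpy as np
from scipy.optimize import minimize


def select_subset(X, f, k):
    """Hypothetical rule: indices of the k evaluated points nearest the incumbent best."""
    X = np.asarray(X, dtype=float)
    best = int(np.argmin(f))
    dist = np.linalg.norm(X - X[best], axis=1)
    return np.argsort(dist, kind="stable")[:k]


def trust_region_step(mean, var, x_best, var_max, kappa=2.0):
    """Minimize a lower confidence bound where the posterior variance is <= var_max.

    Treating the probabilistic trust region as a variance sublevel set is an
    assumption made for this sketch.
    """
    def lcb(x):
        return mean(x) - kappa * np.sqrt(var(x))

    region = {"type": "ineq", "fun": lambda x: var_max - var(x)}  # >= 0 inside the region
    res = minimize(lcb, np.asarray(x_best, dtype=float), method="SLSQP",
                   constraints=[region])
    return res.x


if __name__ == "__main__":
    x_best = np.zeros(2)
    center = np.array([2.0, 2.0])
    # Toy surrogate: mean falls toward `center`, variance grows away from x_best.
    toy_mean = lambda x: float(np.sum((x - center) ** 2))
    toy_var = lambda x: 0.05 * float(np.sum((x - x_best) ** 2))
    print(trust_region_step(toy_mean, toy_var, x_best, var_max=0.1))
```

Tying the region to posterior variance rather than a fixed radius lets it stretch where the surrogate is well informed and shrink where it is not.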

What would settle it

The claim would be undermined if, on the 40-dimensional Rosenbrock function, the Bayesian optimizer needed more than half the function evaluations of the MATLAB and SciPy optimizers to reduce optimality by ten orders of magnitude, or if it stalled at a higher optimality value than the conjugate-gradient and quasi-Newton solvers.
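Checking this criterion means counting evaluations until the optimality measure falls ten orders of magnitude. A minimal sketch for the SciPy baselines, assuming the optimality measure is the infinity norm of the gradient (the norm SciPy's gtol uses for CG and BFGS by default), a start at the origin, and tolerance settings chosen only for illustration; the paper's starting points and the MATLAB runs are not reproduced, and evals_to_reduce_optimality is a hypothetical helper.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der


def evals_to_reduce_optimality(method, n=40, decades=10.0, x0=None, maxiter=20000):
    """Count evaluations until max|grad f| falls `decades` orders below its starting value.

    Returns (evaluations needed, or None if never reached; smallest optimality seen).
    """
    x0 = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    history = []  # optimality measure at every (f, grad f) evaluation

    def fun_and_grad(x):
        g = rosen_der(x)
        history.append(np.max(np.abs(g)))
        return rosen(x), g

    minimize(fun_and_grad, x0, jac=True, method=method,
             options={"gtol": 1e-14, "maxiter": maxiter})
    target = history[0] * 10.0 ** (-decades)
    hit = next((i + 1 for i, opt in enumerate(history) if opt <= target), None)
    return hit, min(history)


if __name__ == "__main__":
    for method in ("CG", "BFGS"):
        print(method, evals_to_reduce_optimality(method))
```

Each call returns both f and its gradient (jac=True), so the count treats one objective-plus-gradient evaluation as one function evaluation.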

Original abstract

The probabilistic surrogates used by Bayesian optimizers make them popular methods when function evaluations are noisy or expensive to evaluate. While Bayesian optimizers are traditionally used for global optimization, their benefits are also valuable for local optimization. In this paper, a framework for gradient-enhanced unconstrained local Bayesian optimization is presented. It involves selecting a subset of the evaluation points to construct the surrogate and using a probabilistic trust region for the minimization of the acquisition function. The Bayesian optimizer is compared to conjugate-gradient and quasi-Newton optimizers from MATLAB and SciPy for unimodal problems with 2 to 40 dimensions. The Bayesian optimizer converges the optimality as deeply as the optimizers used for comparison and often does so using significantly fewer function evaluations. For the minimization of the 40-dimensional Rosenbrock function for example, the Bayesian optimizer requires half as many function evaluations as the MATLAB and SciPy optimizers to reduce the optimality by 10 orders of magnitude. For test cases with noisy gradients, the probabilistic surrogate of the Bayesian optimizer enables it to converge the optimality several additional orders of magnitude relative to the conjugate-gradient and quasi-Newton optimizers. The final test case involves the chaotic Lorenz 63 model and inaccurate gradients. For this problem, the Bayesian optimizer achieves a lower final objective evaluation than the SciPy quasi-Newton optimizer for all initial starting solutions. The results demonstrate that a Bayesian optimizer can be competitive with quasi-Newton and conjugate-gradient optimizers when accurate gradients are available, and significantly outperforms them when the gradients are inaccurate.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper introduces a gradient-enhanced Bayesian optimization framework for unconstrained local minimization that constructs surrogates from a selected subset of evaluation points and minimizes the acquisition function inside a probabilistic trust region. It reports numerical comparisons against MATLAB and SciPy conjugate-gradient and quasi-Newton implementations on unimodal test problems with 2 to 40 dimensions (including the 40-dimensional Rosenbrock function) and on the chaotic Lorenz 63 model, claiming that the Bayesian method reaches comparable or deeper optimality reductions, often with substantially fewer function evaluations, and that it is more robust when gradients are noisy or inaccurate.

Significance. If the reported efficiency gains prove robust, the work would demonstrate that suitably adapted Bayesian methods can serve as practical local optimizers rather than being restricted to global search, particularly when gradient information is available but imperfect. The direct head-to-head comparisons on standard benchmarks against widely used CG and quasi-Newton codes constitute a concrete strength; the paper also supplies numerical evidence in both the noise-free and noisy-gradient regimes.

major comments (3)
  1. [Abstract / Results (40D Rosenbrock)] Abstract and results on 40-dimensional Rosenbrock: the headline claim that the Bayesian optimizer requires “half as many function evaluations” to achieve a 10-order reduction in optimality is presented as a point estimate without reported statistics, multiple random initial conditions, or sensitivity sweeps over subset cardinality and trust-region radius. Because the Rosenbrock valley is narrow, any unreported tuning of these two algorithmic knobs could produce the observed factor-of-two advantage; this directly undermines the general efficiency conclusion.
  2. [Method (subset selection and probabilistic trust region)] Method description of subset selection and probabilistic trust region: the paper does not quantify how the subset cardinality is chosen or how the trust-region radius distribution is parameterized, nor does it provide ablation or robustness checks on these choices for the narrow-valley geometry of the 40D Rosenbrock function. Without such analysis the central performance claim rests on an unverified assumption that the surrogate-plus-trust-region construction reliably drives local convergence.
  3. [Noisy-gradient test cases] Noisy-gradient experiments: while the abstract states that the Bayesian optimizer converges “several additional orders of magnitude” deeper than CG/quasi-Newton when gradients are noisy, the manuscript supplies neither the exact noise model, the number of independent trials, nor error bars, making it impossible to assess whether the reported advantage is statistically reliable or an artifact of a single realization.
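Major comment 3 turns on the noise model. As an illustration only, the sketch below assumes additive, independent Gaussian noise of standard deviation sigma on every gradient component (a hypothetical model, not necessarily the paper's) and reports the exact optimality reached by a SciPy baseline fed those noisy gradients. The helper names make_noisy_gradient and true_optimality_after are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der


def make_noisy_gradient(grad, sigma, seed=0):
    """Hypothetical noise model: exact gradient plus i.i.d. N(0, sigma^2) per component."""
    rng = np.random.default_rng(seed)

    def noisy_grad(x):
        g = grad(x)
        return g + sigma * rng.standard_normal(g.shape)

    return noisy_grad


def true_optimality_after(method, sigma, n=10, seed=0):
    """Run a SciPy baseline on noisy gradients; report max|exact grad f| at its result."""
    res = minimize(rosen, np.zeros(n), jac=make_noisy_gradient(rosen_der, sigma, seed),
                   method=method, options={"gtol": 1e-12, "maxiter": 5000})
    return np.max(np.abs(rosen_der(res.x)))


if __name__ == "__main__":
    for sigma in (0.0, 1e-6, 1e-3):
        print(sigma, true_optimality_after("BFGS", sigma))
```

Under additive noise the measured gradient near the minimizer is dominated by noise of size sigma, so a line-search method can be expected to stall roughly where the true gradient becomes comparable to sigma; repeating the run over several seeds would supply the error bars the referee asks for.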
minor comments (3)
  1. [Abstract] Abstract contains the typo “innacurate” (should be “inaccurate”).
  2. [Abstract and throughout] The phrasing “converges the optimality” is nonstandard; consider “reduces the optimality measure” or “drives the gradient norm / objective gap down.”
  3. [Numerical experiments] The manuscript should state the precise MATLAB and SciPy function calls, tolerances, and line-search parameters used for the baseline optimizers so that the comparison is fully reproducible.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and will revise the manuscript to incorporate additional analysis and clarifications where appropriate.

  1. Referee: [Abstract / Results (40D Rosenbrock)]

    Authors: We agree that the 40D Rosenbrock result is presented as a point estimate from a representative run, which limits the strength of the general efficiency claim given the function's narrow valley. The comparison used a fixed but standard choice of subset cardinality and trust-region parameters. We will revise the abstract and results section to report performance statistics (means and standard deviations) over multiple random initial conditions and include a sensitivity analysis on subset cardinality and trust-region radius to demonstrate that the observed advantage is robust rather than an artifact of specific tuning. revision: yes

  2. Referee: [Method (subset selection and probabilistic trust region)]

    Authors: The method section outlines the subset selection and probabilistic trust-region approach, but we acknowledge that explicit quantification of cardinality selection rules and the radius distribution parameterization, together with ablation studies, would strengthen the presentation. We will expand the method description to provide these details and add robustness checks focused on narrow-valley problems such as 40D Rosenbrock. revision: yes

  3. Referee: [Noisy-gradient test cases]

    Authors: We will revise the noisy-gradient experiments section to specify the exact noise model, state the number of independent trials, and include error bars or other statistical summaries so that the reliability of the reported advantage can be properly assessed. revision: yes
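The multi-start statistics promised in the first response could be gathered along these lines for a baseline optimizer. The sampling box, number of starts, tolerance, and the function name multistart_evaluations are hypothetical; the same protocol would be applied to the Bayesian optimizer for a like-for-like comparison.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der


def multistart_evaluations(method="BFGS", n=40, n_starts=5, half_width=0.5, seed=0):
    """Mean and sample standard deviation of objective evaluations over random starts.

    Starting points are drawn uniformly from [-half_width, half_width]^n, a
    hypothetical sampling box; the paper's actual protocol is not specified.
    """
    rng = np.random.default_rng(seed)
    nfev = np.empty(n_starts)
    for k in range(n_starts):
        x0 = rng.uniform(-half_width, half_width, size=n)
        res = minimize(rosen, x0, jac=rosen_der, method=method, options={"gtol": 1e-8})
        nfev[k] = res.nfev
    return nfev.mean(), nfev.std(ddof=1), nfev


if __name__ == "__main__":
    mean, std, _ = multistart_evaluations()
    print(f"BFGS evaluations: {mean:.1f} +/- {std:.1f}")
```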

Circularity Check

0 steps flagged

No circularity; claims rest on external numerical benchmarks

full rationale

The paper introduces a gradient-enhanced Bayesian optimizer using subset selection for the surrogate and a probabilistic trust region for acquisition minimization. All performance claims (e.g., half the evaluations on 40D Rosenbrock to reach 10-order optimality reduction, better behavior under noisy gradients) are supported exclusively by direct comparisons against independent MATLAB and SciPy CG/quasi-Newton implementations on standard test problems. No derivation chain, equation, or self-citation reduces any result to its own inputs by construction. No fitted parameters are relabeled as predictions, no uniqueness theorems are imported from the authors' prior work, and no ansatz is smuggled via citation. The method is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on standard assumptions about Gaussian process surrogates incorporating gradients and the effectiveness of probabilistic trust regions for local acquisition minimization; no new free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)
  • domain assumption A Gaussian process surrogate can accurately incorporate gradient information to model the objective for local search.
    This is the core modeling choice enabling the gradient-enhanced Bayesian optimizer.
  • domain assumption A probabilistic trust region can be used to constrain and guide minimization of the acquisition function without missing local improvements.
    This choice is central to making the method suitable for local rather than global optimization.

pith-pipeline@v0.9.0 · 5814 in / 1331 out tokens · 68565 ms · 2026-05-22T20:11:58.583867+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. A Framework for Nonlinearly-Constrained Gradient-Enhanced Local Bayesian Optimization with Comparisons to Quasi-Newton Optimizers

    math.OC 2025-05 conditional novelty 6.0

    Two frameworks for nonlinear equality constraints in gradient-enhanced local Bayesian optimization achieve deeper convergence with fewer function evaluations than previous constrained BO methods and SciPy/MATLAB quasi...
