BUP-TR: Bayesian Underdetermined Projection Trust-Region Methods for Derivative-Free Optimization

Li Zhang; Pengcheng Xie; Wei Hu; Ya-Xiang Yuan

arxiv: 2605.30841 · v2 · pith:G4FCMQLHnew · submitted 2026-05-29 · 🧮 math.OC

BUP-TR: Bayesian Underdetermined Projection Trust-Region Methods for Derivative-Free Optimization

Wei Hu , Pengcheng Xie , Ya-Xiang Yuan , Li Zhang This is my paper

Pith reviewed 2026-06-28 21:44 UTC · model grok-4.3

classification 🧮 math.OC

keywords derivative-free optimizationtrust-region methodsunderdetermined interpolationBayesian projectionMAP-poisednessglobal convergencemodel-based optimizationcomplexity analysis

0 comments

The pith

BUP-TR selects underdetermined quadratic models by projecting a prior in the precision norm, making hard-MAP models fully linear and delivering global first-order convergence at O(ε^{-2}) cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops BUP-TR for derivative-free trust-region optimization. It replaces classical fixed-norm model selection with a projection of a prior quadratic onto the affine interpolation set, where the projection is taken in the precision norm induced by the prior itself. The same precision matrix supplies both a spectral geometry certificate called MAP-poisedness and an explicit repair mechanism for the interpolation set. Under standard smoothness, uniform precision bounds, MAP-poisedness, and a trust-region-scale prior-accuracy condition, the resulting hard-MAP models satisfy the fully-linear property. Consequently the algorithm converges globally to first-order stationary points while counting all geometry-repair evaluations, and a NEWUOA-style implementation improves fixed-budget performance on the reported test suite at moderate and stringent accuracy targets.

Core claim

BUP-TR completes the affine family of interpolating quadratics by projecting a prior quadratic onto the interpolation set in the prior's precision norm. The identical precision matrix defines MAP-poisedness, a spectral certificate for the interpolation set, and a repair procedure. Under the stated smoothness, uniform-precision, MAP-poisedness, and trust-region-scale prior-accuracy assumptions, the hard-MAP models are fully linear; the trust-region method therefore attains global first-order convergence together with an O(ε^{-2}) evaluation complexity bound that includes geometry repairs.

What carries the argument

Bayesian underdetermined projection (BUP) of the prior quadratic onto the interpolation set measured in the prior precision norm; MAP-poisedness, the spectral condition on the interpolation set induced by the same matrix.

If this is right

Global first-order convergence holds for the sequence of trust-region iterates.
Total function evaluations, including geometry-repair steps, scale as O(ε^{-2}).
The computational structure of a Powell-type trust-region method is retained while model selection is altered.
Fixed-budget performance improves relative to the baseline at both moderate and stringent accuracy targets on the reported benchmark suite.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The precision-norm geometry certificate may transfer directly to other model-based derivative-free frameworks that already maintain interpolation sets.
MAP-poisedness supplies an a-priori diagnostic that could be inserted into existing solvers without changing their overall iteration structure.
The Bayesian projection view suggests a route for incorporating posterior uncertainty estimates into the choice of trust-region radius or model degree.

Load-bearing premise

The trust-region-scale prior-accuracy condition together with uniform precision bounds and MAP-poisedness of the interpolation set must hold so that the hard-MAP models become fully linear.

What would settle it

A sequence of interpolation sets that satisfy MAP-poisedness and uniform precision bounds yet violate the trust-region-scale prior-accuracy condition at some radii, for which the model gradient error fails to remain O(Δ) and the trust-region iterates do not drive the gradient norm below a prescribed ε.

Figures

Figures reproduced from arXiv: 2605.30841 by Li Zhang, Pengcheng Xie, Wei Hu, Ya-Xiang Yuan.

**Figure 2.** Figure 2: One iteration of BUP-TR. The prior source provides ( [PITH_FULL_IMAGE:figures/full_fig_p018_2.png] view at source ↗

**Figure 3.** Figure 3: Pairwise comparison of median final relative error on each problem–dimension pair. [PITH_FULL_IMAGE:figures/full_fig_p027_3.png] view at source ↗

**Figure 4.** Figure 4: Dolan–Mor´e performance profiles at four accuracy levels. Each panel shows the fraction [PITH_FULL_IMAGE:figures/full_fig_p028_4.png] view at source ↗

**Figure 5.** Figure 5: More–Wild data profiles. Each panel shows the fraction of test instances solved as [PITH_FULL_IMAGE:figures/full_fig_p029_5.png] view at source ↗

**Figure 6.** Figure 6: Convergence overview at the fixed dimension [PITH_FULL_IMAGE:figures/full_fig_p030_6.png] view at source ↗

**Figure 7.** Figure 7: Cascade-ablation profiles across evaluation budgets. Each panel shows the fraction of [PITH_FULL_IMAGE:figures/full_fig_p031_7.png] view at source ↗

**Figure 8.** Figure 8: Noisy benchmark: success rate at τ = 10−3 under increasing noise. 0 10 −5 10 −4 10 −3 10 −2 10 −1 σ 0 20 40 60 80 100 Success rate at τ = 10 −3 (\%) n = 10 0 10 −5 10 −4 10 −3 10 −2 10 −1 σ n = 20 0 10 −5 10 −4 10 −3 10 −2 10 −1 σ n = 30 BUP-NEWUOA NEWUOA UOBYQA CMA-ES [PITH_FULL_IMAGE:figures/full_fig_p033_8.png] view at source ↗

**Figure 9.** Figure 9: Success rate at τ = 10−3 under noise, stratified by dimension. Across the tested dimensions, BUP-NEWUOA remains the highest-performing quadratic-interpolation solver in this comparison. 33 [PITH_FULL_IMAGE:figures/full_fig_p033_9.png] view at source ↗

read the original abstract

Underdetermined quadratic interpolation is a central model-construction tool in model-based derivative-free trust-region methods: it limits sampling costs but leaves an affine family of interpolating quadratics. Classical solvers select one element of this family by prescribing a fixed norm or model-change measure, such as the least-Frobenius-change Hessian update in Powell-type methods. We introduce BUP-TR (Bayesian Underdetermined Projection Trust-Region), which instead completes the model by projecting a prior quadratic onto the affine interpolation set in the precision norm supplied by the prior. The same precision matrix defines a spectral geometry certificate, MAP-poisedness, and a repair mechanism for interpolation sets. Under standard smoothness assumptions, uniform precision bounds, MAP-poisedness, and a trust-region-scale prior-accuracy condition, the hard-MAP models are fully linear. Consequently, BUP-TR attains global first-order convergence and O(epsilon^{-2}) evaluation complexity, with geometry-repair evaluations included. A NEWUOA-style implementation, BUP-NEWUOA, improves fixed-budget performance on the reported benchmark suite at moderate and stringent accuracy targets while retaining the computational structure of a Powell-type trust-region method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

BUP-TR gives a clean Bayesian projection rule for completing underdetermined quadratics in DFO trust-region methods and the convergence theory follows once the listed conditions hold.

read the letter

The paper's central move is to project a prior quadratic onto the affine space of interpolants using the prior precision norm instead of a fixed-norm or least-Frobenius rule. This produces the hard-MAP model, defines MAP-poisedness from the same matrix, and supplies a spectral repair mechanism. Under the four conditions listed in the abstract the models are fully linear, so the standard first-order global convergence and O(ε^{-2}) complexity carry over with repair cost included.

The approach is new in the subfield. Classical Powell-type updates minimize a change measure; here the selection is a projection in the prior metric, and the geometry certificate is derived directly from that metric. The NEWUOA-style implementation keeps the same trust-region loop and shows better fixed-budget results on the reported test set at moderate and tight tolerances.

The main soft spot is practical rather than theoretical. The trust-region-scale prior-accuracy condition and uniform precision bounds must be maintained; if the prior is misspecified these can force extra repairs. The abstract does not spell out the exact constants in the fully-linear proof, but the logical structure matches classical DFO arguments once MAP-poisedness is granted, so there is no internal contradiction.

The work is aimed at people already working on model-based derivative-free solvers who want to bring prior information into the model-selection step. The citation pattern is appropriate and the claims are reproducible in principle from the stated assumptions.

I would send it to a serious referee. The idea is distinct, the analysis is consistent with existing theory, and the implementation results are worth checking in detail.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces BUP-TR, a derivative-free trust-region optimization method that completes underdetermined quadratic interpolation models via Bayesian projection of a prior quadratic onto the affine interpolation set in the prior precision norm. It defines MAP-poisedness, a spectral geometry certificate, and a repair mechanism from the same precision matrix. Under standard smoothness assumptions, uniform precision bounds, MAP-poisedness, and a trust-region-scale prior-accuracy condition, the hard-MAP models are fully linear, yielding global first-order convergence and O(ε^{-2}) evaluation complexity with geometry-repair costs included. A NEWUOA-style implementation (BUP-NEWUOA) is reported to improve fixed-budget performance on benchmarks at moderate and stringent accuracy targets.

Significance. If the fully-linear property holds, the work supplies a principled Bayesian mechanism for selecting models within the underdetermined family, together with an integrated spectral certificate and repair procedure whose costs are folded into the complexity bound. This extends classical Powell-type trust-region methods while preserving their computational structure. The explicit inclusion of repair evaluations in the O(ε^{-2}) bound is a constructive feature.

major comments (2)

[Abstract / main convergence theorem] The central convergence and complexity claims rest on the assertion that hard-MAP models are fully linear under the four listed conditions (standard smoothness, uniform precision bounds, MAP-poisedness, trust-region-scale prior-accuracy). The derivation steps establishing the fully-linear error bounds, the precise role of the trust-region-scale prior-accuracy condition, and the accounting for geometry-repair evaluations are not visible; this verification is load-bearing for the main theorem.
[Definitions of MAP-poisedness and geometry certificate] MAP-poisedness, the spectral geometry certificate, and the repair mechanism are all defined from the precision matrix that also defines the projection. While external smoothness and prior-accuracy conditions are invoked for convergence, the manuscript must explicitly confirm that the geometry control remains non-circular once these conditions are imposed.

minor comments (1)

[Abstract] Benchmark claims in the abstract are stated without reference to the specific test suite, number of runs, or statistical measures; these details should be supplied in the numerical section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful review and for highlighting the load-bearing aspects of the convergence analysis. We address the two major comments point by point below and will revise the manuscript to improve clarity and explicitness.

read point-by-point responses

Referee: [Abstract / main convergence theorem] The central convergence and complexity claims rest on the assertion that hard-MAP models are fully linear under the four listed conditions (standard smoothness, uniform precision bounds, MAP-poisedness, trust-region-scale prior-accuracy). The derivation steps establishing the fully-linear error bounds, the precise role of the trust-region-scale prior-accuracy condition, and the accounting for geometry-repair evaluations are not visible; this verification is load-bearing for the main theorem.

Authors: We agree that the step-by-step derivation of the fully-linear error bounds, the precise contribution of the trust-region-scale prior-accuracy condition, and the explicit inclusion of geometry-repair evaluations in the complexity count should be presented more visibly. In the revised manuscript we will expand the proof of the main convergence theorem (currently Theorem 5.1) with an additional lemma that isolates the model-error bound under the four stated conditions, followed by a short paragraph that traces how repair evaluations are charged against the O(ε^{-2}) budget. These additions will make the verification self-contained without altering the theorem statement. revision: yes
Referee: [Definitions of MAP-poisedness and geometry certificate] MAP-poisedness, the spectral geometry certificate, and the repair mechanism are all defined from the precision matrix that also defines the projection. While external smoothness and prior-accuracy conditions are invoked for convergence, the manuscript must explicitly confirm that the geometry control remains non-circular once these conditions are imposed.

Authors: We accept that an explicit non-circularity statement is required. MAP-poisedness, the spectral certificate, and the repair procedure are defined exclusively from the precision matrix and the interpolation set; the smoothness and prior-accuracy assumptions appear only later, in the error-bound lemma that establishes full linearity. In the revision we will insert a short remark immediately after the definition of MAP-poisedness (Section 3) stating that these geometric objects are independent of the external analytic assumptions and that the latter are used solely to convert the geometric certificate into a model-error bound. This addition removes any appearance of circularity. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation adapts classical DFO fully-linear arguments to new model choice

full rationale

The paper defines the hard-MAP model by projection onto the interpolation affine space in the prior precision norm, then defines MAP-poisedness, the spectral certificate, and the repair rule from the same matrix. Under the additional external conditions of smoothness, uniform precision bounds, MAP-poisedness, and trust-region-scale prior accuracy, it claims the models are fully linear and therefore the method inherits the standard global first-order convergence and O(ε^{-2}) complexity. This structure mirrors classical trust-region DFO proofs (poisedness implies fully linear error bounds) without reducing any load-bearing step to a self-definition, fitted input renamed as prediction, or self-citation chain. The geometry definitions are new but the implication to fully linear is derived from the poisedness condition exactly as in the literature; no equation or claim collapses to its own input by construction.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 2 invented entities

Only the abstract is available; the ledger is therefore limited to items explicitly named there. The prior precision matrix and the prior-accuracy condition function as central modeling choices whose concrete form is not specified.

free parameters (1)

prior precision matrix
Supplies the norm for projection and the geometry certificate; its concrete construction is not detailed in the abstract.

axioms (2)

domain assumption standard smoothness assumptions
Invoked to obtain the fully-linear property and first-order convergence.
ad hoc to paper trust-region-scale prior-accuracy condition
Required for hard-MAP models to be fully linear; stated as an assumption in the abstract.

invented entities (2)

MAP-poisedness no independent evidence
purpose: Spectral geometry certificate and repair mechanism for interpolation sets
Defined from the prior precision matrix; no independent evidence supplied in the abstract.
hard-MAP models no independent evidence
purpose: Selected quadratic models that are claimed to be fully linear
Result of the Bayesian projection; independent evidence not visible in abstract.

pith-pipeline@v0.9.1-grok · 5745 in / 1581 out tokens · 29713 ms · 2026-06-28T21:44:23.709697+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

29 extracted references · 4 canonical work pages

[1]

A. R. Conn, K. Scheinberg, and L. N. Vicente.Introduction to Derivative-Free Optimization. MOS–SIAM Series on Optimization. SIAM, Philadelphia, PA, 2009

2009
[2]

M. J. D. Powell. UOBYQA: unconstrained optimization by quadratic approximation. Mathematical Programming, 92(3):555–582, 2002

2002
[3]

M. J. D. Powell. The NEWUOA software for unconstrained optimization without deriva- tives. In G. Di Pillo and M. Roma (eds.),Large-Scale Nonlinear Optimization, Nonconvex Optimization and Its Applications, pages 255–297. Springer, 2006. 34

2006
[4]

M. J. D. Powell. The BOBYQA algorithm for bound constrained optimization without derivatives. Technical Report DAMTP 2009/NA06, University of Cambridge, 2009

2009
[5]

T. M. Ragonneau and Z. Zhang. PDFO: a cross-platform package for Powell’s derivative-free optimization solvers.Mathematical Programming Computation, 16:535–559, 2024

2024
[6]

Z. Zhang. PRIMA: Reference Implementation for Powell’s Methods with Modernization and Amelioration. Software package, available at libprima.net, doi:10.5281/zenodo.8052654, 2023

work page doi:10.5281/zenodo.8052654 2023
[7]

Z. Zhang. Scalable derivative-free optimization algorithms with low-dimensional subspace techniques.arXiv preprint arXiv:2501.04536, 2025

arXiv 2025
[8]

Xie and Y.-X

P. Xie and Y.-X. Yuan. A derivative-free method using a new underdetermined quadratic interpolation model.SIAM Journal on Optimization, 35(2):1110–1133, 2025, doi:10.1137/23M1582023

work page doi:10.1137/23m1582023 2025
[9]

Xie and Y.-X

P. Xie and Y.-X. Yuan. Least H2-norm updating of quadratic interpolation models for derivative-free trust-region algorithms.IMA Journal of Numerical Analysis, 46(1):21–50, 2026, doi:10.1093/imanum/drae106

work page doi:10.1093/imanum/drae106 2026
[10]

Xie and S

P. Xie and S. M. Wild. ReMU: regional minimal updating for model-based derivative- free optimization.Optimization Methods and Software, published online, 2026, doi:10.1080/10556788.2026.2660368

work page doi:10.1080/10556788.2026.2660368 2026
[11]

C. E. Rasmussen and C. K. I. Williams.Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006

2006
[12]

A derivative-free optimization algorithm combining line-search and trust-region techniques,

P. Xie and Y.-x. Yuan, “A derivative-free optimization algorithm combining line-search and trust-region techniques,”Chinese Annals of Mathematics, Series B, vol. 44, no. 5, pp. 693–708, 2023

2023
[13]

Snoek, H

J. Snoek, H. Larochelle, and R. P. Adams. Practical Bayesian optimization of machine learning algorithms. InAdvances in Neural Information Processing Systems, pages 2951– 2959, 2012

2012
[14]

Shahriari, K

B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. de Freitas. Taking the human out of the loop: A review of Bayesian optimization.Proceedings of the IEEE, 104(1):148–175, 2016

2016
[15]

A. S. Bandeira, K. Scheinberg, and L. N. Vicente. Convergence of trust-region methods based on probabilistic models.SIAM Journal on Optimization, 24(3):1238–1264, 2014

2014
[16]

E. D. Dolan and J. J. Mor´ e. Benchmarking optimization software with performance profiles. Mathematical Programming, 91(2):201–213, 2002

2002
[17]

J. J. Mor´ e and S. M. Wild. Benchmarking derivative-free optimization algorithms.SIAM Journal on Optimization, 20(1):172–191, 2009

2009
[18]

S. M. Wild.MNH: A Derivative-Free Optimization Algorithm Using Minimum Norm Hessians. Ph.D. thesis, Cornell University, 2008

2008
[19]

Cartis, J

C. Cartis, J. Fiala, B. Marber, and L. Roberts. Improving the flexibility and robustness of model-based derivative-free optimization solvers.ACM Transactions on Mathematical Software, 45(3):1–41, 2019

2019
[20]

P. I. Frazier. A tutorial on Bayesian optimization.arXiv preprint arXiv:1807.02811, 2018. 35

Pith/arXiv arXiv 2018
[21]

N. I. M. Gould, D. Orban, and Ph. L. Toint. CUTEst: a constrained and unconstrained testing environment with safe threads for mathematical optimization.Computational Optimization and Applications, 60(3):545–557, 2015

2015
[22]

J. A. Nelder and R. Mead. A simplex method for function minimization.The Computer Journal, 7(4):308–313, 1965

1965
[23]

Hansen and A

N. Hansen and A. Ostermeier. Completely derandomized self-adaptation in evolution strategies.Evolutionary Computation, 9(2):159–195, 2001. Technical Appendices The appendices are split into two parts. Sections A–H collect the technical derivations that support the main theory and algorithm, including coefficient scaling, closed-form MAP projections, fallb...

2001
[24]

LetH=H ⊤andv= vech(H)

Hence, sup ∥u∥2≤1 ∥ˆϕ(u)∥2≤ √ 5 2.(93) From vech to operator norm (symmetric case). LetH=H ⊤andv= vech(H). Then ∥H∥2≤∥H∥F≤ √ 2∥v∥2.(94) Proof. Write ∥H∥2 F = ∑ iH2 ii + 2 ∑ i<jH2 ij≤2 (∑ iH2 ii + ∑ i<jH2 ij ) = 2∥v∥2
[25]

Then ∥H∥2≤ ∥H∥F≤ √ 2∥v∥2. □ B Weighted Projection Formula for Hard-MAP Completion This appendix derives the closed form of the hard-MAP estimator and proves the projector properties used in Lemma 4.12 (nonexpansiveness in the ˆW-norm). B.1 KKT derivation of the projection formula Fix k and suppress the index. Let ˆA∈R(m+1)×qand b∈Rm+1. Given ˆcπ∈Rq and ˆW...
[26]

Taking square roots yields (123)

∆ 4 k. Taking square roots yields (123). RemarkE.2 (Role of the precision metric).Because ¯wH can be much smaller than wmax when the GP Hessian posterior is uncertain (diagonal entries of WH,k near wmin), the constant ¯κπis generically tighter than√wmax √ a2 1 +a 2 2, which would result from a Euclidean-norm assumption under identical surrogate accuracy. ...
[27]

By Lemma 4.19 (since ∥∇f(xk)∥2 ≥εand ∆ k ≤∆ crit(ε)), the criticality loop exits immediately without further reducing ∆ k
[28]

Conversely, if ∆k > ¯∆ εand the iteration is unsuccessful, then ∆ k+1 =γdec∆ k >γdec ¯∆ ε≥∆ε

By Lemma 4.18 (since ∆ k≤∆succ(ε)), the trust-region step is successful, so ∆k+1≥∆k. Conversely, if ∆k > ¯∆ εand the iteration is unsuccessful, then ∆ k+1 =γdec∆ k >γdec ¯∆ ε≥∆ε. In all cases ∆ k+1≥∆ε; induction onkproves the stated lower bound. By the criticality safeguard (Assumption 4.7), every iteration that proceeds to compute a step satisfies∥gk∥2>κ...
[29]

f⋆= varies

Once ∆ k≤¯∆ ε, Lemma 4.19 prevents further criticality shrinkage and Lemma 4.18 forces a successful step, contradicting the assumption. Therefore, f decreases by a fixed positive amount infinitely often, contradicting the lower boundf≥finf. G.4 Proof of Theorem 4.21 Proof. Fix ε∈(0, 1]. By Theorem 4.20, such indices exist for every ε∈(0, 1]; let K be the ...

[1] [1]

A. R. Conn, K. Scheinberg, and L. N. Vicente.Introduction to Derivative-Free Optimization. MOS–SIAM Series on Optimization. SIAM, Philadelphia, PA, 2009

2009

[2] [2]

M. J. D. Powell. UOBYQA: unconstrained optimization by quadratic approximation. Mathematical Programming, 92(3):555–582, 2002

2002

[3] [3]

M. J. D. Powell. The NEWUOA software for unconstrained optimization without deriva- tives. In G. Di Pillo and M. Roma (eds.),Large-Scale Nonlinear Optimization, Nonconvex Optimization and Its Applications, pages 255–297. Springer, 2006. 34

2006

[4] [4]

M. J. D. Powell. The BOBYQA algorithm for bound constrained optimization without derivatives. Technical Report DAMTP 2009/NA06, University of Cambridge, 2009

2009

[5] [5]

T. M. Ragonneau and Z. Zhang. PDFO: a cross-platform package for Powell’s derivative-free optimization solvers.Mathematical Programming Computation, 16:535–559, 2024

2024

[6] [6]

Z. Zhang. PRIMA: Reference Implementation for Powell’s Methods with Modernization and Amelioration. Software package, available at libprima.net, doi:10.5281/zenodo.8052654, 2023

work page doi:10.5281/zenodo.8052654 2023

[7] [7]

Z. Zhang. Scalable derivative-free optimization algorithms with low-dimensional subspace techniques.arXiv preprint arXiv:2501.04536, 2025

arXiv 2025

[8] [8]

Xie and Y.-X

P. Xie and Y.-X. Yuan. A derivative-free method using a new underdetermined quadratic interpolation model.SIAM Journal on Optimization, 35(2):1110–1133, 2025, doi:10.1137/23M1582023

work page doi:10.1137/23m1582023 2025

[9] [9]

Xie and Y.-X

P. Xie and Y.-X. Yuan. Least H2-norm updating of quadratic interpolation models for derivative-free trust-region algorithms.IMA Journal of Numerical Analysis, 46(1):21–50, 2026, doi:10.1093/imanum/drae106

work page doi:10.1093/imanum/drae106 2026

[10] [10]

Xie and S

P. Xie and S. M. Wild. ReMU: regional minimal updating for model-based derivative- free optimization.Optimization Methods and Software, published online, 2026, doi:10.1080/10556788.2026.2660368

work page doi:10.1080/10556788.2026.2660368 2026

[11] [11]

C. E. Rasmussen and C. K. I. Williams.Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006

2006

[12] [12]

A derivative-free optimization algorithm combining line-search and trust-region techniques,

P. Xie and Y.-x. Yuan, “A derivative-free optimization algorithm combining line-search and trust-region techniques,”Chinese Annals of Mathematics, Series B, vol. 44, no. 5, pp. 693–708, 2023

2023

[13] [13]

Snoek, H

J. Snoek, H. Larochelle, and R. P. Adams. Practical Bayesian optimization of machine learning algorithms. InAdvances in Neural Information Processing Systems, pages 2951– 2959, 2012

2012

[14] [14]

Shahriari, K

B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. de Freitas. Taking the human out of the loop: A review of Bayesian optimization.Proceedings of the IEEE, 104(1):148–175, 2016

2016

[15] [15]

A. S. Bandeira, K. Scheinberg, and L. N. Vicente. Convergence of trust-region methods based on probabilistic models.SIAM Journal on Optimization, 24(3):1238–1264, 2014

2014

[16] [16]

E. D. Dolan and J. J. Mor´ e. Benchmarking optimization software with performance profiles. Mathematical Programming, 91(2):201–213, 2002

2002

[17] [17]

J. J. Mor´ e and S. M. Wild. Benchmarking derivative-free optimization algorithms.SIAM Journal on Optimization, 20(1):172–191, 2009

2009

[18] [18]

S. M. Wild.MNH: A Derivative-Free Optimization Algorithm Using Minimum Norm Hessians. Ph.D. thesis, Cornell University, 2008

2008

[19] [19]

Cartis, J

C. Cartis, J. Fiala, B. Marber, and L. Roberts. Improving the flexibility and robustness of model-based derivative-free optimization solvers.ACM Transactions on Mathematical Software, 45(3):1–41, 2019

2019

[20] [20]

P. I. Frazier. A tutorial on Bayesian optimization.arXiv preprint arXiv:1807.02811, 2018. 35

Pith/arXiv arXiv 2018

[21] [21]

N. I. M. Gould, D. Orban, and Ph. L. Toint. CUTEst: a constrained and unconstrained testing environment with safe threads for mathematical optimization.Computational Optimization and Applications, 60(3):545–557, 2015

2015

[22] [22]

J. A. Nelder and R. Mead. A simplex method for function minimization.The Computer Journal, 7(4):308–313, 1965

1965

[23] [23]

Hansen and A

N. Hansen and A. Ostermeier. Completely derandomized self-adaptation in evolution strategies.Evolutionary Computation, 9(2):159–195, 2001. Technical Appendices The appendices are split into two parts. Sections A–H collect the technical derivations that support the main theory and algorithm, including coefficient scaling, closed-form MAP projections, fallb...

2001

[24] [24]

LetH=H ⊤andv= vech(H)

Hence, sup ∥u∥2≤1 ∥ˆϕ(u)∥2≤ √ 5 2.(93) From vech to operator norm (symmetric case). LetH=H ⊤andv= vech(H). Then ∥H∥2≤∥H∥F≤ √ 2∥v∥2.(94) Proof. Write ∥H∥2 F = ∑ iH2 ii + 2 ∑ i<jH2 ij≤2 (∑ iH2 ii + ∑ i<jH2 ij ) = 2∥v∥2

[25] [25]

Then ∥H∥2≤ ∥H∥F≤ √ 2∥v∥2. □ B Weighted Projection Formula for Hard-MAP Completion This appendix derives the closed form of the hard-MAP estimator and proves the projector properties used in Lemma 4.12 (nonexpansiveness in the ˆW-norm). B.1 KKT derivation of the projection formula Fix k and suppress the index. Let ˆA∈R(m+1)×qand b∈Rm+1. Given ˆcπ∈Rq and ˆW...

[26] [26]

Taking square roots yields (123)

∆ 4 k. Taking square roots yields (123). RemarkE.2 (Role of the precision metric).Because ¯wH can be much smaller than wmax when the GP Hessian posterior is uncertain (diagonal entries of WH,k near wmin), the constant ¯κπis generically tighter than√wmax √ a2 1 +a 2 2, which would result from a Euclidean-norm assumption under identical surrogate accuracy. ...

[27] [27]

By Lemma 4.19 (since ∥∇f(xk)∥2 ≥εand ∆ k ≤∆ crit(ε)), the criticality loop exits immediately without further reducing ∆ k

[28] [28]

Conversely, if ∆k > ¯∆ εand the iteration is unsuccessful, then ∆ k+1 =γdec∆ k >γdec ¯∆ ε≥∆ε

By Lemma 4.18 (since ∆ k≤∆succ(ε)), the trust-region step is successful, so ∆k+1≥∆k. Conversely, if ∆k > ¯∆ εand the iteration is unsuccessful, then ∆ k+1 =γdec∆ k >γdec ¯∆ ε≥∆ε. In all cases ∆ k+1≥∆ε; induction onkproves the stated lower bound. By the criticality safeguard (Assumption 4.7), every iteration that proceeds to compute a step satisfies∥gk∥2>κ...

[29] [29]

f⋆= varies

Once ∆ k≤¯∆ ε, Lemma 4.19 prevents further criticality shrinkage and Lemma 4.18 forces a successful step, contradicting the assumption. Therefore, f decreases by a fixed positive amount infinitely often, contradicting the lower boundf≥finf. G.4 Proof of Theorem 4.21 Proof. Fix ε∈(0, 1]. By Theorem 4.20, such indices exist for every ε∈(0, 1]; let K be the ...