BUP-TR: Bayesian Underdetermined Projection Trust-Region Methods for Derivative-Free Optimization
Pith reviewed 2026-06-28 21:44 UTC · model grok-4.3
The pith
BUP-TR selects underdetermined quadratic models by projecting a prior in the precision norm, making hard-MAP models fully linear and delivering global first-order convergence at O(ε^{-2}) cost.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BUP-TR completes the affine family of interpolating quadratics by projecting a prior quadratic onto the interpolation set in the prior's precision norm. The identical precision matrix defines MAP-poisedness, a spectral certificate for the interpolation set, and a repair procedure. Under the stated smoothness, uniform-precision, MAP-poisedness, and trust-region-scale prior-accuracy assumptions, the hard-MAP models are fully linear; the trust-region method therefore attains global first-order convergence together with an O(ε^{-2}) evaluation complexity bound that includes geometry repairs.
What carries the argument
Bayesian underdetermined projection (BUP) of the prior quadratic onto the interpolation set measured in the prior precision norm; MAP-poisedness, the spectral condition on the interpolation set induced by the same matrix.
If this is right
- Global first-order convergence holds for the sequence of trust-region iterates.
- Total function evaluations, including geometry-repair steps, scale as O(ε^{-2}).
- The computational structure of a Powell-type trust-region method is retained while model selection is altered.
- Fixed-budget performance improves relative to the baseline at both moderate and stringent accuracy targets on the reported benchmark suite.
Where Pith is reading between the lines
- The precision-norm geometry certificate may transfer directly to other model-based derivative-free frameworks that already maintain interpolation sets.
- MAP-poisedness supplies an a-priori diagnostic that could be inserted into existing solvers without changing their overall iteration structure.
- The Bayesian projection view suggests a route for incorporating posterior uncertainty estimates into the choice of trust-region radius or model degree.
Load-bearing premise
The trust-region-scale prior-accuracy condition together with uniform precision bounds and MAP-poisedness of the interpolation set must hold so that the hard-MAP models become fully linear.
What would settle it
A sequence of interpolation sets that satisfy MAP-poisedness and uniform precision bounds yet violate the trust-region-scale prior-accuracy condition at some radii, for which the model gradient error fails to remain O(Δ) and the trust-region iterates do not drive the gradient norm below a prescribed ε.
Figures
read the original abstract
Underdetermined quadratic interpolation is a central model-construction tool in model-based derivative-free trust-region methods: it limits sampling costs but leaves an affine family of interpolating quadratics. Classical solvers select one element of this family by prescribing a fixed norm or model-change measure, such as the least-Frobenius-change Hessian update in Powell-type methods. We introduce BUP-TR (Bayesian Underdetermined Projection Trust-Region), which instead completes the model by projecting a prior quadratic onto the affine interpolation set in the precision norm supplied by the prior. The same precision matrix defines a spectral geometry certificate, MAP-poisedness, and a repair mechanism for interpolation sets. Under standard smoothness assumptions, uniform precision bounds, MAP-poisedness, and a trust-region-scale prior-accuracy condition, the hard-MAP models are fully linear. Consequently, BUP-TR attains global first-order convergence and O(epsilon^{-2}) evaluation complexity, with geometry-repair evaluations included. A NEWUOA-style implementation, BUP-NEWUOA, improves fixed-budget performance on the reported benchmark suite at moderate and stringent accuracy targets while retaining the computational structure of a Powell-type trust-region method.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces BUP-TR, a derivative-free trust-region optimization method that completes underdetermined quadratic interpolation models via Bayesian projection of a prior quadratic onto the affine interpolation set in the prior precision norm. It defines MAP-poisedness, a spectral geometry certificate, and a repair mechanism from the same precision matrix. Under standard smoothness assumptions, uniform precision bounds, MAP-poisedness, and a trust-region-scale prior-accuracy condition, the hard-MAP models are fully linear, yielding global first-order convergence and O(ε^{-2}) evaluation complexity with geometry-repair costs included. A NEWUOA-style implementation (BUP-NEWUOA) is reported to improve fixed-budget performance on benchmarks at moderate and stringent accuracy targets.
Significance. If the fully-linear property holds, the work supplies a principled Bayesian mechanism for selecting models within the underdetermined family, together with an integrated spectral certificate and repair procedure whose costs are folded into the complexity bound. This extends classical Powell-type trust-region methods while preserving their computational structure. The explicit inclusion of repair evaluations in the O(ε^{-2}) bound is a constructive feature.
major comments (2)
- [Abstract / main convergence theorem] The central convergence and complexity claims rest on the assertion that hard-MAP models are fully linear under the four listed conditions (standard smoothness, uniform precision bounds, MAP-poisedness, trust-region-scale prior-accuracy). The derivation steps establishing the fully-linear error bounds, the precise role of the trust-region-scale prior-accuracy condition, and the accounting for geometry-repair evaluations are not visible; this verification is load-bearing for the main theorem.
- [Definitions of MAP-poisedness and geometry certificate] MAP-poisedness, the spectral geometry certificate, and the repair mechanism are all defined from the precision matrix that also defines the projection. While external smoothness and prior-accuracy conditions are invoked for convergence, the manuscript must explicitly confirm that the geometry control remains non-circular once these conditions are imposed.
minor comments (1)
- [Abstract] Benchmark claims in the abstract are stated without reference to the specific test suite, number of runs, or statistical measures; these details should be supplied in the numerical section.
Simulated Author's Rebuttal
We thank the referee for the thoughtful review and for highlighting the load-bearing aspects of the convergence analysis. We address the two major comments point by point below and will revise the manuscript to improve clarity and explicitness.
read point-by-point responses
-
Referee: [Abstract / main convergence theorem] The central convergence and complexity claims rest on the assertion that hard-MAP models are fully linear under the four listed conditions (standard smoothness, uniform precision bounds, MAP-poisedness, trust-region-scale prior-accuracy). The derivation steps establishing the fully-linear error bounds, the precise role of the trust-region-scale prior-accuracy condition, and the accounting for geometry-repair evaluations are not visible; this verification is load-bearing for the main theorem.
Authors: We agree that the step-by-step derivation of the fully-linear error bounds, the precise contribution of the trust-region-scale prior-accuracy condition, and the explicit inclusion of geometry-repair evaluations in the complexity count should be presented more visibly. In the revised manuscript we will expand the proof of the main convergence theorem (currently Theorem 5.1) with an additional lemma that isolates the model-error bound under the four stated conditions, followed by a short paragraph that traces how repair evaluations are charged against the O(ε^{-2}) budget. These additions will make the verification self-contained without altering the theorem statement. revision: yes
-
Referee: [Definitions of MAP-poisedness and geometry certificate] MAP-poisedness, the spectral geometry certificate, and the repair mechanism are all defined from the precision matrix that also defines the projection. While external smoothness and prior-accuracy conditions are invoked for convergence, the manuscript must explicitly confirm that the geometry control remains non-circular once these conditions are imposed.
Authors: We accept that an explicit non-circularity statement is required. MAP-poisedness, the spectral certificate, and the repair procedure are defined exclusively from the precision matrix and the interpolation set; the smoothness and prior-accuracy assumptions appear only later, in the error-bound lemma that establishes full linearity. In the revision we will insert a short remark immediately after the definition of MAP-poisedness (Section 3) stating that these geometric objects are independent of the external analytic assumptions and that the latter are used solely to convert the geometric certificate into a model-error bound. This addition removes any appearance of circularity. revision: yes
Circularity Check
No significant circularity; derivation adapts classical DFO fully-linear arguments to new model choice
full rationale
The paper defines the hard-MAP model by projection onto the interpolation affine space in the prior precision norm, then defines MAP-poisedness, the spectral certificate, and the repair rule from the same matrix. Under the additional external conditions of smoothness, uniform precision bounds, MAP-poisedness, and trust-region-scale prior accuracy, it claims the models are fully linear and therefore the method inherits the standard global first-order convergence and O(ε^{-2}) complexity. This structure mirrors classical trust-region DFO proofs (poisedness implies fully linear error bounds) without reducing any load-bearing step to a self-definition, fitted input renamed as prediction, or self-citation chain. The geometry definitions are new but the implication to fully linear is derived from the poisedness condition exactly as in the literature; no equation or claim collapses to its own input by construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- prior precision matrix
axioms (2)
- domain assumption standard smoothness assumptions
- ad hoc to paper trust-region-scale prior-accuracy condition
invented entities (2)
-
MAP-poisedness
no independent evidence
-
hard-MAP models
no independent evidence
Reference graph
Works this paper leans on
-
[1]
A. R. Conn, K. Scheinberg, and L. N. Vicente.Introduction to Derivative-Free Optimization. MOS–SIAM Series on Optimization. SIAM, Philadelphia, PA, 2009
2009
-
[2]
M. J. D. Powell. UOBYQA: unconstrained optimization by quadratic approximation. Mathematical Programming, 92(3):555–582, 2002
2002
-
[3]
M. J. D. Powell. The NEWUOA software for unconstrained optimization without deriva- tives. In G. Di Pillo and M. Roma (eds.),Large-Scale Nonlinear Optimization, Nonconvex Optimization and Its Applications, pages 255–297. Springer, 2006. 34
2006
-
[4]
M. J. D. Powell. The BOBYQA algorithm for bound constrained optimization without derivatives. Technical Report DAMTP 2009/NA06, University of Cambridge, 2009
2009
-
[5]
T. M. Ragonneau and Z. Zhang. PDFO: a cross-platform package for Powell’s derivative-free optimization solvers.Mathematical Programming Computation, 16:535–559, 2024
2024
-
[6]
Z. Zhang. PRIMA: Reference Implementation for Powell’s Methods with Modernization and Amelioration. Software package, available at libprima.net, doi:10.5281/zenodo.8052654, 2023
-
[7]
Z. Zhang. Scalable derivative-free optimization algorithms with low-dimensional subspace techniques.arXiv preprint arXiv:2501.04536, 2025
arXiv 2025
-
[8]
P. Xie and Y.-X. Yuan. A derivative-free method using a new underdetermined quadratic interpolation model.SIAM Journal on Optimization, 35(2):1110–1133, 2025, doi:10.1137/23M1582023
-
[9]
P. Xie and Y.-X. Yuan. Least H2-norm updating of quadratic interpolation models for derivative-free trust-region algorithms.IMA Journal of Numerical Analysis, 46(1):21–50, 2026, doi:10.1093/imanum/drae106
-
[10]
P. Xie and S. M. Wild. ReMU: regional minimal updating for model-based derivative- free optimization.Optimization Methods and Software, published online, 2026, doi:10.1080/10556788.2026.2660368
-
[11]
C. E. Rasmussen and C. K. I. Williams.Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006
2006
-
[12]
A derivative-free optimization algorithm combining line-search and trust-region techniques,
P. Xie and Y.-x. Yuan, “A derivative-free optimization algorithm combining line-search and trust-region techniques,”Chinese Annals of Mathematics, Series B, vol. 44, no. 5, pp. 693–708, 2023
2023
-
[13]
Snoek, H
J. Snoek, H. Larochelle, and R. P. Adams. Practical Bayesian optimization of machine learning algorithms. InAdvances in Neural Information Processing Systems, pages 2951– 2959, 2012
2012
-
[14]
Shahriari, K
B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. de Freitas. Taking the human out of the loop: A review of Bayesian optimization.Proceedings of the IEEE, 104(1):148–175, 2016
2016
-
[15]
A. S. Bandeira, K. Scheinberg, and L. N. Vicente. Convergence of trust-region methods based on probabilistic models.SIAM Journal on Optimization, 24(3):1238–1264, 2014
2014
-
[16]
E. D. Dolan and J. J. Mor´ e. Benchmarking optimization software with performance profiles. Mathematical Programming, 91(2):201–213, 2002
2002
-
[17]
J. J. Mor´ e and S. M. Wild. Benchmarking derivative-free optimization algorithms.SIAM Journal on Optimization, 20(1):172–191, 2009
2009
-
[18]
S. M. Wild.MNH: A Derivative-Free Optimization Algorithm Using Minimum Norm Hessians. Ph.D. thesis, Cornell University, 2008
2008
-
[19]
Cartis, J
C. Cartis, J. Fiala, B. Marber, and L. Roberts. Improving the flexibility and robustness of model-based derivative-free optimization solvers.ACM Transactions on Mathematical Software, 45(3):1–41, 2019
2019
-
[20]
P. I. Frazier. A tutorial on Bayesian optimization.arXiv preprint arXiv:1807.02811, 2018. 35
Pith/arXiv arXiv 2018
-
[21]
N. I. M. Gould, D. Orban, and Ph. L. Toint. CUTEst: a constrained and unconstrained testing environment with safe threads for mathematical optimization.Computational Optimization and Applications, 60(3):545–557, 2015
2015
-
[22]
J. A. Nelder and R. Mead. A simplex method for function minimization.The Computer Journal, 7(4):308–313, 1965
1965
-
[23]
Hansen and A
N. Hansen and A. Ostermeier. Completely derandomized self-adaptation in evolution strategies.Evolutionary Computation, 9(2):159–195, 2001. Technical Appendices The appendices are split into two parts. Sections A–H collect the technical derivations that support the main theory and algorithm, including coefficient scaling, closed-form MAP projections, fallb...
2001
-
[24]
LetH=H ⊤andv= vech(H)
Hence, sup ∥u∥2≤1 ∥ˆϕ(u)∥2≤ √ 5 2.(93) From vech to operator norm (symmetric case). LetH=H ⊤andv= vech(H). Then ∥H∥2≤∥H∥F≤ √ 2∥v∥2.(94) Proof. Write ∥H∥2 F = ∑ iH2 ii + 2 ∑ i<jH2 ij≤2 (∑ iH2 ii + ∑ i<jH2 ij ) = 2∥v∥2
-
[25]
Then ∥H∥2≤ ∥H∥F≤ √ 2∥v∥2. □ B Weighted Projection Formula for Hard-MAP Completion This appendix derives the closed form of the hard-MAP estimator and proves the projector properties used in Lemma 4.12 (nonexpansiveness in the ˆW-norm). B.1 KKT derivation of the projection formula Fix k and suppress the index. Let ˆA∈R(m+1)×qand b∈Rm+1. Given ˆcπ∈Rq and ˆW...
-
[26]
Taking square roots yields (123)
∆ 4 k. Taking square roots yields (123). RemarkE.2 (Role of the precision metric).Because ¯wH can be much smaller than wmax when the GP Hessian posterior is uncertain (diagonal entries of WH,k near wmin), the constant ¯κπis generically tighter than√wmax √ a2 1 +a 2 2, which would result from a Euclidean-norm assumption under identical surrogate accuracy. ...
-
[27]
By Lemma 4.19 (since ∥∇f(xk)∥2 ≥εand ∆ k ≤∆ crit(ε)), the criticality loop exits immediately without further reducing ∆ k
-
[28]
Conversely, if ∆k > ¯∆ εand the iteration is unsuccessful, then ∆ k+1 =γdec∆ k >γdec ¯∆ ε≥∆ε
By Lemma 4.18 (since ∆ k≤∆succ(ε)), the trust-region step is successful, so ∆k+1≥∆k. Conversely, if ∆k > ¯∆ εand the iteration is unsuccessful, then ∆ k+1 =γdec∆ k >γdec ¯∆ ε≥∆ε. In all cases ∆ k+1≥∆ε; induction onkproves the stated lower bound. By the criticality safeguard (Assumption 4.7), every iteration that proceeds to compute a step satisfies∥gk∥2>κ...
-
[29]
f⋆= varies
Once ∆ k≤¯∆ ε, Lemma 4.19 prevents further criticality shrinkage and Lemma 4.18 forces a successful step, contradicting the assumption. Therefore, f decreases by a fixed positive amount infinitely often, contradicting the lower boundf≥finf. G.4 Proof of Theorem 4.21 Proof. Fix ε∈(0, 1]. By Theorem 4.20, such indices exist for every ε∈(0, 1]; let K be the ...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.