pith. sign in

arxiv: 2312.03169 · v1 · pith:7HBFHEKQnew · submitted 2023-12-05 · 🧮 math.OC

Q-fully Quadratic Modeling and its Application in a Random Subspace Derivative-free Method

Pith reviewed 2026-05-24 05:15 UTC · model grok-4.3

classification 🧮 math.OC
keywords derivative-free optimizationtrust-region methodsrandom subspacesquadratic modelsglobal convergencehigh-dimensional problems
0
0 comments X

The pith

A random-subspace trust-region algorithm using quadratic models converges almost surely to small-gradient points for general objective functions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a way to build quadratic approximations that satisfy the same error properties as fully quadratic models but inside randomly chosen low-dimensional subspaces. These approximations are then used inside a trust-region method that never needs the full-dimensional gradient or Hessian. The authors prove that the resulting algorithm reaches a point with arbitrarily small gradient with probability one and that the expected number of steps needed is finite. They also run numerical tests on standard problems to compare quadratic and linear models inside the same subspace framework. A reader should care because the approach removes the usual restriction to linear models when applying random subspaces to high-dimensional derivative-free problems.

Core claim

The paper constructs Q-fully quadratic models in randomly selected subspaces and embeds them in a trust-region algorithm that requires no special structure on the objective. Under standard smoothness assumptions, the geometry of the sample sets and the resulting approximation errors remain controlled with high probability in the subspaces. This yields an almost-sure global convergence guarantee together with an explicit upper bound on the expected number of iterations required to drive the gradient norm below any positive threshold.

What carries the argument

The Q-fully quadratic model, a quadratic approximation whose error bounds hold uniformly in randomly chosen subspaces and thereby support trust-region steps without full-space sampling.

If this is right

  • The algorithm reaches points with arbitrarily small gradient almost surely.
  • The expected iteration count to reach a given gradient tolerance is finite and explicitly bounded.
  • Quadratic models can be built and analyzed in subspaces with the same tools used for linear models.
  • Numerical comparisons reveal cases where quadratic models improve performance over linear ones and cases where they do not.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same subspace model construction could be reused inside other derivative-free frameworks such as direct-search methods.
  • The iteration bound may become practically useful once the subspace dimension and sampling parameters are tuned for a given problem class.
  • Extending the analysis to noisy or stochastic objectives would require only modest changes to the error-bound arguments already developed.

Load-bearing premise

Random subspaces preserve the sample-set geometry and quadratic error bounds needed for the model construction to work on arbitrary smooth functions.

What would settle it

Run the algorithm on a smooth high-dimensional test function such as the Rosenbrock function and observe whether the gradient norm fails to drop below a fixed tolerance after more iterations than the derived expectation bound.

Figures

Figures reproduced from arXiv: 2312.03169 by Amy Wiebe, Warren Hare, Yiwen Chen.

Figure 1
Figure 1. Figure 1: Legend of figures. 4.1.1 Results based on runtime Figures 2 and 3 present the performance and data profiles based on runtime with accuracy levels τ = 0.5 and τ = 10−3 . Figures 2 and 3 show that, for the low accuracy level τ = 0.5, when measured by the runtime, solvers with p = prand = 1 perform better than other choices of p and prand. The solver based on determined quadratic models with p = prand = 1 is … view at source ↗
Figure 2
Figure 2. Figure 2: Performance profiles with τ = 0.5 and τ = 10−3 comparing the runtime [PITH_FULL_IMAGE:figures/full_fig_p027_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Data profiles with τ = 0.5 and τ = 10−3 comparing the runtime. 4.1.2 Results based on function evaluations Figures 4 and 5 present the performance and data profiles based on function evaluations with accuracy levels τ = 0.5 and τ = 10−3 . We can see that when measured by function evaluations, for both low accuracy level τ = 0.5 and high accuracy level τ = 10−3 , solvers based on linear models perform bette… view at source ↗
Figure 4
Figure 4. Figure 4: Performance profiles with τ = 0.5 and τ = 10−3 comparing the function evaluations [PITH_FULL_IMAGE:figures/full_fig_p028_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Data profiles with τ = 0.5 and τ = 10−3 comparing the function evaluations. 4.2 Results on nonlinear least-squares problems For nonlinear least-squares problems, there exists another method to construct quadratic models. As shown in [6], we can first construct a subspace linear interpolation model for g, denoted by me g(sb), which interpolates gb(sb) = g(x 0 + Qsb) on the (p + 1) points {0} ∪ {ri : i = 1, … view at source ↗
Figure 6
Figure 6. Figure 6: Legend of figures (continued). 4.2.1 Results based on runtime Figures 7 and 8 present the performance and data profiles based on runtime with accuracy levels τ = 0.5 and τ = 10−3 . Figures 7 and 8 show that, for the low accuracy level τ = 0.5, when measured by the runtime, solvers based on square-of-linear models perform better than linear models, but worse than other quadratic models in general. However, … view at source ↗
Figure 7
Figure 7. Figure 7: Performance profiles with τ = 0.5 and τ = 10−3 comparing the runtime [PITH_FULL_IMAGE:figures/full_fig_p029_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Data profiles with τ = 0.5 and τ = 10−3 comparing the runtime. 4.2.2 Results based on function evaluations Figures 9 and 10 present the performance and data profiles based on function evaluations with accuracy levels τ = 0.5 and τ = 10−3 . When measured by function evaluations, as shown in Figures 9 and 10, for both accuracy levels τ = 0.5 and τ = 10−3 , the solvers based on square-of￾linear models perform… view at source ↗
Figure 9
Figure 9. Figure 9: Performance profiles with τ = 0.5 and τ = 10−3 comparing the function evaluations [PITH_FULL_IMAGE:figures/full_fig_p030_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Data profiles with τ = 0.5 and τ = 10−3 comparing the function evaluations. 5 Conclusion In this paper, we proposed QARSTA, a random subspace derivative-free trust-region algorithm for unconstrained deterministic optimization problems. This algorithm extends the previous ran￾dom subspace trust-region algorithms to a much broader scope by using quadratic approximation models instead of linear models and ge… view at source ↗
Figure 11
Figure 11. Figure 11: Performance profiles with τ = 10−1 and τ = 10−2 comparing the runtime [PITH_FULL_IMAGE:figures/full_fig_p032_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Data profiles with τ = 10−1 and τ = 10−2 comparing the runtime. A.1.2 Results based on function evaluations [PITH_FULL_IMAGE:figures/full_fig_p032_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Performance profiles with τ = 10−1 and τ = 10−2 comparing the function evaluations [PITH_FULL_IMAGE:figures/full_fig_p033_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Data profiles with τ = 10−1 and τ = 10−2 comparing the function evaluations. A.2 Results on nonlinear least-squares problems A.2.1 Results based on runtime [PITH_FULL_IMAGE:figures/full_fig_p033_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: Performance profiles with τ = 10−1 and τ = 10−2 comparing the runtime [PITH_FULL_IMAGE:figures/full_fig_p034_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: Data profiles with τ = 10−1 and τ = 10−2 comparing the runtime. A.2.2 Results based on function evaluations [PITH_FULL_IMAGE:figures/full_fig_p034_16.png] view at source ↗
Figure 17
Figure 17. Figure 17: Performance profiles with τ = 10−1 and τ = 10−2 comparing the function evaluations [PITH_FULL_IMAGE:figures/full_fig_p035_17.png] view at source ↗
Figure 18
Figure 18. Figure 18: Data profiles with τ = 10−1 and τ = 10−2 comparing the function evaluations. References 1. S. Alarie, C. Audet, A. E. Gheribi, M. Kokkolaras, and S. Le Digabel, Two decades of blackbox optimization applications, EURO J. Comput. Optim., 9 (2021), p. 100011. 2. M. Alzantot, Y. Sharma, S. Chakraborty, H. Zhang, C. Hsieh, and M. B. Srivastava, GenAttack: practical black-box attacks with gradient-free optimiza… view at source ↗
read the original abstract

Model-based derivative-free optimization (DFO) methods are an important class of DFO methods that are known to struggle with solving high-dimensional optimization problems. Recent research has shown that incorporating random subspaces into model-based DFO methods has the potential to improve their performance on high-dimensional problems. However, most of the current theoretical and practical results are based on linear approximation models due to the complexity of quadratic approximation models. This paper proposes a random subspace trust-region algorithm based on quadratic approximations. Unlike most of its precursors, this algorithm does not require any special form of objective function. We study the geometry of sample sets, the error bounds for approximations, and the quality of subspaces. In particular, we provide a technique to construct $Q$-fully quadratic models, which is easy to analyze and implement. We present an almost-sure global convergence result of our algorithm and give an upper bound on the expected number of iterations to find a sufficiently small gradient. We also develop numerical experiments to compare the performance of our algorithm using both linear and quadratic approximation models. The numerical results demonstrate the strengths and weaknesses of using quadratic approximations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper introduces a random-subspace trust-region algorithm for derivative-free optimization that employs quadratic (Q-fully quadratic) models rather than linear ones. It develops a construction technique for such models, analyzes the geometry of sample sets and approximation error bounds in randomly chosen subspaces, proves an almost-sure global convergence result together with an upper bound on the expected number of iterations needed to reach a sufficiently small gradient norm, and reports numerical comparisons of linear versus quadratic models on test problems. The method is stated to apply to general smooth objectives without special structure.

Significance. If the almost-sure convergence and iteration bound hold, the work would extend model-based DFO theory to quadratic approximations inside random subspaces for arbitrary objectives, addressing a recognized limitation of existing random-subspace methods that rely on linear models. The explicit construction technique for Q-fully quadratic models and the accompanying geometry/error analysis constitute concrete technical contributions that could be reused.

major comments (2)
  1. [§4] §4 (convergence analysis) and the subspace-quality section: the almost-sure convergence argument invokes Borel–Cantelli on the event that each random subspace yields a sufficiently poised sample set for a Q-fully quadratic model. No explicit lower bound on this probability that is independent of ambient dimension n and of the iteration index is supplied; without such a uniform positive lower bound the Borel–Cantelli step does not go through for general smooth f.
  2. [§3] Definition of Q-fully quadratic model and the error-bound statements (around Eq. (3.4)–(3.7)): the claimed O(Δ²) gradient and O(Δ) Hessian error bounds are asserted to hold with constants independent of the random subspace, yet the poisedness constants for the projected point set are allowed to depend on the particular realization of the subspace; this dependence must be controlled uniformly to justify the trust-region step analysis that follows.
minor comments (3)
  1. Notation for the random subspace projection operator is introduced without a dedicated symbol table or consistent use across sections.
  2. [Numerical experiments] Numerical section reports performance on a modest collection of CUTEst problems but does not include high-dimensional instances (n ≫ 100) that would directly test the claimed advantage of the random-subspace quadratic approach.
  3. A few typographical inconsistencies appear in the statement of the trust-region subproblem (e.g., the radius update rule).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address the two major comments point by point below. Both identify gaps in the current presentation of the probability bound and uniformity of constants; we will revise the manuscript to close these gaps.

read point-by-point responses
  1. Referee: [§4] §4 (convergence analysis) and the subspace-quality section: the almost-sure convergence argument invokes Borel–Cantelli on the event that each random subspace yields a sufficiently poised sample set for a Q-fully quadratic model. No explicit lower bound on this probability that is independent of ambient dimension n and of the iteration index is supplied; without such a uniform positive lower bound the Borel–Cantelli step does not go through for general smooth f.

    Authors: We agree that an explicit lower bound must be stated. Since the problem dimension n is fixed for any given instance, a positive lower bound that depends on n but is independent of the iteration index and of f is sufficient for the Borel–Cantelli argument. In the revision we will add a lemma deriving such a bound from the random-subspace sampling procedure and the geometry conditions of Section 3, thereby completing the almost-sure convergence proof for general smooth objectives. revision: yes

  2. Referee: [§3] Definition of Q-fully quadratic model and the error-bound statements (around Eq. (3.4)–(3.7)): the claimed O(Δ²) gradient and O(Δ) Hessian error bounds are asserted to hold with constants independent of the random subspace, yet the poisedness constants for the projected point set are allowed to depend on the particular realization of the subspace; this dependence must be controlled uniformly to justify the trust-region step analysis that follows.

    Authors: The current statements permit the poisedness constant to depend on the realized subspace, which could make the error constants non-uniform. We will revise Section 3 to establish a uniform upper bound on the poisedness constant that holds for all subspaces satisfying the quality condition used later in the analysis. With this uniform control the O(Δ²) gradient and O(Δ) Hessian bounds will hold with constants independent of the particular subspace realization, supporting the trust-region arguments in Section 4. revision: yes

Circularity Check

0 steps flagged

No circularity: convergence rests on independent analysis of random-subspace geometry and model error bounds

full rationale

The paper claims to provide an explicit technique for constructing Q-fully quadratic models in random subspaces, then derives almost-sure global convergence and an expected-iteration bound from the resulting geometry and approximation-error controls. No quoted step equates a derived quantity to a fitted parameter by construction, nor does any load-bearing premise reduce to a self-citation whose content is itself unverified. The derivation chain is therefore self-contained against external benchmarks (poisedness constants, error bounds) rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review performed from abstract only; no explicit free parameters, invented entities, or non-standard axioms are stated. Standard smoothness assumptions for quadratic approximation are implied by the DFO context.

axioms (1)
  • domain assumption Objective function is sufficiently smooth for local quadratic approximations to exist with controllable error.
    Required for any model-based DFO method; implied by the paper's focus on error bounds and quadratic models.

pith-pipeline@v0.9.0 · 5728 in / 1181 out tokens · 45709 ms · 2026-05-24T05:15:18.754693+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 1 internal anchor

  1. [1]

    Alarie, C

    S. Alarie, C. Audet, A. E. Gheribi, M. Kokkolaras, and S. Le Digabel , Two decades of blackbox optimization applications, EURO J. Comput. Optim., 9 (2021), p. 100011

  2. [2]

    Alzantot, Y

    M. Alzantot, Y. Sharma, S. Chakraborty, H. Zhang, C. Hsieh, and M. B. Srivastava , GenAttack: practical black-box attacks with gradient-free optimization , in Proceedings of the Genetic and Evolutionary Computation Conference, 2019, pp. 1111–1119

  3. [3]

    Audet, J

    C. Audet, J. E. Dennis Jr, and S. Le Digabel , Parallel space decomposition of the mesh adaptive direct search algorithm, SIAM J. Optim., 19 (2008), pp. 1150–1170

  4. [4]

    Audet and W

    C. Audet and W. Hare , Derivative-free and blackbox optimization , Springer, Cham, 2017

  5. [5]

    , Model-based methods in derivative-free nonsmooth optimization , Springer, Cham, 2020, pp. 655–691

  6. [6]

    Cartis and L

    C. Cartis and L. Roberts , Scalable subspace methods for derivative-free nonlinear least-squares optimiza- tion, Math. Program., 199 (2023), pp. 461–524

  7. [7]

    P. Chen, H. Zhang, Y. Sharma, J. Yi, and C. Hsieh, ZOO: zeroth order optimization based black-box attacks to deep neural networks without training substitute models , in Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, 2017, pp. 15–26

  8. [8]

    Chen and W

    Y. Chen and W. Hare , Adapting the centred simplex gradient to compensate for misaligned sample points , IMA J. Numer. Anal., (2023). 36 Yiwen Chen et al

  9. [9]

    Y. Chen, W. Hare, and G. Jarry-Bolduc , Error analysis of surrogate models constructed through opera- tions on submodels , Math. Oper. Res., (2022)

  10. [10]

    A. Conn, K. Scheinberg, and L. N. Vicente, Geometry of interpolation sets in derivative free optimization , Math. Program., 111 (2008), pp. 141–172

  11. [11]

    , Geometry of sample sets in derivative-free optimization: polynomial regression and underdetermined interpolation, IMA J. Numer. Anal., 28 (2008), pp. 721–748

  12. [12]

    , Introduction to derivative-free optimization , SIAM, 2009

  13. [13]

    A. Conn, P. L. Toint, A. Sartenaer, and N. Gould , On iterated-subspace minimization methods for nonlinear optimization, tech. rep., Rutherford Appleton Laboratory, 1994

  14. [14]

    J. E. Dennis Jr. and R. B. Schnabel , Numerical methods for unconstrained optimization and nonlinear equations, SIAM, 1996

  15. [15]

    K. J. Dzahini and S. M. Wild, Stochastic trust-region algorithm in random subspaces with convergence and expected complexity analyses , (2022). arXiv:2207.06452

  16. [16]

    Feurer and F

    M. Feurer and F. Hutter , Hyperparameter optimization, Springer, Cham, 2019, pp. 3–33

  17. [17]

    Fukushima, Parallel variable transformation in unconstrained optimization , SIAM J

    M. Fukushima, Parallel variable transformation in unconstrained optimization , SIAM J. Optim., 8 (1998), pp. 658–672

  18. [18]

    Black-Box Optimization in Machine Learning with Trust Region Based Derivative Free Algorithm

    H. Ghanbari and K. Scheinberg, Black-box optimization in machine learning with trust region based deriva- tive free algorithm , (2017). arXiv:1703.06925

  19. [19]

    Gould, D

    N. Gould, D. Orban, and P. L. Toint, CUTEst: a constrained and unconstrained testing environment with safe threads for mathematical optimization , Comput. Optim. Appl., 60 (2015), pp. 545–557

  20. [20]

    G. N. Grapiglia, J. Yuan, and Y. Yuan , A subspace version of the Powell–Yuan trust-region algorithm for equality constrained optimization , J. Oper. Res. Soc. China, 1 (2013), pp. 425–451

  21. [21]

    Gratton, C

    S. Gratton, C. W. Royer, L. N. Vicente, and Z. Zhang , Direct search based on probabilistic descent , SIAM J. Optim., 25 (2015), pp. 1515–1541

  22. [22]

    Hare and G

    W. Hare and G. Jarry-Bolduc, Calculus identities for generalized simplex gradients: rules and applications , SIAM J. Optim., 30 (2020), pp. 853–884

  23. [23]

    W. Hare, G. Jarry-Bolduc, and C. Planiden , A matrix algebra approach to approximate hessians , IMA J. Numer. Anal., (2023)

  24. [24]

    W. Hare, L. Roberts, and C. W. Royer , Expected decrease for derivative-free algorithms using random subspaces, (2023). arXiv:2308.04734

  25. [25]

    R. A. Horn and C. R. Johnson , Topics in matrix analysis , Cambridge University Press, 1991

  26. [26]

    Larson, M

    J. Larson, M. Menickelly, and S. M. Wild, Derivative-free optimization methods, Acta Numer., 28 (2019), pp. 287–404

  27. [27]

    Z. Li, F. Liu, W. Yang, S. Peng, and J. Zhou , A survey of convolutional neural networks: analysis, applications, and prospects , IEEE Trans. Neural Netw. Learn. Syst., (2021)

  28. [28]

    Liuzzi, S

    G. Liuzzi, S. Lucidi, F. Rinaldi, and L. N. Vicente, Trust-region methods for the derivative-free optimiza- tion of nonsmooth black-box functions , SIAM J. Optim., 29 (2019), pp. 3012–3035

  29. [29]

    J. J. Mor ´e and S. M. Wild , Benchmarking derivative-free optimization algorithms , SIAM J. Optim., 20 (2009), pp. 172–191

  30. [30]

    Penrose, A generalized inverse for matrices , in Mathematical Proceedings of the Cambridge Philosophical Society, vol

    R. Penrose, A generalized inverse for matrices , in Mathematical Proceedings of the Cambridge Philosophical Society, vol. 51, Cambridge University Press, 1955, pp. 406–413

  31. [31]

    M. J. D. Powell, The NEWUOA software for unconstrained optimization without derivatives , Springer US, Boston, MA, 2006, pp. 255–297

  32. [32]

    , Developments of NEWUOA for minimization without derivatives , IMA J. Numer. Anal., 28 (2008), pp. 649–664

  33. [33]

    rep., University of Cambridge, 2009

    , The BOBYQA algorithm for bound constrained optimization without derivatives , tech. rep., University of Cambridge, 2009

  34. [34]

    Roberts and C

    L. Roberts and C. W. Royer , Direct search based on probabilistic descent in reduced spaces , SIAM J. Optim., 33 (2023), pp. 3057–3082

  35. [35]

    Zhang, On derivative-free optimization methods (in Chinese) , PhD thesis, Chinese Academy of Sciences,

    Z. Zhang, On derivative-free optimization methods (in Chinese) , PhD thesis, Chinese Academy of Sciences,

  36. [36]

    https://www.zhangzk.net/docs/publications/thesis.pdf