pith. sign in

arxiv: 2604.18321 · v1 · submitted 2026-04-20 · 🧮 math.OC

Accuracy Certificates for Convex Optimization at Accelerated Rates via Primal-Dual Averaging

Pith reviewed 2026-05-10 04:00 UTC · model grok-4.3

classification 🧮 math.OC
keywords convex optimizationoptimality certificatesprimal-dual averagingaccelerated ratesregularized surrogatezero-sum gamesFisher markets
0
0 comments X

The pith

A three-average primal-dual averaging method yields accelerated rates for computable optimality certificates in convex optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Many convex optimization algorithms target a small primal gap, yet this quantity is typically unavailable during computation. The paper shows that applying primal-dual averaging to a regularized surrogate problem produces a practical, computable certificate of optimality with non-asymptotic guarantees. One-average methods deliver standard Õ(ε^{-1}) certificate rates, while a symmetric two-average version sacrifices those guarantees. A new three-average construction restores the certificates at an accelerated Õ(ε^{-1/2}) rate and establishes direct primal-dual algorithm correspondences. The results also interpret the methods as dynamics on zero-sum matrix games and Fisher markets.

Core claim

Solving a regularized surrogate with primal-dual averaging supplies non-asymptotic bounds on a computable optimality certificate. One-average schemes (modified dual averaging and generalized conditional gradient) achieve Õ(ε^{-1}) rates. A self-dual two-average method preserves symmetry at the cost of certificate guarantees. The three-average method recovers accelerated Õ(ε^{-1/2}) certificate convergence; its primal form mirrors dual gradient extrapolation. The same averaging framework connects to equilibrium dynamics in zero-sum games and Fisher markets.

What carries the argument

Three-average primal-dual averaging applied to a regularized surrogate, which restores both symmetry and accelerated certificate rates while corresponding to known dual extrapolation schemes.

If this is right

  • One-average primal-dual averaging achieves Õ(ε^{-1}) certificate complexity.
  • Two-average methods remain symmetric but lose certificate convergence guarantees.
  • The three-average method achieves Õ(ε^{-1/2}) certificate complexity.
  • The primal three-average scheme corresponds exactly to gradient extrapolation on the dual side.
  • The averaging constructions apply directly to equilibrium computation in zero-sum matrix games and Fisher markets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The number of averages can be viewed as a tunable parameter that trades off symmetry, acceleration, and certificate quality across other convex problems.
  • These certificates supply practical, computable stopping rules for large-scale solvers where the true duality gap cannot be evaluated exactly.
  • The explicit game-theoretic and market interpretations open the possibility of importing averaging techniques into equilibrium-finding algorithms.

Load-bearing premise

The underlying problem is convex and the chosen regularization allows the averaging procedures to solve the surrogate to sufficient accuracy.

What would settle it

A concrete convex optimization instance on which the three-average method produces a certificate whose error fails to shrink at the claimed accelerated rate.

Figures

Figures reproduced from arXiv: 2604.18321 by Jiaming Liang, Matthew X. Burns.

Figure 1
Figure 1. Figure 1: Geometric illustration of the AggGCG and TAA updates. In AggGCG, there is no [PITH_FULL_IMAGE:figures/full_fig_p009_1.png] view at source ↗
read the original abstract

Many works in convex optimization provide rates for achieving a small primal gap. However, this quantity is typically unavailable in practice. In this work, we show that solving a regularized surrogate with algorithms based on simple primal-dual averaging provides non-asymptotic convergence guarantees for a \textit{computable} optimality certificate. We first analyze primal and dual methods based on one average, namely modified dual averaging and generalized conditional gradient, and establish $\tilde{O}(\varepsilon^{-1})$ certificate complexities. Motivated by asymmetries in the one-average case, we analyze a self-dual, two-average method that preserves symmetry while losing certificate guarantees. To recover certificate convergence, we propose a three-average method that achieves an accelerated $\tilde{O}(\varepsilon^{-1/2})$ certificate complexity. Furthermore, we prove primal-dual algorithm correspondences for the one, two, and three-average cases. In particular, the primal three-average accelerated method mirrors the well-known gradient extrapolation method in the dual. By interpreting our results through the lens of zero-sum matrix games and Fisher markets, we further connect primal-dual averaging methods to game theory and market dynamics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper develops primal-dual averaging schemes for convex optimization that produce computable optimality certificates rather than relying on the unavailable primal gap. One-average methods (modified dual averaging and generalized conditional gradient) are shown to achieve Õ(ε^{-1}) certificate complexity. A symmetric two-average method is analyzed but loses certificate guarantees. A three-average method is proposed that recovers an accelerated Õ(ε^{-1/2}) certificate complexity. Primal-dual algorithm correspondences are established for all three cases, and the results are interpreted in the settings of zero-sum matrix games and Fisher markets.

Significance. If the claimed accelerated certificate rate survives explicit control of the regularization bias, the work would provide a useful bridge between simple averaging algorithms and verifiable accuracy guarantees at accelerated rates. The primal-dual correspondences and game-theoretic links are additional strengths that could aid interpretation in related fields.

major comments (2)
  1. [Abstract] Abstract: the stated Õ(ε^{-1/2}) certificate complexity for the three-average method does not exhibit the regularization schedule λ(ε). Controlling the bias term (typically O(λ R²)) to be at most ε/2 forces λ = Θ(ε). Substituting this choice into the convergence analysis of the averaging scheme on the λ-regularized surrogate (which inherits a 1/λ factor from strong convexity or smoothness) produces an overall iteration count of Õ(ε^{-1}) rather than Õ(ε^{-1/2}), unless the paper derives an explicit cancellation that preserves acceleration.
  2. [Abstract] The non-asymptotic certificate bounds for the one-average and three-average cases must be re-derived with the explicit λ(ε) dependence included; without this, it is unclear whether the claimed rates are for the surrogate only or for the original unregularized problem.
minor comments (1)
  1. [Abstract] The abstract uses both 'tilde-O' and 'Õ' notation interchangeably; standardize the notation throughout.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive report. The comments correctly identify that the abstract and analysis must make the dependence on the regularization parameter λ(ε) fully explicit when transferring guarantees from the surrogate to the original problem. We address both points below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the stated Õ(ε^{-1/2}) certificate complexity for the three-average method does not exhibit the regularization schedule λ(ε). Controlling the bias term (typically O(λ R²)) to be at most ε/2 forces λ = Θ(ε). Substituting this choice into the convergence analysis of the averaging scheme on the λ-regularized surrogate (which inherits a 1/λ factor from strong convexity or smoothness) produces an overall iteration count of Õ(ε^{-1}) rather than Õ(ε^{-1/2}), unless the paper derives an explicit cancellation that preserves acceleration.

    Authors: We agree that the abstract does not display the λ(ε) schedule and that a direct substitution λ = Θ(ε) introduces a 1/λ factor that offsets the acceleration, yielding an overall Õ(ε^{-1}) certificate complexity for the original problem. The manuscript derives accelerated rates only for the λ-regularized surrogate; no cancellation that restores Õ(ε^{-1/2}) for the unregularized objective is shown. We will revise the abstract, introduction, and theorem statements to state the surrogate rates explicitly, insert the choice λ(ε) = Θ(ε) needed to control bias, and report the resulting Õ(ε^{-1}) complexity for the original problem. This makes the claims precise without overstating the acceleration. revision: yes

  2. Referee: [Abstract] The non-asymptotic certificate bounds for the one-average and three-average cases must be re-derived with the explicit λ(ε) dependence included; without this, it is unclear whether the claimed rates are for the surrogate only or for the original unregularized problem.

    Authors: We accept the request. The current non-asymptotic bounds are written for a generic fixed λ and therefore apply directly to the surrogate. We will re-derive and restate the bounds for both the one-average and three-average methods with the explicit substitution λ = Θ(ε) (or any other schedule the referee prefers), clearly separating the surrogate certificate complexity from the complexity required to reach an ε-certificate for the original objective. The one-average case remains Õ(ε^{-1}); the three-average case becomes Õ(ε^{-1}) once bias control is included. These revisions will be placed in the main theorems and the abstract. revision: yes

Circularity Check

0 steps flagged

No circularity; derivations follow from explicit analysis of primal-dual averaging on regularized surrogates

full rationale

The paper's central results consist of convergence analyses for one-average, two-average, and three-average primal-dual methods applied to a λ-regularized surrogate problem, followed by direct proofs of primal-dual algorithm correspondences. These steps rely on standard Lyapunov-style arguments and averaging identities that are derived from the algorithm recursions themselves rather than from any fitted parameter, self-definition, or prior self-citation that encodes the target certificate rate. The choice of regularization parameter λ appears as an explicit design choice whose effect on bias and strong-convexity modulus is accounted for in the final complexity statement; it does not create a definitional loop. No load-bearing uniqueness theorem or ansatz is imported from the authors' own prior work. The claimed Õ(ε^{-1/2}) certificate complexity is therefore obtained by ordinary non-asymptotic analysis and remains independent of the result it asserts.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard convexity assumptions and properties of primal-dual averaging applied to a regularized surrogate problem.

free parameters (1)
  • regularization parameter
    Used to form the surrogate problem whose solution yields the certificate; choice affects the rates but not specified as fitted in abstract.
axioms (2)
  • domain assumption The optimization problem is convex
    Required for all stated convergence rates and certificate guarantees.
  • domain assumption Primal-dual averaging schemes produce the claimed convergence on the surrogate
    Invoked to establish the certificate complexities for one-, two-, and three-average methods.

pith-pipeline@v0.9.0 · 5496 in / 1180 out tokens · 43586 ms · 2026-05-10T04:00:47.025655+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

112 extracted references · 112 canonical work pages

  1. [1]

    Foundations and Trends

    Online learning and online convex optimization , author=. Foundations and Trends. 2012 , publisher=

  2. [2]

    2020 , publisher=

    Statistical inference via convex optimization , author=. 2020 , publisher=

  3. [3]

    Foundations of Computational Mathematics , volume=

    Exact matrix completion via convex programming , author=. Foundations of Computational Mathematics , volume=

  4. [4]

    IEEE Transactions on Information Theory , volume=

    Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information , author=. IEEE Transactions on Information Theory , volume=. 2006 , publisher=

  5. [5]

    Journal of Mathematical Imaging and Vision , volume=

    A first-order primal-dual algorithm for convex problems with applications to imaging , author=. Journal of Mathematical Imaging and Vision , volume=. 2011 , publisher=

  6. [6]

    2022 , publisher=

    Optimization for Data Analysis , author=. 2022 , publisher=

  7. [7]

    2011 , publisher=

    Optimization for Machine Learning , author=. 2011 , publisher=

  8. [8]

    Mathematical Programming , volume=

    Unifying mirror descent and dual averaging , author=. Mathematical Programming , volume=. 2023 , publisher=

  9. [9]

    Journal of Optimization Theory and Applications , volume=

    Quasi-monotone subgradient methods for nonsmooth convex minimization , author=. Journal of Optimization Theory and Applications , volume=. 2015 , publisher=

  10. [10]

    Mathematical programming , volume=

    A single cut proximal bundle method for stochastic convex composite optimization , author=. Mathematical programming , volume=. 2024 , publisher=

  11. [11]

    arXiv preprint arXiv:2407.10073 , year=

    Universal subgradient and proximal bundle methods for convex and strongly convex hybrid composite optimization , author=. arXiv preprint arXiv:2407.10073 , year=

  12. [12]

    A subgradient method for finding a saddle point of a convex-concave function , author=. Issled. Prikl. Mat , volume=

  13. [13]

    Matekon , volume=

    Gradient methods for finding saddle points , author=. Matekon , volume=

  14. [14]

    Matekon , volume=

    Generalized gradient method for finding saddlepoints , author=. Matekon , volume=. 1974 , publisher=

  15. [15]

    Studies in linear and nonlinear programming , volume=

    Iterative methods for concave programming , author=. Studies in linear and nonlinear programming , volume=

  16. [16]

    SIAM Journal on Optimization , volume=

    A proximal bundle variant with optimal iteration-complexity for a large range of prox stepsizes , author=. SIAM Journal on Optimization , volume=. 2021 , publisher=

  17. [17]

    Mathematics of Operations Research , volume=

    A unified analysis of a class of proximal bundle methods for solving hybrid convex composite optimization problems , author=. Mathematics of Operations Research , volume=. 2024 , publisher=

  18. [18]

    SIAM Journal on Optimization , volume=

    Optimal convergence rates for the proximal bundle method , author=. SIAM Journal on Optimization , volume=. 2023 , publisher=

  19. [19]

    SIAM Journal on Optimization , volume=

    A nonmonotone proximal bundle method with (potentially) continuous step decisions , author=. SIAM Journal on Optimization , volume=. 2013 , publisher=

  20. [20]

    1993 , publisher=

    Convex analysis and minimization algorithms II , author=. 1993 , publisher=

  21. [21]

    Computers & Operations Research , volume=

    Probabilistic optimization via approximate p-efficient points and bundle methods , author=. Computers & Operations Research , volume=. 2017 , publisher=

  22. [22]

    2011 , publisher=

    Nonlinear optimization , author=. 2011 , publisher=

  23. [23]

    Mathematical Programming , volume=

    Convex proximal bundle methods in depth: a unified analysis for inexact oracles , author=. Mathematical Programming , volume=. 2014 , publisher=

  24. [24]

    Journal of Optimization Theory and Applications , volume=

    Efficiency of proximal bundle methods , author=. Journal of Optimization Theory and Applications , volume=. 2000 , publisher=

  25. [25]

    SIAM Journal on Optimization , volume=

    Generalized bundle methods , author=. SIAM Journal on Optimization , volume=. 2002 , publisher=

  26. [26]

    Journal of Optimization Theory and Applications , volume=

    Rate of convergence of the bundle method , author=. Journal of Optimization Theory and Applications , volume=. 2017 , publisher=

  27. [27]

    , booktitle=

    Mifflin, R. , booktitle=. A modification and an extension of. 1982 , publisher=

  28. [28]

    1978 , publisher=

    Nonsmooth optimization and descent methods , author=. 1978 , publisher=

  29. [29]

    Nondifferentiable optimization , pages=

    A method of conjugate subgradients for minimizing nondifferentiable functions , author=. Nondifferentiable optimization , pages=. 1975 , publisher=

  30. [30]

    Nondifferentiable optimization , pages=

    An extension of Davidon methods to non differentiable problems , author=. Nondifferentiable optimization , pages=. 1975 , publisher=

  31. [31]

    and Sun, X

    Fersztand, D. and Sun, X. A. , journal=. The Proximal Bundle Algorithm Under a

  32. [32]

    Automation and Remote Control , volume=

    Algorithms of robust stochastic optimization based on mirror descent method , author=. Automation and Remote Control , volume=. 2019 , publisher=

  33. [33]

    and Calatroni, L

    Aujol, J.-F. and Calatroni, L. and Dossal, C. and Labarri. Parameter-free. Available at arXiv:2307.14323 , year=

  34. [34]

    Mathematical Programming , year=

    A single cut proximal bundle method for stochastic convex composite optimization , author=. Mathematical Programming , year=

  35. [35]

    Optimal and parameter-free gra- dient minimization methods for convex and nonconvex optimization.arXiv preprint arXiv:2310.12139, 2023

    Optimal and parameter-free gradient minimization methods for smooth optimization , author=. Available at arXiv:2310.12139 , year=

  36. [36]

    Mathematical programming , volume=

    Gradient methods for minimizing composite functions , author=. Mathematical programming , volume=. 2013 , publisher=

  37. [37]

    Journal of Machine Learning Research , volume=

    Loss minimization and parameter estimation with heavy tails , author=. Journal of Machine Learning Research , volume=. 2016 , publisher=

  38. [38]

    1983 , publisher=

    Problem Complexity and Method Efficiency in Optimization , author=. 1983 , publisher=

  39. [39]

    SIAM Journal on Control and Optimization , volume=

    Acceleration of stochastic approximation by averaging , author=. SIAM Journal on Control and Optimization , volume=. 1992 , publisher=

  40. [40]

    Mathematical Programming , volume=

    An optimal method for stochastic composite optimization , author=. Mathematical Programming , volume=. 2012 , publisher=

  41. [41]

    SIAM Journal on Optimization , volume=

    Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, II: shrinking procedures and optimal algorithms , author=. SIAM Journal on Optimization , volume=. 2013 , publisher=

  42. [42]

    SIAM Journal on Optimization , volume=

    Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization i: A generic algorithmic framework , author=. SIAM Journal on Optimization , volume=. 2012 , publisher=

  43. [43]

    Stochastic Systems , volume=

    Deterministic and stochastic primal-dual subgradient algorithms for uniformly convex minimization , author=. Stochastic Systems , volume=. 2014 , publisher=

  44. [44]

    SIAM Journal on Optimization , volume=

    On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean , author=. SIAM Journal on Optimization , volume=. 2010 , publisher=

  45. [45]

    R. D. C. Monteiro and B. F. Svaiter , title =. SIAM Journal on Optimization , volume =

  46. [46]

    Gradient methods for minimizing composite functions , author=. Math. Programming , pages=. 2012 , publisher=

  47. [47]

    SIAM Journal on Optimization , volume=

    Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity , author=. SIAM Journal on Optimization , volume=. 2019 , publisher=

  48. [48]

    Mathematics of Operations Research , year=

    A unified analysis of a class of proximal bundle methods for solving hybrid convex composite optimization problems , author=. Mathematics of Operations Research , year=

  49. [49]

    Journal of Machine Learning Research , volume=

    From low probability to high confidence in stochastic convex optimization , author=. Journal of Machine Learning Research , volume=

  50. [50]

    Nemirovski and A

    A. Nemirovski and A. Juditsky and G. Lan and A. Shapiro , title=. SIAM Journal on Optimization , volume=19, issue=4, year=2009, pages=

  51. [51]

    J.-B. H. Urruty and C. Lemar\'echal , title =

  52. [52]

    , title =

    Beck, A. , title =. 2017 , doi =

  53. [53]

    Rockafellar and R

    R. Rockafellar and R. Wets , TITLE =

  54. [54]

    arXiv , year=

    The Complexity of Large-scale Convex Programming under a Linear Optimization Oracle , author=. arXiv , year=

  55. [55]

    Journal of optimization theory and applications , volume=

    Subgradient methods for saddle-point problems , author=. Journal of optimization theory and applications , volume=. 2009 , publisher=

  56. [56]

    Boţ and E.R

    R.I. Boţ and E.R. Csetnek and M. Sedlmayer Michael , title =. Comput. Optim. Appl. , volume =. 2023 , pages =

  57. [57]

    Hiriart-Urruty and C

    J. Hiriart-Urruty and C. Lemar\'echal , TITLE =

  58. [58]

    Nesterov and V

    Y. Nesterov and V. Shikhman , TITLE =. J. Oper. Res. Soc. China , VOLUME =. 2018 , NUMBER =

  59. [59]

    Nesterov , title =

    Y. Nesterov , title =

  60. [60]

    Anstreicher and L.A

    K.M. Anstreicher and L.A. Wolsey , title =. Math.Program. , volume =

  61. [61]

    Larsson and M

    T. Larsson and M. Patriksson and AB. Strömberg , title =. Math. Program. , volume =

  62. [62]

    Gustavsson and M

    E. Gustavsson and M. Patriksson and AB. Strömberg , title =. Math. Program. , volume =

  63. [63]

    SIAM Journal on Optimization , volume=

    Approximate primal solutions and rate analysis for dual subgradient methods , author=. SIAM Journal on Optimization , volume=. 2009 , publisher=

  64. [64]

    ArXiv , year=

    Some primal-dual theory for subgradient methods for strongly convex optimization , author=. ArXiv , year=

  65. [65]

    Nesterov , title =

    Y. Nesterov , title =. Math. Program. , volume =

  66. [66]

    Mathematical Programming , volume=

    Primal-dual subgradient methods for convex problems , author=. Mathematical Programming , volume=. 2009 , publisher=

  67. [67]

    Xu , title =

    Y. Xu , title =. SIAM J. Optim. , volume =

  68. [68]

    Nesterov and S

    Y. Nesterov and S. Shpirko , title =. SIAM J. Optim. , volume =

  69. [69]

    SIAM Journal on Optimization , volume=

    A unifying polyhedral approximation framework for convex optimization , author=. SIAM Journal on Optimization , volume=. 2011 , publisher=

  70. [70]

    and Moquet, T

    Mazanti, G. and Moquet, T. and Pfeiffer, L. , journal=. A nonsmooth

  71. [71]

    Foundations and Trends

    Learning with submodular functions: A convex optimization perspective , author=. Foundations and Trends. 2013 , publisher=

  72. [72]

    SIAM Journal on Optimization , volume=

    Duality between subgradient and conditional gradient methods , author=. SIAM Journal on Optimization , volume=. 2015 , publisher=

  73. [73]

    Advances in Neural Information Processing Systems , volume=

    First-order methods for large-scale market equilibrium computation , author=. Advances in Neural Information Processing Systems , volume=

  74. [74]

    and Teboulle, M

    Chen, G. and Teboulle, M. , year = 2006, journal =. Convergence. doi:10.1137/0803026 , urldate =

  75. [75]

    SIAM Journal on Optimization , volume =

    Relatively Smooth Convex Optimization by First-Order Methods, and Applications , author =. SIAM Journal on Optimization , volume =. doi:10.1137/16M1099546 , urldate =

  76. [76]

    Mathematical Programming , volume =

    Some Primal-Dual Theory for Subgradient Methods for Strongly Convex Optimization , author =. Mathematical Programming , volume =

  77. [77]

    Liang , author=

    Primal-dual proximal bundle and conditional gradient methods for convex problems: J. Liang , author=. Mathematical Programming , pages=. 2025 , publisher=

  78. [78]

    Cheung, Y. K. and Cole, R. and Devanur, N. , year = 2013, series =. Tatonnement beyond Gross Substitutes? Gradient Descent to the Rescue , shorttitle =. Proceedings of the Forty-Fifth Annual. doi:10.1145/2488608.2488633 , urldate =

  79. [79]

    and Freund, R

    Lu, H. and Freund, R. M. , journal=. Generalized stochastic. 2021 , publisher=

  80. [80]

    Mathematical Programming , volume =

    Subgradient Ellipsoid Method for Nonsmooth Convex Problems , author =. Mathematical Programming , volume =. doi:10.1007/s10107-022-01833-4 , urldate =

Showing first 80 references.