pith. machine review for the scientific record.

arxiv: 2604.25613 · v1 · submitted 2026-04-28 · 🪐 quant-ph

Recognition: unknown

One Coordinate at a Time: Convergence Guarantees for Rotosolve in Variational Quantum Algorithms

M Girish Chandra, Sayantan Pramanik


Pith reviewed 2026-05-07 16:53 UTC · model grok-4.3

classification 🪐 quant-ph
keywords Rotosolve · variational quantum algorithms · convergence analysis · coordinate descent · Polyak-Lojasiewicz condition · non-convex optimization · shot noise · quantum machine learning

The pith

Rotosolve converges to ε-stationary points in smooth non-convex variational quantum landscapes and to ε-suboptimal points under the Polyak-Lojasiewicz condition with explicit rates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper resolves the open question of whether the Rotosolve algorithm for optimizing variational quantum circuits has convergence guarantees. It proves that the method reaches ε-stationary points when the landscape is non-convex and smooth, and ε-suboptimal points if the Polyak-Lojasiewicz condition also holds. Explicit worst-case convergence rates are derived that account for noise from finite quantum measurements. These rates are compared to randomized coordinate descent, noting that Rotosolve is hyperparameter-free and implicitly uses derivative information. Numerical experiments on quantum machine learning tasks supplement the theory by comparing Rotosolve to several other optimizers.

Core claim

Rotosolve sequentially optimizes one variational parameter at a time by exactly minimizing a trigonometric fit to the expectation value along that coordinate. The analysis shows that this procedure yields convergence to an ε-stationary point in smooth non-convex landscapes, and to an ε-suboptimal point under the Polyak-Lojasiewicz condition. The worst-case rates are derived for the finite-shot measurement regime by incorporating shot noise variance, and are shown to be comparable to those of randomized coordinate descent while highlighting Rotosolve's lack of hyperparameters and implicit use of first- and second-order information.

What carries the argument

The Rotosolve update that performs exact line search on each coordinate via three-point trigonometric interpolation of the sinusoidal cost landscape induced by the variational quantum circuit.
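As a concrete illustration of that update (a minimal sketch on a toy separable cost, not the paper's code): along any single rotation-gate parameter the expectation value is a sinusoid a·cos(θ − b) + c, so three evaluations determine the fitted sinusoid and its exact minimizer in closed form (the Ostaszewski–Grant–Benedetti update, ref. [16]).

```python
import math

def rotosolve_update(cost, theta, d):
    """One Rotosolve step on coordinate d: fit the sinusoid
    a*cos(x - b) + c along that coordinate from three evaluations
    and jump to its exact minimizer."""
    def eval_at(x):
        t = list(theta)
        t[d] = x
        return cost(t)

    m0 = eval_at(theta[d])                 # f at theta_d
    mp = eval_at(theta[d] + math.pi / 2)   # f at theta_d + pi/2
    mm = eval_at(theta[d] - math.pi / 2)   # f at theta_d - pi/2
    # Closed-form argmin of the fitted sinusoid.
    new = theta[d] - math.pi / 2 - math.atan2(2 * m0 - mp - mm, mp - mm)
    # Wrap into [-pi, pi) for readability; the cost is 2*pi-periodic.
    new = (new + math.pi) % (2 * math.pi) - math.pi
    out = list(theta)
    out[d] = new
    return out

# Toy stand-in for a circuit expectation value, separable per coordinate.
cost = lambda t: math.cos(t[0] - 0.7) + 0.5 * math.cos(t[1] + 1.2) + 2.0

theta = [0.0, 0.0]
for d in (0, 1):            # one full Rotosolve sweep
    theta = rotosolve_update(cost, theta, d)
print(round(cost(theta), 6))  # → 0.5, the global minimum, after one sweep
```

Because the fit is exact in the noiseless setting, each coordinate step is a global line minimization, which is what lets the analysis dispense with step-size hyperparameters.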

Load-bearing premise

The cost landscape of the variational quantum circuit is non-convex and smooth, or additionally satisfies the Polyak-Lojasiewicz inequality.
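In standard form (writing L for the smoothness constant and μ for the PL constant; the paper's own notation may differ), the premise reads:

```latex
% L-smoothness of the cost f:
\|\nabla f(\theta) - \nabla f(\theta')\| \;\le\; L\,\|\theta - \theta'\|
  \qquad \forall\, \theta, \theta',
% and, for the stronger guarantee, the Polyak-Lojasiewicz inequality:
\tfrac{1}{2}\,\|\nabla f(\theta)\|^{2} \;\ge\; \mu\,\bigl(f(\theta) - f^{*}\bigr)
  \qquad \forall\, \theta,
```

with f* the global minimum value. The PL inequality holds for some non-convex functions, so the two assumptions are compatible.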

What would settle it

An experiment on a standard VQA circuit, with a landscape independently verified to be smooth and non-convex, in which Rotosolve fails to reach an ε-stationary point within the iteration count predicted by the derived rate would falsify the guarantee.

Figures

Figures reproduced from arXiv: 2604.25613 by M Girish Chandra, Sayantan Pramanik.

Figure 1: Plot of training loss against (a) iterations, and (b) …
read the original abstract

In this paper, we resolve an open question in the field of optimization algorithms for training parametrized quantum circuits: Does the popular Rotosolve algorithm converge? Until now, interpolation-based coordinate descent methods such as Rotosolve have mostly been treated as heuristics, lacking any formal convergence guarantees. We rigorously analyze Rotosolve, and show that it converges to $\varepsilon$-stationary points if the optimization landscape is non-convex and smooth; and to $\varepsilon$-suboptimal points if the objective function additionally obeys the Polyak-Lojasiewicz (PL) condition. Further, we derive explicit worst-case rates of convergence in the finite quantum measurement regime. These rates are contrasted against those from a similar coordinate-based method: Randomized Coordinate Descent (RCD). Although in the worst case their rates are, prima facie, equivalent, we present arguments for a more nuanced comparison between the two. We highlight that Rotosolve is hyperparameter-free, and implicitly uses first and second derivatives in its updates. Finally, we supplement our theoretical findings with numerical experiments from Quantum Machine Learning; and compare the performance of Rotosolve against RCD, Stochastic Gradient Descent, Simultaneous Perturbation Stochastic Approximation, and Randomized Stochastic Gradient Free methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript provides the first rigorous convergence analysis for the Rotosolve algorithm in variational quantum algorithms. It proves that, assuming a smooth non-convex objective, Rotosolve converges to ε-stationary points; under the additional Polyak-Łojasiewicz condition it converges to ε-suboptimal points. Explicit worst-case convergence rates are derived for the finite-shot (noisy measurement) regime and contrasted with Randomized Coordinate Descent (RCD). The analysis emphasizes that Rotosolve is hyperparameter-free and implicitly incorporates first- and second-order information. Numerical experiments on quantum machine learning tasks compare Rotosolve against RCD, SGD, SPSA, and RSGF.

Significance. If the proofs are correct, the work fills a notable gap by supplying formal guarantees for a widely used heuristic optimizer in VQAs. The explicit finite-shot rates, the nuanced (if informal) comparison to RCD, and the stress on the hyperparameter-free property are useful for practitioners. The supplementary numerical experiments add empirical context, though they are not the primary contribution. The results are conditional on standard optimization assumptions that the authors state explicitly rather than claim to hold universally for VQA landscapes.

major comments (2)
  1. [§4] §4 (finite-shot analysis): The derivation of the worst-case rate under bounded-variance shot noise is load-bearing for the central claim of explicit rates; the manuscript should state the precise variance bound assumed for the Rotosolve estimator and confirm that it matches the RCD estimator used for comparison (otherwise the rates are not directly comparable).
  2. [Theorem 2] Theorem 2 (PL case): The ε-suboptimality guarantee relies on the PL inequality holding globally; the paper should clarify whether the local PL constant is used or whether a uniform constant is assumed, as this affects the practical relevance of the rate.
minor comments (3)
  1. [Introduction] The abstract states that Rotosolve 'implicitly uses first and second derivatives'; this is an important selling point but should be made precise in the main text by showing how the analytic interpolation step encodes curvature information.
  2. [Numerical experiments] Numerical experiments section: quantitative details (number of shots per evaluation, circuit depth, number of independent runs, error bars) are needed to allow readers to assess the statistical significance of the reported performance differences.
  3. [Notation] Notation: the symbol for the finite-shot estimator should be introduced once and used consistently; currently the transition from the ideal expectation value to the shot-noise version is abrupt in several proofs.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive feedback. We address each major comment point by point below. The requested clarifications are straightforward to incorporate and will improve the manuscript's clarity.

read point-by-point responses
  1. Referee: [§4] §4 (finite-shot analysis): The derivation of the worst-case rate under bounded-variance shot noise is load-bearing for the central claim of explicit rates; the manuscript should state the precise variance bound assumed for the Rotosolve estimator and confirm that it matches the RCD estimator used for comparison (otherwise the rates are not directly comparable).

    Authors: We agree that an explicit statement of the variance bound strengthens the finite-shot analysis. Section 4 assumes that each noisy function evaluation (obtained via a fixed number of shots) has variance bounded above by a constant σ² independent of the parameter value; this is the standard bounded-variance model for shot noise. Because Rotosolve performs an exact trigonometric fit using three evaluations per coordinate, the variance of the resulting angle update is bounded by a multiple of σ² that we derive explicitly. The identical bounded-variance assumption is used for the RCD estimator in the comparison, so the worst-case rates are directly comparable. We will add a dedicated paragraph in §4 stating the precise bound for both estimators and confirming the matching assumption. revision: yes

  2. Referee: [Theorem 2] Theorem 2 (PL case): The ε-suboptimality guarantee relies on the PL inequality holding globally; the paper should clarify whether the local PL constant is used or whether a uniform constant is assumed, as this affects the practical relevance of the rate.

    Authors: We thank the referee for this observation. Theorem 2 assumes the Polyak-Łojasiewicz inequality holds globally with a single uniform constant μ > 0 for every point in the domain. This global assumption is required to obtain the stated linear rate to an ε-suboptimal point from an arbitrary initialization. A merely local PL constant would only yield a local convergence guarantee, which is not the claim of the theorem. We will revise the statement of Theorem 2, the paragraph immediately preceding it, and the discussion of assumptions to make the uniform global character of μ explicit. revision: yes
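The bounded-variance model invoked in the first response can be sketched numerically (an illustrative Monte Carlo, not the paper's derivation; the sinusoidal toy cost and the value of σ are assumptions for illustration): perturbing each of the three evaluations with independent noise of standard deviation σ yields an angle update whose spread is itself of order σ.

```python
import math
import random
import statistics

random.seed(0)
SIGMA = 0.01  # per-evaluation shot-noise standard deviation

def noisy_eval(x):
    # Ideal sinusoidal cost plus bounded-variance measurement noise.
    return math.cos(x - 0.7) + random.gauss(0.0, SIGMA)

def noisy_rotosolve_angle(theta=0.0):
    m0 = noisy_eval(theta)
    mp = noisy_eval(theta + math.pi / 2)
    mm = noisy_eval(theta - math.pi / 2)
    return theta - math.pi / 2 - math.atan2(2 * m0 - mp - mm, mp - mm)

updates = [noisy_rotosolve_angle() for _ in range(20_000)]
# Noise-free update is exactly 0.7 - pi; linearizing atan2 shows the
# update's standard deviation is at most sqrt(3/2) * SIGMA / amplitude.
print(abs(statistics.mean(updates) - (0.7 - math.pi)) < 1e-3)   # True
print(SIGMA / 2 < statistics.stdev(updates) < 2 * SIGMA)        # True
```

The multiple of σ² referred to in the response corresponds here to the constant absorbed by the atan2 linearization; the same bounded-variance model applied to an RCD gradient estimator is what makes the two rates comparable.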

Circularity Check

0 steps flagged

No significant circularity; standard non-convex optimization results applied directly

full rationale

The paper states explicit hypotheses (non-convex smooth landscape, or additional PL inequality) and derives convergence rates for Rotosolve by applying known coordinate-descent analysis, with finite-shot noise modeled via bounded-variance estimators. No equation reduces to a fitted parameter renamed as prediction, no self-definitional loop, and no load-bearing self-citation chain that substitutes for an independent proof. The comparison to RCD is presented as a direct rate contrast under the same assumptions. The analysis is therefore self-contained against external optimization benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Central claims rest on two standard domain assumptions from non-convex optimization theory applied to the VQA loss landscape; no free parameters or new entities are introduced.

axioms (2)
  • domain assumption The objective function is non-convex and smooth
    Required for convergence to ε-stationary points
  • domain assumption The objective function satisfies the Polyak-Lojasiewicz condition
    Required for convergence to ε-suboptimal points
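For calibration, the second axiom buys linear convergence in the textbook template, stated here for gradient descent with step 1/L on an L-smooth objective (per Karimi, Nutini, and Schmidt, ref. [20]); the paper's Rotosolve rate takes an analogous geometric form with its own coordinate-wise constants:

```latex
f(\theta_{k}) - f^{*} \;\le\; \Bigl(1 - \frac{\mu}{L}\Bigr)^{k}\,\bigl(f(\theta_{0}) - f^{*}\bigr),
\qquad\text{so } k = O\!\Bigl(\tfrac{L}{\mu}\,\log\tfrac{f(\theta_{0}) - f^{*}}{\varepsilon}\Bigr)
\text{ iterations suffice for } \varepsilon\text{-suboptimality.}
```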

pith-pipeline@v0.9.0 · 5517 in / 1433 out tokens · 83965 ms · 2026-05-07T16:53:30.149203+00:00 · methodology


Reference graph

Works this paper leans on

35 extracted references · 17 canonical work pages · 1 internal anchor

  1. [1]

    A variational eigenvalue solver on a photonic quantum processor,

    A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O’Brien, “A variational eigenvalue solver on a photonic quantum processor,” Nature Communications, vol. 5, no. 1, Jul. 2014. [Online]. Available: https://doi.org/10.1038/ncomms5213

  2. [2]

    Variational quantum algorithms,

    M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, and P. J. Coles, “Variational quantum algorithms,” Nature Reviews Physics, vol. 3, no. 9, pp. 625–644, Aug. 2021. [Online]. Available: http://dx.doi.org/10.1038/s42254-021-00348-9

  3. [3]

    Handbook of convergence theorems for (stochastic) gradient methods,

    G. Garrigos and R. M. Gower, “Handbook of convergence theorems for (stochastic) gradient methods,” 2024. [Online]. Available: https://arxiv.org/abs/2301.11235

  4. [4]

    Stochastic gradient descent for hybrid quantum-classical optimization,

    R. Sweke, F. Wilde, J. Meyer, M. Schuld, P. K. Faehrmann, B. Meynard-Piganeau, and J. Eisert, “Stochastic gradient descent for hybrid quantum-classical optimization,” Quantum, vol. 4, p. 314, Aug. 2020. [Online]. Available: https://doi.org/10.22331/q-2020-08-31-314

  5. [5]

    Efficiency of coordinate descent methods on huge-scale optimization problems,

    Y. Nesterov, “Efficiency of coordinate descent methods on huge-scale optimization problems,” SIAM J. Optim., vol. 22, no. 2, pp. 341–362, Jan. 2012

  6. [6]

    Coordinate descent algorithms,

    S. J. Wright, “Coordinate descent algorithms,” 2015. [Online]. Available: https://arxiv.org/abs/1502.04759

  7. [7]

    Random coordinate descent: A simple alternative for optimizing parameterized quantum circuits,

    Z. Ding, T. Ko, J. Yao, L. Lin, and X. Li, “Random coordinate descent: A simple alternative for optimizing parameterized quantum circuits,” Phys. Rev. Res., vol. 6, p. 033029, Jul 2024. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevResearch.6.033029

  8. [8]

    Adam: A method for stochastic optimization,

    D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization.” [Online]. Available: https://arxiv.org/abs/1412.6980

  10. [10]

    Random gradient-free minimization of convex functions,

    Y. Nesterov and V. Spokoiny, “Random gradient-free minimization of convex functions,” Found. Comput. Math., vol. 17, no. 2, pp. 527–566, Apr. 2017

  11. [11]

    Stochastic first- and zeroth-order methods for nonconvex stochastic programming,

    S. Ghadimi and G. Lan, “Stochastic first- and zeroth-order methods for nonconvex stochastic programming,” SIAM J. Optim., vol. 23, no. 4, pp. 2341–2368, Jan. 2013

  12. [12]

    An overview of the simultaneous perturbation method for efficient optimization,

    J. C. Spall, “An overview of the simultaneous perturbation method for efficient optimization,” Johns Hopkins APL Technical Digest, vol. 19, pp. 482–492, 1998

  13. [13]

    Implementation of the simultaneous perturbation algorithm for stochastic optimization,

    J. Spall, “Implementation of the simultaneous perturbation algorithm for stochastic optimization,” IEEE Transactions on Aerospace and Electronic Systems, vol. 34, no. 3, pp. 817–823, 1998

  14. [14]

    Calculus on parameterized quantum circuits,

    J. G. Vidal and D. O. Theis, “Calculus on parameterized quantum circuits,” 2018. [Online]. Available: https://arxiv.org/abs/1812.06323

  15. [15]

    Sequential minimal optimization for quantum-classical hybrid algorithms,

    K. M. Nakanishi, K. Fujii, and S. Todo, “Sequential minimal optimization for quantum-classical hybrid algorithms,” Physical Review Research, vol. 2, no. 4, Oct. 2020. [Online]. Available: http://dx.doi.org/10.1103/PhysRevResearch.2.043158

  16. [16]

    Structure optimization for parameterized quantum circuits,

    M. Ostaszewski, E. Grant, and M. Benedetti, “Structure optimization for parameterized quantum circuits,” Quantum, vol. 5, p. 391, Jan. 2021. [Online]. Available: https://doi.org/10.22331/q-2021-01-28-391

  18. [18]

    A jacobi diagonalization and anderson acceleration algorithm for variational quantum algorithm parameter optimization,

    R. M. Parrish, J. T. Iosue, A. Ozaeta, and P. L. McMahon, “A jacobi diagonalization and anderson acceleration algorithm for variational quantum algorithm parameter optimization,” 2019. [Online]. Available: https://arxiv.org/abs/1904.03206

  19. [19]

    Interpolation-based coordinate descent method for parameterized quantum circuits,

    Z. Lai, J. Hu, T. Ko, J. Wu, and D. An, “Interpolation-based coordinate descent method for parameterized quantum circuits,” Communications Physics, vol. 9, no. 1, Jan. 2026. [Online]. Available: http://dx.doi.org/10.1038/s42005-025-02473-8

  20. [20]

    Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition,

    H. Karimi, J. Nutini, and M. Schmidt, “Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition,” 2020

  21. [21]

    On quantum backpropagation, information reuse, and cheating measurement collapse,

    A. Abbas, R. King, H.-Y. Huang, W. J. Huggins, R. Movassagh, D. Gilboa, and J. McClean, “On quantum backpropagation, information reuse, and cheating measurement collapse,” in Advances in Neural Information Processing Systems, 2023. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2023/file/8c3caae2f725c8e2a55ecd600563d172-Paper-Conf...

  22. [22]

    Dimension-free iteration complexity of finite sum optimization problems,

    Y. Arjevani and O. Shamir, “Dimension-free iteration complexity of finite sum optimization problems,” in Advances in Neural Information Processing Systems, D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett, Eds., vol. 29. Curran Associates, Inc., 2016. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2016/file/299570476c6f0309545110c592b6a63b-Paper.pdf

  24. [24]

    Stochastic noise can be helpful for variational quantum algorithms,

    J. Liu, F. Wilde, A. A. Mele, X. Jin, L. Jiang, and J. Eisert, “Stochastic noise can be helpful for variational quantum algorithms,” Phys. Rev. A, vol. 111, p. 052441, May 2025. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevA.111.052441

  25. [25]

    Variational quantum algorithms are Lipschitz smooth,

    C. Kverne, M. Akewar, N. S. DiBrita, Y. Huo, T. Patel, and J. Bhimani, “Variational quantum algorithms are Lipschitz smooth,” 2026. [Online]. Available: https://openreview.net/forum?id=beg6QFuff4

  26. [26]

    Evaluating analytic gradients on quantum hardware,

    M. Schuld, V. Bergholm, C. Gogolin, J. Izaac, and N. Killoran, “Evaluating analytic gradients on quantum hardware,” Physical Review A, vol. 99, no. 3, Mar. 2019. [Online]. Available: https://doi.org/10.1103/physreva.99.032331

  27. [27]

    General parameter-shift rules for quantum gradients,

    D. Wierichs, J. Izaac, C. Wang, and C. Y.-Y. Lin, “General parameter-shift rules for quantum gradients,” Quantum, vol. 6, p. 677, Mar. 2022. [Online]. Available: http://dx.doi.org/10.22331/q-2022-03-30-677

  28. [28]

    Estimating the gradient and higher-order derivatives on quantum hardware,

    A. Mari, T. R. Bromley, and N. Killoran, “Estimating the gradient and higher-order derivatives on quantum hardware,” Physical Review A, vol. 103, no. 1, p. 012405, 2021

  29. [29]

    Fast gradient-free optimization of excitations in variational quantum eigensolvers,

    J. Jäger, T. N. Kaldenbach, M. Haas, and E. Schultheis, “Fast gradient-free optimization of excitations in variational quantum eigensolvers,” Commun. Phys., vol. 8, no. 1, p. 418, Oct. 2025

  30. [30]

    Improving variational quantum circuit optimization via hybrid algorithms and random axis initialization,

    J. V. Pankkonen, L. Ylinen, M. Raasakka, and I. Tittonen, “Improving variational quantum circuit optimization via hybrid algorithms and random axis initialization,” 2025. [Online]. Available: https://arxiv.org/abs/2503.20728

  31. [31]

    Optimizing parameterized quantum circuits with free-axis selection,

    H. C. Watanabe, R. Raymond, Y.-Y. Ohnishi, E. Kaminishi, and M. Sugawara, “Optimizing parameterized quantum circuits with free-axis selection,” in 2021 IEEE International Conference on Quantum Computing and Engineering (QCE), 2021, pp. 100–111

  32. [32]

    Sequential optimal selections of single-qubit gates in parameterized quantum circuits,

    K. Wada, R. Raymond, Y. Sato, and H. C. Watanabe, “Sequential optimal selections of single-qubit gates in parameterized quantum circuits,” Quantum Science and Technology, vol. 9, no. 3, p. 035030, May 2024. [Online]. Available: http://dx.doi.org/10.1088/2058-9565/ad4583

  33. [33]

    Optimizing a parameterized controlled gate using free quaternion selection,

    H. Kurogi, K. Endo, Y. Sato, M. Sugawara, K. Wada, K. Sugisaki, S. Kanno, H. C. Watanabe, and H. Nakano, “Optimizing a parameterized controlled gate using free quaternion selection,” 2025. [Online]. Available: https://arxiv.org/abs/2409.13547

  34. [34]

    Optimizing parameters of quantum circuits with sparsity-inducing coordinate descent,

    R. Raymond and Z. He, “Optimizing parameters of quantum circuits with sparsity-inducing coordinate descent,” in Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, Sep. 2025, pp. 6111–6119

  35. [35]

    Stochastic shadow descent: Training parametrized quantum circuits with shadows of gradients,

    S. Pramanik and M. G. Chandra, “Stochastic shadow descent: Training parametrized quantum circuits with shadows of gradients,” to be presented at IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026. [Online]. Available: https://arxiv.org/abs/2511.12168