One Coordinate at a Time: Convergence Guarantees for Rotosolve in Variational Quantum Algorithms
Pith reviewed 2026-05-07 16:53 UTC · model grok-4.3
The pith
Rotosolve converges to ε-stationary points in smooth non-convex variational quantum landscapes, and to ε-suboptimal points under the Polyak-Łojasiewicz condition, with explicit rates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Rotosolve optimizes one variational parameter at a time, exactly minimizing a trigonometric fit to the expectation value along that coordinate. The analysis shows that this procedure converges to an ε-stationary point in smooth non-convex landscapes, and to an ε-suboptimal point under the Polyak-Łojasiewicz condition. Worst-case rates are derived for the finite-shot measurement regime by incorporating shot-noise variance; they are comparable to those of randomized coordinate descent, while Rotosolve requires no hyperparameters and implicitly uses first- and second-order information.
What carries the argument
The Rotosolve update that performs exact line search on each coordinate via three-point trigonometric interpolation of the sinusoidal cost landscape induced by the variational quantum circuit.
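Along any single parameter entering through a standard rotation gate, the cost restricted to that coordinate is an exact sinusoid f(θ) = A sin(θ + B) + C, so three evaluations determine it completely and its minimizer has a closed form. A minimal sketch of one coordinate update (the toy cost function here is an illustrative stand-in for a circuit expectation value, not the paper's experiments):

```python
import numpy as np

def rotosolve_step(f, theta, d):
    """One Rotosolve coordinate update: exactly minimize f along coordinate d,
    using the fact that the restriction is A*sin(theta_d + B) + C."""
    def eval_at(phi):
        t = theta.copy()
        t[d] = phi
        return f(t)

    phi = theta[d]
    m0 = eval_at(phi)                # value at the current angle
    mp = eval_at(phi + np.pi / 2)    # two shifted evaluations pin down
    mm = eval_at(phi - np.pi / 2)    # the sinusoid exactly

    new = theta.copy()
    # Closed-form argmin of the fitted sinusoid -- no learning rate involved.
    new[d] = phi - np.pi / 2 - np.arctan2(2 * m0 - mp - mm, mp - mm)
    return new

# Toy single-parameter landscape standing in for <psi(theta)|H|psi(theta)>.
f = lambda t: 0.7 * np.cos(t[0] - 1.2) + 0.3
theta = rotosolve_step(f, np.array([0.0]), 0)
# One step lands on the exact minimizer: f(theta) = 0.3 - 0.7 = -0.4.
```

Because the fit is exact along each coordinate, a single step performs an exact line search; this is the sense in which the method is hyperparameter-free.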
Load-bearing premise
The cost landscape of the variational quantum circuit is smooth (and in general non-convex); for the stronger guarantee, it additionally satisfies the Polyak-Łojasiewicz inequality.
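For reference, the two hypotheses are the standard ones (notation assumed here: L for the smoothness constant, μ for the PL constant, f* for the optimal value):

```latex
% L-smoothness (Lipschitz gradients):
\|\nabla f(x) - \nabla f(y)\| \;\le\; L\,\|x - y\| \qquad \forall\, x, y .
% Polyak-Lojasiewicz inequality:
\tfrac{1}{2}\,\|\nabla f(x)\|^{2} \;\ge\; \mu\,\bigl(f(x) - f^{*}\bigr) \qquad \forall\, x .
```

Smoothness alone can only certify ε-stationarity (small gradient norm); the PL inequality converts a small gradient into small suboptimality, which is what upgrades the guarantee to ε-suboptimal points.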
What would settle it
An experiment on a standard VQA circuit, whose landscape is independently verified to be smooth and non-convex, in which Rotosolve fails to reach an ε-stationary point within the number of iterations predicted by the derived rate, would falsify the guarantee.
Original abstract
In this paper, we resolve an open question in the field of optimization algorithms for training parametrized quantum circuits: Does the popular Rotosolve algorithm converge? Until now, interpolation-based coordinate descent methods such as Rotosolve have mostly been treated as heuristics, lacking any formal convergence guarantees. We rigorously analyze Rotosolve, and show that it converges to $\varepsilon$-stationary points if the optimization landscape is non-convex and smooth; and to $\varepsilon$-suboptimal points if the objective function additionally obeys the Polyak-Lojasiewicz (PL) condition. Further, we derive explicit worst-case rates of convergence in the finite quantum measurement regime. These rates are contrasted against those from a similar coordinate-based method: Randomized Coordinate Descent (RCD). Although in the worst case their rates are, prima facie, equivalent, we present arguments for a more nuanced comparison between the two. We highlight that Rotosolve is hyperparameter-free, and implicitly uses first and second derivatives in its updates. Finally, we supplement our theoretical findings with numerical experiments from Quantum Machine Learning; and compare the performance of Rotosolve against RCD, Stochastic Gradient Descent, Simultaneous Perturbation Stochastic Approximation, and Randomized Stochastic Gradient Free methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript provides the first rigorous convergence analysis for the Rotosolve algorithm in variational quantum algorithms. It proves that, assuming a smooth non-convex objective, Rotosolve converges to ε-stationary points; under the additional Polyak-Łojasiewicz condition it converges to ε-suboptimal points. Explicit worst-case convergence rates are derived for the finite-shot (noisy measurement) regime and contrasted with Randomized Coordinate Descent (RCD). The analysis emphasizes that Rotosolve is hyperparameter-free and implicitly incorporates first- and second-order information. Numerical experiments on quantum machine learning tasks compare Rotosolve against RCD, SGD, SPSA, and RSGF.
Significance. If the proofs are correct, the work fills a notable gap by supplying formal guarantees for a widely used heuristic optimizer in VQAs. The explicit finite-shot rates, the nuanced (if informal) comparison to RCD, and the stress on the hyperparameter-free property are useful for practitioners. The supplementary numerical experiments add empirical context, though they are not the primary contribution. The results are conditional on standard optimization assumptions that the authors state explicitly rather than claim to hold universally for VQA landscapes.
major comments (2)
- [§4] §4 (finite-shot analysis): The derivation of the worst-case rate under bounded-variance shot noise is load-bearing for the central claim of explicit rates; the manuscript should state the precise variance bound assumed for the Rotosolve estimator and confirm that it matches the RCD estimator used for comparison (otherwise the rates are not directly comparable).
- [Theorem 2] Theorem 2 (PL case): The ε-suboptimality guarantee relies on the PL inequality holding globally; the paper should clarify whether the local PL constant is used or whether a uniform constant is assumed, as this affects the practical relevance of the rate.
minor comments (3)
- [Introduction] The abstract states that Rotosolve 'implicitly uses first and second derivatives'; this is an important selling point but should be made precise in the main text by showing how the analytic interpolation step encodes curvature information.
- [Numerical experiments] Numerical experiments section: quantitative details (number of shots per evaluation, circuit depth, number of independent runs, error bars) are needed to allow readers to assess the statistical significance of the reported performance differences.
- [Notation] Notation: the symbol for the finite-shot estimator should be introduced once and used consistently; currently the transition from the ideal expectation value to the shot-noise version is abrupt in several proofs.
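The "implicit first and second derivatives" point raised in the first minor comment has a concrete form: for a sinusoidal restriction f(θ) = A sin(θ + B) + C, the same three evaluations Rotosolve already makes yield the exact first derivative (the parameter-shift rule) and the exact second derivative at the current angle. A sketch of the identities on a toy sinusoid (constants are illustrative):

```python
import numpy as np

A, B, C = 0.5, 0.3, 1.0
f = lambda t: A * np.sin(t + B) + C   # sinusoidal coordinate restriction
phi = 0.9                             # arbitrary evaluation angle
s = np.pi / 2

# The three evaluations a Rotosolve step performs anyway.
m0, mp, mm = f(phi), f(phi + s), f(phi - s)

d1 = (mp - mm) / 2            # exact f'(phi): the parameter-shift rule
d2 = (mp + mm - 2 * m0) / 2   # exact f''(phi) for a sinusoid

# Analytic derivatives for comparison.
d1_true = A * np.cos(phi + B)
d2_true = -A * np.sin(phi + B)
```

So the interpolation step consumes curvature information at zero extra measurement cost, which is the precise statement the referee asks the main text to make.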
Simulated Author's Rebuttal
We thank the referee for their thorough review and constructive feedback. We address each major comment point by point below. The requested clarifications are straightforward to incorporate and will improve the manuscript's clarity.
Point-by-point responses
-
Referee: [§4] §4 (finite-shot analysis): The derivation of the worst-case rate under bounded-variance shot noise is load-bearing for the central claim of explicit rates; the manuscript should state the precise variance bound assumed for the Rotosolve estimator and confirm that it matches the RCD estimator used for comparison (otherwise the rates are not directly comparable).
Authors: We agree that an explicit statement of the variance bound strengthens the finite-shot analysis. Section 4 assumes that each noisy function evaluation (obtained via a fixed number of shots) has variance bounded above by a constant σ² independent of the parameter value; this is the standard bounded-variance model for shot noise. Because Rotosolve performs an exact trigonometric fit using three evaluations per coordinate, the variance of the resulting angle update is bounded by a multiple of σ² that we derive explicitly. The identical bounded-variance assumption is used for the RCD estimator in the comparison, so the worst-case rates are directly comparable. We will add a dedicated paragraph in §4 stating the precise bound for both estimators and confirming the matching assumption. revision: yes
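The rebuttal's claim that the angle update inherits a variance of order σ² can be illustrated numerically. A Monte Carlo sketch under the bounded-variance model, with Gaussian noise standing in for shot noise; σ, the landscape, and the sample count are illustrative choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = 1.0, 0.4, 0.0
f = lambda t: A * np.sin(t + B) + C   # ideal coordinate restriction
sigma = 0.05                          # per-evaluation noise scale (illustrative)

def noisy_update(phi, sigma):
    """Rotosolve angle update with i.i.d. noise on each of the three evaluations."""
    s = np.pi / 2
    m0 = f(phi) + sigma * rng.standard_normal()
    mp = f(phi + s) + sigma * rng.standard_normal()
    mm = f(phi - s) + sigma * rng.standard_normal()
    return phi - s - np.arctan2(2 * m0 - mp - mm, mp - mm)

# Without noise the update hits the exact minimizer of the sinusoid.
exact = noisy_update(0.0, 0.0)

# Under noise the update fluctuates around it with spread on the order of sigma/A.
samples = np.array([noisy_update(0.0, sigma) for _ in range(2000)])
spread = samples.std()
```

The spread shrinks as the sinusoid's amplitude grows relative to σ, consistent with the bounded-variance propagation argument sketched in the response.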
-
Referee: [Theorem 2] Theorem 2 (PL case): The ε-suboptimality guarantee relies on the PL inequality holding globally; the paper should clarify whether the local PL constant is used or whether a uniform constant is assumed, as this affects the practical relevance of the rate.
Authors: We thank the referee for this observation. Theorem 2 assumes the Polyak-Łojasiewicz inequality holds globally with a single uniform constant μ > 0 for every point in the domain. This global assumption is required to obtain the stated linear rate to an ε-suboptimal point from an arbitrary initialization. A merely local PL constant would only yield a local convergence guarantee, which is not the claim of the theorem. We will revise the statement of Theorem 2, the paragraph immediately preceding it, and the discussion of assumptions to make the uniform global character of μ explicit. revision: yes
Circularity Check
No significant circularity; standard non-convex optimization results applied directly
full rationale
The paper states explicit hypotheses (non-convex smooth landscape, or additional PL inequality) and derives convergence rates for Rotosolve by applying known coordinate-descent analysis, with finite-shot noise modeled via bounded-variance estimators. No equation reduces to a fitted parameter renamed as prediction, no self-definitional loop, and no load-bearing self-citation chain that substitutes for an independent proof. The comparison to RCD is presented as a direct rate contrast under the same assumptions. The analysis is therefore self-contained against external optimization benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: the objective function is smooth (and in general non-convex).
- Domain assumption: the objective function additionally satisfies the Polyak-Łojasiewicz inequality (needed only for the ε-suboptimality result).
Reference graph
Works this paper leans on
- [1] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O'Brien, "A variational eigenvalue solver on a photonic quantum processor," Nature Communications, vol. 5, no. 1, Jul. 2014. https://doi.org/10.1038/ncomms5213
- [2] M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, and P. J. Coles, "Variational quantum algorithms," Nature Reviews Physics, vol. 3, no. 9, pp. 625–644, Aug. 2021. http://dx.doi.org/10.1038/s42254-021-00348-9
- [3] G. Garrigos and R. M. Gower, "Handbook of convergence theorems for (stochastic) gradient methods," 2024. https://arxiv.org/abs/2301.11235
- [4] R. Sweke, F. Wilde, J. Meyer, M. Schuld, P. K. Faehrmann, B. Meynard-Piganeau, and J. Eisert, "Stochastic gradient descent for hybrid quantum-classical optimization," Quantum, vol. 4, p. 314, Aug. 2020. https://doi.org/10.22331/q-2020-08-31-314
- [5] Y. Nesterov, "Efficiency of coordinate descent methods on huge-scale optimization problems," SIAM J. Optim., vol. 22, no. 2, pp. 341–362, Jan. 2012.
- [6] S. J. Wright, "Coordinate descent algorithms," 2015. https://arxiv.org/abs/1502.04759
- [7] Z. Ding, T. Ko, J. Yao, L. Lin, and X. Li, "Random coordinate descent: A simple alternative for optimizing parameterized quantum circuits," Phys. Rev. Res., vol. 6, p. 033029, Jul. 2024. https://link.aps.org/doi/10.1103/PhysRevResearch.6.033029
- [8] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization." https://arxiv.org/abs/1412.6980
- [10] Y. Nesterov and V. Spokoiny, "Random gradient-free minimization of convex functions," Found. Comput. Math., vol. 17, no. 2, pp. 527–566, Apr. 2017.
- [11] S. Ghadimi and G. Lan, "Stochastic first- and zeroth-order methods for nonconvex stochastic programming," SIAM J. Optim., vol. 23, no. 4, pp. 2341–2368, Jan. 2013.
- [12] J. C. Spall, "An overview of the simultaneous perturbation method for efficient optimization," Johns Hopkins APL Technical Digest, vol. 19, pp. 482–492, 1998.
- [13] J. Spall, "Implementation of the simultaneous perturbation algorithm for stochastic optimization," IEEE Transactions on Aerospace and Electronic Systems, vol. 34, no. 3, pp. 817–823, 1998.
- [14] J. G. Vidal and D. O. Theis, "Calculus on parameterized quantum circuits," 2018. https://arxiv.org/abs/1812.06323
- [15] K. M. Nakanishi, K. Fujii, and S. Todo, "Sequential minimal optimization for quantum-classical hybrid algorithms," Physical Review Research, vol. 2, no. 4, Oct. 2020. http://dx.doi.org/10.1103/PhysRevResearch.2.043158
- [16] M. Ostaszewski, E. Grant, and M. Benedetti, "Structure optimization for parameterized quantum circuits," Quantum, vol. 5, p. 391, Jan. 2021. https://doi.org/10.22331/q-2021-01-28-391
- [18] R. M. Parrish, J. T. Iosue, A. Ozaeta, and P. L. McMahon, "A Jacobi diagonalization and Anderson acceleration algorithm for variational quantum algorithm parameter optimization," 2019. https://arxiv.org/abs/1904.03206
- [19] Z. Lai, J. Hu, T. Ko, J. Wu, and D. An, "Interpolation-based coordinate descent method for parameterized quantum circuits," Communications Physics, vol. 9, no. 1, Jan. 2026. http://dx.doi.org/10.1038/s42005-025-02473-8
- [20] H. Karimi, J. Nutini, and M. Schmidt, "Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition," 2020.
- [21] A. Abbas, R. King, H.-Y. Huang, W. J. Huggins, R. Movassagh, D. Gilboa, and J. McClean, "On quantum backpropagation, information reuse, and cheating measurement collapse," in Advances in Neural Information Processing Systems, 2023.
- [22] Y. Arjevani and O. Shamir, "Dimension-free iteration complexity of finite sum optimization problems," in Advances in Neural Information Processing Systems, vol. 29, Curran Associates, Inc., 2016.
- [24] J. Liu, F. Wilde, A. A. Mele, X. Jin, L. Jiang, and J. Eisert, "Stochastic noise can be helpful for variational quantum algorithms," Phys. Rev. A, vol. 111, p. 052441, May 2025. https://link.aps.org/doi/10.1103/PhysRevA.111.052441
- [25] C. Kverne, M. Akewar, N. S. DiBrita, Y. Huo, T. Patel, and J. Bhimani, "Variational quantum algorithms are Lipschitz smooth," 2026. https://openreview.net/forum?id=beg6QFuff4
- [26] M. Schuld, V. Bergholm, C. Gogolin, J. Izaac, and N. Killoran, "Evaluating analytic gradients on quantum hardware," Physical Review A, vol. 99, no. 3, Mar. 2019. https://doi.org/10.1103/physreva.99.032331
- [27] D. Wierichs, J. Izaac, C. Wang, and C. Y.-Y. Lin, "General parameter-shift rules for quantum gradients," Quantum, vol. 6, p. 677, Mar. 2022. http://dx.doi.org/10.22331/q-2022-03-30-677
- [28] A. Mari, T. R. Bromley, and N. Killoran, "Estimating the gradient and higher-order derivatives on quantum hardware," Physical Review A, vol. 103, no. 1, p. 012405, 2021.
- [29] J. Jäger, T. N. Kaldenbach, M. Haas, and E. Schultheis, "Fast gradient-free optimization of excitations in variational quantum eigensolvers," Commun. Phys., vol. 8, no. 1, p. 418, Oct. 2025.
- [30] J. V. Pankkonen, L. Ylinen, M. Raasakka, and I. Tittonen, "Improving variational quantum circuit optimization via hybrid algorithms and random axis initialization," 2025. https://arxiv.org/abs/2503.20728
- [31] H. C. Watanabe, R. Raymond, Y.-Y. Ohnishi, E. Kaminishi, and M. Sugawara, "Optimizing parameterized quantum circuits with free-axis selection," in 2021 IEEE International Conference on Quantum Computing and Engineering (QCE), 2021, pp. 100–111.
- [32] K. Wada, R. Raymond, Y. Sato, and H. C. Watanabe, "Sequential optimal selections of single-qubit gates in parameterized quantum circuits," Quantum Science and Technology, vol. 9, no. 3, p. 035030, May 2024. http://dx.doi.org/10.1088/2058-9565/ad4583
- [33] H. Kurogi, K. Endo, Y. Sato, M. Sugawara, K. Wada, K. Sugisaki, S. Kanno, H. C. Watanabe, and H. Nakano, "Optimizing a parameterized controlled gate using free quaternion selection," 2025. https://arxiv.org/abs/2409.13547
- [34] R. Raymond and Z. He, "Optimizing parameters of quantum circuits with sparsity-inducing coordinate descent," in Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, Sep. 2025, pp. 6111–6119.
- [35] S. Pramanik and M. G. Chandra, "Stochastic shadow descent: Training parametrized quantum circuits with shadows of gradients," to be presented at IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026. https://arxiv.org/abs/2511.12168