One Coordinate at a Time: Convergence Guarantees for Rotosolve in Variational Quantum Algorithms
Pith reviewed 2026-05-07 16:53 UTC · model grok-4.3
The pith
Rotosolve converges to ε-stationary points in smooth non-convex variational quantum landscapes, and to ε-suboptimal points under the Polyak-Łojasiewicz condition, with explicit rates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Rotosolve optimizes one variational parameter at a time, exactly minimizing a trigonometric fit to the expectation value along that coordinate. The analysis shows that this procedure converges to an ε-stationary point in smooth non-convex landscapes, and to an ε-suboptimal point under the Polyak-Łojasiewicz condition. Worst-case rates are derived for the finite-shot measurement regime by incorporating shot-noise variance; they are comparable to those of randomized coordinate descent, while Rotosolve requires no hyperparameters and implicitly uses first- and second-order information.
What carries the argument
The Rotosolve update that performs exact line search on each coordinate via three-point trigonometric interpolation of the sinusoidal cost landscape induced by the variational quantum circuit.
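Along any single parameter entering through a standard rotation gate, the cost restricted to that coordinate is an exact sinusoid f(θ) = A sin(θ + B) + C, so three evaluations determine it completely and its minimizer has a closed form. A minimal sketch of one coordinate update (the toy cost function here is an illustrative stand-in for a circuit expectation value, not the paper's experiments):

```python
import numpy as np

def rotosolve_step(f, theta, d):
    """One Rotosolve coordinate update: exactly minimize f along coordinate d,
    using the fact that the restriction is A*sin(theta_d + B) + C."""
    def eval_at(phi):
        t = theta.copy()
        t[d] = phi
        return f(t)

    phi = theta[d]
    m0 = eval_at(phi)                # value at the current angle
    mp = eval_at(phi + np.pi / 2)    # two shifted evaluations pin down
    mm = eval_at(phi - np.pi / 2)    # the sinusoid exactly

    new = theta.copy()
    # Closed-form argmin of the fitted sinusoid -- no learning rate involved.
    new[d] = phi - np.pi / 2 - np.arctan2(2 * m0 - mp - mm, mp - mm)
    return new

# Toy single-parameter landscape standing in for <psi(theta)|H|psi(theta)>.
f = lambda t: 0.7 * np.cos(t[0] - 1.2) + 0.3
theta = rotosolve_step(f, np.array([0.0]), 0)
# One step lands on the exact minimizer: f(theta) = 0.3 - 0.7 = -0.4.
```

Because the fit is exact along each coordinate, a single step performs an exact line search; this is the sense in which the method is hyperparameter-free.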
Load-bearing premise
The cost landscape of the variational quantum circuit is smooth (and in general non-convex); for the stronger guarantee, it additionally satisfies the Polyak-Łojasiewicz inequality.
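For reference, the two hypotheses are the standard ones (notation assumed here: L for the smoothness constant, μ for the PL constant, f* for the optimal value):

```latex
% L-smoothness (Lipschitz gradients):
\|\nabla f(x) - \nabla f(y)\| \;\le\; L\,\|x - y\| \qquad \forall\, x, y .
% Polyak-Lojasiewicz inequality:
\tfrac{1}{2}\,\|\nabla f(x)\|^{2} \;\ge\; \mu\,\bigl(f(x) - f^{*}\bigr) \qquad \forall\, x .
```

Smoothness alone can only certify ε-stationarity (small gradient norm); the PL inequality converts a small gradient into small suboptimality, which is what upgrades the guarantee to ε-suboptimal points.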
What would settle it
An experiment on a standard VQA circuit, whose landscape is independently verified to be smooth and non-convex, in which Rotosolve fails to reach an ε-stationary point within the number of iterations predicted by the derived rate, would falsify the guarantee.
Original abstract
In this paper, we resolve an open question in the field of optimization algorithms for training parametrized quantum circuits: Does the popular Rotosolve algorithm converge? Until now, interpolation-based coordinate descent methods such as Rotosolve have mostly been treated as heuristics, lacking any formal convergence guarantees. We rigorously analyze Rotosolve, and show that it converges to $\varepsilon$-stationary points if the optimization landscape is non-convex and smooth; and to $\varepsilon$-suboptimal points if the objective function additionally obeys the Polyak-Lojasiewicz (PL) condition. Further, we derive explicit worst-case rates of convergence in the finite quantum measurement regime. These rates are contrasted against those from a similar coordinate-based method: Randomized Coordinate Descent (RCD). Although in the worst case their rates are, prima facie, equivalent, we present arguments for a more nuanced comparison between the two. We highlight that Rotosolve is hyperparameter-free, and implicitly uses first and second derivatives in its updates. Finally, we supplement our theoretical findings with numerical experiments from Quantum Machine Learning; and compare the performance of Rotosolve against RCD, Stochastic Gradient Descent, Simultaneous Perturbation Stochastic Approximation, and Randomized Stochastic Gradient Free methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript provides the first rigorous convergence analysis for the Rotosolve algorithm in variational quantum algorithms. It proves that, assuming a smooth non-convex objective, Rotosolve converges to ε-stationary points; under the additional Polyak-Łojasiewicz condition it converges to ε-suboptimal points. Explicit worst-case convergence rates are derived for the finite-shot (noisy measurement) regime and contrasted with Randomized Coordinate Descent (RCD). The analysis emphasizes that Rotosolve is hyperparameter-free and implicitly incorporates first- and second-order information. Numerical experiments on quantum machine learning tasks compare Rotosolve against RCD, SGD, SPSA, and RSGF.
Significance. If the proofs are correct, the work fills a notable gap by supplying formal guarantees for a widely used heuristic optimizer in VQAs. The explicit finite-shot rates, the nuanced (if informal) comparison to RCD, and the stress on the hyperparameter-free property are useful for practitioners. The supplementary numerical experiments add empirical context, though they are not the primary contribution. The results are conditional on standard optimization assumptions that the authors state explicitly rather than claim to hold universally for VQA landscapes.
major comments (2)
- [§4] §4 (finite-shot analysis): The derivation of the worst-case rate under bounded-variance shot noise is load-bearing for the central claim of explicit rates; the manuscript should state the precise variance bound assumed for the Rotosolve estimator and confirm that it matches the RCD estimator used for comparison (otherwise the rates are not directly comparable).
- [Theorem 2] Theorem 2 (PL case): The ε-suboptimality guarantee relies on the PL inequality holding globally; the paper should clarify whether the local PL constant is used or whether a uniform constant is assumed, as this affects the practical relevance of the rate.
minor comments (3)
- [Introduction] The abstract states that Rotosolve 'implicitly uses first and second derivatives'; this is an important selling point but should be made precise in the main text by showing how the analytic interpolation step encodes curvature information.
- [Numerical experiments] Numerical experiments section: quantitative details (number of shots per evaluation, circuit depth, number of independent runs, error bars) are needed to allow readers to assess the statistical significance of the reported performance differences.
- [Notation] Notation: the symbol for the finite-shot estimator should be introduced once and used consistently; currently the transition from the ideal expectation value to the shot-noise version is abrupt in several proofs.
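The "implicit first and second derivatives" point raised in the first minor comment has a concrete form: for a sinusoidal restriction f(θ) = A sin(θ + B) + C, the same three evaluations Rotosolve already makes yield the exact first derivative (the parameter-shift rule) and the exact second derivative at the current angle. A sketch of the identities on a toy sinusoid (constants are illustrative):

```python
import numpy as np

A, B, C = 0.5, 0.3, 1.0
f = lambda t: A * np.sin(t + B) + C   # sinusoidal coordinate restriction
phi = 0.9                             # arbitrary evaluation angle
s = np.pi / 2

# The three evaluations a Rotosolve step performs anyway.
m0, mp, mm = f(phi), f(phi + s), f(phi - s)

d1 = (mp - mm) / 2            # exact f'(phi): the parameter-shift rule
d2 = (mp + mm - 2 * m0) / 2   # exact f''(phi) for a sinusoid

# Analytic derivatives for comparison.
d1_true = A * np.cos(phi + B)
d2_true = -A * np.sin(phi + B)
```

So the interpolation step consumes curvature information at zero extra measurement cost, which is the precise statement the referee asks the main text to make.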
Simulated Author's Rebuttal
We thank the referee for their thorough review and constructive feedback. We address each major comment point by point below. The requested clarifications are straightforward to incorporate and will improve the manuscript's clarity.
Point-by-point responses
-
Referee: [§4] §4 (finite-shot analysis): The derivation of the worst-case rate under bounded-variance shot noise is load-bearing for the central claim of explicit rates; the manuscript should state the precise variance bound assumed for the Rotosolve estimator and confirm that it matches the RCD estimator used for comparison (otherwise the rates are not directly comparable).
Authors: We agree that an explicit statement of the variance bound strengthens the finite-shot analysis. Section 4 assumes that each noisy function evaluation (obtained via a fixed number of shots) has variance bounded above by a constant σ² independent of the parameter value; this is the standard bounded-variance model for shot noise. Because Rotosolve performs an exact trigonometric fit using three evaluations per coordinate, the variance of the resulting angle update is bounded by a multiple of σ² that we derive explicitly. The identical bounded-variance assumption is used for the RCD estimator in the comparison, so the worst-case rates are directly comparable. We will add a dedicated paragraph in §4 stating the precise bound for both estimators and confirming the matching assumption. revision: yes
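The rebuttal's claim that the angle update inherits a variance of order σ² can be illustrated numerically. A Monte Carlo sketch under the bounded-variance model, with Gaussian noise standing in for shot noise; σ, the landscape, and the sample count are illustrative choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = 1.0, 0.4, 0.0
f = lambda t: A * np.sin(t + B) + C   # ideal coordinate restriction
sigma = 0.05                          # per-evaluation noise scale (illustrative)

def noisy_update(phi, sigma):
    """Rotosolve angle update with i.i.d. noise on each of the three evaluations."""
    s = np.pi / 2
    m0 = f(phi) + sigma * rng.standard_normal()
    mp = f(phi + s) + sigma * rng.standard_normal()
    mm = f(phi - s) + sigma * rng.standard_normal()
    return phi - s - np.arctan2(2 * m0 - mp - mm, mp - mm)

# Without noise the update hits the exact minimizer of the sinusoid.
exact = noisy_update(0.0, 0.0)

# Under noise the update fluctuates around it with spread on the order of sigma/A.
samples = np.array([noisy_update(0.0, sigma) for _ in range(2000)])
spread = samples.std()
```

The spread shrinks as the sinusoid's amplitude grows relative to σ, consistent with the bounded-variance propagation argument sketched in the response.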
-
Referee: [Theorem 2] Theorem 2 (PL case): The ε-suboptimality guarantee relies on the PL inequality holding globally; the paper should clarify whether the local PL constant is used or whether a uniform constant is assumed, as this affects the practical relevance of the rate.
Authors: We thank the referee for this observation. Theorem 2 assumes the Polyak-Łojasiewicz inequality holds globally with a single uniform constant μ > 0 for every point in the domain. This global assumption is required to obtain the stated linear rate to an ε-suboptimal point from an arbitrary initialization. A merely local PL constant would only yield a local convergence guarantee, which is not the claim of the theorem. We will revise the statement of Theorem 2, the paragraph immediately preceding it, and the discussion of assumptions to make the uniform global character of μ explicit. revision: yes
Circularity Check
No significant circularity; standard non-convex optimization results applied directly
full rationale
The paper states explicit hypotheses (non-convex smooth landscape, or additional PL inequality) and derives convergence rates for Rotosolve by applying known coordinate-descent analysis, with finite-shot noise modeled via bounded-variance estimators. No equation reduces to a fitted parameter renamed as prediction, no self-definitional loop, and no load-bearing self-citation chain that substitutes for an independent proof. The comparison to RCD is presented as a direct rate contrast under the same assumptions. The analysis is therefore self-contained against external optimization benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: the objective function is smooth (and in general non-convex).
- Domain assumption: the objective function additionally satisfies the Polyak-Łojasiewicz inequality (needed only for the ε-suboptimality result).
Reference graph
Works this paper leans on
- [1] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O'Brien, "A variational eigenvalue solver on a photonic quantum processor," Nature Communications, vol. 5, no. 1, Jul. 2014. https://doi.org/10.1038/ncomms5213
- [2] M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, and P. J. Coles, "Variational quantum algorithms," Nature Reviews Physics, vol. 3, no. 9, pp. 625–644, Aug. 2021. http://dx.doi.org/10.1038/s42254-021-00348-9
- [3] G. Garrigos and R. M. Gower, "Handbook of convergence theorems for (stochastic) gradient methods," 2024. https://arxiv.org/abs/2301.11235
- [4] R. Sweke, F. Wilde, J. Meyer, M. Schuld, P. K. Faehrmann, B. Meynard-Piganeau, and J. Eisert, "Stochastic gradient descent for hybrid quantum-classical optimization," Quantum, vol. 4, p. 314, Aug. 2020. https://doi.org/10.22331/q-2020-08-31-314
- [5] Y. Nesterov, "Efficiency of coordinate descent methods on huge-scale optimization problems," SIAM J. Optim., vol. 22, no. 2, pp. 341–362, Jan. 2012.
- [6] S. J. Wright, "Coordinate descent algorithms," 2015. https://arxiv.org/abs/1502.04759
- [7] Z. Ding, T. Ko, J. Yao, L. Lin, and X. Li, "Random coordinate descent: A simple alternative for optimizing parameterized quantum circuits," Phys. Rev. Res., vol. 6, p. 033029, Jul. 2024. https://link.aps.org/doi/10.1103/PhysRevResearch.6.033029
- [8] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization." https://arxiv.org/abs/1412.6980
- [10] Y. Nesterov and V. Spokoiny, "Random gradient-free minimization of convex functions," Found. Comput. Math., vol. 17, no. 2, pp. 527–566, Apr. 2017.
- [11] S. Ghadimi and G. Lan, "Stochastic first- and zeroth-order methods for nonconvex stochastic programming," SIAM J. Optim., vol. 23, no. 4, pp. 2341–2368, Jan. 2013.
- [12] J. C. Spall, "An overview of the simultaneous perturbation method for efficient optimization," Johns Hopkins APL Technical Digest, vol. 19, pp. 482–492, 1998.
- [13] J. Spall, "Implementation of the simultaneous perturbation algorithm for stochastic optimization," IEEE Transactions on Aerospace and Electronic Systems, vol. 34, no. 3, pp. 817–823, 1998.
- [14] J. G. Vidal and D. O. Theis, "Calculus on parameterized quantum circuits," 2018. https://arxiv.org/abs/1812.06323
- [15] K. M. Nakanishi, K. Fujii, and S. Todo, "Sequential minimal optimization for quantum-classical hybrid algorithms," Physical Review Research, vol. 2, no. 4, Oct. 2020. http://dx.doi.org/10.1103/PhysRevResearch.2.043158
- [16] M. Ostaszewski, E. Grant, and M. Benedetti, "Structure optimization for parameterized quantum circuits," Quantum, vol. 5, p. 391, Jan. 2021. https://doi.org/10.22331/q-2021-01-28-391
- [18] R. M. Parrish, J. T. Iosue, A. Ozaeta, and P. L. McMahon, "A Jacobi diagonalization and Anderson acceleration algorithm for variational quantum algorithm parameter optimization," 2019. https://arxiv.org/abs/1904.03206
- [19] Z. Lai, J. Hu, T. Ko, J. Wu, and D. An, "Interpolation-based coordinate descent method for parameterized quantum circuits," Communications Physics, vol. 9, no. 1, Jan. 2026. http://dx.doi.org/10.1038/s42005-025-02473-8
- [20] H. Karimi, J. Nutini, and M. Schmidt, "Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition," 2020.
- [21] A. Abbas, R. King, H.-Y. Huang, W. J. Huggins, R. Movassagh, D. Gilboa, and J. McClean, "On quantum backpropagation, information reuse, and cheating measurement collapse," in Advances in Neural Information Processing Systems, 2023.
- [22] Y. Arjevani and O. Shamir, "Dimension-free iteration complexity of finite sum optimization problems," in Advances in Neural Information Processing Systems, vol. 29, Curran Associates, Inc., 2016.
- [24] J. Liu, F. Wilde, A. A. Mele, X. Jin, L. Jiang, and J. Eisert, "Stochastic noise can be helpful for variational quantum algorithms," Phys. Rev. A, vol. 111, p. 052441, May 2025. https://link.aps.org/doi/10.1103/PhysRevA.111.052441
- [25] C. Kverne, M. Akewar, N. S. DiBrita, Y. Huo, T. Patel, and J. Bhimani, "Variational quantum algorithms are Lipschitz smooth," 2026. https://openreview.net/forum?id=beg6QFuff4
- [26] M. Schuld, V. Bergholm, C. Gogolin, J. Izaac, and N. Killoran, "Evaluating analytic gradients on quantum hardware," Physical Review A, vol. 99, no. 3, Mar. 2019. https://doi.org/10.1103/physreva.99.032331
- [27] D. Wierichs, J. Izaac, C. Wang, and C. Y.-Y. Lin, "General parameter-shift rules for quantum gradients," Quantum, vol. 6, p. 677, Mar. 2022. http://dx.doi.org/10.22331/q-2022-03-30-677
- [28] A. Mari, T. R. Bromley, and N. Killoran, "Estimating the gradient and higher-order derivatives on quantum hardware," Physical Review A, vol. 103, no. 1, p. 012405, 2021.
- [29] J. Jäger, T. N. Kaldenbach, M. Haas, and E. Schultheis, "Fast gradient-free optimization of excitations in variational quantum eigensolvers," Commun. Phys., vol. 8, no. 1, p. 418, Oct. 2025.
- [30] J. V. Pankkonen, L. Ylinen, M. Raasakka, and I. Tittonen, "Improving variational quantum circuit optimization via hybrid algorithms and random axis initialization," 2025. https://arxiv.org/abs/2503.20728
- [31] H. C. Watanabe, R. Raymond, Y.-Y. Ohnishi, E. Kaminishi, and M. Sugawara, "Optimizing parameterized quantum circuits with free-axis selection," in 2021 IEEE International Conference on Quantum Computing and Engineering (QCE), 2021, pp. 100–111.
- [32] K. Wada, R. Raymond, Y. Sato, and H. C. Watanabe, "Sequential optimal selections of single-qubit gates in parameterized quantum circuits," Quantum Science and Technology, vol. 9, no. 3, p. 035030, May 2024. http://dx.doi.org/10.1088/2058-9565/ad4583
- [33] H. Kurogi, K. Endo, Y. Sato, M. Sugawara, K. Wada, K. Sugisaki, S. Kanno, H. C. Watanabe, and H. Nakano, "Optimizing a parameterized controlled gate using free quaternion selection," 2025. https://arxiv.org/abs/2409.13547
- [34] R. Raymond and Z. He, "Optimizing parameters of quantum circuits with sparsity-inducing coordinate descent," in Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, Sep. 2025, pp. 6111–6119.
- [35] S. Pramanik and M. G. Chandra, "Stochastic shadow descent: Training parametrized quantum circuits with shadows of gradients," to be presented at IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026. https://arxiv.org/abs/2511.12168