
arxiv: 2606.13912 · v2 · pith:RNN7JP6Vnew · submitted 2026-06-11 · ❄️ cond-mat.dis-nn · cond-mat.str-el · cs.LG · physics.comp-ph · quant-ph

Low-variance estimators overcome the phase-gradient bottleneck in complex-valued neural quantum states

Pith reviewed 2026-06-27 04:48 UTC · model grok-4.3

classification ❄️ cond-mat.dis-nn · cond-mat.str-el · cs.LG · physics.comp-ph · quant-ph
keywords complex neural quantum states · phase gradient · variational Monte Carlo · low-variance estimators · quantum many-body systems · optimization bottleneck · amplitude-phase separation

The pith

Differentiating the local energy at fixed Monte Carlo samples yields an unbiased low-variance estimator of the phase force for complex neural quantum states.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that the main obstacle to optimizing complex-valued neural quantum states with nontrivial phase structure is not only limited expressivity of the ansatz but also the high variance of the Monte Carlo estimator of the phase gradient. For amplitude-phase separated representations, holding the Monte Carlo samples fixed while differentiating the local energy produces a distinct but unbiased estimator of the same variational phase force. The construction extends to coupled two-head networks by preserving the amplitude contribution and applying the direct derivative only along the phase path; an adaptive minimum-variance mixture then interpolates between the standard and direct estimators. Across flux ladders, chiral chains, two-dimensional flux cylinders, an interacting fermion ladder, shared-network controls, and a fractional quantum Hall benchmark, the resulting estimators lower phase-gradient variance, reduce seed failures, and often convert multi-percent plateaus into sub-percent accuracy.

Core claim

For separated amplitude-phase states, differentiating the local energy at fixed samples gives a different unbiased estimator of the same variational Monte Carlo phase force, without changing the objective. The method extends to coupled two-head networks by keeping the amplitude-gradient contribution and applying the direct derivative only to the phase path, then interpolating between the two estimators with an adaptive minimum-variance mixture during training.
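
The claim can be written out for a generic separated state. The block below is a sketch in assumed notation (the symbols A, φ, p and the sign convention are ours, not necessarily the paper's): the amplitude A(s) does not depend on the phase parameters θ, so the sampling distribution p(s) is θ-independent and the derivative can be taken inside the expectation. Both lines have the same expectation for a Hermitian Hamiltonian; they differ only sample by sample, which is where a variance difference can arise.

```latex
% Assumed notation (not taken from the paper):
%   \psi_\theta(s) = A(s)\, e^{i\varphi_\theta(s)}, \quad A(s) > 0 independent of \theta
%   p(s) = A(s)^2 / \sum_{s''} A(s'')^2            (sampling distribution, independent of \theta)
%   E_{\mathrm{loc}}(s) = \sum_{s'} H_{ss'}\, \psi_\theta(s') / \psi_\theta(s)
\begin{align}
E(\theta) &= \mathbb{E}_{s\sim p}\big[\operatorname{Re} E_{\mathrm{loc}}(s)\big], \\
\partial_\theta E &= 2\,\mathbb{E}_{s\sim p}\big[\operatorname{Im} E_{\mathrm{loc}}(s)\,
    \partial_\theta\varphi_\theta(s)\big]
    && \text{(standard form)}, \\
\partial_\theta E &= \mathbb{E}_{s\sim p}\big[\partial_\theta \operatorname{Re} E_{\mathrm{loc}}(s)\big] \notag \\
  &= -\,\mathbb{E}_{s\sim p}\Big[\sum_{s'} \operatorname{Im}\!\Big(H_{ss'}\,
    \frac{\psi_\theta(s')}{\psi_\theta(s)}\Big)
    \big(\partial_\theta\varphi_\theta(s') - \partial_\theta\varphi_\theta(s)\big)\Big]
    && \text{(direct, fixed-sample form)}.
\end{align}
```

The equality of the two expectations follows from the antisymmetry of Im[ψ*(s) H_{ss'} ψ(s')] under exchange of s and s' when H is Hermitian.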

What carries the argument

Direct derivative of the local energy with respect to phase parameters at fixed Monte Carlo samples, serving as an alternative unbiased estimator of the variational phase force.

If this is right

  • The new estimator reduces variance of the phase gradient in variational Monte Carlo training of complex neural quantum states.
  • It suppresses optimization failures that depend on random seed in systems with gauge, chiral, or topological phase structure.
  • Training often reaches sub-percent accuracy on benchmarks where the standard estimator plateaus at several percent error.
  • The adaptive mixture construction applies to both separated and shared-weight network architectures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the fixed-sample estimator maintains its variance reduction at larger system sizes, it could allow reliable optimization of neural states for phases that were previously inaccessible due to gradient noise.
  • The mixture approach might be generalized to reduce variance in estimators for other variational parameters beyond phase.
  • Applying the estimator to models with explicit anyonic or non-Abelian statistics could test whether the variance benefit persists when phase structure is more intricate.

Load-bearing premise

Fixing the Monte Carlo samples while differentiating the local energy with respect to phase parameters keeps the estimator unbiased for the phase force even after the adaptive mixture is introduced and when amplitude and phase share network weights.

What would settle it

A direct numerical check showing that the expectation of the fixed-sample local-energy derivative deviates from the standard phase-force expectation in a coupled two-head network would disprove unbiasedness.
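
A check of this kind can be sketched on a toy problem small enough to sum over all configurations exactly. The example below is an illustration under stated assumptions, not the paper's benchmark: the Hamiltonian is a random Hermitian matrix, the "coupled" ansatz uses log-amplitude and phase that are both linear in the same parameter vector, and the hybrid estimator is our reading of the construction (standard amplitude term plus direct derivative along the phase path only, with no adaptive mixing). All names (`G`, `F`, `exact_expectations`, and so on) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3  # toy Hilbert-space size and number of shared parameters (hypothetical)

# Random Hermitian matrix standing in for the Hamiltonian.
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (B + B.conj().T) / 2

# Toy coupled ansatz: log-amplitude and phase are both linear in the SAME theta.
G = 0.3 * rng.normal(size=(n, k))  # d log A(s) / d theta
F = rng.normal(size=(n, k))        # d phi(s) / d theta
theta = rng.normal(size=k)


def psi(theta):
    return np.exp(G @ theta + 1j * (F @ theta))


def energy(theta):
    v = psi(theta)
    return float(np.real(v.conj() @ H @ v) / np.real(v.conj() @ v))


def exact_expectations(theta):
    """Exact (full-sum) expectations of the standard and hybrid estimators."""
    v = psi(theta)
    p = np.abs(v) ** 2
    p = p / p.sum()
    ratio = H * v[None, :] / v[:, None]  # ratio[s, s'] = H[s, s'] psi(s') / psi(s)
    e_loc = ratio.sum(axis=1)
    e_mean = np.sum(p * e_loc.real)
    # Amplitude path: standard covariance form, kept in both estimators.
    amplitude = 2.0 * (e_loc.real - e_mean)[:, None] * G
    # Phase path, standard form: 2 Im(E_loc) d phi / d theta.
    phase_standard = 2.0 * e_loc.imag[:, None] * F
    # Phase path, direct form: differentiate Re(E_loc) through the phase only.
    dF = F[None, :, :] - F[:, None, :]   # dF[s, s'] = F[s'] - F[s]
    phase_direct = np.real(1j * ratio[:, :, None] * dF).sum(axis=1)
    return p @ (amplitude + phase_standard), p @ (amplitude + phase_direct)


def finite_difference(theta, h=1e-5):
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = h
        grad[j] = (energy(theta + step) - energy(theta - step)) / (2 * h)
    return grad


g_standard, g_hybrid = exact_expectations(theta)
g_reference = finite_difference(theta)
print("standard:         ", g_standard)
print("hybrid:           ", g_hybrid)
print("finite difference:", g_reference)
```

In this toy setting the two exact expectations agree with each other and with the finite-difference gradient of the energy. That says nothing about the adaptive mixture, whose sample-dependent weights are a separate question, and a disagreement in a realistic two-head network would be the kind of evidence described above.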

Original abstract

Complex neural quantum states are difficult to optimize when their wavefunction phase carries gauge, chiral, fermionic, or topological structure. We show that the major failure mode is not only ansatz expressivity, but the Monte Carlo estimator used to learn this phase. For separated amplitude-phase states, differentiating the local energy at fixed samples gives a different unbiased estimator of the same variational Monte Carlo phase force, without changing the objective. We further extend the construction to coupled two-head networks by keeping the amplitude-gradient contribution and applying the direct derivative only to the phase path. An adaptive minimum-variance mixture interpolates between standard and direct estimators during training. Across flux ladders, chiral chains, two-dimensional flux cylinders, an interacting fermion ladder, shared-network controls, and a fractional quantum Hall benchmark, the resulting estimators reduce phase-gradient variance, suppress seed failures, and often move multi-percent standard-gradient plateaus to sub-percent accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes low-variance Monte Carlo estimators for the phase gradient in complex-valued neural quantum states. For amplitude-phase separated ansatzes, differentiating the local energy at fixed samples yields an alternative unbiased estimator of the variational phase force. The construction is extended to coupled two-head networks by retaining the amplitude-gradient term, applying the direct derivative only along the phase path, and interpolating via an adaptive minimum-variance mixture. Benchmarks on flux ladders, chiral chains, 2D flux cylinders, an interacting fermion ladder, shared-network controls, and a fractional quantum Hall state report reduced phase-gradient variance, fewer optimization failures, and improved accuracy relative to standard estimators.

Significance. If the unbiasedness of the hybrid estimator is rigorously established, the work targets a recognized practical bottleneck in variational Monte Carlo with complex NQS, offering a route to more stable optimization of states with non-trivial phase structure without altering the variational objective. The breadth of numerical tests across distinct physical systems constitutes a concrete strength.

major comments (1)
  1. [Estimator construction for coupled networks] The central claim that the hybrid estimator (amplitude-gradient term retained plus direct phase-path derivative, combined by adaptive mixture) remains exactly unbiased when amplitude and phase paths share network weights is load-bearing. Shared weights couple amplitude and phase contributions inside the local energy, and the adaptive mixing weights are sample-dependent; it is not obvious that the fixed-sample derivative still commutes with the expectation. An explicit derivation (or counter-example) confirming that the expectation equals the true variational phase force under these conditions is required.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and for identifying the need for a rigorous treatment of unbiasedness in the coupled-network hybrid estimator. We address this point below.

Point-by-point responses
  1. Referee: [Estimator construction for coupled networks] The central claim that the hybrid estimator (amplitude-gradient term retained plus direct phase-path derivative, combined by adaptive mixture) remains exactly unbiased when amplitude and phase paths share network weights is load-bearing. Shared weights couple amplitude and phase contributions inside the local energy, and the adaptive mixing weights are sample-dependent; it is not obvious that the fixed-sample derivative still commutes with the expectation. An explicit derivation (or counter-example) confirming that the expectation equals the true variational phase force under these conditions is required.

    Authors: We agree that the sample-dependent adaptive mixing weights introduce a subtlety: because the weights correlate with the per-sample estimators, linearity of expectation alone does not immediately guarantee that the mixture remains exactly unbiased. The manuscript asserts unbiasedness for the separated-amplitude-phase case and extends the construction to coupled networks, but does not supply the explicit derivation requested. In the revised manuscript we will add a dedicated subsection (or appendix) that either (i) derives the conditions under which the hybrid estimator remains exactly unbiased or (ii) clarifies that the estimator is approximately unbiased in practice, with the numerical evidence across multiple systems serving as empirical support. We will also report any additional assumptions required for exact unbiasedness. revision: yes
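
The subtlety about sample-dependent weights can be made concrete with the textbook minimum-variance combination of two estimators. The sketch below is an assumption-laden illustration: the summary does not give the paper's mixing rule, so the weight formula is the standard one, the per-sample values are synthetic Gaussians with made-up noise levels, and reusing a weight from an earlier batch is one hypothetical way to decouple the weight from the current samples.

```python
import numpy as np


def min_variance_weight(a, b):
    """Weight lam in [0, 1] minimizing the sample variance of (1 - lam) * a + lam * b."""
    d = a - b
    var_d = np.mean((d - d.mean()) ** 2)
    if var_d == 0.0:
        return 0.0  # the two estimators coincide sample by sample; any weight works
    cov_ad = np.mean((a - a.mean()) * (d - d.mean()))
    return float(np.clip(cov_ad / var_d, 0.0, 1.0))


def draw_batch(rng, n, true_force=0.7):
    """Synthetic per-sample values of two unbiased estimators (hypothetical noise levels)."""
    shared = rng.normal(size=n)                          # noise common to both estimators
    a = true_force + shared + 3.0 * rng.normal(size=n)   # noisier, "standard"-like
    b = true_force + shared + 0.5 * rng.normal(size=n)   # quieter, "direct"-like
    return a, b


rng = np.random.default_rng(1)
a_prev, b_prev = draw_batch(rng, 4096)
a, b = draw_batch(rng, 4096)

lam_same = min_variance_weight(a, b)            # depends on the current samples
lam_prev = min_variance_weight(a_prev, b_prev)  # independent of the current samples

mixed_same = (1.0 - lam_same) * a + lam_same * b
mixed_prev = (1.0 - lam_prev) * a + lam_prev * b
print("weights:", lam_same, lam_prev)
print("sample variances:", a.var(), b.var(), mixed_same.var(), mixed_prev.var())
```

With a weight that is fixed or computed from an independent batch, linearity of expectation keeps the mixture unbiased whenever both inputs are. A weight computed from the same batch is correlated with the samples it multiplies, which is the gap the referee points to. Successive batches from one Markov chain are not strictly independent either, so the cross-batch variant should be read as a mitigation under that assumption rather than a proof.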

Circularity Check

0 steps flagged

No circularity: phase estimator derived directly from local-energy differentiation

Full rationale

The paper presents the low-variance estimator as obtained by differentiating the local energy with respect to phase parameters while holding Monte Carlo samples fixed, yielding an unbiased estimator of the variational phase force for separated amplitude-phase states; the extension to coupled networks retains the amplitude gradient term and mixes via an adaptive minimum-variance combination. No quoted equation or claim reduces this construction to a fitted parameter renamed as a prediction, a self-citation chain, an ansatz smuggled from prior work, or any other enumerated circular pattern. The derivation is therefore self-contained against the stated variational Monte Carlo objective.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The abstract supplies no explicit free parameters beyond the adaptive mixture weight, no new axioms beyond standard VMC sampling assumptions, and no invented entities.

free parameters (1)
  • adaptive mixture weight
    The minimum-variance mixture interpolates between standard and direct estimators; its instantaneous value is presumably chosen or learned during training.
axioms (1)
  • domain assumption: Fixing Monte Carlo samples while differentiating the local energy produces an unbiased estimator of the phase force
    This is the load-bearing step that allows the new estimator to be substituted without altering the variational objective.

pith-pipeline@v0.9.1-grok · 5706 in / 1431 out tokens · 40280 ms · 2026-06-27T04:48:13.398615+00:00 · methodology


Extended Data Fig. 1 Generality to interacting fermions. A 100-site spinless-fermion two-leg flux ladder (nearest-neighbour interaction V = 2, Φ = 0.5π, ten seeds), Jordan–Wigner mapped to spins: relative-error training curves, tai...