pith.

arxiv: 2605.30429 · v1 · pith:X3ULJI22 · submitted 2026-05-28 · 🪐 quant-ph · cs.LG

Attention-based optimizer for symmetry finding

Pith reviewed 2026-06-29 06:33 UTC · model grok-4.3

classification 🪐 quant-ph cs.LG
keywords symmetry finding · Pauli Hamiltonians · Set-Transformer · attention mechanism · Ising model · Toric code · quantum optimization

The pith

A Set-Transformer combines self-attention over Pauli strings with a commutation-based optimization to locate Pauli symmetries of Hamiltonians, succeeding near-deterministically on the physical models tested.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a machine learning framework that searches for Pauli symmetries in quantum Hamiltonians by encoding correlations among Pauli strings with self-attention. Candidates are decoded and refined through a commutation-based objective until they map to actual symmetries of the input system. On the transverse-field Ising model and Toric code the method reaches near-deterministic success and improves on prior strategies. For random Pauli Hamiltonians the work supplies estimates of the parallel starts and GPUs needed to reach high success probability. A reader would care because symmetries simplify the analysis of physical systems and the approach automates their discovery.

Core claim

Built on a Set-Transformer architecture, the framework uses self-attention to encode the pairwise and higher-order correlations among the Pauli strings. The relations are then decoded as a candidate, which is further optimized with a custom commutation-based objective and mapped to a symmetry of the input Hamiltonian. For physical Hamiltonians including the periodic one- and two-dimensional transverse-field Ising model and the Toric code, the framework succeeds with near-deterministic probability while providing substantial advantage compared to state-of-the-art strategies.
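The final mapping to "a symmetry of the input Hamiltonian" rests on a simple criterion: when the Hamiltonian is written as a sum of distinct Pauli strings with nonzero coefficients, a Pauli string commutes with H exactly when it commutes with every term. Below is a minimal Python sketch of that check using the standard binary symplectic representation. It is not the authors' code, and the helper names (`to_symplectic`, `commutes`, `is_symmetry`, `tfim_terms`) are hypothetical.

```python
# Minimal sketch: test whether a Pauli string commutes with every term of a
# Hamiltonian given as a list of Pauli strings (coefficients do not matter here).
# Strings use one character per qubit from "IXYZ".


def to_symplectic(pauli: str) -> tuple[list[int], list[int]]:
    """Map a Pauli string to its binary (x, z) vectors, ignoring phases."""
    x = [1 if p in "XY" else 0 for p in pauli]
    z = [1 if p in "ZY" else 0 for p in pauli]
    return x, z


def commutes(a: str, b: str) -> bool:
    """Two Pauli strings commute iff their symplectic product is 0 mod 2."""
    xa, za = to_symplectic(a)
    xb, zb = to_symplectic(b)
    product = sum(xa[i] * zb[i] + za[i] * xb[i] for i in range(len(a)))
    return product % 2 == 0


def is_symmetry(candidate: str, terms: list[str]) -> bool:
    """A Pauli string is a symmetry if it commutes with every Hamiltonian term."""
    return all(commutes(candidate, t) for t in terms)


def tfim_terms(n: int) -> list[str]:
    """Pauli terms of the periodic transverse-field Ising chain on n >= 3 qubits."""
    zz = []
    for i in range(n):
        s = ["I"] * n
        s[i] = "Z"
        s[(i + 1) % n] = "Z"
        zz.append("".join(s))
    x = ["I" * i + "X" + "I" * (n - i - 1) for i in range(n)]
    return zz + x


terms = tfim_terms(4)
print(is_symmetry("XXXX", terms))  # global spin flip: True
print(is_symmetry("ZZZZ", terms))  # anticommutes with each X_i term: False
```

The global spin flip passes, while the all-Z string fails because it anticommutes with every transverse-field term; any candidate an optimizer returns could be validated the same way.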

What carries the argument

Set-Transformer architecture that applies self-attention to Pauli strings, followed by commutation-based optimization to produce valid symmetries.
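To make "self-attention applied to Pauli strings" concrete, the sketch below one-hot encodes each Pauli term of a 4-qubit Ising chain and passes the set through a single untrained attention head. It assumes NumPy, random weights, one head, and no layer normalization or feed-forward blocks, so it illustrates the mechanism rather than reproducing the paper's Set-Transformer; `one_hot`, `self_attention`, and `d_model` are hypothetical names.

```python
import numpy as np

PAULI_INDEX = {"I": 0, "X": 1, "Y": 2, "Z": 3}


def one_hot(pauli: str) -> np.ndarray:
    """Flattened one-hot encoding: 4 channels (I, X, Y, Z) per qubit."""
    enc = np.zeros((len(pauli), 4))
    for q, p in enumerate(pauli):
        enc[q, PAULI_INDEX[p]] = 1.0
    return enc.ravel()


def self_attention(tokens: np.ndarray, wq: np.ndarray, wk: np.ndarray, wv: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention over a set of tokens."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ v  # each output row mixes information from every Pauli term


rng = np.random.default_rng(0)
terms = ["ZZII", "IZZI", "IIZZ", "ZIIZ", "XIII", "IXII", "IIXI", "IIIX"]
tokens = np.stack([one_hot(t) for t in terms])  # shape (8, 16)
d_model = 8
wq, wk, wv = (rng.normal(size=(tokens.shape[1], d_model)) for _ in range(3))

out = self_attention(tokens, wq, wk, wv)
perm = rng.permutation(len(terms))
out_perm = self_attention(tokens[perm], wq, wk, wv)

# Permuting the input set permutes the output rows (equivariance) ...
print(np.allclose(out[perm], out_perm))
# ... so a mean-pooled summary does not depend on term order (invariance).
print(np.allclose(out.mean(axis=0), out_perm.mean(axis=0)))
```

The property checked here is why set-based attention suits this input: a sum of Pauli terms has no natural ordering, and attention without positional encodings treats it as a set.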

If this is right

  • Physical Hamiltonians such as the transverse-field Ising model and Toric code are handled with near-deterministic success probability.
  • The method supplies a substantial advantage compared to state-of-the-art symmetry-finding strategies.
  • For random Pauli Hamiltonians the number of parallel starts and the number of GPUs needed for high success probability can be estimated under fixed design specifications.
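If one assumes that parallel starts are independent and each succeeds with the same probability $p$, the number of starts $k$ needed to reach a target success probability $P$ follows from $1-(1-p)^k \ge P$, i.e. $k = \lceil \ln(1-P)/\ln(1-p) \rceil$. The sketch below evaluates this. The paper's own estimates depend on its fixed design specifications, which are not reproduced here; the `starts_per_gpu` packing parameter and the input probabilities are hypothetical.

```python
import math


def starts_needed(p_single: float, p_target: float) -> int:
    """Smallest k with 1 - (1 - p_single)**k >= p_target, assuming independent starts."""
    if not 0.0 < p_single <= 1.0 or not 0.0 < p_target < 1.0:
        raise ValueError("need 0 < p_single <= 1 and 0 < p_target < 1")
    if p_single == 1.0:
        return 1
    return math.ceil(math.log(1.0 - p_target) / math.log(1.0 - p_single))


def gpus_needed(n_starts: int, starts_per_gpu: int) -> int:
    """Hypothetical packing: each GPU runs a fixed number of starts concurrently."""
    return math.ceil(n_starts / starts_per_gpu)


k = starts_needed(p_single=0.05, p_target=0.99)
print(k, gpus_needed(k, starts_per_gpu=16))  # 90 6
```

With $p = 0.05$ and $P = 0.99$ this gives 90 starts, or 6 GPUs at 16 starts each; these numbers are illustrative only.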

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same attention-plus-commutation loop could be tested on other many-body Hamiltonians to check whether near-deterministic performance holds beyond the models examined.
  • Resource estimates for random cases imply that larger system sizes will require scaling the number of parallel starts and GPUs accordingly.
  • The output symmetries could be fed directly into existing quantum simulation codes to reduce effective Hilbert-space dimension before numerical work begins.
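The third extension assumes a found symmetry can shrink the problem before numerical work begins. For a non-identity Pauli symmetry $P$ (with $P^2 = I$), $H$ is block diagonal in the $\pm 1$ eigenspaces of $P$, each of dimension $2^{n-1}$. The NumPy sketch below checks this on a 4-qubit periodic transverse-field Ising chain with the global spin flip $X^{\otimes n}$. The couplings are arbitrary, dense matrices are used only for clarity, and a scalable workflow would instead use Clifford-based qubit tapering.

```python
from functools import reduce

import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])


def op_on(site_ops: dict, n: int) -> np.ndarray:
    """Tensor product with the given single-qubit operators, identity elsewhere."""
    return reduce(np.kron, [site_ops.get(q, I2) for q in range(n)])


def tfim(n: int, J: float = 1.0, h: float = 0.7) -> np.ndarray:
    """Periodic TFIM H = -J sum_i Z_i Z_{i+1} - h sum_i X_i (arbitrary couplings)."""
    H = np.zeros((2**n, 2**n))
    for i in range(n):
        H -= J * op_on({i: Z, (i + 1) % n: Z}, n)
        H -= h * op_on({i: X}, n)
    return H


n = 4
H = tfim(n)
P = op_on({q: X for q in range(n)}, n)  # global spin flip X...X
assert np.allclose(H @ P, P @ H)  # P is a symmetry of H

# Orthonormal bases of the +1 and -1 eigenspaces of P.
evals, evecs = np.linalg.eigh(P)
blocks = []
for sign in (+1, -1):
    basis = evecs[:, np.isclose(evals, sign)]  # shape (2**n, 2**(n-1))
    blocks.append(basis.T @ H @ basis)  # H restricted to that sector

sector_spectrum = np.sort(np.concatenate([np.linalg.eigvalsh(b) for b in blocks]))
print([b.shape for b in blocks])
print(np.allclose(sector_spectrum, np.linalg.eigvalsh(H)))
```

Both sectors are 8 × 8, and their combined spectrum reproduces that of the full 16 × 16 Hamiltonian.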

Load-bearing premise

Self-attention on Pauli strings followed by commutation optimization will reliably map to a valid symmetry of the input Hamiltonian for the tested physical models.

What would settle it

Repeated independent runs on the Toric code that return a candidate which does not commute with every term of the Hamiltonian.
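The falsification test above amounts to checking each returned candidate against every Hamiltonian term. The sketch below does this for the plain Toric code $H = -\sum_v A_v - \sum_p B_p$ on an $L \times L$ torus (no magnetic field), with stars $A_v$ built from $X$ and plaquettes $B_p$ built from $Z$. The edge-indexing convention and the helper names (`toric_code_terms`, `anticommuting_terms`) are assumptions for illustration, not taken from the paper. A $Z$ string around a non-contractible cycle commutes with every term, whereas a single $Z$ anticommutes with the two stars at the ends of its edge.

```python
def toric_code_terms(L: int) -> list[str]:
    """Star (X) and plaquette (Z) terms of the L x L toric code on 2*L*L edge qubits."""
    n = 2 * L * L

    def h(x: int, y: int) -> int:  # horizontal edge from vertex (x, y) to (x + 1, y)
        return 2 * ((y % L) * L + (x % L))

    def v(x: int, y: int) -> int:  # vertical edge from vertex (x, y) to (x, y + 1)
        return 2 * ((y % L) * L + (x % L)) + 1

    def pauli(kind: str, qubits: list[int]) -> str:
        s = ["I"] * n
        for q in qubits:
            s[q] = kind
        return "".join(s)

    terms = []
    for x in range(L):
        for y in range(L):
            # Star A_v: X on the four edges touching vertex (x, y).
            terms.append(pauli("X", [h(x, y), h(x - 1, y), v(x, y), v(x, y - 1)]))
            # Plaquette B_p: Z on the four edges around the face with corner (x, y).
            terms.append(pauli("Z", [h(x, y), h(x, y + 1), v(x, y), v(x + 1, y)]))
    return terms


def anticommuting_terms(candidate: str, terms: list[str]) -> list[int]:
    """Indices of terms that anticommute with the candidate (empty list => symmetry)."""
    bad = []
    for k, term in enumerate(terms):
        # Single-qubit Paulis anticommute iff both are non-identity and different.
        clashes = sum(1 for a, b in zip(candidate, term) if a != "I" and b != "I" and a != b)
        if clashes % 2 == 1:
            bad.append(k)
    return bad


L = 3
n = 2 * L * L
terms = toric_code_terms(L)

# Z on every horizontal edge of row y = 0 (indices 2*x): a non-contractible loop.
z_loop = "".join("Z" if q in {2 * x for x in range(L)} else "I" for q in range(n))
# Z on a single edge only.
single_z = "Z" + "I" * (n - 1)

print(anticommuting_terms(z_loop, terms))  # [] -> valid symmetry
print(anticommuting_terms(single_z, terms))  # [0, 6] -> stars at vertices (0, 0) and (1, 0)
```

A tensor product of Paulis anticommutes with a term exactly when the number of clashing sites is odd, which is what `anticommuting_terms` counts.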

Figures

Figures reproduced from arXiv: 2605.30429 by Alessandro Ricottone, Charlie Nation, Federico Cerisola, Francesco Martini, Luca Dellantonio, Rick P.A. Simon, Shreya Banerjee, Vinodh Raj Rajagopal Muthu.

Figure 1. Schematic diagram for the attention-based optimization framework to find a Pauli symmetry.
Figure 2. (a) Benchmark on random Hamiltonians with 10 qubits and increasing …
Original abstract

Finding symmetries is crucial for understanding physical models. In this work, we present an optimization framework that searches for Pauli symmetries of Hamiltonians, merging the fields of machine learning and automated symmetry finding. Built on a Set-Transformer architecture, our framework uses self-attention to encode the pairwise and higher-order correlations among the Pauli strings. The relations are then decoded as a candidate, which is further optimized with a custom commutation-based objective and mapped to a symmetry of the input Hamiltonian. We apply our method to random Pauli Hamiltonians, the periodic one- and two-dimensional transverse-field Ising model, and the Toric code. We show that for physical Hamiltonians (Ising and Toric), our framework succeeds with near-deterministic probability while providing a substantial advantage compared to state-of-the-art strategies. For random Pauli Hamiltonians, we estimate the required computational resources, specifically the number of parallel starts and the number of GPUs, to find a symmetry with high success probability under fixed design specifications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper presents an optimization framework using a Set-Transformer architecture to search for Pauli symmetries of Hamiltonians. Self-attention encodes pairwise and higher-order correlations among Pauli strings; candidates are decoded and refined via a custom commutation-based objective to map to valid symmetries of the input Hamiltonian. The method is applied to random Pauli Hamiltonians, 1D/2D transverse-field Ising models, and the Toric code, with claims of near-deterministic success probability on the physical models, substantial advantage over SOTA strategies, and resource estimates (parallel starts and GPUs) for the random case.

Significance. If the reported success rates hold under the tested conditions, the work demonstrates a viable integration of attention-based ML with domain-specific commutation optimization for symmetry discovery in quantum Hamiltonians. The near-deterministic performance on standard physical models (Ising, Toric) and the provision of concrete resource estimates constitute practical strengths that could aid automated analysis of many-body systems.

minor comments (3)
  1. The abstract asserts near-deterministic success and substantial SOTA advantage without quantitative success rates, baselines, or error bars; these metrics (present in the full manuscript) should be summarized in the abstract for immediate clarity.
  2. The description of the commutation-based objective and its mapping to a valid symmetry would benefit from an explicit equation or pseudocode block to make the optimization step fully reproducible from the text.
  3. Experimental details on the number of trials, definition of success, and exact baselines used for the Ising/Toric comparisons should be consolidated in one dedicated subsection for easier evaluation of the claimed advantage.
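The referee's second comment asks for an explicit objective. The paper's actual objective is not reproduced on this page, so the following is only a hypothetical illustration of how a commutation criterion can be relaxed into a smooth loss. Each bit of the candidate's symplectic vector is treated as an independent Bernoulli variable with probability $\theta_k$; the probability that the candidate anticommutes with a term follows from the piling-up lemma, $\Pr[\text{odd}] = \tfrac12\bigl(1 - \prod_k (1 - 2\theta_k)\bigr)$ over the bits selected by that term; and a penalty on the all-zero (identity) candidate rules out the trivial symmetry. All names, including `relaxed_loss` and `weight_identity`, are assumptions.

```python
import numpy as np


def symplectic(pauli: str) -> np.ndarray:
    """Binary vector (x_1..x_n, z_1..z_n) of a Pauli string, phases ignored."""
    x = [float(p in "XY") for p in pauli]
    z = [float(p in "ZY") for p in pauli]
    return np.array(x + z)


def tfim_terms(n: int) -> list[str]:
    """Pauli terms of the periodic transverse-field Ising chain on n >= 3 qubits."""
    zz = ["".join("Z" if q in (i, (i + 1) % n) else "I" for q in range(n)) for i in range(n)]
    xs = ["".join("X" if q == i else "I" for q in range(n)) for i in range(n)]
    return zz + xs


def relaxed_loss(theta: np.ndarray, terms: list[str], weight_identity: float = 1.0) -> float:
    """Hypothetical relaxed commutation objective (not the paper's definition).

    theta[k] = probability that bit k of the candidate's symplectic vector is 1.
    A candidate (x, z) anticommutes with a term (a, b) when x.b + z.a is odd, so
    the relevant candidate bits are selected by the swapped vector (b, a).
    """
    n = len(terms[0])
    anticommute = 0.0
    for t in terms:
        a_b = symplectic(t)
        mask = np.concatenate([a_b[n:], a_b[:n]]) == 1.0  # x_i where b_i=1, z_i where a_i=1
        # Piling-up lemma: P(odd parity) of independent bits with P(1) = theta.
        anticommute += 0.5 * (1.0 - np.prod(1.0 - 2.0 * theta[mask]))
    anticommute /= len(terms)
    identity_prob = float(np.prod(1.0 - theta))  # P(candidate is the identity)
    return float(anticommute) + weight_identity * identity_prob


terms = tfim_terms(4)
print(relaxed_loss(symplectic("XXXX"), terms))  # 0.0: a valid, non-trivial symmetry
print(relaxed_loss(symplectic("ZZZZ"), terms))  # 0.5: anticommutes with the 4 X_i terms
print(relaxed_loss(np.zeros(8), terms))  # 1.0: identity, caught by the penalty
print(relaxed_loss(np.full(8, 0.5), terms))  # 0.5 + 0.5**8 for a uniform soft candidate
```

For a deterministic candidate ($\theta \in \{0,1\}^{2n}$) the first part reduces to the fraction of anticommuting terms, so a zero loss certifies a non-trivial Pauli symmetry.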

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our work and for recommending minor revision. The referee's description accurately reflects the Set-Transformer approach, commutation objective, and results on Ising, Toric code, and random Pauli Hamiltonians. No major comments were provided in the report.

Circularity Check

0 steps flagged

No circularity; empirical ML pipeline on tested Hamiltonians

full rationale

The manuscript describes a Set-Transformer architecture with self-attention on Pauli strings, followed by a commutation-based optimization objective, applied empirically to random Pauli Hamiltonians, Ising models, and the Toric code. Reported success probabilities and resource estimates are direct experimental outcomes on those instances. No equations, parameters, or claims reduce by construction to fitted inputs, self-definitions, or self-citation chains; the central results are falsifiable measurements on standard models with no load-bearing internal reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the method is described at the level of architecture and objective without detailing internal constants or assumptions beyond standard neural-network training.

pith-pipeline@v0.9.1-grok · 5719 in / 1161 out tokens · 41716 ms · 2026-06-29T06:33:39.798845+00:00 · methodology


Reference graph

Works this paper leans on

87 extracted references · 21 canonical work pages · 13 internal anchors

  1. [1]

    as a way to benchmark the CPU (light blue) and GPU (blue) implementations (for device specifications, see Appendix C) of our attention-based optimizer, and compare the results against the deterministic algorithm of Ref. [39] (grey variations). For a periodic chain of $n_q$ spins the input for our framework is [68] $H_{\mathrm{Ising}} = -J\sum_{i=1}^{n} Z_i Z_{i+1} - h_x\sum_{i=1}^{n} X_i$. Th...

  2. [2]

    are in black (Ising ladder), dark-gray (Toric with $\vec{B}$), and light-gray (Toric without $\vec{B}$). The Hamiltonian for the 2-D Ising ladder with $n_y = 2$ legs and $n_x$ rungs ($\implies n_q = n_y n_x = 2n_x$) is given by $H_{\mathrm{IL}} = J\sum_{\langle i,j\rangle} Z_i Z_j + h\sum_{i=1}^{n_q} X_i$, where $\langle i,j\rangle$ indicates connected qubits $i$ and $j$. On the other hand, the Toric code considers a rectangular lattice with periodic ...

  3. [3]

    E. Noether, Nachrichten von der Königlichen Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse, 235 (1918), reprinted/translated in Transport Theory and Statistical Physics 1(3), 183–207 (1971)

  4. [4]

    D. Gross, Proceedings of the National Academy of Sciences of the United States of America 93, 14256 (1996)

  5. [5]

    E. P. Wigner, Proceedings of the National Academy of Sciences 51, 956 (1964), https://www.pnas.org/doi/pdf/10.1073/pnas.51.5.956

  6. [6]

    N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, The Journal of Chemical Physics 21, 1087 (1953)

  7. [7]

    R. Car and M. Parrinello, Phys. Rev. Lett. 55, 2471 (1985)

  8. [8]

    D. Landau and K. Binder, A Guide to Monte Carlo Simulations in Statistical Physics, 5th ed. (Cambridge University Press, 2021)

  9. [9]

    H. Q. Lin, Phys. Rev. B 42, 6561 (1990)

  10. [10]

    J. Schnack, “Exact diagonalization techniques for quantum spin systems,” in Computational Modelling of Molecular Nanomagnets, edited by G. Rajaraman (Springer International Publishing, Cham, 2023) pp. 155–177

  11. [11]

    M. Troyer and U.-J. Wiese, Phys. Rev. Lett. 94, 170201 (2005)

  12. [12]

    E. Manousakis, Rev. Mod. Phys. 63, 1 (1991)

  13. [13]

    S. R. White, Phys. Rev. Lett. 69, 2863 (1992)

  14. [14]

    J. Banks, J. Garza-Vargas, A. Kulkarni, and N. Srivastava, Foundations of Computational Mathematics, 1 (2022)

  15. [15]

    S. Östlund and S. Rommer, Phys. Rev. Lett. 75, 3537 (1995)

  16. [16]

    F. Verstraete and J. I. Cirac, Phys. Rev. B 73, 094423 (2006)

  17. [17]

    J. I. Cirac, D. Pérez-García, N. Schuch, and F. Verstraete, Rev. Mod. Phys. 93, 045003 (2021)

  18. [18]

    F. Verstraete, M. M. Wolf, D. Perez-Garcia, and J. I. Cirac, Phys. Rev. Lett. 96, 220601 (2006)

  19. [19]

    F. Verstraete and J. I. Cirac, “Renormalization algorithms for quantum-many body systems in two and higher dimensions,” (2004), arXiv:cond-mat/0407066 [cond-mat.str-el]

  20. [20]

    Y.-Y. Shi, L.-M. Duan, and G. Vidal, Phys. Rev. A 74, 022320 (2006)

  21. [21]

    L. Tagliacozzo, G. Evenbly, and G. Vidal, Phys. Rev. B 80, 235127 (2009)

  22. [22]

    S. Cheng, L. Wang, T. Xiang, and P. Zhang, Phys. Rev. B 99, 155131 (2019)

  23. [23]

    R. Orús, Nature Reviews Physics 1, 538 (2019)

  24. [24]

    B. Lanthier, J. Côté, and S. Kourtis, Frontiers in Physics 12 (2024), 10.3389/fphy.2024.1431810

  25. [25]

    D. Haussler and M. Warmuth, The Mathematics of Generalization, 17 (2018)

  26. [26]

    P. Horn, V. Saz Ulibarrena, B. Koren, and S. Portegies Zwart, Journal of Computational Physics 521, 113536 (2025)

  27. [27]

    S. Greydanus, M. Dzamba, and J. Yosinski, Advances in neural information processing systems 32 (2019)

  28. [28]

    M. D. Cranmer, S. Greydanus, S. Hoyer, P. W. Battaglia, D. N. Spergel, and S. Ho, CoRR abs/2003.04630 (2020), 2003.04630

  29. [29]

    A. Mandal, Y. Tiwari, P. K. Panigrahi, and M. Pal, Chaos, Solitons & Fractals 164, 112670 (2022)

  30. [30]

    A. Sanchez-Gonzalez, J. Godwin, T. Pfaff, R. Ying, J. Leskovec, and P. W. Battaglia, CoRR abs/2002.09405 (2020), 2002.09405

  31. [31]

    G. Corso, H. Stark, S. Jegelka, T. Jaakkola, and R. Barzilay, Nature Reviews Methods Primers 4, 17 (2024)

  32. [32]

    A. C. Cenxin, K. Onggadinata, D. Kaszlikowski, and V. Scarani, PRX Quantum 4, 020352 (2023)

  33. [33]

    Y. R. Sanders, D. W. Berry, P. C. Costa, L. W. Tessler, N. Wiebe, C. Gidney, H. Neven, and R. Babbush, PRX Quantum 1, 020312 (2020)

  34. [34]

    J. Gray and S. Kourtis, Quantum 5, 410 (2021)

  35. [35]

    A. Kundu, P. Bedełek, M. Ostaszewski, O. Danaci, Y. J. Patel, V. Dunjko, and J. A. Miszczak, New Journal of Physics 26, 013034 (2024)

  36. [36]

    A. Kundu, Machine Learning: Science and Technology 6, 025066 (2025)

  37. [37]

    J. Eisert, M. Cramer, and M. B. Plenio, Rev. Mod. Phys. 82, 277 (2010)

  38. [38]

    P. W. Koh, S. Sagawa, H. Marklund, S. M. Xie, M. Zhang, A. Balsubramani, W. Hu, M. Yasunaga, R. L. Phillips, S. Beery, J. Leskovec, A. Kundaje, E. Pierson, S. Levine, C. Finn, and P. Liang, CoRR abs/2012.07421 (2020), 2012.07421

  39. [39]

    V. Bettaque and B. Swingle, Quantum 8, 1362 (2024)

  40. [40]

    S. Bravyi, J. M. Gambetta, A. Mezzacapo, and K. Temme, “Tapering off qubits to simulate fermionic hamiltonians,” (2017), arXiv:1701.08213 [quant-ph]

  41. [41]

    L. G. Gunderman, A. Jena, and L. Dellantonio, Phys. Rev. A 109, 022618 (2024)

  42. [42]

    E. van den Berg and K. Temme, Quantum 4, 322 (2020)

  43. [43]

    D. Gottesman, Stabilizer Codes and Quantum Error Correction, Ph.D. thesis, California Institute of Technology (1997), arXiv:quant-ph/9705052

  44. [44]

    D. Gottesman, “The Heisenberg Representation of Quantum Computers,” in Proceedings of the XXII International Colloquium on Group Theoretical Methods in Physics (1999) pp. 32–43, arXiv:quant-ph/9807006

  45. [45]

    S. Aaronson and D. Gottesman, Phys. Rev. A 70, 052328 (2004)

  46. [46]

    S. Krippendorf and M. Syvaeri, Machine Learning: Science and Technology 2, 015010 (2020)

  47. [47]

    Z. Liu and M. Tegmark, Phys. Rev. Lett. 128, 180201 (2022)

  48. [48]

    P. Calvo-Barlés, S. G. Rodrigo, E. Sánchez-Burillo, and L. Martín-Moreno, Phys. Rev. E 110, 045304 (2024)

  49. [49]

    R. Bondesan and A. Lamacraft, “Learning symmetries of classical integrable systems,” (2019), arXiv:1906.04645 [physics.comp-ph]

  50. [50]

    R. T. Forestano, K. T. Matchev, K. Matcheva, A. Roman, E. B. Unlu, and S. Verner, Machine Learning: Science and Technology 4, 025027 (2023)

  51. [51]

    C. Nation, R. P. A. Simon, S. Banerjee, F. Martini, A. Ricottone, F. Cerisola, and L. Dellantonio, “Clifford symmetries in quantum many-body systems,” (2026), arXiv:2605.18966 [quant-ph]

  52. [52]

    C. Nation, R. P. A. Simon, F. Martini, A. Ricottone, S. Banerjee, F. Cerisola, and L. Dellantonio, “Graph automorphism approach to obtain clifford symmetries in open and closed qudit models,” (2026)

  53. [53]

    J. Lee, Y. Lee, J. Kim, A. R. Kosiorek, S. Choi, and Y. W. Teh, in ICML (2018)

  54. [54]

    M. Connor, G. Canal, and C. Rozell, in Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, Vol. 130, edited by A. Banerjee and K. Fukumizu (PMLR, 2021) pp. 2359–2367

  55. [55]

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, in Advances in Neural Information Processing Systems, Vol. 30 (2017)

  56. [56]

    Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, and E. Hovy, in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, edited by K. Knight, A. Nenkova, and O. Rambow (Association for Computational Linguistics, San Diego, California, 2016) pp. 1480–1489

  57. [57]

    J. L. Ba, J. R. Kiros, and G. E. Hinton, arXiv preprint arXiv:1607.06450 (2016)

  58. [58]

    C. J. Maddison, A. Mnih, and Y. W. Teh, CoRR abs/1611.00712 (2016), 1611.00712

  59. [59]

    C. Louizos, M. Welling, and D. P. Kingma, “Learning sparse neural networks through $L_0$ regularization,” (2018), arXiv:1712.01312 [stat.ML]

  60. [60]

    M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola, Advances in neural information processing systems 30 (2017)

  61. [61]

    J. Dehaene and B. De Moor, Phys. Rev. A 68, 042318 (2003)

  62. [62]

    B. M. Terhal, Rev. Mod. Phys. 87, 307 (2015)

  63. [63]

    J. Haah, Revista Colombiana de Matemáticas 50, 299 (2017)

  64. [64]

    Y. Leviathan, M. Kalman, and Y. Matias, in Proceedings of the International Conference on Learning Representations (ICLR) (2025), arXiv:2410.02703 [cs.CL]

  65. [65]

    D. Jurafsky and J. H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models, 3rd ed. (2026), online manuscript released January 6, 2026

  66. [66]

    D. Bahdanau, K. Cho, and Y. Bengio, “Neural Machine Translation by Jointly Learning to Align and Translate,” arXiv preprint arXiv:1409.0473 (2014)

  67. [67]

    T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient Estimation of Word Representations in Vector Space,” arXiv preprint arXiv:1301.3781 (2013)

  68. [68]

    X. S. Huang, F. Perez, J. Ba, and M. Volkovs, in Proceedings of the 37th International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 119, edited by H. D. III and A. Singh (PMLR, 2020) pp. 4475–4483

  69. [69]

    T. Dean and M. Boddy, in Proceedings of the Seventh AAAI National Conference on Artificial Intelligence, AAAI’88 (AAAI Press, 1988) pp. 49–54

  70. [70]

    P. Pfeuty, Annals of Physics 57, 79 (1970)

  71. [71]

    C. R. Laumann, R. Moessner, A. Scardicchio, and S. L. Sondhi, Phys. Rev. Lett. 109, 030502 (2012)

  72. [72]

    T. A. Bespalova and O. Kyriienko, “Quantum simulation and ground state preparation for the honeycomb kitaev model,” (2021), arXiv:2109.13883 [quant-ph]

  73. [73]

    E. Dennis, A. Kitaev, A. Landahl, and J. Preskill, Journal of Mathematical Physics 43, 4452 (2002)

  74. [74]

    A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin, Bayesian Data Analysis, 1st ed. (Chapman and Hall/CRC, 1995)

  75. [75]

    J. K. Kruschke, Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan, 2nd ed. (Academic Press, 2014)

  76. [76]

    R. P. A. Simon, Z. Shi, C. Nation, A. Jena, and L. Dellantonio, “Enhanced measurements on quantum computers via the simultaneous probing of non-commuting pauli operators,” (2025), arXiv:2509.01482 [quant-ph]

  77. [77]

    R. P. A. Simon, M. Meth, F. Martini, P. Tirler, A. Jena, M. Ringbauer, and L. Dellantonio, “An error-aware and adaptive method for the estimation of quantum observables on qudit-based quantum computers,” (2026), arXiv:2605.00682 [quant-ph]

  78. [78]

    A. Shlosberg, A. J. Jena, P. Mukhopadhyay, J. F. Haase, F. Leditzky, and L. Dellantonio, Quantum 7, 906 (2023)

  79. [79]

    C. Nation, R. P. A. Simon, F. Martini, A. Ricottone, S. Banerjee, F. Cerisola, and L. Dellantonio, “Sympleq,” (2026)

  80. [80]

    T. Dao, D. Y. Fu, S. Ermon, A. Rudra, and C. Ré, in Proceedings of the 36th International Conference on Neural Information Processing Systems, NIPS ’22 (Curran Associates Inc., Red Hook, NY, USA, 2022)

Showing first 80 references.