An Iterative Dual-Channel Neural Quantum State Algorithm for Selected Configuration Interaction
Pith reviewed 2026-06-26 02:46 UTC · model grok-4.3
The pith
A dual-channel Transformer neural quantum state achieves chemical accuracy in selected configuration interaction with more favorable determinant scaling than CIPSI.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that embedding an autoregressive Transformer neural quantum state within the iterative framework of Sample-Based Quantum Diagonalization, using a dual-channel architecture with explicit spin cross-attention and distilling the subspace eigenvector via a factorized spin-marginal teacher signal, produces determinantal expansions that reach chemical accuracy while exhibiting substantially more favorable determinant-count scaling than CIPSI-based selected configuration interaction on the tested systems.
What carries the argument
The dual-channel autoregressive Transformer with spin-up/spin-down cross-attention, combined with the handover mechanism that distills the exact eigenvector into the network through a factorized spin-marginal teacher signal after each subspace diagonalization.
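The cross-attention idea can be illustrated with a minimal single-head sketch in NumPy. This is not the paper's implementation: the function name `cross_attention`, the weight matrices, the embedding size, and the causal mask between channels are all assumptions made for illustration. It only shows how the hidden states of one spin channel can query those of the other.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries (-inf) get zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(h_query, h_other, Wq, Wk, Wv, causal=True):
    """Hypothetical single-head cross-attention between spin channels.

    h_query: (n_orb, d) hidden states of the channel being updated.
    h_other: (n_orb, d) hidden states of the opposite-spin channel.
    """
    Q = h_query @ Wq
    K = h_other @ Wk
    V = h_other @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    if causal:
        # Assumed ordering: orbital i may only see orbitals j <= i
        # of the other channel, which keeps sampling autoregressive.
        n = scores.shape[0]
        future = np.triu(np.ones((n, n), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
n_orb, d = 4, 8
h_up = rng.normal(size=(n_orb, d))
h_dn = rng.normal(size=(n_orb, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

# Spin-up channel attends to the spin-down channel.
up_out = cross_attention(h_up, h_dn, Wq, Wk, Wv)
```

With the assumed mask, changing the last spin-down orbital leaves the outputs for all earlier spin-up positions unchanged, which is the property an autoregressive sampler relies on.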
If this is right
- HI-NQS reaches chemical accuracy on all small-molecule and nitrogen active-space systems tested.
- Determinant-count scaling is substantially more favorable than CIPSI-based SCI for all but the smallest active spaces.
- All calculations run on classical GPU hardware with no quantum computing resources required.
- The closed feedback loop between generative sampling and exact diagonalization improves configuration selection efficiency.
Where Pith is reading between the lines
- The distillation step could be generalized to other iterative selected-configuration methods that already perform subspace diagonalizations.
- The same dual-channel architecture might reduce sampling cost in related variational Monte Carlo approaches that lack an exact diagonalization step.
- If the scaling advantage persists at larger active spaces, the method could become competitive with density-matrix renormalization group for quasi-one-dimensional strongly correlated systems.
Load-bearing premise
The factorized spin-marginal teacher signal obtained after each subspace diagonalization is sufficient to distill the exact eigenvector back into the autoregressive Transformer so that subsequent generative sampling identifies chemically important configurations more efficiently.
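What a factorized teacher signal can and cannot carry is easy to see on a toy eigenvector. The sketch below assumes one plausible reading of "spin-marginal": the joint weight |c(a, b)|^2 over spin-up string a and spin-down string b is reduced to its two marginals. The page does not give the paper's exact definition, so the function names and this reading are assumptions. The mutual information between the two spin strings measures what the product of marginals discards.

```python
import numpy as np

def spin_marginals(coeffs):
    """coeffs[a, b] is the CI coefficient of (up string a, down string b)."""
    p = np.abs(coeffs) ** 2
    p = p / p.sum()
    # Joint distribution, up marginal, down marginal.
    return p, p.sum(axis=1), p.sum(axis=0)

def mutual_information(p_joint, p_up, p_dn):
    """I(up; down) in nats, i.e. KL(joint || product of marginals)."""
    product = np.outer(p_up, p_dn)
    nz = p_joint > 0
    return float(np.sum(p_joint[nz] * np.log(p_joint[nz] / product[nz])))

# Uncorrelated toy state: coefficients factorize over the two spins.
c_prod = np.outer([0.8, 0.6], [0.6, 0.8])
# Strongly correlated toy state: only matching strings have weight.
c_corr = np.eye(2) / np.sqrt(2)

mi_prod = mutual_information(*spin_marginals(c_prod))
mi_corr = mutual_information(*spin_marginals(c_corr))
```

For the factorized state the marginals lose nothing (`mi_prod` is zero up to rounding), while for the correlated state they discard ln 2 nats. In that second case the marginals alone cannot tell the network which up and down strings belong together.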
What would settle it
A benchmark on a larger active space or molecule in which the number of determinants required to reach chemical accuracy exceeds that of CIPSI-based SCI or fails to reach chemical accuracy altogether would falsify the reported scaling advantage.
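The first half of that test is simple arithmetic. The helper below assumes the common convention that chemical accuracy means an error of at most 1 kcal/mol (about 1.6 mHa) relative to the reference energy; the page does not state which threshold the paper uses, and the energies in the example are made-up placeholders, not results from the paper.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474  # standard unit conversion

def within_chemical_accuracy(e_method, e_reference, threshold_kcal=1.0):
    """Return (passes, error in kcal/mol) for energies given in Hartree."""
    error_kcal = abs(e_method - e_reference) * HARTREE_TO_KCAL_PER_MOL
    return error_kcal <= threshold_kcal, error_kcal

# Placeholder energies in Hartree, chosen only to exercise the check.
ok_small, err_small = within_chemical_accuracy(-108.999, -109.000)  # 1 mHa off
ok_large, err_large = within_chemical_accuracy(-108.997, -109.000)  # 3 mHa off
```

A 1 mHa error passes under this convention and a 3 mHa error does not.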
Original abstract
Accurately solving the electronic Schrödinger equation for strongly correlated systems remains a central challenge in quantum chemistry, where the exponential growth of configuration space limits the applicability of exact methods. Selected Configuration Interaction (SCI) algorithms address this challenge by adaptively constructing compact determinantal expansions, yet their efficiency depends critically on the quality of the sampling strategy used to identify chemically important configurations. Here we introduce the Handover Iterative Neural Quantum State (HI-NQS) algorithm, which embeds a classically trained autoregressive Transformer neural quantum state within the iterative sample-diagonalize-update framework of Sample-Based Quantum Diagonalization. A dual-channel Transformer architecture with explicit spin-up/spin-down cross-attention encodes fermionic spin structure as an architectural inductive bias, enabling expressive and physically informed wavefunction representations. After each subspace diagonalization, the resulting eigenvector is distilled back into the network through a factorized spin-marginal teacher signal, establishing a closed feedback loop between generative sampling and exact diagonalization. Benchmarks across a range of small molecules and a systematic nitrogen active-space series demonstrate that HI-NQS achieves chemical accuracy on all systems tested, with determinant-count scaling substantially more favorable than conventional CIPSI-based SCI for all but the smallest active spaces. All calculations are performed on GPU hardware without quantum computing resources, establishing HI-NQS as an efficient and scalable purely classical approach to the selected configuration interaction problem.
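The sample-diagonalize-update loop described in the abstract can be sketched on a toy problem. Everything here is a stand-in chosen for illustration: a small random symmetric matrix plays the Hamiltonian, a plain probability vector plays the autoregressive network, and the "distillation" step simply mixes the subspace eigenvector weights with a uniform distribution to mimic exploration. The function name and all parameters are hypothetical, and none of this reflects the paper's actual training procedure.

```python
import numpy as np

def toy_sample_diagonalize_update(H, n_iter=6, n_samples=8, mix=0.5, seed=0):
    rng = np.random.default_rng(seed)
    dim = H.shape[0]
    uniform = np.full(dim, 1.0 / dim)
    probs = uniform.copy()                   # stand-in for the network
    subspace = {int(np.argmin(np.diag(H)))}  # start from the lowest diagonal
    energies = []
    for _ in range(n_iter):
        # 1) Sample configurations from the current generative model.
        draws = rng.choice(dim, size=n_samples, p=probs)
        subspace.update(int(i) for i in draws)
        idx = np.array(sorted(subspace))
        # 2) Diagonalize the Hamiltonian projected onto the subspace.
        evals, evecs = np.linalg.eigh(H[np.ix_(idx, idx)])
        energies.append(float(evals[0]))
        # 3) Update the sampler toward the eigenvector weights
        #    (crude stand-in for the distillation step).
        target = np.zeros(dim)
        target[idx] = evecs[:, 0] ** 2
        probs = mix * target + (1.0 - mix) * uniform
    return energies, sorted(subspace)

rng = np.random.default_rng(1)
A = rng.normal(size=(30, 30))
H = (A + A.T) / 2 + np.diag(np.arange(30.0))  # toy symmetric matrix
energies, kept = toy_sample_diagonalize_update(H)
exact = float(np.linalg.eigvalsh(H)[0])
```

Because the subspace only grows, the subspace energy can never increase from one iteration to the next and always stays at or above the exact ground-state energy. The practical question the paper addresses is how few configurations the sampler needs to get close.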
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Handover Iterative Neural Quantum State (HI-NQS) algorithm, which embeds a dual-channel autoregressive Transformer neural quantum state (with explicit spin-up/spin-down cross-attention) into the iterative sample-diagonalize-update loop of Sample-Based Quantum Diagonalization. After each subspace diagonalization, the exact eigenvector is distilled back into the network via a factorized spin-marginal teacher signal. The central claim is that this closed-loop procedure achieves chemical accuracy across small molecules and a systematic nitrogen active-space series while exhibiting substantially more favorable determinant-count scaling than conventional CIPSI-based SCI for all but the smallest active spaces.
Significance. If the reported chemical accuracy and scaling advantage are robustly supported by the benchmarks, the work would constitute a meaningful advance in classical selected-configuration-interaction methods for strongly correlated electrons. The architectural inductive bias for fermionic spin structure and the use of an exact diagonalization teacher signal independent of the network are positive features that distinguish the approach from purely variational neural quantum states.
major comments (2)
- [Abstract and method description of the distillation step] The central performance claims (chemical accuracy and determinant scaling superior to CIPSI on the nitrogen series) rest on the assumption that the factorized spin-marginal teacher signal is informationally sufficient to distill the full eigenvector into the autoregressive Transformer. Because the factorization separates the up and down marginals before feedback, any joint spin correlations not captured by the marginals are lost; the manuscript provides no explicit analysis, ablation, or information-theoretic argument showing that the dual-channel cross-attention compensates for this loss in multi-reference regimes.
- [Benchmarks on the nitrogen active-space series] The scaling advantage is asserted to hold for all but the smallest active spaces, yet the strength of this claim cannot be evaluated without the concrete determinant counts, energy errors, and error bars for the largest active spaces tested. If the factorized teacher signal is incomplete, the generative sampling would not preferentially identify the chemically important determinants that CIPSI already locates, undermining the reported scaling comparison.
minor comments (1)
- [Computational details] The abstract states that all calculations are performed on GPU hardware without quantum resources; this is a useful clarification but should be repeated with hardware specifications in the computational-details section for reproducibility.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review. We address each major comment below, providing clarifications based on the manuscript content and indicating where revisions will strengthen the presentation.
- Referee: Abstract and method description of the distillation step: the central performance claims (chemical accuracy and determinant scaling superior to CIPSI on the nitrogen series) rest on the assumption that the factorized spin-marginal teacher signal is informationally sufficient to distill the full eigenvector into the autoregressive Transformer. Because the factorization separates the up and down marginals before feedback, any joint spin correlations not captured by the marginals are lost; the manuscript provides no explicit analysis, ablation, or information-theoretic argument showing that the dual-channel cross-attention compensates for this loss in multi-reference regimes.
  Authors: We agree that the manuscript does not contain an explicit ablation study or information-theoretic quantification of information loss from the factorized marginals. The dual-channel Transformer with cross-attention is constructed precisely to allow the model to learn inter-spin correlations from the separate marginal teacher signals, and the exact diagonalization step supplies a complete target eigenvector at each iteration. Nevertheless, to directly address the concern, we will add a dedicated paragraph in the Methods section explaining the architectural inductive bias and how the closed feedback loop mitigates potential loss of joint correlations. This revision will be textual and will not require new calculations.
  Revision: partial
- Referee: Benchmarks on the nitrogen active-space series: the scaling advantage is asserted to hold for all but the smallest active spaces, yet the strength of this claim cannot be evaluated without the concrete determinant counts, energy errors, and error bars for the largest active spaces tested. If the factorized teacher signal is incomplete, the generative sampling would not preferentially identify the chemically important determinants that CIPSI already locates, undermining the reported scaling comparison.
  Authors: Section 4 and the associated figures report chemical accuracy and improved determinant scaling for the nitrogen series relative to CIPSI. To make the quantitative basis of the scaling claim fully transparent, we will add a supplementary table listing, for each active space tested, the number of determinants retained, the energy error relative to the reference (FCI or DMRG where available), and any statistical error bars from multiple runs. These data already exist in our internal records and will allow readers to verify that the generative sampling preferentially recovers chemically relevant determinants.
  Revision: yes
Circularity Check
No significant circularity; derivation relies on independent external diagonalization
The HI-NQS method embeds an autoregressive Transformer within an iterative sample-diagonalize-update loop, where the eigenvector from each subspace diagonalization serves as an external teacher signal distilled via factorized spin-marginals. This signal is generated by exact diagonalization independent of the network parameters, and the reported chemical accuracy plus determinant scaling advantages are benchmarked against conventional CIPSI on external molecular systems. No equations or steps reduce the claimed predictions to fitted inputs by construction, nor do any load-bearing claims rest on self-citations or ansatzes imported from prior author work. The architecture and feedback loop remain self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- Transformer network weights
axioms (1)
- Domain assumption: the electronic wavefunction must be antisymmetric under particle exchange.
invented entities (1)
- Dual-channel Transformer with explicit spin-up/spin-down cross-attention (no independent evidence)