pith. machine review for the scientific record. sign in

arxiv: 2604.15768 · v3 · submitted 2026-04-17 · 💻 cs.DC · cs.AI· cs.CE

Recognition: unknown

A Fully GPU-Accelerated Framework for High-Performance Configuration Interaction Selection with Neural Network Quantum States

Authors on Pith no claims yet

Pith reviewed 2026-05-10 08:02 UTC · model grok-4.3

classification 💻 cs.DC cs.AIcs.CE
keywords neural network quantum statesselected configuration interactionGPU accelerationdistributed de-duplicationCUDA kernelsquantum many-body systemsparallel scalingchemical accuracy
0
0 comments X

The pith

A fully GPU-accelerated framework removes CPU bottlenecks in neural network quantum state selected configuration interaction and delivers 2.32 times end-to-end speedup on 64 GPUs with unchanged accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a complete shift of the NNQS-SCI workflow onto GPUs to solve the Schrödinger equation for larger many-body systems. It replaces centralized CPU de-duplication with a distributed load-balanced GPU version, replaces host-side configuration generation with fine-grained CUDA kernels, and adds a GPU memory runtime using pooling, mini-batch streaming, and overlapped offloading. These changes matter because earlier hybrid CPU-GPU designs created communication and memory walls that limited the size of solvable configuration spaces. If the new design works, researchers can handle bigger molecules and materials at the same chemical accuracy but in far less wall-clock time.

Core claim

The central claim is that integrating a distributed GPU de-duplication algorithm, specialized CUDA kernels for exact coupled-configuration generation, and a GPU-centric memory runtime with pooling and streaming fully eliminates hybrid CPU-GPU bottlenecks, expands reachable configuration spaces, and yields up to 2.32X end-to-end speedup over the prior optimized NNQS-SCI baseline on 64 A100 GPUs while preserving identical chemical accuracy and over 90 percent parallel efficiency under strong scaling.

What carries the argument

Distributed load-balanced global de-duplication on GPUs combined with fine-grained CUDA kernels for coupled-configuration generation, backed by a GPU memory runtime of pooling, streaming mini-batches, and overlapped offloading.

Load-bearing premise

The new distributed GPU de-duplication and CUDA kernels must generate exactly the same set of selected configurations and the same energies as the original CPU-based method, with no numerical differences or missed entries at scale.

What would settle it

Execute both the original CPU NNQS-SCI code and the new GPU framework on an identical molecular system and observe any difference in the final selected configuration list or in the computed ground-state energy.

Figures

Figures reproduced from arXiv: 2604.15768 by Ankang Feng, Bowen Kan, Daran Sun, Dingwen Tao, Guangming Tan, Hairui Zhao, Haoquan Long, Haoxu Li, Honghui Shang, Ninghui Sun, Pengyu Zhou, Wenjing Huang, Yicheng Liu, Yida Gu, Yunquan Zhang, Zhenyu Li.

Figure 1
Figure 1. Figure 1: NNQS-SCI workflow. Green blocks denote CPU-side pre [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: End-to-end GPU-accelerated QiankunNet-cuSCI pipeline with overlapped execution. The workflow consists of (a) GPU-based coupled [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Workflow of the distributed sort-based de-duplication. By [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: For a given source configurations 𝑑𝑖(with electrons in or￾bitals p,q) only considers its connections to other configurations. 𝐻𝑖1: single excitation (p → a), 𝐻𝑖 𝑗 : double excitation (pq → ab). Applying Slater-Condon rules to Hamiltonian matrix to get single Excitation Table as 𝑇𝑠𝑖𝑛𝑔𝑙𝑒 and double Excitation Table as 𝑇𝑑𝑜𝑢𝑏𝑙𝑒 . generation, which starves the downstream GPU inference engine due to limited para… view at source ↗
Figure 5
Figure 5. Figure 5: GPU-accelerated SCI algorithm with dual screening criteria [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Staged GPU workflow with overlapped host–device data [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
Figure 8
Figure 8. Figure 8: Step-by-step energy comparison on Cr2 between QiankunNet-cuSCI and QiankunNet-SCI. The discrepancy is quanti￾fied by MAE (mean absolute error), RMSE (root mean square error), and Max (maximum absolute error). To assess accuracy at scale, we evaluate the strongly correlated 84-qubit Cr2 system. Since this exceeds QiankunNet’s computa￾tional limits, we compare exclusively against QiankunNet-SCI. Fig￾ure 8 pr… view at source ↗
Figure 7
Figure 7. Figure 7: Subfigures (a)–(f) present the absolute energy errors (with [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗
Figure 9
Figure 9. Figure 9: End-to-end execution time breakdown for Cr [PITH_FULL_IMAGE:figures/full_fig_p010_9.png] view at source ↗
Figure 11
Figure 11. Figure 11: Non-linear growth of unique configurations in weak scal [PITH_FULL_IMAGE:figures/full_fig_p011_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Comparison of theoretical and QiankunNet-cuSCI mea [PITH_FULL_IMAGE:figures/full_fig_p012_12.png] view at source ↗
read the original abstract

AI-driven methods have demonstrated considerable success in tackling the central challenge of accurately solving the Schr\"odinger equation for complex many-body systems. Among neural network quantum state (NNQS) approaches, the NNQS-SCI (Selected Configuration Interaction) method stands out as a state-of-the-art technique, recognized for its high accuracy and scalability. However, its application to larger systems is severely constrained by a hybrid CPU-GPU architecture. Specifically, centralized CPU-based global de-duplication creates a severe scalability barrier due to communication bottlenecks, while host-resident coupled-configuration generation induces prohibitive computational overheads. We introduce QiankunNet-cuSCI, a fully GPU-accelerated SCI framework designed to overcome these bottlenecks. It first integrates a distributed, load-balanced global de-duplication algorithm to minimize redundancy and communication overhead at scale. To address compute limitations, it employs specialized, fine-grained CUDA kernels for exact coupled configuration generation. Finally, to break the single-GPU memory barrier exposed by this full acceleration, it incorporates a GPU memory-centric runtime featuring GPU-side pooling, streaming mini-batches, and overlapped offloading. This design enables much larger configuration spaces and shifts the bottleneck from host-side limitations back to on-device inference. Our evaluation demonstrates that our work fundamentally expands the scale of solvable problems. On an NVIDIA A100 cluster with 64 GPUs, our work achieves up to 2.32X end-to-end speedup over the highly-optimized NNQS-SCI baseline while preserving the same chemical accuracy. Furthermore, it demonstrates excellent distributed performance, maintaining over 90% parallel efficiency in strong scaling tests.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents QiankunNet-cuSCI, a fully GPU-accelerated framework for NNQS-SCI that replaces centralized CPU de-duplication and host-side configuration generation with a distributed load-balanced GPU de-duplication algorithm, fine-grained CUDA kernels for exact coupled-configuration generation, and a GPU memory-centric runtime using pooling, streaming mini-batches, and overlapped offloading. On a 64-GPU NVIDIA A100 cluster it reports up to 2.32X end-to-end speedup over the optimized NNQS-SCI baseline while preserving chemical accuracy and >90% strong-scaling efficiency.

Significance. If the GPU kernels and distributed de-duplication are shown to produce identical configuration sets and energies to the CPU baseline, the work would meaningfully expand the reachable scale of NNQS-SCI calculations by removing host-side bottlenecks and enabling larger variational spaces on GPU clusters.

major comments (2)
  1. [Abstract and evaluation] The central claims of 2.32X speedup and preserved chemical accuracy rest on the assumption that the distributed GPU de-duplication and CUDA kernels produce exactly the same selected configurations and energies as the original CPU-based NNQS-SCI. No verification protocol, direct configuration-set comparison, checksums, or quantitative metrics (e.g., number of unique configurations, energy differences with error bars) are described to confirm equivalence, especially at 64-GPU scale where load balancing, hashing, or streaming could introduce discrepancies.
  2. [Abstract] The manuscript provides no details on how chemical accuracy is defined or measured (e.g., threshold, reference values, or systems tested), nor any ablation showing that the new GPU components do not alter the variational space or convergence behavior relative to the baseline.
minor comments (1)
  1. [Abstract] The abstract states that the framework 'fundamentally expands the scale of solvable problems' but supplies no quantitative data on the increase in maximum configuration-space size or memory footprint achieved.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful and constructive review of our manuscript. We address each major comment below with clarifications and indicate the specific revisions we will incorporate to improve the rigor and clarity of our claims.

read point-by-point responses
  1. Referee: [Abstract and evaluation] The central claims of 2.32X speedup and preserved chemical accuracy rest on the assumption that the distributed GPU de-duplication and CUDA kernels produce exactly the same selected configurations and energies as the original CPU-based NNQS-SCI. No verification protocol, direct configuration-set comparison, checksums, or quantitative metrics (e.g., number of unique configurations, energy differences with error bars) are described to confirm equivalence, especially at 64-GPU scale where load balancing, hashing, or streaming could introduce discrepancies.

    Authors: We agree that explicit verification of numerical equivalence is essential for validating the central claims. Although the distributed de-duplication (Section 3.2) employs consistent global hashing and the CUDA kernels (Section 3.3) implement exact deterministic generation of coupled configurations, the manuscript did not sufficiently detail the verification steps. In the revised version we will add a dedicated verification subsection (new Section 4.3) that describes the protocol: (i) direct set-equality checks via sorted configuration lists and MD5 checksums on the final selected sets, (ii) reporting of the exact number of unique configurations at each GPU count, and (iii) tabulated energy differences (with standard deviations across repeated runs) between the GPU and CPU implementations for all benchmark systems, explicitly including the 64-GPU case. These additions will confirm that load balancing and streaming introduce no discrepancies. revision: yes

  2. Referee: [Abstract] The manuscript provides no details on how chemical accuracy is defined or measured (e.g., threshold, reference values, or systems tested), nor any ablation showing that the new GPU components do not alter the variational space or convergence behavior relative to the baseline.

    Authors: We will clarify the definition of chemical accuracy both in the abstract and in a new paragraph of the Introduction: it is defined as an absolute energy error below 1 kcal/mol (0.043 eV) relative to reference values obtained from exact diagonalization (small systems) or CCSD(T) (larger systems). In addition, we will insert an ablation study in the Evaluation section that directly compares the variational spaces produced by the GPU versus CPU pipelines, showing identical selected-configuration sets, identical energy convergence curves, and identical final energies within floating-point tolerance. These results will be presented as tables and plots for the representative molecules used in the scaling experiments. revision: yes

Circularity Check

0 steps flagged

No significant circularity; engineering implementation and benchmarking paper

full rationale

The paper describes a GPU-accelerated framework (QiankunNet-cuSCI) for NNQS-SCI, with central claims resting on measured end-to-end speedups (up to 2.32X) and preserved chemical accuracy via new distributed de-duplication, CUDA kernels, and memory runtime. No mathematical derivation chain, fitted parameters, or predictions are presented that reduce by construction to inputs. The work is benchmarked against an external NNQS-SCI baseline and is self-contained against those empirical results; no self-definitional, self-citation load-bearing, or ansatz-smuggling steps appear in the abstract or described content.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an applied high-performance computing paper. No free parameters are fitted to data, no domain-specific axioms beyond standard parallel-computing assumptions are invoked, and no new physical entities are postulated.

pith-pipeline@v0.9.0 · 5651 in / 1057 out tokens · 43346 ms · 2026-05-10T08:02:45.178724+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

46 extracted references · 7 canonical work pages

  1. [1]

    The GAP Benchmark Suite,

    Scott Beamer, Krste Asanovic, and David A. Patterson. 2015. The GAP benchmark suite. arXiv:1508.03619 http://arxiv.org/abs/1508.03619

  2. [2]

    Giuseppe Carleo, Kenny Choo, Damian Hofmann, James ET Smith, Tom Wester- hout, Fabien Alet, Emily J Davis, Stavros Efthymiou, Ivan Glasser, Sheng-Hsuan Lin, et al. 2019. NetKet: A machine learning toolkit for many-body quantum systems.SoftwareX10 (2019), 100311

  3. [3]

    Giuseppe Carleo and Matthias Troyer. 2017. Solving the quantum many-body problem with artificial neural networks.Science355, 6325 (2017), 602–606

  4. [4]

    Ao Chen and Markus Heyl. 2024. Empowering deep neural quantum states through efficient optimization.Nature Physics20, 9 (2024), 1476–1481

  5. [5]

    Samuel Yen-Chi Chen, Chao-Han Huck Yang, Jun Qi, Pin-Yu Chen, Xiaoli Ma, and Hsi-Sheng Goan. 2020. Variational quantum circuits for deep reinforcement learning.IEEE access8 (2020), 141007–141024

  6. [6]

    Kenny Choo, Antonio Mezzacapo, and Giuseppe Carleo. 2020. Fermionic neural- network states for ab-initio electronic structure.Nature communications11, 1 (2020), 2368

  7. [7]

    Kenny Choo, Titus Neupert, and Giuseppe Carleo. 2019. Two-dimensional frus- trated J 1-J 2 model studied with neural network quantum states.Physical Review B100, 12 (2019), 125124

  8. [8]

    Pavlo O Dral. 2020. Quantum chemistry in the age of machine learning.The journal of physical chemistry letters11, 6 (2020), 2336–2347

  9. [9]

    Hong Gao, Satoshi Imamura, Akihiko Kasagi, and Eiji Yoshida. 2024. Distributed implementation of full configuration interaction for one trillion determinants. Journal of Chemical Theory and Computation20, 3 (2024), 1185–1192

  10. [10]

    Jan Hermann, Zeno Schätzle, and Frank Noé. 2020. Deep-neural-network solution of the electronic Schrödinger equation.Nature Chemistry12, 10 (2020), 891–897

  11. [11]

    Jan Hermann, James Spencer, Kenny Choo, Antonio Mezzacapo, W Matthew C Foulkes, David Pfau, Giuseppe Carleo, and Frank Noé. 2023. Ab initio quantum chemistry with neural-network wavefunctions.Nature Reviews Chemistry7, 10 (2023), 692–709

  12. [12]

    Le, Yonghui Wu, and Zhifeng Chen

    Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Dehao Chen, Mia Xu Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, and Zhifeng Chen. 2019. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism. InAdvances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019...

  13. [13]

    Bernard Huron, JP Malrieu, and P Rancurel. 1973. Iterative perturbation calcula- tions of ground and excited state energies from multiconfigurational zeroth-order wavefunctions.The Journal of Chemical Physics58, 12 (1973), 5745–5759

  14. [14]

    Weile Jia, Han Wang, Mohan Chen, Denghui Lu, Lin Lin, Roberto Car, E Weinan, and Linfeng Zhang. 2020. Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms with machine learning. InSC20: International conference for high performance computing, networking, storage and analysis. IEEE, IEEE/ACM, Atlanta, Georgia, USA, 1–14

  15. [15]

    Mohammed Abdul Lateef Junaid. 2025. Artificial intelligence driven innovations in biochemistry: A review of emerging research frontiers.Biomolecules and Biomedicine25, 4 (2025), 739

  16. [16]

    Bowen Kan, Yumeng Zhou, Daiyou Xie, Pengyu Zhou, Yunquan Zhang, and Honghui Shang. 2025. NNQS-SCI: Tackling Trillion-Dimensional Hilbert Space with Adaptive Neural Network Quantum States. InThe International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’25)(St. Louis, MO, USA). Association for Computing Machinery (ACM), ...

  17. [17]

    Peter J Knowles and Nicholas C Handy. 1984. A new determinant-based full configuration interaction method.Chemical physics letters111, 4-5 (1984), 315– 321

  18. [18]

    Hannah Lange, Anka Van de Walle, Atiye Abedinnia, and Annabelle Bohrdt. 2024. From architectures to applications: a review of neural quantum states.Quantum Science and Technology9, 4 (2024), 040501

  19. [19]

    An-Jun Liu and Bryan K. Clark. 2024. Neural network backflow forab initio quantum chemistry.Physical Review B110, 11 (2024), 115137. doi:10.1103/ PhysRevB.110.115137

  20. [20]

    Huan Ma, Honghui Shang, and Jinlong Yang. 2024. Quantum embedding method with transformer neural network quantum states for strongly correlated materials. npj Computational Materials10, 1 (2024), 220

  21. [21]

    Matija Medvidović and Javier Robledo Moreno. 2024. Neural-network quantum states for many-body physics.The European Physical Journal Plus139, 7 (2024), 1–26

  22. [22]

    Jeppe Olsen, Poul Jørgensen, and John N. Simons. 1990. Passing the one-billion limit in full configuration-interaction (FCI) calculations.Chemical Physics Letters 169, 6 (1990), 463–472. doi:10.1016/0009-2614(90)85633-N

  23. [23]

    Bryan O’Gorman, Sandy Irani, James Whitfield, and Bill Fefferman. 2022. In- tractability of Electronic Structure in a Fixed Basis.PRX Quantum3, 2 (2022), 020322. doi:10.1103/PRXQuantum.3.020322

  24. [24]

    Bryan O’Gorman, Sandy Irani, James Whitfield, and Bill Fefferman. 2022. In- tractability of electronic structure in a fixed basis.PRX Quantum3, 2 (2022), 020322

  25. [25]

    David Pfau, James S Spencer, Alexander GDG Matthews, and W Matthew C Foulkes. 2020. Ab initio solution of the many-electron Schrödinger equation with deep neural networks.Physical review research2, 3 (2020), 033429

  26. [26]

    Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyan Yang, Minjia Zhang, Dong Li, and Yuxiong He. 2021. ZeRO-Offload: Democratizing Billion-Scale Model Training. InUSENIX ATC. USENIX Associa- tion, Virtual Event, 551–564

  27. [27]

    2011.Numerical methods for large eigenvalue problems: revised edition

    Yousef Saad. 2011.Numerical methods for large eigenvalue problems: revised edition. SIAM, Philadelphia, PA

  28. [28]

    Anders W Sandvik. 2010. Computational studies of quantum spin systems. In AIP Conference Proceedings, Vol. 1297. American Institute of Physics, American Institute of Physics, Melville, NY, 135–338

  29. [29]

    Markus Schmitt and Markus Heyl. 2020. Quantum many-body dynamics in two dimensions with artificial neural networks.Physical Review Letters125, 10 (2020), 100503

  30. [30]

    Ulrich Schollwöck. 2011. The density-matrix renormalization group in the age of matrix product states.Annals of physics326, 1 (2011), 96–192

  31. [31]

    Jeffrey B Schriber and Francesco A Evangelista. 2017. Adaptive configuration interaction for computing challenging electronic excited states with tunable accuracy.Journal of chemical theory and computation13, 11 (2017), 5354–5366

  32. [32]

    Or Sharir, Yoav Levine, Noam Wies, Giuseppe Carleo, and Amnon Shashua

  33. [33]

    doi:10.1103/PhysRevLett.124.020503

    Deep Autoregressive Models for the Efficient Variational Simulation of Many-Body Quantum Systems.Physical Review Letters124, 2 (2020), 020503. doi:10.1103/PhysRevLett.124.020503

  34. [34]

    Holmes, Guillaume Jeanmairet, Ali Alavi, and C

    Sandeep Sharma, Adam A. Holmes, Guillaume Jeanmairet, Ali Alavi, and C. J. Umrigar. 2017. Semistochastic Heat-Bath Configuration Interaction Method: Selected Configuration Interaction with Semistochastic Perturbation Theory. Journal of Chemical Theory and Computation13, 4 (2017), 1595–1604. doi:10.1021/ acs.jctc.6b01028

  35. [35]

    2020.Ai for science: Report on the department of energy (doe) town halls on artificial intelligence (ai) for science

    Rick Stevens, Valerie Taylor, Jeff Nichols, Arthur Barney Maccabe, Katherine Yelick, and David Brown. 2020.Ai for science: Report on the department of energy (doe) town halls on artificial intelligence (ai) for science. Technical Report. Argonne National Lab.(ANL), Argonne, IL (United States)

  36. [36]

    Qiming Sun, Timothy C Berkelbach, Nick S Blunt, George H Booth, Sheng Guo, Zhendong Li, Junzi Liu, James D McClain, Elvira R Sayfutyarova, Sandeep Sharma, et al. 2018. PySCF: the Python-based simulations of chemistry framework.Wiley Interdisciplinary Reviews: Computational Molecular Science8, 1 (2018), e1340

  37. [37]

    Norm M Tubman, C Daniel Freeman, Daniel S Levine, Diptarka Hait, Martin Head- Gordon, and K Birgitta Whaley. 2020. Modern approaches to exact diagonalization and selected configuration interaction with the adaptive sampling CI method. Journal of chemical theory and computation16, 4 (2020), 2139–2159

  38. [38]

    Ivan S Ufimtsev and Todd J Martinez. 2009. Quantum chemistry on graphical processing units. 3. Analytical energy gradients, geometry optimization, and first principles molecular dynamics.Journal of Chemical Theory and Computation5, 10 (2009), 2619–2628

  39. [39]

    Marat Valiev, Eric J Bylaska, Niranjan Govind, Karol Kowalski, Tjerk P Straatsma, Hubertus Johannes Jacobus Van Dam, Dunyou Wang, Jarek Nieplocha, Edoardo Aprà, Theresa L Windus, et al. 2010. NWChem: A comprehensive and scalable open-source solution for large scale molecular simulations.Computer Physics Communications181, 9 (2010), 1477–1489

  40. [40]

    Gomez, Lukasz Kaiser, and Illia Polosukhin

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. InAdvances in Neural Information Processing Systems 30: Annual Con- ference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, Isabelle Guyon, Ulrike von Luxburg...

  41. [41]

    Filippo Vicentini, Damian Hofmann, Attila Szabó, Dian Wu, Christopher Roth, Clemens Giuliani, Gabriel Pescia, Jannes Nys, Vladimir Vargas-Calderón, Nikita Astrakhantsev, and Giuseppe Carleo. 2022. NetKet 3: Machine Learning Toolbox for Many-Body Quantum Systems.SciPost Phys. Codebases(2022), 7. doi:10. 21468/SciPostPhysCodeb.7

  42. [42]

    Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D

    Yangzihao Wang, Andrew A. Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D. Owens. 2016. Gunrock: a high-performance graph processing library on the GPU. InPPoPP. ACM, Barcelona, Spain, 11:1–11:12

  43. [43]

    James Daniel Whitfield, Peter John Love, and Alán Aspuru-Guzik. 2013. Compu- tational complexity in electronic structure.Physical Chemistry Chemical Physics 15, 2 (2013), 397–411

  44. [44]

    Yangjun Wu, Chu Guo, Yi Fan, Pengyu Zhou, and Honghui Shang. 2023. NNQS- Transformer: an Efficient and Scalable Neural Network Quantum States Approach for Ab initio Quantum Chemistry. InProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis(Denver, CO, USA)(SC ’23). Association for Computing Machinery,...

  45. [45]

    Tianchen Zhao, James Stokes, and Shravan Veerapaneni. 2023. Scalable neural quantum states architecture for quantum chemistry.Machine Learning: Science and Technology4, 2 (2023), 025034

  46. [46]

    Xuncheng Zhao, Mingfan Li, Qian Xiao, Junshi Chen, Fei Wang, Li Shen, Mei- jia Zhao, Wenhao Wu, Hong An, Lixin He, and Xiao Liang. 2022. AI for Quantum Mechanics: High Performance Quantum Many-Body Simulations via Deep Learning. InSC22: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, Dallas, TX, USA, 1–15. ...