A Fully GPU-Accelerated Framework for High-Performance Configuration Interaction Selection with Neural Network Quantum States
Pith reviewed 2026-05-10 08:02 UTC · model grok-4.3
The pith
A fully GPU-accelerated framework removes CPU bottlenecks in neural network quantum state selected configuration interaction and delivers up to a 2.32X end-to-end speedup on 64 GPUs with unchanged accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that integrating a distributed GPU de-duplication algorithm, specialized CUDA kernels for exact coupled-configuration generation, and a GPU-centric memory runtime with pooling and streaming fully eliminates the hybrid CPU-GPU bottlenecks and expands the reachable configuration space. On 64 A100 GPUs, this yields up to a 2.32X end-to-end speedup over the prior optimized NNQS-SCI baseline while preserving identical chemical accuracy and over 90 percent parallel efficiency under strong scaling.
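Concretely, "coupled-configuration generation" means enumerating every determinant connected to a selected one by the Hamiltonian. A minimal CPU-side Python sketch of the single-excitation case (illustrative only; the paper implements this with fine-grained CUDA kernels, and a real SCI code also generates double excitations and respects spin sectors):

```python
def single_excitations(det: int, n_orbitals: int) -> set:
    """Enumerate all determinants reachable from `det` by moving one
    electron from an occupied orbital to an unoccupied one.
    Determinants are encoded as occupation-number bitstrings."""
    occupied = [i for i in range(n_orbitals) if det >> i & 1]
    virtual = [a for a in range(n_orbitals) if not det >> a & 1]
    coupled = set()
    for i in occupied:
        for a in virtual:
            # Clear the occupied bit i, set the virtual bit a.
            coupled.add((det ^ (1 << i)) | (1 << a))
    return coupled
```

Each (occupied, virtual) pair is independent, which is presumably what makes a fine-grained one-thread-per-pair GPU mapping attractive.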
What carries the argument
Distributed load-balanced global de-duplication on GPUs combined with fine-grained CUDA kernels for coupled-configuration generation, backed by a GPU memory runtime of pooling, streaming mini-batches, and overlapped offloading.
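The distributed de-duplication idea can be modeled on the CPU: hash each configuration to an owner rank, so duplicates generated on different GPUs meet at a single owner and are removed locally, with no centralized gather. This is a hypothetical Python sketch of the scheme, not the paper's CUDA implementation; the function name and in-memory "shuffle" are illustrative stand-ins for the real all-to-all exchange.

```python
def distributed_dedup(per_rank_configs, n_ranks):
    """Hash-partitioned global de-duplication: every configuration is
    routed to the rank that owns its hash bucket, so each rank ends up
    holding a disjoint slice of the globally unique set."""
    inboxes = [set() for _ in range(n_ranks)]
    for configs in per_rank_configs:       # "shuffle" phase
        for c in configs:
            inboxes[hash(c) % n_ranks].add(c)  # set() dedups on the owner
    return inboxes                          # disjoint per-rank unique sets
```

Because ownership is decided by a hash of the configuration itself, all copies of a duplicate land in the same inbox regardless of which rank produced them, and the per-rank work stays roughly balanced when the hash spreads keys evenly.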
Load-bearing premise
The new distributed GPU de-duplication and CUDA kernels must generate exactly the same set of selected configurations and the same energies as the original CPU-based method, with no numerical differences or missed entries at scale.
What would settle it
Execute both the original CPU NNQS-SCI code and the new GPU framework on an identical molecular system and observe any difference in the final selected configuration list or in the computed ground-state energy.
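That decisive experiment reduces to a simple acceptance test. The sketch below is a hypothetical Python harness (the function name and tolerance are assumptions, not from the paper):

```python
def agrees(cpu_configs, gpu_configs, e_cpu, e_gpu, tol=1e-8):
    """Pass iff both pipelines select the same configuration set and
    agree on the ground-state energy to within `tol` (Hartree)."""
    same_set = sorted(cpu_configs) == sorted(gpu_configs)
    same_energy = abs(e_cpu - e_gpu) <= tol
    return same_set and same_energy
```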
Figures
Original abstract
AI-driven methods have demonstrated considerable success in tackling the central challenge of accurately solving the Schrödinger equation for complex many-body systems. Among neural network quantum state (NNQS) approaches, the NNQS-SCI (Selected Configuration Interaction) method stands out as a state-of-the-art technique, recognized for its high accuracy and scalability. However, its application to larger systems is severely constrained by a hybrid CPU-GPU architecture. Specifically, centralized CPU-based global de-duplication creates a severe scalability barrier due to communication bottlenecks, while host-resident coupled-configuration generation induces prohibitive computational overheads. We introduce QiankunNet-cuSCI, a fully GPU-accelerated SCI framework designed to overcome these bottlenecks. It first integrates a distributed, load-balanced global de-duplication algorithm to minimize redundancy and communication overhead at scale. To address compute limitations, it employs specialized, fine-grained CUDA kernels for exact coupled configuration generation. Finally, to break the single-GPU memory barrier exposed by this full acceleration, it incorporates a GPU memory-centric runtime featuring GPU-side pooling, streaming mini-batches, and overlapped offloading. This design enables much larger configuration spaces and shifts the bottleneck from host-side limitations back to on-device inference. Our evaluation demonstrates that our work fundamentally expands the scale of solvable problems. On an NVIDIA A100 cluster with 64 GPUs, our work achieves up to 2.32X end-to-end speedup over the highly-optimized NNQS-SCI baseline while preserving the same chemical accuracy. Furthermore, it demonstrates excellent distributed performance, maintaining over 90% parallel efficiency in strong scaling tests.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents QiankunNet-cuSCI, a fully GPU-accelerated framework for NNQS-SCI that replaces centralized CPU de-duplication and host-side configuration generation with a distributed load-balanced GPU de-duplication algorithm, fine-grained CUDA kernels for exact coupled-configuration generation, and a GPU memory-centric runtime using pooling, streaming mini-batches, and overlapped offloading. On a 64-GPU NVIDIA A100 cluster it reports up to 2.32X end-to-end speedup over the optimized NNQS-SCI baseline while preserving chemical accuracy and >90% strong-scaling efficiency.
Significance. If the GPU kernels and distributed de-duplication are shown to produce identical configuration sets and energies to the CPU baseline, the work would meaningfully expand the reachable scale of NNQS-SCI calculations by removing host-side bottlenecks and enabling larger variational spaces on GPU clusters.
major comments (2)
- [Abstract and evaluation] The central claims of 2.32X speedup and preserved chemical accuracy rest on the assumption that the distributed GPU de-duplication and CUDA kernels produce exactly the same selected configurations and energies as the original CPU-based NNQS-SCI. No verification protocol, direct configuration-set comparison, checksums, or quantitative metrics (e.g., number of unique configurations, energy differences with error bars) are described to confirm equivalence, especially at 64-GPU scale where load balancing, hashing, or streaming could introduce discrepancies.
- [Abstract] The manuscript provides no details on how chemical accuracy is defined or measured (e.g., threshold, reference values, or systems tested), nor any ablation showing that the new GPU components do not alter the variational space or convergence behavior relative to the baseline.
minor comments (1)
- [Abstract] The abstract states that the framework 'fundamentally expands the scale of solvable problems' but supplies no quantitative data on the increase in maximum configuration-space size or memory footprint achieved.
Simulated Author's Rebuttal
We thank the referee for their careful and constructive review of our manuscript. We address each major comment below with clarifications and indicate the specific revisions we will incorporate to improve the rigor and clarity of our claims.
Point-by-point responses
-
Referee: [Abstract and evaluation] The central claims of 2.32X speedup and preserved chemical accuracy rest on the assumption that the distributed GPU de-duplication and CUDA kernels produce exactly the same selected configurations and energies as the original CPU-based NNQS-SCI. No verification protocol, direct configuration-set comparison, checksums, or quantitative metrics (e.g., number of unique configurations, energy differences with error bars) are described to confirm equivalence, especially at 64-GPU scale where load balancing, hashing, or streaming could introduce discrepancies.
Authors: We agree that explicit verification of numerical equivalence is essential for validating the central claims. Although the distributed de-duplication (Section 3.2) employs consistent global hashing and the CUDA kernels (Section 3.3) implement exact deterministic generation of coupled configurations, the manuscript did not sufficiently detail the verification steps. In the revised version we will add a dedicated verification subsection (new Section 4.3) that describes the protocol: (i) direct set-equality checks via sorted configuration lists and MD5 checksums on the final selected sets, (ii) reporting of the exact number of unique configurations at each GPU count, and (iii) tabulated energy differences (with standard deviations across repeated runs) between the GPU and CPU implementations for all benchmark systems, explicitly including the 64-GPU case. These additions will confirm that load balancing and streaming introduce no discrepancies. revision: yes
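The proposed set-equality check via sorted lists and MD5 checksums might look like the following Python sketch (hypothetical; the revision's actual protocol, encoding width, and hash choice may differ):

```python
import hashlib

def config_checksum(configs):
    """Order-independent MD5 fingerprint of a configuration set:
    sort canonically, then hash a fixed-width byte encoding, so two
    implementations can compare selected sets without exchanging them."""
    digest = hashlib.md5()
    for c in sorted(configs):
        digest.update(int(c).to_bytes(16, "little"))  # 128-bit words assumed
    return digest.hexdigest()
```

Because the list is sorted before hashing, the fingerprint is invariant to the order in which ranks emitted the configurations, which is exactly what a distributed run needs.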
-
Referee: [Abstract] The manuscript provides no details on how chemical accuracy is defined or measured (e.g., threshold, reference values, or systems tested), nor any ablation showing that the new GPU components do not alter the variational space or convergence behavior relative to the baseline.
Authors: We will clarify the definition of chemical accuracy both in the abstract and in a new paragraph of the Introduction: it is defined as an absolute energy error below 1 kcal/mol (0.043 eV) relative to reference values obtained from exact diagonalization (small systems) or CCSD(T) (larger systems). In addition, we will insert an ablation study in the Evaluation section that directly compares the variational spaces produced by the GPU versus CPU pipelines, showing identical selected-configuration sets, identical energy convergence curves, and identical final energies within floating-point tolerance. These results will be presented as tables and plots for the representative molecules used in the scaling experiments. revision: yes
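The stated criterion (1 kcal/mol, about 0.043 eV or 1.6 mHa) reduces to a one-line check. The conversion constant below is the standard one; the function name is illustrative:

```python
KCAL_PER_MOL_IN_HARTREE = 1.0 / 627.509  # ~1.594e-3 Ha per kcal/mol

def within_chemical_accuracy(e_hartree, e_ref_hartree):
    """True iff the absolute energy error is below 1 kcal/mol,
    the usual definition of chemical accuracy."""
    return abs(e_hartree - e_ref_hartree) < KCAL_PER_MOL_IN_HARTREE
```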
Circularity Check
No significant circularity; engineering implementation and benchmarking paper
full rationale
The paper describes a GPU-accelerated framework (QiankunNet-cuSCI) for NNQS-SCI, with central claims resting on measured end-to-end speedups (up to 2.32X) and preserved chemical accuracy via new distributed de-duplication, CUDA kernels, and a memory runtime. No mathematical derivation chain, fitted parameters, or predictions are presented that reduce by construction to inputs. The work is benchmarked against an external NNQS-SCI baseline, so its claims rest on those empirical comparisons; no self-definitional, load-bearing self-citation, or ansatz-smuggling steps appear in the abstract or the described content.
Reference graph
Works this paper leans on
-
[1]
Scott Beamer, Krste Asanovic, and David A. Patterson. 2015. The GAP benchmark suite. arXiv:1508.03619 http://arxiv.org/abs/1508.03619
-
[2]
Giuseppe Carleo, Kenny Choo, Damian Hofmann, James ET Smith, Tom Westerhout, Fabien Alet, Emily J Davis, Stavros Efthymiou, Ivan Glasser, Sheng-Hsuan Lin, et al. 2019. NetKet: A machine learning toolkit for many-body quantum systems. SoftwareX 10 (2019), 100311
2019
-
[3]
Giuseppe Carleo and Matthias Troyer. 2017. Solving the quantum many-body problem with artificial neural networks. Science 355, 6325 (2017), 602–606
2017
-
[4]
Ao Chen and Markus Heyl. 2024. Empowering deep neural quantum states through efficient optimization. Nature Physics 20, 9 (2024), 1476–1481
2024
-
[5]
Samuel Yen-Chi Chen, Chao-Han Huck Yang, Jun Qi, Pin-Yu Chen, Xiaoli Ma, and Hsi-Sheng Goan. 2020. Variational quantum circuits for deep reinforcement learning. IEEE Access 8 (2020), 141007–141024
2020
-
[6]
Kenny Choo, Antonio Mezzacapo, and Giuseppe Carleo. 2020. Fermionic neural-network states for ab-initio electronic structure. Nature Communications 11, 1 (2020), 2368
2020
-
[7]
Kenny Choo, Titus Neupert, and Giuseppe Carleo. 2019. Two-dimensional frustrated J1-J2 model studied with neural network quantum states. Physical Review B 100, 12 (2019), 125124
2019
-
[8]
Pavlo O Dral. 2020. Quantum chemistry in the age of machine learning. The Journal of Physical Chemistry Letters 11, 6 (2020), 2336–2347
2020
-
[9]
Hong Gao, Satoshi Imamura, Akihiko Kasagi, and Eiji Yoshida. 2024. Distributed implementation of full configuration interaction for one trillion determinants. Journal of Chemical Theory and Computation 20, 3 (2024), 1185–1192
2024
-
[10]
Jan Hermann, Zeno Schätzle, and Frank Noé. 2020. Deep-neural-network solution of the electronic Schrödinger equation. Nature Chemistry 12, 10 (2020), 891–897
2020
-
[11]
Jan Hermann, James Spencer, Kenny Choo, Antonio Mezzacapo, W Matthew C Foulkes, David Pfau, Giuseppe Carleo, and Frank Noé. 2023. Ab initio quantum chemistry with neural-network wavefunctions. Nature Reviews Chemistry 7, 10 (2023), 692–709
2023
-
[12]
Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Dehao Chen, Mia Xu Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, and Zhifeng Chen. 2019. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019...
2019
-
[13]
Bernard Huron, JP Malrieu, and P Rancurel. 1973. Iterative perturbation calculations of ground and excited state energies from multiconfigurational zeroth-order wavefunctions. The Journal of Chemical Physics 58, 12 (1973), 5745–5759
1973
-
[14]
Weile Jia, Han Wang, Mohan Chen, Denghui Lu, Lin Lin, Roberto Car, E Weinan, and Linfeng Zhang. 2020. Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms with machine learning. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE/ACM, Atlanta, Georgia, USA, 1–14
2020
-
[15]
Mohammed Abdul Lateef Junaid. 2025. Artificial intelligence driven innovations in biochemistry: A review of emerging research frontiers. Biomolecules and Biomedicine 25, 4 (2025), 739
2025
-
[16]
Bowen Kan, Yumeng Zhou, Daiyou Xie, Pengyu Zhou, Yunquan Zhang, and Honghui Shang. 2025. NNQS-SCI: Tackling Trillion-Dimensional Hilbert Space with Adaptive Neural Network Quantum States. In The International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’25) (St. Louis, MO, USA). Association for Computing Machinery (ACM), ...
-
[17]
Peter J Knowles and Nicholas C Handy. 1984. A new determinant-based full configuration interaction method. Chemical Physics Letters 111, 4-5 (1984), 315–321
1984
-
[18]
Hannah Lange, Anka Van de Walle, Atiye Abedinnia, and Annabelle Bohrdt. 2024. From architectures to applications: a review of neural quantum states. Quantum Science and Technology 9, 4 (2024), 040501
2024
-
[19]
An-Jun Liu and Bryan K. Clark. 2024. Neural network backflow for ab initio quantum chemistry. Physical Review B 110, 11 (2024), 115137. doi:10.1103/PhysRevB.110.115137
2024
-
[20]
Huan Ma, Honghui Shang, and Jinlong Yang. 2024. Quantum embedding method with transformer neural network quantum states for strongly correlated materials. npj Computational Materials 10, 1 (2024), 220
2024
-
[21]
Matija Medvidović and Javier Robledo Moreno. 2024. Neural-network quantum states for many-body physics. The European Physical Journal Plus 139, 7 (2024), 1–26
2024
-
[22]
Jeppe Olsen, Poul Jørgensen, and John N. Simons. 1990. Passing the one-billion limit in full configuration-interaction (FCI) calculations. Chemical Physics Letters 169, 6 (1990), 463–472. doi:10.1016/0009-2614(90)85633-N
-
[23]
Bryan O’Gorman, Sandy Irani, James Whitfield, and Bill Fefferman. 2022. Intractability of Electronic Structure in a Fixed Basis. PRX Quantum 3, 2 (2022), 020322. doi:10.1103/PRXQuantum.3.020322
-
[25]
David Pfau, James S Spencer, Alexander GDG Matthews, and W Matthew C Foulkes. 2020. Ab initio solution of the many-electron Schrödinger equation with deep neural networks. Physical Review Research 2, 3 (2020), 033429
2020
-
[26]
Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyan Yang, Minjia Zhang, Dong Li, and Yuxiong He. 2021. ZeRO-Offload: Democratizing Billion-Scale Model Training. In USENIX ATC. USENIX Association, Virtual Event, 551–564
2021
-
[27]
Yousef Saad. 2011. Numerical methods for large eigenvalue problems: revised edition. SIAM, Philadelphia, PA
2011
-
[28]
Anders W Sandvik. 2010. Computational studies of quantum spin systems. In AIP Conference Proceedings, Vol. 1297. American Institute of Physics, Melville, NY, 135–338
2010
-
[29]
Markus Schmitt and Markus Heyl. 2020. Quantum many-body dynamics in two dimensions with artificial neural networks. Physical Review Letters 125, 10 (2020), 100503
2020
-
[30]
Ulrich Schollwöck. 2011. The density-matrix renormalization group in the age of matrix product states. Annals of Physics 326, 1 (2011), 96–192
2011
-
[31]
Jeffrey B Schriber and Francesco A Evangelista. 2017. Adaptive configuration interaction for computing challenging electronic excited states with tunable accuracy. Journal of Chemical Theory and Computation 13, 11 (2017), 5354–5366
2017
-
[32]
Or Sharir, Yoav Levine, Noam Wies, Giuseppe Carleo, and Amnon Shashua. 2020. Deep Autoregressive Models for the Efficient Variational Simulation of Many-Body Quantum Systems. Physical Review Letters 124, 2 (2020), 020503. doi:10.1103/PhysRevLett.124.020503
2020
-
[34]
Sandeep Sharma, Adam A. Holmes, Guillaume Jeanmairet, Ali Alavi, and C. J. Umrigar. 2017. Semistochastic Heat-Bath Configuration Interaction Method: Selected Configuration Interaction with Semistochastic Perturbation Theory. Journal of Chemical Theory and Computation 13, 4 (2017), 1595–1604. doi:10.1021/acs.jctc.6b01028
2017
-
[35]
Rick Stevens, Valerie Taylor, Jeff Nichols, Arthur Barney Maccabe, Katherine Yelick, and David Brown. 2020. AI for Science: Report on the Department of Energy (DOE) Town Halls on Artificial Intelligence (AI) for Science. Technical Report. Argonne National Lab. (ANL), Argonne, IL (United States)
2020
-
[36]
Qiming Sun, Timothy C Berkelbach, Nick S Blunt, George H Booth, Sheng Guo, Zhendong Li, Junzi Liu, James D McClain, Elvira R Sayfutyarova, Sandeep Sharma, et al. 2018. PySCF: the Python-based simulations of chemistry framework. Wiley Interdisciplinary Reviews: Computational Molecular Science 8, 1 (2018), e1340
2018
-
[37]
Norm M Tubman, C Daniel Freeman, Daniel S Levine, Diptarka Hait, Martin Head-Gordon, and K Birgitta Whaley. 2020. Modern approaches to exact diagonalization and selected configuration interaction with the adaptive sampling CI method. Journal of Chemical Theory and Computation 16, 4 (2020), 2139–2159
2020
-
[38]
Ivan S Ufimtsev and Todd J Martinez. 2009. Quantum chemistry on graphical processing units. 3. Analytical energy gradients, geometry optimization, and first principles molecular dynamics. Journal of Chemical Theory and Computation 5, 10 (2009), 2619–2628
2009
-
[39]
Marat Valiev, Eric J Bylaska, Niranjan Govind, Karol Kowalski, Tjerk P Straatsma, Hubertus Johannes Jacobus Van Dam, Dunyou Wang, Jarek Nieplocha, Edoardo Aprà, Theresa L Windus, et al. 2010. NWChem: A comprehensive and scalable open-source solution for large scale molecular simulations. Computer Physics Communications 181, 9 (2010), 1477–1489
2010
-
[40]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, Isabelle Guyon, Ulrike von Luxburg...
2017
-
[41]
Filippo Vicentini, Damian Hofmann, Attila Szabó, Dian Wu, Christopher Roth, Clemens Giuliani, Gabriel Pescia, Jannes Nys, Vladimir Vargas-Calderón, Nikita Astrakhantsev, and Giuseppe Carleo. 2022. NetKet 3: Machine Learning Toolbox for Many-Body Quantum Systems. SciPost Phys. Codebases (2022), 7. doi:10.21468/SciPostPhysCodeb.7
2022
-
[42]
Yangzihao Wang, Andrew A. Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D. Owens. 2016. Gunrock: a high-performance graph processing library on the GPU. In PPoPP. ACM, Barcelona, Spain, 11:1–11:12
2016
-
[43]
James Daniel Whitfield, Peter John Love, and Alán Aspuru-Guzik. 2013. Computational complexity in electronic structure. Physical Chemistry Chemical Physics 15, 2 (2013), 397–411
2013
-
[44]
Yangjun Wu, Chu Guo, Yi Fan, Pengyu Zhou, and Honghui Shang. 2023. NNQS-Transformer: an Efficient and Scalable Neural Network Quantum States Approach for Ab initio Quantum Chemistry. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (Denver, CO, USA) (SC ’23). Association for Computing Machinery,...
-
[45]
Tianchen Zhao, James Stokes, and Shravan Veerapaneni. 2023. Scalable neural quantum states architecture for quantum chemistry. Machine Learning: Science and Technology 4, 2 (2023), 025034
2023
-
[46]
Xuncheng Zhao, Mingfan Li, Qian Xiao, Junshi Chen, Fei Wang, Li Shen, Meijia Zhao, Wenhao Wu, Hong An, Lixin He, and Xiao Liang. 2022. AI for Quantum Mechanics: High Performance Quantum Many-Body Simulations via Deep Learning. In SC22: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, Dallas, TX, USA, 1–15. ...