Towards exascale fully relativistic pseudopotential density functional theory calculations enabled by mixed-precision computation and compressed-communication using residual based subspace iteration

Gourab Panigrahi; Kartick Ramakrishnan; Nikhil Kodali; Nishant Gupta; Phani Motamarri; Rudra Panch; Sambit Das; Sundaresan G; Vishwas Rao

arXiv: 2605.30128 · v1 · pith:YDAPLWEGnew · submitted 2026-05-28 · cond-mat.mtrl-sci

Pith reviewed 2026-06-29 06:29 UTC · model grok-4.3

classification cond-mat.mtrl-sci

keywords density functional theory · noncollinear magnetism · spin-orbit coupling · exascale computing · mixed precision · pseudopotential · subspace iteration · finite element method

The pith

A residual-based subspace iteration method combined with mixed-precision arithmetic and compressed communication enables fully relativistic DFT simulations of up to 100,000 electrons on exascale systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a GPU-centric framework for density functional theory calculations that include noncollinear magnetism and spin-orbit coupling. These effects produce large, complex eigenproblems that normally limit system size. The approach uses a residual-based Chebyshev filtered subspace iteration that tolerates inexact matrix-vector products, allowing mixed-precision computation and block floating-point compressed MPI communication at ratios above 4x. This reduces both floating-point work and data movement while retaining the robustness of double-precision results. Numerical tests show better time-to-solution and strong scaling, reaching systems with 100,000 electrons.

Core claim

The residual-based Chebyshev filtered subspace iteration (R-ChFSI) remains stable under inexact matrix-vector products, which in turn permits a combination of mixed-precision arithmetic and block floating-point compressed communication that preserves double-precision accuracy for noncollinear SOC eigenproblems while cutting compute and communication costs enough to reach exascale performance.

What carries the argument

Residual-based Chebyshev filtered subspace iteration (R-ChFSI), which solves the sparse generalized eigenproblem arising from finite-element discretization of the NC-SOC Kohn-Sham equations while tolerating reduced-precision operations.
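
One way to see why a residual form can tolerate inexact products is sketched below. This is an illustration in assumed notation, written for a standard Hermitian problem rather than the generalized one, and it is not taken from the paper: $\tilde{H} = (H - cI)/e$ is the shifted and scaled operator, $X$ holds the current approximate eigenvectors, $E$ is the diagonal matrix of their Ritz values, and $T_k$ is the degree-$k$ Chebyshev polynomial.

```latex
% Standard Chebyshev recurrence on the block of vectors
Y_0 = X, \qquad Y_1 = \tilde{H} X, \qquad
Y_{k+1} = 2\tilde{H} Y_k - Y_{k-1}

% Split off the part that is known exactly from the Ritz values
\tilde{E} = (E - cI)/e, \qquad \Lambda_k = T_k(\tilde{E}), \qquad
\Lambda_{k+1} = 2\tilde{E}\Lambda_k - \Lambda_{k-1}, \qquad
R_k = Y_k - X\Lambda_k

% Recurrence satisfied by the remainder, for k >= 1
D = \tilde{H}X - X\tilde{E} = (HX - XE)/e, \qquad
R_0 = 0, \qquad R_1 = D, \qquad
R_{k+1} = 2\tilde{H} R_k - R_{k-1} + 2 D \Lambda_k
```

In this form the operator is applied only to $R_k$, whose size is set by the eigen-residual $D$. If $D$ is evaluated in full precision, a matrix-vector product with a fixed relative error contributes an absolute error that shrinks as the iteration converges, which is the kind of tolerance the summary describes.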

If this is right

Fully relativistic pseudopotential DFT becomes feasible for systems an order of magnitude larger than current practical limits.
Time-to-solution for noncollinear SOC calculations decreases because both arithmetic and MPI communication volumes are reduced.
The same R-ChFSI tolerance to inexact products can be reused with other sparse eigensolvers that appear in finite-element DFT.
Band-partitioning combined with compressed communication improves weak and strong scaling on GPU-based exascale machines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same tolerance property could be tested in other quantum-chemistry packages that solve generalized eigenproblems with iterative subspace methods.
If the compression scheme generalizes, similar mixed-precision strategies might apply to time-dependent DFT or response calculations that also involve large sparse operators.
The approach suggests that future hardware supporting even lower-precision formats could further accelerate relativistic DFT without new algorithmic changes.

Load-bearing premise

The residual-based Chebyshev filtered subspace iteration stays accurate and convergent even when matrix-vector products are performed in lower precision or with compressed data.

What would settle it

A direct comparison on a benchmark NC-SOC system showing that the mixed-precision compressed run produces eigenvalues or total energies that differ from a full double-precision reference by more than the accepted DFT tolerance.
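
The shape of such a comparison can be sketched on a toy problem. The Python below is a hypothetical stand-in, not the paper's method or benchmark: it uses a plain (non-residual) Chebyshev filter with a single vector, a 1D Laplacian in place of the NC-SOC operator, and FP32 rounding of each matrix-vector product in place of mixed-precision kernels and compressed communication. All function names and parameter values are assumptions chosen for illustration.

```python
import math
import struct


def to_fp32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def matvec(v, low_precision=False):
    """Apply tridiag(-1, 2, -1); optionally round each output entry to FP32."""
    n = len(v)
    out = []
    for i in range(n):
        s = 2.0 * v[i]
        if i > 0:
            s -= v[i - 1]
        if i < n - 1:
            s -= v[i + 1]
        out.append(to_fp32(s) if low_precision else s)
    return out


def chebyshev_filter(x, degree, a, b, low_precision):
    """Three-term Chebyshev recurrence that damps eigenvalues inside [a, b]."""
    c, e = 0.5 * (a + b), 0.5 * (b - a)

    def shifted(v):
        # (H - c I) v / e, with only the product H v subject to rounding
        hv = matvec(v, low_precision)
        return [(h - c * vi) / e for h, vi in zip(hv, v)]

    y_prev, y = x, shifted(x)
    for _ in range(degree - 1):
        s = shifted(y)
        y_prev, y = y, [2.0 * si - pi for si, pi in zip(s, y_prev)]
    return y


def lowest_pair(n, passes, degree, a, b, low_precision):
    """Repeated filtering of one vector, then an FP64 Rayleigh quotient."""
    x = [1.0] * n
    for _ in range(passes):
        x = chebyshev_filter(x, degree, a, b, low_precision)
        norm = math.sqrt(sum(v * v for v in x))
        x = [v / norm for v in x]
    hx = matvec(x)  # eigenvalue estimate and residual are always FP64
    lam = sum(h * v for h, v in zip(hx, x))
    res = math.sqrt(sum((h - lam * v) ** 2 for h, v in zip(hx, x)))
    return lam, res


n = 20
exact = 2.0 - 2.0 * math.cos(math.pi / (n + 1))  # lowest eigenvalue, analytic
lam64, res64 = lowest_pair(n, 10, 20, 0.05, 4.0, low_precision=False)
lam32, res32 = lowest_pair(n, 10, 20, 0.05, 4.0, low_precision=True)
print("FP64 run: error", abs(lam64 - exact), "residual", res64)
print("FP32 run: error", abs(lam32 - exact), "residual", res32)
```

The two quantities printed per run, the eigenvalue error and the eigen-residual norm, are the same kind of metrics the referee report asks for. A real test would make this comparison on an NC-SOC benchmark against the tolerance accepted for DFT total energies.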

Figures

Figures reproduced from arXiv: 2605.30128 by Gourab Panigrahi, Kartick Ramakrishnan, Nikhil Kodali, Nishant Gupta, Phani Motamarri, Rudra Panch, Sambit Das, Sundaresan G, Vishwas Rao.

**Figure 2.** Domain decomposition of the simulation domain, where each color denotes a distinct MPI rank. Arrows denote P2P nearest-neighbour communication across partition boundaries: ghost values, originally in FP32, are compressed on the sending GPU to a chosen bits-per-value rate, transmitted as a fixed-size byte stream, and decompressed on the receiving GPU. To this end, we adopt a block floating-point (BFP) repr…

**Figure 3.** Compression is performed at the granularity of one 4-value FP32 block per thread. Each block is packed into mbits = 4 × bpv bits: one shared biased exponent (8 bits) and four signed coefficients of vbits = (mbits − 8)/4 bits each. The compressed stream is laid out contiguously in thread/rank order with fixed-size slices, enabling exact byte offsets and atomic-free writes for the common rates bpv ∈ {16, 12, 10, 8}. …

**Figure 4.** Band-partitioning of 20 processing elements (GPUs) into 2D …

**Figure 5.** Throughput (DOFs/s) comparison between the proposed …

**Figure 6.** Speedup of Chebyshev filtering (CF) using mixed-precision …
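
The Figure 3 caption gives enough of the bit layout to sketch the packing in a few lines. The Python below is a hypothetical illustration, not the paper's GPU kernel. Only the block size of 4, the 8-bit shared exponent, and the relation vbits = (mbits − 8)/4 come from the caption. The function names, the choice of taking the shared exponent from the largest magnitude in the block, the bias of 127, the two's-complement coefficients, and round-to-nearest with clamping are all assumptions.

```python
import math

BLOCK = 4       # values per block (from the Figure 3 caption)
EXP_BITS = 8    # shared exponent width (from the Figure 3 caption)
EXP_BIAS = 127  # assumption: FP32-style bias


def vbits_for(bpv):
    """Coefficient width in bits for a given bits-per-value rate."""
    mbits = BLOCK * bpv
    return (mbits - EXP_BITS) // BLOCK


def compress_block(values, bpv):
    """Pack 4 floats into one integer of 4 * bpv bits (assumed layout)."""
    assert len(values) == BLOCK
    vbits = vbits_for(bpv)
    amax = max(abs(v) for v in values)
    # frexp gives amax = m * 2**e with 0.5 <= m < 1, so every |v| < 2**e
    _, e = math.frexp(amax)
    e = max(-126, min(128, e))  # keep the biased exponent inside 8 bits
    qmax = (1 << (vbits - 1)) - 1
    scale = math.ldexp(1.0, vbits - 1 - e)
    mask = (1 << vbits) - 1
    word = e + EXP_BIAS  # shared exponent sits in the low 8 bits
    for i, v in enumerate(values):
        q = max(-qmax, min(qmax, round(v * scale)))  # round, then clamp
        word |= (q & mask) << (EXP_BITS + i * vbits)
    return word


def decompress_block(word, bpv):
    """Inverse of compress_block, up to quantization error."""
    vbits = vbits_for(bpv)
    e = (word & 0xFF) - EXP_BIAS
    step = math.ldexp(1.0, e - (vbits - 1))
    mask = (1 << vbits) - 1
    out = []
    for i in range(BLOCK):
        q = (word >> (EXP_BITS + i * vbits)) & mask
        if q >= 1 << (vbits - 1):
            q -= 1 << vbits  # sign-extend the two's-complement field
        out.append(q * step)
    return out


if __name__ == "__main__":
    block = [1.0, -0.5, 0.25, 0.75]
    for bpv in (16, 12, 10, 8):
        packed = compress_block(block, bpv)
        print(bpv, vbits_for(bpv), decompress_block(packed, bpv))
```

At bpv = 8 this layout packs four values into one 32-bit word with 6-bit coefficients, a 4x reduction relative to the FP32 ghost values mentioned in the Figure 2 caption. Because every block has the same size, the byte offset of any block follows directly from its index, which is consistent with the fixed-size slices the caption describes.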

Original abstract

Noncollinear (NC) magnetism and spin-orbit coupling (SOC) are indispensable for predictive ab initio materials simulations with pronounced relativistic effects and magnetic frustration, yet they significantly increase the cost of cubic-scaling density functional theory (DFT) by introducing complex 2-component wavefunctions per electron and consequently much larger eigenproblems. We present a GPU-centric high-performance framework for NC-SOC DFT that combines: (i) algorithmic advances for solving finite-element (FE) discretized DFT equations; (ii) residual-based Chebyshev filtered subspace iteration (R-ChFSI), tolerant to inexact matrix-vector products, for the resulting sparse generalized eigenproblem; (iii) a matrix-free strategy for accelerating the FE Poisson solver; (iv) R-ChFSI-enabled mixed-precision computation with block floating-point compressed MPI communication at compression ratios over 4x, preserving double-precision robustness while reducing compute and data movement costs; and (v) a communication-efficient band-partitioning algorithm to improve scalability. Numerical results demonstrate improved time-to-solution and excellent scaling on exascale architectures, enabling fully relativistic pseudopotential DFT simulations of up to 100,000 electrons.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The abstract describes a promising mix of R-ChFSI, mixed precision, and compressed communication for scaling NC-SOC DFT to 100k electrons, but supplies no error metrics to confirm the approximations preserve accuracy on the complex eigenproblems.

The paper's main contribution is a GPU framework that layers residual-based Chebyshev filtered subspace iteration, matrix-free Poisson, mixed-precision arithmetic, and block-floating-point compressed MPI communication on top of finite-element NC-SOC DFT. The claim is that this combination keeps double-precision robustness while cutting compute and data movement enough to reach 100,000 electrons on exascale machines.

What stands out is the explicit focus on tolerance to inexact matrix-vector products inside the subspace iteration, plus the band-partitioning scheme for communication. Those choices directly target the larger, complex-valued generalized eigenproblems that come with noncollinear spinors. If the tolerance holds, it would be a practical step for materials where SOC and magnetism matter.

The soft spot is exactly where the stress-test note points: the abstract asserts that robustness is preserved and scaling is excellent, yet it gives no eigenvalue residuals, total-energy comparisons against full double-precision runs, or drift numbers at the reported system sizes. Without those checks, it is impossible to judge whether the mixed-precision and >4x compression steps actually leave the physics intact for the 2-component case. That gap is not minor; it is load-bearing for the headline result.

This is the kind of work that people building production DFT codes would want to see, provided the full manuscript includes the missing verification data. A serious referee could check the implementation details and the numerical evidence in one pass. I would send it to review rather than desk-reject, but only because the algorithmic direction is worth testing; the current abstract alone does not yet support the accuracy claim.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a GPU-centric high-performance framework for noncollinear spin-orbit coupled (NC-SOC) pseudopotential DFT based on finite-element discretization. It introduces residual-based Chebyshev filtered subspace iteration (R-ChFSI) claimed to tolerate inexact matrix-vector products, combined with mixed-precision arithmetic, block floating-point compressed MPI communication (>4x ratio), a matrix-free Poisson solver, and band-partitioning. The central claim is that these enable fully relativistic DFT simulations of up to 100,000 electrons on exascale machines with improved time-to-solution, excellent scaling, and preserved double-precision robustness for the complex 2-component eigenproblems.

Significance. If the tolerance of R-ChFSI to the mixed-precision and compressed-communication approximations is shown to hold without degrading physical accuracy for NC-SOC systems, the work would enable previously inaccessible large-scale relativistic materials simulations. The algorithmic focus on inexact operations and communication reduction directly targets exascale bottlenecks in cubic-scaling DFT. Credit is due for targeting preservation of robustness rather than raw speed alone.

major comments (2)

[Numerical Results] Numerical Results section: The claim that the mixed-precision plus >4x compressed-communication scheme 'preserves double-precision robustness' for NC-SOC eigenproblems is not supported by any reported quantitative metrics (eigenvalue residuals, total-energy drift, or direct comparison to full double-precision reference calculations) at the largest system sizes (~100,000 electrons). This evidence gap is load-bearing for the headline claim of accurate exascale simulations.
[R-ChFSI description] R-ChFSI description (likely §3): While tolerance to inexact matvecs is asserted for the residual-based Chebyshev filter, no analysis, error bound, or numerical test is provided demonstrating stability specifically for the larger, complex-valued generalized eigenproblems that arise from 2-component spinors under noncollinear SOC (as opposed to collinear or scalar-relativistic cases).

minor comments (1)

[Abstract] Abstract: The statement 'Numerical results demonstrate...' does not cite the specific figures or tables that contain the scaling and accuracy data.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and for recognizing the potential impact of our framework on exascale NC-SOC DFT simulations. We address each major comment below.

Referee: [Numerical Results] Numerical Results section: The claim that the mixed-precision plus >4x compressed-communication scheme 'preserves double-precision robustness' for NC-SOC eigenproblems is not supported by any reported quantitative metrics (eigenvalue residuals, total-energy drift, or direct comparison to full double-precision reference calculations) at the largest system sizes (~100,000 electrons). This evidence gap is load-bearing for the headline claim of accurate exascale simulations.

Authors: We agree that direct quantitative metrics at the absolute largest scales would strengthen the robustness claim. The current manuscript validates accuracy on representative smaller systems and reports scaling/time-to-solution up to 100k electrons, but does not include side-by-side double-precision references at the largest sizes (which are memory-prohibitive). In the revised version we will add an expanded table in the Numerical Results section with eigenvalue residuals, total-energy drift, and available higher-precision comparisons for the largest feasible systems, together with a brief discussion of why full double-precision runs become impractical.
revision: yes

Referee: [R-ChFSI description] R-ChFSI description (likely §3): While tolerance to inexact matvecs is asserted for the residual-based Chebyshev filter, no analysis, error bound, or numerical test is provided demonstrating stability specifically for the larger, complex-valued generalized eigenproblems that arise from 2-component spinors under noncollinear SOC (as opposed to collinear or scalar-relativistic cases).

Authors: The residual-based formulation of R-ChFSI is designed to adapt to the spectrum of the generalized eigenproblem regardless of whether the matrices are real or complex. Nevertheless, we acknowledge that the manuscript does not contain an explicit stability discussion or dedicated test isolating the NC-SOC (complex 2-component) case. In the revision we will expand the R-ChFSI description in §3 with a short error-bound sketch applicable to complex Hermitian generalized eigenproblems and add a numerical test comparing filter convergence for NC-SOC versus scalar-relativistic discretizations.
revision: yes

Circularity Check

0 steps flagged

No circularity detected in performance and scaling claims

The paper describes algorithmic and implementation choices (R-ChFSI, mixed-precision, compressed communication, band-partitioning) whose consequences are measured as empirical time-to-solution and scaling results on exascale hardware. No derivation chain reduces a claimed prediction to a fitted parameter or self-citation by construction; the numerical demonstrations are presented as outcomes of the listed techniques rather than being defined in terms of themselves. The work is self-contained against external benchmarks of wall-clock performance.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no access to full text prevents identification of free parameters, axioms, or invented entities.

Reference graph

Works this paper leans on

56 extracted references · 49 canonical work pages

[1]

Large-scale electronic structure calculations of high-Z metals on the BlueGene/L platform,

F. Gygi, E. W. Draeger, M. Schulz, B. R. De Supinski, J. A. Gunnels, V . Austel, J. C. Sexton, F. Franchetti, S. Kral, C. W. Ueberhuber et al., “Large-scale electronic structure calculations of high-Z metals on the BlueGene/L platform,” inProceedings of the 2006 ACM/IEEE conference on Supercomputing, 2006, pp. 45–es. [Online]. Available: https://doi.org/1...

work page doi:10.1145/1188455.1188504 2006
[2]

New algorithm to enable 400+ TFlop/s sustained performance in simulations of disorder effects in high-T c superconductors,

G. Alvarez, M. S. Summers, D. E. Maxwell, M. Eisenbach, J. S. Meredith, J. M. Larkin, J. Levesque, T. A. Maier, P. R. Kent, E. F. D’Azevedoet al., “New algorithm to enable 400+ TFlop/s sustained performance in simulations of disorder effects in high-T c superconductors,” inSC’08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing. IEEE, 2008, p...

work page doi:10.1109/sc.2008.5214359 2008
[3]

First-principles calculations of electron states of a silicon nanowire with 100,000 atoms on the K computer,

Y . Hasegawa, J.-I. Iwata, M. Tsuji, D. Takahashi, A. Oshiyama, K. Minami, T. Boku, F. Shoji, A. Uno, M. Kurokawaet al., “First-principles calculations of electron states of a silicon nanowire with 100,000 atoms on the K computer,” inProceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, 2011, pp. 1–...

work page doi:10.1145/2063384.2063386 2011
[4]

A data-centric approach to extreme-scale ab initio dissipative quantum transport simulations,

A. N. Ziogas, T. Ben-Nun, G. I. Fern ´andez, T. Schneider, M. Luisier, and T. Hoefler, “A data-centric approach to extreme-scale ab initio dissipative quantum transport simulations,” inProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2019, pp. 1–13. [Online]. Available: https://doi.org/10.1145/3...

work page doi:10.1145/3295500.3356156 2019
[5]

Large-scale materials modeling at quantum accuracy: Ab initio simulations of quasicrystals and interacting extended defects in metallic alloys,

S. Das, B. Kanungo, V . Subramanian, G. Panigrahi, P. Motamarri, D. Rogers, P. Zimmerman, and V . Gavini, “Large-scale materials modeling at quantum accuracy: Ab initio simulations of quasicrystals and interacting extended defects in metallic alloys,” inProceedings of the International Conference for High Performance Computing, Networking, Storage and Ana...

work page doi:10.1145/3581784.3627037 2023
[6]

Modeling dilute solutions using first-principles molecular dynamics: computing more than a million atoms with over a million cores,

J.-L. Fattebert, D. Osei-Kuffuor, E. W. Draeger, T. Ogitsu, and W. D. Krauss, “Modeling dilute solutions using first-principles molecular dynamics: computing more than a million atoms with over a million cores,” inSC’16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2016, pp. 12–22. [On...

work page doi:10.1109/sc.2016.2 2016
[7]

Fast, scalable and accurate finite-element based ab initio calculations using mixed precision computing: 46 PFLOPS simulation of a metallic dislocation system,

S. Das, P. Motamarri, V . Gavini, B. Turcksin, Y . W. Li, and B. Leback, “Fast, scalable and accurate finite-element based ab initio calculations using mixed precision computing: 46 PFLOPS simulation of a metallic dislocation system,” inProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2019, pp. ...

work page doi:10.1145/3295500.3357157 2019
[8]

Inhomogeneous electron gas,

P. Hohenberg and W. Kohn, “Inhomogeneous electron gas,” Phys. Rev., vol. 136, pp. B864–B871, 1964. [Online]. Available: https://doi.org/10.1103/PhysRev.136.B864

work page doi:10.1103/physrev.136.b864 1964
[9]

Kohn and L

W. Kohn and L. J. Sham, “Self-consistent equations including exchange and correlation effects,”Phys. Rev., vol. 140, pp. 1133–1138, 1965. [Online]. Available: https://doi.org/10.1103/PhysRev.140.A1133

work page doi:10.1103/physrev.140.a1133 1965
[10]

[Online]

https://www.nobelprize.org/prizes/chemistry/1998/summary. [Online]. Available: https://www.nobelprize.org/prizes/chemistry/1998/summary

1998
[11]

Linear scaling electronic structure methods,

S. Goedecker, “Linear scaling electronic structure methods,”Rev. Mod. Phys., vol. 71, pp. 1085–1123, Jul 1999. [Online]. Available: https://link.aps.org/doi/10.1103/RevModPhys.71.1085

work page doi:10.1103/revmodphys.71.1085 1999
[12]

Introducing ONETEP: Linear-scaling density functional simulations on parallel computers,

C.-K. Skylaris, P. D. Haynes, A. A. Mostofi, and M. C. Payne, “Introducing ONETEP: Linear-scaling density functional simulations on parallel computers,”J. Chem. Phys., vol. 122, no. 8, p. 084119,
[13]

Available: https://doi.org/10.1063/1.1839852

[Online]. Available: https://doi.org/10.1063/1.1839852

work page doi:10.1063/1.1839852
[14]

Methods in electronic structure calculations,

D. Bowler and T. Miyazaki, “Methods in electronic structure calculations,”Rep. Prog. Phys., vol. 75, no. 3, p. 036503, 2012. [Online]. Available: https://doi.org/10.1088/0034-4885/75/3/036503

work page doi:10.1088/0034-4885/75/3/036503 2012
[15]

Linear-scaling three-dimensional fragment method for large-scale electronic structure calculations,

L.-W. Wang, Z. Zhao, and J. Meza, “Linear-scaling three-dimensional fragment method for large-scale electronic structure calculations,” Phys. Rev. B, vol. 77, no. 16, p. 165113, 2008. [Online]. Available: https://doi.org/10.1103/PhysRevB.77.165113

work page doi:10.1103/physrevb.77.165113 2008
[16]

A scalable method for ab initio computation of free energies in nanoscale systems,

M. Eisenbach, C.-G. Zhou, D. M. Nicholson, G. Brown, J. Larkin, and T. C. Schulthess, “A scalable method for ab initio computation of free energies in nanoscale systems,” inProceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2009, pp. 1–8. [Online]. Available: https://doi.org/10.1145/1654059.1654062

work page doi:10.1145/1654059.1654062 2009
[17]

Self-averaging stochastic Kohn-Sham density-functional theory,

R. Baer, D. Neuhauser, and E. Rabani, “Self-averaging stochastic Kohn-Sham density-functional theory,”Phys. Rev. Lett., vol. 111, p. 106402, 2013. [Online]. Available: https://doi.org/10.1103/PhysRevLett.111.106402

work page doi:10.1103/physrevlett.111.106402 2013
[18]

Stochastic density functional theory,

M. D. Fabian, B. Shpiro, E. Rabani, D. Neuhauser, and R. Baer, “Stochastic density functional theory,”WIREs Comput. Mol. Sci., vol. 9, no. 6, p. e1412, 2019. [Online]. Available: https://doi.org/10.1002/wcms.1412

work page doi:10.1002/wcms.1412 2019
[19]

DFT-FE 1.0: A massively parallel hybrid CPU-GPU density functional theory code using finite-element discretization,

S. Das, P. Motamarri, V . Subramanian, D. M. Rogers, and V . Gavini, “DFT-FE 1.0: A massively parallel hybrid CPU-GPU density functional theory code using finite-element discretization,”Comput. Phys. Commun., vol. 280, p. 108473, 2022. [Online]. Available: https://doi.org/10.1016/j.cpc.2022.108473

work page doi:10.1016/j.cpc.2022.108473 2022
[20]

Wilkinson and E

J. K ¨ubler, K. H. Hock, J. Sticht, and A. R. Williams, “Density functional theory of non-collinear magnetism,”J. Phys. F: Met. Phys., vol. 18, pp. 469–483, 1988. [Online]. Available: https://doi.org/10.1088/0305- 4608/18/3/018

work page doi:10.1088/0305- 1988
[21]

(6) Hertel, R.SPIN2013,03, 1340009, DOI:10.1142/S2010324713400092

U. von Barth and L. Hedin, “A local exchange-correlation potential for the spin polarized case. I,”J. Phys. C: Solid State Phys., vol. 5, pp. 1629–1642, 1972. [Online]. Available: https://doi.org/10.1088/0022- 3719/5/13/012

work page doi:10.1088/0022- 1972
[22]

DFT-FE–a massively parallel adaptive finite-element code for large-scale density functional theory calculations,

P. Motamarri, S. Das, S. Rudraraju, K. Ghosh, D. Davydov, and V . Gavini, “DFT-FE–a massively parallel adaptive finite-element code for large-scale density functional theory calculations,”Comput. Phys. Commun., vol. 246, p. 106853, 2020. [Online]. Available: https://doi.org/10.1016/j.cpc.2019.07.016

work page doi:10.1016/j.cpc.2019.07.016 2020
[23]

Optimized norm-conserving Vanderbilt pseudopoten- tials,

D. R. Hamann, “Optimized norm-conserving Vanderbilt pseudopoten- tials,”Phys. Rev. B, vol. 88, p. 085117, Aug 2013. [Online]. Available: https://doi.org/10.1103/PhysRevB.88.085117

work page doi:10.1103/physrevb.88.085117 2013
[24]

Spin-orbit coupling with ultrasoft pseudopotentials: Application to Au and Pt,

A. Dal Corso and A. M. Conte, “Spin-orbit coupling with ultrasoft pseudopotentials: Application to Au and Pt,”Phys. Rev. B, vol. 71, p. 115106, 2005. [Online]. Available: https://doi.org/10.1103/PhysRevB.71.115106

work page doi:10.1103/physrevb.71.115106 2005
[25]

Kresse, J

G. Kresse and J. Furthm ¨uller, “Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set,”Phys. Rev. B, vol. 54, no. 16, p. 11169, 1996. [Online]. Available: https://doi.org/10.1103/PhysRevB.54.11169

work page doi:10.1103/physrevb.54.11169 1996
[26]

Advanced capabilities for materials modelling with Quantum ESPRESSO,

P. Giannozzi, O. Andreussi, T. Brumme, O. Bunau, M. B. Nardelli, M. Calandra, R. Car, C. Cavazzoni, D. Ceresoli, M. Cococcioni et al., “Advanced capabilities for materials modelling with Quantum ESPRESSO,”J. Phys.: Condens. Matter, vol. 29, no. 46, p. 465901,
[27]

Available: https://doi.org/10.1088/1361-648X/aa8f79

[Online]. Available: https://doi.org/10.1088/1361-648X/aa8f79

work page doi:10.1088/1361-648x/aa8f79
[28]

Finite-difference- pseudopotential method: electronic structure calculations without a basis,

J. R. Chelikowsky, N. Troullier, and Y . Saad, “Finite-difference- pseudopotential method: electronic structure calculations without a basis,”Phys. Rev. Lett., vol. 72, no. 8, p. 1240, 1994. [Online]. Available: https://doi.org/10.1103/PhysRevLett.72.1240

work page doi:10.1103/physrevlett.72.1240 1994
[29]

PARSEC–the pseudopotential algorithm for real-space electronic structure calculations: recent advances and novel applications to nano-structures,

L. Kronik, A. Makmal, M. L. Tiago, M. Alemany, M. Jain, X. Huang, Y . Saad, and J. R. Chelikowsky, “PARSEC–the pseudopotential algorithm for real-space electronic structure calculations: recent advances and novel applications to nano-structures,”Phys. Status Solidi B, vol. 243, no. 5, pp. 1063–1079, 2006. [Online]. Available: https://doi.org/10.1002/pssb....

work page doi:10.1002/pssb.200541463 2006
[30]

Daubechies wavelets as a basis set for density functional pseudopotential calculations,

L. Genovese, A. Neelov, S. Goedecker, T. Deutsch, S. A. Ghasemi, A. Willand, D. Caliste, O. Zilberberg, M. Rayson, A. Bergman, and R. Schneider, “Daubechies wavelets as a basis set for density functional pseudopotential calculations,”J. Chem. Phys., vol. 129, p. 014109,
[31]

Available: https://doi.org/10.1063/1.2949547

[Online]. Available: https://doi.org/10.1063/1.2949547

work page doi:10.1063/1.2949547
[32]

Adaptive finite-element method for electronic-structure calculations,

E. Tsuchida and M. Tsukada, “Adaptive finite-element method for electronic-structure calculations,”Phys. Rev. B, vol. 54, no. 11, pp. 7602–7605, 1996. [Online]. Available: https://doi.org/10.1103/PhysRevB.54.7602

work page doi:10.1103/physrevb.54.7602 1996
[33]

Finite element methods in ab initio electronic structure calculations,

J. Pask and P. Sterne, “Finite element methods in ab initio electronic structure calculations,”Modell. Simul. Mater. Sci. Eng., vol. 13, no. 3, p. R71, 2005. [Online]. Available: https://doi.org/10.1088/0965- 0393/13/3/R01

work page doi:10.1088/0965- 2005
[34]

Higher-order adaptive finite-element methods for Kohn–Sham density functional theory,

P. Motamarri, M. R. Nowak, K. Leiter, J. Knap, and V . Gavini, “Higher-order adaptive finite-element methods for Kohn–Sham density functional theory,”J. Comput. Phys., vol. 253, pp. 308–343, 2013. [Online]. Available: https://doi.org/10.1016/j.jcp.2013.06.042

work page doi:10.1016/j.jcp.2013.06.042 2013
[35]

A matrix-free approach for finite-strain hyperelastic problems using geometric multigrid,

D. Davydov, J.-P. Pelteret, D. Arndt, M. Kronbichler, and P. Steinmann, “A matrix-free approach for finite-strain hyperelastic problems using geometric multigrid,”Int. J. Numer. Methods Eng., vol. 121, no. 13, pp. 2874–2895, 2020. [Online]. Available: https://doi.org/10.1002/nme.6336

work page doi:10.1002/nme.6336 2020
[36]

Scalability of high-performance PDE solvers,

P. Fischer, M. Min, T. Rathnayake, S. Dutta, T. Kolev, V . Dobrev, J.-S. Camier, M. Kronbichler, T. Warburton, K. ´Swirydowicz, and J. Brown, “Scalability of high-performance PDE solvers,”Int. J. High Perform. Comput. Appl., vol. 34, no. 5, pp. 562–586, 2020. [Online]. Available: https://doi.org/10.1177/1094342020915762

work page doi:10.1177/1094342020915762 2020
[37]

Fast hardware- aware matrix-free algorithms for higher-order finite-element discretized matrix multivector products on distributed systems,

G. Panigrahi, N. Kodali, D. Panda, and P. Motamarri, “Fast hardware- aware matrix-free algorithms for higher-order finite-element discretized matrix multivector products on distributed systems,”J. Parallel Distrib. Comput., vol. 192, p. 104925, 2024. [Online]. Available: https://doi.org/10.1016/j.jpdc.2024.104925

work page doi:10.1016/j.jpdc.2024.104925 2024
[38]

Giant nonlinear Hall effect in twisted bilayer WTe2,

Z. He and H. Weng, “Giant nonlinear Hall effect in twisted bilayer WTe2,”npj Quantum Mater., vol. 6, p. 101, 2021. [Online]. Available: https://doi.org/10.1038/s41535-021-00403-9

work page doi:10.1038/s41535-021-00403-9 2021
[39]

Finite-element methods for noncollinear magnetism and spin-orbit coupling in real- space pseudopotential density functional theory,

N. Kodali and P. Motamarri, “Finite-element methods for noncollinear magnetism and spin-orbit coupling in real- space pseudopotential density functional theory,”Phys. Rev. B, vol. 111, p. 195129, May 2025. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevB.111.195129

work page doi:10.1103/physrevb.111.195129 2025
[40]

Resta-like preconditioning for self-consistent field iterations in the linearized augmented planewave method,

J. Kim and M. M. May, “Resta-like preconditioning for self-consistent field iterations in the linearized augmented planewave method,” Electronic Structure, vol. 4, no. 4, p. 047003, nov 2022. [Online]. Available: https://doi.org/10.1088/2516-1075/aca24a

work page doi:10.1088/2516-1075/aca24a 2022
[41]

Self-consistent- field calculations using Chebyshev-filtered subspace iteration,

Y . Zhou, Y . Saad, M. L. Tiago, and J. R. Chelikowsky, “Self-consistent- field calculations using Chebyshev-filtered subspace iteration,”J. Comput. Phys., vol. 219, no. 1, pp. 172 – 184, 2006. [Online]. Available: https://doi.org/10.1016/j.jcp.2006.03.017

work page doi:10.1016/j.jcp.2006.03.017 2006
[42]

Residual- based Chebyshev filtered subspace iteration for sparse Hermitian eigenvalue problems tolerant to inexact matrix-vector products,

N. Kodali, K. Ramakrishnan, and P. Motamarri, “Residual- based Chebyshev filtered subspace iteration for sparse Hermitian eigenvalue problems tolerant to inexact matrix-vector products,” arXiv preprint arXiv:2503.22652, 2025. [Online]. Available: https://arxiv.org/abs/2503.22652

work page arXiv 2025
[43]

Liu and J

P. Lindstrom, “Fixed-rate compressed floating-point arrays,” IEEE Transactions on Visualization and Computer Graphics, vol. 20, no. 12, pp. 2674–2683, 2014. [Online]. Available: https://doi.org/10.1109/TVCG.2014.2346458

work page doi:10.1109/tvcg.2014.2346458 2014
[44]

Aurora: Architecting argonne’s first exascale supercomputer for accelerated scientific discovery,

W. E. Allcock, B. S. Allen, J. Anchell, V . Anisimov, T. Applencourt, A. Bagusetty, R. Balakrishnan, R. Balin, S. Bekele, C. Bertoni, C. Blackworth, R. Bustamante, K. Canada, J. Carrier, C. Chan-nui, L. C. Cheney, T. Childers, P. Coffman, S. Coghlan, T. Dey, M. D’Mello, A. Emani, M. Emani, K. G. Felker, S. Foreman, O. Franza, L. Gao, M. Garc´ıa, M. Garzar...

work page arXiv 2025
[45]

Two-dimensional itinerant ferromagnetism in atomically thin Fe3GeTe2,

Z. Fei, B. Huang, P. Malinowski, W. Wang, T. Song, J. Sanchez, W. Yao, D. Xiao, X. Zhu, A. F. May, W. Wu, D. H. Cobden, J. H. Chu, and X. Xu, “Two-dimensional itinerant ferromagnetism in atomically thin Fe3GeTe2,” Nat. Mater., vol. 17, no. 9, pp. 778–782, 2018. [Online]. Available: https://doi.org/10.1038/s41563-018-0149-7

work page doi:10.1038/s41563-018-0149-7 2018
[46]

Gate-tunable room-temperature ferromagnetism in two-dimensional Fe3GeTe2,

Y. Deng, Y. Yu, Y. Song, J. Zhang, N. Z. Wang, Z. Sun, Y. Yi, Y. Z. Wu, S. Wu, J. Zhu, J. Wang, X. H. Chen, and Y. Zhang, “Gate-tunable room-temperature ferromagnetism in two-dimensional Fe3GeTe2,” Nature, vol. 563, no. 7729, pp. 94–99, 2018. [Online]. Available: https://doi.org/10.1038/s41586-018-0626-9

work page doi:10.1038/s41586-018-0626-9 2018
[47]

Topological exciton bands in moiré heterojunctions,

F. Wu, T. Lovorn, and A. H. MacDonald, “Topological exciton bands in moiré heterojunctions,” Phys. Rev. Lett., vol. 118, p. 147401, 2017. [Online]. Available: https://doi.org/10.1103/PhysRevLett.118.147401

work page doi:10.1103/physrevlett.118.147401 2017
[48]

Signatures of moiré-trapped valley excitons in MoSe2/WSe2 heterobilayers,

K. L. Seyler, P. Rivera, H. Yu, N. P. Wilson, E. L. Ray, D. G. Mandrus, J. Yan, W. Yao, and X. Xu, “Signatures of moiré-trapped valley excitons in MoSe2/WSe2 heterobilayers,” Nature, vol. 567, pp. 66–70, 2019. [Online]. Available: https://doi.org/10.1038/s41586-019-0957-1

work page doi:10.1038/s41586-019-0957-1 2019
[50]

Twister: Construction and structural relaxation of commensurate moiré superlattices,

M. H. Naik and M. Jain, “Twister: Construction and structural relaxation of commensurate moiré superlattices,” Comput. Phys. Commun., vol. 271, p. 108184, 2022. [Online]. Available: https://doi.org/10.1016/j.cpc.2021.108184

work page doi:10.1016/j.cpc.2021.108184 2022
[51]

Optimization algorithm for the generation of ONCV pseudopotentials,

M. Schlipf and F. Gygi, “Optimization algorithm for the generation of ONCV pseudopotentials,” Comput. Phys. Commun., vol. 196, pp. 36–44, 2015. [Online]. Available: https://doi.org/10.1016/j.cpc.2015.05.011

work page doi:10.1016/j.cpc.2015.05.011 2015
[53]

Generalized gradient approximation made simple,

J. P. Perdew, K. Burke, and M. Ernzerhof, “Generalized gradient approximation made simple,” Phys. Rev. Lett., vol. 77, pp. 3865–3868, Oct. 1996. [Online]. Available: https://doi.org/10.1103/PhysRevLett.77.3865

work page doi:10.1103/physrevlett.77.3865 1996
[55]

The deal.II library, version 9.7,

D. Arndt, W. Bangerth, M. Bergbauer, B. Blais, M. Fehling, R. Gassmöller, T. Heister, L. Heltai, M. Kronbichler, M. Maier, P. Munch, S. Scheuerman, B. Turcksin, S. Uzunbajakau, D. Wells, and M. Wichrowski, “The deal.II library, version 9.7,” Journal of Numerical Mathematics, vol. 33, no. 4, pp. 403–415, 2025. [Online]. Available: https://doi.org/10.1515/...

work page doi:10.1515/jnma-2025-0115 2025
[56]

The Kokkos ecosystem: Comprehensive performance portability for high performance computing,

C. Trott, L. Berger-Vergiat, D. Poliakoff, S. Rajamanickam, D. Lebrun-Grandie, J. Madsen, N. Al Awar, M. Gligoric, G. Shipman, and G. Womeldorff, “The Kokkos ecosystem: Comprehensive performance portability for high performance computing,” Computing in Science & Engineering, vol. 23, no. 5, pp. 10–18, 2021. [Online]. Available: https://doi.org/10.1109/MCSE....

work page doi:10.1109/mcse.2021.3098509 2021
