pith. machine review for the scientific record.

arxiv: 2605.08885 · v1 · submitted 2026-05-09 · 💻 cs.LG

Recognition: 2 theorem links · Lean Theorem

Compact SO(3) Equivariant Atomistic Foundation Models via Structural Pruning

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 01:38 UTC · model grok-4.3

classification: 💻 cs.LG
keywords: structural pruning · SO(3) equivariant GNN · atomistic foundation models · model compression · MACE · equivariance · Matbench Discovery · graph neural networks

The pith

Structural pruning of SO(3) equivariant atomistic models yields compact versions that outperform small models trained from scratch.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a structural pruning technique for SO(3) equivariant graph neural networks used in atomistic foundation models. The method removes complete blocks of irreducible representations in the channel and order dimensions to maintain rotational symmetry. When applied to a large pretrained model, the resulting smaller network achieves higher accuracy than an equivalent small model trained independently on the same data. It also requires significantly fewer parameters and less pre-training computation. The benefits extend to fine-tuning on downstream tasks and apply across multiple equivariant architectures.

Core claim

The central claim is that block-wise pruning along the channel and order dimensions of SO(3) equivariant layers, starting from a large checkpoint, produces a compressed model that retains full equivariance and exceeds the accuracy of a from-scratch small model of similar size. This is evidenced by the pruned MACE-MP model outperforming the official small model on seven of nine Matbench Discovery metrics, with 1.5 to 4 times fewer parameters and 2.5 to 4 times less pre-training compute.

What carries the argument

Block-wise structural pruning of irreducible representations along channel and order dimensions to preserve SO(3) equivariance while reducing model size.
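The mechanism is easiest to see at the tensor level: an order-l feature is a (2l+1)-dimensional irrep, and a rotation mixes the m-components inside each irrep but never across channels, so removing whole channels (or whole orders) commutes with the rotation, while zeroing individual components does not. The sketch below is a minimal illustration of that point, not the paper's implementation; the tensor layout, function names, and the stand-in rotation matrix are assumptions made for the demo.

```python
# Minimal sketch (assumed layout, not the paper's code): order-l features are
# stored as [n_channels, 2l+1]; a rotation acts on the last axis by the
# (2l+1)x(2l+1) Wigner-D matrix. Pruning whole channels keeps every surviving
# irrep intact, so it commutes with the rotation; masking single m-components
# does not.
import torch

def prune_channels(h: torch.Tensor, keep: torch.Tensor) -> torch.Tensor:
    """Block-wise pruning: drop entire channels (complete irreps)."""
    return h[keep]                       # keep: bool mask over channels

def mask_components(h: torch.Tensor, keep_m: torch.Tensor) -> torch.Tensor:
    """Element-wise pruning: zero individual m-components (breaks equivariance)."""
    return h * keep_m                    # keep_m: [2l+1] 0/1 mask

torch.manual_seed(0)
l, n_channels = 1, 4
h = torch.randn(n_channels, 2 * l + 1)
D, _ = torch.linalg.qr(torch.randn(2 * l + 1, 2 * l + 1))  # stand-in for a Wigner-D matrix
rotate = lambda x: x @ D.T                                  # rotation acts on the m axis

keep = torch.tensor([True, False, True, True])   # remove channel 1 as a whole block
keep_m = torch.tensor([1.0, 0.0, 1.0])           # remove a single m-component

# Channel-block pruning commutes with the rotation ...
assert torch.allclose(prune_channels(rotate(h), keep),
                      rotate(prune_channels(h, keep)), atol=1e-6)
# ... per-component masking does not.
assert not torch.allclose(mask_components(rotate(h), keep_m),
                          rotate(mask_components(h, keep_m)), atol=1e-6)
print("block-wise pruning commutes with rotation; element-wise masking does not")
```

The same reasoning applies per (k, l) block in a full equivariant layer, which is why the method prunes at that granularity rather than over individual tensor components.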

If this is right

  • Pruned models use 1.5× to 4× fewer parameters than small models trained from scratch.
  • Pre-training compute is reduced by 2.5× to 4× compared to training small models.
  • Fine-tuning the pruned model lowers energy errors by 70.1% and force errors by 34.4% versus scratch-trained task-specific models.
  • The pruning generalizes to other SO(3) equivariant architectures including SevenNet and eSCN.
  • It combines effectively with quantization and knowledge distillation for further efficiency gains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach suggests that training large equivariant models and then pruning may be more effective than directly training small ones for achieving high performance at low inference cost.
  • Adopting this pruning could substantially lower the computational resources needed to develop and deploy atomistic AI models for materials science applications.
  • Similar block-pruning strategies might transfer to other symmetry-aware neural networks, enabling compact versions without retraining from scratch.

Load-bearing premise

Removing blocks of irreducible representations from a large model preserves enough expressive power for the pruned version to outperform a small model trained from scratch.

What would settle it

If an independently trained small model achieves comparable or better performance than the pruned large-to-small model on the Matbench Discovery leaderboard metrics, the advantage of pruning would not hold.

Figures

Figures reproduced from arXiv: 2605.08885 by Chen Wang, Guangming Tan, Siyu Hu, Weile Jia.

Figure 1
Figure 1: Pruning results. Pruned models outperform scratch-trained models of the same size at 2.5×–4× lower training cost. Point size indicates parameter count. view at source ↗
Figure 2
Figure 2: Overview of the proposed structural pruning framework. (a) Pruning Granularity & Importance Estimation: To preserve SO(3) equivariance, features are pruned as (k, l) blocks based on an energy-force sensitive importance criterion. (b) Structural Alignment: The model structure is aligned to the target structure via weight slicing and tensor product reconfiguration. (c) Pruning Framework: The four-stage pipel… view at source ↗
Figure 3
Figure 3. view at source ↗
Figure 4
Figure 4: Feature Importance Analysis across Orders L. The left panel displays the heatmap of feature importance sorted by channel for each order L, while the right panel shows the corresponding distribution (violin plots). view at source ↗
Figure 5
Figure 5: Radial distribution functions from NVT MD simulation of liquid water at 300 K. O–O, O–H, and H–H RDFs are shown for the full pre-trained model (blue) and the pruned model (orange). The two curves overlap closely, confirming that structural pruning preserves the accuracy of local liquid-state structure. view at source ↗
read the original abstract

SO(3) equivariant graph neural networks have become the dominant paradigm for atomistic foundation models, achieving high accuracy and data efficiency by building rotational symmetry directly into the architecture. Yet the computational cost of their higher-order tensor operations creates a tough trade-off between model accuracy and inference efficiency. In this paper, we propose a structural pruning method for SO(3) equivariant atomistic foundation models to bridge this accuracy-efficiency gap. The pruning is applied along the channel and order dimensions, with each irreducible representation kept or removed as a complete block, thereby retaining SO(3) equivariance. Starting from a large checkpoint, the pruned model substantially reduces the inference cost while retaining higher accuracy than an independently trained small model. The pruned MACE-MP model outperforms the official from-scratch trained small model on 7 of 9 metrics on the Matbench Discovery leaderboard. In terms of efficiency, compressed MACE-MP and MACE-OFF models contain 1.5$\times$ to 4$\times$ fewer parameters and require 2.5$\times$ to 4$\times$ less pre-training compute than training a small model from scratch. For downstream applications, fine-tuning the pruned model reduces energy and force errors by 70.1% and 34.4% compared to training task-specific models from scratch across eight representative downstream datasets. We demonstrate that the method generalizes to other SO(3) equivariant architectures (SevenNet, eSCN) and can be combined with quantization and knowledge distillation for further gains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a structural pruning method for SO(3) equivariant graph neural networks in atomistic foundation models (e.g., MACE-MP, MACE-OFF). Pruning is performed along channel and order dimensions by removing entire irreducible representation blocks to preserve equivariance. Starting from large pre-trained checkpoints, the resulting compact models reduce inference cost while achieving higher accuracy than independently trained small models of comparable size; the pruned MACE-MP outperforms the official from-scratch small model on 7 of 9 Matbench Discovery metrics. Additional claims include 1.5–4× parameter reduction and 2.5–4× lower pre-training compute versus training small models from scratch, plus gains from fine-tuning on eight downstream datasets and generalization to SevenNet and eSCN, with optional combination with quantization and distillation.

Significance. If the empirical claims hold under rigorous verification, the work would be significant for practical deployment of equivariant atomistic models, as it offers a way to obtain compact high-accuracy models without the full cost of training small architectures from scratch. The direct comparison against official from-scratch baselines on a public leaderboard (Matbench Discovery) and the reported fine-tuning improvements are strengths. The method's applicability across multiple SO(3)-equivariant architectures is also a positive aspect.

major comments (3)
  1. [Methods] Methods section: The pruning criterion for selecting which irreducible-representation blocks to remove (along channels or orders) is not specified with an equation or algorithm; it is unclear whether selection uses weight magnitude, activation statistics, or another importance measure. This is load-bearing for the central claim that the pruned model retains sufficient expressive power to outperform a from-scratch small model.
  2. [Results] Results on Matbench Discovery (Table reporting 7/9 metrics): No error bars, standard deviations across random seeds, or statistical significance tests are provided for the performance differences versus the official small model. Without these, it is impossible to determine whether the reported outperformance is reliable or could be explained by training variance.
  3. [Experiments] Ablation studies (if present) or Experiments section: There are no ablations varying the pruning ratio, comparing channel-only vs. order-only pruning, or contrasting the chosen criterion against random block removal. Such controls are required to substantiate that the specific structural pruning preserves task-critical tensor-product paths better than training a small model from scratch.
minor comments (2)
  1. [Abstract] Abstract: The efficiency statements (1.5×–4× fewer parameters, 2.5×–4× less pre-training compute) would be clearer if the exact pre- and post-pruning parameter counts and FLOPs for each model (MACE-MP, MACE-OFF) were stated explicitly.
  2. [Methods] Notation: The paper uses 'irreducible representations' and 'blocks' interchangeably in places; a short glossary or consistent definition in the methods would improve readability for readers less familiar with SO(3) equivariant tensor products.
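
Major comment 1 turns on the unspecified importance measure; Figure 2 describes it only as an energy-force sensitive criterion computed over (k, l) blocks. For orientation, the sketch below shows one standard way such a block score could be computed, a first-order Taylor (weight times gradient) score on a combined energy and force objective. It is an editorial illustration under assumed names (`block_importance`, the toy loss terms), not the criterion the paper uses.

```python
# Illustrative only: a first-order Taylor block-importance score of the kind
# commonly used in structured pruning; the paper's actual criterion is not
# specified. Each tensor in `blocks` stands in for the weights feeding one
# (channel, order) irrep block.
import torch

def block_importance(loss: torch.Tensor, blocks: list) -> torch.Tensor:
    """Score each block by sum(|w * dL/dw|); the lowest-scoring blocks are pruning candidates."""
    grads = torch.autograd.grad(loss, blocks, retain_graph=True)
    return torch.stack([(w * g).abs().sum() for w, g in zip(blocks, grads)])

# Toy stand-in for an energy + force objective over three irrep blocks (l = 0, 1, 2).
torch.manual_seed(0)
blocks = [torch.randn(8, 2 * l + 1, requires_grad=True) for l in range(3)]
energy_term = sum((b ** 2).sum() for b in blocks)        # placeholder for the energy loss
force_term = sum(b.abs().mean() for b in blocks)         # placeholder for the force loss
loss = energy_term + 10.0 * force_term
scores = block_importance(loss, blocks)
print("per-block importance:", [round(s, 2) for s in scores.tolist()])
```

Whatever the paper's exact formula turns out to be, the referee's point stands: the selection rule, not just the block granularity, carries the claim that pruning beats training small models from scratch.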

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the constructive feedback. We address each major comment below and will revise the manuscript to strengthen clarity and rigor.

read point-by-point responses
  1. Referee: [Methods] Methods section: The pruning criterion for selecting which irreducible-representation blocks to remove (along channels or orders) is not specified with an equation or algorithm; it is unclear whether selection uses weight magnitude, activation statistics, or another importance measure. This is load-bearing for the central claim that the pruned model retains sufficient expressive power to outperform a from-scratch small model.

    Authors: The referee is correct; the manuscript describes block removal to preserve equivariance but does not provide an explicit equation or algorithm for the selection criterion. We will add a precise mathematical formulation and pseudocode in the revised Methods section. revision: yes

  2. Referee: [Results] Results on Matbench Discovery (Table reporting 7/9 metrics): No error bars, standard deviations across random seeds, or statistical significance tests are provided for the performance differences versus the official small model. Without these, it is impossible to determine whether the reported outperformance is reliable or could be explained by training variance.

    Authors: We acknowledge the absence of error bars and statistical measures. In the revision we will rerun the relevant experiments across multiple seeds, report standard deviations, and include significance testing for the Matbench Discovery comparisons. revision: yes

  3. Referee: [Experiments] Ablation studies (if present) or Experiments section: There are no ablations varying the pruning ratio, comparing channel-only vs. order-only pruning, or contrasting the chosen criterion against random block removal. Such controls are required to substantiate that the specific structural pruning preserves task-critical tensor-product paths better than training a small model from scratch.

    Authors: We agree that systematic ablations would strengthen the evidence. The revised manuscript will include new experiments varying the pruning ratio, isolating channel-only versus order-only pruning, and comparing against random block removal. revision: yes

Circularity Check

0 steps flagged

No significant circularity in empirical pruning evaluation

full rationale

The paper presents a structural pruning method for SO(3) equivariant atomistic models, applied block-wise along channel and order dimensions to preserve equivariance, then evaluates the resulting compressed models against independently trained small models and from-scratch baselines on Matbench Discovery and downstream tasks. All reported gains (e.g., outperforming official small MACE-MP on 7/9 metrics, 1.5-4x parameter reduction) are obtained via direct experimental comparison rather than any derivation that reduces outputs to pruning hyperparameters, fitted inputs, or self-citations. No equations or uniqueness claims collapse the central result to its own inputs by construction; the work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that block pruning preserves equivariance and on the empirical premise that large checkpoints contain compressible redundancy; no explicit free parameters or invented entities are named in the abstract.

free parameters (1)
  • pruning ratios or selection thresholds for channels and orders
    The decision of which blocks to remove is controlled by hyperparameters whose values are not specified in the abstract but must be chosen or tuned to achieve the reported compression levels.
axioms (2)
  • domain assumption: Removing entire irreducible representations as blocks preserves the SO(3) equivariance of the network (see the formal sketch after this list).
    Stated directly in the abstract as the mechanism that retains rotational symmetry after pruning.
  • domain assumption: A large pre-trained equivariant checkpoint contains redundant information that can be pruned without destroying the model's ability to outperform a small model trained from scratch.
    Implicit in the claim that pruned models beat independently trained small models on benchmarks.
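
The first axiom can be stated compactly. The block below reconstructs, in LaTeX, the masking conditions excerpted from the paper's appendix (equations (10) and (11), quoted in reference entries [68] and [69]); it is a sketch of that statement, not a new result.

```latex
% A single scalar mask z per irrep commutes with the Wigner-D action,
% so removing whole irrep blocks preserves SO(3) equivariance:
\[
  z \,\bigl(D^{l}(g)\, h^{l}\bigr) \;=\; D^{l}(g)\,\bigl(z\, h^{l}\bigr),
  \qquad z \in \{0, 1\}. \tag{11}
\]
% An element-wise mask over the m-components generally does not commute,
% because the dense unitary D^{l}(g) mixes all m-components of h^{l}:
\[
  M \,\bigl(D^{l}(g)\, h^{l}\bigr) \;\neq\; D^{l}(g)\,\bigl(M\, h^{l}\bigr),
  \qquad M = \mathrm{diag}(m_{-l}, \ldots, m_{l}),\ \text{not all } m_{i}\ \text{equal}. \tag{10}
\]
```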

pith-pipeline@v0.9.0 · 5586 in / 1630 out tokens · 42899 ms · 2026-05-12T01:38:32.539351+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

69 extracted references · 69 canonical work pages · 1 internal anchor

  1. [1]

    Machine learning interatomic potentials at the centennial crossroads of quantum mechanics.Nature Computational Science, pages 1–13, 2025

    Bhupalee Kalita, Hatice Gokcan, and Olexandr Isayev. Machine learning interatomic potentials at the centennial crossroads of quantum mechanics.Nature Computational Science, pages 1–13, 2025

  2. [2]

    A foundation model for atomistic materials chemistry. arXiv preprint arXiv:2401.00096, 2023

    Ilyes Batatia, Philipp Benner, Yuan Chiang, Alin M Elena, Dávid P Kovács, Janosh Riebesell, Xavier R Advincula, Mark Asta, Matthew Avaylon, William J Baldwin, et al. A foundation model for atomistic materials chemistry.arXiv preprint arXiv:2401.00096, 2023

  3. [3]

    Scalable parallel algorithm for graph neural network interatomic potentials in molecular dynamics simulations. J. Chem. Theory Comput., 20(11):4857–4868, 2024

    Yutack Park, Jaesun Kim, Seungwoo Hwang, and Seungwu Han. Scalable parallel algorithm for graph neural network interatomic potentials in molecular dynamics simulations.J. Chem. Theory Comput., 20 (11):4857–4868, 2024. doi: 10.1021/acs.jctc.4c00190

  4. [4]

    Uma: A family of universal models for atoms. arXiv preprint arXiv:2506.23971, 2025

    Brandon M Wood, Misko Dzamba, Xiang Fu, Meng Gao, Muhammed Shuaibi, Luis Barroso-Luque, Kareem Abdelmaqsoud, Vahe Gharakhanyan, John R Kitchin, Daniel S Levine, et al. Uma: A family of universal models for atoms.arXiv preprint arXiv:2506.23971, 2025

  5. [5]

    Orb-v3: atomistic simulation at scale. arXiv preprint arXiv:2504.06231, 2025

    Benjamin Rhodes, Sander Vandenhaute, Vaidotas Šimkus, James Gin, Jonathan Godwin, Tim Duignan, and Mark Neumann. Orb-v3: atomistic simulation at scale.arXiv preprint arXiv:2504.06231, 2025

  6. [6]

    Pet-mad as a lightweight universal interatomic potential for advanced materials modeling.Nature Communications, 16(1):10653, 2025

    Arslan Mazitov, Filippo Bigi, Matthias Kellner, Paolo Pegolo, Davide Tisi, Guillaume Fraux, Sergey Pozdnyakov, Philip Loche, and Michele Ceriotti. Pet-mad as a lightweight universal interatomic potential for advanced materials modeling.Nature Communications, 16(1):10653, 2025

  7. [7]

    Flashtp: Fused, sparsity-aware tensor product for machine learning interatomic potentials

    Seung Yul Lee, Hojoon Kim, Yutack Park, Dawoon Jeong, Seungwu Han, Yeonhong Park, and Jae W Lee. Flashtp: Fused, sparsity-aware tensor product for machine learning interatomic potentials. InForty-second International Conference on Machine Learning, 2025

  8. [8]

    Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds

    Nathaniel Thomas, Tess Smidt, Steven Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, and Patrick Riley. Tensor field networks: Rotation-and translation-equivariant neural networks for 3d point clouds.arXiv preprint arXiv:1802.08219, 2018

  9. [9]

    e3nn: Euclidean neural networks. arXiv preprint arXiv:2207.09453, 2022

    Mario Geiger and Tess Smidt. e3nn: Euclidean neural networks.arXiv preprint arXiv:2207.09453, 2022

  10. [10]

    Reducing so (3) convolutions to so (2) for efficient equivariant gnns

    Saro Passaro and C Lawrence Zitnick. Reducing so (3) convolutions to so (2) for efficient equivariant gnns. InInternational conference on machine learning, pages 27420–27438. PMLR, 2023

  11. [11]

    Knowledge distillation: A survey

    Jianping Gou, Baosheng Yu, Stephen J Maybank, and Dacheng Tao. Knowledge distillation: A survey. International journal of computer vision, 129(6):1789–1819, 2021

  12. [12]

    A survey of quantization methods for efficient neural network inference

    Amir Gholami, Sehoon Kim, Zhen Dong, Zhewei Yao, Michael W Mahoney, and Kurt Keutzer. A survey of quantization methods for efficient neural network inference. InLow-power computer vision, pages 291–326. Chapman and Hall/CRC, 2022

  13. [13]

    Compression of deep learning models for text: A survey.ACM Transactions on Knowledge Discovery from Data (TKDD), 16(4):1–55, 2022

    Manish Gupta and Puneet Agrawal. Compression of deep learning models for text: A survey.ACM Transactions on Knowledge Discovery from Data (TKDD), 16(4):1–55, 2022

  14. [14]

    Sheared llama: Accelerating language model pre-training via structured pruning. arXiv preprint arXiv:2310.06694, 2023

    Mengzhou Xia, Tianyu Gao, Zhiyuan Zeng, and Danqi Chen. Sheared llama: Accelerating language model pre-training via structured pruning.arXiv preprint arXiv:2310.06694, 2023

  15. [15]

    Slicegpt: Compress large language models by deleting rows and columns,

    Saleh Ashkboos, Maximilian L Croci, Marcelo Gennari do Nascimento, Torsten Hoefler, and James Hensman. Slicegpt: Compress large language models by deleting rows and columns.arXiv preprint arXiv:2401.15024, 2024

  16. [16]

    Shortgpt: Layers in large language models are more redundant than you expect

    Xin Men, Mingyu Xu, Qingyu Zhang, Qianhao Yuan, Bingning Wang, Hongyu Lin, Yaojie Lu, Xianpei Han, and Weipeng Chen. Shortgpt: Layers in large language models are more redundant than you expect. InFindings of the Association for Computational Linguistics: ACL 2025, pages 20192–20204, 2025

  17. [17]

    Towards faster and more compact foundation models for molecular property prediction

    Yasir Ghunaim, Andrés Villa, Gergo Ignacz, Gyorgy Szekely, Motasem Alfarra, and Bernard Ghanem. Towards faster and more compact foundation models for molecular property prediction. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 48–57, 2025

  18. [18]

    Scalable foundation interatomic potentials via message-passing pruning and graph partitioning.arXiv preprint arXiv:2509.21694, 2025

    Lingyu Kong, Jaeheon Shim, Guoxiang Hu, and Victor Fung. Scalable foundation interatomic potentials via message-passing pruning and graph partitioning.arXiv preprint arXiv:2509.21694, 2025

  19. [19]

    Mace: Higher order equivariant message passing neural networks for fast and accurate force fields.Advances in neural information processing systems, 35:11423–11436, 2022

    Ilyes Batatia, David P Kovacs, Gregor Simm, Christoph Ortner, and Gábor Csányi. Mace: Higher order equivariant message passing neural networks for fast and accurate force fields.Advances in neural information processing systems, 35:11423–11436, 2022. 10

  20. [20]

    Deepmd-kit: A deep learning package for many-body potential energy representation and molecular dynamics.Computer Physics Communications, 228:178–184, 2018

    Han Wang, Linfeng Zhang, Jiequn Han, et al. Deepmd-kit: A deep learning package for many-body potential energy representation and molecular dynamics.Computer Physics Communications, 228:178–184, 2018

  21. [21]

    Equivariant message passing for the prediction of tensorial properties and molecular spectra

    Kristof Schütt, Oliver Unke, and Michael Gastegger. Equivariant message passing for the prediction of tensorial properties and molecular spectra. InInternational conference on machine learning, pages 9377–9388. PMLR, 2021

  22. [22]

    Gemnet: Universal directional graph neural networks for molecules.Advances in Neural Information Processing Systems, 34:6790–6802, 2021

    Johannes Gasteiger, Florian Becker, and Stephan Günnemann. Gemnet: Universal directional graph neural networks for molecules.Advances in Neural Information Processing Systems, 34:6790–6802, 2021

  23. [23]

    E (3)-equivariant graph neural networks for data- efficient and accurate interatomic potentials.Nature communications, 13(1):2453, 2022

    Simon Batzner, Albert Musaelian, Lixin Sun, Mario Geiger, Jonathan P Mailoa, Mordechai Kornbluth, Nicola Molinari, Tess E Smidt, and Boris Kozinsky. E (3)-equivariant graph neural networks for data- efficient and accurate interatomic potentials.Nature communications, 13(1):2453, 2022

  24. [24]

    High-performance training and inference for deep equivariant interatomic potentials.arXiv preprint arXiv:2504.16068, 2025

    Chuin Wei Tan, Marc L Descoteaux, Mit Kotak, Gabriel de Miranda Nascimento, Seán R Kavanagh, Laura Zichi, Menghang Wang, Aadit Saluja, Yizhong R Hu, Tess Smidt, et al. High-performance training and inference for deep equivariant interatomic potentials.arXiv preprint arXiv:2504.16068, 2025

  25. [25]

    Graph atomic cluster expansion for foundational machine learning interatomic potentials.arXiv preprint arXiv:2508.17936, 2025

    Yury Lysogorskiy, Anton Bochkarev, and Ralf Drautz. Graph atomic cluster expansion for foundational machine learning interatomic potentials.arXiv preprint arXiv:2508.17936, 2025

  26. [26]

    A graph neural network for the era of large atomistic models. arXiv preprint arXiv:2506.01686, 2025

    Duo Zhang, Anyang Peng, Chun Cai, Wentao Li, Yuanchang Zhou, Jinzhe Zeng, Mingyu Guo, Chengqian Zhang, Bowen Li, Hong Jiang, et al. A graph neural network for the era of large atomistic models (2025). arXiv preprint arXiv:2506.01686, 2025

  27. [27]

    Mattersim: A deep learning atomistic model across elements, temperatures and pressures. arXiv preprint arXiv:2405.04967, 2024

    Han Yang, Chenxi Hu, Yichi Zhou, Xixian Liu, Yu Shi, Jielan Li, Guanzhi Li, Zekun Chen, Shuizhou Chen, Claudio Zeni, et al. Mattersim: A deep learning atomistic model across elements, temperatures and pressures.arXiv preprint arXiv:2405.04967, 2024

  28. [28]

    A recipe for scalable attention- based mlips: unlocking long-range accuracy with all-to-all node attention.arXiv preprint arXiv:2603.06567, 2026

    Eric Qu, Brandon M Wood, Aditi S Krishnapriyan, and Zachary W Ulissi. A recipe for scalable attention- based mlips: unlocking long-range accuracy with all-to-all node attention.arXiv preprint arXiv:2603.06567, 2026

  29. [29]

    Matris: Toward reliable and efficient pretrained machine learning interatomic potentials.arXiv preprint arXiv:2603.02002, 2026

    Yuanchang Zhou, Siyu Hu, Xiangyu Zhang, Hongyu Wang, Guangming Tan, and Weile Jia. Matris: Toward reliable and efficient pretrained machine learning interatomic potentials.arXiv preprint arXiv:2603.02002, 2026

  30. [30]

    Chgnet as a pretrained universal neural network potential for charge-informed atomistic modelling.Nature Machine Intelligence, 5(9):1031–1041, 2023

    Bowen Deng, Peichen Zhong, KyuJung Jun, Janosh Riebesell, Kevin Han, Christopher J Bartel, and Gerbrand Ceder. Chgnet as a pretrained universal neural network potential for charge-informed atomistic modelling.Nature Machine Intelligence, 5(9):1031–1041, 2023

  31. [31]

    Spice, a dataset of drug-like molecules and peptides for training machine learning potentials.Scientific Data, 10(1):11, 2023

    Peter Eastman, Pavan Kumar Behara, David L Dotson, Raimondas Galvelis, John E Herr, Josh T Horton, Yuezhi Mao, John D Chodera, Benjamin P Pritchard, Yuanqing Wang, et al. Spice, a dataset of drug-like molecules and peptides for training machine learning potentials.Scientific Data, 10(1):11, 2023

  32. [32]

    Open materials 2024 (omat24) inorganic materials dataset and models. arXiv preprint arXiv:2410.12771, 2024

    Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C Lawrence Zitnick, and Zachary W Ulissi. Open materials 2024 (omat24) inorganic materials dataset and models.arXiv preprint arXiv:2410.12771, 2024

  33. [33]

    The open molecules 2025 (omol25) dataset, evaluations, and models. arXiv preprint arXiv:2505.08762, 2025

    Daniel S Levine, Muhammed Shuaibi, Evan Walter Clark Spotte-Smith, Michael G Taylor, Muhammad R Hasyim, Kyle Michel, Ilyes Batatia, Gábor Csányi, Misko Dzamba, Peter Eastman, et al. The open molecules 2025 (omol25) dataset, evaluations, and models.arXiv preprint arXiv:2505.08762, 2025

  34. [34]

    Open molecular crystals 2025 (omc25) dataset and models.Scientific Data, 2026

    Vahe Gharakhanyan, Luis Barroso-Luque, Yi Yang, Muhammed Shuaibi, Kyle Michel, Daniel S Levine, Misko Dzamba, Xiang Fu, Meng Gao, Xingyu Liu, et al. Open molecular crystals 2025 (omc25) dataset and models.Scientific Data, 2026

  35. [35]

    The open dac 2025 dataset for sorbent discovery in direct air capture. arXiv preprint arXiv:2508.03162, 2025

    Anuroop Sriram, Logan M Brabson, Xiaohan Yu, Sihoon Choi, Kareem Abdelmaqsoud, Elias Moubarak, Pim de Haan, Sindy Löwe, Johann Brehmer, John R Kitchin, et al. The open dac 2025 dataset for sorbent discovery in direct air capture.arXiv preprint arXiv:2508.03162, 2025

  36. [36]

    Machine learning meets quantum physics.Lecture Notes in Physics, 2020

    Kristof T Schütt, Stefan Chmiela, O Anatole von Lilienfeld, Alexandre Tkatchenko, Koji Tsuda, and Klaus-Robert Müller. Machine learning meets quantum physics. Lecture Notes in Physics, 2020

  37. [37]

    Scaling deep learning for materials discovery.Nature, 624(7990):80–85, 2023

    Amil Merchant, Simon Batzner, Samuel S Schoenholz, Muratahan Aykol, Gowoon Cheon, and Ekin Dogus Cubuk. Scaling deep learning for materials discovery.Nature, 624(7990):80–85, 2023. 11

  38. [38]

    Fine-tuning foundation models of materials interatomic potentials with frozen transfer learning.npj Computational Materials, 11(1):237, 2025

    Mariia Radova, Wojciech G Stark, Connor S Allen, Reinhard J Maurer, and Albert P Bartók. Fine-tuning foundation models of materials interatomic potentials with frozen transfer learning.npj Computational Materials, 11(1):237, 2025

  39. [39]

    Data-efficient fine-tuning of foundational models for first-principles quality sublimation enthalpies.Faraday Discussions, 256:120–138, 2025

    Harveen Kaur, Flaviano Della Pia, Ilyes Batatia, Xavier R Advincula, Benjamin X Shi, Jinggang Lan, Gábor Csányi, Angelos Michaelides, and Venkat Kapil. Data-efficient fine-tuning of foundational models for first-principles quality sublimation enthalpies.Faraday Discussions, 256:120–138, 2025

  40. [40]

    Model compression and hardware acceleration for neural networks: A comprehensive survey.Proceedings of the IEEE, 108(4):485–532, 2020

    Lei Deng, Guoqi Li, Song Han, Luping Shi, and Yuan Xie. Model compression and hardware acceleration for neural networks: A comprehensive survey.Proceedings of the IEEE, 108(4):485–532, 2020

  41. [41]

    The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

    Jonathan Frankle and Michael Carbin. The lottery ticket hypothesis: Finding sparse, trainable neural networks.arXiv preprint arXiv:1803.03635, 2018

  42. [42]

    Accelerating molecular graph neural networks via knowledge distillation.Advances in Neural Information Processing Systems, 36: 25761–25792, 2023

    Filip Ekström Kelvinius, Dimitar Georgiev, Artur Toshev, and Johannes Gasteiger. Accelerating molecular graph neural networks via knowledge distillation.Advances in Neural Information Processing Systems, 36: 25761–25792, 2023

  43. [43]

    Towards fast, specialized machine learning force fields: Distilling foundation models via energy hessians. arXiv preprint arXiv:2501.09009, 2025

    Ishan Amin, Sanjeev Raja, and Aditi Krishnapriyan. Towards fast, specialized machine learning force fields: Distilling foundation models via energy hessians.arXiv preprint arXiv:2501.09009, 2025

  44. [44]

    Speeding up mace: Low-precision tricks for equivarient force fields.arXiv preprint arXiv:2510.23621, 2025

    Alexandre Benoit. Speeding up mace: Low-precision tricks for equivarient force fields.arXiv preprint arXiv:2510.23621, 2025

  45. [45]

    Are sixteen heads really better than one?Advances in neural information processing systems, 32, 2019

    Paul Michel, Omer Levy, and Graham Neubig. Are sixteen heads really better than one?Advances in neural information processing systems, 32, 2019

  46. [46]

    Structured pruning of large language models

    Ziheng Wang, Jeremy Wohlwend, and Tao Lei. Structured pruning of large language models. InProceedings of the 2020 conference on empirical methods in natural language processing (emnlp), pages 6151–6162, 2020

  47. [47]

    Llm-pruner: On the structural pruning of large language models.Advances in neural information processing systems, 36:21702–21720, 2023

    Xinyin Ma, Gongfan Fang, and Xinchao Wang. Llm-pruner: On the structural pruning of large language models.Advances in neural information processing systems, 36:21702–21720, 2023

  48. [48]

    Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

    Song Han, Huizi Mao, and William J Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding.arXiv preprint arXiv:1510.00149, 2015

  49. [49]

    Mace-off: Short-range transferable machine learning force fields for organic molecules.Journal of the American Chemical Society, 147(21):17598–17611, 2025

    Dávid Péter Kovács, J Harry Moore, Nicholas J Browning, Ilyes Batatia, Joshua T Horton, Yixuan Pu, Venkat Kapil, William C Witt, Ioan-Bogdan Magdau, Daniel J Cole, et al. Mace-off: Short-range transferable machine learning force fields for organic molecules.Journal of the American Chemical Society, 147(21):17598–17611, 2025

  50. [50]

    Linear atomic cluster expansion force fields for organic molecules: beyond rmse.Journal of chemical theory and computation, 17(12):7696–7711, 2021

    Dávid Péter Kovács, Cas van der Oord, Jiri Kucera, Alice EA Allen, Daniel J Cole, Christoph Ortner, and Gábor Csányi. Linear atomic cluster expansion force fields for organic molecules: beyond rmse.Journal of chemical theory and computation, 17(12):7696–7711, 2021

  51. [51]

    The design space of E(3)-equivariant atom-centered interatomic potentials. arXiv preprint arXiv:2205.06643, 2022

    Ilyes Batatia, Simon Batzner, Dávid Péter Kovács, Albert Musaelian, Gregor NC Simm, Ralf Drautz, Christoph Ortner, Boris Kozinsky, and Gábor Csányi. The design space of e (3)-equivariant atom-centered interatomic potentials.arXiv preprint arXiv:2205.06643, 2022

  52. [52]

    On the role of gradients for machine learning of molecular energies and forces.Machine Learning: Science and Technology, 1(4):045018, 2020

    Anders S Christensen and O Anatole von Lilienfeld. On the role of gradients for machine learning of molecular energies and forces. Machine Learning: Science and Technology, 1(4):045018, 2020

  53. [53]

    Improving machine-learning models in materials science through large datasets.Materials Today Physics, 48:101560, 2024

    Jonathan Schmidt, Tiago FT Cerqueira, Aldo H Romero, Antoine Loew, Fabian Jäger, Hai-Chen Wang, Silvana Botti, and Miguel AL Marques. Improving machine-learning models in materials science through large datasets.Materials Today Physics, 48:101560, 2024

  54. [54]

    Learning smooth and expressive interatomic potentials for physical property prediction

    Xiang Fu, Brandon M Wood, Luis Barroso-Luque, Daniel S Levine, Meng Gao, Misko Dzamba, and C Lawrence Zitnick. Learning smooth and expressive interatomic potentials for physical property prediction. InForty-second International Conference on Machine Learning, 2025

  55. [55]

    Matbench discovery–a framework to evaluate machine learning crystal stability predictions.arXiv preprint arXiv:2308.14920, 2023

    Janosh Riebesell, Rhys EA Goodall, Philipp Benner, Yuan Chiang, Bowen Deng, Alpha A Lee, Anubhav Jain, and Kristin A Persson. Matbench discovery–a framework to evaluate machine learning crystal stability predictions.arXiv preprint arXiv:2308.14920, 2023

  56. [56]

    Open catalyst 2020 (oc20) dataset and community challenges.Acs Catalysis, 11(10):6059–6072, 2021

    Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, et al. Open catalyst 2020 (oc20) dataset and community challenges.Acs Catalysis, 11(10):6059–6072, 2021. 12

  57. [57]

    Compact language models via pruning and knowledge distillation.Advances in Neural Information Processing Systems, 37:41076–41102, 2024

    Saurav Muralidharan, Sharath Turuvekere Sreenivas, Raviraj Joshi, Marcin Chochowski, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Jan Kautz, and Pavlo Molchanov. Compact language models via pruning and knowledge distillation.Advances in Neural Information Processing Systems, 37:41076–41102, 2024

  58. [58]

    Deep potential generation scheme and simulation protocol for the li10gep2s12-type superionic conductors.The Journal of Chemical Physics, 154(9), 2021

    Jianxing Huang, Linfeng Zhang, Han Wang, Jinbao Zhao, Jun Cheng, et al. Deep potential generation scheme and simulation protocol for the li10gep2s12-type superionic conductors.The Journal of Chemical Physics, 154(9), 2021

  59. [59]

    Phase diagram of a deep potential water model

    Linfeng Zhang, Han Wang, Roberto Car, and Weinan E. Phase diagram of a deep potential water model. Physical review letters, 126(23):236001, 2021

  60. [60]

    A generalizable machine learning potential of ag–au nanoalloys and its application to surface reconstruction, segregation and diffusion

    YiNan Wang, LinFeng Zhang, Ben Xu, XiaoYang Wang, and Han Wang. A generalizable machine learning potential of ag–au nanoalloys and its application to surface reconstruction, segregation and diffusion. Modelling and Simulation in Materials Science and Engineering, 30(2):025003, 2021

  61. [61]

    Accurate deep potential model for the al–cu–mg alloy in the full concentration space.Chinese Physics B, 30(5):050706, 2021

    Wanrun Jiang, Yuzhi Zhang, Linfeng Zhang, and Han Wang. Accurate deep potential model for the al–cu–mg alloy in the full concentration space.Chinese Physics B, 30(5):050706, 2021

  62. [62]

    Dp-gen: A concurrent learning platform for the generation of reliable deep learning based potential energy models

    Yuzhi Zhang, Haidi Wang, Weijie Chen, Jinzhe Zeng, Linfeng Zhang, Han Wang, and E Weinan. Dp-gen: A concurrent learning platform for the generation of reliable deep learning based potential energy models. Computer Physics Communications, 253:107206, 2020

  63. [63]

    Specialising neural network potentials for accurate properties and application to the mechanical response of titanium.npj Computational Materials, 7(1):206, 2021

    Tongqi Wen, Rui Wang, Lingyu Zhu, Linfeng Zhang, Han Wang, David J Srolovitz, and Zhaoxuan Wu. Specialising neural network potentials for accurate properties and application to the mechanical response of titanium.npj Computational Materials, 7(1):206, 2021

  64. [64]

    Classical and machine learning interatomic potentials for bcc vanadium.Physical Review Materials, 6(11): 113603, 2022

    Rui Wang, Xiaoxiao Ma, Linfeng Zhang, Han Wang, David J Srolovitz, Tongqi Wen, and Zhaoxuan Wu. Classical and machine learning interatomic potentials for bcc vanadium.Physical Review Materials, 6(11): 113603, 2022

  65. [65]

    A tungsten deep neural-network potential for simulating mechanical property degradation under fusion service environment.Nuclear Fusion, 62(12):126013, 2022

    Xiaoyang Wang, Yinan Wang, Linfeng Zhang, Fuzhi Dai, and Han Wang. A tungsten deep neural-network potential for simulating mechanical property degradation under fusion service environment.Nuclear Fusion, 62(12):126013, 2022

  66. [66]

    Accelerate drug and material discovery with new math library nvidia cuequivariance.URL https://github.com/NVIDIA/cuEquivariance, 2024

    Mario Geiger, Emine Kucukbenli, Becca Zandstein, and Kyle Tretina. Accelerate drug and material discovery with new math library nvidia cuequivariance.URL https://github.com/NVIDIA/cuEquivariance, 2024

  67. [67]

    Equivariant graph network approximations of high-degree polynomials for force field prediction.arXiv preprint arXiv:2411.04219, 2024

    Zhao Xu, Haiyang Yu, Montgomery Bohde, and Shuiwang Ji. Equivariant graph network approximations of high-degree polynomials for force field prediction.arXiv preprint arXiv:2411.04219, 2024. 13 A Background Machine-Learning Interatomic Potentials (MLIPs).The primary objective of MLIPs is to approximate the Potential Energy Surface (PES) of an atomic system...

  68. [68]

    Element-wise pruning violates equivariance

    Element-wise Pruning: Let $M = \mathrm{diag}(m_{-l}, \ldots, m_{l})$ be an arbitrary diagonal binary mask where not all $m_i$ are equal. Applying $M$ element-wise violates equivariance: $M\,(D^{l}(g)\,h^{l}) \neq D^{l}(g)\,(M\,h^{l})$ (10)

  69. [69]

    Block-wise pruning preserves equivariance

    Block-wise Pruning: Applying a scalar mask $z \in \{0,1\}$ (our proposed method) preserves equivariance: $z\,(D^{l}(g)\,h^{l}) = D^{l}(g)\,(z\,h^{l})$ (11). Proof. Part 1 (Violation): The Wigner-D matrix $D^{l}(g)$ is a dense $(2l+1) \times (2l+1)$ unitary matrix that mixes all components indexed by $m \in [-l, l]$. Let $h' = D^{l}(g)\,h$. The $i$-th component of the rotated feature is $h'_i = \sum_j D^{l}(g)_{ij}\, h_j$. If...