pith. machine review for the scientific record.

arxiv: 2605.08960 · v1 · submitted 2026-05-09 · ❄️ cond-mat.mtrl-sci · cs.LG · physics.chem-ph · physics.comp-ph

Recognition: 2 theorem links · Lean Theorem

CrystalREPA: Transferring Physical Priors from Universal MLIPs to Crystal Generative Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:18 UTC · model grok-4.3

classification ❄️ cond-mat.mtrl-sci · cs.LG · physics.chem-ph · physics.comp-ph
keywords Crystal generative models · Machine learning interatomic potentials · Representation alignment · Contrastive learning · Thermodynamic stability · Structural validity · Materials discovery · Atom-wise representations

The pith

Aligning atom-wise representations from crystal generators to frozen MLIPs via a contrastive loss transfers stability priors, raising both the thermodynamic stability and the structural validity of generated crystals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Crystal generative models currently learn stable-looking structures but receive little explicit signal about what makes a crystal thermodynamically stable. The paper identifies a clear representation gap between these generators and pretrained universal MLIPs through energy probing, then closes the gap with a lightweight training-time alignment. CrystalREPA applies an element-aware contrastive objective that pulls generative encoder states toward the frozen MLIP atom representations without changing inference cost. Across multiple generator architectures, MLIP teachers, and datasets, the alignment consistently improves stability, validity, and fidelity metrics. The transfer works best when the MLIP's atom-wise space is highly distinguishable, offering a selection criterion independent of standard accuracy leaderboards.

Core claim

CrystalREPA is a plug-and-play framework that aligns the atom-wise hidden states of any crystal generative encoder with the representations of a frozen universal MLIP through an element-aware contrastive objective, thereby transferring stability-aware atomistic priors at marginal training overhead and zero added inference cost.

What carries the argument

Element-aware contrastive objective that aligns generative atom-wise hidden states with frozen MLIP representations.
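As a concrete illustration, here is a minimal sketch of one plausible form of that objective: an InfoNCE-style loss in which each atom's projected generator state is pulled toward the frozen MLIP embedding of the same atom. The projection head, the temperature, and the rule of masking other same-element atoms as ambiguous negatives are all our assumptions; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def element_aware_infonce(h_gen, h_mlip, elements, proj, tau=0.1):
    """Sketch of an element-aware contrastive alignment loss (assumed form).

    h_gen    : (N, d_gen)  atom-wise hidden states from the generative encoder
    h_mlip   : (N, d_mlip) frozen atom-wise MLIP representations
    elements : (N,)        atomic numbers, used to make the loss element-aware
    proj     : trainable head mapping d_gen -> d_mlip
    """
    z = F.normalize(proj(h_gen), dim=-1)      # student side (trainable)
    t = F.normalize(h_mlip.detach(), dim=-1)  # teacher side (frozen)

    logits = z @ t.T / tau                    # (N, N) pairwise similarities
    # Element-aware masking: other atoms of the same element are ambiguous
    # negatives, so exclude them from the softmax denominator.
    same_elem = elements[:, None] == elements[None, :]
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    logits = logits.masked_fill(same_elem & ~eye, float("-inf"))

    targets = torch.arange(len(z), device=z.device)  # positive = same atom
    return F.cross_entropy(logits, targets)
```

At training time this would be added to the generative objective as a weighted auxiliary term; because the projection head is discarded at sampling time, the zero-added-inference-cost claim is structurally plausible.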

If this is right

  • Generated crystals exhibit higher thermodynamic stability as measured by formation energy and energy above hull (a toy scoring sketch follows this list).
  • Structural validity and fidelity both increase across three different generative frameworks.
  • MLIP selection for transfer can rely on representation distinguishability rather than Matbench Discovery accuracy.
  • The method adds negligible cost at inference and works on both benchmark datasets tested.
  • The representation gap between generators and MLIPs is measurable by energy probing and directly addressable.
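The stability metrics named above are standard thermodynamic quantities. As a minimal sketch of how a generated crystal could be scored against a reference convex hull using pymatgen (the compositions and energies below are toy values, and the paper's evaluation pipeline is not specified in this summary):

```python
from pymatgen.analysis.phase_diagram import PhaseDiagram
from pymatgen.entries.computed_entries import ComputedEntry

# Toy reference set: elemental endpoints plus one known compound
# (total energies in eV, invented for illustration).
entries = [
    ComputedEntry("Li", 0.0),
    ComputedEntry("O", 0.0),
    ComputedEntry("Li2O", -6.0),
]
hull = PhaseDiagram(entries)

# A generated candidate whose total energy has been estimated, e.g. with an
# MLIP after relaxation; the value here is made up.
candidate = ComputedEntry("Li4O2", -11.0)
print(hull.get_form_energy_per_atom(candidate))  # formation energy, eV/atom
print(hull.get_e_above_hull(candidate))          # 0 would mean on the hull
```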

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • CrystalREPA could serve as a general template for injecting physics priors into other generative tasks in materials or chemistry.
  • Future generators might be pretrained with built-in alignment objectives instead of post-hoc transfer.
  • Representation distinguishability offers a practical diagnostic for choosing teacher models in any representation-transfer setting; one hypothetical way to compute such a score is sketched after this list.
  • If the gains persist at larger model scales, the approach could accelerate high-throughput crystal discovery pipelines.
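The distinguishability criterion above is not defined in this summary; one hypothetical operationalization is to ask how cleanly an MLIP's atom-wise embeddings separate by element, for instance with a silhouette score. The function below is our illustration, not the paper's metric.

```python
import numpy as np
from sklearn.metrics import silhouette_score

def distinguishability(atom_reprs: np.ndarray, elements: np.ndarray) -> float:
    """Hypothetical distinguishability score for an MLIP's atom-wise space.

    Treats element identity as the cluster label and measures how well the
    per-atom embeddings separate; higher means more distinguishable.

    atom_reprs : (N, d) frozen atom-wise MLIP embeddings
    elements   : (N,)   element label for each atom
    """
    return silhouette_score(atom_reprs, elements, metric="cosine")
```

Ranking candidate teachers by such a score, rather than by leaderboard accuracy, is the selection recipe the paper's correlation result points toward.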

Load-bearing premise

The contrastive alignment actually transfers genuine stability-aware physical information rather than merely regularizing the encoder in a way that happens to improve the evaluation metrics.

What would settle it

Generate matched sets of crystals with and without CrystalREPA using the same MLIP teacher; if the stability, validity, and fidelity gains disappear or reverse when the MLIP's atom-wise representations show low distinguishability, the transfer claim fails.

Figures

Figures reproduced from arXiv: 2605.08960 by Chengqian Zhang, Duo Zhang, Han Wang, Tiejun Li, Yucheng Jin.

Figure 1. CrystalREPA transfers stability-aware priors from universal MLIPs to crystal generative models. [PITH_FULL_IMAGE:figures/full_fig_p002_1.png]
Figure 2. Overview of CrystalREPA. During training, CrystalREPA aligns atom-wise hidden states … [PITH_FULL_IMAGE:figures/full_fig_p005_2.png]
Figure 3. Representation distinguishability is associated with differences in MLIP transfer effectiveness. [PITH_FULL_IMAGE:figures/full_fig_p008_3.png]
Figure 4. Ablation studies of CrystalREPA using CrystalFlow as the base generative model and … [PITH_FULL_IMAGE:figures/full_fig_p009_4.png]
Figure 5. Additional formation energy probing results across different MLIP teachers on MP-20. [PITH_FULL_IMAGE:figures/full_fig_p023_5.png]
original abstract

Crystal generative models mainly learn what stable crystals look like, with little explicit supervision for what makes them stable. We reveal a substantial representation gap between state-of-the-art crystal generative models and pretrained universal machine learning interatomic potentials (MLIPs) via energy probing, and show this gap can be closed by a simple training-time alignment. We propose Crystal REPresentation Alignment (CrystalREPA), a plug-and-play framework that aligns the atom-wise hidden states of generative encoders with frozen MLIP representations through an element-aware contrastive objective, transferring stability-aware atomistic priors with marginal training overhead and no additional inference cost. Across three generative frameworks, ten MLIP teachers, and two benchmark datasets, CrystalREPA consistently improves the thermodynamic stability, structural validity, and structural fidelity of generated crystals. Equally important, we find that an MLIP's transfer effectiveness is poorly predicted by its accuracy on standard leaderboards (e.g., Matbench Discovery) but strongly predicted by the distinguishability of its atom-wise representation space, yielding a practical, accuracy-independent criterion for selecting MLIP teachers for generative transfer.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces CrystalREPA, a plug-and-play alignment framework that uses an element-aware contrastive objective to match atom-wise hidden states from crystal generative model encoders to frozen representations from pretrained universal MLIPs. This is claimed to close a representation gap (identified via energy probing) and transfer stability-aware physical priors, yielding consistent improvements in thermodynamic stability, structural validity, and structural fidelity across three generative frameworks, ten MLIP teachers, and two benchmark datasets. The work further reports that MLIP transfer effectiveness correlates strongly with the distinguishability of its atom-wise representation space, rather than with accuracy on standard leaderboards such as Matbench Discovery, and incurs only marginal training overhead with no added inference cost.

Significance. If the claims hold, the approach provides a practical, low-cost method for injecting physical priors into crystal generative models by leveraging existing MLIPs, potentially advancing the generation of stable materials. The broad evaluation across multiple frameworks and datasets, together with the identification of a representation-based selection criterion independent of leaderboard accuracy, adds practical value. The paper explicitly credits the plug-and-play design and the absence of inference overhead as strengths.

major comments (2)
  1. [Section 4 (Experiments) and Methods] The central claim that the element-aware contrastive alignment transfers genuine MLIP-encoded stability priors (rather than generic regularization) is load-bearing for the abstract's conclusions on thermodynamic stability and the representation-gap motivation. However, the manuscript does not include an ablation using non-physical target representations (e.g., random vectors or untrained embeddings) as controls. Without this, the reported gains across frameworks and datasets cannot be isolated from the effects of contrastive regularization on latent-space structure. This directly impacts the interpretation of the distinguishability correlation as evidence for physical-prior transfer.
  2. [Abstract and Section 3 (Method)] The abstract states that a 'substantial representation gap' is revealed via energy probing and that this gap 'can be closed' by the alignment, but quantitative details on the probing protocol, baseline generative models, statistical significance of the gap, and how the gap metric is computed are not provided at a level that allows independent verification. This measurement underpins the motivation and the claim of consistent improvement.
minor comments (1)
  1. [Abstract] The abstract would benefit from inclusion of specific quantitative improvement values (e.g., percentage gains in stability or validity metrics) and the number of runs used for statistical reporting, to allow readers to gauge effect sizes without consulting the full text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. The comments highlight important aspects for strengthening the evidence and clarity of our claims. We address each major comment below and outline the revisions we will make.

point-by-point responses
  1. Referee: [Section 4 (Experiments) and Methods] The central claim that the element-aware contrastive alignment transfers genuine MLIP-encoded stability priors (rather than generic regularization) is load-bearing for the abstract's conclusions on thermodynamic stability and the representation-gap motivation. However, the manuscript does not include an ablation using non-physical target representations (e.g., random vectors or untrained embeddings) as controls. Without this, the reported gains across frameworks and datasets cannot be isolated from the effects of contrastive regularization on latent-space structure. This directly impacts the interpretation of the distinguishability correlation as evidence for physical-prior transfer.

    Authors: We agree that an explicit ablation with non-physical target representations is required to rigorously isolate the contribution of MLIP-encoded physical priors from generic effects of the contrastive objective. Although our results with ten MLIP teachers (where transfer gains correlate with representation distinguishability rather than Matbench Discovery accuracy) provide supporting evidence against a purely generic interpretation, this is indirect. In the revised manuscript we will add the requested controls (random vectors and untrained random embeddings as alignment targets) across the three generative frameworks. Preliminary experiments show that these non-physical targets produce substantially smaller gains in thermodynamic stability and validity metrics than pretrained MLIP representations. The new results and discussion will be placed in Section 4, with updated figures and text clarifying that the distinguishability correlation remains meaningful only in the presence of physically informed targets (a sketch of such controls follows these responses). revision: yes

  2. Referee: [Abstract and Section 3 (Method)] The abstract states that a 'substantial representation gap' is revealed via energy probing and that this gap 'can be closed' by the alignment, but quantitative details on the probing protocol, baseline generative models, statistical significance of the gap, and how the gap metric is computed are not provided at a level that allows independent verification. This measurement underpins the motivation and the claim of consistent improvement.

    Authors: We appreciate the request for greater quantitative transparency. The energy-probing protocol (linear probe on frozen atom-wise representations to predict formation energies) is described in Section 3.2 and the supplementary material, with baselines being the unaligned generative models and the gap defined as the relative increase in probe MAE. Statistical significance is evaluated over five independent runs. However, the main text and abstract currently lack explicit numerical values and a compact formula for the gap metric. In the revision we will (i) expand the abstract to include representative gap magnitudes, (ii) add a concise subsection in Section 3 with the exact computation (MAE difference normalized by MLIP probe performance), (iii) report per-model gap values and p-values in a new table, and (iv) move key numerical results from the supplement into the main text. These changes will enable independent verification without altering the original findings (a reconstructed form of the gap metric follows these responses). revision: yes
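Two reconstructions, flagged in the responses above, make these replies concrete. First, the non-physical-target controls from response 1; the tensor shapes and names are illustrative, not the authors' code:

```python
import torch

num_atoms, d_mlip, n_elements = 512, 256, 100
elements = torch.randint(1, n_elements, (num_atoms,))  # dummy atomic numbers

# Control (i): per-atom random vectors, i.e. no structure at all.
rand_targets = torch.randn(num_atoms, d_mlip)

# Control (ii): untrained element embeddings. Element identity is preserved,
# but no stability physics is encoded.
untrained = torch.nn.Embedding(n_elements, d_mlip)
untrained.weight.requires_grad_(False)
elem_targets = untrained(elements)  # (num_atoms, d_mlip)

# Either tensor replaces the frozen MLIP targets in the alignment loss; if
# the stability gains survive, they reflect generic contrastive
# regularization rather than transferred physical priors.
```

Second, one reading of the gap metric described in response 2 (notation ours, reconstructed from the rebuttal rather than quoted from the paper):

```latex
\mathrm{gap} = \frac{\mathrm{MAE}_{\mathrm{gen}} - \mathrm{MAE}_{\mathrm{MLIP}}}{\mathrm{MAE}_{\mathrm{MLIP}}}
```

where MAE_gen and MAE_MLIP are the errors of identical linear formation-energy probes trained on the frozen atom-wise representations of the unaligned generative encoder and of the MLIP teacher, respectively.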

Circularity Check

0 steps flagged

No significant circularity; derivation remains self-contained

full rationale

The paper introduces CrystalREPA as an explicit alignment of generative encoders to independently pretrained and frozen MLIP atom-wise representations via a separately defined element-aware contrastive objective. All reported gains are evaluated on external downstream metrics (thermodynamic stability, structural validity, fidelity) that are not part of the alignment loss. The observed correlation between representation distinguishability and transfer effectiveness is presented as a post-hoc empirical finding rather than a definitional or fitted equivalence. No load-bearing step reduces to self-citation, ansatz smuggling, or renaming of inputs as predictions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that MLIP atom-wise representations encode transferable stability information that can be aligned via contrastive loss without harming generative diversity. No explicit free parameters or invented entities are named in the abstract.

axioms (2)
  • domain assumption Pretrained universal MLIPs encode stability-aware atomistic priors in their hidden states that are useful for generative models.
    Invoked when the authors state that alignment transfers 'stability-aware atomistic priors'.
  • ad hoc to paper An element-aware contrastive objective can align representations without introducing artifacts that degrade generative performance.
    The training-time alignment step is presented as sufficient and neutral.

pith-pipeline@v0.9.0 · 5513 in / 1428 out tokens · 48646 ms · 2026-05-12T02:18:45.897587+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
