pith. machine review for the scientific record.

arxiv: 2605.08960 · v1 · submitted 2026-05-09 · ❄️ cond-mat.mtrl-sci · cs.LG · physics.chem-ph · physics.comp-ph

Recognition: 2 theorem links · Lean Theorem

CrystalREPA: Transferring Physical Priors from Universal MLIPs to Crystal Generative Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:18 UTC · model grok-4.3

classification ❄️ cond-mat.mtrl-sci · cs.LG · physics.chem-ph · physics.comp-ph
keywords Crystal generative models · Machine learning interatomic potentials · Representation alignment · Contrastive learning · Thermodynamic stability · Structural validity · Materials discovery · Atom-wise representations

The pith

Aligning atom-wise representations from crystal generators to frozen MLIPs via a contrastive loss transfers stability priors, raising both the thermodynamic stability and the structural validity of generated crystals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Crystal generative models currently learn stable-looking structures but receive little explicit signal about what makes a crystal thermodynamically stable. The paper identifies a clear representation gap between these generators and pretrained universal MLIPs through energy probing, then closes the gap with a lightweight training-time alignment. CrystalREPA applies an element-aware contrastive objective that pulls generative encoder states toward the frozen MLIP atom representations without changing inference cost. Across multiple generator architectures, MLIP teachers, and datasets, the alignment consistently improves stability, validity, and fidelity metrics. The transfer works best when the MLIP's atom-wise space is highly distinguishable, offering a selection criterion independent of standard accuracy leaderboards.

Core claim

CrystalREPA is a plug-and-play framework that aligns the atom-wise hidden states of any crystal generative encoder with the representations of a frozen universal MLIP through an element-aware contrastive objective, thereby transferring stability-aware atomistic priors at marginal training overhead and zero added inference cost.

What carries the argument

Element-aware contrastive objective that aligns generative atom-wise hidden states with frozen MLIP representations.
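As a concrete illustration, here is a minimal sketch of one plausible form of that objective: an InfoNCE-style loss in which each atom's projected generator state is pulled toward the frozen MLIP embedding of the same atom. The projection head, the temperature, and the rule of masking other same-element atoms as ambiguous negatives are all our assumptions; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def element_aware_infonce(h_gen, h_mlip, elements, proj, tau=0.1):
    """Sketch of an element-aware contrastive alignment loss (assumed form).

    h_gen    : (N, d_gen)  atom-wise hidden states from the generative encoder
    h_mlip   : (N, d_mlip) frozen atom-wise MLIP representations
    elements : (N,)        atomic numbers, used to make the loss element-aware
    proj     : trainable head mapping d_gen -> d_mlip
    """
    z = F.normalize(proj(h_gen), dim=-1)      # student side (trainable)
    t = F.normalize(h_mlip.detach(), dim=-1)  # teacher side (frozen)

    logits = z @ t.T / tau                    # (N, N) pairwise similarities
    # Element-aware masking: other atoms of the same element are ambiguous
    # negatives, so exclude them from the softmax denominator.
    same_elem = elements[:, None] == elements[None, :]
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    logits = logits.masked_fill(same_elem & ~eye, float("-inf"))

    targets = torch.arange(len(z), device=z.device)  # positive = same atom
    return F.cross_entropy(logits, targets)
```

At training time this would be added to the generative objective as a weighted auxiliary term; because the projection head is discarded at sampling time, the zero-added-inference-cost claim is structurally plausible.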

If this is right

  • Generated crystals exhibit higher thermodynamic stability as measured by formation energy and energy above hull (a toy scoring sketch follows this list).
  • Structural validity and fidelity both increase across three different generative frameworks.
  • MLIP selection for transfer can rely on representation distinguishability rather than Matbench Discovery accuracy.
  • The method adds negligible cost at inference and works on both benchmark datasets tested.
  • The representation gap between generators and MLIPs is measurable by energy probing and directly addressable.
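The stability metrics named above are standard thermodynamic quantities. As a minimal sketch of how a generated crystal could be scored against a reference convex hull using pymatgen (the compositions and energies below are toy values, and the paper's evaluation pipeline is not specified in this summary):

```python
from pymatgen.analysis.phase_diagram import PhaseDiagram
from pymatgen.entries.computed_entries import ComputedEntry

# Toy reference set: elemental endpoints plus one known compound
# (total energies in eV, invented for illustration).
entries = [
    ComputedEntry("Li", 0.0),
    ComputedEntry("O", 0.0),
    ComputedEntry("Li2O", -6.0),
]
hull = PhaseDiagram(entries)

# A generated candidate whose total energy has been estimated, e.g. with an
# MLIP after relaxation; the value here is made up.
candidate = ComputedEntry("Li4O2", -11.0)
print(hull.get_form_energy_per_atom(candidate))  # formation energy, eV/atom
print(hull.get_e_above_hull(candidate))          # 0 would mean on the hull
```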

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • CrystalREPA could serve as a general template for injecting physics priors into other generative tasks in materials or chemistry.
  • Future generators might be pretrained with built-in alignment objectives instead of post-hoc transfer.
  • Representation distinguishability offers a practical diagnostic for choosing teacher models in any representation-transfer setting; one hypothetical way to compute such a score is sketched after this list.
  • If the gains persist at larger model scales, the approach could accelerate high-throughput crystal discovery pipelines.
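The distinguishability criterion above is not defined in this summary; one hypothetical operationalization is to ask how cleanly an MLIP's atom-wise embeddings separate by element, for instance with a silhouette score. The function below is our illustration, not the paper's metric.

```python
import numpy as np
from sklearn.metrics import silhouette_score

def distinguishability(atom_reprs: np.ndarray, elements: np.ndarray) -> float:
    """Hypothetical distinguishability score for an MLIP's atom-wise space.

    Treats element identity as the cluster label and measures how well the
    per-atom embeddings separate; higher means more distinguishable.

    atom_reprs : (N, d) frozen atom-wise MLIP embeddings
    elements   : (N,)   element label for each atom
    """
    return silhouette_score(atom_reprs, elements, metric="cosine")
```

Ranking candidate teachers by such a score, rather than by leaderboard accuracy, is the selection recipe the paper's correlation result points toward.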

Load-bearing premise

The contrastive alignment actually transfers genuine stability-aware physical information rather than merely regularizing the encoder in a way that happens to improve the evaluation metrics.

What would settle it

Generate matched sets of crystals with and without CrystalREPA using the same MLIP teacher; if the stability, validity, and fidelity gains disappear or reverse when the MLIP's atom-wise representations show low distinguishability, the transfer claim fails.

Figures

Figures reproduced from arXiv: 2605.08960 by Chengqian Zhang, Duo Zhang, Han Wang, Tiejun Li, Yucheng Jin.

Figure 1. CrystalREPA transfers stability-aware priors from universal MLIPs to crystal generative models. [PITH_FULL_IMAGE:figures/full_fig_p002_1.png]
Figure 2. Overview of CrystalREPA. During training, CrystalREPA aligns atom-wise hidden states … [PITH_FULL_IMAGE:figures/full_fig_p005_2.png]
Figure 3. Representation distinguishability is associated with differences in MLIP transfer effectiveness. [PITH_FULL_IMAGE:figures/full_fig_p008_3.png]
Figure 4. Ablation studies of CrystalREPA using CrystalFlow as the base generative model and … [PITH_FULL_IMAGE:figures/full_fig_p009_4.png]
Figure 5. Additional formation energy probing results across different MLIP teachers on MP-20. [PITH_FULL_IMAGE:figures/full_fig_p023_5.png]
original abstract

Crystal generative models mainly learn what stable crystals look like, with little explicit supervision for what makes them stable. We reveal a substantial representation gap between state-of-the-art crystal generative models and pretrained universal machine learning interatomic potentials (MLIPs) via energy probing, and show this gap can be closed by a simple training-time alignment. We propose Crystal REPresentation Alignment (CrystalREPA), a plug-and-play framework that aligns the atom-wise hidden states of generative encoders with frozen MLIP representations through an element-aware contrastive objective, transferring stability-aware atomistic priors with marginal training overhead and no additional inference cost. Across three generative frameworks, ten MLIP teachers, and two benchmark datasets, CrystalREPA consistently improves the thermodynamic stability, structural validity, and structural fidelity of generated crystals. Equally important, we find that an MLIP's transfer effectiveness is poorly predicted by its accuracy on standard leaderboards (e.g., Matbench Discovery) but strongly predicted by the distinguishability of its atom-wise representation space, yielding a practical, accuracy-independent criterion for selecting MLIP teachers for generative transfer.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces CrystalREPA, a plug-and-play alignment framework that uses an element-aware contrastive objective to match atom-wise hidden states from crystal generative model encoders to frozen representations from pretrained universal MLIPs. This is claimed to close a representation gap (identified via energy probing) and transfer stability-aware physical priors, yielding consistent improvements in thermodynamic stability, structural validity, and structural fidelity across three generative frameworks, ten MLIP teachers, and two benchmark datasets. The work further reports that MLIP transfer effectiveness correlates strongly with the distinguishability of its atom-wise representation space, rather than with accuracy on standard leaderboards such as Matbench Discovery, and incurs only marginal training overhead with no added inference cost.

Significance. If the claims hold, the approach provides a practical, low-cost method for injecting physical priors into crystal generative models by leveraging existing MLIPs, potentially advancing the generation of stable materials. The broad evaluation across multiple frameworks and datasets, together with the identification of a representation-based selection criterion independent of leaderboard accuracy, adds practical value. The paper explicitly credits the plug-and-play design and the absence of inference overhead as strengths.

major comments (2)
  1. [Section 4 (Experiments) and Methods] The central claim that the element-aware contrastive alignment transfers genuine MLIP-encoded stability priors (rather than generic regularization) is load-bearing for the abstract's conclusions on thermodynamic stability and the representation-gap motivation. However, the manuscript does not include an ablation using non-physical target representations (e.g., random vectors or untrained embeddings) as controls. Without this, the reported gains across frameworks and datasets cannot be isolated from the effects of contrastive regularization on latent-space structure. This directly impacts the interpretation of the distinguishability correlation as evidence for physical-prior transfer.
  2. [Abstract and Section 3 (Method)] The abstract states that a 'substantial representation gap' is revealed via energy probing and that this gap 'can be closed' by the alignment, but quantitative details on the probing protocol, baseline generative models, statistical significance of the gap, and how the gap metric is computed are not provided at a level that allows independent verification. This measurement underpins the motivation and the claim of consistent improvement.
minor comments (1)
  1. [Abstract] The abstract would benefit from inclusion of specific quantitative improvement values (e.g., percentage gains in stability or validity metrics) and the number of runs used for statistical reporting, to allow readers to gauge effect sizes without consulting the full text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. The comments highlight important aspects for strengthening the evidence and clarity of our claims. We address each major comment below and outline the revisions we will make.

point-by-point responses
  1. Referee: [Section 4 (Experiments) and Methods] The central claim that the element-aware contrastive alignment transfers genuine MLIP-encoded stability priors (rather than generic regularization) is load-bearing for the abstract's conclusions on thermodynamic stability and the representation-gap motivation. However, the manuscript does not include an ablation using non-physical target representations (e.g., random vectors or untrained embeddings) as controls. Without this, the reported gains across frameworks and datasets cannot be isolated from the effects of contrastive regularization on latent-space structure. This directly impacts the interpretation of the distinguishability correlation as evidence for physical-prior transfer.

    Authors: We agree that an explicit ablation with non-physical target representations is required to rigorously isolate the contribution of MLIP-encoded physical priors from generic effects of the contrastive objective. Although our results with ten MLIP teachers (where transfer gains correlate with representation distinguishability rather than Matbench Discovery accuracy) provide supporting evidence against a purely generic interpretation, this is indirect. In the revised manuscript we will add the requested controls (random vectors and untrained random embeddings as alignment targets) across the three generative frameworks. Preliminary experiments show that these non-physical targets produce substantially smaller gains in thermodynamic stability and validity metrics than pretrained MLIP representations. The new results and discussion will be placed in Section 4, with updated figures and text clarifying that the distinguishability correlation remains meaningful only in the presence of physically informed targets (a sketch of such controls follows these responses). revision: yes

  2. Referee: [Abstract and Section 3 (Method)] The abstract states that a 'substantial representation gap' is revealed via energy probing and that this gap 'can be closed' by the alignment, but quantitative details on the probing protocol, baseline generative models, statistical significance of the gap, and how the gap metric is computed are not provided at a level that allows independent verification. This measurement underpins the motivation and the claim of consistent improvement.

    Authors: We appreciate the request for greater quantitative transparency. The energy-probing protocol (linear probe on frozen atom-wise representations to predict formation energies) is described in Section 3.2 and the supplementary material, with baselines being the unaligned generative models and the gap defined as the relative increase in probe MAE. Statistical significance is evaluated over five independent runs. However, the main text and abstract currently lack explicit numerical values and a compact formula for the gap metric. In the revision we will (i) expand the abstract to include representative gap magnitudes, (ii) add a concise subsection in Section 3 with the exact computation (MAE difference normalized by MLIP probe performance), (iii) report per-model gap values and p-values in a new table, and (iv) move key numerical results from the supplement into the main text. These changes will enable independent verification without altering the original findings (a reconstructed form of the gap metric follows these responses). revision: yes
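Two reconstructions, flagged in the responses above, make these replies concrete. First, the non-physical-target controls from response 1; the tensor shapes and names are illustrative, not the authors' code:

```python
import torch

num_atoms, d_mlip, n_elements = 512, 256, 100
elements = torch.randint(1, n_elements, (num_atoms,))  # dummy atomic numbers

# Control (i): per-atom random vectors, i.e. no structure at all.
rand_targets = torch.randn(num_atoms, d_mlip)

# Control (ii): untrained element embeddings. Element identity is preserved,
# but no stability physics is encoded.
untrained = torch.nn.Embedding(n_elements, d_mlip)
untrained.weight.requires_grad_(False)
elem_targets = untrained(elements)  # (num_atoms, d_mlip)

# Either tensor replaces the frozen MLIP targets in the alignment loss; if
# the stability gains survive, they reflect generic contrastive
# regularization rather than transferred physical priors.
```

Second, one reading of the gap metric described in response 2 (notation ours, reconstructed from the rebuttal rather than quoted from the paper):

```latex
\mathrm{gap} = \frac{\mathrm{MAE}_{\mathrm{gen}} - \mathrm{MAE}_{\mathrm{MLIP}}}{\mathrm{MAE}_{\mathrm{MLIP}}}
```

where MAE_gen and MAE_MLIP are the errors of identical linear formation-energy probes trained on the frozen atom-wise representations of the unaligned generative encoder and of the MLIP teacher, respectively.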

Circularity Check

0 steps flagged

No significant circularity; derivation remains self-contained

full rationale

The paper introduces CrystalREPA as an explicit alignment of generative encoders to independently pretrained and frozen MLIP atom-wise representations via a separately defined element-aware contrastive objective. All reported gains are evaluated on external downstream metrics (thermodynamic stability, structural validity, fidelity) that are not part of the alignment loss. The observed correlation between representation distinguishability and transfer effectiveness is presented as a post-hoc empirical finding rather than a definitional or fitted equivalence. No load-bearing step reduces to self-citation, ansatz smuggling, or renaming of inputs as predictions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that MLIP atom-wise representations encode transferable stability information that can be aligned via contrastive loss without harming generative diversity. No explicit free parameters or invented entities are named in the abstract.

axioms (2)
  • domain assumption Pretrained universal MLIPs encode stability-aware atomistic priors in their hidden states that are useful for generative models.
    Invoked when the authors state that alignment transfers 'stability-aware atomistic priors'.
  • ad hoc to paper An element-aware contrastive objective can align representations without introducing artifacts that degrade generative performance.
    The training-time alignment step is presented as sufficient and neutral.

pith-pipeline@v0.9.0 · 5513 in / 1428 out tokens · 48646 ms · 2026-05-12T02:18:45.897587+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
