arxiv: 2606.22866 · v1 · pith:U4FSZZZQnew · submitted 2026-06-22 · 💻 cs.LG · cs.AI

Discovering Crystal Structure Prediction Algorithms with an AI Co-Scientist

Pith reviewed 2026-06-26 09:00 UTC · model grok-4.3

keywords crystal structure prediction · generative models · masked generative transformer · AI co-scientist · algorithm discovery · cross-domain transfer · materials science

The pith

An AI co-scientist adapts a vision model into MaskGXT to raise crystal structure prediction accuracy on benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a Human-AI Co-discovery system (HACO) that searches generative modeling approaches across fields and uses sparse human input to adapt promising ones to new scientific tasks. For crystal structure prediction from chemical compositions, HACO selects MaskGIT, a masked generative model from vision, reformulates it as a discrete token model of crystals, and adds symmetry tokens, space-group stratified sampling, and sub-bin coordinate refinement to create MaskGXT. On the MP-20 polymorph split this yields 79.06 percent match-everyone-to-reference (METRe) accuracy against 70.87 percent for the strongest evaluated baseline, with the best match rates also on the standard MP-20 and MPTS-52 benchmarks. A sympathetic reader would care because the work tests whether cross-domain search plus targeted guidance can accelerate algorithm discovery in settings where validation is cheap and fast.
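The abstract names sub-bin coordinate refinement without describing it. One way to read the term, offered as a minimal sketch and not as the paper's implementation: fractional coordinates are quantized into a fixed number of bins to form discrete tokens, and a continuous offset inside the bin recovers the precision that quantization discards. The bin count `NUM_BINS` and both function names are hypothetical.

```python
NUM_BINS = 64  # hypothetical resolution; the paper's value is not given here


def quantize(frac: float, num_bins: int = NUM_BINS) -> tuple[int, float]:
    """Map a fractional coordinate to (bin index, offset inside the bin)."""
    wrapped = frac % 1.0              # fractional coordinates are periodic
    if wrapped >= 1.0:                # float rounding can return exactly 1.0
        wrapped = 0.0
    scaled = wrapped * num_bins
    index = min(int(scaled), num_bins - 1)
    offset = scaled - index           # position inside the bin
    return index, offset


def dequantize(index: int, offset: float = 0.5, num_bins: int = NUM_BINS) -> float:
    """Recover a coordinate; offset=0.5 is the bin centre (no refinement)."""
    return ((index + offset) / num_bins) % 1.0


index, offset = quantize(0.30)
coarse = dequantize(index)            # bin centre only
refined = dequantize(index, offset)   # bin plus sub-bin offset
```

Without the offset, the reconstruction error is bounded by half a bin width; with it, the original coordinate is recovered up to floating-point error. In a learned model the offset would be predicted rather than copied, which this sketch does not attempt.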

Core claim

HACO searched across generative modeling methodologies from multiple fields and identified MaskGIT as a promising framework for crystal structure prediction. It instantiated the masked formulation as a discrete token model of crystal structure; guided by sparse high-level human objectives, it added crystallographic symmetry tokens, space group stratified sampling for polymorph coverage, and sub-bin coordinate refinement, yielding MaskGXT. On the MP-20 polymorph split MaskGXT reaches 79.06 percent METRe accuracy compared with 70.87 percent for the strongest evaluated baseline and attains the best match rate on standard MP-20 and MPTS-52 CSP benchmarks.
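For context on the framework HACO selected: MaskGIT decodes by starting from a fully masked sequence and committing its most confident predictions over a small number of steps, with a cosine schedule deciding how many tokens stay masked after each step. The sketch below computes that schedule only. Whether MaskGXT keeps the same schedule is not stated in the abstract; the rounding choice and the name `masked_counts` are assumptions.

```python
import math


def masked_counts(num_tokens: int, num_steps: int) -> list[int]:
    """Tokens still masked after each decoding step under a cosine schedule."""
    counts = []
    for step in range(1, num_steps + 1):
        # ratio falls from near 1 to 0 as decoding progresses
        ratio = math.cos(math.pi / 2 * step / num_steps)
        counts.append(math.floor(ratio * num_tokens))
    counts[-1] = 0  # the final step commits every remaining token
    return counts


schedule = masked_counts(num_tokens=16, num_steps=4)
```

The schedule commits few tokens in the early steps and more in the later ones, so the first commitments are made while most of the sequence is still masked.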

What carries the argument

The Human-AI Co-discovery system (HACO), which performs a cross-domain search over generative models and then relies on sparse human steering to adapt the selected one. Here that process produces the Masked Generative Crystal Transformer (MaskGXT).

If this is right

  • MaskGXT reaches the highest METRe accuracy among the evaluated methods on the MP-20 polymorph split and the best match rate on the standard MP-20 and MPTS-52 CSP benchmarks.
  • Transfer of masked generative modeling principles from vision, when combined with domain-specific tokens and sampling, improves coverage of polymorphs in crystal generation.
  • In scientific domains that supply cheap, fast, and well-aligned validation metrics, interactive AI systems can identify transferable modeling ideas and combine them with targeted human guidance.
  • The results supply evidence that cross-domain search plus sparse steering can contribute to scientific algorithm discovery.
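The second implication above credits space-group stratified sampling with better polymorph coverage. The abstract does not define the procedure, so the following is only one plausible reading: split a fixed sample budget across candidate space groups in proportion to a predicted probability, so that minority groups still receive samples. The probabilities and the name `allocate_samples` are made up for illustration.

```python
def allocate_samples(group_probs: dict[int, float], budget: int) -> dict[int, int]:
    """Split a sample budget across space groups (largest-remainder method)."""
    total = sum(group_probs.values())
    quotas = {g: budget * p / total for g, p in group_probs.items()}
    alloc = {g: int(q) for g, q in quotas.items()}  # floor of each quota
    leftover = max(budget - sum(alloc.values()), 0)
    # hand the remaining samples to the largest fractional remainders
    by_remainder = sorted(quotas, key=lambda g: quotas[g] - alloc[g], reverse=True)
    for g in by_remainder[:leftover]:
        alloc[g] += 1
    return alloc


# Illustrative probabilities over space groups 225, 194 and 62 (not from the paper).
allocation = allocate_samples({225: 0.5, 194: 0.25, 62: 0.25}, budget=10)
```

Compared with drawing every sample from the single most likely space group, a split like this trades some samples on the dominant group for guaranteed attempts on the others, which matters when a composition has several known polymorphs.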

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same search-and-steer loop could be tested on other discrete-structure generation tasks such as molecular conformer prediction where fast validation oracles exist.
  • If the performance edge persists across multiple independent re-implementations, it would strengthen the case that the co-discovery workflow itself, rather than any single added token, drives the improvement.
  • Extending HACO to propose and evaluate several candidate adaptations in parallel might further reduce the human steering needed per task.

Load-bearing premise

The accuracy gains come from the cross-domain search and sparse human steering rather than from routine hyperparameter tuning or standard transformer implementation choices.

What would settle it

Re-implementing the MaskGIT-to-crystal adaptation using only conventional machine-learning engineering without the HACO search process or the listed human-guided additions, then checking whether the 79.06 percent METRe accuracy on the MP-20 polymorph split is still reached.

Original abstract

We introduce Human-AI Co-discovery system (HACO) for scientific algorithm discovery through cross-domain search and sparse human steering. Starting from the goal of generating crystal structures from chemical compositions, HACO searched across generative modeling methodologies from multiple fields and identified MaskGIT, a masked generative model from vision, as a promising framework for crystal structure prediction (CSP). HACO instantiated this masked formulation as a discrete token model of crystal structure; guided by sparse high-level human objectives, it then added crystallographic symmetry tokens, space group stratified sampling for polymorph coverage, and sub-bin coordinate refinement, yielding the Masked Generative Crystal Transformer (MaskGXT). On the MP-20 polymorph split, MaskGXT reaches 79.06% match-everyone-to-reference (METRe) accuracy, compared with 70.87% for the strongest evaluated baseline. MaskGXT also attains the best match rate on standard MP-20 and MPTS-52 CSP benchmarks. These results provide evidence that, in domains offering cheap, fast, and well-aligned validation, transfer-guided interactive AI co-scientists can contribute to scientific algorithm discovery by identifying transferable modeling principles and combining them with targeted human domain guidance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces the Human-AI Co-discovery system (HACO) that performs cross-domain search over generative modeling methods and, with sparse human steering, adapts MaskGIT into MaskGXT for crystal structure prediction by adding symmetry tokens, space-group stratified sampling, and sub-bin refinement. It reports that MaskGXT achieves 79.06% match-everyone-to-reference (METRe) accuracy on the MP-20 polymorph split (vs. 70.87% for the strongest baseline) and the best match rates on standard MP-20 and MPTS-52 CSP benchmarks, arguing this demonstrates the value of transfer-guided interactive AI co-scientists in domains with fast validation.

Significance. If the reported gains can be causally attributed to the HACO process, the work would provide a concrete example of algorithmic discovery via cross-domain transfer plus targeted domain guidance, with potential applicability to other scientific fields offering cheap, aligned validation oracles. The explicit benchmark numbers and focus on a well-defined task (CSP) make the result falsifiable and potentially reproducible if code and splits are released.

major comments (2)
  1. [Abstract and §4] Abstract and §4 (Results): the central performance claim (79.06% METRe on the MP-20 polymorph split) is presented without error bars, statistical significance tests, or details on exact baseline implementations and data splits, so it is impossible to assess whether the 8.19-point lift over the 70.87% baseline is robust or could arise from standard hyperparameter search on the same discrete-token formulation.
  2. [§3 and §4] §3 (Method) and §4: no ablation studies isolate the contribution of the HACO-identified elements (symmetry tokens, space-group stratified sampling, sub-bin coordinate refinement) from what a conventional transformer hyperparameter sweep on the base MaskGIT discrete-token model would achieve; this ablation is load-bearing for the claim that the co-scientist process itself produced the improvement.
minor comments (2)
  1. [Abstract] The acronym expansion for HACO appears only after first use; spelling it out on first mention would improve readability.
  2. [Abstract] Notation for METRe is introduced without an explicit equation or reference to its definition in prior CSP literature.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the presentation of our results. We address each major comment below.

  1. Referee: [Abstract and §4] Abstract and §4 (Results): the central performance claim (79.06% METRe on the MP-20 polymorph split) is presented without error bars, statistical significance tests, or details on exact baseline implementations and data splits, so it is impossible to assess whether the 8.19-point lift over the 70.87% baseline is robust or could arise from standard hyperparameter search on the same discrete-token formulation.

    Authors: We agree that the current presentation would benefit from additional statistical details. In the revised manuscript we will report error bars from multiple runs with different random seeds, include statistical significance tests comparing MaskGXT to the baseline, and expand the experimental section with precise descriptions of baseline implementations (including hyperparameter ranges explored) and the exact train/validation/test splits for the MP-20 polymorph setting. revision: yes

  2. Referee: [§3 and §4] §3 (Method) and §4: no ablation studies isolate the contribution of the HACO-identified elements (symmetry tokens, space-group stratified sampling, sub-bin coordinate refinement) from what a conventional transformer hyperparameter sweep on the base MaskGIT discrete-token model would achieve; this ablation is load-bearing for the claim that the co-scientist process itself produced the improvement.

    Authors: The 70.87% baseline already reflects the strongest performance obtainable from standard adaptations of MaskGIT to discrete tokens (including hyperparameter tuning) without the domain-specific modifications identified by HACO. Symmetry tokens and space-group stratified sampling are not components that arise from a conventional hyperparameter sweep on the base architecture; they were introduced only after the co-scientist process highlighted transferable principles from vision and crystallography. We will clarify this distinction in §3 and add a limited ablation table in the revision showing performance when each HACO-derived component is removed individually. revision: partial
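The first response promises error bars over random seeds and a significance test. A minimal sketch of what that reporting could look like, using only the Python standard library: mean and sample standard deviation per method, plus Welch's t statistic for two sets of runs. The scores below are placeholders, not values from the paper, and the function names are hypothetical.

```python
import math
import statistics


def summarize(scores: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation across seeds."""
    return statistics.mean(scores), statistics.stdev(scores)


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic for two independent sets of runs (unequal variances)."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    std_err = math.sqrt(var_a / len(a) + var_b / len(b))
    return (mean_a - mean_b) / std_err


# Placeholder inputs only; replace with per-seed accuracies from real runs.
model_runs = [3.0, 4.0, 5.0]
baseline_runs = [1.0, 2.0, 3.0]
mean, std = summarize(model_runs)
t_stat = welch_t(model_runs, baseline_runs)
```

Turning the statistic into a p-value additionally needs the Welch-Satterthwaite degrees of freedom and a t distribution, which the standard library does not provide.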

Circularity Check

0 steps flagged

No significant circularity detected

The paper reports an empirical discovery process (HACO identifying MaskGIT and adding domain-specific modifications) whose central claims are performance numbers on external benchmarks (MP-20 polymorph split, standard MP-20, MPTS-52). These are measured against independent baselines and datasets rather than being derived from quantities defined inside the paper. No equations, self-citations, or fitted parameters are presented as load-bearing derivations that reduce to the inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

Abstract-only review yields limited visibility into parameters or assumptions; the ledger reflects only elements explicitly named in the provided text.

axioms (1)
  • [domain assumption] Masked generative models from vision can be instantiated as discrete token models for crystal structures.
    The paper states that HACO identified MaskGIT and instantiated it as a discrete token model of crystal structure.
invented entities (2)
  • HACO · no independent evidence
    purpose: Human-AI Co-discovery system for algorithm search
    Introduced as the overall framework that performs cross-domain search and sparse steering.
  • MaskGXT · no independent evidence
    purpose: Masked Generative Crystal Transformer model
    The final instantiated model after adding symmetry tokens and sampling strategies.

pith-pipeline@v0.9.1-grok · 5740 in / 1301 out tokens · 35002 ms · 2026-06-26T09:00:37.040071+00:00 · methodology

