
arxiv: 2605.26741 · v1 · pith:DVZSF3GMnew · submitted 2026-05-26 · ❄️ cond-mat.mtrl-sci · cs.AI

MatFormBench: A Benchmarking Evaluation Framework for Target-Driven Materials Formulation

Pith reviewed 2026-06-29 17:19 UTC · model grok-4.3

classification ❄️ cond-mat.mtrl-sci cs.AI
keywords materials formulation · inverse design · benchmarking · generative models · diffusion models · target optimization · machine learning evaluation

The pith

MatFormBench evaluates 39 algorithms and identifies diffusion-based models as strongest for generating materials that meet target properties.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MatFormBench to fill the gap left by existing benchmarks, which focus on predicting material properties rather than on the inverse task of generating formulations that hit specific targets. It creates synthetic data through a physics-driven scheme intended to mimic real structure-property relationships, organized into five levels of increasing difficulty. A composite metric called MatFormScore measures each algorithm on five axes: target success, search efficiency, exploratory capacity, robustness, and stability. Testing 39 methods in 1170 standardized algorithm-task evaluations shows diffusion models achieve the best overall results, while variational autoencoders and genetic algorithms hold advantages in narrower situations.

Core claim

MatFormBench integrates a physics-driven formulation generation scheme to generate synthetic samples that faithfully emulate realistic materials structure-property response relationships, complemented by five escalating difficulty levels to quantify the complexity of these relationships. To rigorously assess algorithm performance, MatFormScore comprehensively quantifies performance across target success, search efficiency, exploratory capacity, robustness, and stability. Validation by evaluating 39 diverse inverse design algorithms shows diffusion-based models demonstrate the strongest overall performance, while VAE-based and GA-based methods exhibit distinct advantages in specific scenarios.
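To make the five-axis idea concrete, here is a minimal sketch of how a composite score could be aggregated. The abstract names the axes, but this page does not give the MatFormScore formula, so the equal-weight mean, the [0, 1] scaling of each axis, and the names `AxisScores`, `composite_score`, and `weakest_axis` are all assumptions made for illustration, not the paper's definition.

```python
from dataclasses import dataclass

# The five axes named in the abstract; the identifiers are illustrative.
AXES = ("success", "efficiency", "exploration", "robustness", "stability")


@dataclass(frozen=True)
class AxisScores:
    """Per-axis scores, each assumed to be normalized to [0, 1]."""
    success: float
    efficiency: float
    exploration: float
    robustness: float
    stability: float


def composite_score(scores, weights=None):
    """Weighted mean over the five axes (hypothetical aggregation rule)."""
    if weights is None:
        # Assumption: equal weights when none are supplied.
        weights = {axis: 1.0 for axis in AXES}
    total_weight = sum(weights[axis] for axis in AXES)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = 0.0
    for axis in AXES:
        value = getattr(scores, axis)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{axis} score must lie in [0, 1]")
        weighted_sum += weights[axis] * value
    return weighted_sum / total_weight


def weakest_axis(scores):
    """Return the name of the lowest-scoring axis, as a diagnostic."""
    return min(AXES, key=lambda axis: getattr(scores, axis))
```

A per-axis breakdown of this kind is what would let a developer see whether a method is held back by exploration, robustness, or efficiency rather than by target success alone.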

What carries the argument

MatFormBench ecosystem, built around a physics-driven synthetic data generator and the multi-axis MatFormScore metric that ranks inverse design algorithms.

If this is right

  • Provides a single standard that lets researchers compare classical search methods, deep generative models, and LLM-based strategies on equal footing.
  • Shows diffusion models deliver the highest combined score on target accuracy and stability across difficulty levels.
  • Allows algorithm developers to diagnose whether a method is limited by exploration, robustness, or efficiency.
  • Creates reproducible tasks at five graduated difficulty levels so progress can be tracked as new methods appear.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same synthetic-generation approach could be reused to create benchmarks for inverse design in chemistry or drug formulation.
  • Researchers could test whether adding real experimental feedback loops improves the correlation between benchmark scores and laboratory outcomes.
  • The multi-axis scoring could be applied to other generative tasks to separate methods that merely fit data from those that generalize to new targets.

Load-bearing premise

The physics-driven scheme produces synthetic samples whose structure-property relationships match those found in actual materials.

What would settle it

If rankings of the same 39 algorithms on real experimental formulation data reverse or diverge sharply from the rankings obtained on MatFormBench tasks, the framework's ability to guide real design would be called into question.
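The settling test above compares two rankings of the same algorithms. One common way to quantify their agreement is Spearman's rank correlation. The paper does not say which statistic it would use, so this is an assumed choice, and the scores in the example are made-up placeholders rather than results.

```python
def ranks(values):
    """Rank 1 goes to the highest score. Assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(benchmark_scores, real_scores):
    """Spearman rank correlation via the no-ties formula 1 - 6*sum(d^2)/(n(n^2-1))."""
    n = len(benchmark_scores)
    if n != len(real_scores) or n < 2:
        raise ValueError("need two equally sized lists with at least 2 entries")
    rank_a = ranks(benchmark_scores)
    rank_b = ranks(real_scores)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Placeholder scores for four hypothetical algorithms (not from the paper).
benchmark = [0.9, 0.7, 0.5, 0.3]
laboratory = [0.8, 0.6, 0.4, 0.2]
print(spearman_rho(benchmark, laboratory))  # 1.0
```

A value near 1 would mean the benchmark ordering carries over to real data. A value near 0 or below would be the divergence or reversal described above.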

Figures

Figures reproduced from arXiv: 2605.26741 by Chenxi Wang, Chuhan Yang, Linhan Wu, Yuyang Liu, Zhengwei Yang.

Figure 1: Overview of MatFormBench. MatFormBench integrates controllable synthetic oracle construction, heterogeneous inverse design algorithms, multi-axis inverse evaluation metrics, and representative formulation applications.

Figure 2: Overall benchmark performance. Diffusion-based methods achieve the strongest aggregate performance and remain consistently competitive across difficulty regimes.

Figure 3: Performance across task regimes. MatFormBench reveals clear regime dependence: VAE methods are strong on smooth tasks, GA-based search is competitive under local discontinuity, and diffusion models dominate multimodal and globally constrained regimes.

Figure 4: Algorithm suitability analysis. Radar plots compare family-level profiles over Success, Efficiency, Explore, Robustness, and Stability. Panel title: Family Metric Profiles. Legend: Diffusion, VAE, GAN, LLM, Search, Bayesian Optimization.
Text excerpt captured with Figure 4: Across 30 benchmark datasets, 39 algorithms were attempted and 37 produced valid oracle-evaluable outputs. The results show that diffusion-based models achieve the strongest overall performance, while VAE- and GA-based…
Original abstract

Inverse design of materials has significantly advanced target-driven formulation optimization, yet existing materials machine learning benchmarks remain limited to forward property prediction, failing to systematically evaluate inverse optimization and generation algorithms, a critical gap that hinders the progress of target-driven materials design. To address this limitation, we propose MatFormBench, a novel benchmarking ecosystem tailored to evaluate and guide generative strategies for target-driven formulation. MatFormBench integrates a physics-driven formulation generation scheme to generate synthetic samples that faithfully emulate realistic materials structure-property response relationships, complemented by five escalating difficulty levels to quantify the complexity of these relationships. To rigorously assess algorithm performance, we further propose MatFormScore, a multi-dimensional metric that comprehensively quantifies performance across five critical axes: target success, search efficiency, exploratory capacity, robustness, and stability. We validate MatFormBench by evaluating 39 diverse inverse design algorithms, covering classical surrogate-assisted black-box search, state-of-the-art deep generative models, and increasingly popular Large Language Model (LLM)-based recommendation strategies. Across 1170 standardized algorithm-task evaluations, diffusion-based models demonstrate the strongest overall performance, while Variational Autoencoder (VAE)-based and Genetic Algorithm (GA)-based methods exhibit distinct advantages in specific scenarios. By establishing a unified evaluation standard for target-driven materials formulation, MatFormBench enables reproducible benchmarking, principled algorithm comparison, and diagnostic analysis of inverse design strategies, providing a foundational tool for advancing materials inverse design.
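The figure of 1170 evaluations can be checked against two numbers that appear on this page: 39 algorithms (abstract) and 30 benchmark datasets (the text excerpt captured with Figure 4). The sketch below enumerates the algorithm-dataset grid. Treating each pair as exactly one evaluation is an assumption, and `evaluation_grid` is an illustrative helper name.

```python
from itertools import product

N_ALGORITHMS = 39        # stated in the abstract
N_DATASETS = 30          # stated in the Figure 4 text excerpt
N_VALID_ALGORITHMS = 37  # algorithms with valid outputs, per the same excerpt


def evaluation_grid(n_algorithms, n_datasets):
    """Enumerate every (algorithm index, dataset index) pair once."""
    return list(product(range(n_algorithms), range(n_datasets)))


grid = evaluation_grid(N_ALGORITHMS, N_DATASETS)
print(len(grid))                        # 1170
print(N_VALID_ALGORITHMS * N_DATASETS)  # 1110
```

The arithmetic suggests that 1170 counts attempted algorithm-dataset pairs. If only the 37 algorithms with valid outputs were counted, the total would be 1110.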

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces MatFormBench, a benchmarking ecosystem for target-driven materials formulation inverse design. It includes a physics-driven synthetic sample generator with five escalating difficulty levels that is asserted to emulate realistic structure-property relationships, the MatFormScore multi-axis metric (target success, search efficiency, exploratory capacity, robustness, stability), and reports results from 39 algorithms (surrogate black-box, deep generative, LLM-based) across 1170 standardized evaluations, with diffusion models showing strongest overall performance and VAE/GA methods advantageous in specific scenarios.

Significance. If the synthetic generator's fidelity to real materials systems can be established, MatFormBench would fill a clear gap by providing the first standardized, reproducible benchmark focused on inverse optimization rather than forward prediction, enabling principled comparison of generative strategies in materials design.

major comments (2)
  1. [Abstract, §3] Abstract and §3 (physics-driven generator description): the central claim that generated samples 'faithfully emulate realistic materials structure-property response relationships' is load-bearing for all 1170 evaluations and the diffusion-model ranking, yet no quantitative validation (e.g., preservation of physical invariants, Wasserstein distances to literature datasets, or reproduction of known phase behaviors) is provided; without this, benchmark rankings risk being artifacts of the synthetic distribution.
  2. [Results] Results section (1170 evaluations): headline performance claims (diffusion strongest overall) are reported without error bars, statistical significance tests, or baseline comparisons that would allow assessment of whether observed differences exceed evaluation noise.
minor comments (2)
  1. [§4] Notation for MatFormScore axes and difficulty levels should be defined with explicit equations or pseudocode rather than prose descriptions to enable exact reproduction.
  2. [Table 1] The manuscript would benefit from a table listing the 39 algorithms with their categories and key hyperparameters to improve clarity of the experimental design.
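Major comment 1 mentions Wasserstein distances to literature datasets as one possible fidelity check. As a minimal sketch, assuming a single scalar property and two samples of equal size, the 1-Wasserstein distance between the empirical distributions reduces to the mean absolute difference of the sorted values. The function name and the sample values are illustrative placeholders, not data from the paper.

```python
def wasserstein_1d(sample_a, sample_b):
    """1-Wasserstein distance between two equally sized 1-D samples.

    For empirical distributions with the same number of equally weighted
    points, the optimal transport plan pairs the sorted values in order.
    """
    if len(sample_a) != len(sample_b) or not sample_a:
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(sample_a), sorted(sample_b))
    return sum(abs(a - b) for a, b in pairs) / len(sample_a)


# Placeholder property values (arbitrary units), not taken from the paper.
synthetic = [0.0, 1.0, 2.0]
reference = [1.0, 2.0, 3.0]
print(wasserstein_1d(synthetic, reference))  # 1.0
```

For samples of unequal size or weighted samples, a library routine such as `scipy.stats.wasserstein_distance` would be the more practical choice. A real validation would also need multivariate structure-property comparisons, which this one-dimensional sketch does not cover.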

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important aspects for strengthening the manuscript. We address each major comment below and will revise the manuscript to incorporate the suggested improvements.

Point-by-point responses
  1. Referee: [Abstract, §3] Abstract and §3 (physics-driven generator description): the central claim that generated samples 'faithfully emulate realistic materials structure-property response relationships' is load-bearing for all 1170 evaluations and the diffusion-model ranking, yet no quantitative validation (e.g., preservation of physical invariants, Wasserstein distances to literature datasets, or reproduction of known phase behaviors) is provided; without this, benchmark rankings risk being artifacts of the synthetic distribution.

    Authors: We agree that the manuscript currently lacks explicit quantitative validation of the synthetic generator's fidelity. While the generator is constructed from physics-driven principles, no metrics such as Wasserstein distances, invariant preservation, or reproduction of known phase behaviors are reported. In the revised manuscript, we will add these quantitative validations, including direct comparisons to literature datasets where feasible, to substantiate the emulation claim and support the benchmark results. revision: yes

  2. Referee: [Results] Results section (1170 evaluations): headline performance claims (diffusion strongest overall) are reported without error bars, statistical significance tests, or baseline comparisons that would allow assessment of whether observed differences exceed evaluation noise.

    Authors: We acknowledge that the results section reports performance without error bars, statistical significance testing, or additional baseline comparisons. In the revision, we will rerun the 1170 evaluations with multiple random seeds to compute error bars, apply statistical tests (such as paired t-tests) to evaluate the significance of observed differences, and include further baseline comparisons to allow readers to assess whether differences exceed evaluation noise. revision: yes
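The simulated rebuttal proposes multiple random seeds, error bars, and paired t-tests. The sketch below shows what that analysis could look like for two algorithms scored on the same seeds. The function names and the per-seed scores are hypothetical, and the p-value lookup is left out because the Python standard library has no t-distribution.

```python
import math
import statistics


def mean_and_std(scores):
    """Mean and sample standard deviation across seeds, for error bars."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for per-seed score pairs."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    spread = statistics.stdev(diffs)
    if spread == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t_value = statistics.mean(diffs) / (spread / math.sqrt(len(diffs)))
    return t_value, len(diffs) - 1


# Hypothetical scores for two algorithms on the same five seeds.
diffusion_runs = [0.82, 0.85, 0.80, 0.84, 0.83]
vae_runs = [0.78, 0.80, 0.79, 0.79, 0.80]
t_value, dof = paired_t_statistic(diffusion_runs, vae_runs)
print(mean_and_std(diffusion_runs), t_value, dof)
```

The t statistic would then be compared against a t-distribution with `dof` degrees of freedom, for example through `scipy.stats.ttest_rel`, which returns the p-value directly. With 39 algorithms compared pairwise, some correction for multiple comparisons would likely be needed as well.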

Circularity Check

0 steps flagged

No significant circularity; the benchmark is an independent evaluation tool

Full rationale

The paper introduces MatFormBench as a standalone benchmarking ecosystem that generates synthetic samples via a physics-driven scheme and then runs 39 external algorithms across 1170 evaluations to produce performance rankings. No equations, fitted parameters, or self-citations are presented that would make the reported rankings (e.g., diffusion models strongest) reduce to the benchmark inputs by construction. The derivation chain consists of defining the generator, defining MatFormScore axes, and executing independent algorithms on the resulting tasks; these steps remain non-tautological and externally falsifiable. This matches the default expectation of an honest non-finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

Abstract-only review; ledger populated from stated claims in the abstract. The central claim rests on the unverified assumption that the synthetic generator produces realistic structure-property relationships.

axioms (1)
  • domain assumption: the physics-driven formulation generation scheme faithfully emulates realistic materials structure-property response relationships
    Invoked in the abstract as the basis for generating synthetic samples that the benchmark relies upon.
invented entities (2)
  • MatFormBench (no independent evidence)
    purpose: benchmarking ecosystem for target-driven formulation
    Newly proposed framework integrating generator and scoring system.
  • MatFormScore (no independent evidence)
    purpose: multi-dimensional metric quantifying target success, search efficiency, exploratory capacity, robustness, and stability
    Newly proposed scoring system.

pith-pipeline@v0.9.1-grok · 5798 in / 1323 out tokens · 40846 ms · 2026-06-29T17:19:00.057025+00:00 · methodology


Reference graph

Works this paper leans on

56 extracted references · 3 canonical work pages · 2 internal anchors

  1. [1] Rishikesh Batra, Le Song, and Rampi Ramprasad. Emerging materials intelligence ecosystems propelled by machine learning. Nature Reviews Materials, 6:655–678, 2021.

  2. [2] Daniil A. Boiko, Robert MacKnight, Ben Kline, and Gabe Gomes. Autonomous chemical research with large language models. Nature, 624:570–578, 2023.

     Paper text captured with this entry: Table 6: Family-level metric profile. The LLM row is computed from the valid DeepSeek baseline only; GLM-5.1 and KIMI-2.6 fail to produce valid candidate outputs under the benchmark protocol. Family MatFormScor...

  3. [3] Andres M. Bran, Sam Cox, Oliver Schilter, Carlo Baldassari, Andrew D. White, and Philippe Schwaller. Chemcrow: Augmenting large-language models with chemistry tools. Nature Machine Intelligence, 6:525–535, 2024.

  4. [4] Nathan Brown, Marco Fiscato, Marwin H. S. Segler, and Alain C. Vaucher. Guacamol: Benchmarking models for de novo molecular design. Journal of Chemical Information and Modeling, 59(3):1096–1108, 2019.

  5. [5] Cameron B. Browne, Edward Powley, Daniel Whitehouse, Simon M. Lucas, Peter I. Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis, and Simon Colton. A survey of monte carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1):1–43, 2012.

  6. [6] Yuri Burda, Roger Grosse, and Ruslan Salakhutdinov. Importance weighted autoencoders. In International Conference on Learning Representations, 2016.

  7. [7] Mouyang Cheng, Chu-Liang Fu, Ryotaro Okabe, Abhijatmedhi Chotrattanapituk, Artittaya Boonkird, Nguyen Tuan Hung, and Mingda Li. Artificial intelligence-driven approaches for materials design and discovery. Nature Materials, 25:174–190, 2026.

  8. [8] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20:273–297, 1995.

  9. [9] Stefano Curtarolo, Wahyu Setyawan, Shidong Wang, Junkai Xue, Kesong Yang, Richard H. Taylor, Lance J. Nelson, Gus L. W. Hart, Stefano Sanvito, Marco Buongiorno-Nardelli, Natalio Mingo, and Ohad Levy. Aflowlib.org: A distributed materials properties repository from high-throughput ab initio calculations. Computational Materials Science, 58:227–235, 2012.

  10. [10] DeepSeek-AI. Deepseek-v4: Towards highly efficient million-token context intelligence. Technical report, 2026.

  11. [11] Marco Dorigo, Vittorio Maniezzo, and Alberto Colorni. Ant system: optimization by a colony of cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 26(1):29–41, 1996.

  12. [12] Claudia Draxl and Matthias Scheffler. The nomad laboratory: from data sharing to artificial intelligence. Journal of Physics: Materials, 2(3):036001, 2019.

  13. [13] Alexander Dunn, Qi Wang, Alex Ganose, Daniel Dopp, and Anubhav Jain. Benchmarking materials property prediction methods: the matbench test set and automatminer reference algorithm. npj Computational Materials, 6:138, 2020.

  14. [14] Peter I. Frazier. A tutorial on bayesian optimization. arXiv preprint arXiv:1807.02811, 2018.

  15. [15] Rafael Gómez-Bombarelli, Jennifer N. Wei, David Duvenaud, José Miguel Hernández-Lobato, Benjamin Sánchez-Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D. Hirzel, Ryan P. Adams, and Alán Aspuru-Guzik. Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 4(2):268–276, 2018.

  16. [16] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, volume 27, 2014.

  17. [17] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville. Improved training of wasserstein gans. In Advances in Neural Information Processing Systems, 2017.

  18. [18] Irina Higgins, Loic Matthey, Arka Pal, Christopher P. Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. beta-vae: Learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations, 2017.

  19. [19] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, volume 33, pages 6840–6851, 2020.

  20. [20] Arthur E. Hoerl and Robert W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1):55–67, 1970.

  21. [21] John H. Holland. Adaptation in natural and artificial systems. University of Michigan Press, 1975.

  22. [22] Anubhav Jain, Shyue Ping Ong, Geoffroy Hautier, Wei Chen, William Davidson Richards, Stephen Dacek, Shreyas Cholia, Dan Gunter, David Skinner, Gerbrand Ceder, and Kristin A. Persson. Commentary: The materials project: A materials genome approach to accelerating materials innovation. APL Materials, 1(1):011002, 2013.

  23. [23] Paul C. Jennings, Steen Lysgaard, Jens S. Hummelshøj, Tejs Vegge, and Thomas Bligaard. Genetic algorithms for computational materials discovery accelerated by machine learning. npj Computational Materials, 5:46, 2019.

  24. [24] Donald R. Jones, Matthias Schonlau, and William J. Welch. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13:455–492, 1998.

  25. [25] Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. In Advances in Neural Information Processing Systems, volume 35, pages 26565–26577, 2022.

  26. [26] James Kennedy and Russell Eberhart. Particle swarm optimization. In Proceedings of ICNN’95, pages 1942–1948, 1995.

  27. [27] Diederik P. Kingma and Max Welling. Auto-encoding variational bayes. In International Conference on Learning Representations, 2014.

  28. [28] Scott Kirklin, James E. Saal, Bryce Meredig, Alex Thompson, Jeff W. Doak, Muratahan Aykol, Stephan Rühl, and Chris Wolverton. The open quantum materials database (oqmd): assessing the accuracy of dft formation energies. npj Computational Materials, 1:15010, 2015.

  29. [29] Scott Kirkpatrick, C. Daniel Gelatt, and Mario P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.

  30. [30] Junhyeong Lee, Donggeun Park, Mingyu Lee, Hugon Lee, Kundo Park, Ikjin Lee, and Seunghwa Ryu. Machine learning-based inverse design methods considering data characteristics and design space size in materials design and manufacturing: a review. Materials Horizons, 10:5436–5456, 2023.

  31. [31] Zinan Lin, Ashish Khetan, Giulia Fanti, and Sewoong Oh. Pacgan: The power of two samples in generative adversarial networks. In Advances in Neural Information Processing Systems, 2018.

  32. [32] Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. In International Conference on Learning Representations, 2023.

  33. [33] Turab Lookman, Prasanna V. Balachandran, Dezhen Xue, and Ruijuan Yuan. Active learning in materials science with emphasis on adaptive sampling using uncertainties for targeted design. npj Computational Materials, 5(1):21, 2019.

  34. [34] Seyedali Mirjalili, Seyed Mohammad Mirjalili, and Andrew Lewis. Grey wolf optimizer. Advances in Engineering Software, 69:46–61, 2014.

  35. [35] Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.

  36. [36] Moonshot AI. Kimi k2.6 technical report. Technical report, 2026.

  37. [37] Daniil Polykovskiy, Alexander Zhebrak, Benjamin Sanchez-Lengeling, Sergey Golovanov, Oktai Tatanov, Stanislav Belyaev, Rauf Kurbanov, Aleksey Artamonov, Vladimir Aladinskiy, Mark Veselov, Artur Kadurin, Simon Johansson, Hongming Chen, Sergey Nikolenko, Alan Aspuru-Guzik, and Alex Zhavoronkov. Molecular sets (moses): A benchmarking platform for molecular generation models. Frontiers in Pharmacology, 11:565644, 2020.

  38. [38] Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, and O. Anatole von Lilienfeld. Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data, 1:140022, 2014.

  39. [39] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.

  40. [40] Janosh Riebesell, Rhys E. A. Goodall, Anubhav Jain, Philipp Benner, Kristin A. Persson, and Alpha A. Lee. Matbench discovery: An evaluation framework for machine learning crystal stability prediction. arXiv preprint arXiv:2308.14920, 2023.

  41. [41] Benjamin Sanchez-Lengeling and Alán Aspuru-Guzik. Inverse molecular design using machine learning: generative models for matter engineering. Science, 361(6400):360–365, 2018.

  42. [42] Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P. Adams, and Nando de Freitas. Taking the human out of the loop: A review of bayesian optimization. Proceedings of the IEEE, 104(1):148–175, 2016.

  43. [43] Kihyuk Sohn, Honglak Lee, and Xinchen Yan. Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, 2015.

  44. [44] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021.

  45. [45] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021.

  46. [46] Niranjan Srinivas, Andreas Krause, Sham M. Kakade, and Matthias Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In International Conference on Machine Learning, 2010.

  47. [47] Jakub M. Tomczak and Max Welling. Vae with a vampprior. In International Conference on Artificial Intelligence and Statistics, 2018.

  48. [48] Huan Tran, Rishi Gurnani, Chiho Kim, Ghanshyam Pilania, Ha-Kyung Kwon, Ryan P. Lively, and Rampi Ramprasad. Design of functional and sustainable polymers assisted by artificial intelligence. Nature Reviews Materials, 9:866–886, 2024.

  49. [49] Logan Ward, Ankit Agrawal, Alok Choudhary, and Christopher Wolverton. A general-purpose machine learning framework for predicting properties of inorganic materials. npj Computational Materials, 2:16028, 2016.

  50. [50] Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, and Vijay Pande. Moleculenet: A benchmark for molecular machine learning. Chemical Science, 9:513–530, 2018.

  51. [51] Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, and Kalyan Veeramachaneni. Modeling tabular data using conditional gan. In Advances in Neural Information Processing Systems, 2019.

  52. [52] Dezhen Xue, Prasanna V. Balachandran, Ruijuan Yuan, Tao Hu, Xuefeng Qian, Edward R. Dougherty, and Turab Lookman. Accelerated search for materials with targeted properties by adaptive design. Nature Communications, 7:11241, 2016.

  53. [53] Liang Yan, Beom Seok Kang, Maurice D. Hanisch, Jian Ma, and Anima Anandkumar. MGB: The material generation benchmark. In AI for Accelerated Materials Design - NeurIPS 2025, 2025.

  54. [54] Xin-She Yang. Firefly algorithms for multimodal optimization. International Symposium on Stochastic Algorithms, pages 169–178, 2009.

  55. [55] Zhipu AI. Glm-5.1 technical report. Technical report, 2026.

  56. [56] Alex Zunger. Inverse design in search of materials with target functionalities. Nature Reviews Chemistry, 2(4):0121, 2018.

Appendix excerpt captured after the reference list

A Benchmark Dataset and Oracle Details

A.1 Oracle implementation details

MatFormBench represents each candidate formulation as a bounded continuous vector x = (x_1, ..., x_d) ∈ [−1, 1]^d, with d ∈ {5, 10, 15}. Beyond the oracle components s...
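The appendix excerpt describes each candidate formulation as a vector in [−1, 1]^d with d in {5, 10, 15}. A minimal sketch of that representation follows. Uniform sampling and clipping out-of-range proposals back into the box are assumptions for illustration, since the excerpt is cut off before it says how candidates are drawn or repaired, and the helper names are hypothetical.

```python
import random

ALLOWED_DIMENSIONS = (5, 10, 15)  # values of d given in the appendix excerpt


def random_candidate(d, rng):
    """Draw one candidate uniformly from [-1, 1]^d (sampling rule assumed)."""
    if d not in ALLOWED_DIMENSIONS:
        raise ValueError(f"d must be one of {ALLOWED_DIMENSIONS}")
    return [rng.uniform(-1.0, 1.0) for _ in range(d)]


def clip_candidate(x):
    """Project a proposal back into the box by clipping each coordinate."""
    return [max(-1.0, min(1.0, xi)) for xi in x]


rng = random.Random(0)  # fixed seed so the draw is reproducible
candidate = random_candidate(5, rng)
print(len(candidate))                      # 5
print(clip_candidate([-2.0, 0.3, 1.5]))    # [-1.0, 0.3, 1.0]
```

A shared bounded box of this kind is what would let search methods, generative models, and LLM-based recommenders all be scored by the same oracle.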