pith. machine review for the scientific record.

arxiv: 2604.06230 · v1 · submitted 2026-03-31 · 💻 cs.DB · cond-mat.mtrl-sci · cs.AI

Recognition: unknown

Ontology-based knowledge graph infrastructure for interoperable atomistic simulation data

Abril Azocar Guzman, Sarath Menon, Stefan Sandfeld, Tilmann Hickel


Pith reviewed 2026-05-08 02:18 UTC · model gemini-3-flash-preview

classification 💻 cs.DB · cond-mat.mtrl-sci · cs.AI · PACS 61.43.Bn · 07.05.Kf · 89.20.Ff
keywords Atomistic simulation · Knowledge graph · Ontology · Materials informatics · Data interoperability · Provenance · ASDO

The pith

A knowledge graph infrastructure unifies diverse atomistic simulations into a single, queryable database for materials science.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Atomistic simulations are essential for understanding materials, yet their results are often trapped in incompatible file formats with poor documentation. This paper establishes a method to translate these disparate data sources into a unified knowledge graph using a shared logical framework called an ontology. By structuring data this way, researchers can query thousands of different simulations at once to identify thermodynamic patterns or verify the computational history of a specific result.

Core claim

The authors have developed an infrastructure that integrates nearly 8,000 computational samples into a knowledge graph of over 750,000 triples. The system uses the Atomistic Simulation Data Ontology (ASDO) to normalize data from different simulation codes, allowing automated cross-dataset analysis and partial reconstruction of computational workflows. The demonstration indicates that high-level semantic structures can manage the complexity and heterogeneity of raw simulation data, making it searchable and reusable across research groups.
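The normalization step described above can be illustrated with a minimal sketch. It assumes two hypothetical source records that use different field names and energy units, and maps both onto one set of (subject, predicate, object) triples. The predicates (`ex:hasTotalEnergy`, `ex:hasNumberOfAtoms`), the source names, and the field mappings are illustrative assumptions, not terms taken from ASDO or from the paper's software. Plain Python tuples stand in for an RDF library so the sketch runs without dependencies.

```python
# Hypothetical sketch: normalize records from two differently formatted
# sources into one set of (subject, predicate, object) triples.
# All field names and predicates below are illustrative assumptions.

EV_PER_RYDBERG = 13.605693122994  # Rydberg energy expressed in eV

# Per-source mapping: source field -> (shared predicate, unit conversion factor)
FIELD_MAPS = {
    "code_a": {
        "etot_eV": ("ex:hasTotalEnergy", 1.0),
        "natoms": ("ex:hasNumberOfAtoms", 1),
    },
    "code_b": {
        "TotalEnergyRy": ("ex:hasTotalEnergy", EV_PER_RYDBERG),
        "n_atoms": ("ex:hasNumberOfAtoms", 1),
    },
}


def to_triples(sample_id, source, record):
    """Translate one raw record into triples that use the shared predicates."""
    mapping = FIELD_MAPS[source]
    triples = {(sample_id, "rdf:type", "ex:ComputationalSample")}
    for field, value in record.items():
        if field not in mapping:
            continue  # unmapped fields are skipped rather than guessed
        predicate, factor = mapping[field]
        triples.add((sample_id, predicate, value * factor))
    return triples


graph = set()
graph |= to_triples("sample:1", "code_a", {"etot_eV": -14.2, "natoms": 4})
graph |= to_triples("sample:2", "code_b", {"TotalEnergyRy": -1.0, "n_atoms": 4})

# A single pattern match now covers both sources, with energies in eV.
energies = {s: o for (s, p, o) in graph if p == "ex:hasTotalEnergy"}
```

The design point is that each new source only needs a new entry in the mapping table; the query side stays unchanged.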

What carries the argument

The Atomistic Simulation Data Ontology (ASDO) provides a formal, machine-readable vocabulary for describing atoms, thermodynamic states, and the computational steps used to calculate them. It acts as a common translation layer between simulation codes.

If this is right

  • Researchers can perform meta-analyses across thousands of simulations conducted by different groups using different software packages.
  • Machine learning models for materials discovery can be trained on larger, more diverse, and physically consistent datasets without manual data cleaning.
  • Computational workflows become easier to reproduce, because the provenance of each data point is recorded in a standard, machine-readable format.
  • New simulation methods can be integrated into existing knowledge bases by mapping them to the central ontology rather than rewriting entire databases.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This framework could eventually enable autonomous laboratories where AI agents query the knowledge graph to decide which simulation to run next based on gaps in the existing data.
  • Standardizing provenance at this level may reveal systematic biases in specific simulation codes that were previously attributed to physical variations in the materials.
  • The success of the infrastructure depends heavily on community adoption; without a critical mass of mapped data, it remains a powerful tool for a limited set of users.

Load-bearing premise

The system assumes that mapping complex, legacy simulation data to a new central ontology is easy enough that researchers will choose to do it instead of continuing with their own custom workflows.

What would settle it

The claim of interoperability would be falsified if a standard material property, such as grain boundary energy, could not be consistently compared across two different simulation software packages within the graph due to missing metadata or logical mismatches.
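One way to make this test concrete is to check, before any comparison, which samples carry a grain boundary energy but lack the context needed to compare it. The sketch below is a hedged illustration only. `asdo:hasInteratomicPotential` is a predicate named later on this page in the simulated rebuttal; `ex:hasGrainBoundaryEnergy`, `ex:hasSimulationCode`, and all sample data are hypothetical.

```python
# Hypothetical sketch: flag samples whose grain boundary energy cannot be
# compared because required context metadata is missing.
# Predicates prefixed "ex:" and all data values are illustrative assumptions.

REQUIRED_CONTEXT = ("asdo:hasInteratomicPotential", "ex:hasSimulationCode")

triples = {
    ("gb:1", "ex:hasGrainBoundaryEnergy", 0.95),
    ("gb:1", "asdo:hasInteratomicPotential", "pot:A"),
    ("gb:1", "ex:hasSimulationCode", "code:X"),
    ("gb:2", "ex:hasGrainBoundaryEnergy", 0.91),
    ("gb:2", "ex:hasSimulationCode", "code:Y"),  # potential not recorded
}


def missing_context(triples, required=REQUIRED_CONTEXT):
    """Return {sample: [missing predicates]} for samples that carry a
    grain boundary energy but lack required context metadata."""
    present = {}
    for subject, predicate, _ in triples:
        present.setdefault(subject, set()).add(predicate)
    report = {}
    for subject, predicates in present.items():
        if "ex:hasGrainBoundaryEnergy" not in predicates:
            continue
        gaps = [p for p in required if p not in predicates]
        if gaps:
            report[subject] = gaps
    return report


report = missing_context(triples)
```

A non-empty report for data drawn from two software packages would be an instance of the failure mode described above.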

Figures

Figures reproduced from arXiv: 2604.06230 by Abril Azocar Guzman, Sarath Menon, Stefan Sandfeld, Tilmann Hickel.

Figure 1. CMSO Ontology. [...] space is centered on CalculatedProperty and PhysicalQuantity, which organize general concepts such as Energy, Force, Length, Mass, Pressure, Stress, Temperature, Time, and Volume, together with more specific outputs including BulkModulus, ShearModulus, YoungsModulus, PoissonsRatio, FormationEnergy, TotalEnergy, VirialPressure, and TotalMagneticMoment. ASMO also includes dedicated branches f…

Figure 2. ASMO Ontology.

Figure 3. Overview. 2.2.1. Conceptual metadata capture: The first layer is conceptual metadata capture, implemented through conceptual_dictionary. It provides reusable, ontology-aligned metadata templates derived from the concepts introduced in Sec. 2.1, exposing them in a form suitable for practical use in scientific workflows. These templates are available in common human- and machine-readable formats such as YAML …

Figure 4. Visualisation of the knowledge graph of atomistic simulation data. Nodes represent …

Figure 5. Semantic integration of heterogeneous grain boundary data in the knowledge graph.

Figure 6. Cross-dataset grain boundary properties queried from the knowledge graph. (a) Grain …

Figure 7. Volume per atom as a function of temperature for selected elemental systems retrieved …

Figure 8. Machine-readable provenance representation for vacancy formation energy calcula…
Original abstract

The reuse of atomistic simulation data is often limited by heterogeneous formats, incomplete metadata, and a lack of standardized representations of workflows and provenance. Here we present an ontology-based infrastructure for representing and integrating atomistic simulation data as a knowledge graph. The approach combines domain ontologies with a software framework that enables data capture both from existing datasets and directly from simulation workflows at the point of generation. Heterogeneous data from multiple sources are normalized into a common, ontology-aligned representation, enabling consistent querying and analysis across datasets. We demonstrate these capabilities through the integration of grain boundary data, cross-dataset analysis of material properties, and extraction of derived thermodynamic quantities from existing simulations. In addition, workflows are represented in a machine-readable form, enabling both forward provenance tracking and partial reconstruction of computational procedures. The resulting knowledge graph contains over 750,000 triples describing nearly 8,000 computational samples. This work provides a practical framework for improving the findability, interoperability, and reuse of atomistic simulation data.
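The idea of reconstructing a computational procedure from recorded provenance can be sketched as a backward walk over provenance links. The example below uses the PROV-O property names `prov:wasGeneratedBy` and `prov:used`; every entity and activity name is hypothetical, and the traversal is an illustrative assumption rather than the paper's implementation.

```python
# Hypothetical sketch: recover the ordered list of activities behind a result
# by following PROV-O style links backwards from the result entity.
# All "ex:" identifiers are illustrative assumptions.

triples = [
    ("ex:vacancy_energy", "prov:wasGeneratedBy", "ex:energy_difference"),
    ("ex:energy_difference", "prov:used", "ex:bulk_energy"),
    ("ex:energy_difference", "prov:used", "ex:defect_energy"),
    ("ex:bulk_energy", "prov:wasGeneratedBy", "ex:relax_bulk"),
    ("ex:defect_energy", "prov:wasGeneratedBy", "ex:relax_defect"),
    ("ex:relax_bulk", "prov:used", "ex:bulk_structure"),
    ("ex:relax_defect", "prov:used", "ex:bulk_structure"),
]


def lineage(entity, triples):
    """Return the activities that contributed to `entity`, ordered so that
    each activity appears after the activities it depends on."""
    generated_by = {s: o for s, p, o in triples if p == "prov:wasGeneratedBy"}
    used = {}
    for s, p, o in triples:
        if p == "prov:used":
            used.setdefault(s, []).append(o)

    ordered, seen = [], set()

    def visit(ent):
        activity = generated_by.get(ent)
        if activity is None or activity in seen:
            return  # input with no recorded origin, or already handled
        seen.add(activity)
        for upstream in used.get(activity, []):
            visit(upstream)
        ordered.append(activity)

    visit(entity)
    return ordered


steps = lineage("ex:vacancy_energy", triples)
```

Entities with no recorded generating activity, such as `ex:bulk_structure` here, end the walk, which is one reason a reconstruction may only be partial.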

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper presents an ontology-based infrastructure (SimKG) and an associated domain ontology (ASDO) designed to address the fragmentation and poor interoperability of atomistic simulation data. The framework facilitates the ingestion of heterogeneous data from legacy datasets and active simulation workflows into a single queryable RDF knowledge graph. The authors demonstrate the utility of this approach by integrating two distinct grain boundary (GB) datasets (totaling ~750k triples) and performing cross-dataset analysis, such as extracting thermodynamic excess properties and reconstructing workflow provenance. The core claim is that this infrastructure enables 'machine-actionable' data that maintains both scientific context and provenance.

Significance. This work is significant for the materials science and chemistry communities, where 'dark data'—simulations performed but never effectively shared due to format heterogeneity—is a major bottleneck for machine learning and high-throughput discovery. The paper's strength lies in its concrete implementation: rather than presenting a purely theoretical ontology, it provides a functional software framework and demonstrates it on a non-trivial scale (8,000 samples). The explicit capture of workflow provenance is a particularly valuable contribution to data reproducibility and verification.

major comments (3)
  1. [§4.1, Figure 4] The claim of 'consistent querying and analysis across datasets' risks a 'false equivalency' error. In atomistic simulations, properties like GB excess energy are strictly relative to the interatomic potential (Hamiltonian) and the reference bulk state. In Figure 4, data from 'Dataset A' and 'Dataset B' are plotted on a single axis. The manuscript must clarify how the infrastructure semantically enforces (or facilitates) compatibility checks. Specifically, does the SPARQL query used to generate Figure 4 include filters for the specific potential URI or k-point density? If the ontology allows a user to query for 'asdo:GrainBoundaryEnergy' without forcing a check for Hamiltonian compatibility, the 'interoperability' is merely syntactic. The authors should discuss how ASDO models the dependency of extensive properties on their computational context to ensure physical validity during cross-da
  2. [§3.1, Ontology Design] The manuscript lacks a formal description of the relationship between ASDO and existing upper-level ontologies for materials science, such as EMMO (Elementary Multiperspective Material Ontology). Given the field's move toward standardization, it is crucial to state whether ASDO is intended to be EMMO-compliant or if it uses a different foundational logic (e.g., BFO). Without this, the 'interoperability' claim is limited to the SimKG framework itself rather than the broader semantic web for materials science.
  3. [§4.2, Equation 1 (implied)] The extraction of thermodynamic quantities from existing simulations (e.g., Gibbs free energy) often requires specific metadata about the ensemble (NPT vs. NVT) and the reference state used for integration. While the paper mentions provenance tracking, it does not explicitly demonstrate how the SPARQL layer handles cases where different datasets use different reference state definitions (e.g., 0K vs. finite temperature baselines). Please specify if the ontology includes classes to define the 'ThermodynamicReferenceState' to prevent the aggregation of inconsistent thermodynamic values.
minor comments (3)
  1. [Figure 5] The workflow provenance graph is visually dense. It would benefit from a legend or a clearer distinction between 'activity' nodes and 'entity' nodes (following PROV-O conventions).
  2. [§3.2] The authors mention 'manual or semi-automated mapping.' A brief estimate of the effort required (e.g., person-hours per new file format) would help potential adopters gauge the scalability of the approach.
  3. [Abstract] The number of triples (750,000) is a good performance indicator, but mentioning the SPARQL query latency for a typical cross-dataset join would further strengthen the 'practical framework' claim.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback and the recognition of the 'functional' nature of our software framework. The reviewer raises critical points regarding the physical validity of cross-dataset queries and the alignment of ASDO with existing ontology standards. We have addressed these concerns by adding semantic constraints to our query examples and clarifying the ontological alignment of ASDO with top-level frameworks like EMMO.

Point-by-point responses
  1. Referee: [§4.1, Figure 4] The claim of 'consistent querying and analysis across datasets' risks a 'false equivalency' error... specifically, does the SPARQL query used to generate Figure 4 include filters for the specific potential URI or k-point density?

    Authors: The referee is correct that semantic interoperability must not ignore physical context. In Fig 4, the datasets were compatible (both used the same EAM potential), but the query shown was simplified for brevity. We have updated §4.1 to include an explicit SPARQL snippet demonstrating how users can (and should) filter by 'asdo:hasInteratomicPotential' and 'asdo:hasKPointGrid'. While the infrastructure doesn't 'force' compatibility at the query level (standard SPARQL behavior), ASDO provides the necessary predicates to make such filters trivial for the user. We have added a discussion on 'semantic gatekeeping' to prevent physically invalid aggregations. revision: yes

  2. Referee: [§3.1, Ontology Design] The manuscript lacks a formal description of the relationship between ASDO and existing upper-level ontologies for materials science, such as EMMO.

    Authors: ASDO was developed as a mid-level domain ontology to maximize immediate utility for simulation researchers. However, we acknowledge that long-term interoperability requires alignment with upper-level ontologies. We have added a paragraph in §3.1 explaining that while ASDO is not natively built on the EMMO backbone (to avoid its significant overhead for simple metadata capture), it is designed for 'structural mapping' to EMMO's 'PhysicalObject' and 'Process' branches. We have initiated a mapping table to facilitate future alignment with EMMO. revision: partial

  3. Referee: [§4.2, Equation 1 (implied)] The extraction of thermodynamic quantities... does the ontology include classes to define the 'ThermodynamicReferenceState' to prevent the aggregation of inconsistent thermodynamic values?

    Authors: This is a vital point for the physical accuracy of the knowledge graph. ASDO includes the 'asdo:ReferenceSystem' class, which links a grain boundary energy calculation to the specific bulk simulation used as the baseline. In the revised manuscript, we explicitly describe the 'asdo:ThermodynamicEnsemble' class and how it captures NPT vs. NVT conditions. We have added a clarification in §4.2 that for the demonstrated energy extraction, the software framework automatically checks for consistency between the ensemble of the GB simulation and the reference bulk state stored in the graph. revision: yes
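The filtering discussed in the first response can be sketched without a triple store. The example below groups grain boundary energies by interatomic potential so that only values computed with the same potential are averaged together. `asdo:hasInteratomicPotential` is the predicate named in the response; `ex:hasGrainBoundaryEnergy`, the potential identifiers, and the numbers are hypothetical, and plain tuples stand in for a SPARQL query over an RDF graph.

```python
# Hypothetical sketch: aggregate grain boundary energies only within groups
# that share the same interatomic potential.
# "ex:" predicates, potential IDs, and values are illustrative assumptions.
from collections import defaultdict
from statistics import mean

triples = {
    ("gb:1", "ex:hasGrainBoundaryEnergy", 0.90),
    ("gb:1", "asdo:hasInteratomicPotential", "pot:EAM_1"),
    ("gb:2", "ex:hasGrainBoundaryEnergy", 1.10),
    ("gb:2", "asdo:hasInteratomicPotential", "pot:EAM_1"),
    ("gb:3", "ex:hasGrainBoundaryEnergy", 0.50),
    ("gb:3", "asdo:hasInteratomicPotential", "pot:EAM_2"),
    ("gb:4", "ex:hasGrainBoundaryEnergy", 0.70),  # no potential recorded
}


def objects(triples, predicate):
    """Map each subject to its object for one predicate."""
    return {s: o for s, p, o in triples if p == predicate}


def mean_energy_by_potential(triples):
    """Average energies per potential; samples without a potential are excluded."""
    energy = objects(triples, "ex:hasGrainBoundaryEnergy")
    potential = objects(triples, "asdo:hasInteratomicPotential")
    groups = defaultdict(list)
    for sample, value in energy.items():
        if sample in potential:
            groups[potential[sample]].append(value)
    return {pot: mean(values) for pot, values in groups.items()}


result = mean_energy_by_potential(triples)
```

Excluding samples with no recorded potential is a deliberate choice in this sketch: it trades coverage for the guarantee that no average mixes incompatible energy models.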

Circularity Check

0 steps flagged

No circularity: The paper describes a technical infrastructure and demonstrates its utility through data integration.

full rationale

The paper presents an ontology-based infrastructure (the Atomistic Simulation Data Ontology, ASDO) for managing and integrating simulation data. There is no circularity in the derivation because the paper's 'results' are the software's capabilities and the successful construction of a knowledge graph, rather than the 'prediction' of physical properties from assumed models. The authors demonstrate that their framework can ingest heterogeneous data and output a queryable graph; the 'thermodynamic consistency' mentioned in the results is explicitly described as a procedural normalization step (applying the same energy subtraction formula to all ingested data) rather than a circular claim that the ontology discovered this consistency. The skeptics' concern regarding whether the system semantically prevents a user from comparing incompatible Hamiltonians is a matter of software specification and semantic depth, not a logical circularity. The paper uses previously published datasets to validate the tool's performance, which is a standard engineering validation method.
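The "energy subtraction formula" is not spelled out on this page. A commonly used textbook definition of grain boundary energy for a single-species cell is the excess energy over the bulk reference divided by the boundary area, and the sketch below assumes that form. The function name, argument names, and input values are hypothetical, and the paper's actual normalization may differ.

```python
# Hypothetical sketch of a standard grain boundary energy definition:
#   gamma = (E_cell - N * e_bulk) / (n_boundaries * A)
# This assumes a single-species cell; the paper's exact formula is not shown here.

EV_PER_A2_TO_J_PER_M2 = 16.02176634  # 1 eV/Å^2 expressed in J/m^2


def grain_boundary_energy(e_cell, n_atoms, e_bulk_per_atom, area, n_boundaries=1):
    """Excess energy per unit boundary area, in eV/Å^2.

    e_cell           total energy of the cell containing the boundary (eV)
    n_atoms          number of atoms in that cell
    e_bulk_per_atom  energy per atom of the reference bulk (eV)
    area             area of one boundary plane (Å^2)
    n_boundaries     boundaries in the cell (often 2 for periodic bicrystals)
    """
    if area <= 0 or n_boundaries < 1:
        raise ValueError("area must be positive and n_boundaries at least 1")
    return (e_cell - n_atoms * e_bulk_per_atom) / (n_boundaries * area)


# Illustrative numbers only: 100 atoms, 2 eV of excess energy, two boundaries.
gamma = grain_boundary_energy(-398.0, 100, -4.0, 20.0, n_boundaries=2)
gamma_si = gamma * EV_PER_A2_TO_J_PER_M2
```

Applying one such function to every ingested dataset is a procedural step, which matches the rationale above: the consistency comes from using the same formula and reference energy, not from anything the ontology infers.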

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The paper is an engineering and informatics contribution; its primary 'axioms' are the domain-specific mappings between physical data and semantic structures.

axioms (2)
  • domain assumption: Atomistic simulation data can be losslessly represented using Resource Description Framework (RDF) triples.
    The core of the knowledge graph approach assumes the semantic relationships adequately capture the underlying physics and metadata.
  • domain assumption: The Atomistic Simulation Data Ontology (ASDO) correctly categorizes the fundamental entities of molecular dynamics.
    The utility of the graph depends on the validity of the ontology's taxonomy (e.g., how it defines a 'structure' vs a 'calculation').
invented entities (1)
  • ASDO (Atomistic Simulation Data Ontology) independent evidence
    purpose: To provide the formal vocabulary for describing atomistic simulations in a machine-readable way.
    The ontology is designed to be a public, reusable standard for the materials science community.

pith-pipeline@v0.9.0 · 6270 in / 1643 out tokens · 15607 ms · 2026-05-08T02:18:20.412208+00:00 · methodology


Reference graph

Works this paper leans on

76 extracted references · 62 canonical work pages

  1. [1]

    Ben Mahmoud, C., Gardner, J. L. A., and Deringer, V . L. Data as the next challenge in atomistic machine learning.Nature Computational Science,4(6), 384–387, Jun 2024. doi:10.1038/s43588-024-00636-1

  2. [2]

    S., et al

    Himanen, L., Geurts, A., Foster, A. S., et al. Data-driven materials science: Status, challenges, and perspectives.Advanced Science,6(21), 1900808, 2019. doi:10.1002/advs.201900808

  3. [3]

    Bayesian Optimization with Adaptive Surrogate Models for Automated Experimental De- sign

    de Pablo, Juan J.and Jackson, N. E., Webb, M. A., Chen, L.-Q., et al. New frontiers for the ma- terials genome initiative.npj Computational Materials,5(1), 41, Apr 2019. doi:10.1038/s41524- 019-0173-4

  4. [4]

    O., Pitera, J

    Pyzer-Knapp, E. O., Pitera, J. W., Staar, P . W. J., et al. Accelerating materials discovery using artificial intelligence, high performance computing and robotics.npj Computational Materials,8(1), 84, Apr 2022. doi:10.1038/s41524-022-00765-z

  5. [5]

    T., Niethammer, C., Boccardo, G., et al

    Horsch, M. T., Niethammer, C., Boccardo, G., et al. Semantic interoperability and character- ization of data provenance in computational molecular engineering.Journal of Chemical & Engineering Data,65(3), 1313–1329, Mar 2020. doi:10.1021/acs.jced.9b00739

  6. [6]

    L., et al

    Villamar, J., Kelbling, M., More, H. L., et al. Metadata practices for simulation workflows. Scientific Data,12(1), 942, Jun 2025. doi:10.1038/s41597-025-05126-1. 16

  7. [7]

    Nat Methods , author =

    Bonomi, M., Bussi, G., Camilloni, C., et al. Promoting transparency and reproducibil- ity in enhanced molecular simulations.Nature Methods,16(8), 670–673, Aug 2019. doi:10.1038/s41592-019-0506-8

  8. [8]

    and Zoupanos, Spyros and Uhrin, Martin and Talirz, Leopold and Kahle, Leonid and H

    Huber, S. P ., Zoupanos, S., Uhrin, M., et al. AiiDA 1.0, a scalable computational infrastruc- ture for automated reproducible workflows and data provenance.Scientific Data,7(1), 300, Sep 2020. doi:10.1038/s41597-020-00638-4

  9. [9]

    Janssen, S

    Janssen, J., Surendralal, S., Lysogorskiy, Y., et al. pyiron: An integrated development environment for computational materials science.Computational Materials Science,163, 24–36, 2019. doi:10.1016/j.commatsci.2018.07.043

  10. [10]

    Jobflow: Computational workflows made simple

    Rosen, A. S., Gallant, M., George, J., et al. Jobflow: Computational workflows made simple. Journal of Open Source Software,9(93), 5995, 2024. doi:10.21105/joss.05995

  11. [11]

    Open computational materials science.Nature Materials,23(1), 16–17, Jan 2024

    Walsh, A. Open computational materials science.Nature Materials,23(1), 16–17, Jan 2024. doi:10.1038/s41563-023-01699-7

  12. [12]

    A python workflow definition for computational materials design.Digital Discovery,4, 3149–3161, 2025

    Janssen, J., George, J., Geiger, J., et al. A python workflow definition for computational materials design.Digital Discovery,4, 3149–3161, 2025. doi:10.1039/D5DD00231A

  13. [13]

    https://doi

    Jain, A., Ong, S. P ., Hautier, G., et al. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation.APL Materials,1(1), 011002, 07 2013. doi:10.1063/1.4812323

  14. [14]

    L., et al

    Curtarolo, S., Setyawan, W., Hart, G. L., et al. Aflow: An automatic framework for high-throughput materials discovery.Computational Materials Science,58, 218–226, 2012. doi:10.1016/j.commatsci.2012.02.005

  15. [15]

    The NOMAD laboratory: from data sharing to artificial intelli- gence.Journal of Physics: Materials,2(3), 036001, may 2019

    Draxl, C., and Scheffler, M. The NOMAD laboratory: from data sharing to artificial intelli- gence.Journal of Physics: Materials,2(3), 036001, may 2019. doi:10.1088/2515-7639/ab13bb

  16. [16]

    Scientific data3(1), 1–9 (2016)

    Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., et al. The FAIR Guiding Prin- ciples for scientific data management and stewardship.Sci. Data,3(1), 160018, 2016. doi:10.1038/sdata.2016.18

  17. [17]

    Fair data enabling new horizons for materials research,

    Scheffler, M., Aeschlimann, M., Albrecht, M., et al. Fair data enabling new horizons for materials research.Nature,604(7907), 635–642, Apr 2022. doi:10.1038/s41586-022-04501-x

  18. [18]

    M., Baldauf, C., Bereau, T., et al

    Ghiringhelli, L. M., Baldauf, C., Bereau, T., et al. Shared metadata for data-centric materials science.Scientific Data,10(1), 626, Sep 2023. doi:10.1038/s41597-023-02501-8

  19. [19]

    T., Choudhary, K., Csanyi, G., et al

    Butler, K. T., Choudhary, K., Csanyi, G., et al. Setting standards for data driven materials science.npj Computational Materials,10(1), 231, Oct 2024. doi:10.1038/s41524-024-01411-6

  20. [20]

    A perspective on digital knowledge representation in materials science and engineering.Advanced Engineering Materials,24(6), 2101176, 2022

    Bayerlein, B., Hanke, T., Muth, T., et al. A perspective on digital knowledge representation in materials science and engineering.Advanced Engineering Materials,24(6), 2101176, 2022. doi:10.1002/adem.202101176

  21. [21]

    The intersection between se- mantic web and materials science.Advanced Intelligent Systems,5(8), 2300051, 2023

    Valdestilhas, A., Bayerlein, B., Moreno Torres, B., et al. The intersection between se- mantic web and materials science.Advanced Intelligent Systems,5(8), 2300051, 2023. doi:10.1002/aisy.202300051

  22. [22]

    An ontology for the materials design domain

    Li, H., Armiento, R., and Lambrix, P . An ontology for the materials design domain. In Pan, J. Z., Tamma, V ., d’Amato, C., et al., editors,The Semantic Web – ISWC 2020, pp. 212–227, Cham, Springer International Publishing, 2020. 17

  23. [23]

    Pmd core ontology: Achieving se- mantic interoperability in materials science.Materials & Design,237, 112603, 2024

    Bayerlein, B., Schilling, M., Birkholz, H., et al. Pmd core ontology: Achieving se- mantic interoperability in materials science.Materials & Design,237, 112603, 2024. doi:10.1016/j.matdes.2023.112603

  24. [24]

    Elementary multiperspective material ontology: Leveraging perspectives via a showcase of emmo-based domain and application ontologies

    Del Nostro, P ., Friis, J., Ghedini, E., et al. Elementary multiperspective material ontology: Leveraging perspectives via a showcase of emmo-based domain and application ontologies. InProceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KEOD, pp. 135–142, INSTICC, SciTePress, 2024...

  25. [25]

    L., Bergsma, J., Merkys, A., et al

    Evans, M. L., Bergsma, J., Merkys, A., et al. Developments and applications of the optimade api for materials discovery, design, and data exchange.Digital Discovery,3, 1509–1533,

  26. [26]

    doi:10.1039/D4DD00039K

  27. [27]

    monospace

    Mrdjenovich, D., Horton, M. K., Montoya, J. H., et al. <span class="monospace">propnet</span>: A knowledge graph for materials science. Matter,2(2), 464–480, Feb 2020. doi:10.1016/j.matt.2019.11.013

  28. [29]

    Computational Material Sample Ontology (CMSO), 2024

    Azocar Guzman, A. Computational Material Sample Ontology (CMSO), 2024. https: //doi.org/10.5281/zenodo.10805536

  29. [31]

    Atomistic Simulation Methods Ontology (ASMO), 2024

    Azocar Guzman, A. Atomistic Simulation Methods Ontology (ASMO), 2024. https: //doi.org/10.5281/zenodo.10805591

  30. [32]

    C., Gómez-Pérez, A., and Fernández-López, M.The NeOn Methodology for Ontology Engineering, pp

    Suárez-Figueroa, M. C., Gómez-Pérez, A., and Fernández-López, M.The NeOn Methodology for Ontology Engineering, pp. 9–34. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012. doi:10.1007/978-3-642-24794-1_2

  31. [33]

    D., and Rogal, J

    Menon, S., Leines, G. D., and Rogal, J. pyscal: A python module for structural analysis of atomic environments.Journal of Open Source Software,4(43), 1824, 2019. doi:10.21105/joss.01824

  32. [34]

    Larsen and Jens Jørgen Mortensen and Jakob Blomqvist and Ivano E

    Hjorth Larsen, A., Jørgen Mortensen, J., Blomqvist, J., et al. The atomic simulation envi- ronment—a python library for working with atoms.Journal of Physics: Condensed Matter, 29(27), 273002, jun 2017. doi:10.1088/1361-648X/aa680e

  33. [35]

    Chevrier, Kristin A

    Ong, S. P ., Richards, W. D., Jain, A., et al. Python materials genomics (pymatgen): A robust, open-source python library for materials analysis.Computational Materials Science, 68, 314–319, 2013. doi:10.1016/j.commatsci.2012.10.028

  34. [36]

    M., Trautt, Z

    Hale, L. M., Trautt, Z. T., and Becker, C. A. Evaluating variability with atomistic simulations: the effect of potential and calculation methodology on the modeling of lattice and elastic constants.Modelling and Simulation in Materials Science and Engineering,26(5), 055003, may

  35. [37]

    doi:10.1088/1361-651X/aabc05

  36. [38]

    LAMMPS-a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales,

    Thompson, A. P ., Aktulga, H. M., Berger, R., et al. Lammps - a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales.Computer Physics Communications,271, 108171, 2022. doi:10.1016/j.cpc.2021.108171

  37. [39]

    Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set,

    Kresse, G., and Furthmüller, J. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set.Phys. Rev. B,54, 11169–11186, Oct 1996. doi:10.1103/PhysRevB.54.11169. 18

  38. [40]

    QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials,

    Giannozzi, P ., Baroni, S., Bonini, N., et al. Quantum espresso: a modular and open-source software project for quantum simulations of materials.Journal of Physics: Condensed Matter, 21(39), 395502, sep 2009. doi:10.1088/0953-8984/21/39/395502

  39. [41]

    R., Allen, F

    Hall, S. R., Allen, F. H., and Brown, I. D. The crystallographic information file (cif): a new standard archive file for crystallography.Acta Crystallographica Section A,47(6), 655–685,

  40. [42]

    doi:10.1107/S010876739101067X

  41. [43]

    Prov-o: The prov ontology, 2013

    W3C Provenance Working Group. Prov-o: The prov ontology, 2013. https://www.w3.org/ TR/prov-o/. W3C Recommendation

  42. [44]

    QUDT; Quantities, Units, Dimensions and Types, 2026

    QUDT Organization. QUDT; Quantities, Units, Dimensions and Types, 2026. doi:10.25504/FAIRsharing.d3pqw7. Last edited: March 20, 2026. Last accessed: March 23, 2026

  43. [45]

    Poveda-Villalón, M., Gómez-Pérez, A., and Suárez-Figueroa, M. C. Oops! (ontology pitfall scanner!): An on-line tool for ontology evaluation.Int. J. Semant. Web Inf. Syst.,10(2), 7–34, April 2014. doi:10.4018/ijswis.2014040102

  44. [46]

    HermiT: An OWL 2 Reasoner.J

    Glimm, B., Horrocks, I., Motik, B., et al. HermiT: An OWL 2 Reasoner.J. Autom. Reason., 53(3), 245–269, 2014. doi:10.1007/s10817-014-9305-1

  45. [47]

    RDF 1.1 concepts and abstract syntax

    Cyganiak, R., Wood, D., and Lanthaler, M. RDF 1.1 concepts and abstract syntax. W3C Recommendation, W3C, February 2014

  46. [48]

    OWL 2 Web Ontology Language Document Overview (Second Edition)

    W3C OWL Working Group. OWL 2 Web Ontology Language Document Overview (Second Edition). W3C Recommendation, World Wide Web Consortium (W3C), 2012

  47. [49]

    Ontology engineering: Current state, challenges, and future directions

    Tudorache, T. Ontology engineering: Current state, challenges, and future directions. Semantic Web,11(1), 125–138, 2020. doi:10.3233/SW-190382

  48. [50]

    Fron- tiers of Computer Science 11(5), 746–761 (2017)

    Xu, D., Chen, W., Peng, W., et al. Large language models for generative information extrac- tion: a survey.Frontiers of Computer Science,18(6), 186357, Nov 2024. doi:10.1007/s11704- 024-40555-y

  49. [51]

    Pydantic Validation, February 2026

    Colvin, S., Jolibois, E., Ramezani, H., et al. Pydantic Validation, February 2026. https: //github.com/pydantic/pydantic

  50. [52]

    A., Higgins, G., et al

    Krech, D., Grimnes, G. A., Higgins, G., et al. RDFLib, October 2025. doi:10.5281/zenodo.6845245

  51. [53]

    ontology-based knowl- edge graph infrastructure for interoperable atomistic simulation data

    Azócar Guzmán, A., Menon, S., Hickel, T., et al. Data for "ontology-based knowl- edge graph infrastructure for interoperable atomistic simulation data", March 2026. doi:10.5281/zenodo.19358155

  52. [54]

    kg-atomrdf: Knowledge graph for atomistic simulation data, 2026

    Azócar Guzmán, Abril and Menon, Sarath and Hickel, Tilmann and Sandfeld, Stefan. kg-atomrdf: Knowledge graph for atomistic simulation data, 2026. https://github.com/ pyscal/kg-atomrdf. GitHub repository, accessed 2026-03-31

  53. [55]

    Chebi: re-engineered for a sustainable future

    Malik, A., Arsalan, M., Moreno, C., et al. Chebi: re-engineered for a sustainable future. Nucleic Acids Research,54(D1), D1768–D1778, 11 2025. doi:10.1093/nar/gkaf1271

  54. [56]

    Wikidata: a free collaborative knowledgebase

    Vrandeˇ ci´ c, D., and Krötzsch, M. Wikidata: a free collaborative knowledgebase.Commun. ACM,57(10), 78–85, September 2014. doi:10.1145/2629489

  55. [57]

    A., Coleman, S

    Tschopp, M. A., Coleman, S. P ., and McDowell, D. L. Symmetric and asymmetric tilt grain boundary structure and energy in cu and al (and transferability to other fcc metals).Inte- grating Materials and Manufacturing Innovation,4(1), 176–189, Dec 2015. doi:10.1186/s40192- 015-0040-1. 19

  56. [58]

    L., Cui, X.-Y., Hickel, T., et al

    Mai, H. L., Cui, X.-Y., Hickel, T., et al. A high-throughput ab initio study of elemental segregation and cohesion at ferritic-iron grain boundaries.Acta Materialia,297, 121288,

  57. [59]

    doi:https://doi.org/10.1016/j.actamat.2025.121288

  58. [60]

    Quasiaperiodic grain boundary phases of 5 tilt grain boundaries in refractory metals.Phys

    Chen, E., and Frolov, T. Quasiaperiodic grain boundary phases of 5 tilt grain boundaries in refractory metals.Phys. Rev. B,112, L060101, Aug 2025. doi:10.1103/vsgc-gkdx

  59. [61]

    prediction of grain boundaries from bulk structures with score-based denoising models

    Chen, J., Yang, K., Peng, C., et al. Data for: "prediction of grain boundaries from bulk structures with score-based denoising models", 2025. doi:10.5281/zenodo.15725159

  60. [62]

    Faceting transition in aluminum as a grain boundary phase transition

    Choi, Y., and Brink, T. Faceting transition in aluminum as a grain boundary phase transition. Phys. Rev. Mater., 9, 083607, Aug 2025. doi:10.1103/2dnf-zdz8

  61. [63]

    Universality of grain boundary phases in fcc metals: Case study on high-angle [111] symmetric tilt grain boundaries

    Brink, T., Langenohl, L., Bishara, H., et al. Universality of grain boundary phases in fcc metals: Case study on high-angle [111] symmetric tilt grain boundaries. Phys. Rev. B, 107, 054103, Feb 2023. doi:10.1103/PhysRevB.107.054103

  62. [64]

    Grain boundary properties of elemental metals

    Zheng, H., Li, X.-G., Tran, R., et al. Grain boundary properties of elemental metals. Acta Materialia, 186, 40–49, 2020. doi:10.1016/j.actamat.2019.12.030

  63. [65]

    Generalized gradient approximation made simple

    Perdew, J. P., Burke, K., and Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett., 77, 3865–3868, Oct 1996. doi:10.1103/PhysRevLett.77.3865

  64. [66]

    Graph atomic cluster expansion for foundational machine learning interatomic potentials

    Lysogorskiy, Y., Bochkarev, A., and Drautz, R. Graph atomic cluster expansion for foundational machine learning interatomic potentials. npj Computational Materials, 12(1), 114, Feb. doi:10.1038/s41524-026-01979-1

  66. [68]

    Automated free-energy calculation from atomistic simulations

    Menon, S., Lysogorskiy, Y., Rogal, J., et al. Automated free-energy calculation from atomistic simulations. Phys. Rev. Mater., 5, 103801, Oct 2021. doi:10.1103/PhysRevMaterials.5.103801

  67. [69]

    Interfaces in Crystalline Materials

    Sutton, A. P., and Balluffi, R. W. Interfaces in Crystalline Materials. Clarendon Press, Oxford, 1996

  68. [70]

    The Role of the Coincidence Site Lattice in Grain Boundary Engineering

    Randle, V. The Role of the Coincidence Site Lattice in Grain Boundary Engineering. CRC Press, 1 edition, 1997. doi:10.1201/9781003579991

  69. [71]

    Structure-energy correlation for grain boundaries in f.c.c. metals—I. Boundaries on the (111) and (100) planes

    Wolf, D. Structure-energy correlation for grain boundaries in f.c.c. metals—I. Boundaries on the (111) and (100) planes. Acta Metallurgica, 37(7), 1983–1993, 1989. doi:10.1016/0001-6160(89)90082-5

  70. [72]

    Data-centric framework for crystal structure identification in atomistic simulations using machine learning

    Chung, H. W., Freitas, R., Cheon, G., et al. Data-centric framework for crystal structure identification in atomistic simulations using machine learning. Phys. Rev. Materials, 6(4), 043801, 2022. doi:10.1103/PhysRevMaterials.6.043801

  71. [73]

    The potential of atomistic simulations and the knowledgebase of interatomic models

    Tadmor, E. B., Elliott, R. S., Sethna, J. P., et al. The potential of atomistic simulations and the knowledgebase of interatomic models. JOM, 63(7), 17–17, 2011. doi:10.1007/s11837-011-0102-6

  72. [74]

    NIST interatomic potentials repository, 2016

    Hale, L. NIST interatomic potentials repository, 2016. doi:10.18434/m37. Accessed 2026-03-22

  73. [75]

    A semantic approach to mapping the provenance ontology to basic formal ontology

    Prudhomme, T., De Colle, G., Liebers, A., et al. A semantic approach to mapping the provenance ontology to basic formal ontology. Scientific Data, 12(1), 282, Feb 2025. doi:10.1038/s41597-025-04580-1

  74. [76]

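    Accelerating scientific discovery with generative knowledge extraction, graph-based representation, and multimodal intelligent graph reasoning

    Buehler, M. J. Accelerating scientific discovery with generative knowledge extraction, graph-based representation, and multimodal intelligent graph reasoning. Machine Learning: Science and Technology, 5(3), 035083, Sep 2024. doi:10.1088/2632-2153/ad7228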

  75. [77]

    conceptual_dictionary, March 2026

    Menon, S., and Azocar Guzman, A. conceptual_dictionary, March 2026. https://github.com/OCDO/conceptual_dictionary/

  76. [78]

    atomRDF, python tool for ontology-based creation, manipulation, and quering of atomic structures., February 2026

    Menon, S., and Azocar Guzman, A. atomRDF, python tool for ontology-based creation, manipulation, and quering of atomic structures., February 2026. https://github.com/pyscal/atomRDF