pith. machine review for the scientific record.

arxiv: 2604.25944 · v1 · submitted 2026-04-17 · ⚛️ physics.comp-ph · cond-mat.mtrl-sci

Recognition: unknown

From Code to Figure: A FAIR-Aligned Data Provenance Chain for Reproducible Simulation Research in Numerical Physics

Pith reviewed 2026-05-10 07:26 UTC · model grok-4.3

keywords: reproducible research, data provenance, FAIR principles, numerical physics, simulation workflows, version control, computational physics

The pith

A workflow chains version control, logging, and metadata to link code versions directly to published figures in numerical physics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that an integrated set of existing practices can create a traceable data provenance chain for simulations whose codebases evolve over many years. It combines version control, code review, automated testing, structured logging, metadata-rich outputs, and standardized post-processing so that any figure can be traced back to the exact code, inputs, and analysis steps. A sympathetic reader would care because large-scale computational physics depends on long-running, actively developed software, where traditional methods leave results difficult to reproduce or verify. The authors demonstrate the approach on one simulation framework and note that the same combination of tools can apply across computational physics and other data-intensive fields.

Core claim

We present an integrated workflow for reproducible and FAIR-aligned simulation research in numerical physics. We describe how version control, code review, automated testing, structured logging, metadata-rich output, and standardized post-processing can be combined to support traceability from software development to publication. The presented concepts demonstrated for one particular simulation framework are broadly applicable to computational physics and other data-intensive areas of scientific computing.

What carries the argument

The data provenance chain that links version-controlled code, automated tests, structured logs, and metadata-rich outputs through standardized post-processing to final published figures.
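
To make the idea of a chain concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single record that ties a code commit, content digests of the input and output files, the post-processing script, and a figure label together. The class `ProvenanceRecord`, the helper `build_record`, and all field names are hypothetical and chosen only for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


def sha256_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class ProvenanceRecord:
    """One end-to-end link from code to figure (hypothetical field names)."""
    code_commit: str       # e.g. the hash printed by `git rev-parse HEAD`
    input_sha256: str      # digest of the simulation input file
    output_sha256: str     # digest of the simulation output file
    analysis_script: str   # name of the post-processing script
    figure_id: str         # label of the published figure

    def to_json(self) -> str:
        # sort_keys gives a stable serialization, so the record digest is reproducible
        return json.dumps(asdict(self), sort_keys=True)

    def record_id(self) -> str:
        """Digest of the serialized record; changes if any link changes."""
        return sha256_bytes(self.to_json().encode("utf-8"))


def build_record(commit, input_data, output_data, script, figure_id):
    """Assemble a record from a commit string and raw file contents."""
    return ProvenanceRecord(
        code_commit=commit,
        input_sha256=sha256_bytes(input_data),
        output_sha256=sha256_bytes(output_data),
        analysis_script=script,
        figure_id=figure_id,
    )
```

Under these assumptions, two runs that differ in the commit, the input, or the output yield different `record_id()` values, which is the property a figure-level reference would rely on.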

If this is right

  • Any published figure carries explicit links back to the precise code version and inputs that generated it.
  • Reproducibility checks become feasible even when the underlying simulation code continues to change over years.
  • The workflow supports practical implementation of FAIR principles without inventing new tools.
  • The same combination of practices extends to other computational fields that generate large evolving datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Groups running long-term simulation projects could adopt the chain incrementally by layering logging and metadata on top of existing version control.
  • Wider use might reduce duplication of effort when researchers attempt to verify or extend prior simulation results.
  • The approach highlights a testable path for fields where software development and data generation remain tightly coupled over time.
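
The incremental route in the first bullet can be sketched as follows. This is an illustration using Python's standard `logging` module, not the logging setup described in the paper (whose reference list points to a C++ logging library). The names `JsonFormatter`, `make_run_logger`, and the `run_context` keys are assumptions made for this example.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as one JSON object per line."""

    def __init__(self, run_context):
        super().__init__()
        # Static provenance fields attached to every line (hypothetical keys)
        self.run_context = dict(run_context)

    def format(self, record):
        entry = dict(self.run_context)
        entry["level"] = record.levelname
        entry["message"] = record.getMessage()
        return json.dumps(entry, sort_keys=True)


def make_run_logger(name, stream, run_context):
    """Create a logger that writes JSON lines carrying the run context."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False      # keep output out of the root logger
    logger.handlers.clear()       # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(run_context))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    stream = io.StringIO()
    log = make_run_logger("demo_run", stream, {"code_commit": "abc123"})
    log.info("time step %d finished", 10)
    print(stream.getvalue(), end="")
```

Because every line repeats the commit identifier, a log file remains interpretable even when it is separated from the repository that produced it.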

Load-bearing premise

That combining these standard software engineering practices will deliver full traceability and achieve broad adoption in practice for simulation frameworks that stay under active development for many years.

What would settle it

An independent researcher following the workflow cannot recover the exact code commit, input files, and analysis steps that produced a published figure from the provided metadata and logs.
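
The test above can be phrased as a small check. The sketch below is one possible formulation under stated assumptions: figure metadata is a plain dictionary, the repository is represented by a set of known commit identifiers, and the input file is available as bytes. The function `missing_links` and the key names in `REQUIRED_KEYS` are hypothetical.

```python
import hashlib

# Metadata fields a figure would need for the trace to succeed (hypothetical names)
REQUIRED_KEYS = ("code_commit", "input_sha256", "analysis_steps")


def missing_links(metadata, known_commits, input_bytes):
    """Return reasons why a figure cannot be traced; an empty list means traceable."""
    problems = []
    for key in REQUIRED_KEYS:
        if not metadata.get(key):
            problems.append(f"missing metadata field: {key}")
    commit = metadata.get("code_commit")
    if commit and commit not in known_commits:
        problems.append(f"commit not found in repository: {commit}")
    recorded = metadata.get("input_sha256")
    if recorded and hashlib.sha256(input_bytes).hexdigest() != recorded:
        problems.append("input file does not match recorded digest")
    return problems
```

A non-empty result for any published figure would count as the kind of failure described above.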

Figures

Figures reproduced from arXiv: 2604.25944 by Baerbel Rethfeld, Christopher Seibel, Lukas G. Jonda, Markus Uehlein, Sebastian T. Weber, Tobias Held.

Figure 1. Conceptual overview of the data provenance chain presented in this work. Version-controlled and reviewed code is …
Figure 2. Simplified excerpt of a provenance-rich NeXus output …
Figure 3. Excerpt from the NeXus structure highlighting the …

Original abstract

Computational physics increasingly depends on large simulation datasets generated by software that remains under active development for many years. In such settings, reproducibility requires not only well documented data but also explicit links between code versions, simulation inputs, generated outputs, analysis steps, and published figures. Here, we present an integrated workflow for reproducible and FAIR-aligned simulation research in numerical physics. We describe how version control, code review, automated testing, structured logging, metadata-rich output, and standardized post-processing can be combined to support traceability from software development to publication. The presented concepts demonstrated for one particular simulation framework are broadly applicable to computational physics and other data-intensive areas of scientific computing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper presents an integrated workflow for reproducible and FAIR-aligned simulation research in numerical physics. It describes combining version control, code review, automated testing, structured logging, metadata-rich output, and standardized post-processing to support traceability from software development to publication. The concepts are demonstrated for one particular simulation framework and are claimed to be broadly applicable to computational physics and other data-intensive areas of scientific computing.

Significance. If the workflow is successfully implemented and adopted, it could provide a practical framework for enhancing reproducibility in computational physics by ensuring explicit links between code versions, simulation inputs, outputs, analysis, and published figures. This addresses a key challenge in long-term active software development and supports FAIR principles, potentially improving the reliability and reusability of simulation-based research.

major comments (1)
  1. [Abstract] The abstract outlines the workflow components and their purpose but does not include specific implementation details, validation results, or evidence of effectiveness. This makes it challenging to evaluate whether the proposed integration achieves the claimed traceability in practice.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive review and recommendation for major revision. We address the single major comment below and have revised the manuscript to strengthen the abstract while preserving its high-level character.

Point-by-point responses
  1. Referee: [Abstract] The abstract outlines the workflow components and their purpose but does not include specific implementation details, validation results, or evidence of effectiveness. This makes it challenging to evaluate whether the proposed integration achieves the claimed traceability in practice.

    Authors: We agree that the abstract, in its current form, remains at a conceptual level and does not yet convey concrete implementation details or evidence from the demonstration. The full manuscript provides these elements through the description of the chosen simulation framework, the structured logging and metadata mechanisms, and the end-to-end traceability examples. To address the concern directly, we have revised the abstract to include a concise reference to the specific framework used for demonstration and a brief statement on the observed traceability outcomes. This addition supplies the requested evidence of effectiveness without violating abstract length conventions, while the detailed validation remains in the body of the paper.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity

Full rationale

The paper is a purely descriptive methodological contribution that outlines an integrated workflow for reproducible simulation research by combining standard, pre-existing software engineering practices (version control, code review, automated testing, structured logging, metadata-rich outputs, and post-processing). It contains no mathematical derivations, equations, fitted parameters, quantitative predictions, formal proofs, or self-referential claims. The central claim, that this combination supports traceability from code to publication, is presented as an aspirational demonstration on one framework rather than a result derived from its own definitions or prior self-citations. No load-bearing step reduces to its inputs by construction, so the audit finds no circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on standard software engineering practices and the existing FAIR data principles without introducing new fitted parameters, mathematical axioms, or invented entities.

axioms (1)
  • Domain assumption: FAIR principles provide a useful standard framework for data management in scientific computing.
    Invoked in the abstract to frame the workflow alignment.

pith-pipeline@v0.9.0 · 5431 in / 1321 out tokens · 42618 ms · 2026-05-10T07:26:40.450724+00:00 · methodology

