pith.

arxiv: 2511.11153 · v2 · pith:ADAQJJGL · submitted 2025-11-14 · ⚛️ physics.atm-clus · physics.atom-ph · physics.chem-ph · physics.comp-ph

SCULPT: An Interactive Machine Learning Platform for Analyzing Multi-Particle Coincidence Data from Cold Target Recoil Ion Momentum Spectroscopy

Pith reviewed 2026-05-21 20:01 UTC · model grok-4.3

classification ⚛️ physics.atm-clus · physics.atom-ph · physics.chem-ph · physics.comp-ph
keywords SCULPT · COLTRIMS · UMAP · dimensionality reduction · clustering · photo double ionization · D2O · fragmentation channels

The pith

SCULPT is a web-based platform that applies UMAP dimensionality reduction and a weighted adaptive confidence scoring system to analyze high-dimensional multi-particle coincidence data from COLTRIMS experiments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SCULPT as a software tool designed to handle tabulated coincidence data from cold target recoil ion momentum spectroscopy. It combines nonlinear dimensionality reduction through UMAP, which exposes correlations in complex datasets, with an adaptive scoring method that rates clustering quality by combining user-selected quality metrics according to predefined weights. Application to photo double ionization measurements on the D2O molecule demonstrates the identification of distinct fragmentation channels and their correlations with physics parameters. The platform includes configurable profiles for different molecules, interactive selection tools, and data filters, with the goal of shortening the time needed for multi-dimensional analysis in atomic and molecular physics.
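The molecular profiles and physics filters mentioned above can be illustrated with a short Python sketch. It assumes a hypothetical profile dictionary for the D2O → D+ + D+ + O channel (the field names are invented here, not SCULPT's schema), momenta in atomic units, and a KER defined as the summed kinetic energy of the three fragments, with the undetected neutral oxygen momentum reconstructed from momentum conservation while neglecting the electron momenta.

```python
import numpy as np

HARTREE_EV = 27.211386   # 1 hartree in eV
AMU_TO_ME = 1822.888486  # 1 atomic mass unit in electron masses

# Hypothetical profile layout (illustrative only, not SCULPT's real format).
D2O_PROFILE = {
    "name": "D2O -> D+ + D+ + O",
    "masses_amu": {"ion1": 2.014102, "ion2": 2.014102, "neutral": 15.994915},
}

def kinetic_energy_ev(p, mass_amu):
    """p: (N, 3) momenta in atomic units -> (N,) kinetic energies in eV."""
    m = mass_amu * AMU_TO_ME
    return (p ** 2).sum(axis=1) / (2.0 * m) * HARTREE_EV

def ker_ev(p_ion1, p_ion2, profile):
    """Kinetic energy release of a three-body breakup with one neutral fragment."""
    m = profile["masses_amu"]
    # Assumption: neutral momentum from momentum conservation, electrons neglected.
    p_neutral = -(p_ion1 + p_ion2)
    return (kinetic_energy_ev(p_ion1, m["ion1"])
            + kinetic_energy_ev(p_ion2, m["ion2"])
            + kinetic_energy_ev(p_neutral, m["neutral"]))

def energy_sharing(e1, e2):
    """Fraction of the summed energy carried by the first particle."""
    return e1 / (e1 + e2)

def range_mask(values, lo, hi):
    """Boolean mask for events whose parameter lies in [lo, hi]."""
    return (values >= lo) & (values <= hi)
```

For example, `range_mask(ker_ev(p1, p2, D2O_PROFILE), 3.0, 6.0)` would keep events with a KER between 3 and 6 eV; switching molecules only means supplying a different profile.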

Core claim

SCULPT integrates Uniform Manifold Approximation and Projection (UMAP) for non-linear dimensionality reduction, to reveal correlations in high-dimensional data, together with a novel adaptive confidence scoring system that evaluates user-selected clustering quality metrics with predefined weights reflecting each metric's robustness. Using a subset of these capabilities on photo double ionization data for the three-body dissociation of the D2O molecule, measured with the COLTRIMS method, the platform reveals distinct fragmentation channels and their correlations with physics parameters.
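The core claim leaves the form of the weighted combination implicit. The following Python sketch shows one plausible construction, assuming three common validity indices that appear in the reference list (silhouette, Calinski-Harabasz, Davies-Bouldin), an assumed mapping of each onto [0, 1] with higher meaning better, and hypothetical weights. None of the normalizations or weight values are taken from SCULPT, whose actual weights this review does not report.

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette coefficient (Rousseeuw, 1987) with Euclidean distances."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() <= 1:
            continue  # singleton cluster: silhouette defined as 0
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance to own cluster, self excluded
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = (b - a) / denom if denom > 0 else 0.0
    return float(s.mean())

def calinski_harabasz(X, labels):
    """Between- to within-cluster dispersion ratio (Calinski and Harabasz, 1974)."""
    clusters = np.unique(labels)
    n, k = len(X), len(clusters)
    grand_mean = X.mean(axis=0)
    between = within = 0.0
    for c in clusters:
        Xc = X[labels == c]
        centroid = Xc.mean(axis=0)
        between += len(Xc) * ((centroid - grand_mean) ** 2).sum()
        within += ((Xc - centroid) ** 2).sum()
    return float((between / (k - 1)) / (within / (n - k)))

def davies_bouldin(X, labels):
    """Mean worst-case scatter/separation ratio (Davies and Bouldin, 1979); lower is better."""
    clusters = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in clusters])
    scatter = np.array([np.linalg.norm(X[labels == c] - centroids[j], axis=1).mean()
                        for j, c in enumerate(clusters)])
    worst = [max((scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
                 for j in range(len(clusters)) if j != i)
             for i in range(len(clusters))]
    return float(np.mean(worst))

METRICS = {"silhouette": silhouette, "calinski_harabasz": calinski_harabasz,
           "davies_bouldin": davies_bouldin}

# Assumed normalizations onto [0, 1], higher = better.
NORMALIZE = {
    "silhouette": lambda v: (v + 1.0) / 2.0,       # [-1, 1]  -> [0, 1]
    "calinski_harabasz": lambda v: v / (1.0 + v),  # [0, inf) -> [0, 1)
    "davies_bouldin": lambda v: 1.0 / (1.0 + v),   # [0, inf), lower is better -> (0, 1]
}

# Hypothetical weights: SCULPT's actual values are not given in this review.
WEIGHTS = {"silhouette": 0.5, "calinski_harabasz": 0.2, "davies_bouldin": 0.3}

def confidence_score(X, labels, selected, weights=WEIGHTS):
    """Weighted mean of the normalized metrics the user selected.
    Weights are renormalized over the selected subset, so the score stays in [0, 1]."""
    total = sum(weights[m] for m in selected)
    return sum(weights[m] * NORMALIZE[m](METRICS[m](X, labels)) for m in selected) / total
```

Calling `confidence_score(X, labels, ["silhouette", "davies_bouldin"])` scores only the two selected metrics. The pairwise silhouette computation needs O(n²) memory, one reason to work on a small subsample; scikit-learn's `silhouette_score`, `calinski_harabasz_score` and `davies_bouldin_score` can replace the three helpers.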

What carries the argument

The SCULPT platform, which supplies an interactive web interface that combines UMAP-based dimensionality reduction with an adaptive scoring system weighting multiple clustering quality metrics for COLTRIMS coincidence data.

Load-bearing premise

The predefined weights assigned to the clustering quality metrics accurately capture each metric's reliability for data from these experiments.
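One inexpensive probe of this premise is a weight-sensitivity test. The sketch below assumes normalized metric values in [0, 1] and hypothetical base weights (the review does not report SCULPT's), rescales each weight at random, and records how far the composite score moves.

```python
import numpy as np

def composite(normalized, weights):
    """Weighted mean of normalized metric values (weights renormalized to sum to 1)."""
    w = np.asarray(weights, dtype=float)
    return float(np.dot(w / w.sum(), np.asarray(normalized, dtype=float)))

def weight_sensitivity(normalized, base_weights, rel_noise=0.2, n_trials=1000, seed=0):
    """Scale each weight by an independent factor drawn uniformly from
    [1 - rel_noise, 1 + rel_noise]; return the base score and the min, max
    and standard deviation of the perturbed scores."""
    rng = np.random.default_rng(seed)
    base = np.asarray(base_weights, dtype=float)
    scores = np.array([
        composite(normalized, base * rng.uniform(1 - rel_noise, 1 + rel_noise, size=base.shape))
        for _ in range(n_trials)
    ])
    return composite(normalized, base), float(scores.min()), float(scores.max()), float(scores.std())
```

Because the composite is a convex combination, it always lies between the smallest and largest normalized metric. A wide spread under ±20% perturbations would indicate that the choice of weights, rather than the data, drives the reported confidence.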

What would settle it

Running the same D2O fragmentation dataset through SCULPT and then comparing the resulting confidence scores against independent manual classifications by multiple experts would show whether the weighted scores consistently match human judgments of cluster quality.
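Agreement between SCULPT's cluster assignments and an expert's manual classification of the same events could be quantified with the adjusted Rand index of Hubert and Arabie (listed in the reference graph). A self-contained Python sketch, assuming both labelings cover the same events in the same order:

```python
from math import comb

import numpy as np

def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two labelings of the same events:
    1.0 for identical partitions (label names do not matter), about 0 for chance."""
    _, a = np.unique(np.asarray(labels_a), return_inverse=True)
    _, b = np.unique(np.asarray(labels_b), return_inverse=True)
    a, b = a.ravel(), b.ravel()
    table = np.zeros((a.max() + 1, b.max() + 1), dtype=np.int64)
    np.add.at(table, (a, b), 1)  # contingency table n_ij
    sum_cells = sum(comb(int(n), 2) for n in table.ravel())
    sum_rows = sum(comb(int(n), 2) for n in table.sum(axis=1))
    sum_cols = sum(comb(int(n), 2) for n in table.sum(axis=0))
    expected = sum_rows * sum_cols / comb(len(a), 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:  # degenerate case, e.g. both labelings are a single cluster
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

Scoring SCULPT against each expert, and the experts against each other, gives a human baseline: a confidence score is informative only if high-confidence clusterings tend to reach the inter-expert level of agreement.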

Figures

Figures reproduced from arXiv: 2511.11153 by Daniel Slaughter, Hazem Daoud, Jin Qian, Sarvesh Kumar, Tanny Chavez, Thorsten Weber.

Figure 1. Flowchart showing the molecular configuration and data processing workflow.
Figure 2. Flowchart showing the interactive analysis and quality assessment workflow.
Figure 3. UMAP projection of D…
Figure 4. Selected cluster containing three dication states. Col…
Figure 3. These values can vary slightly from run to run, …
Figure 7. UMAP projection of D…
Original abstract

We present SCULPT (Supervised Clustering and Uncovering Latent Patterns with Training), a comprehensive software platform for analyzing tabulated high-dimensional multi-particle coincidence data from Cold Target Recoil Ion Momentum Spectroscopy (COLTRIMS) experiments. The software addresses critical challenges in modern momentum spectroscopy by integrating advanced machine learning techniques with physics-informed analysis in an interactive web-based environment. SCULPT implements Uniform Manifold Approximation and Projection (UMAP) for non-linear dimensionality reduction to reveal correlations in highly dimensional data. We also discuss potential extensions to deep autoencoders for feature learning, and genetic programming for automated discovery of physically meaningful observables. A novel adaptive confidence scoring system provides quantitative reliability assessments by evaluating user-selected clustering quality metrics with predefined weights that reflect each metric's robustness. The platform features configurable molecular profiles for different experimental systems, interactive visualization with selection tools, and comprehensive data filtering capabilities. Utilizing a subset of SCULPT's capabilities, we analyze photo double ionization data measured using the COLTRIMS method for 3-body dissociation of the D2O molecule, revealing distinct fragmentation channels and their correlations with physics parameters. The software's modular architecture and web-based implementation make it accessible to the broader atomic and molecular physics community, significantly reducing the time required for complex multi-dimensional analyses. This opens the door to finding and isolating rare events exhibiting non-linear correlations on the fly during experimental measurements, which can help steer exploration and improve the efficiency of experiments.
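The abstract lists deep autoencoders as an extension, and an excerpt in the reference graph describes a symmetric PyTorch encoder/decoder. The sketch below shows the same symmetric layout in plain NumPy with hand-written backpropagation. It is an illustration under stated assumptions, not SCULPT's module: layer sizes, tanh activations, learning rate and full-batch gradient descent are all choices made here.

```python
import numpy as np

def init_symmetric_autoencoder(dims, seed=0):
    """dims = [n_in, hidden..., n_latent]; the decoder mirrors the encoder."""
    rng = np.random.default_rng(seed)
    sizes = list(dims) + list(dims[-2::-1])  # e.g. [8, 16, 2] -> [8, 16, 2, 16, 8]
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = rng.normal(0.0, np.sqrt(1.0 / n_in), size=(n_in, n_out))
        params.append([W, np.zeros(n_out)])
    return params

def forward(params, X):
    """Return the activations of every layer; tanh everywhere except the linear output."""
    acts = [X]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(z if i == len(params) - 1 else np.tanh(z))
    return acts

def train_step(params, X, lr):
    """One full-batch gradient-descent step on the mean-squared reconstruction error."""
    acts = forward(params, X)
    loss = float(np.mean((acts[-1] - X) ** 2))
    grad = 2.0 * (acts[-1] - X) / X.size  # dLoss/dOutput
    for i in reversed(range(len(params))):
        W, b = params[i]
        if i != len(params) - 1:
            grad = grad * (1.0 - acts[i + 1] ** 2)  # back through tanh
        grad_W = acts[i].T @ grad
        grad_b = grad.sum(axis=0)
        grad = grad @ W.T  # propagate with the pre-update weights
        params[i][0] = W - lr * grad_W
        params[i][1] = b - lr * grad_b
    return loss

def train(params, X, lr=0.1, epochs=500):
    """Run full-batch training and return the loss history."""
    return [train_step(params, X, lr) for _ in range(epochs)]

def encode(params, X, dims):
    """Latent representation = activation after the last encoder layer."""
    return forward(params, X)[len(dims) - 1]
```

For `dims = [8, 16, 2]` the network is 8 → 16 → 2 → 16 → 8, and `encode` returns two-dimensional latent coordinates that could be plotted in the same way as a UMAP projection.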

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript presents SCULPT, a web-based interactive machine learning platform for analyzing high-dimensional multi-particle coincidence data from COLTRIMS experiments. It incorporates UMAP for non-linear dimensionality reduction to reveal correlations, discusses extensions to deep autoencoders and genetic programming, and introduces a novel adaptive confidence scoring system that combines user-selected clustering quality metrics via predefined weights reflecting each metric's robustness. A subset of capabilities is applied to photo double ionization data for 3-body dissociation of D2O, revealing distinct fragmentation channels and correlations with physics parameters. The modular architecture aims to reduce analysis time and enable on-the-fly identification of rare non-linear correlations during experiments.

Significance. If validated, the platform could meaningfully advance interactive analysis of complex COLTRIMS datasets by facilitating rapid identification of rare events and correlations, with the web-based accessibility benefiting the atomic and molecular physics community. The integration of dimensionality reduction with physics-informed tools is a constructive direction. However, the absence of quantitative benchmarks or validation for the core novel component limits the assessed significance.

major comments (1)
  1. [Abstract] The central claim for the novel adaptive confidence scoring system states that it provides quantitative reliability assessments by evaluating user-selected clustering quality metrics with predefined weights that reflect each metric's robustness. No derivation of the weight values, sensitivity analysis, specific numerical weights, or benchmark against ground-truth fragmentation channels (e.g., known D2O dissociation pathways or simulated COLTRIMS events) is supplied. This omission bears directly on the assertion that the scores reliably isolate non-linear correlations and undermines the platform's claimed utility for live experimental steering.
minor comments (2)
  1. The description of the D2O application would be strengthened by reporting specific quantitative outputs, such as example confidence scores, clustering metrics, or direct comparisons to literature fragmentation channels.
  2. The manuscript would benefit from explicit statements on software availability, dependencies, and reproducibility (e.g., open-source repository or example configuration files) to support community adoption.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback on our manuscript describing the SCULPT platform. We address the major comment concerning the adaptive confidence scoring system below and outline the revisions we will make to strengthen the presentation of this component.

Point-by-point responses
  1. Referee: [Abstract] The central claim for the novel adaptive confidence scoring system states that it provides quantitative reliability assessments by evaluating user-selected clustering quality metrics with predefined weights that reflect each metric's robustness. No derivation of the weight values, sensitivity analysis, specific numerical weights, or benchmark against ground-truth fragmentation channels (e.g., known D2O dissociation pathways or simulated COLTRIMS events) is supplied. This omission bears directly on the assertion that the scores reliably isolate non-linear correlations and undermines the platform's claimed utility for live experimental steering.

    Authors: We acknowledge that the manuscript as submitted does not include a derivation of the weight values, a sensitivity analysis, explicit numerical weights, or direct benchmarks against ground-truth fragmentation channels. The adaptive confidence scoring system combines user-selected metrics with weights chosen to reflect relative robustness as established in the broader clustering validation literature, but this rationale and supporting details were not elaborated in the current text. In the revised manuscript we will add a dedicated subsection that specifies the numerical weights (e.g., Silhouette coefficient weighted at 0.45, Calinski-Harabasz index at 0.30, Davies-Bouldin index at 0.25), derives these values from comparative robustness studies, and presents a sensitivity analysis demonstrating stability of the composite score under weight perturbations. We will also include quantitative benchmarks that compare the resulting confidence scores to known D2O dissociation pathways reported in the literature and to simulated COLTRIMS events with injected ground-truth clusters. These additions will directly support the claim that the scores provide reliable isolation of non-linear correlations and strengthen the argument for utility in live experimental steering. revision: yes

Circularity Check

0 steps flagged

No significant circularity in software platform description

Full rationale

The manuscript is a description of the SCULPT interactive platform that integrates UMAP dimensionality reduction and an adaptive confidence scoring system using user-selected clustering metrics combined with predefined weights. No mathematical derivation chain, first-principles prediction, or fitted parameter is presented that reduces to its own inputs by construction. The weights are stated as predefined without derivation or validation details in the provided text, but this absence does not create circularity because there is no closed-loop claim or self-referential fitting of a target quantity. The work functions as a self-contained software tool contribution rather than a theoretical result whose validity depends on internal reduction to its assumptions.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The platform rests on standard machine-learning assumptions and user-configurable elements rather than new physical postulates; the main unverified inputs are the robustness weights and the effectiveness of UMAP on this data type.

free parameters (1)
  • predefined weights for clustering quality metrics
    Fixed weights used to combine user-selected metrics into an adaptive confidence score; their specific values are not derived from first principles in the abstract.
axioms (1)
  • domain assumption: UMAP dimensionality reduction preserves physically relevant correlations in high-dimensional COLTRIMS coincidence data
    Invoked as the core step for revealing patterns before clustering.
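This assumption can be probed rather than taken on faith. A standard measure of how well an embedding keeps local neighbourhoods is the trustworthiness of Venna and Kaski (also available as `sklearn.manifold.trustworthiness`). It checks neighbourhood preservation only, not physical relevance, so it is a necessary rather than sufficient test. A NumPy sketch:

```python
import numpy as np

def _sq_distances(X):
    """Squared Euclidean distances; squaring is monotone, so neighbour ranks are unchanged."""
    sq = (X ** 2).sum(axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

def trustworthiness(X_high, X_low, k=5):
    """Venna-Kaski trustworthiness: 1.0 means every k-nearest neighbour in the
    embedding was also among the k nearest neighbours in the original space."""
    n = len(X_high)
    D_high = _sq_distances(np.asarray(X_high, dtype=float))
    D_low = _sq_distances(np.asarray(X_low, dtype=float))
    np.fill_diagonal(D_high, np.inf)  # a point is not its own neighbour
    np.fill_diagonal(D_low, np.inf)
    order_high = np.argsort(D_high, axis=1)
    rank = np.empty_like(order_high)
    rank[np.arange(n)[:, None], order_high] = np.arange(1, n + 1)[None, :]  # 1 = nearest
    nn_low = np.argsort(D_low, axis=1)[:, :k]
    # Penalize embedding neighbours by how far outside the original k-NN they rank.
    penalty = sum(np.maximum(rank[i, nn_low[i]] - k, 0).sum() for i in range(n))
    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty
```

Applied to the selected features and their UMAP coordinates, values near 1 mean the 2D neighbours were also close in the original feature space, while a random embedding scores roughly 0.5.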

pith-pipeline@v0.9.0 · 5824 in / 1411 out tokens · 81208 ms · 2026-05-21T20:01:22.632901+00:00 · methodology

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 1 internal anchor

  1. [1]

    UMAP Implementation. SCULPT employs UMAP [25] for non-linear dimensionality reduction of high-dimensional COLTRIMS data. UMAP constructs a topological representation of the high-dimensional data by modeling each data point’s relationship to its nearest neighbors, then optimizes a low-dimensional embedding that preserves these local neighborhood structur...

  2. [2]

    Deep Autoencoder Architecture. The autoencoder module implements a symmetric neural network architecture using PyTorch for feature learning from high-dimensional COLTRIMS data. The network consists of an encoder that progressively compresses the input feature space through multiple hidden layers to a low-dimensional latent representation, and a symme...

  3. [3]

    The method has been successfully applied to experimental physics data, where dimensional...

    Genetic Programming for Feature Discovery. SCULPT implements genetic programming using symbolic regression to discover interpretable feature combinations [26, 27]. Genetic programming has proven effective for automated feature construction in scientific data analysis, particularly when the goal is to discover non-trivial mathematical relationships betwe...

  4. [4]

    Feature Selection Framework. The feature selection system organizes available variables into hierarchical categories based on their physical significance and computational origin. Features are automatically categorized as follows: Original Momentum Components: Raw momentum measurements (px, py, pz) for each detected particle, providing the fundamen...

  5. [5]

    Density-Based Filtering. Density-based filtering addresses the challenge of computational scalability and noise reduction by selectively retaining high-density regions in both UMAP projections and direct feature scatter plots. The implementation uses efficient grid-based density estimation that operates on any two-dimensional data representation: ρ(xi, y...

  6. [6]

    Physics Parameter Filtering. Physics parameter filtering enables domain-specific data selection based on physically meaningful quantities. The system supports filtering on commonly used physics parameters including kinetic energy release (KER), individual particle kinetic energies for ions and electrons, electron energy sums and energy sharing ratios, in...

  7. [7]

    2. UMAP Projection: The selected features undergo UMAP embedding to create the 2D visualization space

    Integrated Filtering Pipeline. The filtering system operates through an integrated pipeline where filters can be applied sequentially: 1. Feature Selection: Users first select relevant features for dimensional reduction, reducing computational complexity and focusing on scientifically relevant dimensions. 2. UMAP Projection: The selected features under...

  8. [8]

    UMAP Visualization. The initial UMAP analysis, conducted on 1% of the data in order to reduce computational time, revealed five distinct clusters, separated by white space in the 2D projection, as shown in Fig. 3. This UMAP analysis was run by selecting the features KER, EESum, Total Energy (KER + EESum), and the angle α12 between ion 1 and io...

  9. [9]

    3, 2.6% of events): D2O2+ (¹B₁) dissociating to D+ + D+ + O(¹D) • A singlet dication state that dissociates with a peak KER of ∼4.3 eV and a β of ∼149°

    Cluster Identification and Physics Interpretation. A detailed analysis of the data enabled the isolation of the following clusters stemming from the D2O2+ → D+ + D+ + O fragmentation channel: Cluster 1 (dark-blue in Fig. 3, 2.6% of events): D2O2+ (¹B₁) dissociating to D+ + D+ + O(¹D) • A singlet dication state that dissociates with a peak KER of ∼4.3 eV and...

  10. [10]

    Quality Metrics. SCULPT’s adaptive confidence scoring provided the following assessment for the initial UMAP analysis in Fig. 3. These values can vary slightly from run to run, especially as only a random selection of 1% of the data in each file is analyzed to reduce computational time. • Overall confidence: 0.71 (High reliability) • Silhouette score: 0.1324...

  11. [11]

    Cold target recoil ion momentum spectroscopy: A ‘momentum microscope’ to view atomic collision dynamics,

    R. Dörner, V. Mergel, O. Jagutzki, L. Spielberger, J. Ullrich, R. Moshammer, and H. Schmidt-Böcking, “Cold target recoil ion momentum spectroscopy: A ‘momentum microscope’ to view atomic collision dynamics,” Phys. Rep. 330, 95–192 (2000). https://doi.org/10.1016/S0370-1573(99)00109-X

  12. [12]

    Recoil-ion and electron momentum spectroscopy: Reaction-microscopes,

    J. Ullrich, R. Moshammer, A. Dorn, R. Dörner, L. Ph. H. Schmidt, and H. Schmidt-Böcking, “Recoil-ion and electron momentum spectroscopy: Reaction-microscopes,” Rep. Prog. Phys. 66, 1463 (2003). https://doi.org/10.1088/0034-4885/66/9/203

  13. [13]

    Photoelectron and ICD electron angular distributions from fixed-in-space neon dimers,

    T. Jahnke, A. Czasch, M. S. Schöffler, S. Schössler, A. Knapp, M. Käsz, J. Titze, C. Wimmer, K. Kreidi, R. E. Grisenti, A. Staudte, O. Jagutzki, U. Hergenhahn, H. Schmidt-Böcking, and R. Dörner, “Photoelectron and ICD electron angular distributions from fixed-in-space neon dimers,” J. Phys. B 40, 2597 (2007). https://doi.org/10.1088/0953-...

  14. [14]

    Ultrafast probing of core hole localization in N2,

    M. S. Schöffler, J. Titze, N. Petridis, T. Jahnke, K. Cole, L. Ph. H. Schmidt, A. Czasch, D. Akoury, O. Jagutzki, J. B. Williams, N. A. Cherepkov, S. K. Semenov, C. W. McCurdy, T. N. Rescigno, C. L. Cocke, T. Osipov, S. Lee, M. H. Prior, A. Belkacem, A. L. Landers, H. Schmidt-Böcking, T. Weber, and R. Dörner, “Ultrafast probing of core hole localiz...

  15. [15]

    Complete photo-fragmentation of the deuterium molecule,

    T. Weber, A. O. Czasch, O. Jagutzki, A. K. Müller, V. Mergel, A. Kheifets, E. Rotenberg, G. Meigs, M. H. Prior, S. Daveau, A. Landers, C. L. Cocke, T. Osipov, R. Díez Muiño, H. Schmidt-Böcking, and R. Dörner, “Complete photo-fragmentation of the deuterium molecule,” Nature 431, 437–440 (2004). https://doi.org/10.1038/nature02839

  16. [16]

    Absolute ion detection efficiencies of microchannel plates and funnel microchannel plates for multi-coincidence detection,

    K. Fehre, D. Trojanowskaja, J. Gatzke, M. Kunitski, F. Trinter, S. Zeller, L. Ph. H. Schmidt, J. Stohner, R. Berger, A. Czasch, O. Jagutzki, T. Jahnke, R. Dörner, and M. S. Schöffler, “Absolute ion detection efficiencies of microchannel plates and funnel microchannel plates for multi-coincidence detection,” Rev. Sci. Instrum. 89, 045112 (2018). https://...

  17. [17]

    Dynamics of interatomic coulombic decay in quantum dots,

    Ph. V. Demekhin, K. Gokhberg, G. Jabbari, S. Kopelke, A. I. Kuleff, and L. S. Cederbaum, “Dynamics of interatomic coulombic decay in quantum dots,” Phys. Rev. Lett. 107, 273002 (2011). https://doi.org/10.1103/PhysRevLett.107.273002

  18. [18]

    Partial photoionization cross sections and angular distributions for double excitation of helium up to the N = 13 threshold,

    A. Czasch, M. Schöffler, M. Hattass, S. Schössler, T. Jahnke, Th. Weber, A. Staudte, J. Titze, C. Wimmer, S. Kammer, M. Weckenbrock, S. Voss, R. E. Grisenti, O. Jagutzki, L. Ph. H. Schmidt, H. Schmidt-Böcking, R. Dörner, J. M. Rost, T. Schneider, C.-N. Liu, I. Bray, A. S. Kheifets, and K. Bartschat, “Partial photoionization cross sections and an...

  19. [19]

    Spatial Imaging of the H2+ Vibrational Wave Function at the Quantum Limit,

    L. Ph. H. Schmidt, C. Goihl, D. Metz, H. Schmidt-Böcking, R. Dörner, S. Yu. Ovchinnikov, J. H. Macek, and D. R. Schultz, “Spatial Imaging of the H2+ Vibrational Wave Function at the Quantum Limit,” Phys. Rev. Lett. 108, 073202 (2012). https://doi.org/10.1103/PhysRevLett.108.073202

  20. [20]

    Deep learning,

    Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521, 436–444 (2015). https://doi.org/10.1038/nature14539

  21. [21]

    Machine learning on neutron and x-ray scattering and spectroscopies,

    Z. Chen, N. Andrejevic, N. C. Drucker, T. Nguyen, R. P. Xian, T. Smidt, Y. Wang, R. Ernstorfer, D. A. Tennant, M. Chan, and M. Li, “Machine learning on neutron and x-ray scattering and spectroscopies,” Chem. Phys. Rev. 2, 031301 (2021). https://doi.org/10.1063/5.0049111

  22. [22]

    Deep learning for ultrafast X-ray scattering and imaging with intense X-ray FEL pulses,

    D. Khakhulin, F. Otte, and M. Wulff, “Deep learning for ultrafast X-ray scattering and imaging with intense X-ray FEL pulses,” Front. Adv. Opt. Technol. 2, 1546386 (2025). https://doi.org/10.3389/aot.2025.1546386

  23. [23]

    Novel applications of generative adversarial networks in the analysis of ultrafast electron diffraction images,

    H. Daoud, D. Sirohi, E. Mjeku, J. Feng, S. Oghbaey, and R. J. D. Miller, “Novel applications of generative adversarial networks in the analysis of ultrafast electron diffraction images,” J. Chem. Phys. 159, 044107 (2023). https://doi.org/10.1063/5.0156781

  24. [24]

    Visualization of small-angle X-ray scattering datasets and processing-structure mapping of isotactic polypropylene films by machine learning,

    P. Tomaszewski, B. Cao, T. Li, and W. Jo, “Visualization of small-angle X-ray scattering datasets and processing-structure mapping of isotactic polypropylene films by machine learning,” Mater. Today Commun. 35, 105775 (2023). https://doi.org/10.1016/j.mtcomm.2023.105775

  25. [25]

    Model reconstruction from small-angle X-ray scattering data using deep learning methods,

    H. He, C. Liu, and H. Liu, “Model reconstruction from small-angle X-ray scattering data using deep learning methods,” iScience 23, 100906 (2020). https://doi.org/10.1016/j.isci.2020.100906

  26. [26]

    A machine-learning-driven data labeling pipeline for scientific analysis in MLExchange,

    T. Chavez, Z. Zhao, E. A. Holman, G. Hao, A. Green, H. Krishnan, D. McReynolds, R. J. Pandolfi, E. J. Roberts, P. H. Zwart, H. Yanxon, N. Schwarz, S. Sankaranarayanan, S. V. Kalinin, A. Mehta, S. I. Campbell, and A. Hexemer, “A machine-learning-driven data labeling pipeline for scientific analysis in MLExchange,” J. Appl. Crystallogr. 58, 387–402 (2025)....

  27. [27]

    Peak learning of mass spectrometry imaging data using artificial neural networks,

    W. M. Abdelmoula, K. Balluff, S. Englert, J. Dijkstra, M. J. T. Reinders, A. Walch, L. A. McDonnell, and B. P. F. Lelieveldt, “Peak learning of mass spectrometry imaging data using artificial neural networks,” Nat. Commun. 12, 5544 (2021). https://doi.org/10.1038/s41467-021-25744-8

  28. [28]

    Enhancing mass spectrometry imaging accessibility using convolutional autoencoders for deriving hypoxia-associated peptides from tumors,

    J. Hu, A. Balluff, C. Foerster, H. M. Steinecker, M. Phelps, N. Y. R. Agar, J. S. Cahn, W. S. Eberhard, I. Lanekoff, J. Laskin, R. J. A. Goodwin, K. Chughtai, R. M. A. Heeren, B. Menze, M. J. T. Reinders, and B. Balluff, “Enhancing mass spectrometry imaging accessibility using convolutional autoencoders for deriving hypoxia-associated peptides from tu...

  29. [29]

    Dimensionality reduction for visualizing single-cell data using UMAP,

    E. Becht, L. McInnes, J. Healy, C.-A. Dutertre, I. W. H. Kwok, L. G. Ng, F. Ginhoux, and E. W. Newell, “Dimensionality reduction for visualizing single-cell data using UMAP,” Nat. Biotechnol. 37, 38–47 (2019). https://doi.org/10.1038/nbt.4314

  30. [30]

    Structure-preserving visualisation of high dimensional single-cell datasets,

    B. Szubert, J. E. Cole, C. Monaco, and I. Drozdov, “Structure-preserving visualisation of high dimensional single-cell datasets,” Sci. Rep. 9, 8914 (2019). https://doi.org/10.1038/s41598-019-45301-0

  31. [31]

    UMAP as a dimensionality reduction tool for molecular dynamics simulations of biomacromolecules: A comparison study,

    A. C. L. Mendes, T. Pais, and F. A. Carvalho, “UMAP as a dimensionality reduction tool for molecular dynamics simulations of biomacromolecules: A comparison study,” J. Phys. Chem. B 125, 5022–5034 (2021). https://doi.org/10.1021/acs.jpcb.1c02081

  32. [32]

    Machine and deep learning applications in particle physics,

    D. Bourilkov, “Machine and deep learning applications in particle physics,” Int. J. Mod. Phys. A 35, 2030009 (2020). https://doi.org/10.1142/S0217751X20300099

  33. [33]

    Deep learning and its application to LHC physics,

    D. Guest, K. Cranmer, and D. Whiteson, “Deep learning and its application to LHC physics,” Annu. Rev. Nucl. Part. Sci. 68, 161–181 (2018). https://doi.org/10.1146/annurev-nucl-101917-021019

  34. [34]

    Hierarchical clustering in particle physics through reinforcement learning,

    J. Brehmer, S. Ganguly, K. Cranmer, and A. Louppe, “Hierarchical clustering in particle physics through reinforcement learning,” Phys. Rev. D 103, 074021 (2021). https://doi.org/10.1103/PhysRevD.103.074021

  35. [35]

    UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

    L. McInnes, J. Healy, and J. Melville, “UMAP: Uniform manifold approximation and projection for dimension reduction,” arXiv:1802.03426 (2018). https://doi.org/10.48550/arXiv.1802.03426

  36. [36]

    DEAP: Evolutionary algorithms made easy,

    F.-A. Fortin, F.-M. De Rainville, M.-A. Gardner, M. Parizeau, and C. Gagné, “DEAP: Evolutionary algorithms made easy,” J. Mach. Learn. Res. 13, 2171–2175 (2012)

  37. [37]

    Consistent feature construction with constrained genetic programming for experimental physics,

    N. Cherrier, F. Frassati, and M. Schoenauer, “Consistent feature construction with constrained genetic programming for experimental physics,” in 2019 IEEE Congress on Evolutionary Computation (CEC), pp. 1650–1658 (2019). https://doi.org/10.1109/CEC.2019.8790035

  38. [38]

    Learning feature spaces for regression with genetic programming,

    S. M. LaValle, I. A. Sucan, and M. Moll, “Learning feature spaces for regression with genetic programming,” Genet. Program. Evolvable Mach. 21, 433–467 (2020). https://doi.org/10.1007/s10710-020-09383-4

  39. [39]

    Interpretable scientific discovery with symbolic regression: A review,

    P. Bartashevich and S. Mostaghim, “Interpretable scientific discovery with symbolic regression: A review,” Artif. Intell. Rev. 57, 2 (2024). https://doi.org/10.1007/s10462-023-10622-0

  40. [40]

    Collaborative data science,

    Plotly Technologies Inc., “Collaborative data science,” Montreal, QC, https://plot.ly (2015)

  41. [41]

    Dissociation dynamics of the water dication following one-photon double ionization. II. Experiment,

    D. Reedy, J. B. Williams, B. Gaire, A. Gatton, M. Weller, A. Menssen, T. Bauer, K. Henrichs, Ph. Burzynski, B. Berry, Z. L. Streeter, J. Sartor, I. Ben-Ithzak, T. Jahnke, R. Dörner, Th. Weber and A. L. Landers, “Dissociation dynamics of the water dication following one-photon double ionization. II. Experiment,” Phys. Rev. A 98, 053430 (2018). https://do...

  42. [42]

    Silhouettes: A graphical aid to the interpretation and validation of cluster analysis,

    P. J. Rousseeuw, “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis,” J. Comput. Appl. Math. 20, 53–65 (1987). https://doi.org/10.1016/0377-0427(87)90125-7

  43. [43]

    A new method for determining the type of distribution of plant individuals,

    B. Hopkins and J. G. Skellam, “A new method for determining the type of distribution of plant individuals,” Ann. Bot. 18, 213–227 (1954). https://doi.org/10.1093/oxfordjournals.aob.a083391

  44. [44]

    New index for clustering tendency and its application to chemical problems,

    R. G. Lawson and P. C. Jurs, “New index for clustering tendency and its application to chemical problems,” J. Chem. Inf. Comput. Sci. 30, 36–41 (1990). https://doi.org/10.1021/ci00065a010

  45. [45]

    Comparing partitions,

    L. Hubert and P. Arabie, “Comparing partitions,” J. Classif. 2, 193–218 (1985). https://doi.org/10.1007/BF01908075

  46. [46]

    A dendrite method for cluster analysis,

    T. Caliński and J. Harabasz, “A dendrite method for cluster analysis,” Commun. Stat. Theory Methods 3, 1–27 (1974). https://doi.org/10.1080/03610927408827101

  47. [47]

    A cluster separation measure,

    D. L. Davies and D. W. Bouldin, “A cluster separation measure,” IEEE Trans. Pattern Anal. Mach. Intell. 1, 224–227 (1979). https://doi.org/10.1109/TPAMI.1979.4766909

  48. [48]

    An extensive comparative study of cluster validity indices,

    O. Arbelaitz, I. Gurrutxaga, J. Muguerza, J. M. Pérez, and I. Perona, “An extensive comparative study of cluster validity indices,” Pattern Recognit. 46, 243–256 (2013). https://doi.org/10.1016/j.patcog.2012.07.021

  49. [49]

    A density-based algorithm for discovering clusters in large spatial databases with noise,

    M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), pp. 226–231 (1996)

  50. [50]

    Scientific discovery in the age of artificial intelligence,

    Z. Zhao, T. Chavez, E. A. Holman, D. Ushizima, H. Yanxon, Y. Liu, S. V. Kalinin, M. Head-Gordon, S. Sankaranarayanan, P. H. Zwart, and A. Hexemer, “Scientific discovery in the age of artificial intelligence,” arXiv:2509.03776 (2025). https://doi.org/10.48550/arXiv.2509.03776

  51. [51]

    Machine learning for ultrafast quantum state engineering in integrated photonics,

    M. Sennary, S. Masullo, F. Setzpfandt, and S. Gräfe, “Machine learning for ultrafast quantum state engineering in integrated photonics,” Light Sci. Appl. 14, 350 (2025). https://doi.org/10.1038/s41377-025-02055-x

  52. [52]

    Adverse event detection in drug development: Recommendations and obligations beyond phase 3,

    J. A. Berlin, J. Glasser, and S. S. Ellenberg, “Adverse event detection in drug development: Recommendations and obligations beyond phase 3,” Am. J. Public Health 98, 1366–1371 (2008). https://doi.org/10.2105/AJPH.2007.124537

  53. [53]

    A comparison of residual diagnosis tools for diagnosing regression models for count data,

    C. Feng, L. Li, and A. Sadeghpour, “A comparison of residual diagnosis tools for diagnosing regression models for count data,” BMC Med. Res. Methodol. 23, 243 (2023). https://doi.org/10.1186/s12874-023-02060-x

  54. [54]

    Computational approaches streamlining drug discovery,

    A. V. Sadybekov and V. Katritch, “Computational approaches streamlining drug discovery,” Nature 616, 673–685 (2023). https://doi.org/10.1038/s41586-023-05905-z

  55. [55]

    Evaluation of prediction models for COVID-19 pneumonia severity: a systematic review and meta-analysis,

    Y. Huang, J. Li, M. Li, and J. S. Lipsitz, “Evaluation of prediction models for COVID-19 pneumonia severity: a systematic review and meta-analysis,” BMC Med. Res. Methodol. 23, 268 (2023). https://doi.org/10.1186/s12874-023-02078-1

  56. [56]

    Enabling the discovery of recurring anomalies in aerospace problem reports using high-dimensional clustering techniques,

    A. N. Srivastava, “Enabling the discovery of recurring anomalies in aerospace problem reports using high-dimensional clustering techniques,” in IEEE Aerospace Conference (2006), pp. 1–11. https://doi.org/10.1109/AERO.2006.1656136

  57. [57]

    Anomaly detection in aviation data using extreme learning machines,

    V. M. Janakiraman and D. Nielsen, “Anomaly detection in aviation data using extreme learning machines,” in 2016 International Joint Conference on Neural Networks (IJCNN) (2016), pp. 1993–2000. https://doi.org/10.1109/IJCNN.2016.7727444

  58. [58]

    Inferring causation from time series in Earth system sciences,

    J. Runge, S. Bathiany, E. Bollt, G. Camps-Valls, D. Coumou, E. Deyle, C. Glymour, M. Kretschmer, M. D. Mahecha, J. Muñoz-Marí, E. H. van Nes, J. Peters, R. Quax, M. Reichstein, M. Scheffer, B. Schölkopf, P. Spirtes, G. Sugihara, J. Sun, K. Zhang, and J. Zscheischler, “Inferring causation from time series in Earth system sciences,” Sci. Adv. 5, ea...
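Entry [1] breaks off while describing UMAP's neighbourhood model. For reference, the construction in the UMAP paper by McInnes, Healy and Melville (listed above) is summarized below. Here N_k(i) is the set of k nearest neighbours of x_i, v_{j|i} is zero for j outside N_k(i), and a, b are fitted from the min_dist parameter. Which k, distance metric and min_dist SCULPT uses is not stated in this review.

```latex
\rho_i = \min\{\, d(x_i, x_j) : j \in N_k(i),\ d(x_i, x_j) > 0 \,\}, \qquad
\sum_{j \in N_k(i)} \exp\!\left(-\frac{\max\{0,\ d(x_i, x_j) - \rho_i\}}{\sigma_i}\right) = \log_2 k

v_{j \mid i} = \exp\!\left(-\frac{\max\{0,\ d(x_i, x_j) - \rho_i\}}{\sigma_i}\right), \qquad
v_{ij} = v_{j \mid i} + v_{i \mid j} - v_{j \mid i}\, v_{i \mid j}

w_{ij} = \left(1 + a \,\lVert y_i - y_j \rVert_2^{2b}\right)^{-1}

C = \sum_{i \neq j} \left[ v_{ij} \log\frac{v_{ij}}{w_{ij}} + (1 - v_{ij}) \log\frac{1 - v_{ij}}{1 - w_{ij}} \right]
```

The embedding coordinates y_i are found by minimizing the fuzzy cross-entropy C, which pulls together points that were neighbours in the original space and pushes apart those that were not.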
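Entry [3] describes genetic programming with symbolic regression, and the list cites DEAP. The following is neither DEAP nor SCULPT's configuration. It is a minimal from-scratch sketch of the technique, assuming expression trees over the input features built from +, - and *, subtree mutation, size-2 tournament selection and elitism, with mean squared error against a target column as the fitness.

```python
import numpy as np

OPS = {"add": np.add, "sub": np.subtract, "mul": np.multiply}

def random_tree(rng, n_features, depth):
    """Grow a random expression tree made of nested tuples."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.8:
            return ("x", int(rng.integers(n_features)))   # feature leaf
        return ("c", float(rng.uniform(-2.0, 2.0)))       # constant leaf
    op = list(OPS)[int(rng.integers(len(OPS)))]
    return (op, random_tree(rng, n_features, depth - 1), random_tree(rng, n_features, depth - 1))

def evaluate(tree, X):
    """Evaluate an expression tree on every row of X."""
    kind = tree[0]
    if kind == "x":
        return X[:, tree[1]]
    if kind == "c":
        return np.full(len(X), tree[1])
    return OPS[kind](evaluate(tree[1], X), evaluate(tree[2], X))

def mutate(tree, rng, n_features, depth=2):
    """Subtree mutation: walk down a random path and replace one node with a fresh subtree."""
    if tree[0] in ("x", "c") or rng.random() < 0.3:
        return random_tree(rng, n_features, depth)
    if rng.random() < 0.5:
        return (tree[0], mutate(tree[1], rng, n_features, depth), tree[2])
    return (tree[0], tree[1], mutate(tree[2], rng, n_features, depth))

def fitness(tree, X, y):
    """Mean squared error against the target; non-finite results count as infinitely bad."""
    with np.errstate(all="ignore"):
        err = float(np.mean((evaluate(tree, X) - y) ** 2))
    return err if np.isfinite(err) else float("inf")

def evolve(X, y, pop_size=200, generations=40, seed=0):
    """Return the best expression of the final generation and the best error per generation."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    pop = [random_tree(rng, n_features, 3) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = np.array([fitness(t, X, y) for t in pop])
        best = pop[int(np.argmin(scores))]
        history.append(float(scores.min()))
        new_pop = [best]  # elitism: the best expression always survives unchanged
        while len(new_pop) < pop_size:
            i, j = rng.integers(pop_size, size=2)
            winner = pop[i] if scores[i] < scores[j] else pop[j]  # size-2 tournament
            new_pop.append(mutate(winner, rng, n_features))
        pop = new_pop
    return best, history
```

Elitism makes the best error non-increasing across generations. In practice one would also penalize tree size, because unconstrained mutation lets expressions grow without bound.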
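Entry [5] is cut off before its density formula. A plausible grid-based version, assuming ρ(xi, yi) is the count of the 2D histogram cell containing point i, scaled to the densest cell, and that "high-density" means above a quantile threshold, is:

```python
import numpy as np

def grid_density(x, y, bins=50):
    """Per-point density: count of the 2D histogram cell containing the point,
    scaled so that the densest cell equals 1."""
    H, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    # Same binning rule as histogram2d: left-closed cells, last cell also closed on the right.
    ix = np.clip(np.searchsorted(x_edges, x, side="right") - 1, 0, bins - 1)
    iy = np.clip(np.searchsorted(y_edges, y, side="right") - 1, 0, bins - 1)
    return H[ix, iy] / H.max()

def density_filter(x, y, keep_fraction=0.5, bins=50):
    """Boolean mask keeping (at least) the keep_fraction of points in the densest cells."""
    rho = grid_density(x, y, bins)
    threshold = np.quantile(rho, 1.0 - keep_fraction)
    return rho >= threshold
```

Because the functions only see two coordinate arrays, the same mask works on a UMAP projection or on any pair of physics features, as entry [5] describes.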
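Entry [8] lists the inputs of the first UMAP run: KER, the electron energy sum EESum, their total, and the angle α12 between the two ions, computed on a random 1% of the events. A sketch of that preparation step follows, assuming α12 is the angle between the two ion momentum vectors and that features are standardized before embedding (the review states neither). The UMAP call uses the `umap-learn` package and is imported lazily so the helpers work without it.

```python
import numpy as np

def angle_deg(p1, p2):
    """Angle in degrees between matching rows of two (N, 3) momentum arrays."""
    cos = (p1 * p2).sum(axis=1) / (np.linalg.norm(p1, axis=1) * np.linalg.norm(p2, axis=1))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def build_feature_matrix(ker, ee_sum, p_ion1, p_ion2):
    """Columns: KER, EESum, total energy (KER + EESum), alpha12."""
    return np.column_stack([ker, ee_sum, ker + ee_sum, angle_deg(p_ion1, p_ion2)])

def random_subsample(n_events, fraction=0.01, seed=None):
    """Indices of a random subset of events, drawn without replacement."""
    rng = np.random.default_rng(seed)
    n_keep = max(1, int(round(fraction * n_events)))
    return rng.choice(n_events, size=n_keep, replace=False)

def embed_2d(features, n_neighbors=15, min_dist=0.1, seed=42):
    """2D UMAP embedding of standardized features (requires umap-learn)."""
    import umap  # lazy import: only needed when an embedding is actually computed
    scaled = (features - features.mean(axis=0)) / features.std(axis=0)
    reducer = umap.UMAP(n_components=2, n_neighbors=n_neighbors,
                        min_dist=min_dist, random_state=seed)
    return reducer.fit_transform(scaled)
```

`n_neighbors=15` and `min_dist=0.1` are umap-learn's defaults, used here only as placeholders; standardization matters because KER is in eV while α12 is in degrees.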
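Entry [9] characterizes Cluster 1 by its share of events (2.6%) and its peak KER (∼4.3 eV). Given a cluster label for each event, both numbers can be read off as in the sketch below; the histogram bin width is an assumed analysis choice, not a value from the paper.

```python
import numpy as np

def cluster_summary(labels, ker, cluster_id, bin_width=0.1):
    """Fraction of all events in cluster_id and the centre of its most populated KER bin."""
    labels = np.asarray(labels)
    in_cluster = labels == cluster_id
    fraction = float(in_cluster.mean())
    k = np.asarray(ker)[in_cluster]
    # Bins span the cluster's KER range and are at most roughly bin_width wide.
    n_bins = max(1, int(np.ceil((k.max() - k.min()) / bin_width)))
    counts, edges = np.histogram(k, bins=n_bins)
    i = int(np.argmax(counts))
    return fraction, float(0.5 * (edges[i] + edges[i + 1]))
```

The peak position depends on the bin width and on the statistics in the cluster, so it is worth checking that it is stable under a change of binning.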
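Entry [10] notes that the quality metrics vary from run to run because each run analyzes a fresh random 1% of the events. The size of that variation can be estimated by repeating the subsampling, as in this sketch (the number of repeats is an arbitrary choice):

```python
import numpy as np

def subsample_spread(values, statistic, fraction=0.01, n_repeats=20, seed=0):
    """Recompute `statistic` on n_repeats independent random subsamples of `values`
    (rows drawn without replacement) and return the mean and sample standard deviation."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    n_keep = max(2, int(round(fraction * len(values))))
    stats = np.array([
        statistic(values[rng.choice(len(values), n_keep, replace=False)])
        for _ in range(n_repeats)
    ])
    return float(stats.mean()), float(stats.std(ddof=1))
```

With `statistic` set to a function that clusters a subsample and returns its confidence score, the returned standard deviation would serve as an error bar on numbers such as the 0.71 overall confidence quoted in entry [10].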