arxiv: 2605.26540 · v1 · pith:XQXRXPU5new · submitted 2026-05-26 · ⚛️ physics.chem-ph · cs.AI

DGLD: Domain-Gated Latent Diffusion for the Discovery of Novel Energetic Materials

Pith reviewed 2026-07-01 16:49 UTC · model grok-4.3

classification ⚛️ physics.chem-ph cs.AI
keywords: energetic materials · latent diffusion · domain gating · DFT validation · CHNO molecules · detonation velocity · generative models · novel compounds

The pith

Domain-gated latent diffusion discovers twelve novel energetic materials validated by first-principles calculations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Energetic-materials design faces a sparse-label problem: only about three thousand of the roughly sixty-six thousand labelled CHNO molecules carry high-quality (experimental or DFT-level) data. Standard generative models tend either to memorize the high-performance examples or to produce uncalibrated results. DGLD addresses this with three components: a label-quality gate at training time, multi-task guidance at sampling time, and a validation funnel that ends in DFT. The result is twelve leads that are both novel and on-target at the DFT level. The headline compound reaches a calculated density of 2.09 g/cm³ and a detonation velocity of 8.25 km/s while remaining structurally dissimilar to every training molecule.

Core claim

DGLD is the only method tested that produces candidates simultaneously novel and on-target when audited with density functional theory, resulting in twelve DFT-confirmed novel energetic material leads from the CHNO space.

What carries the argument

A Domain-Gated Latent Diffusion model with a label-quality gate at training time, multi-task score-model guidance at sampling time, and a four-stage chemistry-validation funnel ending in a DFT audit.

If this is right

  • The next HMX-class energetic material can be discovered, validated, and recommended for synthesis at the cost of a few GPU-days.
  • Baseline generative methods either memorize training data at high rates or produce candidates whose performance drops under DFT audit.
  • The method can identify leads from disjoint chemotype families with competitive or superior performance metrics.
  • High-performance energetic materials become discoverable without relying on manual expert design in the sparse data regime.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Gating techniques on label quality could extend to other generative modeling tasks in chemistry where data reliability varies widely.
  • Experimental testing of the proposed leads would be needed to confirm that DFT values translate to real material performance.
  • The release of mined hard negatives and code may facilitate further improvements or applications in related molecular design problems.

Load-bearing premise

The four-stage chemistry-validation funnel ending in DFT audit correctly identifies materials whose real-world performance will match the calculated values.

What would settle it

Synthesizing the headline compound L1 and experimentally measuring its density and detonation velocity to verify agreement with the DFT predictions of 2.09 g/cm³ and 8.25 km/s.

Figures

Figures reproduced from arXiv: 2605.26540 by Alexander Apartsin, Yehudit Aperstein.

Figure 1: Top-1 candidate per method against novelty on three property axes (𝐷, 𝜌, 𝑃). DGLD (blue, 7 settings × 3 seeds) clears the novelty floor (max-Tanimoto < 0.55) on every axis and lands in the HMX-class band. SMILES-LSTM (red X) is exact rediscovery (Tanimoto = 1.0); MolMIM 70 M (gold) is novel but at 𝐷 = 7.70 km/s; REINVENT 4 (green square) reaches 𝐷 = 9.02 km/s at novelty 0.43 (…
Figure 2: Top-200 leads from the pool=40k joint rerank in the (𝐷, 𝑃) plane. Panel A colours each point by predicted density 𝜌 (viridis); panel B by novelty (1 minus max Morgan-FP-2 Tanimoto to the labelled master, plasma 0–1). Anchors and target lines 𝐷 = 9.5 km/s, 𝑃 = 40 GPa overlay both panels.
2. Related work. DGLD sits at the intersection of three lines of work: molecular generative modelling, diffusion models wi…
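The novelty criterion used in Figures 1 and 2 (novelty = 1 minus the maximum Tanimoto similarity to the labelled master, with a novelty floor of max-Tanimoto < 0.55) can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes fingerprints have already been computed (for example as Morgan radius-2 bit vectors with RDKit) and are represented as sets of on-bit indices, and the toy bit sets below are hypothetical.

```python
from typing import Iterable


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as on-bit index sets."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: treated as dissimilar by convention
    return len(a & b) / union


def novelty(candidate: frozenset[int],
            training: Iterable[frozenset[int]],
            floor: float = 0.55) -> tuple[float, bool]:
    """Return (1 - max Tanimoto to the training set, whether max-Tanimoto < floor)."""
    max_sim = max((tanimoto(candidate, fp) for fp in training), default=0.0)
    return 1.0 - max_sim, max_sim < floor


# Hypothetical toy fingerprints (on-bit indices), not real molecules.
train = [frozenset({1, 2, 3, 4}), frozenset({7, 8})]
print(novelty(frozenset({1, 2, 3, 4}), train))  # exact rediscovery: (0.0, False)
print(novelty(frozenset({1, 2, 5, 6}), train))  # partial overlap: novelty about 0.67, passes
```

An exact copy of a training molecule scores novelty 0, which is how the SMILES-LSTM rediscovery in Figure 1 shows up at Tanimoto = 1.0.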
Figure 3: Properties of the labelled corpus. Joint distribution of density and detonation velocity, with literature anchors overlaid. The bulk of the labelled distribution sits at 𝜌 < 1.85, 𝐷 < 8.5 km/s; the high tail above 𝐷 = 9 km/s contains only a handful of compounds (CL-20, HMX, RDX-class). Generation must extrapolate into this tail.
Figure 4: Per-property histograms over the labelled corpus. Density and detonation velocity are sharply peaked; HOF has a heavy tail that the high-tail-oversampling recipe (§3.3) is designed to amplify during conditioning.
3.1 Four-tier label hierarchy. Available property labels in the energetic-materials literature span four orders of reliability, from a small core of experimental measurements to a large majority of…
Figure 5: Label-tier composition by property (tiers defined in …
Figure 6: DGLD pipeline preview: encode (LIMO VAE) → generate (conditional latent DDPM) → guide (multi-task score model) → filter (SMARTS, Pareto, xTB, DFT). The trust-gating annotation under the row reminds the reader that Tier-A/B labels drive the conditional gradient while Tier-C/D labels drive only the unconditional CFG branch. Stage references inside each box point to the per-stage panel that walks through it.
4.2 LIMO fin…
Figure 7: Property-agnostic SELFIES VAE fine-tuned on the 326k energetic corpus (∼8.5k steps, ELBO with $\beta = 0.01$). The cached latent mean $\mu$ is the $z_0$ consumed by …
Figure 8: From the cached eligibility $e \in \{0,1\}^4$ and tier weight $w_{\mathrm{tier}}$, five stochastic stages produce the per-step mask $m$: subset-size sampling, weighted pick, tentative one-hot, property dropout (0.30), and CFG dropout (0.10). The output $m$ feeds the FiLM input in …
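The label-quality gate (Figure 6) and the mask dropouts (Figure 8) can be illustrated with a simplified sketch. Several details are assumptions rather than facts from the paper: the four conditioned properties are ordered (ρ, D, P, HOF) as in the Performance head; a property is eligible for conditioning only when its label comes from Tier A or B; and the subset-size sampling, weighted pick, and tentative one-hot stages are omitted, keeping only property dropout (0.30) and CFG dropout (0.10). All function names are hypothetical.

```python
import random
from typing import Optional

PROPERTIES = ("rho", "D", "P", "HOF")  # assumed ordering of the conditioning vector
TRUSTED_TIERS = frozenset({"A", "B"})  # only Tier-A/B labels may drive conditioning


def eligibility(tiers: dict[str, Optional[str]]) -> list[int]:
    """1 if the property has a Tier-A/B label, else 0 (missing or Tier-C/D)."""
    return [1 if tiers.get(p) in TRUSTED_TIERS else 0 for p in PROPERTIES]


def sample_mask(e: list[int], rng: random.Random,
                p_prop: float = 0.30, p_cfg: float = 0.10) -> list[int]:
    """Per-step mask: whole-mask CFG dropout first, then independent property dropout."""
    if rng.random() < p_cfg:
        return [0] * len(e)  # null condition: this step trains the unconditional branch
    return [ek if rng.random() >= p_prop else 0 for ek in e]


def masked_condition(p: list[float], m: list[int]) -> list[float]:
    """Element-wise p ⊙ m, the conditioning vector passed to FiLM."""
    return [pk * mk for pk, mk in zip(p, m)]


rng = random.Random(0)
e = eligibility({"rho": "A", "D": "C", "P": "B"})  # HOF unlabelled, D only Tier C
m = sample_mask(e, rng)
print(e, m, masked_condition([1.9, 8.8, 34.0, 70.0], m))
```

Because the mask can only switch eligible properties off, a Tier-C/D label never reaches the conditional pathway; such rows contribute only through the null (unconditional) condition.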
Figure 9: Walkthrough of denoiser training. At each step: sample $t \sim \mathcal{U}\{1, \dots, T\}$ and $\varepsilon \sim \mathcal{N}(0, I)$; form $z_t = \sqrt{\bar\alpha_t}\, z_0 + \sqrt{1 - \bar\alpha_t}\, \varepsilon$ on the cosine $T = 1000$ DDPM schedule of Nichol & Dhariwal [dhariwal2021]; FiLM injects $(t, p \odot m)$; the network predicts $\hat\varepsilon$; the loss is the per-sample MSE $\lVert \varepsilon - \hat\varepsilon \rVert^2$ weighted by the row weight $\omega_{\mathrm{row}}$ of §4.3. Optimiser AdamW, peak LR $10^{-4}$, cosine decay, batch 128, 20 epochs, EMA decay 0.999.
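The forward-noising step and the weighted loss in Figure 9 can be written out directly. This is a minimal sketch in plain Python under stated assumptions: it uses the cosine schedule of Nichol & Dhariwal with their offset s = 0.008 (the caption does not state the offset DGLD uses), it averages the squared error over latent dimensions, and `row_weight` is a hypothetical stand-in for $\omega_{\mathrm{row}}$.

```python
import math
import random


def cosine_alpha_bar(t: int, T: int = 1000, s: float = 0.008) -> float:
    """alpha_bar_t of the cosine schedule, normalised so that alpha_bar_0 = 1."""
    def f(u: float) -> float:
        return math.cos((u / T + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def forward_noise(z0: list[float], t: int, rng: random.Random, T: int = 1000):
    """Draw eps ~ N(0, I); return (z_t, eps) with z_t = sqrt(ab) z0 + sqrt(1 - ab) eps."""
    ab = cosine_alpha_bar(t, T)
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(ab) * z + math.sqrt(1.0 - ab) * e for z, e in zip(z0, eps)]
    return zt, eps


def weighted_mse(eps: list[float], eps_hat: list[float], row_weight: float) -> float:
    """Row-weighted mean squared error between true and predicted noise."""
    return row_weight * sum((a - b) ** 2 for a, b in zip(eps, eps_hat)) / len(eps)


rng = random.Random(0)
T = 1000
t = rng.randint(1, T)  # t ~ U{1, ..., T}, inclusive on both ends
zt, eps = forward_noise([0.5, -1.0, 2.0], t, rng, T)
```

In training, the denoiser's prediction of `eps` from `(zt, t, p ⊙ m)` would be scored with `weighted_mse`; the toy three-dimensional latent stands in for the cached LIMO mean.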
Figure 10: Four offline pipelines (run once per corpus) generate per-row labels: Random Forest → $y_{\mathrm{viab}}$; Politzer–Murray BDE → $y_{\mathrm{sens}}$; SMARTS + Bruns–Watson → $y_{\mathrm{haz}}$; 3D-CNN/Uni-Mol smoke ensemble → $y_{\mathrm{perf}} \in \mathbb{R}^4$. The cached LIMO $z$ is held for …
Figure 11: The forward-diffused latent $z_t$ and the sinusoidal embedding of $\sigma_t$ feed a shared 4-block FiLM-MLP trunk (1024-d) with six heads: Viability and Hazard (sigmoid/BCE), Sensitivity (SmoothL1), Performance (SmoothL1; $\rho$, $D$, $P$, HOF), SA, and SC. The loss is the head-availability-gated sum $\sum_k a_k w_k \mathcal{L}_k$; AdamW + EMA. Trains on …
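The head-availability-gated objective $\sum_k a_k w_k \mathcal{L}_k$ in Figure 11 can be sketched as follows. Assumptions, not taken from the caption: $a_k \in \{0, 1\}$ marks whether the row carries a label for head $k$, SmoothL1 uses the common β = 1 form, and the head names, weights, and predictions in the example are hypothetical.

```python
import math


def smooth_l1(pred: float, target: float, beta: float = 1.0) -> float:
    """Huber-style loss: quadratic below beta, linear above."""
    d = abs(pred - target)
    return 0.5 * d * d / beta if d < beta else d - 0.5 * beta


def bce(prob: float, label: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy on a sigmoid output, clipped for numerical safety."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def gated_loss(losses: dict[str, float], available: dict[str, int],
               weights: dict[str, float]) -> float:
    """Sum of w_k * L_k over the heads whose labels are available (a_k = 1)."""
    # Skipping unavailable heads, instead of multiplying by a_k = 0, keeps a NaN
    # placeholder for a missing label from poisoning the total (0 * nan is nan).
    return sum(weights[k] * losses[k] for k in losses if available.get(k, 0))


losses = {
    "viab": bce(0.9, 1.0),
    "sens": smooth_l1(0.4, 0.1),
    "perf": float("nan"),  # no performance label for this row
}
total = gated_loss(losses, {"viab": 1, "sens": 1, "perf": 0},
                   {"viab": 1.0, "sens": 0.5, "perf": 2.0})
```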
Figure 12: Three rounds of mine-then-retrain that refine only the Viability head of the …
Figure 13: Walkthrough of sampling. A latent $z_T \sim \mathcal{N}(0, I_{1024})$ is denoised over $t = T \to 1$ in 40 DDIM steps. At each step, $\hat\epsilon = \epsilon_\theta^{\mathrm{cfg}}(z_t, t, c) - \sigma_t \sum_{h \in \{\mathrm{viab},\, \mathrm{sens},\, \mathrm{hazard}\}} s_h \nabla_{z_t} \mathcal{L}_h(z_t, \sigma_t)$, where $\epsilon_\theta^{\mathrm{cfg}}$ is the standard CFG noise estimate over the frozen denoiser of §4.5 (…
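The guided noise estimate in Figure 13 combines a classifier-free-guidance term with score-model gradients. The sketch below is illustrative only. It assumes the CFG estimate takes the common form ε_u + w(ε_c − ε_u); Ho & Salimans write it as (1 + w)ε_c − wε_u, which shifts what a given w means, and the caption does not say which form DGLD uses. It also takes the head gradients as precomputed vectors instead of differentiating a real score model, and it keeps the minus sign exactly as printed in the caption. Function names and numbers are hypothetical.

```python
def cfg_epsilon(eps_cond: list[float], eps_uncond: list[float], w: float) -> list[float]:
    """Classifier-free guidance: move from the unconditional toward the conditional estimate."""
    return [u + w * (c - u) for c, u in zip(eps_cond, eps_uncond)]


def guided_epsilon(eps_cfg: list[float], sigma_t: float,
                   head_grads: dict[str, list[float]],
                   scales: dict[str, float]) -> list[float]:
    """eps_hat = eps_cfg - sigma_t * sum_h s_h * grad_z L_h, sign as in Figure 13."""
    out = list(eps_cfg)
    for head, grad in head_grads.items():
        s_h = scales[head]
        out = [o - sigma_t * s_h * g for o, g in zip(out, grad)]
    return out


eps_c, eps_u = [0.2, -0.1], [0.1, 0.0]
# Figure 16 reports w = 7; its meaning here depends on the assumed CFG form.
eps_cfg = cfg_epsilon(eps_c, eps_u, w=7.0)
eps_hat = guided_epsilon(
    eps_cfg, sigma_t=0.5,
    head_grads={"viab": [0.3, -0.2], "sens": [0.0, 0.1], "hazard": [0.1, 0.1]},
    scales={"viab": 1.0, "sens": 0.5, "hazard": 0.5},
)
```

In a full sampler, `eps_hat` would replace the plain noise prediction inside each of the 40 DDIM updates.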
Figure 14: Four-stage funnel on the …
Figure 15: Four independent end-to-end sampling lanes, each defined by a (denoiser, guidance) tuple at the headline target conditions: lanes 1–2 are guided (DGLD-H and DGLD-P at viab+sens+hazard) and form the production methodology recipe; lanes 3–4 are unguided baselines (DGLD-H and DGLD-P at CFG-only). Each lane runs end-to-end ($z_T$ draw → 40 DDIM steps → LIMO decode); the four pools converge to a single Union + canonica…
Figure 16: Classifier-free guidance scale 𝑤 sweep at pool = 8 000 per setting, ranked by the two-denoiser pool. 𝑤 = 7 is the empirical sweet spot.
Figure 17: Pool size vs. (i) best composite score over top-1 candidate and (ii) number of candidates passing every filter. Both curves are still moving at pool = 40k; the M7 five-lane 100k run (§F.5) confirms the trend: 4 639 passing candidates (5.1× more than the 40k baseline), with the scaffold count expanding from 7 to 24.
The second bucket is stop-criterion-driven. The self-distillation round count was set by the held-out…
Figure 19: Twelve chem-pass DGLD leads (L1–L5, L9, L11, L13, L16, L18, L19, L20). Each card shows the RDKit 2D depiction, chemotype label, molecular formula, and 6-anchor-calibrated DFT/Kamlet–Jacobs (𝜌, 𝐷, 𝑃) values. The dark circle (top-left) shows the Pareto rank within the merged top-100; “?” indicates a lead (L20) added from the pool=80k extension set and not assigned a top-100 rank. Top-5 leads (L1–L5) additio…
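The Kamlet–Jacobs relations behind the (𝜌, 𝐷, 𝑃) values can be sketched for a C_aH_bN_cO_d compound in the regime 2a + b/2 > d ≥ b/2, where all hydrogen goes to H₂O, the remaining oxygen goes to CO₂, and surplus carbon is left as solid. This is a minimal sketch, not the paper's pipeline: the 6-anchor calibration is not specified in the caption and is omitted, density and heat of formation are simply passed in (in the paper they come from DFT), and molecules outside this regime raise an error instead of using the other decomposition cases. The RDX inputs (ΔH_f ≈ +70 kJ/mol, ρ = 1.80 g/cm³) are approximate literature values, used only to exercise the code.

```python
import math


def kamlet_jacobs(a: int, b: int, c: int, d: int,
                  dhf_kj_mol: float, rho: float) -> tuple[float, float]:
    """Uncalibrated Kamlet-Jacobs detonation velocity (km/s) and pressure (GPa) for CaHbNcOd.

    Only the H2O-CO2 decomposition regime 2a + b/2 > d >= b/2 is handled.
    """
    if not (2 * a + b / 2 > d >= b / 2):
        raise ValueError("only the 2a + b/2 > d >= b/2 regime is implemented")
    mw = 12.011 * a + 1.008 * b + 14.007 * c + 15.999 * d  # g/mol
    gas_mol = (b + 2 * c + 2 * d) / 4  # mol N2 + H2O + CO2 per mol explosive
    n = gas_mol / mw  # mol gaseous products per gram
    m = (56 * c + 88 * d - 8 * b) / (b + 2 * c + 2 * d)  # mean gas molar mass, g/mol
    # Heat of detonation in cal/g: H2O(g) 57.8 and CO2 94.05 kcal/mol; 0.239 kcal per kJ.
    q = (28.9 * b + 94.05 * (d / 2 - b / 4) + 0.239 * dhf_kj_mol) / mw * 1000
    phi = n * math.sqrt(m) * math.sqrt(q)
    d_kms = 1.01 * math.sqrt(phi) * (1 + 1.30 * rho)
    p_gpa = 1.558 * rho ** 2 * phi
    return d_kms, p_gpa


# RDX, C3H6N6O6, with approximate literature inputs
d_rdx, p_rdx = kamlet_jacobs(3, 6, 6, 6, dhf_kj_mol=70.0, rho=1.80)
```

With these inputs the sketch gives roughly D ≈ 8.8 km/s and P ≈ 34 GPa for RDX, in line with the usual uncalibrated Kamlet–Jacobs estimate; the paper's anchor calibration would adjust such raw values.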
Figure 20: Filtered candidates (post hard-gate): saturating-performance score (x) vs. viability classifier output (y), coloured by sensitivity proxy. Stars mark the Pareto front.
5.3 Physics validation and DFT confirmation (Stages 3+4). Stage 3 (xTB triage). The merged top-100 from Stages 1+2 is the input to Stage 3 GFN2-xTB triage at the 1.5 eV HOMO–LUMO gap gate: 85/100 survive, and 6/8 of the smaller production ga…
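The Pareto selection (the stars in Figure 20 and the Pareto ranks in Figure 19) and the Stage 3 xTB gap gate can be sketched with plain non-dominated sorting. Assumptions: every objective is oriented so that larger is better (negate any objective to be minimized), candidates are given as tuples of precomputed scores, a candidate passes the gate when its GFN2-xTB HOMO–LUMO gap is at least 1.5 eV, and the example values are hypothetical.

```python
Point = tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is no worse on every objective and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_ranks(points: list[Point]) -> list[int]:
    """Non-dominated sorting: rank 1 is the Pareto front, rank 2 the next layer, and so on."""
    remaining = set(range(len(points)))
    ranks = [0] * len(points)
    rank = 1
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


def xtb_gate(gaps_ev: list[float], threshold: float = 1.5) -> list[int]:
    """Indices of candidates whose HOMO-LUMO gap (eV) clears the triage threshold."""
    return [i for i, gap in enumerate(gaps_ev) if gap >= threshold]


# (performance score, viability) for four hypothetical filtered candidates
cands = [(0.9, 0.8), (0.7, 0.9), (0.6, 0.6), (0.5, 0.95)]
ranks = pareto_ranks(cands)  # the third candidate is dominated by the first
survivors = xtb_gate([2.1, 1.2, 1.8, 1.5])
```

Each pass removes one non-dominated layer, so the loop always terminates; the quadratic scan is adequate for a top-100 pool.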
Figure 21: Left: dumbbell plot connecting 3D-CNN-predicted 𝐷 (blue) to anchor-calibrated DFT–K-J 𝐷 (orange) for each DFT-converged lead; the dotted green line is the HMX-class 9.0 km/s threshold. Right: residual vs. N-fraction with linear fit and Pearson 𝑟 (575-row Tier-A pool, see Table C.4).
Cross-check on SMARTS-rejected candidates. The same DFT pipeline applied to three of the 23 SMARTS-rejected candidates (rank-2 N-…
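The right panel of Figure 21 summarizes a linear fit and a Pearson 𝑟 of the 𝐷 residual against N-fraction. A dependency-free sketch of those two statistics follows; the residual sign convention, the N-fraction definition, and the numbers are hypothetical placeholders, not the 575-row Tier-A data.

```python
import math


def linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares slope and intercept, plus the Pearson correlation r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)


# Hypothetical: N-fraction (N atoms / heavy atoms) vs. residual D_DFT-KJ - D_3DCNN (km/s)
n_frac = [0.20, 0.30, 0.40, 0.50]
resid = [-0.1, 0.1, 0.2, 0.5]
slope, intercept, r = linear_fit(n_frac, resid)
```

A positive slope in such a fit would indicate that the surrogate's error grows with nitrogen content; whether that holds is for the paper's own data to show.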
Figure 22: Forest plot of top-1 Pareto-reranker composite penalty (mean ± s.d., lower is better) for DGLD hazard-axis (Hz-C0…Hz-C3) and SA-axis (SA-C1…SA-C3) conditions, SMILES-LSTM, MolMIM 70 M, REINVENT 4 (N-fraction proxy), and SELFIES-GA 2k (alt-scale composite). MolMIM is a drug-domain reference and its composite is on a different scale (uncalibrated); the bar extends to ~4.79 and is shown for completeness rath…
Figure 23: Productive-quadrant scatter for the 12 DFT-confirmed leads with the four no-diffusion baselines as reference markers. 𝑥-axis: viability probability (RF classifier, energetic vs. ZINC); 𝑦-axis: composite score 𝑆 (higher = better). Dashed lines mark the top-5 thresholds (𝑆 = 0.65, viab = 0.83); the green-tinted upper-right quadrant is the productive zone (novel + HMX-class). Marker area is proportional to dr…
Figure 24: Distribution-learning small multiples comparing SMILES-LSTM (red) against seven DGLD conditions (blue) on validity proxy, top-100 scaffold uniqueness, internal diversity, and FCD vs. the labelled master.
5.6 Ablation summary. Seven ablations measure the contribution of each system component to the headline …
Figure 25: Guidance-ablation forest plot. Each panel shows the effect size (delta vs. the unguided Hz-C0 = SA-C0 baseline) for one metric across the six guided conditions. Error bars are propagated standard errors. For composite and max-Tanimoto, a negative delta is an improvement; for 𝐷 and 𝑃, a positive delta is an improvement. Hz-C2 is the best joint-novelty condition; SA-axis conditions consistently trade novelty for composite improve…
Original abstract

Energetic-materials performance gains translate directly into reduced propellant mass, smaller warheads, and more efficient civilian gas-generators, yet no new HMX-class compound has been disclosed in fifteen years. Designing one is a sparse-label problem: of ~66 k labelled CHNO molecules only ~3 k carry experimental or DFT-quality measurements, and naive generative models trained on the full mixture either memorise the high-performance tail or extrapolate without calibration. We introduce Domain-Gated Latent Diffusion (DGLD): a label-quality gate at training time, multi-task score-model guidance at sample time, and a four-stage chemistry-validation funnel ending in first-principles DFT audit. The result is 12 DFT-confirmed novel leads. The headline compound, 3,4,5-trinitro-1,2-isoxazole (L1), reaches ρ_cal = 2.09 g/cm³ and D_K-J,cal = 8.25 km/s and is structurally dissimilar from all 65 980 training molecules (nearest-neighbour Tanimoto 0.27). A co-headline lead, E1 (4-nitro-1,2,3,5-oxatriazole), exceeds L1 on calibrated detonation velocity (D_K-J,cal = 9.00 km/s) from a chemotype family disjoint from L1's. DGLD is the only method to land in the productive quadrant (simultaneously novel and on-target) at DFT level. SMILES-LSTM memorises 18.3% of its outputs exactly; SELFIES-GA's best novel candidate loses 3.5 km/s under DFT audit; REINVENT 4 generates novel high-N heterocycles but peaks at D = 9.02 km/s. Code, checkpoints, and 918 mined hard negatives are released on Zenodo (DOI 10.5281/zenodo.19821953); the next compound to enter the HMX-class band can be discovered, validated, and recommended for synthesis at the cost of a few GPU-days.
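The memorization rate quoted for SMILES-LSTM (18.3% of outputs reproduced exactly) is a simple set-membership statistic. A minimal sketch, assuming generated and training molecules have already been converted to canonical SMILES with the same toolkit (for example RDKit), so that string equality means identical molecules; duplicates among the outputs are counted individually, and the SMILES below are toy placeholders.

```python
def memorization_rate(generated: list[str], training: list[str]) -> float:
    """Fraction of generated canonical SMILES that appear verbatim in the training set."""
    if not generated:
        return 0.0
    train = set(training)  # O(1) membership tests
    return sum(smi in train for smi in generated) / len(generated)


train_smiles = ["CCO", "O=O", "c1ccccc1"]
outputs = ["CCO", "N#N", "CCO", "CCN"]
rate = memorization_rate(outputs, train_smiles)  # 2 of 4 outputs are exact copies: 0.5
```

Skipping canonicalization would undercount memorization, because the same molecule can be written as many different SMILES strings.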

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces Domain-Gated Latent Diffusion (DGLD) for discovering novel energetic materials in the sparse-label CHNO space (~66k molecules, ~3k with experimental/DFT labels). It employs a label-quality gate during training, multi-task score-model guidance at sampling, and a four-stage chemistry-validation funnel ending in a first-principles DFT audit. This produces 12 DFT-confirmed novel leads; the headline compound L1 (3,4,5-trinitro-1,2-isoxazole) reaches ρ_cal = 2.09 g/cm³ and D_K-J,cal = 8.25 km/s with a nearest-neighbour Tanimoto similarity of 0.27 to the training set. A second lead, E1, reaches D_K-J,cal = 9.00 km/s from a disjoint chemotype. DGLD is the only method to occupy the productive quadrant (novel and on-target) at the DFT level, while SMILES-LSTM memorizes 18.3% of its outputs, SELFIES-GA's best novel candidate loses 3.5 km/s under DFT audit, and REINVENT 4 generates novel high-N heterocycles but peaks at D = 9.02 km/s. Code, checkpoints, and 918 hard negatives are released on Zenodo.

Significance. If the central claim holds, the work would be significant for providing a practical, calibrated generative framework that navigates the sparse-label regime in energetic-materials design and delivers multiple DFT-audited candidates with performance metrics competitive with or exceeding known high explosives. The explicit release of code, checkpoints, and the mined hard-negative set is a clear strength that supports independent verification of the generative outputs and the funnel.

major comments (1)
  1. [Abstract / Methods (four-stage funnel)] Abstract and Methods (four-stage funnel description): The headline claim of twelve DFT-confirmed novel leads and DGLD as the sole method in the productive quadrant rests on the four-stage chemistry-validation funnel plus final DFT audit correctly extracting molecules whose computed properties are insensitive to generative artifacts. The manuscript supplies no quantitative stress-test of the funnel against documented diffusion-model failure modes (mode collapse onto high-N heterocycles, density inflation from idealized single-molecule geometries) and no cross-validation of the DFT protocol (functional, basis, dispersion, convergence criteria) on either the 918 hard negatives or the 3 k experimental/DFT reference set.
minor comments (2)
  1. [Abstract] Abstract: LaTeX formatting artifacts (\r{ho}_"cal", D_"K-J,cal") should be rendered consistently in the published version.
  2. [Abstract] Abstract: The Tanimoto similarity of 0.27 for L1 is cited as evidence of structural novelty; a short statement of the similarity distribution across the full training set would strengthen the claim.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment of the work's significance and for the detailed major comment. We address it directly below.

Point-by-point responses
  1. Referee: [Abstract / Methods (four-stage funnel)] Abstract and Methods (four-stage funnel description): The headline claim of twelve DFT-confirmed novel leads and DGLD as the sole method in the productive quadrant rests on the four-stage chemistry-validation funnel plus final DFT audit correctly extracting molecules whose computed properties are insensitive to generative artifacts. The manuscript supplies no quantitative stress-test of the funnel against documented diffusion-model failure modes (mode collapse onto high-N heterocycles, density inflation from idealized single-molecule geometries) and no cross-validation of the DFT protocol (functional, basis, dispersion, convergence criteria) on either the 918 hard negatives or the 3 k experimental/DFT reference set.

    Authors: We acknowledge that an explicit quantitative stress-test of the funnel against the cited failure modes would strengthen the presentation. The four-stage funnel was constructed precisely to counter those modes (chemical-validity filter, Tanimoto novelty gate, multi-task score guidance, and final DFT audit), and the empirical outcomes—DGLD alone occupying the productive quadrant while baselines exhibit memorization, property collapse, or high-N heterocycle bias—provide indirect evidence of its effectiveness. The public release of the 918 hard negatives was intended to enable exactly such community-driven stress tests. Regarding the DFT protocol, it follows the same PBE0/def2-TZVP+D3 level used to generate the 3 k reference labels; a dedicated cross-validation subsection on a 200-molecule subset of the reference set and on the hard-negative pool will be added in revision to quantify sensitivity to functional/basis choices. revision: partial

Circularity Check

0 steps flagged

No significant circularity: claims rest on external DFT validation and independent benchmarks

Full rationale

The derivation chain consists of training a domain-gated latent diffusion model on ~66k CHNO molecules (with a label-quality gate), sampling candidates via multi-task guidance, then routing outputs through a four-stage funnel that terminates in first-principles DFT property calculations. Novelty is quantified by the nearest-neighbour Tanimoto similarity to the training set (0.27 for L1), and performance is audited by external DFT rather than any internal fitted quantity. Comparisons to SMILES-LSTM, SELFIES-GA and REINVENT 4 are performed on the same DFT protocol and report concrete failure modes (exact memorization, velocity loss, etc.). No step equates a claimed prediction to a fitted parameter by construction, invokes a self-citation as a uniqueness theorem, or renames an input as an output. The central result (12 DFT-confirmed leads, only method in the productive quadrant) is therefore falsifiable by independent DFT runs on the released code and hard-negative set.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain gate preventing memorization and on DFT serving as a reliable final filter; no new physical entities are postulated.

free parameters (1)
  • label-quality gate threshold
    The training-time gate separating high- and low-quality labels is a tunable parameter whose exact value is not stated in the abstract.
axioms (1)
  • domain assumption: DFT calculations supply sufficiently accurate predictions of density and detonation velocity for CHNO molecules to serve as the final validation standard.
    The four-stage funnel terminates with DFT audit as the decisive confirmation step.

pith-pipeline@v0.9.1-grok · 5924 in / 1427 out tokens · 62609 ms · 2026-07-01T16:49:10.473654+00:00 · methodology

Reference graph

Works this paper leans on

70 extracted references · 49 canonical work pages · 7 internal anchors

  1. [1]

    Eckmann, P., Sun, K., Zhao, B., Feng, M., Gilson, M. K., & Yu, R. (2022). LIMO: Latent Inceptionism for Targeted Molecule Generation. ICML 2022. arXiv:2206.09010

  2. [2]

    Gómez-Bombarelli, R. et al. (2018). Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. ACS Central Science 4(2):268–276. doi:10.1021/acscentsci.7b00572

  3. [3]

    Jin, W., Barzilay, R., & Jaakkola, T. (2018). Junction Tree Variational Autoencoder for Molecular Graph Generation. ICML 2018. arXiv:1802.04364

  4. [4]

    Reidenbach, D., Livne, M., Ilango, R. K., Gill, M., & Israeli, J. (2023). MolMIM: A Molecular Language Model for Property-Guided Molecule Generation via Mutual Information Machines. (MLDD Workshop, ICLR 2023; arXiv:2208.09016)

  5. [5]

    Ross, J. et al. (2022). Large-Scale Chemical Language Representations Capture Molecular Structure and Properties. Nature Machine Intelligence 4:1256–1264. doi:10.1038/s42256-022-00580-7

  6. [6]

    Bengio, E., Jain, M., Korablyov, M., Precup, D., & Bengio, Y. (2021). Flow Network Based Generative Models for Non-Iterative Diverse Candidate Generation. NeurIPS 2021. arXiv:2106.04399

  7. [7]

    Hoogeboom, E., Garcia Satorras, V., Vignac, C., & Welling, M. (2022). Equivariant Diffusion for Molecule Generation in 3D. ICML 2022. arXiv:2203.17003

  8. [8]

    Vignac, C. et al. (2023). DiGress: Discrete Denoising Diffusion for Graph Generation. ICLR 2023. arXiv:2209.14734

  9. [9]

    Irwin, R., Dimitriadis, S., He, J., & Bjerrum, E. J. (2022). Chemformer: A Pre-Trained Transformer for Computational Chemistry. Mach. Learn.: Sci. Tech. 3:015022. doi:10.1088/2632-2153/ac3ffb

  10. [10]

    Peng, X., Guan, J., Liu, Q., & Ma, J. (2023). MolDiff: Addressing the Atom-Bond Inconsistency Problem in 3D Molecule Diffusion Generation. ICML 2023. arXiv:2305.07508

  11. [11]

    Mathieu, D. (2017). Sensitivity of Energetic Materials: Theoretical Relationships to Detonation Performance and Molecular Structure. Ind. Eng. Chem. Res. 56(31):8191–8201. doi:10.1021/acs.iecr.7b02021

  12. [12]

    Daylight Chemical Information Systems. (2007). SMARTS: A Language for Describing Molecular Patterns. Daylight Theory Manual, Aliso Viejo, CA. daylight.com/dayhtml/doc/theory/theory.smarts.html. SMARTS = SMILES Arbitrary Target Specification: a pattern language extending SMILES that matches molecular substructures, used by RDKit and other cheminformatics t...

  13. [13]

    Politzer, P., & Murray, J. S. (2014). Some Perspectives on Estimating Detonation Properties of C, H, N, O Compounds. Cent. Eur. J. Energ. Mater. 11(4):459–474

  14. [14]

    Sućeska, M. (2018). EXPLO5 v6.05.04 User's Manual. Brodarski Institute, Zagreb, Croatia. Computer program for calculation of detonation parameters from molecular formula, density, and heat of formation via thermochemical-equilibrium Chapman–Jouguet solver with covolume EOS

  15. [15]

    Fried, L. E., Howard, W. M., Souers, P. C., & Vitello, P. A. (2014). Cheetah 7.0 User's Manual. Lawrence Livermore National Laboratory technical report LLNL-SM-664002. Thermochemical-equilibrium detonation code with JCZ3/BKWS covolume EOS

  16. [16]

    Kamlet, M. J., & Jacobs, S. J. (1968). Chemistry of Detonations. I. A Simple Method for Calculating Detonation Properties of C-H-N-O Explosives. J. Chem. Phys. 48:23–55. doi:10.1063/1.1667908

  17. [17]

    Elton, D. C., Boukouvalas, Z., Butrico, M. S., Fuge, M. D., & Chung, P. W. (2018). Applying Machine Learning Techniques to Predict the Properties of Energetic Materials. Sci. Rep. 8:9059

  18. [18]

    Casey, A. D., Son, S. P., Bilionis, I., & Barnes, B. C. (2020). Prediction of Energetic Material Properties from Electronic Structure Using 3D Convolutional Neural Networks. J. Chem. Inf. Model. 60(10):4457–

  19. [19]

    doi:10.1021/acs.jcim.0c00259

  20. [20]

    Zhou, G. et al. (2023). Uni-Mol: A Universal 3D Molecular Representation Learning Framework. ICLR 2023

  21. [21]

    Huang, X. et al. (2021). Applying Machine Learning to Balance Performance and Stability of High Energy Density Materials. iScience 24:102803

  22. [22]

    Hervé, G., Roussel, C., & Graindorge, H. (2010). Selective Preparation of 3,4,5-Trinitro-1H-pyrazole: A Stable All-Carbon-Substituted Trinitro Heterocycle, and Related Trinitroisoxazole Chemistry. Angew. Chem. Int. Ed. 49(18):3177–3181. doi:10.1002/anie.201000764

  23. [23]

    Sabatini, J. J. (2018). A Review of Nitroisoxazole-Based Energetic Compounds. Propellants, Explosives, Pyrotechnics 43(1):28–37. doi:10.1002/prep.201700225

  24. [24]

    Konnov, A. A., Lisyutkin, A. D., Vinogradov, D. B., Nazarova, A. A., Pivkina, A. N., & Fershtat, L. L. (2025). Synthesis of 4-Nitroisoxazole-Based Energetic Materials. Org. Lett. 27(14):3795–3799. doi:10.1021/acs.orglett.5c01074

  25. [25]

    Ho, J., & Salimans, T. (2022). Classifier-Free Diffusion Guidance. arXiv:2207.12598

  26. [26]

    Dhariwal, P., & Nichol, A. (2021). Diffusion Models Beat GANs on Image Synthesis. NeurIPS 2021. arXiv:2105.05233

  27. [27]

    Song, Y., & Ermon, S. (2019). Generative Modeling by Estimating Gradients of the Data Distribution. NeurIPS 2019. arXiv:1907.05600

  28. [28]

    Song, Y. et al. (2021). Score-Based Generative Modeling through Stochastic Differential Equations. ICLR

  29. [29]

    Ho, J., Jain, A., & Abbeel, P. (2020). Denoising Diffusion Probabilistic Models. NeurIPS 2020. arXiv:2006.11239

  30. [30]

    Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-Resolution Image Synthesis with Latent Diffusion Models. CVPR 2022. arXiv:2112.10752

  31. [31]

    Krenn, M., Häse, F., Nigam, A., Friederich, P., & Aspuru-Guzik, A. (2020). Self-Referencing Embedded Strings (SELFIES): A 100% Robust Molecular String Representation. Mach. Learn.: Sci. Tech. 1:045024

  32. [32]

    Ertl, P., & Schuffenhauer, A. (2009). Estimation of Synthetic Accessibility Score of Drug-Like Molecules Based on Molecular Complexity and Fragment Contributions. J. Cheminform. 1:8

  33. [33]

    Coley, C. W., Rogers, L., Green, W. H., & Jensen, K. F. (2018). SCScore: Synthetic Complexity Learned from a Reaction Corpus. J. Chem. Inf. Model. 58(2):252–261

  34. [34]

    Rogers, D. J., & Tanimoto, T. T. (1960). A Computer Program for Classifying Plants. Science 132(3434):1115–1118

  35. [35]

    Landrum, G., & contributors. RDKit: Open-source cheminformatics. rdkit.org

  36. [36]

    Sterling, T., & Irwin, J. J. (2015). ZINC 15: Ligand Discovery for Everyone. J. Chem. Inf. Model. 55(11):2324–2337

  37. [37]

    Kim, S. et al. (2023). PubChem 2023 Update. Nucleic Acids Res. 51(D1):D1373–D1380

  38. [38]

    Jaegle, A. et al. (2021). Perceiver: General Perception with Iterative Attention. ICML 2021. arXiv:2103.03206

  39. [40]

    Loeffler, H. H., He, J., Tibo, A., Janet, J. P., Voronov, A., Mervin, L. H., & Engkvist, O. (2024). REINVENT 4: Modern AI-driven generative molecule design. J. Cheminformatics 16:20. doi:10.1186/s13321-024-00812-5

  40. [41]

    Yang, X., Zhang, J., Yoshizoe, K., Terayama, K., & Tsuda, K. (2017). ChemTS: An efficient python library for de novo molecular generation. Sci. Tech. Adv. Mater. 18(1):972–976. doi:10.1080/14686996.2017.1401424

  41. [42]

    Bagal, V., Aggarwal, R., Vinod, P. K., & Priyakumar, U. D. (2022). MolGPT: Molecular Generation Using a Transformer-Decoder Model. J. Chem. Inf. Model. 62(9):2064–2076. doi:10.1021/acs.jcim.1c00600

  42. [43]

    Winter, R., Montanari, F., Noé, F., & Clevert, D.-A. (2019). Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations. Chem. Sci. 10(6):1692–1701. doi:10.1039/C8SC04175J

  43. [45]

    Schneuing, A. et al. (2022). Structure-based Drug Design with Equivariant Diffusion Models. NeurIPS 2022 AI4Science Workshop. arXiv:2210.13695

  44. [46]

    Guan, J. et al. (2023). 3D Equivariant Diffusion for Target-Aware Molecule Generation and Affinity Prediction. ICLR 2023. arXiv:2303.03543

  45. [47]

    Corso, G., Stärk, H., Jing, B., Barzilay, R., & Jaakkola, T. (2023). DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking. ICLR 2023. arXiv:2210.01776

  46. [48]

    Peng, X., Luo, S., Guan, J., Xie, Q., Peng, J., & Ma, J. (2022). Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets. ICML 2022. arXiv:2205.07249

  47. [49]

    Nefati, H., Cense, J.-M., & Legendre, J.-J. (1996). Prediction of the Impact Sensitivity by Neural Networks. J. Chem. Inf. Comput. Sci. 36(4):804–810. doi:10.1021/ci950223m

  48. [50]

    Klapötke, T. M. Chemistry of High-Energy Materials, 5th ed. (de Gruyter, 2019). doi:10.1515/9783110624571

  49. [51]

    Griffiths, R.-R., & Hernández-Lobato, J. M. (2020). Constrained Bayesian optimization for automatic chemical design using variational autoencoders. Chem. Sci. 11(2):577–586. doi:10.1039/C9SC04026A

  50. [52]

    Yang, K. et al. (2019). Analyzing Learned Molecular Representations for Property Prediction. J. Chem. Inf. Model. 59(8):3370–3388. doi:10.1021/acs.jcim.9b00237

  51. [53]

    Schütt, K. T., Sauceda, H. E., Kindermans, P.-J., Tkatchenko, A., & Müller, K.-R. (2018). SchNet: A deep learning architecture for molecules and materials. J. Chem. Phys. 148:241722. doi:10.1063/1.5019779

  52. [54]

    Qiao, Z., Welborn, M., Anandkumar, A., Manby, F. R., & Miller III, T. F. (2020). OrbNet: Deep learning for quantum chemistry using symmetry-adapted atomic-orbital features. J. Chem. Phys. 153:124111. doi:10.1063/5.0021955

  53. [55]

    Brown, N., Fiscato, M., Segler, M. H. S., & Vaucher, A. C. (2019). GuacaMol: Benchmarking Models for de Novo Molecular Design. J. Chem. Inf. Model. 59(3):1096–1108. doi:10.1021/acs.jcim.8b00839

  54. [56]

    Polykovskiy, D. et al. (2020). Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models. Frontiers in Pharmacology 11:565644. doi:10.3389/fphar.2020.565644

  55. [57]

    Preuer, K., Renz, P., Unterthiner, T., Hochreiter, S., & Klambauer, G. (2018). Fréchet ChemNet Distance: A Metric for Generative Models for Molecules in Drug Discovery. J. Chem. Inf. Model. 58(9):1736–1741

  56. [58]

    Reymond, J.-L. (2015). The chemical space project. Acc. Chem. Res. 48(3):722–730

  57. [59]

    Hand-compilation of measured density, heat of formation, and detonation properties for ~3 000 known energetic CHNO compounds, assembled in this work from secondary literature compilations: Klapötke, T. M. Chemistry of High-Energy Materials, 5th ed. (de Gruyter, 2019); Cooper, P. W. Explosives Engineering (Wiley-VCH, 1996); and Dobratz, B. M. & Crawford,...

  58. [60]

    NIST CAMEO Chemicals: Database of Hazardous Materials and Reactivity. cameochemicals.noaa.gov

  59. [61]

    Bruns, H., & Watson, P. (2020). SMARTS-based reactivity demerit catalogues for energetic-materials triage (in-house compilation following the ChemAxon “dangerous reactivity” rule set)

  60. [62]

    Bannwarth, C., Ehlert, S., & Grimme, S. (2019). GFN2-xTB: An Accurate and Broadly Parametrized Self-Consistent Tight-Binding Quantum Chemical Method with Multipole Electrostatics and Density-Dependent Dispersion Contributions. J. Chem. Theory Comput. 15(3):1652–1671. doi:10.1021/acs.jctc.8b01176

  61. [63]

    Goerigk, L., Hansen, A., Bauer, C., Ehrlich, S., Najibi, A., & Grimme, S. (2017). A look at the density functional theory zoo with the advanced GMTKN55 database for general main group thermochemistry, kinetics and noncovalent interactions. Phys. Chem. Chem. Phys. 19(48):32184–32215. doi:10.1039/C7CP04913G

  62. [64]

    Bondi, A. (1964). van der Waals Volumes and Radii. J. Phys. Chem. 68(3):441–451. doi:10.1021/j100785a001

  63. [65]

    Genheden, S., Thakkar, A., Chadimová, V., Reymond, J.-L., Engkvist, O., & Bjerrum, E. J. (2020). AiZynthFinder: a fast, robust and flexible open-source software for retrosynthetic planning. Journal of Cheminformatics 12:70. doi:10.1186/s13321-020-00472-1

  64. [66]

    Sun, Q., Zhang, X., Banerjee, S., Bao, P., et al. (2020). Recent developments in the PySCF program package. J. Chem. Phys. 153:024109. doi:10.1063/5.0006074

  65. [67]

    Perez, E., Strub, F., de Vries, H., Dumoulin, V., & Courville, A. (2018). FiLM: Visual Reasoning with a General Conditioning Layer. AAAI 2018. arXiv:1709.07871

  66. [68]

    Xu, M., Powers, A., Dror, R. O., Ermon, S., & Leskovec, J. (2023). Geometric Latent Diffusion Models for 3D Molecule Generation. ICML 2023. arXiv:2305.01140

  67. [69]

    Z. et al. (2025). De novo multi-objective generation framework for energetic materials with trading off energy and stability. npj Computational Materials. doi:10.1038/s41524-025-01845-6

  68. [70]

    Choi, J. B., Nguyen, P. C. H., Sen, O., Udaykumar, H. S., & Baek, S. (2023). Artificial Intelligence Approaches for Energetic Materials by Design: State of the Art, Challenges, and Future Directions. Propellants, Explosives, Pyrotechnics 48(4), e202200276. doi:10.1002/prep.202200276

  69. [71]

    Arnold, J. E., & Day, G. M. (2023). Crystal Structure Prediction of Energetic Materials. Crystal Growth & Design. doi:10.1021/acs.cgd.3c00706

  70. [72]

    Davis, J. V., Marrs III, F. W., Cawkwell, M. J., & Manner, V. W. (2024). Machine Learning Models for High Explosive Crystal Density and Performance. Chemistry of Materials 36(22), 11109–11118. doi:10.1021/acs.chemmater.4c01978