pith. machine review for the scientific record.

arxiv: 2605.07048 · v1 · submitted 2026-05-07 · 💻 cs.LG · cs.AI

Recognition: no theorem link

Unlocking High-Fidelity Molecular Generation from Mass Spectra via Dual-Stream Line Graph Diffusion

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 01:28 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords: molecular generation · mass spectra · graph diffusion · line graphs · de novo design · atom-bond dependency · cross-attention

The pith

Dual-stream line graph diffusion resolves atom-bond circular dependencies to triple top-1 accuracy in mass-spectrum molecule generation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that single-stream graph diffusion models leave the circular dependency between atom identities and bond types unresolved because synchronization can occur only implicitly across layers. It proposes splitting the denoising task into two alternating subproblems, one for atoms and one for bonds, with the bond stream operating on a line graph whose nodes represent the original graph's edges. Incidence-constrained bidirectional cross-attention then enforces that each atom attends exclusively to its incident bonds and each bond attends to its endpoint atoms at every layer. This architectural separation, rather than any specific aggregation kernel or pre-training, produces top-1 accuracies of 34.37 percent on NPLIB1 and 23.89 percent on MassSpecGym, roughly three times the prior state of the art, and the architecture alone already beats the previous best pretrained baseline. A reader would care because the result suggests that explicitly factoring interdependent reasoning tasks can materially improve fidelity on this inverse problem.

Core claim

DualLGD reformulates molecular graph denoising as the alternating solution of atom-level reasoning and bond-level reasoning, each in its own dedicated representation space, with the line graph supplying the bond space and incidence-constrained bidirectional cross-attention synchronizing the streams while respecting chemical incidence relations.

What carries the argument

Dual-stream architecture in which atoms and bonds occupy separate streams, bonds are represented on the line graph, and incidence-constrained bidirectional cross-attention synchronizes the streams at every layer.
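The incidence constraint can be made concrete with a small sketch. This is an editorial illustration, not the paper's code: bonds are listed as atom pairs, a boolean incidence mask zeroes out non-incident atom-bond pairs before the softmax, and only the atom-queries-bonds direction is shown (DualLGD's attention is bidirectional, runs at every layer, and uses learned projections, all omitted here):

```python
import numpy as np

def incidence_mask(n_atoms, bonds):
    """Boolean mask M[i, e] = True iff atom i is an endpoint of bond e."""
    m = np.zeros((n_atoms, len(bonds)), dtype=bool)
    for e, (u, v) in enumerate(bonds):
        m[u, e] = m[v, e] = True
    return m

def masked_cross_attention(q, k, v, mask):
    """Scaled dot-product attention where row i may only attend
    where mask[i] is True; non-incident pairs get -inf logits."""
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits = np.where(mask, logits, -np.inf)
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v, w

# toy molecule: 4 atoms in a path, 3 bonds
bonds = [(0, 1), (1, 2), (2, 3)]
mask = incidence_mask(4, bonds)
rng = np.random.default_rng(0)
atom_h = rng.normal(size=(4, 8))   # atom-stream states (queries)
bond_h = rng.normal(size=(3, 8))   # bond-stream states (keys/values)
out, w = masked_cross_attention(atom_h, bond_h, bond_h, mask)
# atom 0 is incident only to bond 0, so all of its weight lands there
assert np.isclose(w[0, 0], 1.0) and w[0, 1] == 0.0
```

The mask is what encodes the chemical incidence rule; everything else is standard attention machinery.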

If this is right

  • Top-1 accuracy reaches 34.37 percent on NPLIB1 and 23.89 percent on MassSpecGym, approximately three times the prior state of the art.
  • The dual-stream model without any pre-training already exceeds the previous best fully pretrained single-stream model.
  • Ablation studies attribute the gains primarily to the dual-stream separation rather than to kernel choice or training regime.
  • Bond-level motifs such as angles, dihedrals, conjugation chains, and rings become native local neighborhoods on the line graph.
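The last bullet can be checked mechanically with a toy construction (an editorial sketch, not the paper's implementation): build L(G) by making two bond nodes adjacent whenever the bonds share an endpoint atom, and a six-membered ring of bonds reappears as a six-cycle of line-graph nodes:

```python
def line_graph(bonds):
    """Adjacency of L(G): two bonds are neighbors iff they share an atom."""
    adj = {e: set() for e in range(len(bonds))}
    for e, (a, b) in enumerate(bonds):
        for f, (c, d) in enumerate(bonds):
            if e != f and {a, b} & {c, d}:
                adj[e].add(f)
    return adj

# benzene-like ring skeleton: 6 atoms, 6 bonds in a cycle
ring = [(i, (i + 1) % 6) for i in range(6)]
lg = line_graph(ring)
# each bond node touches exactly its two ring neighbours, so the ring
# survives as a 6-cycle of bond nodes in L(G)
assert all(len(nbrs) == 2 for nbrs in lg.values())
```

Rings, conjugation chains, and bond angles all become one- or two-hop neighborhoods in this adjacency, which is what the bond stream operates over.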

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same separation of node and edge reasoning streams could be tested on other inverse graph problems where node labels and edge labels are mutually dependent.
  • The reduced need for pre-training may allow the method to be applied more readily to smaller or domain-specific spectral datasets.
  • Performance on molecules larger than those in the current benchmarks would reveal whether the cross-attention synchronization scales without additional constraints.
  • Replacing the incidence constraint with learned attention masks could test how strictly the chemical incidence rule must be enforced.

Load-bearing premise

The circular dependency between atoms and bonds is the dominant architectural bottleneck and can be resolved by separating the two into dedicated streams with constrained cross-attention.

What would settle it

A controlled experiment in which a single-stream model is given explicit, incidence-respecting atom-bond synchronization mechanisms at each layer and still fails to reach DualLGD accuracy on the same NPLIB1 and MassSpecGym test sets.

Figures

Figures reproduced from arXiv: 2605.07048 by Depeng Xu, Xiuxia Du, Xujun Che.

Figure 1: Overview of the DualLGD denoising network. The primal graph stream and line graph … view at source ↗
Figure 2: Line graph construction, illustrated on trans-propenylbenzene (only heavy atoms are shown, as the molecular graph operates over heavy atoms). Left: the molecular graph G with aromatic (purple), double (orange), and single (blue) bonds. Center: the corresponding line graph L(G), where each bond becomes an independent node and two nodes are adjacent if they share an endpoint atom. Right: four chemical relat… view at source ↗
Figure 3: Speed–quality trade-off of efficient long-range reverse sampling on NPLIB1. view at source ↗
Figure 4: Cross-attention learns electronegativity-aware endpoint preference without explicit su… view at source ↗
Figure 5: Scalability of the line graph stream self-attention on a single NVIDIA H200 GPU. view at source ↗
Figure 6: Performance of DualLGD on NPLIB1 stratified by structural descriptors: (a) heavy-atom … view at source ↗
Figure 7: Representative generation results of DualLGD on NPLIB1. Each row shows the ground … view at source ↗
Figure 8: Visualization of the reverse diffusion process for three molecules from NPLIB1. From left … view at source ↗
Original abstract

De novo molecular generation from tandem mass spectra is a challenging inverse problem whose core difficulty lies in the circular dependency between atom-level and bond-level reasoning: determining a bond's type requires knowing its endpoint atoms' chemical environment, yet an atom's environment is in turn defined by its incident bonds. Existing graph diffusion methods process atoms and bonds within a single computation stream, where atom-bond information synchronization can only occur implicitly across layers. We argue that this single-stream paradigm, rather than the choice of any particular aggregation kernel, is a key architectural bottleneck. We propose DualLGD (Dual-stream Line Graph Diffusion), which reformulates molecular graph denoising as the alternating solution of two coupled subproblems: atom-level reasoning and bond-level reasoning, each operating in its own dedicated representation space. The line graph provides a natural mathematical construction for the bond space, in which bond angles, dihedrals, conjugation chains, and rings correspond to local topological motifs between bonds. Incidence-constrained bidirectional cross-attention synchronizes the two streams at every layer, ensuring that each atom attends only to its incident bonds and vice versa, respecting the fundamental chemical principle that an atom's environment is determined by its bonding context. On the NPLIB1 and MassSpecGym benchmarks, DualLGD achieves top-1 accuracy of 34.37\% and 23.89\%, approximately $3\times$ the previous state of the art. Ablation studies confirm the architecture as the primary source of improvement: DualLGD without any pre-training already surpasses the previous best fully pretrained model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces DualLGD, a dual-stream line graph diffusion model for de novo molecular generation from tandem mass spectra. It reformulates the denoising process as alternating atom-level and bond-level reasoning, with the bond stream operating on a line graph to capture motifs such as rings and conjugation, and uses incidence-constrained bidirectional cross-attention to synchronize the two streams while respecting chemical incidence. The paper reports top-1 accuracies of 34.37% on NPLIB1 and 23.89% on MassSpecGym (approximately 3× prior SOTA), with ablations indicating that the dual-stream architecture is the primary driver of gains, as the model without pre-training already surpasses previous fully pretrained baselines.

Significance. If the results hold, the work provides a principled architectural solution to the atom-bond circular dependency in molecular graph diffusion, which could influence future models in mass-spec-based generation. The line-graph formulation for bond-level reasoning and the explicit incidence constraints in attention are technically interesting contributions. The finding that architecture alone outperforms prior pretrained models is a notable strength, as is the emphasis on reproducible benchmark gains without heavy reliance on pre-training.

major comments (2)
  1. §4.3 (Ablation Studies): The single-stream baselines used for comparison do not appear to incorporate the line-graph bond representation or the incidence-constrained bidirectional cross-attention (adapted to a single stream). Without this control experiment, the central claim that the dual-stream split itself, rather than the line-graph construction or the attention mechanism, is the primary source of the reported gains cannot be isolated, weakening the attribution in the abstract and §4.3.
  2. Table 1 (Benchmark Results): The top-1 accuracy figures (34.37% on NPLIB1, 23.89% on MassSpecGym) are reported without error bars, standard deviations, or details on the number of independent runs or sampling seeds. Given the stochastic nature of diffusion models, this omission makes it difficult to evaluate the reliability of the ~3× improvement claim over prior methods.
minor comments (2)
  1. Abstract: The phrase 'approximately 3× the previous state of the art' would be clearer if the exact prior top-1 accuracies were stated for immediate comparison.
  2. §3.2 (Line Graph Construction): A brief illustrative example or small diagram showing how a simple molecule's bonds map to line-graph nodes (e.g., capturing a ring or conjugation) would improve accessibility of the bond-space representation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. We address the major comments point by point below and outline the revisions we plan to make.

Point-by-point responses
  1. Referee: §4.3 (Ablation Studies): The single-stream baselines used for comparison do not appear to incorporate the line-graph bond representation or the incidence-constrained bidirectional cross-attention (adapted to a single stream). Without this control experiment, the central claim that the dual-stream split itself is the primary source of the reported gains (rather than the line-graph construction or attention mechanism) cannot be isolated, weakening the attribution in the abstract and §4.3.

    Authors: We appreciate the referee highlighting this important point regarding the isolation of the dual-stream contribution. Our single-stream baselines were intended to represent standard single-stream graph diffusion models from the literature, which do not employ line-graph representations or incidence-constrained attention mechanisms. However, to more rigorously demonstrate that the dual-stream architecture is the key driver, we will add a new ablation study in the revised manuscript. Specifically, we will implement a single-stream variant that incorporates the line-graph bond representation and an adapted version of the incidence-constrained bidirectional cross-attention, and compare its performance directly to DualLGD. This controlled experiment will strengthen the attribution of performance gains to the dual-stream design. revision: yes

  2. Referee: Table 1 (Benchmark Results): The top-1 accuracy figures (34.37% on NPLIB1, 23.89% on MassSpecGym) are reported without error bars, standard deviations, or details on the number of independent runs or sampling seeds. Given the stochastic nature of diffusion models, this omission makes it difficult to evaluate the reliability of the ~3× improvement claim over prior methods.

    Authors: We agree that reporting statistical variability is essential for assessing the reliability of results from stochastic models such as diffusion models. In the revised manuscript, we will rerun the experiments for the main benchmark results using multiple independent runs with different random seeds (we plan for at least 5 runs). We will update Table 1 to include the mean top-1 accuracies and standard deviations for DualLGD and the baseline methods where applicable. This will provide a clearer picture of the robustness of the reported improvements. revision: yes
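The aggregation the authors promise is straightforward; the per-seed accuracies below are hypothetical placeholders for illustration, not values from the paper:

```python
from statistics import mean, stdev

# hypothetical top-1 accuracies (%) from 5 independent runs with
# different random seeds -- placeholders, not the paper's results
runs = [34.1, 34.6, 34.4, 33.9, 34.8]
mu, sd = mean(runs), stdev(runs)  # sample standard deviation, n - 1
print(f"top-1 = {mu:.2f} ± {sd:.2f} %")
```

Reporting the sample standard deviation (n − 1 denominator) alongside the mean is the usual convention for a small number of runs.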

Circularity Check

0 steps flagged

No circularity: empirical gains on external benchmarks rest on architecture and ablations, not self-referential definitions or fitted inputs.

full rationale

The paper's derivation chain consists of a design argument (single-stream as bottleneck for atom-bond circular dependency) followed by an empirical claim (DualLGD top-1 accuracies of 34.37% and 23.89% on NPLIB1/MassSpecGym, ~3x prior SOTA, with ablations attributing gains to dual-stream + incidence-constrained cross-attention). Neither step reduces to its own inputs by construction: the line-graph bond representation and bidirectional attention are standard topological constructions applied to a new dual-stream split, the performance numbers are measured against held-out benchmark data, and ablations are experimental controls rather than parameter fits renamed as predictions. No self-citations are invoked as load-bearing uniqueness theorems, and no ansatz or renaming of known results occurs. The result remains falsifiable on independent test sets.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on standard graph diffusion training assumptions and the domain assumption that line graphs naturally encode bond motifs; no new physical entities are postulated.

free parameters (1)
  • diffusion schedule and attention hyperparameters
    Standard ML training choices whose specific values are not detailed in the abstract but affect reported performance.
axioms (1)
  • domain assumption: The line graph provides a natural mathematical construction for bond space in which angles, dihedrals, and rings correspond to local motifs.
    Invoked to justify the bond-level stream representation.

pith-pipeline@v0.9.0 · 5586 in / 1205 out tokens · 36987 ms · 2026-05-11T01:28:47.744351+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages

  1. Felicity Allen, Russ Greiner, and David Wishart. Competitive fragmentation modeling of ESI-MS/MS spectra for putative metabolite identification. Metabolomics, 11(1):98–110, 2015.
  2. Jacob Austin, Daniel D. Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg. Structured denoising diffusion models in discrete state-spaces. Advances in Neural Information Processing Systems, 34:17981–17993, 2021.
  3. Guy W. Bemis and Mark A. Murcko. The properties of known drugs. 1. Molecular frameworks. Journal of Medicinal Chemistry, 39(15):2887–2893, 1996.
  4. Montgomery Bohde, Mrunali Manjrekar, Runzhong Wang, Shuiwang Ji, and Connor W. Coley. DiffMS: Diffusion generation of molecules conditioned on mass spectra. In Proceedings of the 42nd International Conference on Machine Learning, volume 267 of Proceedings of Machine Learning Research, pages 4737–4756. PMLR, 2025.
  5. Roman Bushuiev, Anton Bushuiev, Niek F. de Jonge, Adamo Young, Fleming Kretschmer, Raman Samusevich, Janne Heirman, Fei Wang, Luke Zhang, Kai Dührkop, et al. MassSpecGym: A benchmark for the discovery and identification of molecules. Advances in Neural Information Processing Systems, 37:110010–110027, 2024.
  6. CCTE, U.S. Environmental Protection Agency. Distributed structure-searchable toxicity (DSSTox) database, 2019. URL https://epa.figshare.com/articles/dataset/Chemistry_Dashboard_Data_DSSTox_Identifiers_Mapped_to_CAS_Numbers_and_Names/5588566
  7. Xujun Che, Xiuxia Du, and Depeng Xu. Comparative analysis of formula and structure prediction from tandem mass spectra. arXiv preprint arXiv:2601.00941, 2026.
  8. Zixiang Chen, Huizhuo Yuan, Yongqian Li, Yiwen Kou, Junkai Zhang, and Quanquan Gu. Fast sampling via discrete non-Markov diffusion models with predetermined transition time. Advances in Neural Information Processing Systems, 37:106870–106905, 2024.
  9. Krzysztof Marcin Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Quincy Davis, Afroz Mohiuddin, Lukasz Kaiser, et al. Rethinking attention with Performers. In International Conference on Learning Representations (ICLR), 2021.
  10. Kamal Choudhary and Brian DeCost. Atomistic line graph neural network for improved materials property predictions. npj Computational Materials, 7(1):185, 2021.
  11. Kai Dührkop, Franziska Hufsky, and Sebastian Böcker. Molecular formula identification using isotope pattern analysis and calculation of fragmentation trees. Mass Spectrometry, 3(Special_Issue_2):S0037–S0037, 2014.
  12. Kai Dührkop, Huibin Shen, Marvin Meusel, Juho Rousu, and Sebastian Böcker. Searching molecular structure databases with tandem mass spectra using CSI:FingerID. Proceedings of the National Academy of Sciences, 112(41):12580–12585, 2015.
  13. Kai Dührkop, Markus Fleischauer, Marcus Ludwig, Alexander A. Aksenov, Alexey V. Melnik, Marvin Meusel, Pieter C. Dorrestein, Juho Rousu, and Sebastian Böcker. SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information. Nature Methods, 16(4):299–302, 2019.
  14. Kai Dührkop, Louis-Félix Nothias, Markus Fleischauer, Raphael Reher, Marcus Ludwig, Martin A. Hoffmann, Daniel Petras, William H. Gerwick, Juho Rousu, Pieter C. Dorrestein, and Sebastian Böcker. Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra. Nature Biotechnology, 39(4):462–471, 2021.
  15. Xiaomin Fang, Lihang Liu, Jieqiong Lei, Donglong He, Shanzhuo Zhang, Jingbo Zhou, Fan Wang, Hua Wu, and Haifeng Wang. Geometry-enhanced molecular representation learning for property prediction. Nature Machine Intelligence, 4(2):127–134, 2022.
  16. Johannes Gasteiger, Janek Groß, and Stephan Günnemann. Directional message passing for molecular graphs. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=B1eWbxStPH
  17. Johannes Gasteiger, Florian Becker, and Stephan Günnemann. GemNet: Universal directional graph neural networks for molecules. Advances in Neural Information Processing Systems, 34:6790–6802, 2021.
  18. Samuel Goldman, Jeremy Wohlwend, Martin Stražar, Guy Haroush, Ramnik J. Xavier, and Connor W. Coley. Annotating metabolite mass spectra with domain-inspired chemical formula transformers. Nature Machine Intelligence, 5(9):965–979, 2023.
  19. Samuel Goldman, Jiayi Xin, Joules Provenzano, and Connor W. Coley. MIST-CF: Chemical formula inference from tandem mass spectra. Journal of Chemical Information and Modeling, 64(7):2421–2431, 2023.
  20. Yang Han, Pengyu Wang, Kai Yu, Xin Chen, and Lu Chen. MS-BART: Unified modeling of mass spectra and molecules for structure elucidation. arXiv preprint arXiv:2510.20615, 2025.
  21. Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
  22. Hisayuki Horai, Masanori Arita, Shigehiko Kanaya, Yoshito Nihei, Tasuku Ikeda, Kazuhiro Suwa, Yuya Ojima, Kenichi Tanaka, Satoshi Tanaka, Ken Aoshima, et al. MassBank: a public repository for sharing mass spectral data for life sciences. Journal of Mass Spectrometry, 45(7):703–714, 2010.
  23. Md Shamim Hussain, Mohammed J. Zaki, and Dharmashankar Subramanian. Triplet interaction improves graph transformers: Accurate molecular graph learning with triplet graph transformers. In Proceedings of the 41st International Conference on Machine Learning, 2024.
  24. Mario Krenn, Florian Häse, AkshatKumar Nigam, Pascal Friederich, and Alan Aspuru-Guzik. Self-referencing embedded strings (SELFIES): A 100% robust molecular string representation. Machine Learning: Science and Technology, 1(4):045024, 2020.
  25. Fleming Kretschmer, Jan Seipp, Marcus Ludwig, Gunnar W. Klau, and Sebastian Böcker. Small molecule machine learning: All models are wrong, some may not even be useful. bioRxiv, 2023.
  26. Tuan Le, Robin Winter, Frank Noé, and Djork-Arné Clevert. Neuraldecipher: reverse-engineering extended-connectivity fingerprints (ECFPs) to their molecular structures. Chemical Science, 11(38):10378–10389, 2020.
  27. Eleni E. Litsa, Vijil Chenthamarakshan, Payel Das, and Lydia E. Kavraki. An end-to-end deep learning framework for translating mass spectra to de-novo molecules. Communications Chemistry, 6(1):132, 2023.
  28. Harry L. Morgan. The generation of a unique machine description for chemical structures: a technique developed at Chemical Abstracts Service. Journal of Chemical Documentation, 5(2):107–113, 1965.
  29. Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, and Aaron Courville. FiLM: Visual reasoning with a general conditioning layer. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
  30. Daniil Polykovskiy, Alexander Zhebrak, Benjamin Sanchez-Lengeling, Sergey Golovanov, Oktai Tatanov, Stanislav Belyaev, Rauf Kurbanov, Aleksey Artamonov, Vladimir Aladinskiy, Mark Veselov, et al. Molecular Sets (MOSES): a benchmarking platform for molecular generation models. Frontiers in Pharmacology, 11:565644, 2020.
  31. Xiaohan Qin, Chao Wang, Zhengyang Zhou, Linjiang Chen, Wenjie Du, and Yang Wang. MSAnchor: De novo molecular generation from mass spectrometry data with anchor-extended molecular scaffolds. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 953–961, 2026. doi: 10.1609/aaai.v40i2.37064.
  32. Christoph Ruttkies, Emma L. Schymanski, Sebastian Wolf, Juliane Hollender, and Steffen Neumann. MetFrag relaunched: incorporating strategies beyond in silico fragmentation. Journal of Cheminformatics, 8(1):3, 2016.
  33. Emma L. Schymanski, Christoph Ruttkies, Martin Krauss, Céline Brouard, Tobias Kind, Kai Dührkop, Felicity Allen, Arpana Vaniya, Dries Verdegem, Sebastian Böcker, et al. Critical assessment of small molecule identification 2016: automated methods. Journal of Cheminformatics, 9(1):22, 2017.
  34. Maria Sorokina, Peter Merseburger, Kohulan Rajan, Mehmet Aziz Yirik, and Christoph Steinbeck. COCONUT online: collection of open natural products database. Journal of Cheminformatics, 13(1):2, 2021.
  35. Michael A. Stravs, Kai Dührkop, Sebastian Böcker, and Nicola Zamboni. MSNovelist: de novo structure generation from mass spectra. Nature Methods, 19(7):865–870, 2022.
  36. Xichen Sun, Wentao Wei, Jiahua Rao, Jiancong Xie, and Yuedong Yang. De novo molecular generation from mass spectra via many-body enhanced diffusion. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 1042–1050, 2026. doi: 10.1609/aaai.v40i2.37074.
  37. Clement Vignac, Igor Krawczuk, Antoine Siraudin, Bohan Wang, Volkan Cevher, and Pascal Frossard. DiGress: Discrete denoising diffusion for graph generation. In International Conference on Learning Representations (ICLR), 2023.
  38. Fei Wang, Jaanus Liigand, Siyang Tian, David Arndt, Russell Greiner, and David S. Wishart. CFM-ID 4.0: more accurate ESI-MS/MS spectral prediction and compound identification. Analytical Chemistry, 93(34):11692–11700, 2021.
  39. Mingxun Wang, Jeremy J. Carver, Vanessa V. Phelan, Laura M. Sanchez, Neha Garg, Yao Peng, Don Duy Nguyen, Jeramie Watrous, Clifford A. Kapono, Tal Luzzatto-Knaan, et al. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nature Biotechnology, 34(8):828–837, 2016.
  40. Yinkai Wang, Xiaohui Chen, Liping Liu, and Soha Hassoun. MADGEN: Mass-spec attends to de novo molecular generation. In International Conference on Learning Representations (ICLR), 2025. URL https://openreview.net/forum?id=78tc3EiUrN
  41. David Weininger. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences, 28(1):31–36, 1988.
  42. David S. Wishart, AnChi Guo, Eponine Oler, Fei Wang, Afia Anjum, Harrison Peters, Raynard Dizon, Zinat Sayeeda, Siyang Tian, Brian L. Lee, et al. HMDB 5.0: the human metabolome database for 2022. Nucleic Acids Research, 50(D1):D622–D631, 2022.
  43. Kai Zhao, Yanmin Liu, Longyang Dian, Shiwei Sun, and Xuefeng Cui. How to train your neural network for molecular structure generation from mass spectra? In 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 817–822. IEEE, 2024.