GLACIER: Rethinking Mass Spectrum Prediction as an Object Detection Problem
Pith reviewed 2026-06-30 08:07 UTC · model grok-4.3
The pith
Treating mass spectrum prediction as single-stage object detection on molecular graphs raises retrieval accuracy and cuts inference time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce GLACIER, a single-stage transformer-based fragment detection neural network for molecular graphs. Molecular fragmentation is approximated as detecting a set of subgraphs and their associated spectral contributions; the unified formulation removes the need for candidate enumeration and produces globally consistent predictions in one forward pass.
What carries the argument
Single-stage transformer-based fragment detection network that operates directly on molecular graphs to identify subgraphs and spectral contributions without separate candidate generation.
If this is right
- Top-1 retrieval accuracy on MassSpecGym reaches 70.0 percent with contrastive finetuning and 69.7 percent without, against a previous state of the art of 64.0 percent.
- Top-1 retrieval accuracy on NIST'20 reaches 52.5 percent with contrastive finetuning and 38.5 percent without, against a previous best of 33.2 percent.
- Inference runs nearly eight times faster than the authors' earlier two-stage model.
- Global consistency of predictions improves because fragment detection and scoring occur together rather than sequentially.
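The Top-1 retrieval figures can be read as follows: for each query spectrum, the predicted spectra of all candidate molecules are ranked by similarity to the observed spectrum, and the query counts as a hit when the true molecule ranks first. The sketch below is a minimal illustration, assuming cosine similarity over binned intensity vectors. The function names and the similarity measure are assumptions made here, not the benchmark's actual protocol.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def top1_accuracy(queries):
    """queries: list of (observed, candidate_predictions, true_index)."""
    hits = 0
    for observed, candidates, true_index in queries:
        scores = [cosine(observed, pred) for pred in candidates]
        # Index of the best-scoring candidate (first one wins on ties).
        best = max(range(len(scores)), key=scores.__getitem__)
        hits += int(best == true_index)
    return hits / len(queries)

# Toy data (hypothetical): the first query is retrieved correctly, the second is not.
toy_queries = [
    ([1.0, 0.0, 0.5], [[0.9, 0.1, 0.4], [0.0, 1.0, 0.0]], 0),
    ([0.0, 1.0, 0.0], [[1.0, 0.0, 0.0], [0.1, 0.9, 0.0]], 0),
]
accuracy = top1_accuracy(toy_queries)
```

On the toy data one of the two queries ranks its true molecule first, so the function returns 0.5.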
Where Pith is reading between the lines
- The same single-pass detection framing could be tested on other graph-to-sequence tasks where enumeration of substructures is currently required.
- If the model generalizes beyond the training distributions, it may enable real-time spectrum prediction inside portable mass spectrometers.
- The approach leaves open whether explicit physical constraints on fragment masses could be added as an auxiliary loss without reintroducing two-stage logic.
Load-bearing premise
Molecular fragmentation patterns can be recovered accurately by detecting subgraphs in one forward pass without enumerating and scoring candidates explicitly.
What would settle it
A test set of experimental spectra where the single-stage model consistently misses fragments that require explicit candidate enumeration or produces lower retrieval accuracy than the prior two-stage baseline on the same data.
Original abstract
Predicting tandem mass spectra (MS/MS) from molecular structures represents a central task in analytical chemistry with direct relevance to clinical metabolomics, systems biology, and adjacent disciplines. In this work, we revisit the problem through the lens of object detection on molecular graphs. Molecular fragmentation, a central step in MS/MS prediction, can be approximated as detecting a set of subgraphs (i.e., fragments) and their associated spectral contributions. Existing fragment-based models follow a two-stage paradigm -- first generating candidate fragments and then scoring them -- analogous to two-stage R-CNNs in computer vision. Towards higher accuracy and faster inference, we introduce GLACIER, a single-stage transformer-based fragment detection neural network for molecular graphs. This unified formulation eliminates the need for candidate enumeration, enabling scalable and globally consistent modeling of molecular fragmentation. GLACIER is faster and more accurate than existing state-of-the-art by a significant margin, achieving 70.0% and 69.7% Top-1 retrieval accuracy with and without contrastive finetuning on the MassSpecGym dataset (from the previous SOTA of 64.0%) and 52.5% and 38.5% respectively on the NIST'20 dataset (from 33.2%). Furthermore, GLACIER provides nearly 8-fold inference speedup over our prior two-stage model. Code is available at https://github.com/coleygroup/ms-pred
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reformulates tandem mass spectrum (MS/MS) prediction as an object detection task on molecular graphs, where a single-stage transformer (GLACIER) directly detects fragments (subgraphs) and their spectral contributions without explicit candidate enumeration. It reports Top-1 retrieval accuracies of 70.0% (with contrastive finetuning) and 69.7% (without) on MassSpecGym (prior SOTA 64.0%) and 52.5%/38.5% on NIST'20 (prior 33.2%), plus an ~8-fold inference speedup over the authors' prior two-stage model, with code released.
Significance. If the single-stage detection produces only valid induced subgraphs with globally additive, m/z-consistent contributions, the reformulation could enable more scalable and accurate MS/MS prediction for metabolomics applications. The explicit code release is a positive contribution to reproducibility.
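One way to picture "globally additive, m/z-consistent contributions" is as a set of detections, each carrying an elemental composition and a raw score, that are summed into a spectrum. The sketch below is an illustrative assumption, not GLACIER's implementation: the `Detection` type, the softplus activation, and the use of neutral monoisotopic mass (with no adduct or charge correction) are hypothetical simplifications.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

# Monoisotopic masses (Da) of a few common elements.
MONOISOTOPIC = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915}

@dataclass
class Detection:
    composition: dict   # element -> atom count of the detected fragment
    raw_score: float    # unconstrained network output (hypothetical)

def softplus(x):
    """Numerically stable map from any real number to a positive intensity."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def fragment_mass(composition):
    """Neutral monoisotopic mass computed from the composition alone."""
    return sum(MONOISOTOPIC[el] * n for el, n in composition.items())

def assemble_spectrum(detections, decimals=2):
    """Sum non-negative contributions that land on the same rounded mass."""
    spectrum = defaultdict(float)
    for det in detections:
        mass = round(fragment_mass(det.composition), decimals)
        spectrum[mass] += softplus(det.raw_score)
    return dict(spectrum)

detections = [
    Detection({"C": 2, "H": 4, "O": 1}, 1.5),
    Detection({"C": 2, "H": 4, "O": 1}, -0.5),  # same formula, another subgraph
    Detection({"C": 1, "H": 3, "N": 1}, 0.2),
]
spectrum = assemble_spectrum(detections)
```

Because the mass is derived from the composition and every contribution passes through a positive activation before summation, the three properties hold by construction in this toy version. Whether the actual model enforces them the same way is exactly what the report asks the authors to document.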
major comments (3)
- [Abstract] Abstract: The central claims of improved accuracy and 'globally consistent modeling' without candidate enumeration rest on empirical numbers that lack error bars, dataset split details, or ablations demonstrating that detected objects are induced subgraphs and contributions are non-negative/additive/sum-to-spectrum; this directly affects the validity of the single-stage advantage over two-stage baselines.
- [Abstract] Abstract and methods (implied architecture): No mechanism is described for enforcing that every detected object is an induced subgraph of the input molecular graph or that predicted intensities are consistent (e.g., non-negative, correct m/z, globally additive); without this, the 'no-enumeration' and 'globally consistent' claims reduce to an unverified approximation rather than a sound reformulation.
- [Abstract] Abstract: The reported 8-fold speedup and accuracy gains are presented without comparison to the two-stage baseline under matched training data, hyperparameters, or hardware, making it impossible to isolate whether gains derive from the object-detection reformulation or from other implementation differences.
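The validity question raised in these comments can be made concrete with a post hoc check. Any subset of atoms defines an induced subgraph by keeping every bond whose two endpoints both lie in the subset, so the non-trivial property to test is whether that subgraph is connected. The following is a minimal sketch on a plain bond list. The graph representation and function names are assumptions for illustration, not the paper's code.

```python
from collections import deque

def induced_edges(bonds, atoms):
    """Bonds of the induced subgraph: both endpoints lie in `atoms`."""
    atoms = set(atoms)
    return [(u, v) for u, v in bonds if u in atoms and v in atoms]

def is_connected_fragment(bonds, atoms):
    """True if `atoms` is non-empty and its induced subgraph is connected."""
    atoms = set(atoms)
    if not atoms:
        return False
    neighbors = {a: set() for a in atoms}
    for u, v in induced_edges(bonds, atoms):
        neighbors[u].add(v)
        neighbors[v].add(u)
    # Breadth-first search from an arbitrary atom of the subset.
    start = next(iter(atoms))
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen == atoms

# Heavy atoms of ethanol as a toy graph: C0-C1-O2.
bonds = [(0, 1), (1, 2)]
valid = is_connected_fragment(bonds, {0, 1})    # bonded pair
invalid = is_connected_fragment(bonds, {0, 2})  # no bond between 0 and 2
```

Running such a check over every detection in a test set would give the fraction of predicted fragments that are chemically plausible connected subgraphs, which is one form the requested ablation could take.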
minor comments (1)
- [Abstract] The abstract mentions 'contrastive finetuning' but provides no details on the contrastive loss formulation or how it interacts with the detection head.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which highlight important aspects of reproducibility and methodological clarity. We address each major comment below with clarifications and indicate where revisions will be made to the manuscript.
- Referee: [Abstract] Abstract: The central claims of improved accuracy and 'globally consistent modeling' without candidate enumeration rest on empirical numbers that lack error bars, dataset split details, or ablations demonstrating that detected objects are induced subgraphs and contributions are non-negative/additive/sum-to-spectrum; this directly affects the validity of the single-stage advantage over two-stage baselines.
  Authors: We agree that error bars, explicit dataset split details, and supporting ablations would strengthen the presentation. In the revised manuscript we will report standard deviations across multiple random seeds for the Top-1 accuracies, provide the precise train/validation/test splits for both MassSpecGym and NIST'20, and add a dedicated analysis subsection that verifies detected objects correspond to induced subgraphs and that predicted contributions are non-negative, m/z-consistent, and globally additive.
  Revision: yes
- Referee: [Abstract] Abstract and methods (implied architecture): No mechanism is described for enforcing that every detected object is an induced subgraph of the input molecular graph or that predicted intensities are consistent (e.g., non-negative, correct m/z, globally additive); without this, the 'no-enumeration' and 'globally consistent' claims reduce to an unverified approximation rather than a sound reformulation.
  Authors: The current manuscript describes the graph-transformer architecture at a high level but does not explicitly detail the enforcement mechanisms. We will expand the Methods section to clarify that node-subset predictions are constrained to connected induced subgraphs by construction (via the graph attention and masking scheme) and that intensity outputs are passed through a non-negativity activation while m/z values are computed directly from subgraph composition, with global additivity obtained by summation over all detections.
  Revision: yes
- Referee: [Abstract] Abstract: The reported 8-fold speedup and accuracy gains are presented without comparison to the two-stage baseline under matched training data, hyperparameters, or hardware, making it impossible to isolate whether gains derive from the object-detection reformulation or from other implementation differences.
  Authors: The reported speedup is measured on identical hardware against our previously published two-stage model; however, we acknowledge that a fully matched training comparison would better isolate the contribution of the single-stage reformulation. In the revision we will add a paragraph clarifying the sources of the observed differences and, where computationally feasible, include a controlled re-training experiment under matched hyperparameters.
  Revision: partial
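Reporting error bars of the kind promised in the first response is straightforward once per-seed results exist. The sketch below uses only the standard library. The accuracy values are made-up placeholders for illustration, not results from the paper.

```python
import statistics

def summarize(per_seed_accuracy):
    """Mean and sample standard deviation across random seeds."""
    mean = statistics.mean(per_seed_accuracy)
    std = statistics.stdev(per_seed_accuracy)  # requires at least two seeds
    return mean, std

# Placeholder values for illustration only; not taken from the paper.
placeholder_runs = [0.60, 0.62, 0.64]
mean, std = summarize(placeholder_runs)
report = f"{100 * mean:.1f} ± {100 * std:.1f} % Top-1"
```

The sample standard deviation (dividing by n - 1) is the usual choice when only a handful of seeds are available.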
Circularity Check
Minor self-reference to prior model; central results are empirical benchmarks on external data
The paper's accuracy claims (70.0% and 52.5% Top-1 retrieval) are measured directly on public external datasets (MassSpecGym, NIST'20) against prior SOTA numbers, not derived from or reduced to the authors' own equations or fitted parameters. The single self-reference to 'our prior two-stage model' appears only in the speedup claim and is not load-bearing for the reformulation or accuracy results. No self-definitional steps, fitted-input predictions, or uniqueness theorems imported from overlapping authors are present in the abstract or described claims.
Axiom & Free-Parameter Ledger
free parameters (1)
- model hyperparameters and training schedule
axioms (2)
- domain assumption: Molecular fragmentation can be approximated as detecting a set of subgraphs and their spectral contributions
- domain assumption: A single forward pass through a transformer on the molecular graph yields globally consistent fragment predictions