The Language of Elution: Autoregressive Prediction of the Next Feature in Untargeted LC-HRMS Lipidomics

Dayanjan S. Wijesinghe

arxiv: 2606.05225 · v1 · pith:UJVVN3AUnew · submitted 2026-06-02 · 🧬 q-bio.QM · cs.LG

The Language of Elution: Autoregressive Prediction of the Next Feature in Untargeted LC-HRMS Lipidomics

Dayanjan S. Wijesinghe This is my paper

Pith reviewed 2026-06-28 07:45 UTC · model grok-4.3

classification 🧬 q-bio.QM cs.LG

keywords lipidomicsLC-HRMSautoregressive predictionelution orderLSTMTransformeruntargeted metabolomicsmass spectrometry

0 comments

The pith

Reversed-phase elution forms a predictable autoregressive sequence that models forecast from per-feature statistics alone at 98 percent top-1 accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reframes chromatographic elution order as a language-like autoregressive prediction problem. Successive features in reversed-phase LC-HRMS form a physically constrained sequence driven by hydrophobicity, so the next m/z bin can be predicted from five annotation-free tokens: m/z bin, mass defect, retention-time gap, polarity, and intensity rank. LSTM and Transformer models trained on 15,242 features from 342 plasma samples reach 98.4 percent and 98.0 percent top-1 accuracy on held-out data, with ablation showing the sequential context supplies 55.5 percentage points of that performance. Accuracy holds across instruments that share the method but drops sharply with changed column chemistry or polarity, and recovers with two to five fine-tuning injections. The work therefore claims that elution sequences are learnable without chemical identity or databases and can support predictive MS/MS acquisition.

Core claim

Treating elution as an autoregressive sequence task, LSTM and Transformer models trained on annotation-free per-feature statistics from four lipidomics cohorts predict the next m/z bin at 98.4 percent top-1 accuracy (99.99 percent top-5) and 98.0 percent respectively; ablation isolates autoregressive context as the dominant driver while no individual feature adds more than 0.2 percentage points, and transfer succeeds only under matched method and polarity.

What carries the argument

Autoregressive sequence models (LSTM and Transformer) that treat discretized m/z bins as tokens and predict the next bin from the five annotation-free per-token features.

If this is right

Elution sequences are highly predictable from sequential context rather than individual molecular properties.
Models trained on one instrument transfer with near-perfect correlation to another instrument using the identical chromatographic method.
Accuracy collapses under altered column chemistry or polarity mode, confirming the prediction is method- and mode-specific.
Fine-tuning on two to five quality-control injections restores nearly 50 percent top-1 accuracy after a condition change.
The approach supplies a route to predictive rather than reactive MS/MS acquisition that could raise annotation coverage in untargeted lipidomics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If elution order is this predictable, an instrument could pre-select the next precursor for fragmentation before the ion appears, directly addressing the reactive acquisition bottleneck.
Each distinct LC method appears to possess its own characteristic elution grammar that must be learned separately.
Because training uses only per-feature statistics, the same modeling strategy could be applied to other untargeted metabolomics datasets without requiring spectral libraries or structural annotations.

Load-bearing premise

That reversed-phase elution order forms a physically constrained autoregressive sequence governed primarily by hydrophobicity that can be learned from annotation-free per-feature statistics without reference to chemical identity.

What would settle it

Top-1 accuracy falling below 10 percent on a held-out set acquired with the same instrument but a different reversed-phase column chemistry would falsify the claim that the sequence is learnable and method-specific in the stated way.

Figures

Figures reproduced from arXiv: 2606.05225 by Dayanjan S. Wijesinghe.

**Figure 1.** Figure 1: Study overview. (A) Current LC-MS/MS acquisition is reactive: the instrument detects ions, then selects precursors for fragmentation. Predictive acquisition would forecast upcoming features before they elute, enabling pre-configured MS/MS parameters. (B) Tokenization scheme. Each detected feature is encoded as a composite token with five input fields derived from the feature table, none of which require st… view at source ↗

**Figure 2.** Figure 2: Physicochemical basis of elution order in reversed-phase lipidomics. (A) Retention time distributions for 12 lipid classes across all four training cohorts (n = 1,498 annotated features). Classes are ordered from most polar (acylcarnitine, left) to most hydrophobic (cholesteryl ester, right). Horizontal bar indicates median; individual points are jittered within each violin. Note the extensive overlap betw… view at source ↗

**Figure 3.** Figure 3: Model performance and dataset context. (A) Top-1 accuracy on next-m/z-bin prediction (110 classes, test set; neural models n = 182,902, baselines n = 186,052). The LSTM (98.38%) and Transformer (98.05%) far exceed all baselines, with the joint (RT, m/z) Markov model (56.8%) as the strongest non-neural comparator. Error bars indicate 95% bootstrap confidence intervals. (B) Training dynamics. Validation top-… view at source ↗

**Figure 4.** Figure 4: Feature ablation and data efficiency. (A) Accuracy drop from the full model when each input feature is removed via embedding zeroing (blue bars) or when autoregressive context is eliminated (red bar, context window = 1 token). Removing sequence context causes a 55.5 percentage-point drop; no individual feature contributes more than 0.19 pp. (B) Data efficiency curve. Top-1 accuracy as a function of trainin… view at source ↗

**Figure 5.** Figure 5: QC warm-up experiment: a null result. (A) Dose–response analysis. Top-1 accuracy as a function of the number of QC injection sequences used to condition the LSTM hidden state before evaluation (N = 0–10). The curve is flat, with no benefit from additional QC conditioning (maximum delta = 0.001%). (B) Comparison of four conditioning modes: cold start, prime-only (use QC-conditioned hidden state for first pr… view at source ↗

**Figure 6.** Figure 6: Cross-platform validation: models are chromatographic-method-specific. (A) Retention time correlation between the training cohorts and ST000983 (Agilent 6530 QTOF, same CSH C18 chromatographic method). Each point represents one of 242 matched lipids (r = 0.9993, MAE = 4.7 s). The nearperfect correlation demonstrates that elution order transfers across mass spectrometer platforms when the chromatographic … view at source ↗

**Figure 7.** Figure 7: Transfer learning recovers cross-polarity performance. Held-out analytical top-1 accuracy on ST000990 (positive-ESI only) as a function of the number of QC calibration injections N, for full finetuning (all parameters), head-only fine-tuning (output layer only), and a calibration-only Markov prior. Full fine-tuning recovers from the 2.6% zero-shot floor (dashed) toward ∼50%—and to 99.6% on a held-out QC i… view at source ↗

**Figure 8.** Figure 8: Dark lipidome annotation via dual mass and retention time filtering. (A) Filtering funnel. Of 13,266 unannotated consensus features, approximately 4,700 fell within the m/z range covered by the LipidMaps and LipidBlast reference databases. Dual filtering (±10 ppm mass tolerance, ±0.5 min RT tolerance) recovered 168 unique putative lipid annotations. (B) Distribution of recovered annotations (colored point… view at source ↗

**Figure 9.** Figure 9: Position-dependent prediction accuracy. Top-1 accuracy on next-m/z-bin prediction as a function of sequence position (token index within the chromatographic run) for the LSTM (blue line), averaged across all test samples with a 50-token moving average. Accuracy is approximately 98% and uniform across the entire run, with no systematic degradation at early positions (where the context window is not yet ful… view at source ↗

read the original abstract

Untargeted liquid chromatography-high-resolution mass spectrometry (LC-HRMS) detects thousands of molecular features per sample, yet only 2-20% receive confident structural annotations. A root cause of this "dark metabolome" is that tandem MS/MS acquisition is reactive: instruments select precursors only after ions appear, blind to what elutes next. We reframe chromatographic elution as an autoregressive sequence prediction task. Because reversed-phase elution order is governed by hydrophobicity, successive features form a physically constrained sequence, like tokens in language. We discretize the mass-to-charge (m/z) axis into 110 bins and train long short-term memory (LSTM) and Transformer models to predict the next eluting m/z bin from five annotation-free per-token features: m/z bin, mass defect, retention-time gap, polarity, and intensity rank. Trained on 15,242 features from four clinical lipidomics cohorts (342 plasma samples; SCIEX TripleTOF 6600+, Waters CSH C18), the LSTM reaches 98.4% top-1 accuracy (99.99% top-5; mean absolute error 3.6 Da) and the Transformer 98.0%. Ablation shows autoregressive context accounts for 55.5 percentage points while no single feature contributes more than 0.2 pp: the sequential pattern, not molecular properties, drives prediction. Models transfer across instruments sharing the method (r=0.999 on an independent Agilent 6530 dataset) but fail under a different column chemistry (5.1% top-1) or polarity mode (2.6%), confirming method- and mode-specificity. Fine-tuning on as few as two to five quality-control injections recovers held-out accuracy from 2.6% to nearly 50%, so cross-condition deployment needs minimal calibration. These results establish that elution sequences are highly predictable and lay the groundwork for predictive MS/MS acquisition to improve annotation coverage in untargeted metabolomics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

High reported accuracy on elution prediction but intensity rank is non-causal so the model does not actually support the online predictive MS/MS use case.

read the letter

The paper treats reversed-phase elution order as an autoregressive sequence and trains LSTM and Transformer models to predict the next m/z bin from five annotation-free features. They get 98.4% top-1 accuracy on held-out features from four cohorts and show that removing the autoregressive context drops performance by 55 points while dropping any single feature costs almost nothing.

The transfer results across instruments on the same column chemistry and the recovery after fine-tuning on two to five QC injections are the parts that actually look useful. Those tests give some evidence that the pattern is reproducible within a method.

The problem is the intensity rank feature. In the online regime needed for predictive acquisition you only see the first k features, so their ranks are not yet fixed because stronger peaks may still be coming. The other four features are causal, but rank is not, and the paper does not show results without it. That undercuts the claim that the sequence can be learned from per-feature statistics in the setting that matters.

Methods details on data splits, exact feature computation, and whether rank was computed globally or within the prefix are not visible here, so leakage or post-hoc choices cannot be ruled out. The failure on different column chemistry or polarity is expected and not a flaw.

This is for metabolomics groups trying to move from reactive to predictive MS/MS. A reader who wants to test sequence models on chromatography data would find the framing and numbers worth checking, but the causality gap needs to be closed first.

Send it to review so the authors can demonstrate performance with only causal features or clarify how rank would be handled in deployment.

Referee Report

2 major / 1 minor

Summary. The paper reframes untargeted LC-HRMS lipidomics elution as an autoregressive sequence prediction task. It discretizes m/z into 110 bins and trains LSTM and Transformer models on five annotation-free per-feature statistics (m/z bin, mass defect, RT gap, polarity, intensity rank) from 15,242 features across 342 plasma samples. The LSTM achieves 98.4% top-1 accuracy (99.99% top-5, MAE 3.6 Da) on held-out data; ablation attributes 55.5 pp to autoregressive context. Models transfer across instruments with the same method (r=0.999) but not column chemistry or polarity, and fine-tuning on 2-5 QC injections recovers performance. The work aims to enable predictive MS/MS acquisition.

Significance. If the central empirical result holds under causal feature definitions, the high accuracies, the dominance of sequential context over individual features, the method-specific transfer, and the minimal-calibration fine-tuning results would be notable strengths. They provide concrete evidence that elution order is highly predictable from annotation-free statistics and directly support the motivating application of proactive MS/MS triggering to increase annotation coverage. The transfer and few-shot adaptation experiments are particularly useful for practical deployment.

major comments (2)

[Abstract and Results (feature definition and ablation)] Abstract (feature list) and Results (ablation and accuracy claims): Intensity rank is defined as a feature's position in the global sorted intensity list of all detected features in the sample. In the autoregressive regime required for predictive MS/MS (predicting the (k+1)th feature after observing only the first k), the ranks of the observed features are not yet determined because later-eluting features may have higher intensities and reorder the ranking. This renders the feature non-causal and unavailable at inference time. The reported 98.4% top-1 accuracy, 99.99% top-5, and the 55.5 pp ablation gain for autoregressive context therefore rest on a statistic that cannot be computed in the online setting the paper targets. The other four features are causal, but the inclusion of intensity rank undermines the claim that the sequence can be learned from annotation-free per-feature statist
[Methods and Results (transfer and ablation)] Methods (model training and evaluation) and Results (transfer tests): The paper reports strong transfer (r=0.999) only across instruments sharing the identical reversed-phase method and failure (5.1% and 2.6% top-1) under changed column chemistry or polarity. While this correctly demonstrates method specificity, the evaluation does not include a controlled test of the model under strictly causal feature computation (i.e., intensity ranks computed only from features observed so far). Such a test is required to establish whether the high accuracies survive removal or causal approximation of the non-causal feature.

minor comments (1)

[Methods] The discretization into exactly 110 m/z bins and the precise definition of mass defect and RT gap should be stated with explicit formulas or bin boundaries in the Methods section for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and for pointing out the critical issue with the causality of the intensity rank feature in our autoregressive model. This comment is well-taken and directly impacts the applicability to the predictive MS/MS use case. We address the points below and will revise the manuscript accordingly.

read point-by-point responses

Referee: Abstract (feature list) and Results (ablation and accuracy claims): Intensity rank is defined as a feature's position in the global sorted intensity list of all detected features in the sample. In the autoregressive regime required for predictive MS/MS (predicting the (k+1)th feature after observing only the first k), the ranks of the observed features are not yet determined because later-eluting features may have higher intensities and reorder the ranking. This renders the feature non-causal and unavailable at inference time. The reported 98.4% top-1 accuracy, 99.99% top-5, and the 55.5 pp ablation gain for autoregressive context therefore rest on a statistic that cannot be computed in the online setting the paper targets. The other four features are causal, but the inclusion of intensity rank undermines the claim that the sequence can be learned from annotation-free per-feature statist

Authors: We agree with this assessment. The global intensity rank is indeed non-causal for online prediction. However, our ablation analysis indicates that individual features contribute minimally (≤0.2 pp), with the autoregressive context providing the majority of the predictive power (55.5 pp). We will revise the manuscript to remove intensity rank from the feature set and re-train/evaluate the models to demonstrate that performance is largely preserved. Additionally, we will include an ablation using a causal intensity rank (computed only from features observed up to the current point) to further validate the results under strictly causal conditions. These changes will be reflected in the revised version. revision: yes
Referee: Methods (model training and evaluation) and Results (transfer tests): The paper reports strong transfer (r=0.999) only across instruments sharing the identical reversed-phase method and failure (5.1% and 2.6% top-1) under changed column chemistry or polarity. While this correctly demonstrates method specificity, the evaluation does not include a controlled test of the model under strictly causal feature computation (i.e., intensity ranks computed only from features observed so far). Such a test is required to establish whether the high accuracies survive removal or causal approximation of the non-causal feature.

Authors: We acknowledge that the reported transfer and ablation results were obtained using the original non-causal feature set. To address this, we will perform additional experiments in the revised manuscript where intensity rank is either omitted or replaced with its causal counterpart. We will report accuracies, ablations, and transfer performance under these causal feature definitions to confirm the robustness of our findings. This will directly test whether the central empirical results hold without the non-causal feature. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical ML evaluation on held-out data is self-contained

full rationale

The paper reports LSTM/Transformer accuracies (98.4% top-1) and ablations directly from training and testing on held-out features extracted from real LC-HRMS runs. No equations, parameter fits, or self-citations are used to derive the accuracy numbers; they are measured outcomes on unseen sequences. The five input features are computed from the data and supplied to the model; the reported performance is therefore an external empirical result rather than a reduction to the inputs by construction. Transfer tests on an independent instrument further provide an external check. The intensity-rank feature may limit online applicability, but that is a correctness or deployment issue, not a circularity in the derivation chain.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The approach rests on the assumption that hydrophobicity imposes a strong sequential constraint on elution order that is learnable from local per-feature statistics. No new physical entities are postulated. The 110-bin discretization and choice of five features are modeling decisions rather than fitted constants.

free parameters (2)

m/z bin count
110 bins chosen to discretize the m/z axis for sequence modeling.
context length
Five prior tokens used as input; value not justified beyond empirical performance.

axioms (2)

domain assumption Reversed-phase elution order is governed by hydrophobicity and forms a physically constrained sequence.
Invoked in the abstract to justify the autoregressive framing.
domain assumption Annotation-free per-token features suffice to capture the sequential pattern.
Stated via the ablation result that context dominates over individual features.

pith-pipeline@v0.9.1-grok · 5908 in / 1446 out tokens · 15884 ms · 2026-06-28T07:45:05.476337+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

68 extracted references · 59 canonical work pages · 1 internal anchor

[1]

Lipidomics reveals a remarkable diversity of lipids in human plasma.J Lipid Res, 51(11):3299–3305, 2010

Oswald Quehenberger, Aaron M Armando, Alex H Brown, et al. Lipidomics reveals a remarkable diversity of lipids in human plasma.J Lipid Res, 51(11):3299–3305, 2010. doi: 10.1194/jlr.M009449

work page doi:10.1194/jlr.m009449 2010
[2]

Understanding the diversity of membrane lipid composition

Takeshi Harayama and Howard Riezman. Understanding the diversity of membrane lipid composition. Nat Rev Mol Cell Biol, 19(5):281–296, 2018. doi: 10.1038/nrm.2017.138

work page doi:10.1038/nrm.2017.138 2018
[3]

The mystery of membrane organization: composition, regulation and roles of lipid rafts.Nat Rev Mol Cell Biol, 18(6):361–374,

Erdinc Sezgin, Ilya Levental, Satyajit Mayor, and Christian Eggeling. The mystery of membrane organization: composition, regulation and roles of lipid rafts.Nat Rev Mol Cell Biol, 18(6):361–374,
[4]

doi: 10.1038/nrm.2017.16

work page doi:10.1038/nrm.2017.16 2017
[5]

Olzmann and Pedro Carvalho

James A. Olzmann and Pedro Carvalho. Dynamics and functions of lipid droplets.Nat Rev Mol Cell Biol, 20(3):137–155, 2019. doi: 10.1038/s41580-018-0085-z

work page doi:10.1038/s41580-018-0085-z 2019
[6]

Dennis and Paul C

Edward A. Dennis and Paul C. Norris. Eicosanoid storm in infection and inflammation.Nat Rev Immunol, 15(8):511–523, 2015. doi: 10.1038/nri3859

work page doi:10.1038/nri3859 2015
[7]

Hannun and Lina M

Yusuf A. Hannun and Lina M. Obeid. Sphingolipids and their metabolism in physiology and disease. Nat Rev Mol Cell Biol, 19(3):175–191, 2018. doi: 10.1038/nrm.2017.107

work page doi:10.1038/nrm.2017.107 2018
[8]

Wymann and Roger Schneiter

Matthias P. Wymann and Roger Schneiter. Lipid signalling in disease.Nat Rev Mol Cell Biol, 9(2): 162–176, 2008. doi: 10.1038/nrm2335

work page doi:10.1038/nrm2335 2008
[9]

Buring, Lina Badimon, Göran K

Peter Libby, Julie E. Buring, Lina Badimon, Göran K. Hansson, John Deanfield, Márcio S. Bittencourt, Lale Tokgözo˘glu, and Eldrin F. Lewis. Atherosclerosis.Nat Rev Dis Primers, 5(1):56, 2019. doi: 10.1038/s41572-019-0106-z

work page doi:10.1038/s41572-019-0106-z 2019
[10]

Lydic and Young-Hwa Goo

Todd A. Lydic and Young-Hwa Goo. Lipidomics unveils the complexity of the lipidome in metabolic diseases.Clinical and Translational Medicine, 7, 2018. doi: 10.1186/s40169-018-0182-9. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5786598/

work page doi:10.1186/s40169-018-0182-9 2018
[11]

Lipidomes in health and dis- ease: Analytical strategies and considerations.TrAC Trends in Analytical Chemistry, 120:115664,

Fang Wei, Santosh Lamichhane, Matej Oreši ˇc, and Tuulia Hyötyläinen. Lipidomes in health and dis- ease: Analytical strategies and considerations.TrAC Trends in Analytical Chemistry, 120:115664,
[12]

URLhttps://www.sciencedirect.com/science/ article/pii/S0165993619303723

doi: 10.1016/j.trac.2019.115664. URLhttps://www.sciencedirect.com/science/ article/pii/S0165993619303723

work page doi:10.1016/j.trac.2019.115664 2019
[13]

Smilowitz, and Oliver Fiehn

Tomas Cajka, Jennifer T. Smilowitz, and Oliver Fiehn. Validating quantitative untargeted lipidomics across nine liquid chromatography–high-resolution mass spectrometry platforms.Anal Chem, 89(22): 12360–12368, 2017. doi: 10.1021/acs.analchem.7b03404

work page doi:10.1021/acs.analchem.7b03404 2017
[14]

Patti, Oscar Yanes, and Gary Siuzdak

Gary J. Patti, Oscar Yanes, and Gary Siuzdak. Metabolomics: the apogee of the omic trilogy.Nat Rev Mol Cell Biol, 13(4):263–269, 2012. doi: 10.1038/nrm3314

work page doi:10.1038/nrm3314 2012
[15]

Metabolomics insights into pathophysiological mechanisms of inter- stitial cystitis.International Neurourology Journal, 18(3):106–114, 2014

Oliver Fiehn and Jayoung Kim. Metabolomics insights into pathophysiological mechanisms of inter- stitial cystitis.International Neurourology Journal, 18(3):106–114, 2014. doi: 10.5213/inj.2014.18.3

work page doi:10.5213/inj.2014.18.3 2014
[16]

URLhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC4180160/
[17]

Schymanski, Junho Jeon, Rebekka Gulde, Kathrin Fenner, Martin Ruff, Heinz P

Emma L. Schymanski, Junho Jeon, Rebekka Gulde, Kathrin Fenner, Martin Ruff, Heinz P. Singer, and Juliane Hollender. Identifying small molecules via high resolution mass spectrometry: Communicating confidence.Environmental Science & Technology, 48(4):2097–2098, 2014. doi: 10.1021/es5002105

work page doi:10.1021/es5002105 2097
[18]

Metabo- lite annotation from knowns to unknowns through knowledge-guided multi-layer metabolic network- 32 ing.Nature Communications, 13(1), 2022

Zhiwei Zhou, Mingdu Luo, Haosong Zhang, Yandong Yin, Yuping Cai, and Zheng-Jiang Zhu. Metabo- lite annotation from knowns to unknowns through knowledge-guided multi-layer metabolic network- 32 ing.Nature Communications, 13(1), 2022. doi: 10.1038/s41467-022-34537-6

work page doi:10.1038/s41467-022-34537-6 2022
[19]

Neural networks and physical systems with emergent collective com- putational abilities

Ricardo R. da Silva, Pieter C. Dorrestein, and Robert A. Quinn. Illuminating the dark matter in metabolomics.Proc Natl Acad Sci U S A, 112(41):12549–12550, 2015. doi: 10.1073/pnas. 1516878112

work page doi:10.1073/pnas 2015
[20]

Dodds, Erin S

María Eugenia Monge, James N. Dodds, Erin S. Baker, Arthur S. Edison, and Facundo M. Fernández. Challenges in identifying the dark molecules of life.Annu Rev Anal Chem, 12(1):177–199, 2019. doi: 10.1146/annurev-anchem-061318-114959

work page doi:10.1146/annurev-anchem-061318-114959 2019
[21]

Jones, and Karan Uppal

Ihab Hajjar, Chang Liu, Dean P. Jones, and Karan Uppal. Untargeted metabolomics reveal dysregula- tions in sugar, methionine, and tyrosine pathways in the prodromal state of AD.Alzheimers Dement (Amst), 12(1):e12064, 2020. doi: 10.1002/dad2.12064

work page doi:10.1002/dad2.12064 2020
[22]

Metabolite discovery through global annotation of untargeted metabolomics data.Nat Methods, 18(11):1377–1385, 2021

Li Chen, Wenyun Lu, Lin Wang, et al. Metabolite discovery through global annotation of untargeted metabolomics data.Nat Methods, 18(11):1377–1385, 2021. doi: 10.1038/s41592-021-01303-3

work page doi:10.1038/s41592-021-01303-3 2021
[23]

Carver, Vanessa V

Mingxun Wang, Jeremy J. Carver, Vanessa V . Phelan, Laura M. Sanchez, Neha Garg, Yao Peng, et al. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking.Nat Biotechnol, 34(8):828–837, 2016. doi: 10.1038/nbt.3597

work page doi:10.1038/nbt.3597 2016
[24]

MassBank: a public repository for sharing mass spectral data for life sciences.J Mass Spectrom, 45(7):703–714, 2010

Hisayuki Horai, Masanori Arita, Shigehiko Kanaya, et al. MassBank: a public repository for sharing mass spectral data for life sciences.J Mass Spectrom, 45(7):703–714, 2010. doi: 10.1002/jms.1777

work page doi:10.1002/jms.1777 2010
[25]

SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information.Nat Methods, 16(4):299–302, 2019

Kai Dührkop, Markus Fleischauer, Marcus Ludwig, et al. SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information.Nat Methods, 16(4):299–302, 2019. doi: 10.1038/ s41592-019-0344-8

2019
[26]

Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra.Nat Biotechnol, 39(4):462–471, 2021

Kai Dührkop, Louis-Félix Nothias, Markus Fleischauer, et al. Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra.Nat Biotechnol, 39(4):462–471, 2021. doi: 10.1038/s41587-020-0740-8

work page doi:10.1038/s41587-020-0740-8 2021
[27]

Florian Huber, Sven van der Burg, Justin J. J. van der Hooft, and Lars Ridder. MS2DeepScore: a novel deep learning similarity measure to compare tandem mass spectra.J Cheminform, 13(1):84, 2021. doi: 10.1186/s13321-021-00558-4

work page doi:10.1186/s13321-021-00558-4 2021
[28]

CFM-ID: a web server for annotation, spectrum prediction and metabolite identification from tandem mass spectra.Nucleic Acids Res, 42(W1):W94–W99, 2014

Felicity Allen, Allison Pon, Michael Wilson, Russ Greiner, and David Wishart. CFM-ID: a web server for annotation, spectrum prediction and metabolite identification from tandem mass spectra.Nucleic Acids Res, 42(W1):W94–W99, 2014. doi: 10.1093/nar/gku436

work page doi:10.1093/nar/gku436 2014
[29]

Retip: Re- tention time prediction for compound annotation in untargeted metabolomics.Analytical Chemistry, 92(11):7515–7522, 2020

Paolo Bonini, Tobias Kind, Hiroshi Tsugawa, Dinesh Kumar Barupal, and Oliver Fiehn. Retip: Re- tention time prediction for compound annotation in untargeted metabolomics.Analytical Chemistry, 92(11):7515–7522, 2020. doi: 10.1021/acs.analchem.9b05765. URLhttps://pubs.acs.org/doi/ 10.1021/acs.analchem.9b05765

work page doi:10.1021/acs.analchem.9b05765 2020
[30]

RT-Transformer: retention time prediction for metabolite annotation to assist in metabolite identification.Bioinformatics, 40(3):btae084, 2024

Robin Schmid, Steffen Heuckeroth, Ansgar Korf, et al. RT-Transformer: retention time prediction for metabolite annotation to assist in metabolite identification.Bioinformatics, 40(3):btae084, 2024. doi: 10.1093/bioinformatics/btae084

work page doi:10.1093/bioinformatics/btae084 2024
[31]

Current status of retention time prediction in metabolite identi- fication.Journal of Separation Science, 43(9-10):1746–1754, 2020

Michael Witting and Sebastian Böcker. Current status of retention time prediction in metabolite identi- fication.Journal of Separation Science, 43(9-10):1746–1754, 2020. doi: https://doi.org/10.1002/jssc. 202000060. URLhttps://onlinelibrary.wiley.com/doi/abs/10.1002/jssc.202000060. 33

work page doi:10.1002/jssc 2020
[32]

Schymanski, and Juho Rousu

Eric Bach, Emma L. Schymanski, and Juho Rousu. Joint structural annotation of small molecules using liquid chromatography retention order and tandem mass spectrometry data.Nature Machine Intelligence, 4(12):1224–1237, 2022. doi: 10.1038/s42256-022-00577-2

work page doi:10.1038/s42256-022-00577-2 2022
[33]

Roasmi: accelerating small molecule identification by repurposing retention data.Journal of Cheminformatics, 17(1), 2025

Fang-Yuan Sun, Ying-Hao Yin, Hui-Jun Liu, Lu-Na Shen, Xiu-Lin Kang, Gui-Zhong Xin, Li-Fang Liu, and Jia-Yi Zheng. Roasmi: accelerating small molecule identification by repurposing retention data.Journal of Cheminformatics, 17(1), 2025. doi: 10.1186/s13321-025-00968-8

work page doi:10.1186/s13321-025-00968-8 2025
[34]

Eight key rules for successful data-dependent acquisition in mass spectrometry-based metabolomics.Mass Spectrometry Reviews, 42(1):131–143, 2021

Emmanuel Defossez, Julien Bourquin, Stephan von Reuss, Sergio Rasmann, and Gaétan Glauser. Eight key rules for successful data-dependent acquisition in mass spectrometry-based metabolomics.Mass Spectrometry Reviews, 42(1):131–143, 2021. doi: 10.1002/mas.21715

work page doi:10.1002/mas.21715 2021
[35]

Hanane El Boudlali, Laura Lehmicke, and Uta Ceglarek. High-resolution accurate mass- mass spec- trometry based- untargeted metabolomics: Reproducibility and detection power across data-dependent acquisition, data-independent acquisition, and acquirex.Computational and Structural Biotechnology Journal, 27:2412–2423, 2025. doi: 10.1016/j.csbj.2025.05.046

work page doi:10.1016/j.csbj.2025.05.046 2025
[36]

Cooper, Ruohong Yang, et al

Brianna T. Cooper, Ruohong Yang, et al. An assessment of acquirex and compound discoverer software 3.3 for non-targeted metabolomics.Scientific Reports, 2024. doi: 10.1038/s41598-024-55356-3

work page doi:10.1038/s41598-024-55356-3 2024
[37]

Vimms 2.0: A framework to develop, test and optimise fragmentation strategies in lc-ms metabolomics.Journal of Open Source Software, 7(71):3990, 2022

Joe Wandy, Vinny Davies, Ross McBride, Stefan Weidt, Simon Rogers, and Rónán Daly. Vimms 2.0: A framework to develop, test and optimise fragmentation strategies in lc-ms metabolomics.Journal of Open Source Software, 7(71):3990, 2022. doi: 10.21105/joss.03990

work page doi:10.21105/joss.03990 2022
[38]

Flashida enables intelligent data acquisition for top-down proteomics to boost proteoform identification counts.Nature Communications, 13:4407, 2022

Kyowon Jeong, Author Corresponding, et al. Flashida enables intelligent data acquisition for top-down proteomics to boost proteoform identification counts.Nature Communications, 13:4407, 2022. doi: 10.1038/s41467-022-31922-z

work page doi:10.1038/s41467-022-31922-z 2022
[39]

Maxquant.live enables global targeting of more than 25,000 peptides.Molecular & Cellular Proteomics, 18(5):982a–994, 2019

Christoph Wichmann, Florian Meier, Sebastian Virreira Winter, Andreas-David Brunner, Jürgen Cox, and Matthias Mann. Maxquant.live enables global targeting of more than 25,000 peptides.Molecular & Cellular Proteomics, 18(5):982a–994, 2019. doi: 10.1074/mcp.tir118.001131

work page doi:10.1074/mcp.tir118.001131 2019
[40]

Barshop, Seema Sharma, Jesse Canterbury, Aaron M

Brandon Bills, William D. Barshop, Seema Sharma, Jesse Canterbury, Aaron M. Robitaille, Michael Goodwin, Michael W. Senko, and Vlad Zabrouskov. Novel real-time library search driven data ac- quisition strategy for identification and characterization of metabolites.Analytical Chemistry, 94(9): 3749–3755, 2022. doi: 10.1021/acs.analchem.1c04336

work page doi:10.1021/acs.analchem.1c04336 2022
[41]

The Journal of Physical Chemistry A125(17), 3776–3784 (2021) https://doi.org/10.1021/acs

Lilian R. Heil, Philip M. Remes, Jesse D. Canterbury, Ping Yip, William D. Barshop, Christine C. Wu, and Michael J. MacCoss. Dynamic data-independent acquisition mass spectrometry with real- time retrospective alignment.Analytical Chemistry, 95(32):11854–11858, 2023. doi: 10.1021/acs. analchem.3c00903

work page doi:10.1021/acs 2023
[42]

White, Paul J

Jake B. White, Paul J. Trim, Thalia Salagaras, Aaron Long, Peter J. Psaltis, Johan W. Verjans, and Marten F. Snel. Equivalent carbon number and interclass retention time conversion enhance lipid identification in untargeted clinical lipidomics.Analytical Chemistry, 94(8):3476–3484, 2022. doi: 10.1021/acs.analchem.1c03770

work page doi:10.1021/acs.analchem.1c03770 2022
[43]

Retention dependences support highly confident identification of lipid species in human plasma by reversed-phase uhplc/ms.Analytical and Bioanalytical Chemistry, 414(1):319–331, 34

Zuzana Va ˇnková, Ond ˇrej Peterka, Michaela Chocholoušková, Denise Wolrab, Robert Jirásko, and Michal Hol ˇcapek. Retention dependences support highly confident identification of lipid species in human plasma by reversed-phase uhplc/ms.Analytical and Bioanalytical Chemistry, 414(1):319–331, 34
[44]

doi: 10.1007/s00216-021-03492-4

work page doi:10.1007/s00216-021-03492-4
[45]

Liquid- chromatography retention order prediction for metabolite identification.Bioinformatics, 34(17):i875– i883, 2018

Eric Bach, Sandor Szedmak, Céline Brouard, Shenghuo Liang, and Juho Rousu. Liquid- chromatography retention order prediction for metabolite identification.Bioinformatics, 34(17):i875– i883, 2018. doi: 10.1093/bioinformatics/bty590

work page doi:10.1093/bioinformatics/bty590 2018
[46]

PredRet: Prediction of retention time by direct mapping between multiple chromatographic systems.Analytical Chemistry, 87(18):9421–9428, 2015

Jan Stanstrup, Steffen Neumann, and Urska Vrhošek. PredRet: Prediction of retention time by direct mapping between multiple chromatographic systems.Analytical Chemistry, 87(18):9421–9428, 2015. doi: 10.1021/acs.analchem.5b02287

work page doi:10.1021/acs.analchem.5b02287 2015
[47]

Qcomics: Recommendations and guidelines for robust, easily implementable and reportable quality control of metabolomics data.Analytical Chemistry, 96(3):1064–1072, 2024

Álvaro González-Domínguez, Núria Estanyol-Torres, Carl Brunius, Rikard Landberg, and Raúl González-Domínguez. Qcomics: Recommendations and guidelines for robust, easily implementable and reportable quality control of metabolomics data.Analytical Chemistry, 96(3):1064–1072, 2024. doi: 10.1021/acs.analchem.3c03660

work page doi:10.1021/acs.analchem.3c03660 2024
[48]

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. InAdvances in Neural Information Processing Systems, volume 30, pages 6000–6010, 2017. doi: 10.48550/arXiv.1706.03762

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1706.03762 2017
[49]

Lawrence Zitnick, Jerry Ma, and Rob Fergus

Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences.Proc Natl Acad Sci U S A, 118(15): e2016239118, 2021. doi: 10.1073/pnas.2016239118

work page doi:10.1073/pnas.2016239118 2021
[50]

Hunter, Costas Bekas, and Alpha A

Philippe Schwaller, Teodoro Laino, Théophile Gaudin, Peter Bolgar, Christopher A. Hunter, Costas Bekas, and Alpha A. Lee. Molecular Transformer: A model for uncertainty-calibrated chemical reac- tion prediction.ACS Cent Sci, 5(9):1572–1583, 2019. doi: 10.1021/acscentsci.9b00576

work page doi:10.1021/acscentsci.9b00576 2019
[51]

Lewis, Yicheng Jin, Xiuyu Tang, Vidit Shah, Christina Doty, Bethany E

Nicholas R. Lewis, Yicheng Jin, Xiuyu Tang, Vidit Shah, Christina Doty, Bethany E. Matthews, Sarah Akers, and Steven R. Spurgeon. Forecasting of in situ electron energy loss spectroscopy.npj Compu- tational Materials, 8(1), 2022. doi: 10.1038/s41524-022-00940-2

work page doi:10.1038/s41524-022-00940-2 2022
[52]

Self-supervised learning of molecular representations from millions of tandem mass spec- tra using DreaMS.Nat Biotechnol, 2025

Roman Bushuiev, Anton Bushuiev, Raman Samusevich, Corinna Brungs, Josef Sivic, and Tomáš Pluskal. Self-supervised learning of molecular representations from millions of tandem mass spec- tra using DreaMS.Nat Biotechnol, 2025. doi: 10.1038/s41587-025-02663-3. Online ahead of print

work page doi:10.1038/s41587-025-02663-3 2025
[53]

Campbell, Jack Geremia, and Timothy Kassis

Gabriel Asher, Mimoun Cadosh Delmar, Jennifer M. Campbell, Jack Geremia, and Timothy Kassis. LSM1-MS2: A foundation model for MS/MS, encompassing chemical property predictions, search and de novo generation, 2024. ChemRxiv preprint

2024
[54]

Xu and Hannes L

Leon L. Xu and Hannes L. Röst. Peak detection on data independent acquisition mass spectrometry data with semisupervised convolutional transformers, 2020. arXiv:2010.13841

arXiv 2020
[55]

MS-DIAL: data-independent MS/MS deconvolution for comprehensive metabolome analysis.Nat Methods, 12(6):523–526, 2015

Hiroshi Tsugawa, Tomas Cajka, Tobias Kind, Yan Ma, Brendan Higgins, Kazutaka Ikeda, Mitsuhiro Kanazawa, Jean VanderGheynst, Oliver Fiehn, and Masanori Arita. MS-DIAL: data-independent MS/MS deconvolution for comprehensive metabolome analysis.Nat Methods, 12(6):523–526, 2015. doi: 10.1038/nmeth.3393

work page doi:10.1038/nmeth.3393 2015
[56]

Interleukin-1 blockade in recently decompensated systolic heart failure: Results from REDHART.Circ Heart Fail, 10(11):e004373,

Benjamin W Van Tassell, Justin M Canada, Salvatore Carbone, et al. Interleukin-1 blockade in recently decompensated systolic heart failure: Results from REDHART.Circ Heart Fail, 10(11):e004373,
[57]

doi: 10.1161/CIRCHEARTFAILURE.117.004373. 35

work page doi:10.1161/circheartfailure.117.004373
[58]

Metabolic modulation predicts heart failure tests performance.PLoS One, 14(6):e0218153, 2019

Daniel Jr Contaifer, Leo F Buckley, George Wohlford, Naren Gajenthra Kumar, Joshua M Morriss, Asanga D Ranasinghe, Salvatore Carbone, Justin M Canada, Cory Trankle, Antonio Abbate, Ben- jamin W Van Tassell, and Dayanjan S Wijesinghe. Metabolic modulation predicts heart failure tests performance.PLoS One, 14(6):e0218153, 2019. doi: 10.1371/journal.pone.0218153

work page doi:10.1371/journal.pone.0218153 2019
[59]

Daniel Jr Contaifer, Catherine H Roberts, Naren Gajenthra Kumar, Ramesh Natarajan, others, and Dayanjan S Wijesinghe. A preliminary investigation towards the risk stratification of allogeneic stem cell recipients with respect to the potential for development of GVHD via their pre-transplant plasma lipid and metabolic signature.Cancers (Basel), 11(4):466, ...

work page doi:10.3390/cancers11081051 2019
[60]

Highly reliable LC-MS lipidomics database for efficient human plasma profiling based on NIST SRM 1950.J Lipid Res, 65(11):100671, 2024

Sara Martínez, Miguel Fernández-García, Sara Londoño-Osorio, Coral Barbas, and Ana Gradillas. Highly reliable LC-MS lipidomics database for efficient human plasma profiling based on NIST SRM 1950.J Lipid Res, 65(11):100671, 2024. doi: 10.1016/j.jlr.2024.100671

work page doi:10.1016/j.jlr.2024.100671 1950
[61]

Harmonizing lipidomics: NIST interlaboratory comparison exercise for lipidomics using SRM 1950-metabolites in frozen human plasma.J Lipid Res, 58(12):2275–2288,

John A Bowden, Alan Heckert, Candice Z Ulmer, Christina M Jones, Jeremy P Koelmel, Laila Ab- dullah, Linda Ahonen, et al. Harmonizing lipidomics: NIST interlaboratory comparison exercise for lipidomics using SRM 1950-metabolites in frozen human plasma.J Lipid Res, 58(12):2275–2288,

1950
[62]

doi: 10.1194/jlr.M079012

work page doi:10.1194/jlr.m079012
[63]

Reinke, Julia Kuligowski, Ian D

David Broadhurst, Royston Goodacre, Stacey N. Reinke, Julia Kuligowski, Ian D. Wilson, Matthew R. Lewis, and Warwick B. Dunn. Guidelines and considerations for the use of system suitability and quality control samples in mass spectrometry assays applied in untargeted clinical metabolomic stud- ies.Metabolomics, 14(5):72, 2018. doi: 10.1007/s11306-018-1367-3

work page doi:10.1007/s11306-018-1367-3 2018
[64]

Paul Benton, Julijana Ivanisevic, Nathaniel G

H. Paul Benton, Julijana Ivanisevic, Nathaniel G. Mahieu, Michael E. Kurczy, Caroline H. Johnson, Lauren Franco, Duane Rinehart, Elizabeth Valentine, Harsha Gowda, Baljit K. Ubhi, Ralf Tautenhahn, Andrew Gieschen, Matthew W. Fields, Gary J. Patti, and Gary Siuzdak. Autonomous metabolomics for rapid metabolite identification in global profiling.Analytical ...

work page doi:10.1021/ac5025649 2015
[65]

Randolph, Matthew Muhoberac, Palak Manchanda, and Gaurav Chopra

Connor Beveridge, Sanjay Iyer, Caitlin E. Randolph, Matthew Muhoberac, Palak Manchanda, and Gaurav Chopra. Claw-mrm: Comprehensive lipidomics automation workflow for multiple reaction monitoring using large language models.Analytical Chemistry, 2025. doi: 10.1021/acs.analchem. 4c05039

work page doi:10.1021/acs.analchem 2025
[66]

Reder, E

Gustav K. Reder, E. Yves Bjurstrom, Daniel Brunnsaker, Fanny Kronstrom, Paul Lasin, Ievgeniia Tiukova, Otto I. Savolainen, James N. Dodds, Jody C. May, John P. Wikswo, John A. McLean, and Ross D. King. Autonoms: Automated ion mobility metabolomic fingerprinting.Journal of the Ameri- can Society for Mass Spectrometry, 35(3):542–550, 2024. doi: 10.1021/jasm...

work page doi:10.1021/jasms.3c00396 2024
[67]

Self-supervised learning from small-molecule mass spectrometry data.Nat Biotechnol, 2025

Wout Bittremieux and William Stafford Noble. Self-supervised learning from small-molecule mass spectrometry data.Nat Biotechnol, 2025. doi: 10.1038/s41587-025-02677-x. Online ahead of print

work page doi:10.1038/s41587-025-02677-x 2025
[68]

Language model-guided anticipation and discovery of mammalian metabolites.Nature, 2026

Hantao Qiang, Fei Wang, Wenyun Lu, Xi Xing, Hahn Kim, et al. Language model-guided anticipation and discovery of mammalian metabolites.Nature, 2026. doi: 10.1038/s41586-025-09969-x. 36

work page doi:10.1038/s41586-025-09969-x 2026

[1] [1]

Lipidomics reveals a remarkable diversity of lipids in human plasma.J Lipid Res, 51(11):3299–3305, 2010

Oswald Quehenberger, Aaron M Armando, Alex H Brown, et al. Lipidomics reveals a remarkable diversity of lipids in human plasma.J Lipid Res, 51(11):3299–3305, 2010. doi: 10.1194/jlr.M009449

work page doi:10.1194/jlr.m009449 2010

[2] [2]

Understanding the diversity of membrane lipid composition

Takeshi Harayama and Howard Riezman. Understanding the diversity of membrane lipid composition. Nat Rev Mol Cell Biol, 19(5):281–296, 2018. doi: 10.1038/nrm.2017.138

work page doi:10.1038/nrm.2017.138 2018

[3] [3]

The mystery of membrane organization: composition, regulation and roles of lipid rafts.Nat Rev Mol Cell Biol, 18(6):361–374,

Erdinc Sezgin, Ilya Levental, Satyajit Mayor, and Christian Eggeling. The mystery of membrane organization: composition, regulation and roles of lipid rafts.Nat Rev Mol Cell Biol, 18(6):361–374,

[4] [4]

doi: 10.1038/nrm.2017.16

work page doi:10.1038/nrm.2017.16 2017

[5] [5]

Olzmann and Pedro Carvalho

James A. Olzmann and Pedro Carvalho. Dynamics and functions of lipid droplets.Nat Rev Mol Cell Biol, 20(3):137–155, 2019. doi: 10.1038/s41580-018-0085-z

work page doi:10.1038/s41580-018-0085-z 2019

[6] [6]

Dennis and Paul C

Edward A. Dennis and Paul C. Norris. Eicosanoid storm in infection and inflammation.Nat Rev Immunol, 15(8):511–523, 2015. doi: 10.1038/nri3859

work page doi:10.1038/nri3859 2015

[7] [7]

Hannun and Lina M

Yusuf A. Hannun and Lina M. Obeid. Sphingolipids and their metabolism in physiology and disease. Nat Rev Mol Cell Biol, 19(3):175–191, 2018. doi: 10.1038/nrm.2017.107

work page doi:10.1038/nrm.2017.107 2018

[8] [8]

Wymann and Roger Schneiter

Matthias P. Wymann and Roger Schneiter. Lipid signalling in disease.Nat Rev Mol Cell Biol, 9(2): 162–176, 2008. doi: 10.1038/nrm2335

work page doi:10.1038/nrm2335 2008

[9] [9]

Buring, Lina Badimon, Göran K

Peter Libby, Julie E. Buring, Lina Badimon, Göran K. Hansson, John Deanfield, Márcio S. Bittencourt, Lale Tokgözo˘glu, and Eldrin F. Lewis. Atherosclerosis.Nat Rev Dis Primers, 5(1):56, 2019. doi: 10.1038/s41572-019-0106-z

work page doi:10.1038/s41572-019-0106-z 2019

[10] [10]

Lydic and Young-Hwa Goo

Todd A. Lydic and Young-Hwa Goo. Lipidomics unveils the complexity of the lipidome in metabolic diseases.Clinical and Translational Medicine, 7, 2018. doi: 10.1186/s40169-018-0182-9. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5786598/

work page doi:10.1186/s40169-018-0182-9 2018

[11] [11]

Lipidomes in health and dis- ease: Analytical strategies and considerations.TrAC Trends in Analytical Chemistry, 120:115664,

Fang Wei, Santosh Lamichhane, Matej Oreši ˇc, and Tuulia Hyötyläinen. Lipidomes in health and dis- ease: Analytical strategies and considerations.TrAC Trends in Analytical Chemistry, 120:115664,

[12] [12]

URLhttps://www.sciencedirect.com/science/ article/pii/S0165993619303723

doi: 10.1016/j.trac.2019.115664. URLhttps://www.sciencedirect.com/science/ article/pii/S0165993619303723

work page doi:10.1016/j.trac.2019.115664 2019

[13] [13]

Smilowitz, and Oliver Fiehn

Tomas Cajka, Jennifer T. Smilowitz, and Oliver Fiehn. Validating quantitative untargeted lipidomics across nine liquid chromatography–high-resolution mass spectrometry platforms.Anal Chem, 89(22): 12360–12368, 2017. doi: 10.1021/acs.analchem.7b03404

work page doi:10.1021/acs.analchem.7b03404 2017

[14] [14]

Patti, Oscar Yanes, and Gary Siuzdak

Gary J. Patti, Oscar Yanes, and Gary Siuzdak. Metabolomics: the apogee of the omic trilogy.Nat Rev Mol Cell Biol, 13(4):263–269, 2012. doi: 10.1038/nrm3314

work page doi:10.1038/nrm3314 2012

[15] [15]

Metabolomics insights into pathophysiological mechanisms of inter- stitial cystitis.International Neurourology Journal, 18(3):106–114, 2014

Oliver Fiehn and Jayoung Kim. Metabolomics insights into pathophysiological mechanisms of inter- stitial cystitis.International Neurourology Journal, 18(3):106–114, 2014. doi: 10.5213/inj.2014.18.3

work page doi:10.5213/inj.2014.18.3 2014

[16] [16]

URLhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC4180160/

[17] [17]

Schymanski, Junho Jeon, Rebekka Gulde, Kathrin Fenner, Martin Ruff, Heinz P

Emma L. Schymanski, Junho Jeon, Rebekka Gulde, Kathrin Fenner, Martin Ruff, Heinz P. Singer, and Juliane Hollender. Identifying small molecules via high resolution mass spectrometry: Communicating confidence.Environmental Science & Technology, 48(4):2097–2098, 2014. doi: 10.1021/es5002105

work page doi:10.1021/es5002105 2097

[18] [18]

Metabo- lite annotation from knowns to unknowns through knowledge-guided multi-layer metabolic network- 32 ing.Nature Communications, 13(1), 2022

Zhiwei Zhou, Mingdu Luo, Haosong Zhang, Yandong Yin, Yuping Cai, and Zheng-Jiang Zhu. Metabo- lite annotation from knowns to unknowns through knowledge-guided multi-layer metabolic network- 32 ing.Nature Communications, 13(1), 2022. doi: 10.1038/s41467-022-34537-6

work page doi:10.1038/s41467-022-34537-6 2022

[19] [19]

Neural networks and physical systems with emergent collective com- putational abilities

Ricardo R. da Silva, Pieter C. Dorrestein, and Robert A. Quinn. Illuminating the dark matter in metabolomics.Proc Natl Acad Sci U S A, 112(41):12549–12550, 2015. doi: 10.1073/pnas. 1516878112

work page doi:10.1073/pnas 2015

[20] [20]

Dodds, Erin S

María Eugenia Monge, James N. Dodds, Erin S. Baker, Arthur S. Edison, and Facundo M. Fernández. Challenges in identifying the dark molecules of life.Annu Rev Anal Chem, 12(1):177–199, 2019. doi: 10.1146/annurev-anchem-061318-114959

work page doi:10.1146/annurev-anchem-061318-114959 2019

[21] [21]

Jones, and Karan Uppal

Ihab Hajjar, Chang Liu, Dean P. Jones, and Karan Uppal. Untargeted metabolomics reveal dysregula- tions in sugar, methionine, and tyrosine pathways in the prodromal state of AD.Alzheimers Dement (Amst), 12(1):e12064, 2020. doi: 10.1002/dad2.12064

work page doi:10.1002/dad2.12064 2020

[22] [22]

Metabolite discovery through global annotation of untargeted metabolomics data.Nat Methods, 18(11):1377–1385, 2021

Li Chen, Wenyun Lu, Lin Wang, et al. Metabolite discovery through global annotation of untargeted metabolomics data.Nat Methods, 18(11):1377–1385, 2021. doi: 10.1038/s41592-021-01303-3

work page doi:10.1038/s41592-021-01303-3 2021

[23] [23]

Carver, Vanessa V

Mingxun Wang, Jeremy J. Carver, Vanessa V . Phelan, Laura M. Sanchez, Neha Garg, Yao Peng, et al. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking.Nat Biotechnol, 34(8):828–837, 2016. doi: 10.1038/nbt.3597

work page doi:10.1038/nbt.3597 2016

[24] [24]

MassBank: a public repository for sharing mass spectral data for life sciences.J Mass Spectrom, 45(7):703–714, 2010

Hisayuki Horai, Masanori Arita, Shigehiko Kanaya, et al. MassBank: a public repository for sharing mass spectral data for life sciences.J Mass Spectrom, 45(7):703–714, 2010. doi: 10.1002/jms.1777

work page doi:10.1002/jms.1777 2010

[25] [25]

SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information.Nat Methods, 16(4):299–302, 2019

Kai Dührkop, Markus Fleischauer, Marcus Ludwig, et al. SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information.Nat Methods, 16(4):299–302, 2019. doi: 10.1038/ s41592-019-0344-8

2019

[26] [26]

Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra.Nat Biotechnol, 39(4):462–471, 2021

Kai Dührkop, Louis-Félix Nothias, Markus Fleischauer, et al. Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra.Nat Biotechnol, 39(4):462–471, 2021. doi: 10.1038/s41587-020-0740-8

work page doi:10.1038/s41587-020-0740-8 2021

[27] [27]

Florian Huber, Sven van der Burg, Justin J. J. van der Hooft, and Lars Ridder. MS2DeepScore: a novel deep learning similarity measure to compare tandem mass spectra.J Cheminform, 13(1):84, 2021. doi: 10.1186/s13321-021-00558-4

work page doi:10.1186/s13321-021-00558-4 2021

[28] [28]

CFM-ID: a web server for annotation, spectrum prediction and metabolite identification from tandem mass spectra.Nucleic Acids Res, 42(W1):W94–W99, 2014

Felicity Allen, Allison Pon, Michael Wilson, Russ Greiner, and David Wishart. CFM-ID: a web server for annotation, spectrum prediction and metabolite identification from tandem mass spectra.Nucleic Acids Res, 42(W1):W94–W99, 2014. doi: 10.1093/nar/gku436

work page doi:10.1093/nar/gku436 2014

[29] [29]

Retip: Re- tention time prediction for compound annotation in untargeted metabolomics.Analytical Chemistry, 92(11):7515–7522, 2020

Paolo Bonini, Tobias Kind, Hiroshi Tsugawa, Dinesh Kumar Barupal, and Oliver Fiehn. Retip: Re- tention time prediction for compound annotation in untargeted metabolomics.Analytical Chemistry, 92(11):7515–7522, 2020. doi: 10.1021/acs.analchem.9b05765. URLhttps://pubs.acs.org/doi/ 10.1021/acs.analchem.9b05765

work page doi:10.1021/acs.analchem.9b05765 2020

[30] [30]

RT-Transformer: retention time prediction for metabolite annotation to assist in metabolite identification.Bioinformatics, 40(3):btae084, 2024

Robin Schmid, Steffen Heuckeroth, Ansgar Korf, et al. RT-Transformer: retention time prediction for metabolite annotation to assist in metabolite identification.Bioinformatics, 40(3):btae084, 2024. doi: 10.1093/bioinformatics/btae084

work page doi:10.1093/bioinformatics/btae084 2024

[31] [31]

Current status of retention time prediction in metabolite identi- fication.Journal of Separation Science, 43(9-10):1746–1754, 2020

Michael Witting and Sebastian Böcker. Current status of retention time prediction in metabolite identi- fication.Journal of Separation Science, 43(9-10):1746–1754, 2020. doi: https://doi.org/10.1002/jssc. 202000060. URLhttps://onlinelibrary.wiley.com/doi/abs/10.1002/jssc.202000060. 33

work page doi:10.1002/jssc 2020

[32] [32]

Schymanski, and Juho Rousu

Eric Bach, Emma L. Schymanski, and Juho Rousu. Joint structural annotation of small molecules using liquid chromatography retention order and tandem mass spectrometry data.Nature Machine Intelligence, 4(12):1224–1237, 2022. doi: 10.1038/s42256-022-00577-2

work page doi:10.1038/s42256-022-00577-2 2022

[33] [33]

Roasmi: accelerating small molecule identification by repurposing retention data.Journal of Cheminformatics, 17(1), 2025

Fang-Yuan Sun, Ying-Hao Yin, Hui-Jun Liu, Lu-Na Shen, Xiu-Lin Kang, Gui-Zhong Xin, Li-Fang Liu, and Jia-Yi Zheng. Roasmi: accelerating small molecule identification by repurposing retention data.Journal of Cheminformatics, 17(1), 2025. doi: 10.1186/s13321-025-00968-8

work page doi:10.1186/s13321-025-00968-8 2025

[34] [34]

Eight key rules for successful data-dependent acquisition in mass spectrometry-based metabolomics.Mass Spectrometry Reviews, 42(1):131–143, 2021

Emmanuel Defossez, Julien Bourquin, Stephan von Reuss, Sergio Rasmann, and Gaétan Glauser. Eight key rules for successful data-dependent acquisition in mass spectrometry-based metabolomics.Mass Spectrometry Reviews, 42(1):131–143, 2021. doi: 10.1002/mas.21715

work page doi:10.1002/mas.21715 2021

[35] [35]

Hanane El Boudlali, Laura Lehmicke, and Uta Ceglarek. High-resolution accurate mass- mass spec- trometry based- untargeted metabolomics: Reproducibility and detection power across data-dependent acquisition, data-independent acquisition, and acquirex.Computational and Structural Biotechnology Journal, 27:2412–2423, 2025. doi: 10.1016/j.csbj.2025.05.046

work page doi:10.1016/j.csbj.2025.05.046 2025

[36] [36]

Cooper, Ruohong Yang, et al

Brianna T. Cooper, Ruohong Yang, et al. An assessment of acquirex and compound discoverer software 3.3 for non-targeted metabolomics.Scientific Reports, 2024. doi: 10.1038/s41598-024-55356-3

work page doi:10.1038/s41598-024-55356-3 2024

[37] [37]

Vimms 2.0: A framework to develop, test and optimise fragmentation strategies in lc-ms metabolomics.Journal of Open Source Software, 7(71):3990, 2022

Joe Wandy, Vinny Davies, Ross McBride, Stefan Weidt, Simon Rogers, and Rónán Daly. Vimms 2.0: A framework to develop, test and optimise fragmentation strategies in lc-ms metabolomics.Journal of Open Source Software, 7(71):3990, 2022. doi: 10.21105/joss.03990

work page doi:10.21105/joss.03990 2022

[38] [38]

Flashida enables intelligent data acquisition for top-down proteomics to boost proteoform identification counts.Nature Communications, 13:4407, 2022

Kyowon Jeong, Author Corresponding, et al. Flashida enables intelligent data acquisition for top-down proteomics to boost proteoform identification counts.Nature Communications, 13:4407, 2022. doi: 10.1038/s41467-022-31922-z

work page doi:10.1038/s41467-022-31922-z 2022

[39] [39]

Maxquant.live enables global targeting of more than 25,000 peptides.Molecular & Cellular Proteomics, 18(5):982a–994, 2019

Christoph Wichmann, Florian Meier, Sebastian Virreira Winter, Andreas-David Brunner, Jürgen Cox, and Matthias Mann. Maxquant.live enables global targeting of more than 25,000 peptides.Molecular & Cellular Proteomics, 18(5):982a–994, 2019. doi: 10.1074/mcp.tir118.001131

work page doi:10.1074/mcp.tir118.001131 2019

[40] [40]

Barshop, Seema Sharma, Jesse Canterbury, Aaron M

Brandon Bills, William D. Barshop, Seema Sharma, Jesse Canterbury, Aaron M. Robitaille, Michael Goodwin, Michael W. Senko, and Vlad Zabrouskov. Novel real-time library search driven data ac- quisition strategy for identification and characterization of metabolites.Analytical Chemistry, 94(9): 3749–3755, 2022. doi: 10.1021/acs.analchem.1c04336

work page doi:10.1021/acs.analchem.1c04336 2022

[41] [41]

The Journal of Physical Chemistry A125(17), 3776–3784 (2021) https://doi.org/10.1021/acs

Lilian R. Heil, Philip M. Remes, Jesse D. Canterbury, Ping Yip, William D. Barshop, Christine C. Wu, and Michael J. MacCoss. Dynamic data-independent acquisition mass spectrometry with real- time retrospective alignment.Analytical Chemistry, 95(32):11854–11858, 2023. doi: 10.1021/acs. analchem.3c00903

work page doi:10.1021/acs 2023

[42] [42]

White, Paul J

Jake B. White, Paul J. Trim, Thalia Salagaras, Aaron Long, Peter J. Psaltis, Johan W. Verjans, and Marten F. Snel. Equivalent carbon number and interclass retention time conversion enhance lipid identification in untargeted clinical lipidomics.Analytical Chemistry, 94(8):3476–3484, 2022. doi: 10.1021/acs.analchem.1c03770

work page doi:10.1021/acs.analchem.1c03770 2022

[43] [43]

Retention dependences support highly confident identification of lipid species in human plasma by reversed-phase uhplc/ms.Analytical and Bioanalytical Chemistry, 414(1):319–331, 34

Zuzana Va ˇnková, Ond ˇrej Peterka, Michaela Chocholoušková, Denise Wolrab, Robert Jirásko, and Michal Hol ˇcapek. Retention dependences support highly confident identification of lipid species in human plasma by reversed-phase uhplc/ms.Analytical and Bioanalytical Chemistry, 414(1):319–331, 34

[44] [44]

doi: 10.1007/s00216-021-03492-4

work page doi:10.1007/s00216-021-03492-4

[45] [45]

Liquid- chromatography retention order prediction for metabolite identification.Bioinformatics, 34(17):i875– i883, 2018

Eric Bach, Sandor Szedmak, Céline Brouard, Shenghuo Liang, and Juho Rousu. Liquid- chromatography retention order prediction for metabolite identification.Bioinformatics, 34(17):i875– i883, 2018. doi: 10.1093/bioinformatics/bty590

work page doi:10.1093/bioinformatics/bty590 2018

[46] [46]

PredRet: Prediction of retention time by direct mapping between multiple chromatographic systems.Analytical Chemistry, 87(18):9421–9428, 2015

Jan Stanstrup, Steffen Neumann, and Urska Vrhošek. PredRet: Prediction of retention time by direct mapping between multiple chromatographic systems.Analytical Chemistry, 87(18):9421–9428, 2015. doi: 10.1021/acs.analchem.5b02287

work page doi:10.1021/acs.analchem.5b02287 2015

[47] [47]

Qcomics: Recommendations and guidelines for robust, easily implementable and reportable quality control of metabolomics data.Analytical Chemistry, 96(3):1064–1072, 2024

Álvaro González-Domínguez, Núria Estanyol-Torres, Carl Brunius, Rikard Landberg, and Raúl González-Domínguez. Qcomics: Recommendations and guidelines for robust, easily implementable and reportable quality control of metabolomics data.Analytical Chemistry, 96(3):1064–1072, 2024. doi: 10.1021/acs.analchem.3c03660

work page doi:10.1021/acs.analchem.3c03660 2024

[48] [48]

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. InAdvances in Neural Information Processing Systems, volume 30, pages 6000–6010, 2017. doi: 10.48550/arXiv.1706.03762

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1706.03762 2017

[49] [49]

Lawrence Zitnick, Jerry Ma, and Rob Fergus

Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences.Proc Natl Acad Sci U S A, 118(15): e2016239118, 2021. doi: 10.1073/pnas.2016239118

work page doi:10.1073/pnas.2016239118 2021

[50] [50]

Hunter, Costas Bekas, and Alpha A

Philippe Schwaller, Teodoro Laino, Théophile Gaudin, Peter Bolgar, Christopher A. Hunter, Costas Bekas, and Alpha A. Lee. Molecular Transformer: A model for uncertainty-calibrated chemical reac- tion prediction.ACS Cent Sci, 5(9):1572–1583, 2019. doi: 10.1021/acscentsci.9b00576

work page doi:10.1021/acscentsci.9b00576 2019

[51] [51]

Lewis, Yicheng Jin, Xiuyu Tang, Vidit Shah, Christina Doty, Bethany E

Nicholas R. Lewis, Yicheng Jin, Xiuyu Tang, Vidit Shah, Christina Doty, Bethany E. Matthews, Sarah Akers, and Steven R. Spurgeon. Forecasting of in situ electron energy loss spectroscopy.npj Compu- tational Materials, 8(1), 2022. doi: 10.1038/s41524-022-00940-2

work page doi:10.1038/s41524-022-00940-2 2022

[52] [52]

Self-supervised learning of molecular representations from millions of tandem mass spec- tra using DreaMS.Nat Biotechnol, 2025

Roman Bushuiev, Anton Bushuiev, Raman Samusevich, Corinna Brungs, Josef Sivic, and Tomáš Pluskal. Self-supervised learning of molecular representations from millions of tandem mass spec- tra using DreaMS.Nat Biotechnol, 2025. doi: 10.1038/s41587-025-02663-3. Online ahead of print

work page doi:10.1038/s41587-025-02663-3 2025

[53] [53]

Campbell, Jack Geremia, and Timothy Kassis

Gabriel Asher, Mimoun Cadosh Delmar, Jennifer M. Campbell, Jack Geremia, and Timothy Kassis. LSM1-MS2: A foundation model for MS/MS, encompassing chemical property predictions, search and de novo generation, 2024. ChemRxiv preprint

2024

[54] [54]

Xu and Hannes L

Leon L. Xu and Hannes L. Röst. Peak detection on data independent acquisition mass spectrometry data with semisupervised convolutional transformers, 2020. arXiv:2010.13841

arXiv 2020

[55] [55]

MS-DIAL: data-independent MS/MS deconvolution for comprehensive metabolome analysis.Nat Methods, 12(6):523–526, 2015

Hiroshi Tsugawa, Tomas Cajka, Tobias Kind, Yan Ma, Brendan Higgins, Kazutaka Ikeda, Mitsuhiro Kanazawa, Jean VanderGheynst, Oliver Fiehn, and Masanori Arita. MS-DIAL: data-independent MS/MS deconvolution for comprehensive metabolome analysis.Nat Methods, 12(6):523–526, 2015. doi: 10.1038/nmeth.3393

work page doi:10.1038/nmeth.3393 2015

[56] [56]

Interleukin-1 blockade in recently decompensated systolic heart failure: Results from REDHART.Circ Heart Fail, 10(11):e004373,

Benjamin W Van Tassell, Justin M Canada, Salvatore Carbone, et al. Interleukin-1 blockade in recently decompensated systolic heart failure: Results from REDHART.Circ Heart Fail, 10(11):e004373,

[57] [57]

doi: 10.1161/CIRCHEARTFAILURE.117.004373. 35

work page doi:10.1161/circheartfailure.117.004373

[58] [58]

Metabolic modulation predicts heart failure tests performance.PLoS One, 14(6):e0218153, 2019

Daniel Jr Contaifer, Leo F Buckley, George Wohlford, Naren Gajenthra Kumar, Joshua M Morriss, Asanga D Ranasinghe, Salvatore Carbone, Justin M Canada, Cory Trankle, Antonio Abbate, Ben- jamin W Van Tassell, and Dayanjan S Wijesinghe. Metabolic modulation predicts heart failure tests performance.PLoS One, 14(6):e0218153, 2019. doi: 10.1371/journal.pone.0218153

work page doi:10.1371/journal.pone.0218153 2019

[59] [59]

Daniel Jr Contaifer, Catherine H Roberts, Naren Gajenthra Kumar, Ramesh Natarajan, others, and Dayanjan S Wijesinghe. A preliminary investigation towards the risk stratification of allogeneic stem cell recipients with respect to the potential for development of GVHD via their pre-transplant plasma lipid and metabolic signature.Cancers (Basel), 11(4):466, ...

work page doi:10.3390/cancers11081051 2019

[60] [60]

Highly reliable LC-MS lipidomics database for efficient human plasma profiling based on NIST SRM 1950.J Lipid Res, 65(11):100671, 2024

Sara Martínez, Miguel Fernández-García, Sara Londoño-Osorio, Coral Barbas, and Ana Gradillas. Highly reliable LC-MS lipidomics database for efficient human plasma profiling based on NIST SRM 1950.J Lipid Res, 65(11):100671, 2024. doi: 10.1016/j.jlr.2024.100671

work page doi:10.1016/j.jlr.2024.100671 1950

[61] [61]

Harmonizing lipidomics: NIST interlaboratory comparison exercise for lipidomics using SRM 1950-metabolites in frozen human plasma.J Lipid Res, 58(12):2275–2288,

John A Bowden, Alan Heckert, Candice Z Ulmer, Christina M Jones, Jeremy P Koelmel, Laila Ab- dullah, Linda Ahonen, et al. Harmonizing lipidomics: NIST interlaboratory comparison exercise for lipidomics using SRM 1950-metabolites in frozen human plasma.J Lipid Res, 58(12):2275–2288,

1950

[62] [62]

doi: 10.1194/jlr.M079012

work page doi:10.1194/jlr.m079012

[63] [63]

Reinke, Julia Kuligowski, Ian D

David Broadhurst, Royston Goodacre, Stacey N. Reinke, Julia Kuligowski, Ian D. Wilson, Matthew R. Lewis, and Warwick B. Dunn. Guidelines and considerations for the use of system suitability and quality control samples in mass spectrometry assays applied in untargeted clinical metabolomic stud- ies.Metabolomics, 14(5):72, 2018. doi: 10.1007/s11306-018-1367-3

work page doi:10.1007/s11306-018-1367-3 2018

[64] [64]

Paul Benton, Julijana Ivanisevic, Nathaniel G

H. Paul Benton, Julijana Ivanisevic, Nathaniel G. Mahieu, Michael E. Kurczy, Caroline H. Johnson, Lauren Franco, Duane Rinehart, Elizabeth Valentine, Harsha Gowda, Baljit K. Ubhi, Ralf Tautenhahn, Andrew Gieschen, Matthew W. Fields, Gary J. Patti, and Gary Siuzdak. Autonomous metabolomics for rapid metabolite identification in global profiling.Analytical ...

work page doi:10.1021/ac5025649 2015

[65] [65]

Randolph, Matthew Muhoberac, Palak Manchanda, and Gaurav Chopra

Connor Beveridge, Sanjay Iyer, Caitlin E. Randolph, Matthew Muhoberac, Palak Manchanda, and Gaurav Chopra. Claw-mrm: Comprehensive lipidomics automation workflow for multiple reaction monitoring using large language models.Analytical Chemistry, 2025. doi: 10.1021/acs.analchem. 4c05039

work page doi:10.1021/acs.analchem 2025

[66] [66]

Reder, E

Gustav K. Reder, E. Yves Bjurstrom, Daniel Brunnsaker, Fanny Kronstrom, Paul Lasin, Ievgeniia Tiukova, Otto I. Savolainen, James N. Dodds, Jody C. May, John P. Wikswo, John A. McLean, and Ross D. King. Autonoms: Automated ion mobility metabolomic fingerprinting.Journal of the Ameri- can Society for Mass Spectrometry, 35(3):542–550, 2024. doi: 10.1021/jasm...

work page doi:10.1021/jasms.3c00396 2024

[67] [67]

Self-supervised learning from small-molecule mass spectrometry data.Nat Biotechnol, 2025

Wout Bittremieux and William Stafford Noble. Self-supervised learning from small-molecule mass spectrometry data.Nat Biotechnol, 2025. doi: 10.1038/s41587-025-02677-x. Online ahead of print

work page doi:10.1038/s41587-025-02677-x 2025

[68] [68]

Language model-guided anticipation and discovery of mammalian metabolites.Nature, 2026

Hantao Qiang, Fei Wang, Wenyun Lu, Xi Xing, Hahn Kim, et al. Language model-guided anticipation and discovery of mammalian metabolites.Nature, 2026. doi: 10.1038/s41586-025-09969-x. 36

work page doi:10.1038/s41586-025-09969-x 2026