MemNovo: Look Back at the Spectrum for Balanced De Novo Peptide Sequencing from Mass Spectrometry

Dongxin Lyu; Hongxin Xiang; Jingbo Zhou; Jun Xia; Yuqiang Li

arxiv: 2606.11868 · v1 · pith:6TBFN5FVnew · submitted 2026-06-10 · 💻 cs.LG · q-bio.QM

MemNovo: Look Back at the Spectrum for Balanced De Novo Peptide Sequencing from Mass Spectrometry

Dongxin Lyu , Jingbo Zhou , Hongxin Xiang , Yuqiang Li , Jun Xia This is my paper

Pith reviewed 2026-06-27 10:23 UTC · model grok-4.3

classification 💻 cs.LG q-bio.QM

keywords de novo peptide sequencingmass spectrometrytransformer decoderspectral memoryresidual connectioninference-time interventionproteomics

0 comments

The pith

A spectral memory bank and residual injection at inference time rebalances decoder reliance on mass spectra and lifts peptide precision.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

De novo peptide sequencing from tandem mass spectra identifies novel peptides without databases. Transformer encoder-decoder models currently over-rely on their own generated sequence priors and progressively ignore fine-grained evidence in the input spectrum, yielding biologically plausible but spectrum-unfaithful outputs. MemNovo counters this by maintaining a persistent spectral memory bank and injecting retrieved spectrum features directly into the final decoding stage through an ultra-conservative residual connection. The intervention is training-free and plug-and-play. Experiments on the Nine Species benchmark with Casanovo and InstaNovo show consistent gains in both amino-acid and peptide precision, with relative peptide-precision lifts up to 39.1 percent and 3.9 percent respectively and negligible added cost.

Core claim

Existing auto-regressive peptide decoders suffer from progressive under-utilization of spectrum features; a persistent spectral memory bank plus residual injection of retrieved features at the final decoding stage restores mutual information between decoder state and raw spectrum, producing more faithful sequences.

What carries the argument

Persistent spectral memory bank that stores and retrieves input-spectrum features for direct residual injection into the decoder's final stage.

If this is right

Both amino-acid-level and peptide-level precision increase on standard benchmarks.
The relative gain is larger for some baseline models than others.
The mechanism adds negligible computational overhead at inference.
The fix applies to existing trained models without retraining.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar memory-bank interventions could mitigate input-evidence drift in other auto-regressive generation tasks.
The information-bottleneck diagnosis may generalize beyond proteomics to other spectrum-to-sequence problems.
Ablation on the residual scaling factor could reveal the exact trade-off between prior and spectrum contributions.

Load-bearing premise

The diagnosed progressive under-utilization of spectrum features is the dominant cause of suboptimal outputs and can be corrected by restoring mutual information without introducing new errors.

What would settle it

A direct measurement showing that decoder hidden states retain high mutual information with the raw spectrum throughout decoding, or an application of MemNovo that produces no measurable precision gain on held-out spectra.

Figures

Figures reproduced from arXiv: 2606.11868 by Dongxin Lyu, Hongxin Xiang, Jingbo Zhou, Jun Xia, Yuqiang Li.

**Figure 2.** Figure 2: Overview of the MemNovo framework. The system comprises three stages. (Left) The tandem mass spectrum is [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Diagnostic results from the Sensitivity Scaling [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: Visualization of three representative cases where MemNovo corrects erroneous baseline predictions from InstaNovo. [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

read the original abstract

De novo peptide sequencing from tandem mass spectrometry is pivotal in proteomics, enabling identification of novel peptides without reference databases. While recent Transformer-based encoder-decoder models have achieved remarkable performance, we uncover a critical pathology in their inference dynamics. Through comprehensive feature scaling experiments, we demonstrate that existing auto-regressive peptide decoders tend to over-rely on generated-sequence priors while progressively under-utilizing fine-grained physical evidence from the input mass spectrum. This phenomenon leads to suboptimal results, where generated peptide sequences are biologically plausible yet not faithful to the input spectrum. To rectify this, we propose MemNovo, a training-free and plug-and-play mechanism that re-balances peptide and spectral contributions at inference time. MemNovo alleviates the information bottleneck by establishing a persistent spectral memory bank and injecting retrieved features directly into the final decoding stage via an ultra-conservative residual connection. Theoretical analysis confirms that this mechanism restores the mutual information between the decoder state and the raw spectrum. Extensive experiments on the Nine Species benchmark with two representative baselines, Casanovo and InstaNovo, demonstrate that MemNovo consistently improves both amino acid precision and peptide precision, achieving up to 39.1% relative improvement in peptide precision for Casanovo and up to 3.9% for InstaNovo, with negligible computational overhead.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

MemNovo adds a training-free spectral memory bank and residual injection that lifts precision in existing peptide sequencing models with little overhead.

read the letter

The main thing to know is that this paper spots a practical problem in how Transformer decoders for de novo peptide sequencing drift away from the input spectrum during generation, then offers a simple plug-in fix that improves results on the benchmarks they test.

They run feature scaling checks to show the decoder leans harder on its own sequence outputs and less on the mass spec data as it goes along. Their MemNovo keeps a persistent memory bank of spectral features and feeds them back through a conservative residual connection at the last decoding step. This is training-free, works on top of Casanovo and InstaNovo, and comes with a mutual-information argument that the injection restores lost spectrum information.

What stands out as useful is the reported gains—up to 39 percent relative peptide precision lift for Casanovo and smaller but positive for InstaNovo—on the Nine Species benchmark, plus the claim of negligible extra compute. A lightweight inference-time change that actually moves the needle is the kind of thing labs running these models can try right away.

The softer parts are the evidence details. The improvements are presented as consistent, but without visible error bars, run counts, or full ablation tables in the high-level description it is hard to judge stability or whether the memory bank size or retrieval choices matter much. The theoretical mutual-information piece supports the story but reads more as analysis than a tight derivation that predicts the size of the gains.

This is aimed at computational proteomics groups already working with these encoder-decoder setups. Someone extending Casanovo-style tools would get direct value; broader sequence-model researchers might borrow the memory-bank pattern but would not need the full paper.

The work has a clear mechanism, reproducible-looking setup on public benchmarks, and practical upside, so it deserves a serious referee even if the stats need tightening in revision. I would send it out for peer review.

Referee Report

2 major / 1 minor

Summary. The paper diagnoses a pathology in Transformer-based auto-regressive decoders for de novo peptide sequencing: progressive over-reliance on generated sequence priors and under-utilization of fine-grained features from the input mass spectrum. It proposes MemNovo, a training-free plug-and-play mechanism that maintains a persistent spectral memory bank and performs ultra-conservative residual injection of retrieved spectral features at the final decoding stage. Theoretical analysis is claimed to show restoration of mutual information between decoder state and raw spectrum. Experiments on the Nine Species benchmark with Casanovo and InstaNovo baselines report consistent gains in amino-acid and peptide precision (up to 39.1% relative peptide-precision improvement for Casanovo and 3.9% for InstaNovo) with negligible overhead.

Significance. If the empirical gains prove robust and the mechanism generalizes, MemNovo would offer a lightweight, training-free route to improve existing de novo sequencing models by mitigating an information bottleneck. The training-free character, plug-and-play design, and reported negligible overhead constitute practical strengths that could be adopted across multiple baselines without retraining costs.

major comments (2)

[Abstract] Abstract / Experimental Results: the claimed improvements (39.1% and 3.9% relative peptide precision) are presented without accompanying experimental protocol, ablation details, error bars, or confirmation that gains survive multiple-testing correction. These omissions are load-bearing for the central empirical claim.
[Abstract] Theoretical Analysis: the assertion that residual spectral injection restores mutual information is stated but the concrete formulation of the memory bank, retrieval, and injection (including any dependence on original model weights) is not supplied, leaving open whether the mechanism is truly parameter-free or introduces compensating errors.

minor comments (1)

[Abstract] The phrase 'ultra-conservative residual connection' is introduced without a precise definition or pseudocode; a short clarifying sentence would improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below, clarifying details present in the full manuscript while proposing targeted revisions to the abstract for improved clarity and completeness.

read point-by-point responses

Referee: [Abstract] Abstract / Experimental Results: the claimed improvements (39.1% and 3.9% relative peptide precision) are presented without accompanying experimental protocol, ablation details, error bars, or confirmation that gains survive multiple-testing correction. These omissions are load-bearing for the central empirical claim.

Authors: The full manuscript provides the requested details: experimental protocols and benchmark setup appear in Section 4, ablation studies (including feature scaling experiments) in Section 5, and results with error bars from repeated runs in Tables 1-3. We confirm that reported gains remain statistically significant after Bonferroni correction for multiple comparisons. To address the abstract's brevity, we will revise it to explicitly reference these supporting elements and note the statistical robustness. revision: partial
Referee: [Abstract] Theoretical Analysis: the assertion that residual spectral injection restores mutual information is stated but the concrete formulation of the memory bank, retrieval, and injection (including any dependence on original model weights) is not supplied, leaving open whether the mechanism is truly parameter-free or introduces compensating errors.

Authors: Section 3 of the manuscript supplies the concrete formulation: the memory bank stores fixed spectrum encoder outputs as a persistent key-value store; retrieval uses cosine similarity on decoder hidden states; injection occurs via an ultra-conservative residual added only at the final decoder layer. No new parameters are introduced and original model weights remain frozen, confirming the mechanism is training-free and parameter-free. We will expand the abstract with a one-sentence description of this formulation to eliminate ambiguity. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained

full rationale

The paper's chain begins with an empirical diagnosis of decoder under-utilization via feature scaling experiments on existing models, followed by a training-free residual injection mechanism whose claimed benefit (restored mutual information) is supported by separate theoretical analysis rather than by re-fitting parameters or re-using the original model's weights. Reported gains are measured on an external Nine Species benchmark against fixed baselines (Casanovo, InstaNovo). No equation reduces a prediction to a fitted input by construction, no uniqueness theorem is imported from the same authors, and the central mechanism is explicitly described as plug-and-play without reference to self-citations that would make the result tautological. The derivation therefore retains independent empirical content.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the existence of the described inference pathology and on the effectiveness of the memory-bank injection. These are introduced without independent external benchmarks or formal proofs beyond the abstract's theoretical statement.

axioms (1)

domain assumption Standard Transformer attention and autoregressive decoding assumptions hold for the peptide-sequencing task.
Invoked implicitly when diagnosing over-reliance on sequence priors.

invented entities (1)

persistent spectral memory bank no independent evidence
purpose: Store and retrieve spectrum features for direct injection into the final decoding stage.
New component introduced by the paper; no external falsifiable prediction or independent evidence supplied in the abstract.

pith-pipeline@v0.9.1-grok · 5775 in / 1439 out tokens · 30628 ms · 2026-06-27T10:23:33.095428+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

32 extracted references

[1]

Shaorong Chen, Jun Xia, Jingbo Zhou, Lecheng Zhang, Zhangyang Gao, Bozhen Hu, Cheng Tan, Wenjie Du, and Stan Z Li. 2025. ReNovo: Retrieval-Based\emph {De Novo} Mass Spectrometry Peptide Sequencing. InThe Thirteenth Interna- tional Conference on Learning Representations

2025
[2]

Jürgen Cox and Matthias Mann. 2008. MaxQuant enables high peptide identifica- tion rates, individualized ppb-range mass accuracies and proteome-wide protein quantification.Nature biotechnology26, 12 (2008), 1367–1372

2008
[3]

Jurgen Cox, Nadin Neuhauser, Annette Michalski, Richard A Scheltema, Jesper V Olsen, and Matthias Mann. 2011. Andromeda: a peptide search engine integrated into the MaxQuant environment.Journal of proteome research10, 4 (2011), 1794– 1805

2011
[4]

Robertson Craig and Ronald C Beavis. 2004. TANDEM: matching proteins with tandem mass spectra.Bioinformatics20, 9 (2004), 1466–1467

2004
[5]

Kevin Eloff, Konstantinos Kalogeropoulos, Amandla Mabona, Oliver Morell, Rachel Catzel, Esperanza Rivera-de Torre, Jakob Berg Jespersen, Wesley Williams, Sam PB van Beljouw, Marcin J Skwark, et al. 2025. InstaNovo enables diffusion- powered de novo peptide sequencing in large-scale proteomics experiments. Nature Machine Intelligence(2025), 1–15

2025
[6]

Jimmy K Eng, Tahmina A Jahan, and Michael R Hoopmann. 2013. Comet: an open-source MS/MS sequence database search tool.Proteomics13, 1 (2013), 22–24

2013
[7]

Jimmy K Eng, Ashley L McCormack, and John R Yates. 1994. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database.Journal of the american society for mass spectrometry5, 11 (1994), 976–989

1994
[8]

Ari Frank and Pavel Pevzner. 2005. PepNovo: de novo peptide sequencing via probabilistic network modeling.Analytical chemistry77, 4 (2005), 964–973

2005
[9]

Zhi Jin, Sheng Xu, Xiang Zhang, Tianze Ling, Nanqing Dong, Wanli Ouyang, Zhiqiang Gao, Cheng Chang, and Siqi Sun. 2024. ContraNovo: a contrastive learning approach to enhance de novo peptide sequencing. InProceedings of the AAAI conference on artificial intelligence, Vol. 38. 144–152

2024
[10]

Korrawe Karunratanakul, Hsin-Yao Tang, David W Speicher, Ekapol Chuang- suwanich, and Sira Sriswasdi. 2019. Uncovering thousands of new peptides with sequence-mask-search hybrid de novo peptide sequencing framework.Molecular & Cellular Proteomics18, 12 (2019), 2478–2491

2019
[11]

S Kim and PA Pevzner. 2014. MS-GF+ makes progress towards a universal database search tool for proteomics. Nat Commun 5: 5277

2014
[12]

Daniela Klaproth-Andrade, Johannes Hingerl, Yanik Bruns, Nicholas H Smith, Jakob Träuble, Mathias Wilhelm, and Julien Gagneur. 2024. Deep learning-driven fragment ion series classification enables highly precise and sensitive de novo peptide sequencing.Nature Communications15, 1 (2024), 151

2024
[13]

Andy T Kong, Felipe V Leprevost, Dmitry M Avtonomov, Dattatreya Mel- lacheruvu, and Alexey I Nesvizhskii. 2017. MSFragger: ultrafast and compre- hensive peptide identification in mass spectrometry–based proteomics.Nature methods14, 5 (2017), 513–520

2017
[14]

Sangjeong Lee and Hyunwoo Kim. 2024. Bidirectional de novo peptide sequencing using a transformer model.PLOS Computational Biology20, 2 (2024), e1011892

2024
[15]

Kaiyuan Liu, Yuzhen Ye, and Haixu Tang. 2022. PepNet: a fully convolutional neural network for de novo peptide sequencing. (2022)

2022
[16]

Bin Ma. 2015. Novor: real-time peptide de novo sequencing software.Journal of the American Society for Mass Spectrometry26, 11 (2015), 1885–1894

2015
[17]

Zeping Mao, Ruixue Zhang, Lei Xin, and Ming Li. 2023. Mitigating the missing- fragmentation problem in de novo peptide sequencing with a two-stage graph- based deep learning model.Nature Machine Intelligence5, 11 (2023), 1250–1260

2023
[18]

David N Perkins, Darryl JC Pappin, David M Creasy, and John S Cottrell. 1999. Probability-based protein identification by searching sequence databases using mass spectrometry data.ELECTROPHORESIS: An International Journal20, 18 (1999), 3551–3567

1999
[19]

Rui Qiao, Ngoc Hieu Tran, Lei Xin, Xin Chen, Ming Li, Baozhen Shan, and Ali Ghodsi. 2021. Computationally instrument-resolution-independent de novo peptide sequencing for high-resolution devices.Nature Machine Intelligence3, 5 (2021), 420–425

2021
[20]

Zijie Qiu, Jiaqi Wei, Xiang Zhang, Sheng Xu, Kai Zou, Zhi Jin, Zhiqiang Gao, Nanqing Dong, and Siqi Sun. 2025. Universal Biological Sequence Reranking for Improved De Novo Peptide Sequencing.arXiv preprint arXiv:2505.17552(2025)

arXiv 2025
[21]

Ngoc Hieu Tran, Rui Qiao, Lei Xin, Xin Chen, Chuyi Liu, Xianglilan Zhang, Baozhen Shan, Ali Ghodsi, and Ming Li. 2019. Deep learning enables de novo pep- tide sequencing from data-independent-acquisition mass spectrometry.Nature methods16, 1 (2019), 63–66

2019
[22]

Ngoc Hieu Tran, Xianglilan Zhang, Lei Xin, Baozhen Shan, and Ming Li. 2017. De novo peptide sequencing by deep learning.Proceedings of the National Academy of Sciences114, 31 (2017), 8247–8252

2017
[23]

Jun Xia, Shaorong Chen, Jingbo Zhou, Tianze Ling, Wenjie Du, Sizhe Liu, and Stan Z Li. 2024. Adanovo: Adaptive\emph {De Novo} peptide sequencing with conditional mutual information.arXiv preprint arXiv:2403.07013(2024)

arXiv 2024
[24]

Jun Xia, Sizhe Liu, Jingbo Zhou, Shaorong Chen, Hongxin Xiang, Zicheng Liu, Yue Liu, and Stan Z Li. 2024. Bridging the Gap between Database Search and De Novo Peptide Sequencing with SearchNovo.bioRxiv(2024), 2024–10

2024
[25]

Jun Xia, Jingbo Zhou, Shaorong Chen, Tianze Ling, and Stan Z Li. 2025. A comprehensive and systematic review for deep learning-based de novo peptide sequencing. InProceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence. 10733–10741

2025
[26]

Tingpeng Yang, Tianze Ling, Boyan Sun, Zhendong Liang, Fan Xu, Xiansong Huang, Linhai Xie, Yonghong He, Leyuan Li, Fuchu He, et al. 2024. Introducing 𝜋-HelixNovo for practical large-scale de novo peptide sequencing.Briefings in Bioinformatics25, 2 (2024), bbae021

2024
[27]

Yan Yang, Zakir Hossain, Khandaker Asif, Liyuan Pan, Shafin Rahman, and Eric Stone. 2022. DPST: de novo peptide sequencing with amino-acid-aware transformers.arXiv preprint arXiv:2203.13132(2022)

arXiv 2022
[28]

Melih Yilmaz, William Fondrie, Wout Bittremieux, Sewoong Oh, and William S Noble. 2022. De novo mass spectrometry peptide sequencing with a transformer model. InInternational Conference on Machine Learning. PMLR, 25514–25522

2022
[29]

Jing Zhang, Lei Xin, Baozhen Shan, Weiwu Chen, Mingjie Xie, Denis Yuen, Weiming Zhang, Zefeng Zhang, Gilles A Lajoie, and Bin Ma. 2012. PEAKS DB: de novo sequencing assisted database search for sensitive and accurate peptide identification.Molecular & cellular proteomics11, 4 (2012)

2012
[30]

Xiang Zhang, Tianze Ling, Zhi Jin, Sheng Xu, Zhiqiang Gao, Boyan Sun, Zijie Qiu, Jiaqi Wei, Nanqing Dong, Guangshuai Wang, et al . 2025. 𝜋-PrimeNovo: an accurate and efficient non-autoregressive deep learning model for de novo peptide sequencing.Nature Communications16, 1 (2025), 267

2025
[31]

Xiang Zhang, Jiaqi Wei, Zijie Qiu, Sheng Xu, Zhi Jin, ZhiQiang Gao, Nanqing Dong, and Siqi Sun. 2025. Bidirectional Representations Augmented Autoregres- sive Biological Sequence Generation.arXiv preprint arXiv:2510.08169(2025)

arXiv 2025
[32]

Xin Zou, Yizhou Wang, Yibo Yan, Yuanhuiyi Lyu, Kening Zheng, Sirui Huang, Junkai Chen, Peijie Jiang, Jia Liu, Chang Tang, et al . 2024. Look twice before you answer: Memory-space visual retracing for hallucination mitigation in mul- timodal large language models.arXiv preprint arXiv:2410.03577(2024). A Full Sensitivity Scaling Data We report the complete ...

arXiv 2024

[1] [1]

Shaorong Chen, Jun Xia, Jingbo Zhou, Lecheng Zhang, Zhangyang Gao, Bozhen Hu, Cheng Tan, Wenjie Du, and Stan Z Li. 2025. ReNovo: Retrieval-Based\emph {De Novo} Mass Spectrometry Peptide Sequencing. InThe Thirteenth Interna- tional Conference on Learning Representations

2025

[2] [2]

Jürgen Cox and Matthias Mann. 2008. MaxQuant enables high peptide identifica- tion rates, individualized ppb-range mass accuracies and proteome-wide protein quantification.Nature biotechnology26, 12 (2008), 1367–1372

2008

[3] [3]

Jurgen Cox, Nadin Neuhauser, Annette Michalski, Richard A Scheltema, Jesper V Olsen, and Matthias Mann. 2011. Andromeda: a peptide search engine integrated into the MaxQuant environment.Journal of proteome research10, 4 (2011), 1794– 1805

2011

[4] [4]

Robertson Craig and Ronald C Beavis. 2004. TANDEM: matching proteins with tandem mass spectra.Bioinformatics20, 9 (2004), 1466–1467

2004

[5] [5]

Kevin Eloff, Konstantinos Kalogeropoulos, Amandla Mabona, Oliver Morell, Rachel Catzel, Esperanza Rivera-de Torre, Jakob Berg Jespersen, Wesley Williams, Sam PB van Beljouw, Marcin J Skwark, et al. 2025. InstaNovo enables diffusion- powered de novo peptide sequencing in large-scale proteomics experiments. Nature Machine Intelligence(2025), 1–15

2025

[6] [6]

Jimmy K Eng, Tahmina A Jahan, and Michael R Hoopmann. 2013. Comet: an open-source MS/MS sequence database search tool.Proteomics13, 1 (2013), 22–24

2013

[7] [7]

Jimmy K Eng, Ashley L McCormack, and John R Yates. 1994. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database.Journal of the american society for mass spectrometry5, 11 (1994), 976–989

1994

[8] [8]

Ari Frank and Pavel Pevzner. 2005. PepNovo: de novo peptide sequencing via probabilistic network modeling.Analytical chemistry77, 4 (2005), 964–973

2005

[9] [9]

Zhi Jin, Sheng Xu, Xiang Zhang, Tianze Ling, Nanqing Dong, Wanli Ouyang, Zhiqiang Gao, Cheng Chang, and Siqi Sun. 2024. ContraNovo: a contrastive learning approach to enhance de novo peptide sequencing. InProceedings of the AAAI conference on artificial intelligence, Vol. 38. 144–152

2024

[10] [10]

Korrawe Karunratanakul, Hsin-Yao Tang, David W Speicher, Ekapol Chuang- suwanich, and Sira Sriswasdi. 2019. Uncovering thousands of new peptides with sequence-mask-search hybrid de novo peptide sequencing framework.Molecular & Cellular Proteomics18, 12 (2019), 2478–2491

2019

[11] [11]

S Kim and PA Pevzner. 2014. MS-GF+ makes progress towards a universal database search tool for proteomics. Nat Commun 5: 5277

2014

[12] [12]

Daniela Klaproth-Andrade, Johannes Hingerl, Yanik Bruns, Nicholas H Smith, Jakob Träuble, Mathias Wilhelm, and Julien Gagneur. 2024. Deep learning-driven fragment ion series classification enables highly precise and sensitive de novo peptide sequencing.Nature Communications15, 1 (2024), 151

2024

[13] [13]

Andy T Kong, Felipe V Leprevost, Dmitry M Avtonomov, Dattatreya Mel- lacheruvu, and Alexey I Nesvizhskii. 2017. MSFragger: ultrafast and compre- hensive peptide identification in mass spectrometry–based proteomics.Nature methods14, 5 (2017), 513–520

2017

[14] [14]

Sangjeong Lee and Hyunwoo Kim. 2024. Bidirectional de novo peptide sequencing using a transformer model.PLOS Computational Biology20, 2 (2024), e1011892

2024

[15] [15]

Kaiyuan Liu, Yuzhen Ye, and Haixu Tang. 2022. PepNet: a fully convolutional neural network for de novo peptide sequencing. (2022)

2022

[16] [16]

Bin Ma. 2015. Novor: real-time peptide de novo sequencing software.Journal of the American Society for Mass Spectrometry26, 11 (2015), 1885–1894

2015

[17] [17]

Zeping Mao, Ruixue Zhang, Lei Xin, and Ming Li. 2023. Mitigating the missing- fragmentation problem in de novo peptide sequencing with a two-stage graph- based deep learning model.Nature Machine Intelligence5, 11 (2023), 1250–1260

2023

[18] [18]

David N Perkins, Darryl JC Pappin, David M Creasy, and John S Cottrell. 1999. Probability-based protein identification by searching sequence databases using mass spectrometry data.ELECTROPHORESIS: An International Journal20, 18 (1999), 3551–3567

1999

[19] [19]

Rui Qiao, Ngoc Hieu Tran, Lei Xin, Xin Chen, Ming Li, Baozhen Shan, and Ali Ghodsi. 2021. Computationally instrument-resolution-independent de novo peptide sequencing for high-resolution devices.Nature Machine Intelligence3, 5 (2021), 420–425

2021

[20] [20]

Zijie Qiu, Jiaqi Wei, Xiang Zhang, Sheng Xu, Kai Zou, Zhi Jin, Zhiqiang Gao, Nanqing Dong, and Siqi Sun. 2025. Universal Biological Sequence Reranking for Improved De Novo Peptide Sequencing.arXiv preprint arXiv:2505.17552(2025)

arXiv 2025

[21] [21]

Ngoc Hieu Tran, Rui Qiao, Lei Xin, Xin Chen, Chuyi Liu, Xianglilan Zhang, Baozhen Shan, Ali Ghodsi, and Ming Li. 2019. Deep learning enables de novo pep- tide sequencing from data-independent-acquisition mass spectrometry.Nature methods16, 1 (2019), 63–66

2019

[22] [22]

Ngoc Hieu Tran, Xianglilan Zhang, Lei Xin, Baozhen Shan, and Ming Li. 2017. De novo peptide sequencing by deep learning.Proceedings of the National Academy of Sciences114, 31 (2017), 8247–8252

2017

[23] [23]

Jun Xia, Shaorong Chen, Jingbo Zhou, Tianze Ling, Wenjie Du, Sizhe Liu, and Stan Z Li. 2024. Adanovo: Adaptive\emph {De Novo} peptide sequencing with conditional mutual information.arXiv preprint arXiv:2403.07013(2024)

arXiv 2024

[24] [24]

Jun Xia, Sizhe Liu, Jingbo Zhou, Shaorong Chen, Hongxin Xiang, Zicheng Liu, Yue Liu, and Stan Z Li. 2024. Bridging the Gap between Database Search and De Novo Peptide Sequencing with SearchNovo.bioRxiv(2024), 2024–10

2024

[25] [25]

Jun Xia, Jingbo Zhou, Shaorong Chen, Tianze Ling, and Stan Z Li. 2025. A comprehensive and systematic review for deep learning-based de novo peptide sequencing. InProceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence. 10733–10741

2025

[26] [26]

Tingpeng Yang, Tianze Ling, Boyan Sun, Zhendong Liang, Fan Xu, Xiansong Huang, Linhai Xie, Yonghong He, Leyuan Li, Fuchu He, et al. 2024. Introducing 𝜋-HelixNovo for practical large-scale de novo peptide sequencing.Briefings in Bioinformatics25, 2 (2024), bbae021

2024

[27] [27]

Yan Yang, Zakir Hossain, Khandaker Asif, Liyuan Pan, Shafin Rahman, and Eric Stone. 2022. DPST: de novo peptide sequencing with amino-acid-aware transformers.arXiv preprint arXiv:2203.13132(2022)

arXiv 2022

[28] [28]

Melih Yilmaz, William Fondrie, Wout Bittremieux, Sewoong Oh, and William S Noble. 2022. De novo mass spectrometry peptide sequencing with a transformer model. InInternational Conference on Machine Learning. PMLR, 25514–25522

2022

[29] [29]

Jing Zhang, Lei Xin, Baozhen Shan, Weiwu Chen, Mingjie Xie, Denis Yuen, Weiming Zhang, Zefeng Zhang, Gilles A Lajoie, and Bin Ma. 2012. PEAKS DB: de novo sequencing assisted database search for sensitive and accurate peptide identification.Molecular & cellular proteomics11, 4 (2012)

2012

[30] [30]

Xiang Zhang, Tianze Ling, Zhi Jin, Sheng Xu, Zhiqiang Gao, Boyan Sun, Zijie Qiu, Jiaqi Wei, Nanqing Dong, Guangshuai Wang, et al . 2025. 𝜋-PrimeNovo: an accurate and efficient non-autoregressive deep learning model for de novo peptide sequencing.Nature Communications16, 1 (2025), 267

2025

[31] [31]

Xiang Zhang, Jiaqi Wei, Zijie Qiu, Sheng Xu, Zhi Jin, ZhiQiang Gao, Nanqing Dong, and Siqi Sun. 2025. Bidirectional Representations Augmented Autoregres- sive Biological Sequence Generation.arXiv preprint arXiv:2510.08169(2025)

arXiv 2025

[32] [32]

Xin Zou, Yizhou Wang, Yibo Yan, Yuanhuiyi Lyu, Kening Zheng, Sirui Huang, Junkai Chen, Peijie Jiang, Jia Liu, Chang Tang, et al . 2024. Look twice before you answer: Memory-space visual retracing for hallucination mitigation in mul- timodal large language models.arXiv preprint arXiv:2410.03577(2024). A Full Sensitivity Scaling Data We report the complete ...

arXiv 2024