pith.

arxiv: 2606.12219 · v1 · pith:VWE4QSDK · submitted 2026-06-10 · 🧬 q-bio.GN · q-bio.MN

m6A-FORM: A Foundation Model for Decoding N6-methyladenosine Biology

Pith reviewed 2026-06-27 07:33 UTC · model grok-4.3

classification 🧬 q-bio.GN q-bio.MN
keywords m6A · N6-methyladenosine · RNA methylation · foundation model · transformer · site prediction · MeRIP-seq · epitranscriptomics

The pith

m6A-FORM predicts m6A sites with a PR-AUC of 0.635 after pretraining on MeRIP-seq peaks, improving PR-AUC over existing methods by at least 0.14.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces m6A-FORM, a transformer-based foundation model for decoding N6-methyladenosine (m6A) biology in RNA. It is pretrained on approximately 22 million sequences derived from MeRIP-seq peaks across 143 human studies to learn methylation-enriched patterns. Fine-tuning on high-confidence single-nucleotide annotations then yields state-of-the-art m6A site prediction, with a PR-AUC of 0.635 and a ROC-AUC of 0.988. The model also adapts to predicting binding sites of 19 m6A-associated regulators and identifies 19,631 tissue-conserved sites with distinct biological signatures. By using MeRIP-seq peaks as methylation-enriched priors, the approach aims to avoid the computational inefficiency and false positives that the authors attribute to prior adenosine-centered predictors.
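A PR-AUC of 0.635 next to a ROC-AUC of 0.988 is consistent with strong class imbalance, where true sites are rare among candidates: ROC-AUC depends only on how positives rank against negatives, while precision also depends on how many negatives there are. The sketch below computes both metrics from scratch on a hypothetical toy ranking. It assumes "PR-AUC" means step-wise average precision, which the page does not specify; nothing here is the authors' evaluation code.

```python
from typing import List


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """ROC-AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: List[int], scores: List[float]) -> float:
    """Step-wise average precision: mean of precision@k at each rank k that retrieves a positive.

    Assumes no tied scores.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision at this recall step
    return total / sum(labels)


# Hypothetical toy ranking, not from the paper: 5 positives among 105 candidates,
# retrieved at ranks 1, 3, 6, 10 and 20.
EXAMPLE_LABELS = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1] + [0] * 9 + [1] + [0] * 85
EXAMPLE_SCORES = [(len(EXAMPLE_LABELS) - i) / len(EXAMPLE_LABELS) for i in range(len(EXAMPLE_LABELS))]

if __name__ == "__main__":
    print(f"ROC-AUC = {roc_auc(EXAMPLE_LABELS, EXAMPLE_SCORES):.3f}")  # 0.950
    print(f"PR-AUC (AP) = {average_precision(EXAMPLE_LABELS, EXAMPLE_SCORES):.3f}")  # 0.563
```

The same ranking scores 0.950 on ROC-AUC but only about 0.563 on average precision, because with a 20:1 negative-to-positive ratio every negative ranked above a positive costs much more precision than ROC-AUC.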

Core claim

m6A-FORM is a transformer-based foundation model that uses MeRIP-seq peaks as methylation-enriched priors and is pretrained on approximately 22 million peak-derived sequences from 143 human MeRIP-seq studies. After fine-tuning with high-confidence single-nucleotide m6A annotations from m6A-Atlas v2.0 and GLORI, it achieves a PR-AUC of 0.635 and ROC-AUC of 0.988 for m6A site prediction, improving PR-AUC by at least 0.14 over existing methods while enabling substantially faster inference. Task-specific adaptation supports prediction of binding sites for 19 m6A-associated regulators and identification of YTHDF2-bound m6A sites associated with mRNA degradation. Applying the model across 67 datasets from 24 human tissues identifies 19,631 tissue-conserved sites with distinct localization, clustering, methylation, expression, RBP-interaction, and decay-associated signatures.

What carries the argument

transformer-based foundation model that uses MeRIP-seq peaks as methylation-enriched priors for pretraining on peak-derived sequences
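The page does not say how peak-derived sequences are produced. A minimal sketch, assuming peaks arrive as 0-based, half-open genomic intervals with a strand and are tiled into fixed-length RNA windows; the `window` and `stride` parameters and all helper names are hypothetical, not the authors' settings.

```python
from typing import Dict, Iterator, List, Tuple

# (chrom, start, end, strand) with 0-based, half-open coordinates
Peak = Tuple[str, int, int, str]

_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(dna: str, strand: str) -> str:
    """Convert a genomic DNA slice to RNA on the peak's strand (reverse complement for '-')."""
    rna = dna.upper().replace("T", "U")
    return rna.translate(_RNA_COMPLEMENT)[::-1] if strand == "-" else rna


def peak_windows(genome: Dict[str, str], peaks: List[Peak], window: int, stride: int) -> Iterator[str]:
    """Yield fixed-length windows tiled across each peak; peaks shorter than `window` yield nothing."""
    for chrom, start, end, strand in peaks:
        seq = to_rna(genome[chrom][start:end], strand)
        for offset in range(0, len(seq) - window + 1, stride):
            yield seq[offset:offset + window]


if __name__ == "__main__":
    toy_genome = {"chr1": "AAACCCGGGTTT"}  # hypothetical toy genome
    print(list(peak_windows(toy_genome, [("chr1", 0, 12, "+")], window=6, stride=3)))
    # ['AAACCC', 'CCCGGG', 'GGGUUU']
```

With this tiling, the tail of a peak is dropped when its length minus `window` is not a multiple of `stride`; a real pipeline might pad or add a final window instead.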

If this is right

  • The model achieves substantially faster inference for m6A site prediction compared to prior methods.
  • Task-specific adaptation enables prediction of binding sites for 19 m6A-associated regulators.
  • The model identifies YTHDF2-bound m6A sites associated with mRNA degradation.
  • Application across 67 datasets from 24 human tissues yields 19,631 tissue-conserved m6A sites with distinct signatures.
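The page does not define "tissue-conserved". One plausible reading, sketched here purely as an assumption: pool predictions from all datasets of the same tissue, then keep sites predicted in at least `min_tissues` distinct tissues. The threshold, the site tuple layout, and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Site = Tuple[str, int, str]  # (chrom, 0-based position, strand)


def conserved_sites(datasets: Iterable[Tuple[str, Set[Site]]], min_tissues: int) -> Set[Site]:
    """Sites predicted in at least `min_tissues` distinct tissues.

    Several datasets may come from the same tissue, so predictions are first pooled
    (set union) per tissue; otherwise replicates of one tissue would be counted twice.
    """
    per_tissue: Dict[str, Set[Site]] = {}
    for tissue, sites in datasets:
        per_tissue.setdefault(tissue, set()).update(sites)
    counts = Counter(site for sites in per_tissue.values() for site in sites)
    return {site for site, n in counts.items() if n >= min_tissues}


# Hypothetical toy input: four datasets from three tissues.
EXAMPLE_DATASETS = [
    ("liver", {("chr1", 100, "+"), ("chr1", 200, "+")}),
    ("liver", {("chr2", 50, "-")}),
    ("brain", {("chr1", 100, "+"), ("chr2", 50, "-")}),
    ("heart", {("chr1", 100, "+")}),
]

if __name__ == "__main__":
    print(sorted(conserved_sites(EXAMPLE_DATASETS, min_tissues=2)))
    # [('chr1', 100, '+'), ('chr2', 50, '-')]
```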

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The peak-based pretraining approach might extend to prediction tasks for other RNA modifications where peak data exists.
  • The tissue-conserved sites could provide candidates for experiments testing effects on mRNA decay rates in specific cell types.
  • Faster inference could allow scanning of larger transcriptomes or integration with other sequencing datasets for combined analyses.

Load-bearing premise

MeRIP-seq peaks serve as reliable methylation-enriched priors for pretraining and the single-nucleotide annotations from m6A-Atlas v2.0 and GLORI constitute accurate ground truth without substantial false positives or selection biases.

What would settle it

An independent validation experiment using an orthogonal technique such as mass spectrometry on held-out tissue samples to check whether the predicted m6A sites match at the reported accuracy levels.
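If orthogonal site-level calls were available, agreement with the predictions could be scored as precision and recall with a small positional tolerance. A minimal sketch for positions on one chromosome strand; the tolerance and the non-exclusive matching rule (a validated site may match several predictions) are assumptions, not a protocol from the paper.

```python
from bisect import bisect_left
from typing import List, Tuple


def _has_neighbor(sorted_pos: List[int], x: int, tol: int) -> bool:
    """True if some value in sorted_pos lies within [x - tol, x + tol]."""
    i = bisect_left(sorted_pos, x - tol)  # first candidate >= x - tol
    return i < len(sorted_pos) and sorted_pos[i] <= x + tol


def precision_recall(predicted: List[int], validated: List[int], tolerance: int = 0) -> Tuple[float, float]:
    """Precision: share of predictions near a validated site. Recall: share of validated sites near a prediction."""
    pred = sorted(set(predicted))
    val = sorted(set(validated))
    tp_pred = sum(_has_neighbor(val, p, tolerance) for p in pred)
    tp_val = sum(_has_neighbor(pred, v, tolerance) for v in val)
    precision = tp_pred / len(pred) if pred else 0.0
    recall = tp_val / len(val) if val else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical positions on one transcript strand.
    print(precision_recall([100, 250, 400, 900], [101, 400, 700], tolerance=2))
```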

Original abstract

N6-methyladenosine (m6A) is the most abundant internal modification in eukaryotic mRNA. However, most existing predictors use adenosine-centered formulations that are computationally inefficient and prone to false positives. Here we present m6A-FORM, a transformer-based foundation model for RNA methylation that uses MeRIP-seq peaks as methylation-enriched priors and is pretrained on approximately 22 million peak-derived sequences from 143 human MeRIP-seq studies. After fine-tuning with high-confidence single-nucleotide m6A annotations from m6A-Atlas v2.0 and GLORI, m6A-FORM-sites achieves state-of-the-art m6A site prediction performance, with a PR-AUC of 0.635 and ROC-AUC of 0.988, improving PR-AUC by at least 0.14 over existing methods while enabling substantially faster inference. Task-specific adaptation further supports prediction of binding sites for 19 m6A-associated regulators and identification of YTHDF2-bound m6A sites associated with mRNA degradation. Applying m6A-FORM across 67 datasets from 24 human tissues identifies 19,631 tissue-conserved sites with distinct localization, clustering, methylation, expression, RBP-interaction, and decay-associated signatures.
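To make the efficiency argument concrete: an adenosine-centered predictor scores one window per candidate adenosine, whereas a peak-prior model only needs candidates inside peaks. The toy sketch below counts both candidate sets; the transcript, the peak intervals, and the helper names are hypothetical, and it says nothing about how m6A-FORM itself enumerates candidates.

```python
from typing import List, Tuple


def adenosine_positions(rna: str) -> List[int]:
    """0-based positions of every adenosine: the candidate set of an adenosine-centered predictor."""
    return [i for i, base in enumerate(rna.upper()) if base == "A"]


def restrict_to_peaks(positions: List[int], peaks: List[Tuple[int, int]]) -> List[int]:
    """Keep positions that fall inside any 0-based, half-open peak interval."""
    return [p for p in positions if any(start <= p < end for start, end in peaks)]


if __name__ == "__main__":
    transcript = "AACGAUUAAC"  # hypothetical toy transcript
    all_a = adenosine_positions(transcript)
    print(len(all_a), restrict_to_peaks(all_a, [(3, 6), (7, 9)]))  # 5 [4, 7, 8]
```

On real transcriptomes the ratio between the two candidate counts, not the toy numbers, is what would drive any inference-time saving.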

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper presents m6A-FORM, a transformer-based foundation model pretrained on ~22 million sequences derived from MeRIP-seq peaks across 143 human studies. After fine-tuning on single-nucleotide m6A annotations from m6A-Atlas v2.0 and GLORI, the m6A-FORM-sites variant reports state-of-the-art performance (PR-AUC 0.635, ROC-AUC 0.988) for m6A site prediction, with claimed improvement of at least 0.14 in PR-AUC over prior methods, faster inference, and downstream applications to 19 regulator binding sites and 19,631 tissue-conserved sites identified across 67 datasets from 24 tissues.

Significance. If the performance metrics are shown to be robust, the work offers a potentially useful large-scale pretrained model for m6A biology that could improve site prediction efficiency and enable tissue-level analyses. The scale of the pretraining corpus (~22M sequences) is a clear strength relative to prior adenosine-centered predictors.

major comments (3)
  1. [Results] Results (m6A site prediction experiments): The headline PR-AUC of 0.635 and 0.14 improvement over baselines are reported without any description of train-test split methodology, baseline re-implementations, statistical error bars, or explicit controls for data leakage between the MeRIP-seq peak pretraining corpus and the fine-tuning labels from m6A-Atlas v2.0/GLORI; this directly undermines verification of the central SOTA claim.
  2. [Methods] Methods (fine-tuning data curation): The model treats single-nucleotide annotations from m6A-Atlas v2.0 and GLORI as high-confidence ground truth, yet no analysis or external validation is provided for potential false-positive rates, tissue-selection biases, or sequence-context artifacts common in aggregated MeRIP/GLORI compilations; if present, these would systematically inflate both absolute metrics and the reported improvement.
  3. [Results] Results (tissue-conserved sites analysis): The identification of 19,631 tissue-conserved sites and their downstream signatures (localization, RBP interaction, decay) inherits the same label-quality dependency as the site-prediction task; without independent orthogonal validation (e.g., mass-spec or orthogonal sequencing), the biological conclusions rest on the same unverified ground-truth assumption.
minor comments (2)
  1. The abstract states performance numbers but the main text should include a dedicated table comparing all baselines with exact PR-AUC/ROC-AUC values, inference times, and parameter counts for reproducibility.
  2. Notation for the foundation model variants (m6A-FORM vs. m6A-FORM-sites) is introduced without an explicit definition table or diagram showing the pretraining vs. fine-tuning stages.
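Major comment 1 asks for explicit controls against leakage between the pretraining corpus and the evaluation labels. The sketch below illustrates one simple proxy, k-mer Jaccard similarity, used here in place of true sequence-identity filtering: any test sequence too similar to a pretraining sequence is dropped. `k`, `max_sim`, and the brute-force comparison are illustrative assumptions; at ~22 million sequences a real filter would need an index or a dedicated clustering tool.

```python
from typing import Iterable, List, Set


def kmer_set(seq: str, k: int) -> Set[str]:
    """All overlapping k-mers of a sequence (empty if the sequence is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def drop_leaky(test_seqs: List[str], pretrain_seqs: Iterable[str], k: int = 8, max_sim: float = 0.5) -> List[str]:
    """Keep only test sequences whose similarity to every pretraining sequence is <= max_sim.

    Brute force O(|test| x |pretrain|): fine for a sketch, far too slow at corpus scale.
    """
    pretrain_profiles = [kmer_set(s, k) for s in pretrain_seqs]
    kept = []
    for seq in test_seqs:
        profile = kmer_set(seq, k)
        if all(jaccard(profile, other) <= max_sim for other in pretrain_profiles):
            kept.append(seq)
    return kept


if __name__ == "__main__":
    # Hypothetical sequences: the first test sequence duplicates a pretraining sequence.
    print(drop_leaky(["ACGUACGUAC", "GGGGCCCCUU"], ["ACGUACGUAC"], k=3))  # ['GGGGCCCCUU']
```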

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive comments. We address each major comment point-by-point below, committing to revisions that add missing methodological details and explicit discussions of limitations.

  1. Referee: [Results] Results (m6A site prediction experiments): The headline PR-AUC of 0.635 and 0.14 improvement over baselines are reported without any description of train-test split methodology, baseline re-implementations, statistical error bars, or explicit controls for data leakage between the MeRIP-seq peak pretraining corpus and the fine-tuning labels from m6A-Atlas v2.0/GLORI; this directly undermines verification of the central SOTA claim.

    Authors: We agree that these details are essential for verifying the central claims. In the revised manuscript we will add a dedicated subsection describing the train-test split protocol (including sequence-identity filtering to prevent leakage between the ~22M pretraining sequences and the m6A-Atlas/GLORI fine-tuning labels), the exact re-implementation steps for each baseline, and statistical error bars obtained from multiple random seeds or cross-validation folds. revision: yes

  2. Referee: [Methods] Methods (fine-tuning data curation): The model treats single-nucleotide annotations from m6A-Atlas v2.0 and GLORI as high-confidence ground truth, yet no analysis or external validation is provided for potential false-positive rates, tissue-selection biases, or sequence-context artifacts common in aggregated MeRIP/GLORI compilations; if present, these would systematically inflate both absolute metrics and the reported improvement.

    Authors: We acknowledge that the original submission did not include an explicit analysis of label quality. In revision we will insert a new paragraph in Methods that discusses known limitations of aggregated MeRIP-seq and GLORI compilations, cites supporting literature on their false-positive characteristics, and notes potential tissue biases. A full orthogonal experimental validation lies outside the scope of this computational study. revision: partial

  3. Referee: [Results] Results (tissue-conserved sites analysis): The identification of 19,631 tissue-conserved sites and their downstream signatures (localization, RBP interaction, decay) inherits the same label-quality dependency as the site-prediction task; without independent orthogonal validation (e.g., mass-spec or orthogonal sequencing), the biological conclusions rest on the same unverified ground-truth assumption.

    Authors: We agree that the conserved-site conclusions rest on the same label assumptions. The revised manuscript will explicitly state this dependency, add a limitations paragraph, and frame the reported signatures as computational observations that motivate future orthogonal experiments. The internal consistency of the signatures (e.g., expected RBP and decay associations) provides supporting context but does not replace independent validation. revision: partial
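The rebuttal promises error bars from multiple random seeds or cross-validation folds. A minimal sketch of how such runs could be summarized: mean ± sample standard deviation of a metric, and of the paired model-minus-baseline gap when both are evaluated on the same seeds or folds. The scores in the example are placeholders, not results from the paper.

```python
from statistics import mean, stdev
from typing import List, Tuple


def mean_sd(values: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across runs (needs at least two runs)."""
    if len(values) < 2:
        raise ValueError("need at least two runs for a standard deviation")
    return mean(values), stdev(values)


def paired_gap(model: List[float], baseline: List[float]) -> Tuple[float, float]:
    """Mean and SD of per-run differences; pairing assumes each run shares its split or seed."""
    if len(model) != len(baseline):
        raise ValueError("model and baseline need one score per shared run")
    return mean_sd([m - b for m, b in zip(model, baseline)])


if __name__ == "__main__":
    # Placeholder PR-AUC values for three seeds; NOT taken from the paper.
    model_runs = [0.50, 0.52, 0.54]
    baseline_runs = [0.40, 0.40, 0.40]
    m, s = mean_sd(model_runs)
    g, gs = paired_gap(model_runs, baseline_runs)
    print(f"model PR-AUC {m:.3f} ± {s:.3f}; gap {g:.3f} ± {gs:.3f}")
```

Reporting the paired gap rather than two independent error bars keeps seed-to-seed variation that affects both models from masking (or inflating) a claimed improvement such as the 0.14 PR-AUC margin.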

Circularity Check

0 steps flagged

No circularity: empirical ML pipeline with external labels and no derivation steps


The paper presents a transformer foundation model pretrained on MeRIP-seq peak sequences and fine-tuned on single-nucleotide annotations from the external m6A-Atlas v2.0 and GLORI resources. Reported metrics (PR-AUC 0.635, ROC-AUC 0.988) are standard supervised evaluation outcomes on held-out data rather than any claimed first-principles derivation. No equations, self-definitional loops, fitted-input-as-prediction steps, or load-bearing self-citations appear in the described pipeline. The central claims rest on empirical performance against independent annotations and do not reduce to the model's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Abstract-only; the central performance claim rests on the unstated assumption that the cited databases and MeRIP-seq peaks are high-quality ground truth. No explicit free parameters or invented entities are named.

axioms (2)
  • domain assumption MeRIP-seq peaks provide reliable methylation-enriched priors for pretraining
    Stated in abstract as the basis for pretraining on 22 million sequences.
  • domain assumption m6A-Atlas v2.0 and GLORI supply accurate single-nucleotide ground truth
    Used for fine-tuning and claimed to enable state-of-the-art performance.

pith-pipeline@v0.9.1-grok · 5769 in / 1698 out tokens · 29068 ms · 2026-06-27T07:33:38.600531+00:00 · methodology


Reference graph

Works this paper leans on

57 extracted references

  1. [1]

    Clustered

    rely on highly similar experimental principles, we treated them as a single technology when counting supporting evidence. Using these criteria, we constructed a high-confidence dataset containing 131,320 base-resolution m6A sites. Dataset preparation for m6A sites identification. We collected 528,452 MeRIP-seq peaks from five human cell lines with the l...

  2. [2]

    Nature, 2014

    Wang, X., et al., N6-methyladenosine-dependent regulation of messenger RNA stability. Nature, 2014. 505(7481): p. 117-120

  3. [3]

    Nature, 2012

    Dominissini, D., et al., Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq. Nature, 2012. 485(7397): p. 201-206

  4. [4]

    Cell Genom, 2024

    Fan, R., et al., A combined deep learning framework for mammalian m6A site prediction. Cell Genom, 2024. 4(12): p. 100697

  5. [5]

    Briefings in Functional Genomics, 2025

    Huang, X., et al., m6A RNA modification pathway: orchestrating fibrotic mechanisms across multiple organs. Briefings in Functional Genomics, 2025. 24

  6. [6]

    Nature, 2017

    Barbieri, I., et al., Promoter-bound METTL3 maintains myeloid leukaemia by m6A-dependent translation control. Nature, 2017. 552(7683): p. 126-131

  7. [7]

    Trends in Molecular Medicine, 2023

    Liu, Y., et al., N6-methyladenosine-mediated gene regulation and therapeutic implications. Trends in Molecular Medicine, 2023. 29(6): p. 454-467

  8. [8]

    Bioinformatics, 2023

    Zhang, Y., et al., Interpretable prediction models for widespread m6A RNA modification across cell lines and tissues. Bioinformatics, 2023. 39(12)

  9. [9]

    Bioinformatics, 2024

    Ni, P., et al., RNA m6A detection using raw current signals and basecalling errors from Nanopore direct RNA sequencing reads. Bioinformatics, 2024. 40(6)

  10. [10]

    Nature Methods, 2015

    Linder, B., et al., Single-nucleotide-resolution mapping of m6A and m6Am throughout the transcriptome. Nature Methods, 2015. 12(8): p. 767-772

  11. [11]

    Nature Biotechnology, 2023

    Liu, C., et al., Absolute quantification of single-base m6A methylation in the mammalian transcriptome using GLORI. Nature Biotechnology, 2023. 41(3): p. 355-366

  12. [12]

    Nucleic Acids Research, 2016

    Zhou, Y., et al., SRAMP: prediction of mammalian N6-methyladenosine (m6A) sites based on sequence-derived features. Nucleic Acids Research, 2016. 44(10): p. e91-e91

  13. [13]

    Nucleic Acids Research,

    Chen, K., et al., WHISTLE: a high-accuracy map of the human N6-methyladenosine (m6A) epitranscriptome predicted using a machine learning approach. Nucleic Acids Research,

  14. [14]

    RNA Biol, 2021

    Li, J., et al., HSm6AP: a high-precision predictor for the Homo sapiens N6-methyladenosine (m6A) based on multiple weights and feature stitching. RNA Biol, 2021. 18(11): p. 1882-1892

  15. [15]

    Cell Genomics, 2024

    Fan, R., et al., A combined deep learning framework for mammalian m6A site prediction. Cell Genomics, 2024. 4(12)

  16. [16]

    Nature Communications,

    Song, Z., et al., Attention-based multi-label neural networks for integrated prediction and interpretation of twelve widely occurring RNA modifications. Nature Communications,

  17. [17]

    BMC Bioinformatics, 2024

    Tu, G., et al., m6A-TCPred: a web server to predict tissue-conserved human m6A sites using machine learning approach. BMC Bioinformatics, 2024. 25(1): p. 127

  18. [18]

    Nucleic Acids Research, 2021

    Xiong, Y., et al., Modeling multi-species RNA modification through multi-task curriculum learning. Nucleic Acids Research, 2021. 49(7): p. 3719-3734

  19. [19]

    Signal Transduct Target Ther, 2021

    Jiang, X., et al., The role of m6A modification in the biological functions and diseases. Signal Transduct Target Ther, 2021. 6(1): p. 74

  20. [20]

    BioRxiv, 2021

    Chen, Y., et al., A systematic benchmark of Nanopore long read RNA sequencing for transcript level analysis in human cell lines. BioRxiv, 2021: p. 2021.04.21.440736

  21. [21]

    Bioinformatics, 2021

    Ji, Y., et al., DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome. Bioinformatics, 2021. 37(15): p. 2112-2120

  22. [22]

    bioRxiv, 2026

    Jo, S., et al., Systematic identification of tissue-conserved m(6)A sites reveals a stable epitranscriptomic regulatory layer controlling essential genes. bioRxiv, 2026

  23. [23]

    Nucleic Acids Research, 2021

    Körtel, N., et al., Deep and accurate detection of m6A RNA modifications using miCLIP2 and m6Aboost machine learning. Nucleic Acids Research, 2021. 49(16): p. e92-e92

  24. [24]

    Nucleic Acids Research, 2023

    Liang, Z., et al., m6A-Atlas v2.0: updated resources for unraveling the N6-methyladenosine (m6A) epitranscriptome among multiple species. Nucleic Acids Research, 2023. 52(D1): p. D194-D202

  25. [25]

    Annu Rev Biochem, 2023

    Flamand, M.N., M. Tegowski, and K.D. Meyer, The Proteins of mRNA Modification: Writers, Readers, and Erasers. Annu Rev Biochem, 2023. 92: p. 145-173

  26. [26]

    Nucleic Acids Research, 2021

    Zhao, W., et al., POSTAR3: an updated platform for exploring post-transcriptional regulation coordinated by RNA-binding proteins. Nucleic Acids Research, 2021. 50(D1): p. D287-D294

  27. [27]

    NAACL-HLT, 2022

    Wang, X. and Y. Wang. Sentence-level resampling for named entity recognition. in Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022

  28. [28]

    BMC Genomics, 2018

    Pan, X., et al., Prediction of RNA-protein sequence and structure binding preferences using deep convolutional and recurrent neural networks. BMC Genomics, 2018. 19(1): p. 511

  29. [29]

    GigaScience, 2021

    Uhl, M., et al., RNAProt: an efficient and feature-rich RNA binding protein binding site predictor. GigaScience, 2021. 10(8)

  30. [30]

    Int J Gen Med, 2025

    Long, X., et al., RNA Binding Motif Protein 15 (RBM15): Structure, Function and Its Research Progress in Tumors. Int J Gen Med, 2025. 18: p. 3635-3649

  31. [31]

    Molecular Cell, 2016

    Xiao, W., et al., Nuclear m6A Reader YTHDC1 Regulates mRNA Splicing. Molecular Cell, 2016. 61(4): p. 507-519

  32. [32]

    Cell, 2020

    Zaccara, S. and S.R. Jaffrey, A Unified Model for the Function of YTHDF Proteins in Regulating m6A-Modified mRNA. Cell, 2020. 181(7): p. 1582-1595.e18

  33. [33]

    Cell Reports,

    Boo, S.H., et al., UPF1 promotes rapid degradation of m6A-containing RNAs. Cell Reports,

  34. [34]

    Nature Cell Biology, 2018

    Huang, H., et al., Recognition of RNA N6-methyladenosine by IGF2BP proteins enhances mRNA stability and translation. Nature Cell Biology, 2018. 20(3): p. 285-295

  35. [35]

    Molecular Cancer, 2024

    Ying, Y ., et al., Co-transcriptional R-loops-mediated epigenetic regulation drives growth retardation and docetaxel chemosensitivity enhancement in advanced prostate cancer. Molecular Cancer, 2024. 23(1): p. 79

  36. [36]

    Nat Cell Biol, 2018

    Huang, H., et al., Recognition of RNA N(6)-methyladenosine by IGF2BP proteins enhances mRNA stability and translation. Nat Cell Biol, 2018. 20(3): p. 285-295

  37. [37]

    Cell Death Discovery, 2022

    Yan, H., et al., Roles and mechanisms of the m6A reader YTHDC1 in biological processes and diseases. Cell Death Discovery, 2022. 8(1): p. 237

  38. [38]

    Journal of Translational Medicine, 2022

    Wang, X., et al., SRSF9 promotes colorectal cancer progression via stabilizing DSN1 mRNA in an m6A-related manner. Journal of Translational Medicine, 2022. 20(1): p. 198

  39. [39]

    Cancer Biology & Therapy, 2024

    Wang, J., et al., A positive feedback loop of SRSF9/USP22/ZEB1 promotes the progression of ovarian cancer. Cancer Biology & Therapy, 2024. 25(1): p. 2427415

  40. [40]

    eLife, 2016

    Ge, Z., et al., Polypyrimidine tract binding protein 1 protects mRNAs from recognition by the nonsense-mediated mRNA decay pathway. eLife, 2016. 5: p. e11155

  41. [41]

    Mol Cancer Res, 2020

    Zhang, K., et al., AGO2 Mediates MYC mRNA Stability in Hepatocellular Carcinoma. Mol Cancer Res, 2020. 18(4): p. 612-622

  42. [42]

    Nucleic Acids Research, 2020

    Zhang, H., et al., Dynamic landscape and evolution of m6A methylation in human. Nucleic Acids Research, 2020. 48(11): p. 6251-6264

  43. [43]

    Molecular Cell, 2020

    Liu, J.e., et al., Landscape and Regulation of m6A and m6Am Methylome across Human and Mouse Tissues. Molecular Cell, 2020. 77(2): p. 426-440.e6

  44. [44]

    Human Molecular Genetics, 2018

    Zhang, F., et al., Fragile X mental retardation protein modulates the stability of its m6A-marked messenger RNA targets. Human Molecular Genetics, 2018. 27(22): p. 3936-3950

  45. [45]

    Bioinformatics, 2018

    Chen, S., et al., fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 2018. 34(17): p. i884-i890

  46. [46]

    EMBnet.journal, 2011

    Martin, M., Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 2011. 17(1): p. 10-12

  47. [47]

    Nature Biotechnology, 2019

    Kim, D., et al., Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology, 2019. 37(8): p. 907-915

  48. [48]

    Genomics, Proteomics & Bioinformatics, 2026

    Zhou, J., et al., Comprehensive Epitranscriptome Analysis from MeRIP-seq Data with exomePeak2. Genomics, Proteomics & Bioinformatics, 2026

  49. [49]

    Briefings in Bioinformatics, 2024

    Zhang, T.-H., et al., Understanding YTHDF2-mediated mRNA degradation by m6A-BERT-Deg. Briefings in Bioinformatics, 2024. 25(3): p. bbae170

  50. [50]

    Cell Genomics, 2024

    Fan, R., et al., A combined deep learning framework for mammalian m6A site prediction. Cell Genomics, 2024. 4(12): p. 100697

  51. [51]

    Bioinformatics, 2024

    Genovese, G., et al., BCFtools/liftover: an accurate and comprehensive tool to convert genetic variants across genome assemblies. Bioinformatics, 2024. 40(2)

  52. [52]

    Nature, 2015

    Zhou, J., et al., Dynamic m6A mRNA methylation directs translational control of heat shock response. Nature, 2015. 526(7574): p. 591-594

  53. [53]

    Better Modeling of Incomplete Annotations for Named Entity Recognition

    Jie, Z., et al. Better Modeling of Incomplete Annotations for Named Entity Recognition

  54. [54]

    Minneapolis, Minnesota: Association for Computational Linguistics

  55. [55]

    Did the Model Understand the Question? 2018

    Mudrakarta, P.K., et al. Did the Model Understand the Question? 2018. Melbourne, Australia: Association for Computational Linguistics

  56. [56]

    International Journal of Cancer, 2023

    Nakken, S., et al., Comprehensive interrogation of gene lists from genome-scale cancer screens with oncoEnrichR. International Journal of Cancer, 2023. 153(10): p. 1819-1828

  57. [57]

    PLOS Computational Biology, 2013

    Lawrence, M., et al., Software for Computing and Annotating Genomic Ranges. PLOS Computational Biology, 2013. 9(8): p. e1003118.

Fig. 1 | Overview of the m6A-FORM framework. a, Pipeline for constructing the high-confidence single-base m6A dataset. A total of 224 human MeRIP-seq datasets were processed through data preparation and peak calling, yielding 2...