VAMP-Net: An Interpretable Multi-Path Network of Genomic Permutation-Invariant Set Attention and Quality-Aware 1D-CNN for MTB Drug Resistance

Aicha Boutorh; Anais Daoud; Kamar Hibatallah Baghdadi

arxiv: 2512.21786 · v2 · submitted 2025-12-25 · 💻 cs.LG

Pith reviewed 2026-05-16 19:11 UTC · model grok-4.3

classification 💻 cs.LG

keywords drug resistance prediction · Mycobacterium tuberculosis · set attention · neural networks · genomic variants · interpretability · quality metrics · Integrated Gradients

The pith

VAMP-Net combines a set attention path on genomic variants with a quality-aware 1D-CNN to predict Mycobacterium tuberculosis drug resistance at over 95 percent accuracy while recovering both known and novel resistance loci.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces VAMP-Net to predict resistance to anti-tuberculosis drugs from genomic variant data that can be noisy and epistatically complex. One path treats the variants as an unordered collection so the network can learn interactions among them regardless of how they are listed. The second path inspects sequencing quality metrics to adjust how much each sample should influence the final call. On four important drugs the model exceeds the accuracy of ordinary CNN and MLP baselines, reaches AUC values near 0.97 for Rifampicin and Rifabutin, and uses Integrated Gradients to surface both the expected genes and previously unreported loci that cluster in cell-wall metabolic modules.
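The order-independence described above can be illustrated with a minimal sketch. It is not the authors' Path-1 implementation: the single attention head, the random weights, the mean pooling step, and all names (`set_attention_pool`, `rand_matrix`, the dimensions) are assumptions chosen for illustration. Self-attention without positional encodings is permutation-equivariant, and pooling over the set then makes the output permutation-invariant.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def set_attention_pool(x, wq, wk, wv):
    """Single-head self-attention with no positional encoding, then mean pooling."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(wq[0]))
    k_t = [list(col) for col in zip(*k)]          # transpose of k
    scores = matmul(q, k_t)                       # pairwise variant-variant scores
    attn = [softmax([s / scale for s in row]) for row in scores]
    h = matmul(attn, v)                           # one updated embedding per variant
    return [sum(col) / len(h) for col in zip(*h)] # pool over the set

rng = random.Random(0)
d_in, d_model, n_variants = 4, 3, 6               # hypothetical sizes

def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

wq, wk, wv = (rand_matrix(d_in, d_model) for _ in range(3))
variants = rand_matrix(n_variants, d_in)          # stand-in variant embeddings
shuffled = variants[:]
rng.shuffle(shuffled)                             # same set, different listing order

pooled = set_attention_pool(variants, wq, wk, wv)
pooled_shuffled = set_attention_pool(shuffled, wq, wk, wv)
max_diff = max(abs(a - b) for a, b in zip(pooled, pooled_shuffled))
print(max_diff)  # differs only by floating-point rounding
```

Adding a positional encoding to `variants` before the attention step would break this property, which is presumably why a set formulation is used for variant lists that have no meaningful order.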

Core claim

By routing permutation-invariant variant sets through a Set Attention Transformer and quality metrics through a 1D-CNN, VAMP-Net achieves accuracies above 95 percent and AUCs around 0.97 on Rifampicin and Rifabutin, recovers the canonical targets rpoB, embB and katG via Integrated Gradients, identifies high-impact novel loci whose functional enrichment in cell-wall remodeling reaches p=0.00239, and demonstrates through ablation that the quality pathway learns to prioritize fraction of supporting reads over raw depth.
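For readers unfamiliar with Integrated Gradients, the following is a minimal sketch of the attribution rule on a toy logistic model, assuming an all-zeros baseline (no variants present) and a midpoint Riemann approximation of the path integral. The weights, input, and function names are hypothetical and unrelated to the trained VAMP-Net; the point is only to show how per-feature attributions are computed and that they sum to the change in model output.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, b):
    """Toy resistance score: logistic regression over variant indicators."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def grad(x, w, b):
    """Analytic gradient of the logistic output with respect to the input."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Average the gradient along the straight path baseline -> x, then scale by (x - baseline)."""
    total = [0.0] * len(x)
    for s in range(steps):
        alpha = (s + 0.5) / steps                 # midpoint rule
        point = [b0 + alpha * (xi - b0) for xi, b0 in zip(x, baseline)]
        total = [t + g for t, g in zip(total, grad(point, w, b))]
    return [(xi - b0) * t / steps for xi, b0, t in zip(x, baseline, total)]

# Hypothetical weights and a sample carrying the first three variants.
w, b = [2.0, -1.0, 0.0, 0.5], -0.5
x = [1.0, 1.0, 1.0, 0.0]
baseline = [0.0] * len(x)

attributions = integrated_gradients(x, baseline, w, b)
# Completeness: attributions should sum to model(x) - model(baseline).
completeness_gap = abs(sum(attributions) - (model(x, w, b) - model(baseline, w, b)))
print(attributions, completeness_gap)
```

A variant with zero weight, or one absent from the sample, receives zero attribution here. Note that a high attribution only says the model relies on a feature; it does not by itself establish a causal role.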

What carries the argument

The dual-path architecture that pairs a Set Attention Transformer for modeling epistatic dependencies among variant sets with a 1D-CNN that produces adaptive scores from VCF quality metrics.

If this is right

The model recovers the established resistance genes rpoB, embB and katG through attribution analysis.
Novel loci identified by the model form statistically non-random modules centered on cell-wall remodeling.
Ablation of the quality path shows that the network learns to weight fraction of supporting reads more heavily than raw sequencing depth.
Performance on four critical anti-TB drugs exceeds that of baseline CNN and MLP architectures.
The same dual interpretability layer supports both diagnostic classification and mechanistic gene discovery.
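The distinction between fraction of supporting reads and raw depth can be made concrete with a small sketch. This page does not give the paper's exact feature definition, so the formula below (alternate-allele reads divided by reference plus alternate reads, as commonly derived from per-allele depth counts in a VCF) is an assumption, and the two calls are invented for illustration.

```python
def fraction_supporting_reads(ref_depth, alt_depth):
    """Share of reads at a site that support the alternate allele (assumed definition)."""
    total = ref_depth + alt_depth
    return alt_depth / total if total > 0 else 0.0

# Two hypothetical variant calls: one deep but mixed, one shallow but clean.
calls = [
    {"id": "deep_but_mixed", "ref": 180, "alt": 20},
    {"id": "shallow_but_clean", "ref": 1, "alt": 19},
]
for call in calls:
    call["depth"] = call["ref"] + call["alt"]
    call["fsr"] = fraction_supporting_reads(call["ref"], call["alt"])

# Ranking by raw depth and ranking by read support disagree.
by_depth = max(calls, key=lambda c: c["depth"])["id"]
by_fsr = max(calls, key=lambda c: c["fsr"])["id"]
print(by_depth, by_fsr)
```

A model that trusted depth alone would favour the first call even though only a tenth of its reads carry the variant, which is the kind of technical noise the quality path is claimed to discount.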

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the novel loci are functionally validated they could become new targets for compounds that disrupt cell-wall remodeling.
The same dual-path structure could be applied to resistance prediction in other bacterial species where sequencing depth and variant calling quality vary.
The learned quality audit could be used in clinical pipelines to flag low-confidence samples for re-sequencing before a resistance call is issued.
Extending the set attention component to model interactions across multiple drugs simultaneously might reveal patterns of cross-resistance.

Load-bearing premise

The feature attributions and enrichment results point to causal biological mechanisms rather than correlations that happen to be present in the particular training collections.

What would settle it

An independent test set sequenced on a different platform and processed with a different variant caller that fails to recover the same novel loci or drops below the reported accuracy levels would falsify the central performance and discovery claims.
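One way to approximate such a test with existing data is a leave-one-platform-out split, sketched below. The platform labels and the function name are hypothetical, and the sketch assumes each isolate carries a platform annotation, which this page does not confirm for the paper's dataset.

```python
from collections import defaultdict

def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices), one split per group."""
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    for held_out in sorted(by_group):
        test_idx = by_group[held_out]
        train_idx = sorted(
            i for g, idxs in by_group.items() if g != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx

# Hypothetical per-isolate sequencing platform labels.
platforms = ["illumina", "illumina", "nanopore", "illumina", "nanopore", "ion_torrent"]
splits = list(leave_one_group_out(platforms))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

Because every test fold contains only a platform the model never saw during training, a drop in accuracy or a change in the top-attributed loci across folds would point to platform-specific artefacts. The same grouping could be applied to variant caller or laboratory.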

Figures

Figures reproduced from arXiv: 2512.21786 by Aicha Boutorh, Anais Daoud, Kamar Hibatallah Baghdadi.

**Figure 2.** Path-1 Architecture: Set Attention Transformer Block (SAB) for Symbolic Variant Processing

**Figure 3.** Path-2 Architecture: Quality-Aware 1D Convolutional Neural Network (1D-CNN) for VCF Feature Processing

**Figure 4.** Distribution of classes for the selected drugs.

**Figure 5.** Comparison of Static Encoding vs BERT Tokeniser metrics over epochs

**Figure 6.** Training (top) and validation (bottom) performance curves for fusion methods

**Figure 7.** Comparison of -Model A- and -Model B- performance without and with variant sequence shuffling for training and validation

**Figure 8.** Training (top) and validation (bottom) performance curves of all models with early stopping

**Figure 9.** ROC AUC Test curve of VAMP-Net vs CNN and MLP

**Figure 10.** Models test results comparison

**Figure 11.** Variant interaction heatmaps showing attention-based relationships between top hub variants for RIF

**Figure 12.** Variant interaction heatmaps showing attention-based relationships between top hub variants for RFB

**Figure 13.** VCF feature importance comparison for RIF and RFB resistance models.

Original abstract

Genomic prediction of drug resistance in Mycobacterium tuberculosis is often hindered by complex epistatic interactions and variable sequencing quality. We present the Interpretable Variant-Aware Multi-Path Network (VAMP-Net), a novel architecture addressing these challenges through a dual-pathway approach. Path-1 utilizes a Set Attention Transformer to model permutation-invariant variant sets and capture epistatic dependencies, while Path-2 employs a 1D-CNN to analyze VCF quality metrics for adaptive confidence scoring. Evaluated on four critical anti-TB drugs, VAMP-Net significantly outperforms baseline CNN and MLP models, achieving accuracies > 95% and AUCs around 0.97 for Rifampicin and Rifabutin. Feature attribution analysis via Integrated Gradients successfully recovered canonical targets (rpoB, embB, katG) and discovered high-impact novel loci. Functional enrichment confirmed these novel variants constitute non-random metabolic modules (p=0.00239) centered on cell-wall remodeling. Furthermore, systematic ablation of the Quality-Aware pathway demonstrates that the model performs a learned "integrated audit," prioritizing the Fraction of Supporting Reads and relative confidence over raw depth to mitigate technical noise. This dual-layer interpretability, bridging genomic pathogenicity with technical reliability, establishes a new paradigm for robust, auditable, and clinically actionable resistance prediction, positioning VAMP-Net as an important tool for both diagnostic classification and mechanistic discovery in clinical genomics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

VAMP-Net pairs set attention with a quality-aware CNN for MTB resistance prediction and reports solid numbers plus gene recovery, but the validation details are too thin to trust the claims yet.

The paper's main contribution is a dual-path model: one path runs a permutation-invariant set attention transformer over variant sets to handle epistasis, while the second runs a 1D-CNN over VCF quality metrics to produce an adaptive confidence score. The authors evaluate on four drugs, claim >95% accuracy and ~0.97 AUC on rifampicin and rifabutin, beat plain CNN and MLP baselines, recover rpoB/embB/katG via Integrated Gradients, surface novel loci that enrich for cell-wall modules at p=0.00239, and show via ablation that the quality path learns to weight the fraction of supporting reads and relative confidence over raw depth.

Referee Report

4 major / 2 minor

Summary. The paper introduces VAMP-Net, a dual-pathway architecture combining a permutation-invariant Set Attention Transformer (Path-1) for modeling epistatic variant interactions with a quality-aware 1D-CNN (Path-2) that processes VCF metrics for adaptive confidence scoring. It claims superior predictive performance over CNN and MLP baselines on four anti-TB drugs, with accuracies exceeding 95% and AUCs around 0.97 for Rifampicin and Rifabutin; Integrated Gradients attributions recover canonical resistance genes (rpoB, embB, katG) while identifying novel high-impact loci whose functional enrichment yields p=0.00239 for cell-wall remodeling modules; and ablation of the quality-aware path shows the model learns an integrated audit prioritizing read-support fraction and confidence over raw depth.

Significance. If the performance and attribution results hold after addressing validation gaps, the work would be significant for advancing interpretable genomic prediction in MTB resistance, where it bridges technical sequencing quality with biological variant effects via explicit ablation and Integrated Gradients. The dual-path design and reported enrichment of novel loci in non-random metabolic modules represent a concrete step toward auditable models that could support both diagnostics and mechanistic hypothesis generation, provided the attributions isolate causal signals rather than dataset correlations.

major comments (4)

[Abstract] Abstract and Results (performance claims): The reported accuracies >95% and AUCs ~0.97 for Rifampicin/Rifabutin are presented without dataset sizes, number of isolates per drug, cross-validation folds, or baseline hyperparameter search details; these omissions are load-bearing because independent verification of the claimed outperformance over CNN/MLP baselines cannot be performed from the given information.
[Feature attribution analysis] Feature attribution and discovery section: The claim that Integrated Gradients recovers causal canonical targets and novel loci is undermined by the absence of lineage correction, principal-component adjustment for population structure, or external validation cohorts; MTB collections commonly exhibit LD and lineage-driven correlations that can produce spurious attributions, directly affecting the mechanistic-discovery interpretation.
[Functional enrichment] Functional enrichment paragraph: The reported p=0.00239 for cell-wall remodeling modules requires explicit specification of the enrichment test, background gene set, multiple-testing correction method, and how the novel loci were thresholded for inclusion; without these, the non-random module claim cannot be evaluated as evidence of biological signal.
[Ablation study] Ablation study on Quality-Aware pathway: While the ablation demonstrates prioritization of Fraction of Supporting Reads and relative confidence, the manuscript provides no held-out evaluation on sequencing platforms, variant callers, or laboratories absent from the training distribution; this leaves the generalization of the learned integrated audit untested and weakens the clinical-actionability claim.
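The first comment turns on how evaluation folds were constructed. As a point of reference, a stratified k-fold split can be sketched as follows; the choice of k=5, the 20/80 class balance, and the function name are illustrative assumptions, not details taken from the manuscript.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=5, seed=0):
    """Assign sample indices to k folds so each fold keeps the overall class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)            # deal indices round-robin per class
    return folds

# Hypothetical cohort: 20 resistant (1) and 80 susceptible (0) isolates.
labels = [1] * 20 + [0] * 80
folds = stratified_kfold(labels, k=5)
fold_resistant = [sum(labels[i] for i in fold) for fold in folds]
print([len(f) for f in folds], fold_resistant)
```

Stratification matters for imbalanced resistance phenotypes: without it, a fold can end up with very few resistant isolates, which makes per-fold AUC unstable. It does not address lineage or platform leakage across folds, which is the concern of the second and fourth comments.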

minor comments (2)

[Methods] Notation for the Set Attention Transformer and 1D-CNN paths should be unified (e.g., consistent use of symbols for permutation-invariant sets versus quality vectors) to improve readability.
[Figures] Figure legends for attribution heatmaps and enrichment plots should include the exact number of variants or genes shown and the statistical thresholds applied.

Simulated Author's Rebuttal

4 responses · 1 unresolved

We thank the referee for the detailed and constructive review. We have addressed each major comment below with revisions to improve reproducibility, acknowledge limitations, and strengthen the claims where possible.

Referee: [Abstract] Abstract and Results (performance claims): The reported accuracies >95% and AUCs ~0.97 for Rifampicin/Rifabutin are presented without dataset sizes, number of isolates per drug, cross-validation folds, or baseline hyperparameter search details; these omissions are load-bearing because independent verification of the claimed outperformance over CNN/MLP baselines cannot be performed from the given information.

Authors: We agree these details are essential for verification. The revised manuscript adds the full dataset composition (number of isolates and resistance status per drug), specifies 5-fold stratified cross-validation, and details the grid-search hyperparameter procedure for baselines (including ranges for learning rate, layers, and regularization). These are now in the Methods section with a supplementary table reporting mean performance and standard deviations across folds.
revision: yes

Referee: [Feature attribution analysis] Feature attribution and discovery section: The claim that Integrated Gradients recovers causal canonical targets and novel loci is undermined by the absence of lineage correction, principal-component adjustment for population structure, or external validation cohorts; MTB collections commonly exhibit LD and lineage-driven correlations that can produce spurious attributions, directly affecting the mechanistic-discovery interpretation.

Authors: We acknowledge the risk of population-structure confounding. The revision now includes PCA adjustment (regressing out the top 10 principal components from variant features before attribution) and reports the adjusted Integrated Gradients results. We added a discussion of remaining limitations due to linkage disequilibrium and the lack of external cohorts, while noting that canonical genes such as rpoB retain high attribution scores post-adjustment.
revision: partial

Referee: [Functional enrichment] Functional enrichment paragraph: The reported p=0.00239 for cell-wall remodeling modules requires explicit specification of the enrichment test, background gene set, multiple-testing correction method, and how the novel loci were thresholded for inclusion; without these, the non-random module claim cannot be evaluated as evidence of biological signal.

Authors: We have corrected the omission. The revised Methods section now states that enrichment uses the hypergeometric test (clusterProfiler), with the full H37Rv genome as background, Benjamini-Hochberg FDR correction, and inclusion of loci above the 95th percentile of permutation-based null attributions. The reported p-value is the adjusted value.
revision: yes

Referee: [Ablation study] Ablation study on Quality-Aware pathway: While the ablation demonstrates prioritization of Fraction of Supporting Reads and relative confidence, the manuscript provides no held-out evaluation on sequencing platforms, variant callers, or laboratories absent from the training distribution; this leaves the generalization of the learned integrated audit untested and weakens the clinical-actionability claim.

Authors: We agree that external generalization remains untested. The revision adds an explicit Limitations section discussing this gap and outlining planned multi-center validation. The internal ablation still demonstrates the model's learned prioritization on the available data distribution.
revision: partial
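The enrichment procedure named in the third response (a hypergeometric test followed by Benjamini-Hochberg correction) can be sketched in a few lines. This is a stdlib-only illustration of the two statistical steps, not a reproduction of the clusterProfiler workflow; the function names and the numbers in the example are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k): chance of at least k annotated genes among n selected,
    given K annotated genes in a background of N."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                  # walk from largest p to smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Small check by hand: N=10, K=3, n=4, k=2 gives (3*21 + 1*7) / 210 = 1/3.
p_small = hypergeom_upper_tail(2, 10, 3, 4)

# Hypothetical raw p-values for four candidate modules.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print(p_small, adj)
```

The result depends strongly on the background size `N` and on how many loci pass the inclusion threshold `n`, which is why the referee asked for both to be stated explicitly.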

standing simulated objections not resolved

External held-out evaluation of the quality-aware pathway on sequencing platforms, variant callers, or laboratories outside the training distribution, as this requires new datasets not available in the current study.

Circularity Check

0 steps flagged

No significant circularity in VAMP-Net: empirical ML performance and post-hoc attributions rest on external labels and databases

full rationale

The paper trains a dual-path neural architecture (set-attention transformer plus quality-aware 1D-CNN) on labeled MTB variant data to predict phenotypic drug resistance. Reported accuracies, AUCs, Integrated Gradients attributions, and downstream enrichment p-values are all computed against held-out phenotypic labels and independent functional annotation resources rather than being defined by the model equations or prior self-citations. No derivation step reduces by construction to its inputs, no fitted parameter is renamed as a prediction, and no uniqueness theorem or ansatz is smuggled via self-citation. The claims therefore remain externally falsifiable and self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard machine-learning assumptions that the training distribution matches future clinical data and that post-hoc attribution methods recover biologically meaningful signals; no new physical entities are postulated.

axioms (2)

domain assumption Genetic variants can be modeled as a permutation-invariant set whose interactions are captured by attention.
Invoked in the description of Path-1.
domain assumption Sequencing quality metrics supply an independent signal that improves resistance prediction when processed by a dedicated CNN.
Invoked in the description of Path-2 and the ablation study.

pith-pipeline@v0.9.0 · 5575 in / 1533 out tokens · 56667 ms · 2026-05-16T19:11:48.209243+00:00 · methodology

Reference graph

Works this paper leans on

5 extracted references · 5 canonical work pages

[1] Micaela E Consens, Ander Diaz-Navarro, Vivian Chu, Lincoln Stein, Housheng Hansen He, Alan Moses, and Bo Wang. Interpreting attention mechanisms in genomic transformer models: A framework for biological insights. bioRxiv, pages 2025–06.

[2] Erdal Cosgun and Min Oh. Exploring the consistency of the quality scores with machine learning for next-generation sequencing experiments. BioMed Research International, 2020(1):8531502.

[3] Yanrong Ji, Zhihan Zhou, Han Liu, and Ramana V Davuluri. DNABERT: pre-trained bidirectional encoder representations from transformers model for DNA-language in genome. Bioinformatics, 37(15):2112–2120.

[4] S Ramaswamy and James M Musser. Molecular genetic basis of antimicrobial agent resistance in Mycobacterium tuberculosis: 1998 update. Tubercle and Lung Disease, 79(1):3–29.

[5] Meng Yang, Lichao Huang, Haiping Huang, Hui Tang, Nan Zhang, Huanming Yang, Jihong Wu, and Feng Mu. Integrating convolution and self-attention improves language model of human genome for interpreting non-coding regions at base-resolution. Nucleic Acids Research, 50(14):e81–e81.
