pith. machine review for the scientific record.

arxiv: 2605.12852 · v1 · submitted 2026-05-13 · 💻 cs.LG · q-bio.QM

Multitask Multimodal Fusion with Tabular Foundation Models for Peak and Durability Prediction of Pertussis Booster Response

Pith reviewed 2026-05-14 20:33 UTC · model grok-4.3

keywords: pertussis booster · immune response prediction · multimodal fusion · multitask learning · TabPFN · contrastive loss · peak response · durability

The pith

A multitask fusion model using tabular foundation models jointly predicts peak magnitude and long-term durability of pertussis booster immune responses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a multi-task contrastive multimodal fusion architecture that jointly predicts the peak magnitude and long-term durability of immune responses to pertussis booster vaccination from heterogeneous modalities. The two outcomes are biologically dissociated rather than redundant: peak reflects acute B-cell activation, while durability reflects the establishment of long-term humoral memory. Most models nonetheless target only one endpoint. The architecture freezes one TabPFN-v2 encoder per modality, applies a dual-label supervised contrastive loss that pairs subjects agreeing on either task label, and fuses modalities with missingness-masked attention. On a curated subset of the CMI-PB dataset (158 subjects) it reaches test AUROCs of 0.797 for peak and 0.755 for durability. Compared with logistic regression, XGBoost, and MLP baselines, it is the only model whose 95% confidence intervals lie entirely above chance on both tasks simultaneously, and per-modality contribution analyses recover task-specific patterns consistent with the underlying immunology.
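The fusion step can be pictured with a minimal sketch. It assumes a single dot-product attention score per modality against a query vector; the function name `masked_attention_fusion`, the `query` argument, and the scaling are illustrative assumptions, not the paper's implementation. The point it shows is that a missing modality is masked to negative infinity before the softmax, so it receives exactly zero weight.

```python
import math

def masked_attention_fusion(embeddings, present, query):
    """Fuse per-modality vectors; missing modalities get exactly zero weight.

    embeddings: one vector per modality (None is allowed where missing)
    present:    one bool per modality
    query:      hypothetical attention query (learned in a real model)
    """
    if not any(present):
        raise ValueError("at least one modality must be present")
    scale = math.sqrt(len(query))
    # Missing modalities are scored -inf so the softmax assigns them weight 0.
    scores = [
        sum(q * e for q, e in zip(query, emb)) / scale if ok else -math.inf
        for emb, ok in zip(embeddings, present)
    ]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    weights = [x / total for x in exps]
    # Weighted sum over the modalities that are actually present.
    fused = [
        sum(w * emb[d] for w, emb, ok in zip(weights, embeddings, present) if ok)
        for d in range(len(query))
    ]
    return fused, weights
```

Because the mask is applied to the scores rather than to the inputs, placeholder values for a missing modality never leak into the fused vector.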

Core claim

The proposed multi-task contrastive multimodal fusion architecture, combining frozen TabPFN-v2 per-modality encoders, a dual-label supervised contrastive loss, modality dropout calibrated to empirical missingness, and missingness-masked attention fusion, achieves test AUROC 0.797 (95% CI [0.621, 0.948]) for peak response and 0.755 (95% CI [0.519, 0.945]) for durability on the CMI-PB pertussis booster dataset (n=158; Spearman r=-0.58 between endpoints, n=96). Both metrics are statistically significant under joint label permutation testing (N=1000; p=0.002 and p=0.045), and the model is the only one among raw-feature and TabPFN-embedding baselines whose 95% CIs lie above chance on both tasks at once.
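For readers who want to see how an interval of this kind is typically obtained, here is a minimal sketch of a percentile bootstrap for AUROC. The resampling scheme, the number of replicates, and the names `auroc` and `bootstrap_auroc_ci` are assumptions for illustration; the source does not specify the paper's exact bootstrap procedure.

```python
import random

def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both classes")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auroc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample test subjects with replacement."""
    if len(set(labels)) < 2:
        raise ValueError("need both classes")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUROC is undefined for a single-class resample
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

With a small test set, intervals from this procedure are wide, which is consistent with the ranges quoted above.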

What carries the argument

Dual-label supervised contrastive loss that treats two subjects as a positive pair if they agree on the peak label or the durability label, fused via missingness-masked attention over frozen TabPFN-v2 per-modality encoders.
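The pairing rule can be made concrete with a short sketch. It assumes the standard supervised contrastive form (Khosla et al., reference 22) with the positive set replaced by the "agree on either label" rule described above; the temperature value and the function names are illustrative assumptions, not details taken from the paper.

```python
import math

def dual_label_positive_mask(peak, durability):
    """mask[i][j] is True when i != j and subjects i, j agree on either label."""
    n = len(peak)
    return [
        [i != j and (peak[i] == peak[j] or durability[i] == durability[j])
         for j in range(n)]
        for i in range(n)
    ]

def supcon_loss(embeddings, mask, temperature=0.1):
    """Supervised contrastive loss over L2-normalised embeddings."""
    def normalise(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    z = [normalise(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if mask[i][j]]
        if not positives:
            continue  # anchors without positives contribute nothing
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        m = max(sims.values())  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

Under the "or" rule, two subjects with the same peak label but opposite durability labels are still pulled together, which is the behaviour the referee's first major comment scrutinises.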

If this is right

  • Peak prediction is carried primarily by cytokine signatures while durability prediction draws from baseline antibody features, recovering task-specific biological signals.
  • The architecture handles structured missingness without collapse and remains the sole model with confidence intervals above chance on both tasks.
  • Joint modeling captures the full boost-and-wane trajectory instead of isolated endpoints.
  • Statistical significance on both tasks holds under joint label permutation testing.
  • Per-modality contribution analyses align with known immunology of acute activation versus long-term memory.
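A joint label permutation test of the kind mentioned in the list above can be sketched as follows. This version holds the model's test scores fixed and applies one shared shuffle to both label vectors, so each subject's (peak, durability) pair stays intact; whether the paper instead retrains under each permutation is not stated in the source, so treat the scheme and the name `joint_permutation_pvalues` as assumptions.

```python
import random

def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    return sum((p > q) + 0.5 * (p == q) for p in pos for q in neg) / (len(pos) * len(neg))

def joint_permutation_pvalues(peak, dur, peak_scores, dur_scores, n_perm=1000, seed=0):
    """One-sided permutation p-values for both tasks under a shared shuffle."""
    rng = random.Random(seed)
    observed = (auroc(peak, peak_scores), auroc(dur, dur_scores))
    exceed = [0, 0]
    order = list(range(len(peak)))
    for _ in range(n_perm):
        rng.shuffle(order)
        # A single permutation preserves the correlation between the endpoints.
        perm_peak = [peak[i] for i in order]
        perm_dur = [dur[i] for i in order]
        exceed[0] += auroc(perm_peak, peak_scores) >= observed[0]
        exceed[1] += auroc(perm_dur, dur_scores) >= observed[1]
    # The +1 correction keeps the estimate strictly positive.
    return tuple((1 + e) / (1 + n_perm) for e in exceed)
```

With N=1000 permutations the smallest attainable p-value under this estimator is 1/1001, so values such as p=0.002 sit close to the resolution limit of the test.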

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same fusion approach could be tested on booster responses for other vaccines that exhibit dissociated peak and durability phases.
  • Predicted durability values might support individualized booster timing schedules if validated on larger cohorts.
  • Adding genomic or transcriptomic modalities could further separate the biological compartments driving each task.
  • The interpretable modality contributions suggest candidate biomarkers for early assessment of vaccine response quality.

Load-bearing premise

The small sample of 158 subjects, 44.9 percent of whom are missing at least one modality, allows the dual-label contrastive loss to reliably capture the biological dissociation between peak and durability without overfitting or spurious correlations.

What would settle it

An independent replication cohort where the model's 95 percent confidence intervals for both AUROCs include 0.5 or where simpler baselines achieve comparable or better intervals on both tasks simultaneously would falsify the claim of effective joint prediction.

Figures

Figures reproduced from arXiv: 2605.12852 by Divya Sitani.

Figure 1: Cohort × modality missingness pattern in CMI-PB. (A) Subject-level modality availability for the Task 1 (peak response) cohort, organised by annual cohort (2020-2023). Each row is one subject; each column is one of the four data modalities. Filled cells indicate the modality is present; white cells indicate it is missing. Within each cohort, subjects are sorted by missingness pattern (subjects with more mo…
Figure 2: Multi-task contrastive multimodal fusion architecture.
Figure 3: Peak and durability are anti-correlated phases of the humoral response.
Figure 4: ROC curves on the held-out test set with 95% bootstrap confidence intervals, and …
Figure 5: Bootstrap AUROC distributions on the held-out test set, both tasks overlaid.
Figure 6: Label-permutation null distributions confirm learned signal for both tasks.
Figure 7: Per-modality contribution to peak and durability prediction. Top: …
Figure 8: Graceful degradation under inference-time modality missingness.

Original abstract

Pertussis booster vaccination produces immune responses that vary widely across individuals in both peak magnitude and long-term durability. These two phases are governed by partly distinct biological compartments: peak reflects acute B-cell activation and antibody secretion, while durability reflects the establishment of long-term humoral memory. Yet most computational models target only one, missing the full boost-and-wane trajectory. Jointly predicting both is non-trivial because the two endpoints are biologically dissociated rather than redundant; samples are small, modalities are heterogeneous with structured missingness, and the two tasks rely on different measurement windows. We propose a multi-task contrastive multimodal fusion architecture combining frozen TabPFN-v2 per-modality encoders, a dual-label supervised contrastive loss that treats two subjects as a positive pair if they agree on the Task 1 label or the Task 2 label, modality dropout calibrated to empirical missingness, and missingness-masked attention fusion. Applied to a curated subset of the CMI-PB pertussis booster dataset (n = 158 subjects, four modalities, 44.9% with at least one modality missing; Spearman r = -0.58 between peak and durability, n = 96), the model achieves test AUROC 0.797 (95% CI [0.621, 0.948]) for peak response and 0.755 (95% CI [0.519, 0.945]) for durability, with both significant under joint label permutation (N = 1000; p = 0.002 and p = 0.045). Across logistic regression, XGBoost, and MLP baselines on raw features and on TabPFN embeddings, the proposed model is the only one whose 95% CIs lie above chance on both tasks simultaneously. Per-modality contribution analyses recover task-specific modality contributions consistent with the underlying immunology: peak prediction is carried by cytokine signatures, while durability is carried by baseline antibody features.
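One plausible reading of "modality dropout calibrated to empirical missingness" is sketched below: each modality is hidden during training with a probability equal to the fraction of subjects that lack it. This interpretation, the fallback that always keeps one available modality, and both function names are assumptions; the abstract does not give the calibration rule.

```python
import random

def empirical_missing_rates(present_matrix):
    """Fraction of subjects missing each modality (columns of a boolean matrix)."""
    n = len(present_matrix)
    n_mod = len(present_matrix[0])
    return [sum(not row[m] for row in present_matrix) / n for m in range(n_mod)]

def apply_modality_dropout(present_row, rates, rng):
    """Randomly hide available modalities at their empirical missingness rate."""
    kept = [ok and rng.random() >= rate for ok, rate in zip(present_row, rates)]
    if not any(kept) and any(present_row):
        # Keep one available modality so the fusion step always has an input.
        available = [m for m, ok in enumerate(present_row) if ok]
        kept[rng.choice(available)] = True
    return kept
```

Dropout of this kind only ever removes modalities that are present, so the training-time masks resemble the missingness patterns the model will meet at inference.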

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a multitask multimodal fusion architecture that combines frozen TabPFN-v2 per-modality encoders, a dual-label supervised contrastive loss (positive pairs if subjects agree on peak OR durability label), modality dropout, and missingness-masked attention to jointly predict peak magnitude and durability of pertussis booster responses. On a curated CMI-PB subset (n=158 subjects, four modalities, 44.9% with at least one missing modality, Spearman r=-0.58 between endpoints), it reports test AUROCs of 0.797 (95% CI [0.621, 0.948]) for peak and 0.755 (95% CI [0.519, 0.945]) for durability, both significant under joint label permutation (N=1000; p=0.002 and p=0.045), outperforming logistic regression, XGBoost, and MLP baselines on raw features and embeddings, with per-modality contributions aligning with immunology.

Significance. If the performance claims hold under more rigorous validation, the work would demonstrate a practical way to leverage tabular foundation models and contrastive learning for jointly modeling biologically dissociated endpoints in small-sample, high-missingness multimodal settings, with direct relevance to personalized vaccination and immune-response modeling.

major comments (2)
  1. [Methods] Methods (dual-label contrastive loss definition): the loss treats pairs as positive if subjects agree on either task label, yet the endpoints show negative correlation (r=-0.58 on n=96); with only 158 subjects this formulation risks capturing dataset-specific noise or missingness patterns rather than dissociated biology, and no ablation isolating the loss from the TabPFN encoders is reported.
  2. [Results] Results (evaluation protocol): the lower bound of the reported durability CI [0.519, 0.945] sits only marginally above chance, and the evaluation relies on a single held-out split plus a permutation test (N=1000), with no nested cross-validation or multiple random splits; given n=158 and 44.9% of subjects missing at least one modality, this leaves open the possibility that outperformance on both tasks simultaneously is split-specific rather than robust.
minor comments (1)
  1. [Abstract] Abstract: the number of modalities and the precise baseline configurations (raw vs. embedded) could be stated more explicitly to aid quick assessment of the comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below with clarifications on our design choices and indicate revisions to strengthen the validation and analysis.

Point-by-point responses
  1. Referee: [Methods] Methods (dual-label contrastive loss definition): the loss treats pairs as positive if subjects agree on either task label, yet the endpoints show negative correlation (r=-0.58 on n=96); with only 158 subjects this formulation risks capturing dataset-specific noise or missingness patterns rather than dissociated biology, and no ablation isolating the loss from the TabPFN encoders is reported.

    Authors: The dual-label supervised contrastive loss was chosen precisely because the endpoints are biologically dissociated (Spearman r = -0.58), enabling the model to learn representations that capture similarity in at least one task without forcing redundancy. This aligns with the multitask goal of jointly modeling peak and durability. We agree that an ablation isolating the loss contribution from the TabPFN encoders is needed to rule out noise capture. In the revised manuscript we will add such an ablation, comparing the full dual-label model against single-label contrastive variants and a non-contrastive fusion baseline with frozen encoders. revision: yes

  2. Referee: [Results] Results (evaluation protocol): the lower bound of the reported durability CI [0.519, 0.945] sits only marginally above chance, and the evaluation relies on a single held-out split plus a permutation test (N=1000), with no nested cross-validation or multiple random splits; given n=158 and 44.9% of subjects missing at least one modality, this leaves open the possibility that outperformance on both tasks simultaneously is split-specific rather than robust.

    Authors: We acknowledge that the durability AUROC CI is wide and that its lower bound sits only marginally above chance, which reflects uncertainty from the modest sample size and missingness. The single held-out split was used to preserve test integrity under structured missingness. To demonstrate robustness beyond a single split, we will add results from five independent random train-test splits, reporting mean AUROC and standard deviation for both tasks. Full nested cross-validation remains computationally prohibitive with the TabPFN encoders, but the multi-split evaluation will provide clearer evidence that simultaneous outperformance is not split-specific. revision: partial
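The multi-split evaluation promised in the second response could look like the sketch below. It assumes stratification on a single binary label, a 20% test fraction, and a caller-supplied `evaluate(train_idx, test_idx)` callable that trains a model and returns one metric; all three are hypothetical choices, since the rebuttal only specifies five independent random splits with mean and standard deviation.

```python
import random
import statistics

def stratified_split(labels, test_fraction, rng):
    """Return (train_idx, test_idx) with class proportions roughly preserved."""
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, round(test_fraction * len(idx)))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

def multi_split_summary(labels, evaluate, n_splits=5, test_fraction=0.2, seed=0):
    """Mean and sample SD of a metric over independent stratified splits."""
    values = []
    for k in range(n_splits):
        rng = random.Random(seed + k)  # a distinct seed per split
        train_idx, test_idx = stratified_split(labels, test_fraction, rng)
        values.append(evaluate(train_idx, test_idx))
    return statistics.mean(values), statistics.stdev(values)
```

For the two-task setting, stratifying on the joint (peak, durability) label pair would be a natural variant, though small cell counts may make that impractical at this sample size.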

Circularity Check

0 steps flagged

No circularity: performance claims rest on independent held-out evaluation with external encoders

full rationale

The paper reports test AUROCs (0.797 and 0.755) obtained on a single held-out test split of the n=158 dataset, using frozen external TabPFN-v2 encoders and a dual-label contrastive loss whose parameters are optimized on training data only. No equation or claim reduces a reported metric to a quantity defined by the same fitted parameters (no self-definitional loops, no fitted-input-called-prediction, no load-bearing self-citation). The architecture description and per-modality analyses are post-hoc interpretations of the trained model, not derivations that presuppose the final AUROCs. This is a standard empirical ML evaluation pipeline with no reduction of outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper introduces no new free parameters or invented entities; it relies on standard assumptions in multimodal ML and the pre-trained foundation model.

axioms (2)
  • [domain assumption] TabPFN-v2 provides effective frozen encoders for the tabular modalities in this immunology domain.
    Assumed without re-training or fine-tuning details provided in the abstract.
  • [ad hoc to paper] The dual-label supervised contrastive loss appropriately captures agreement on either task label for dissociated biological endpoints.
    Defined specifically for this multitask setup to handle non-redundant tasks.
