pith. machine review for the scientific record.

arxiv: 2604.12026 · v1 · submitted 2026-04-13 · 💻 cs.LG · q-bio.BM · q-bio.QM

Recognition: no theorem link

TriFit: Trimodal Fusion with Protein Dynamics for Mutation Fitness Prediction

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 15:29 UTC · model grok-4.3

classification 💻 cs.LG · q-bio.BM · q-bio.QM
keywords protein mutation prediction · multimodal fusion · protein dynamics · fitness prediction · single amino acid substitution · mixture of experts · variant effect prediction

The pith

A trimodal model that adds protein dynamics to sequence and structure data improves predictions of how mutations affect protein fitness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to demonstrate that protein dynamics provide essential information missing from sequence and static structure alone for forecasting the functional impact of amino acid substitutions. This is important because accurate mutation effect prediction aids in understanding genetic diseases and in engineering proteins with desired properties. The proposed framework uses an adaptive fusion mechanism to integrate the three modalities without assuming fixed importance for any one. Results on a large collection of mutation experiments show gains over methods using fewer data types, with dynamics contributing the most additional value.

Core claim

The authors establish that dynamics embeddings, capturing residue flexibility, mode shapes, and cross-correlations, when fused with sequence and structure embeddings using an adaptive four-expert mixture-of-experts module and trimodal cross-modal contrastive learning, enable more accurate assessment of mutational tolerance than prior approaches limited to sequence or structure.
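The abstract names the contrastive objective but gives no equations for it. A minimal sketch of what "trimodal cross-modal contrastive learning" plausibly means, a symmetric InfoNCE loss averaged over the three modality pairs, is below; the temperature of 0.07, the symmetric form, and all array shapes are assumptions, not details taken from the paper.

```python
import numpy as np

def info_nce(a, b, tau=0.07):
    """Symmetric InfoNCE between two batches of embeddings (n, d).

    Matched rows are positive pairs; every other row in the batch
    serves as an in-batch negative.
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / tau
    # log-softmax along rows; the diagonal holds the positive pairs
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss_ab = -np.mean(np.diag(log_p))
    log_p_t = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    loss_ba = -np.mean(np.diag(log_p_t))
    return (loss_ab + loss_ba) / 2

def trimodal_contrastive(seq, struct, dyn, tau=0.07):
    """Align all three modality pairs, as the abstract describes."""
    return (info_nce(seq, struct, tau) + info_nce(seq, dyn, tau)
            + info_nce(struct, dyn, tau)) / 3

rng = np.random.default_rng(1)
z = rng.normal(size=(16, 8))
# identical views give a near-minimal loss; unrelated views a larger one
aligned = trimodal_contrastive(z, z, z)
shuffled = trimodal_contrastive(z, rng.normal(size=(16, 8)),
                                rng.normal(size=(16, 8)))
```

The design choice being illustrated: aligning all three pairwise combinations, rather than anchoring on one modality, is what lets the router later treat any pair as a usable expert input.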

What carries the argument

An adaptive mixture-of-experts fusion module that routes and weights combinations of sequence, structure, and dynamics embeddings based on the specific protein input.
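The paper's figures name four experts (E1: Seq+Struct, E2: Seq+Dyn, E3: Struct+Dyn, E4: Trimodal) combined by soft gating. A toy sketch of that routing-and-weighting step follows; the softmax router, the linear experts, and the dimensions (d = 8 standing in for the 512-dim shared space) are all hypothetical stand-ins for the unpublished architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_fuse(seq, struct, dyn, w_router, experts):
    """Soft-gated four-expert fusion over modality combinations.

    seq, struct, dyn: (d,) embeddings already projected to a shared space.
    w_router: (4, 3*d) hypothetical router weights (one row per expert).
    experts: four linear maps, one per modality combination E1..E4.
    """
    # the router sees all three modalities and emits one weight per expert
    gates = softmax(w_router @ np.concatenate([seq, struct, dyn]))
    combos = [
        np.concatenate([seq, struct]),       # E1: Seq+Struct
        np.concatenate([seq, dyn]),          # E2: Seq+Dyn
        np.concatenate([struct, dyn]),       # E3: Struct+Dyn
        np.concatenate([seq, struct, dyn]),  # E4: Trimodal
    ]
    fused = sum(g * (E @ c) for g, E, c in zip(gates, experts, combos))
    return fused, gates

d = 8  # toy stand-in for the 512-dim shared space
experts = [rng.normal(size=(d, 2 * d)) for _ in range(3)] \
        + [rng.normal(size=(d, 3 * d))]
w_router = rng.normal(size=(4, 3 * d))
fused, gates = moe_fuse(rng.normal(size=d), rng.normal(size=d),
                        rng.normal(size=d), w_router, experts)
```

Because the gates are input-conditioned, two proteins can receive different expert mixtures, which is exactly the "protein-specific fusion without fixed modality assumptions" the abstract claims.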

If this is right

  • The dynamics modality yields the largest performance gain when incorporated alongside the other two.
  • Outputs from the model are well-calibrated in their probability estimates.
  • Adaptive weighting permits the fusion strategy to vary across different proteins rather than using a uniform rule.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This could extend to other prediction tasks in structural biology where motion data might resolve ambiguities in static models.
  • Testing the fusion weights against known functional sites in proteins could reveal if the model learns biologically meaningful patterns.
  • Applying similar trimodal integration to variants in membrane proteins or those under cellular conditions might further validate the approach.

Load-bearing premise

The information from protein flexibility and correlated motions is not already fully encoded in sequence patterns or static three-dimensional structures.

What would settle it

Observing no increase in accuracy on the mutation assay collection when dynamics features are excluded from the model.
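That falsification test reduces to comparing held-out AUROC between the full model and a dynamics-ablated variant. For reference, AUROC is the Mann-Whitney rank statistic; a minimal stdlib implementation with tie handling:

```python
def auroc(scores, labels):
    """Rank-based AUROC: U / (n_pos * n_neg), with ties averaged."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # find the run of tied scores and assign them the average rank
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    pos = [r for r, y in zip(ranks, labels) if y == 1]
    n_pos, n_neg = len(pos), len(labels) - len(pos)
    return (sum(pos) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

A perfect ranker scores 1.0 and a fully reversed one 0.0, so "no increase when dynamics features are excluded" means the ablated model's AUROC matches the full model's 0.897 within seed noise.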

Figures

Figures reproduced from arXiv: 2604.12026 by Seungik Cho.

Figure 1
Figure 1: TriFit architecture. Sequence, structure, and dynamics encoders (frozen) extract modality-specific embeddings. A learned projection maps each to a shared 512-dim space. The four-expert MoE router adaptively combines modality pairs (E1: Seq+Struct, E2: Seq+Dyn, E3: Struct+Dyn, E4: Trimodal) via soft gating. Cross-modal contrastive loss aligns all three modality pairs during training. The weighted-fused repr…
Figure 2
Figure 2: Representation analysis. Left: UMAP of projected modality embeddings (sequence: orange, structure: blue, dynamics: green). Right: LDA projection of MoE fused embeddings showing distributional shift between damaging (red) and functional (blue) variants. Expert Utilization & Calibration. MoE router analysis across 217 proteins reveals two dominant clusters: one preferring the Trimodal expert (E4) and one p…
Figure 3
Figure 3: MoE expert utilization across 217 test proteins (hierarchical clustering by router weight pattern). Color intensity indicates mean router weight assigned to each expert. Two major clusters emerge: proteins preferring the Trimodal expert (E4) and proteins preferring the Struct+Dyn expert (E3). The Seq+Dyn expert (E2) is consistently underweighted, while Seq+Struct (E1) shows intermediate utilization. D. Cal…
Figure 4
Figure 4: Prediction calibration analysis on the ProteinGym test set (139,480 variants). Left: Reliability diagram showing close alignment between predicted probabilities and empirical positive rates (ECE = 0.044), achieved without post-hoc calibration. Center: Confidence distribution max(p, 1−p) by true label, showing similar confidence profiles for both classes. Right: Confidence vs. accuracy across 10 equal-width…
Figure 5
Figure 5: Per-position prediction accuracy along the protein sequence for three representative proteins. Each dot corresponds to one residue position; dot size is proportional to variant count at that position; color indicates local functional rate (blue = functional, red = damaging). The black curve shows a sliding window average (window = L/20 residues). Overall per-protein accuracy is reported in the subtitle of …
original abstract

Predicting the functional impact of single amino acid substitutions (SAVs) is central to understanding genetic disease and engineering therapeutic proteins. While protein language models and structure-based methods have achieved strong performance on this task, they systematically neglect protein dynamics; residue flexibility, correlated motions, and allosteric coupling are well-established determinants of mutational tolerance in structural biology, yet have not been incorporated into supervised variant effect predictors. We present TriFit, a multimodal framework that integrates sequence, structure, and protein dynamics through a four-expert Mixture-of-Experts (MoE) fusion module with trimodal cross-modal contrastive learning. Sequence embeddings are extracted via masked marginal scoring with ESM-2 (650M); structural embeddings from AlphaFold2-predicted C-alpha geometries; and dynamics embeddings from Gaussian Network Model (GNM) B-factors, mode shapes, and residue-residue cross-correlations. The MoE router adaptively weights modality combinations conditioned on the input, enabling protein-specific fusion without fixed modality assumptions. On the ProteinGym substitution benchmark (217 DMS assays, 696k SAVs), TriFit achieves AUROC 0.897 +/- 0.0002, outperforming all supervised baselines including Kermut (0.864) and ProteinNPT (0.844), and the best zero-shot model ESM3 (0.769). Ablation studies confirm that dynamics provides the largest marginal contribution over pairwise modality combinations, and TriFit achieves well-calibrated probabilistic outputs (ECE = 0.044) without post-hoc correction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript presents TriFit, a multimodal framework for predicting fitness effects of single amino acid variants (SAVs). It extracts sequence embeddings via masked marginal scoring with ESM-2, structural embeddings from AlphaFold2 Cα geometries, and dynamics embeddings from Gaussian Network Model (GNM) B-factors, mode shapes, and cross-correlations. These are fused via a four-expert Mixture-of-Experts (MoE) module with trimodal cross-modal contrastive learning. On the ProteinGym substitution benchmark (217 DMS assays, 696k SAVs), TriFit reports AUROC 0.897 +/- 0.0002, outperforming supervised baselines (Kermut 0.864, ProteinNPT 0.844) and zero-shot ESM3 (0.769). Ablations claim the dynamics modality provides the largest marginal gain, and the model produces well-calibrated outputs (ECE=0.044).

Significance. If the results and ablations hold after clarification, the work would be significant for variant effect prediction by integrating protein dynamics—an established determinant of mutational tolerance from structural biology that prior supervised models have neglected. The adaptive MoE fusion and contrastive objective provide a principled way to combine modalities without fixed weighting assumptions. Evaluation on the large public ProteinGym benchmark and explicit reporting of calibration error are positive aspects that support reproducibility and practical utility.

major comments (3)
  1. [Abstract; ablation studies, Results] The claim that 'dynamics provides the largest marginal contribution over pairwise modality combinations' is load-bearing for the central novelty argument, yet the controls are unspecified. It is unclear whether dynamics features are appended to a frozen sequence+structure backbone, whether all three modalities are jointly retrained in every ablation arm, or whether the MoE router always receives the same input dimensionality. Without these details the 0.033 AUROC lift over Kermut cannot be confidently attributed to the GNM modality rather than to the fusion architecture or training protocol.
  2. [Experimental setup, Section 3] No information is given on the train-test split protocol across the 217 DMS assays (per-assay vs. global splits, sequence-identity cutoffs, or temporal splits). Because the MoE router and trimodal contrastive loss are fitted supervised on the same ProteinGym distribution used for final evaluation, the absence of explicit leakage controls undermines interpretation of the headline AUROC of 0.897.
  3. [Results] The reported AUROC variance of +/- 0.0002 is unusually tight. It is not stated whether this reflects multiple random seeds, different data folds, or a single run. This detail is required to assess whether the outperformance over ProteinNPT (0.844) and Kermut (0.864) is statistically reliable.
minor comments (1)
  1. [Abstract] The abstract states 'well-calibrated probabilistic outputs (ECE = 0.044)' but does not define the expected calibration error formula or binning strategy used; this should be added for clarity.
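For concreteness, the most common ECE definition uses equal-width bins over the predicted positive-class probability, weighting each bin's |accuracy − confidence| gap by its occupancy. Whether TriFit uses exactly this binning is what the minor comment asks the authors to state; Figure 4 suggests 10 equal-width bins. A minimal sketch under that assumption:

```python
def ece(probs, labels, n_bins=10):
    """Expected calibration error with equal-width bins over p(positive).

    ECE = sum_b (|B_b| / N) * | mean(y in B_b) - mean(p in B_b) |
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p = 1.0 falls in the last bin
        bins[b].append((p, y))
    total = 0.0
    for b in bins:
        if not b:
            continue
        conf = sum(p for p, _ in b) / len(b)   # mean predicted probability
        acc = sum(y for _, y in b) / len(b)    # empirical positive rate
        total += len(b) / len(probs) * abs(acc - conf)
    return total
```

A perfectly calibrated bin contributes zero: if every example in a bin has p = 0.75 and three of four are positive, that bin's gap is |0.75 − 0.75| = 0.
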

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed review. We address each major comment below and will revise the manuscript to improve clarity, reproducibility, and interpretation of the results.

point-by-point responses
  1. Referee: [Abstract; ablation studies, Results] The claim that 'dynamics provides the largest marginal contribution over pairwise modality combinations' is load-bearing for the central novelty argument, yet the controls are unspecified. It is unclear whether dynamics features are appended to a frozen sequence+structure backbone, whether all three modalities are jointly retrained in every ablation arm, or whether the MoE router always receives the same input dimensionality. Without these details the 0.033 AUROC lift over Kermut cannot be confidently attributed to the GNM modality rather than to the fusion architecture or training protocol.

    Authors: We thank the referee for this observation. In the ablation experiments, all three modalities were jointly retrained together with the MoE router in every arm; dynamics features were not appended to any frozen backbone. The router always received embeddings of identical dimensionality across configurations. We will revise the Results section to explicitly document these controls so that the marginal contribution of the dynamics modality can be properly evaluated. revision: yes

  2. Referee: [Experimental setup, Section 3] No information is given on the train-test split protocol across the 217 DMS assays (per-assay vs. global splits, sequence-identity cutoffs, or temporal splits). Because the MoE router and trimodal contrastive loss are fitted supervised on the same ProteinGym distribution used for final evaluation, the absence of explicit leakage controls undermines interpretation of the headline AUROC of 0.897.

    Authors: We agree that the splitting protocol requires explicit description. The experiments used global splits across all 217 assays together with sequence-identity cutoffs between training and test proteins to prevent leakage; no temporal splits were applied. We will add a dedicated paragraph in Section 3 that fully specifies the splitting procedure and the leakage-mitigation steps taken during supervised training of the MoE and contrastive components. revision: yes

  3. Referee: [Results] The reported AUROC variance of +/- 0.0002 is unusually tight. It is not stated whether this reflects multiple random seeds, different data folds, or a single run. This detail is required to assess whether the outperformance over ProteinNPT (0.844) and Kermut (0.864) is statistically reliable.

    Authors: The reported variance of +/- 0.0002 is the standard deviation obtained across multiple independent training runs that differed only in random seed. We will revise the Results section to state the exact number of runs performed and, space permitting, include a brief statistical comparison confirming that the observed gains remain significant. revision: yes

Circularity Check

0 steps flagged

No significant circularity in TriFit derivation chain

full rationale

The paper presents an empirical multimodal ML model that extracts fixed embeddings from ESM-2, AlphaFold2, and GNM, then trains a supervised MoE fusion module plus contrastive loss on the ProteinGym benchmark. Reported AUROC and ablation results are standard held-out evaluation metrics after training; they do not reduce by construction to the input features or to any self-citation. No load-bearing uniqueness theorems, self-definitional equations, or fitted parameters renamed as predictions appear in the abstract or described pipeline. The central claim (dynamics adds orthogonal signal) is an empirical hypothesis tested via ablation, not a tautology.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central performance claim rests on the assumption that GNM-derived dynamics features are both accurate and complementary to sequence and structure; the model itself contains many learned parameters but no additional ad-hoc constants beyond standard training.

free parameters (1)
  • MoE router and expert weights
    Learned during supervised training on ProteinGym data.
axioms (1)
  • domain assumption Gaussian Network Model B-factors and cross-correlations capture mutational tolerance signals
    Invoked when extracting dynamics embeddings from AlphaFold structures.
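As context for this axiom, the standard GNM construction (Bahar et al., 1997) derives both fluctuations (proportional to B-factors) and residue cross-correlations from nothing more than a Cα contact map. A minimal sketch, assuming the conventional 7 Å cutoff; the paper's actual cutoff and mode-shape extraction are not stated in the abstract:

```python
import numpy as np

def gnm_features(coords, cutoff=7.0):
    """Gaussian Network Model from C-alpha coordinates.

    Builds the Kirchhoff (connectivity) matrix from a distance cutoff,
    then reads squared fluctuations (~B-factors) and normalized
    residue-residue cross-correlations off its pseudoinverse
    (the pseudoinverse implicitly drops the rigid-body zero mode).
    """
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    gamma = (d < cutoff).astype(float)   # contact map
    np.fill_diagonal(gamma, 0.0)
    kirchhoff = np.diag(gamma.sum(1)) - gamma
    kinv = np.linalg.pinv(kirchhoff)
    fluct = np.diag(kinv)                          # proportional to B-factors
    cross = kinv / np.sqrt(np.outer(fluct, fluct)) # normalized correlations
    return fluct, cross

# toy straight chain of CA atoms spaced 3.8 A apart
coords = np.array([[3.8 * i, 0.0, 0.0] for i in range(10)])
fluct, cross = gnm_features(coords)
```

Even on this toy chain the model behaves as expected: chain termini fluctuate more than the interior, which is the kind of flexibility signal the axiom assumes is relevant to mutational tolerance.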

pith-pipeline@v0.9.0 · 5579 in / 1179 out tokens · 53205 ms · 2026-05-10T15:29:13.883132+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

19 extracted references

  1. Bahar, I., Atilgan, A. R., and Erman, B. Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Folding and Design, 2:173–181, 1997.
  2. Bakan, A., Meireles, L. M., and Bahar, I. ProDy: Protein dynamics inferred from theory and experiments. Bioinformatics, 27:1575–1577, 2011.
  3. Dauparas, J., Anishchenko, I., Bennett, N., et al. Robust deep learning–based protein sequence design using ProteinMPNN. Science, 378:49–56, 2022.
  4. Fowler, D. M. and Fields, S. Deep mutational scanning: a new style of protein science. Nature Methods, 11:801–807, 2014.
  5. Haliloglu, T., Bahar, I., and Erman, B. Gaussian dynamics of folded proteins. Physical Review Letters, 79:3090, 1997.
  6. Hayes, T., Rao, R., Akin, H., et al. Simulating 500 million years of evolution with a language model. Science, 2024.
  7. Hsu, C., Verkuil, R., Liu, J., et al. Learning inverse folding from millions of predicted structures. In ICML, 2022.
  8. Jing, B., et al. Learning from protein structure with geometric vector perceptrons. In ICLR, 2021.
  9. Jumper, J., Evans, R., Pritzel, A., et al. Highly accurate protein structure prediction with AlphaFold. Nature, 596:583–589, 2021.
  10. Kermut, V. et al. Modelling mutational effects on biochemical phenotypes using Gaussian processes: Application to clinical variant interpretation. bioRxiv, 2024.
  11. Laine, E., Karami, Y., and Carbone, A. Gremlin and GEMME: Fast and accurate protein fitness landscape prediction. PLOS Computational Biology, 2019.
  12. Lin, Z., Akin, H., Rao, R., et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 379:1123–1130, 2023.
  13. Loshchilov, I. and Hutter, F. SGDR: Stochastic gradient descent with warm restarts. In ICLR, 2017.
  14. Loshchilov, I. and Hutter, F. Decoupled weight decay regularization. In ICLR, 2019.
  15. Marquet, C., Heinzinger, M., Olenyi, T., et al. VESPA: Variant effect score prediction without alignments. PLOS Computational Biology, 2022.
  16. Meier, J., Rao, R., Verkuil, R., et al. Language models enable zero-shot prediction of the effects of mutations on protein function. In NeurIPS, 2021.
  17. Notin, P., Van Niekerk, L., Kollasch, A., et al. TranceptEVE: Combining family-specific and family-agnostic models of protein sequences for improved fitness prediction. In NeurIPS Workshop on Learning Meaningful Representations of Life, 2022.
  18. Rao, R., Liu, J., Verkuil, R., et al. MSA Transformer. 2021.
  19. Su, J., Han, C., Zhou, Y., et al. SaProt: Protein language modeling with structure-aware vocabulary. In ICLR, 2024.
      van den Oord, A., Li, Y., and Vinyals, O. Representation learning with contrastive predictive coding. In NeurIPS, 2018.