TriFit: Trimodal Fusion with Protein Dynamics for Mutation Fitness Prediction
Pith reviewed 2026-05-10 15:29 UTC · model grok-4.3
The pith
A trimodal model that adds protein dynamics to sequence and structure data improves predictions of how mutations affect protein fitness.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that fusing dynamics embeddings (residue flexibility, mode shapes, and cross-correlations) with sequence and structure embeddings, via an adaptive four-expert mixture-of-experts module and trimodal cross-modal contrastive learning, yields more accurate assessment of mutational tolerance than prior approaches limited to sequence or structure.
What carries the argument
An adaptive mixture-of-experts fusion module that routes and weights combinations of sequence, structure, and dynamics embeddings based on the specific protein input.
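The fusion mechanism described above can be sketched in a few lines. Everything here is an illustrative assumption rather than the authors' architecture: linear experts over the concatenated [sequence | structure | dynamics] embedding, a softmax router producing protein-specific expert weights, and made-up dimensions.

```python
import numpy as np

class MoEFusion:
    """Minimal sketch of adaptive mixture-of-experts fusion over three
    modality embeddings. The linear experts, softmax router, and all
    dimensions are illustrative assumptions, not the authors' design."""

    def __init__(self, d_in, d_out, n_experts=4, seed=0):
        rng = np.random.default_rng(seed)
        self.experts = [rng.standard_normal((d_in, d_out)) * 0.1
                        for _ in range(n_experts)]
        self.router = rng.standard_normal((d_in, n_experts)) * 0.1

    def __call__(self, seq_emb, struct_emb, dyn_emb):
        # Concatenate the three modality embeddings into one input vector.
        x = np.concatenate([seq_emb, struct_emb, dyn_emb])
        logits = x @ self.router
        gate = np.exp(logits - logits.max())
        gate /= gate.sum()                 # protein-specific expert weights
        # Fused output is the gate-weighted sum of expert projections.
        fused = sum(g * (x @ W) for g, W in zip(gate, self.experts))
        return fused, gate
```

The routing weights vary with the input, which is what "protein-specific fusion without fixed modality assumptions" amounts to in this reading.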
If this is right
- The dynamics modality yields the largest performance gain when incorporated alongside the other two.
- Outputs from the model are well-calibrated in their probability estimates.
- Adaptive weighting permits the fusion strategy to vary across different proteins rather than using a uniform rule.
Where Pith is reading between the lines
- This could extend to other prediction tasks in structural biology where motion data might resolve ambiguities in static models.
- Testing the fusion weights against known functional sites in proteins could reveal if the model learns biologically meaningful patterns.
- Applying similar trimodal integration to variants in membrane proteins or those under cellular conditions might further validate the approach.
Load-bearing premise
The information from protein flexibility and correlated motions is not already fully encoded in sequence patterns or static three-dimensional structures.
What would settle it
Observing no drop in accuracy on the mutation assay collection when dynamics features are removed from the model.
Original abstract
Predicting the functional impact of single amino acid substitutions (SAVs) is central to understanding genetic disease and engineering therapeutic proteins. While protein language models and structure-based methods have achieved strong performance on this task, they systematically neglect protein dynamics; residue flexibility, correlated motions, and allosteric coupling are well-established determinants of mutational tolerance in structural biology, yet have not been incorporated into supervised variant effect predictors. We present TriFit, a multimodal framework that integrates sequence, structure, and protein dynamics through a four-expert Mixture-of-Experts (MoE) fusion module with trimodal cross-modal contrastive learning. Sequence embeddings are extracted via masked marginal scoring with ESM-2 (650M); structural embeddings from AlphaFold2-predicted C-alpha geometries; and dynamics embeddings from Gaussian Network Model (GNM) B-factors, mode shapes, and residue-residue cross-correlations. The MoE router adaptively weights modality combinations conditioned on the input, enabling protein-specific fusion without fixed modality assumptions. On the ProteinGym substitution benchmark (217 DMS assays, 696k SAVs), TriFit achieves AUROC 0.897 +/- 0.0002, outperforming all supervised baselines including Kermut (0.864) and ProteinNPT (0.844), and the best zero-shot model ESM3 (0.769). Ablation studies confirm that dynamics provides the largest marginal contribution over pairwise modality combinations, and TriFit achieves well-calibrated probabilistic outputs (ECE = 0.044) without post-hoc correction.
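The GNM features named in the abstract (B-factors, mode shapes, residue-residue cross-correlations) all fall out of the standard Kirchhoff-matrix construction from C-alpha contacts (Bahar et al., 1997). A sketch, in which the contact cutoff and the number of retained modes are assumed choices, not values the paper states:

```python
import numpy as np

def gnm_features(ca_coords, cutoff=7.3, n_modes=3):
    """Dynamics features from a Gaussian Network Model of C-alpha contacts.

    Standard GNM construction; the 7.3 Angstrom cutoff and the number of
    retained modes are illustrative assumptions."""
    ca_coords = np.asarray(ca_coords, dtype=float)
    dist = np.linalg.norm(ca_coords[:, None, :] - ca_coords[None, :, :], axis=-1)
    kirchhoff = -(dist < cutoff).astype(float)            # -1 for residues in contact
    np.fill_diagonal(kirchhoff, 0.0)
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))   # diagonal = contact degree
    inv = np.linalg.pinv(kirchhoff)          # pseudo-inverse drops the zero mode
    b_factors = np.diag(inv)                 # residue flexibility, up to a constant
    cross_corr = inv / np.sqrt(np.outer(b_factors, b_factors))
    _, vecs = np.linalg.eigh(kirchhoff)      # eigenvalues ascending
    mode_shapes = vecs[:, 1:1 + n_modes]     # slowest nonzero modes
    return b_factors, cross_corr, mode_shapes
```

In practice these quantities are usually obtained from a library such as ProDy rather than computed by hand; the point of the sketch is only that the dynamics modality is cheap and fully determined by the predicted C-alpha geometry.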
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents TriFit, a multimodal framework for predicting fitness effects of single amino acid variants (SAVs). It extracts sequence embeddings via masked marginal scoring with ESM-2, structural embeddings from AlphaFold2 Cα geometries, and dynamics embeddings from Gaussian Network Model (GNM) B-factors, mode shapes, and cross-correlations. These are fused via a four-expert Mixture-of-Experts (MoE) module with trimodal cross-modal contrastive learning. On the ProteinGym substitution benchmark (217 DMS assays, 696k SAVs), TriFit reports AUROC 0.897 +/- 0.0002, outperforming supervised baselines (Kermut 0.864, ProteinNPT 0.844) and zero-shot ESM3 (0.769). Ablations claim the dynamics modality provides the largest marginal gain, and the model produces well-calibrated outputs (ECE=0.044).
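For context on the sequence modality: masked-marginal scoring masks the mutated position once and compares the language model's log-probabilities of the mutant and wild-type residues there. A self-contained sketch; `logprob_fn` stands in for a protein language model such as ESM-2, and its interface here is a hypothetical convenience, not the ESM API.

```python
def masked_marginal_score(logprob_fn, sequence, pos, wt_aa, mut_aa):
    """Masked-marginal variant score:
    log p(mut | masked context) - log p(wt | masked context).

    `logprob_fn(masked_seq, pos, aa)` is assumed to return the model's
    log-probability of amino acid `aa` at `pos` given the masked sequence.
    """
    masked = sequence[:pos] + "<mask>" + sequence[pos + 1:]
    return logprob_fn(masked, pos, mut_aa) - logprob_fn(masked, pos, wt_aa)
```

A positive score means the model considers the mutant residue more plausible than the wild type in that context.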
Significance. If the results and ablations hold after clarification, the work would be significant for variant effect prediction by integrating protein dynamics—an established determinant of mutational tolerance from structural biology that prior supervised models have neglected. The adaptive MoE fusion and contrastive objective provide a principled way to combine modalities without fixed weighting assumptions. Evaluation on the large public ProteinGym benchmark and explicit reporting of calibration error are positive aspects that support reproducibility and practical utility.
Major comments (3)
- [Abstract; Results, ablation studies] The claim that 'dynamics provides the largest marginal contribution over pairwise modality combinations' is load-bearing for the central novelty argument, yet the controls are unspecified. It is unclear whether dynamics features are appended to a frozen sequence+structure backbone, whether all three modalities are jointly retrained in every ablation arm, or whether the MoE router always receives the same input dimensionality. Without these details the 0.033 AUROC lift over Kermut cannot be confidently attributed to the GNM modality rather than to the fusion architecture or training protocol.
- [Section 3, experimental setup] No information is given on the train-test split protocol across the 217 DMS assays (per-assay vs. global splits, sequence-identity cutoffs, or temporal splits). Because the MoE router and trimodal contrastive loss are fitted in a supervised manner on the same ProteinGym distribution used for final evaluation, the absence of explicit leakage controls undermines interpretation of the headline AUROC of 0.897.
- [Results] The reported AUROC variance of +/- 0.0002 is unusually tight. It is not stated whether this reflects multiple random seeds, different data folds, or a single run. This detail is required to assess whether the outperformance over ProteinNPT (0.844) and Kermut (0.864) is statistically reliable.
Minor comments (1)
- [Abstract] The abstract states 'well-calibrated probabilistic outputs (ECE = 0.044)' but does not define the expected calibration error formula or binning strategy used; this should be added for clarity.
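The referee's point is that "ECE" is not a single quantity: it depends on a binning choice. One common definition, equal-width confidence bins over the predicted class, looks like this (an illustrative variant, not necessarily the one the paper used):

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE for binary probabilities.

    One standard definition; the paper's binning strategy is unspecified."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    pred = (probs >= 0.5).astype(int)
    conf = np.where(probs >= 0.5, probs, 1 - probs)  # confidence of predicted class
    correct = (pred == labels)
    bins = np.linspace(0.5, 1.0, n_bins + 1)
    ece = 0.0
    for i in range(n_bins):
        lo, hi = bins[i], bins[i + 1]
        if i < n_bins - 1:
            mask = (conf >= lo) & (conf < hi)
        else:
            mask = (conf >= lo) & (conf <= hi)   # include 1.0 in the last bin
        if mask.any():
            # Weighted gap between accuracy and mean confidence in the bin.
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece
```

With 10 vs. 15 bins, or equal-mass instead of equal-width bins, the reported 0.044 could shift noticeably, which is why the definition should be stated.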
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review. We address each major comment below and will revise the manuscript to improve clarity, reproducibility, and interpretation of the results.
Point-by-point responses
Referee: [Abstract; Results, ablation studies] The claim that 'dynamics provides the largest marginal contribution over pairwise modality combinations' is load-bearing for the central novelty argument, yet the controls are unspecified. It is unclear whether dynamics features are appended to a frozen sequence+structure backbone, whether all three modalities are jointly retrained in every ablation arm, or whether the MoE router always receives the same input dimensionality. Without these details the 0.033 AUROC lift over Kermut cannot be confidently attributed to the GNM modality rather than to the fusion architecture or training protocol.
Authors: We thank the referee for this observation. In the ablation experiments, all three modalities were jointly retrained together with the MoE router in every arm; dynamics features were not appended to any frozen backbone. The router always received embeddings of identical dimensionality across configurations. We will revise the Results section to explicitly document these controls so that the marginal contribution of the dynamics modality can be properly evaluated. revision: yes
Referee: [Section 3, experimental setup] No information is given on the train-test split protocol across the 217 DMS assays (per-assay vs. global splits, sequence-identity cutoffs, or temporal splits). Because the MoE router and trimodal contrastive loss are fitted in a supervised manner on the same ProteinGym distribution used for final evaluation, the absence of explicit leakage controls undermines interpretation of the headline AUROC of 0.897.
Authors: We agree that the splitting protocol requires explicit description. The experiments used global splits across all 217 assays together with sequence-identity cutoffs between training and test proteins to prevent leakage; no temporal splits were applied. We will add a dedicated paragraph in Section 3 that fully specifies the splitting procedure and the leakage-mitigation steps taken during supervised training of the MoE and contrastive components. revision: yes
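The protocol the authors describe, holding out whole sequence-identity clusters rather than individual proteins, can be sketched as follows. The cluster assignments are assumed to come from an external tool (e.g. MMseqs2 at a chosen identity cutoff); the mapping and fraction here are illustrative.

```python
import random

def cluster_split(protein_to_cluster, test_frac=0.2, seed=0):
    """Leakage-aware split: hold out entire sequence-identity clusters.

    `protein_to_cluster` maps protein IDs to cluster IDs produced by an
    external clustering tool; no cluster spans the train/test boundary."""
    clusters = sorted(set(protein_to_cluster.values()))
    rng = random.Random(seed)
    rng.shuffle(clusters)
    n_test = max(1, int(len(clusters) * test_frac))
    test_clusters = set(clusters[:n_test])
    train = [p for p, c in protein_to_cluster.items() if c not in test_clusters]
    test = [p for p, c in protein_to_cluster.items() if c in test_clusters]
    return train, test
```

Splitting at the cluster level is what prevents a test protein from having a near-identical homolog in training, the leakage mode the referee is worried about.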
Referee: [Results] The reported AUROC variance of +/- 0.0002 is unusually tight. It is not stated whether this reflects multiple random seeds, different data folds, or a single run. This detail is required to assess whether the outperformance over ProteinNPT (0.844) and Kermut (0.864) is statistically reliable.
Authors: The reported variance of +/- 0.0002 is the standard deviation obtained across multiple independent training runs that differed only in random seed. We will revise the Results section to state the exact number of runs performed and, space permitting, include a brief statistical comparison confirming that the observed gains remain significant. revision: yes
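The "brief statistical comparison" the authors promise could be as simple as a Welch t statistic between per-seed AUROC samples. A sketch; the sample values in the test are invented for illustration and are not the paper's actual runs.

```python
import math
import statistics as st

def welch_t(a, b):
    """Welch's t statistic between two independent samples of per-seed
    scores (unequal variances allowed). Degrees of freedom and the
    p-value lookup are omitted for brevity."""
    se = math.sqrt(st.variance(a) / len(a) + st.variance(b) / len(b))
    return (st.mean(a) - st.mean(b)) / se
```

With a seed-to-seed standard deviation of 0.0002, even a handful of runs would make the 0.033 gap over Kermut overwhelmingly significant, which is exactly why the number of runs needs to be stated.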
Circularity Check
No significant circularity in TriFit derivation chain
Full rationale
The paper presents an empirical multimodal ML model that extracts fixed embeddings from ESM-2, AlphaFold2, and GNM, then trains a supervised MoE fusion module plus contrastive loss on the ProteinGym benchmark. Reported AUROC and ablation results are standard held-out evaluation metrics after training; they do not reduce by construction to the input features or to any self-citation. No load-bearing uniqueness theorems, self-definitional equations, or fitted parameters renamed as predictions appear in the abstract or described pipeline. The central claim (dynamics adds orthogonal signal) is an empirical hypothesis tested via ablation, not a tautology.
Axiom & Free-Parameter Ledger
Free parameters (1)
- MoE router and expert weights
Axioms (1)
- Domain assumption: Gaussian Network Model B-factors and cross-correlations capture mutational tolerance signals
Reference graph
Works this paper leans on
- [1] Bahar, I., Atilgan, A. R., and Erman, B. Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Folding and Design, 2:173–181, 1997.
- [2] Bakan, A., Meireles, L. M., and Bahar, I. ProDy: Protein dynamics inferred from theory and experiments. Bioinformatics, 27:1575–1577, 2011.
- [3] Dauparas, J., Anishchenko, I., Bennett, N., et al. Robust deep learning-based protein sequence design using ProteinMPNN. Science, 378:49–56, 2022.
- [4] Fowler, D. M. and Fields, S. Deep mutational scanning: a new style of protein science. Nature Methods, 11:801–807, 2014.
- [5] Haliloglu, T., Bahar, I., and Erman, B. Gaussian dynamics of folded proteins. Physical Review Letters, 79:3090, 1997.
- [6] Hayes, T., Rao, R., Akin, H., et al. Simulating 500 million years of evolution with a language model. Science, 2024.
- [7] Hsu, C., Verkuil, R., Liu, J., et al. Learning inverse folding from millions of predicted structures. In ICML, 2022.
- [8] Jing, B., Eismann, S., Suriana, P., Townshend, R. J. L., and Dror, R. Learning from protein structure with geometric vector perceptrons. In ICLR, 2021.
- [9] Jumper, J., Evans, R., Pritzel, A., et al. Highly accurate protein structure prediction with AlphaFold. Nature, 596:583–589, 2021.
- [10] Kermut, V. et al. Modelling mutational effects on biochemical phenotypes using Gaussian processes: Application to clinical variant interpretation. bioRxiv, 2024.
- [11] Laine, E., Karami, Y., and Carbone, A. Gremlin and GEMME: Fast and accurate protein fitness landscape prediction. PLOS Computational Biology, 2019.
- [12] Lin, Z., Akin, H., Rao, R., et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 379:1123–1130, 2023.
- [13] Loshchilov, I. and Hutter, F. SGDR: Stochastic gradient descent with warm restarts. In ICLR, 2017.
- [14] Loshchilov, I. and Hutter, F. Decoupled weight decay regularization. In ICLR, 2019.
- [15] Marquet, C., Heinzinger, M., Olenyi, T., et al. VESPA: Variant effect score prediction without alignments. PLOS Computational Biology, 2022.
- [16] Meier, J., Rao, R., Verkuil, R., et al. Language models enable zero-shot prediction of the effects of mutations on protein function. In NeurIPS, 2021.
- [17] Notin, P., Van Niekerk, L., Kollasch, A., et al. TranceptEVE: Combining family-specific and family-agnostic models of protein sequences for improved fitness prediction. In NeurIPS Workshop on Learning Meaningful Representations of Life, 2022.
- [18] Rao, R., Liu, J., Verkuil, R., et al. MSA Transformer. In ICML, 2021.
- [19] Su, J., Han, C., Zhou, Y., et al. SaProt: Protein language modeling with structure-aware vocabulary. In ICLR, 2024.
- [20] van den Oord, A., Li, Y., and Vinyals, O. Representation learning with contrastive predictive coding. In NeurIPS, 2018.