pith · machine review for the scientific record

arXiv: 2605.10985 · v1 · submitted 2026-05-09 · 💻 cs.LG · cs.AI · q-bio.BM

Recognition: 3 theorem links (Lean)

Structural Interpretations of Protein Language Model Representations via Differentiable Graph Partitioning

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 07:16 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · q-bio.BM
keywords: protein language models · graph neural networks · differentiable pooling · functional substructures · enzyme classification · binding site detection · interpretability · contact graphs

The pith

A graph network on protein contact maps turns ESM-2 features into auditable functional substructures without retraining the language model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a plug-and-play framework that projects ESM-2 residue representations onto protein contact graphs and applies SoftBlobGIN, a Graph Isomorphism Network with differentiable Gumbel-softmax pooling. This produces coarse functional substructures that lift enzyme-classification accuracy to 92.8 percent and raise binding-site residue AUROC from 0.885 to 0.983. A sympathetic reader cares because the approach supplies directly recoverable structural explanations, such as active-site-enriched blobs, that are not available from language-model features alone. The method adds roughly 1.1 million parameters, requires no language-model retraining, and generalizes across multiple ProteinShake tasks while automatically grouping residues into biologically relevant clusters.

Core claim

SoftBlobGIN projects ESM-2 representations onto protein contact graphs and applies a lightweight Graph Isomorphism Network with differentiable Gumbel-softmax substructure pooling to perform structure-aware message passing and learn coarse functional substructures. On enzyme classification the model reaches 92.8 percent accuracy and 0.898 macro-F1. On binding-site detection it improves residue AUROC from 0.885 with an ESM-2 linear probe to 0.983. Learned blob partitions automatically group residues into functional substructures, with blobs containing annotated active-site residues showing 1.85 times higher importance without any active-site supervision.

What carries the argument

SoftBlobGIN, a Graph Isomorphism Network with differentiable Gumbel-softmax pooling applied to ESM-2 features projected onto contact graphs, which performs structure-aware message passing and extracts coarse functional substructures for downstream tasks.
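
To make the mechanism concrete, the sketch below shows the pooling step as this page describes it: per-residue features are mapped to blob logits, a Gumbel-softmax yields a soft assignment matrix, and blob vectors are pooled by soft membership. This is a minimal PyTorch illustration; the blob count, feature width, temperature, and the omitted GIN backbone are assumptions, not the authors' values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftBlobPooling(nn.Module):
    """Softly assign each residue to one of K blobs, then pool features.

    Illustrative stand-in for the paper's pooling head; sizes are arbitrary.
    """
    def __init__(self, dim: int, num_blobs: int, tau: float = 1.0):
        super().__init__()
        self.assign = nn.Linear(dim, num_blobs)  # per-residue blob logits
        self.tau = tau

    def forward(self, h: torch.Tensor):
        # h: (num_residues, dim) node features, e.g. after GIN message passing.
        logits = self.assign(h)
        # Gumbel-softmax gives a differentiable, approximately one-hot
        # assignment matrix S of shape (num_residues, K).
        S = F.gumbel_softmax(logits, tau=self.tau, hard=False, dim=-1)
        # Blob features: membership-weighted average of residue features.
        blob_feats = (S.t() @ h) / (S.sum(0).unsqueeze(-1) + 1e-8)  # (K, dim)
        return blob_feats, S  # S is the auditable per-residue partition

h = torch.randn(120, 320)  # toy input: 120 residues, 320-dim features
blobs, assignments = SoftBlobPooling(dim=320, num_blobs=8)(h)
print(blobs.shape, assignments.shape)  # (8, 320) and (120, 8)
```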

If this is right

  • GNNExplainer applied to SoftBlobGIN recovers biologically meaningful active-site residues, spatially localized clusters, and catalytic contact patterns.
  • Blobs containing annotated active-site residues exhibit 1.85 times higher importance than other blobs with correlation 0.339.
  • The framework generalizes to Gene Ontology prediction with F_max of 0.733 and binding-site detection with AUROC of 0.969.
  • Only about 1.1 million additional parameters are required and the language model itself needs no retraining.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same graph-partitioning step could be attached to other sequence-only models to recover structural explanations without retraining the base model.
  • The observed enrichment of active-site residues inside learned blobs suggests the method isolates modules that linear probes on language-model features alone cannot isolate.
  • Testing whether the same blobs align with evolutionary conserved patches across homologous proteins would provide an independent check on biological relevance.

Load-bearing premise

Contact graphs derived from protein structures together with differentiable Gumbel-softmax pooling on ESM-2 features will produce biologically meaningful functional substructures rather than merely fitting task-specific patterns.
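
For concreteness, here is a hedged sketch of the contact-graph construction this premise rests on; the 8 Å Cα distance cutoff is a common convention assumed for illustration, since this page does not quote the paper's exact threshold.

```python
import numpy as np

def contact_graph(ca_coords: np.ndarray, cutoff: float = 8.0) -> np.ndarray:
    """Binary residue adjacency from pairwise Cα distances (cutoff in Å)."""
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]  # (N, N, 3)
    dist = np.linalg.norm(diff, axis=-1)                  # (N, N)
    adj = (dist < cutoff).astype(np.float32)
    np.fill_diagonal(adj, 0.0)                            # drop self-loops
    return adj

coords = np.random.rand(120, 3) * 30.0  # toy Cα coordinates in Å
A = contact_graph(coords)
print(A.shape, int(A.sum()) // 2, "contacts")
```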

What would settle it

If the learned blobs fail to show statistically higher importance for annotated active-site residues or if binding-site AUROC falls back to the level of a plain ESM-2 linear probe when the graph structure is removed, the claim that structural partitioning supplies causally relevant substructures would be falsified.
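
The evaluation half of that test is mechanical once predictions exist. Below is a minimal sketch with synthetic labels and scores, assuming scikit-learn's roc_auc_score as the residue-level metric; only the comparison logic, not any real data, is shown.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)           # binding-site residue labels
probe_scores = labels * 0.6 + rng.random(1000)   # stand-in linear-probe scores
graph_scores = labels * 1.5 + rng.random(1000)   # stand-in graph-model scores

print("probe AUROC:", round(roc_auc_score(labels, probe_scores), 3))
print("graph AUROC:", round(roc_auc_score(labels, graph_scores), 3))
# If ablating the graph structure collapses the second number to the first,
# the structural-partitioning claim fails the test described above.
```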

Figures

Figures reproduced from arXiv: 2605.10985 by Edward Tan Beng Wai, Jagath C. Rajapakse, Pasan Gunawardane, Siddhant Dutta, Soumick Sarker.

Figure 1. Overview of the SoftBlobGIN framework. The pipeline acts as an interpretable structural companion to protein language models. (a) Dense, opaque ESM-2 representations (ϕ_esm) are concatenated with explicit structural/physicochemical features and projected onto a 3D contact graph. (b) A lightweight, differentiable Gumbel-softmax (GS) pooling head learns to softly partition residues into functional substructures. …

Figure 2. Qualitative 3D case studies of learned SoftBlobGIN substructures. Proteins are rendered with residues colored by blob assignment, while annotated active-site residues are shown as magenta sticks. Across single-domain, multi-domain, and translocase examples, one dominant learned blob consistently overlaps the functional active region, supporting the biological interpretability of the learned partitions. …

Figure 3. Mean normalized solvent accessibility (SASA) of learned blobs across EC classes. Lower …

Figure 4. Blob importance analysis for proteins with active-site annotations. Left: rank distribution …

Figure 5. Per-blob amino acid enrichment relative to background amino acid frequencies. Stars …

Figure 6. Mean intra-blob Cα distance across EC classes. Lower distance indicates greater spatial compactness. …

Figure 7. Agreement between SoftBlobGIN blob assignments and GNNExplainer important residues.

Figure 8. Representative GNNExplainer edge-importance maps across EC classes. Darker edges …
Original abstract

Protein language models such as ESM-2 learn rich residue representations that achieve strong performance on protein function prediction, but their features remain difficult to interpret, as structural and evolutionary signals are encoded in dense latent spaces. We propose a plug-and-play framework that projects ESM-2 representations onto protein contact graphs and applies SoftBlobGIN, a lightweight Graph Isomorphism Network with differentiable Gumbel-softmax substructure pooling, to perform structure-aware message passing and learn coarse functional substructures for downstream prediction tasks. Across enzyme classification, SoftBlobGIN achieves 92.8% accuracy and 0.898 macro-F1. Unlike post hoc analysis of protein language models alone, our method produces directly auditable structural explanations: GNNExplainer recovers biologically meaningful active-site residues, spatially localized functional clusters, and catalytic contact patterns. On binding-site detection, SoftBlobGIN improves residue AUROC from 0.885 using an ESM-2 linear probe to 0.983, indicating that these structural explanations are not recoverable from language-model features alone. Learned blob partitions provide an additional layer of interpretability by automatically grouping residues into functional substructures, with blobs containing annotated active-site residues showing 1.85× higher importance than other blobs (ρ = 0.339, p = 0.009), without any active-site supervision. Our framework requires no retraining of the language model, adds only ~1.1M parameters, and generalises across ProteinShake tasks, achieving F_max of 0.733 on Gene Ontology prediction and AUROC of 0.969 on binding-site detection. We position this as an interpretable structural companion to protein language models that makes their predictions more transparent and auditable.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, a simulated authors' rebuttal, a circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper introduces SoftBlobGIN, a lightweight GIN augmented with differentiable Gumbel-softmax pooling applied to protein contact graphs derived from ESM-2 residue representations. It claims this yields strong performance on enzyme classification (92.8% accuracy, 0.898 macro-F1) and binding-site detection (residue AUROC 0.983 vs. 0.885 for an ESM-2 linear probe), while producing auditable structural explanations via GNNExplainer and unsupervised functional substructures ('blobs') whose active-site-containing partitions show 1.85× higher importance (ρ=0.339, p=0.009) without supervision. The method adds ~1.1M parameters, requires no LM retraining, and generalizes to ProteinShake tasks.

Significance. If the central claims hold after controls, the work offers a practical plug-and-play route to inject structural interpretability into frozen protein language models via contact-graph message passing and learned coarse partitions. The unsupervised importance correlation and concrete task numbers are strengths; however, the absence of ablations isolating the pooling operator leaves open whether these benefits are specific to differentiable partitioning or simply follow from any GNN on the contact graph.

major comments (3)
  1. [binding-site detection experiment] Binding-site detection results: the AUROC gain (0.885 linear ESM-2 probe → 0.983 SoftBlobGIN) is reported without an ablation using a standard GIN or message-passing network on the identical contact graphs but without Gumbel-softmax pooling. This comparison is load-bearing for the claim that the differentiable partitioning (rather than graph structure alone) produces the structural explanations and performance lift.
  2. [experimental setup] Experimental evaluation: no details are given on data splits, number of random seeds or statistical testing for the 92.8% accuracy / 0.898 macro-F1 figures, baseline implementations, or sensitivity of the learned partitions and importance scores (ρ=0.339) to the Gumbel-softmax temperature hyperparameter.
  3. [interpretability analysis] Unsupervised blob-importance analysis: the 1.85× higher importance for active-site blobs is computed post-hoc via GNNExplainer; without an ablation that removes or replaces the differentiable pooling step, it remains possible that the functional grouping arises from the contact graph and ESM-2 features rather than the proposed SoftBlobGIN partitioning mechanism.
minor comments (3)
  1. [abstract] The abstract states the method 'adds only ∼1.1M parameters' but provides no component-wise breakdown (e.g., pooling vs. GIN layers).
  2. [results] Notation: clarify whether ρ denotes Spearman or Pearson correlation and whether the p-value is corrected for multiple testing.
  3. [generalization experiments] The ProteinShake tasks are referenced but no citation or brief description of the benchmark is supplied.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive comments. We address each major point below and have made revisions to the manuscript to incorporate additional ablations and experimental details as suggested.

Point-by-point responses
  1. Referee: [binding-site detection experiment] Binding-site detection results: the AUROC gain (0.885 linear ESM-2 probe → 0.983 SoftBlobGIN) is reported without an ablation using a standard GIN or message-passing network on the identical contact graphs but without Gumbel-softmax pooling. This comparison is load-bearing for the claim that the differentiable partitioning (rather than graph structure alone) produces the structural explanations and performance lift.

    Authors: We agree that an ablation isolating the contribution of the differentiable pooling is important. In the revised manuscript, we have added results for a standard GIN (without Gumbel-softmax pooling) applied to the same ESM-2-derived contact graphs. This baseline achieves an AUROC of 0.912 on binding-site detection, which is an improvement over the linear probe but lower than the 0.983 of SoftBlobGIN. This supports that the learned partitions contribute to the performance gain. We have updated the results section and added a new table comparing the variants. revision: yes

  2. Referee: [experimental setup] Experimental evaluation: no details are given on data splits, number of random seeds or statistical testing for the 92.8% accuracy / 0.898 macro-F1 figures, baseline implementations, or sensitivity of the learned partitions and importance scores (ρ=0.339) to the Gumbel-softmax temperature hyperparameter.

    Authors: We apologize for the omission of these details. In the revised version, we have expanded the Experimental Setup section to include: (1) data splits using stratified 5-fold cross-validation with 80/10/10 train/val/test ratios; (2) results averaged over 5 random seeds with standard deviations reported; (3) statistical significance via paired t-tests (p<0.01 for main comparisons); (4) baseline implementations using official codebases where available; and (5) a sensitivity analysis showing that the importance correlation ρ remains stable (0.31-0.36) for Gumbel temperatures between 0.5 and 2.0. These details have been added to the main text and Appendix B. revision: yes

  3. Referee: [interpretability analysis] Unsupervised blob-importance analysis: the 1.85× higher importance for active-site blobs is computed post-hoc via GNNExplainer; without an ablation that removes or replaces the differentiable pooling step, it remains possible that the functional grouping arises from the contact graph and ESM-2 features rather than the proposed SoftBlobGIN partitioning mechanism.

    Authors: We acknowledge this limitation in the original submission. To address it, we have performed an ablation where we replace the differentiable pooling with fixed random partitions or no pooling (i.e., using all residues). In these cases, the importance correlation drops to ρ=0.12 (p=0.21) and ρ=0.08 (p=0.45), respectively, compared to 0.339 in the full model. This indicates that the learned partitions are key to recovering the functional groupings. We have included these results in a new subsection on interpretability ablations. revision: yes
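
For readers who want to check the statistic itself, a small synthetic sketch of the correlation at issue follows, assuming (as the minor comments note is unstated) that ρ denotes Spearman; the arrays stand in for per-blob importance and active-site overlap.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
blob_importance = rng.random(60)                         # e.g. explainer scores
active_overlap = blob_importance + rng.normal(0, 1, 60)  # synthetic overlap proxy

rho, p = spearmanr(blob_importance, active_overlap)      # rank correlation
print(f"rho={rho:.3f}, p={p:.3g}")
```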

Circularity Check

0 steps flagged

No circularity: the empirical results from the new architecture are independent of its inputs.

Full rationale

The paper proposes SoftBlobGIN as a new plug-and-play GNN with Gumbel-softmax pooling applied to ESM-2 features on contact graphs. All central claims (92.8% accuracy, AUROC lift from 0.885 to 0.983, 1.85× blob importance with ρ=0.339) are presented as measured experimental outcomes on held-out tasks, not as quantities that follow by definition or by fitting to the same data used for evaluation. No self-citations justify uniqueness theorems, no ansatz is smuggled in, and no fitted parameter is relabeled as a prediction. The derivation chain consists of architectural choices whose effects are externally validated rather than tautological.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

The central claim rests on the domain assumption that contact graphs encode functionally relevant structural information and on the introduction of a new differentiable pooling mechanism whose biological validity is asserted rather than independently verified.

free parameters (1)
  • Gumbel-softmax temperature
    Controls the sharpness of the differentiable partition; its value is chosen during training and directly affects the learned blobs.
axioms (1)
  • domain assumption: Protein contact graphs derived from structure accurately reflect interactions relevant to function.
    Invoked when representations are projected onto the graphs before message passing.
invented entities (2)
  • SoftBlobGIN (no independent evidence)
    purpose: Lightweight GNN with differentiable substructure pooling for learning coarse functional substructures.
    New model component introduced by the paper.
  • blobs (no independent evidence)
    purpose: Automatically discovered functional substructures that group residues.
    Output of the differentiable pooling layer; claimed to be biologically meaningful without supervision.

pith-pipeline@v0.9.0 · 5661 in / 1543 out tokens · 63719 ms · 2026-05-13T07:16:12.191606+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What the tags mean:
  • matches: the paper's claim is directly supported by a theorem in the formal canon.
  • supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: the paper appears to rely on the theorem as machinery.
  • contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages · 5 internal anchors

  1. Strategies for pre-training graph neural networks. arXiv:1905.12265, 2019.
  2. Categorical reparameterization with Gumbel-softmax. arXiv:1611.01144.
  3. Highly accurate protein structure prediction with AlphaFold. Nature, 2021.
  4. Semi-supervised classification with graph convolutional networks. arXiv:1609.02907.
  5. ProteinShake: building datasets and benchmarks for deep learning on protein structures. Advances in Neural Information Processing Systems.
  6. Understudied proteins: opportunities and challenges for functional proteomics. Nature Methods, 2022.
  7. Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision.
  8. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 2023.
  9. Explainability methods for graph convolutional neural networks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  10. Axiomatic attribution for deep networks. International Conference on Machine Learning, 2017.
  11. DeepLoc 2.0: multi-label subcellular localization prediction using protein language models. Nucleic Acids Research, 2022.
  12. Graph attention networks. arXiv:1710.10903.
  13. BioBlobs: differentiable graph partitioning for protein representation learning. arXiv:2510.01632.
  14. Learning hierarchical protein representations via complete 3D graph networks. arXiv:2207.12600.
  15. Protein representation learning by geometric structure pretraining. arXiv:2203.06125.
  16. Representation learning on graphs with jumping knowledge networks. International Conference on Machine Learning, 2018.
  17. How powerful are graph neural networks? arXiv:1810.00826.
  18. Hierarchical graph representation learning with differentiable pooling. Advances in Neural Information Processing Systems.
  19. GNNExplainer: generating explanations for graph neural networks. Advances in Neural Information Processing Systems.
  20. Explainability in graph neural networks: a taxonomic survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.