I-SAFE: Wasserstein Coherence Metrics for Structural Auditing of Scientific AI Models
Pith reviewed 2026-05-22 09:52 UTC · model grok-4.3
The pith
I-SAFE auditing reveals different distributional profiles in DTI models with similar accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Given a trained black-box predictor and an external structural prior encoding domain knowledge about task-relevant input structure, I-SAFE evaluates raw model outputs under structurally guided perturbations of the input. The proposed audit measures output-distribution coherence through three complementary metrics: a Quantile-Based Metric for location-level coherence, the Wasserstein Coherence Metric for ordinal coherence, and a translation-invariant WCM variant for shape coherence. Instantiated on drug-target interaction prediction using the Davis kinase benchmark, KLIFS binding-pocket annotations, and three models, the framework shows that models with comparable predictive performance can,
What carries the argument
Wasserstein Coherence Metric that quantifies ordinal and shape coherence of model output distributions under perturbations derived from the external structural prior.
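To make the three notions of coherence concrete, the sketch below compares a sample of baseline outputs with a sample of outputs under perturbation. It is not the paper's definition of QBM or WCM, which this page does not reproduce. The function names, the use of a median shift as the location summary, and mean-centering as the way to obtain translation invariance are all assumptions made for illustration.

```python
from statistics import mean, median

def wasserstein_1d(a, b):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples."""
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    # For equal-size 1-D samples the optimal coupling pairs sorted values.
    return mean(abs(x - y) for x, y in zip(sorted(a), sorted(b)))

def centered_wasserstein_1d(a, b):
    """W1 after subtracting each sample's mean, so a pure shift scores zero.

    Assumption: mean-centering is one simple way to get translation
    invariance; the paper's variant may be defined differently.
    """
    ma, mb = mean(a), mean(b)
    return wasserstein_1d([x - ma for x in a], [y - mb for y in b])

def median_shift(a, b):
    """Location-level summary (hypothetical stand-in for a quantile metric)."""
    return median(b) - median(a)

# Toy predicted affinities: baseline, a pure shift, and a pure spread change.
baseline = [5.0, 5.5, 6.0, 6.5, 7.0]
shifted = [x - 1.0 for x in baseline]
spread = [4.0, 5.0, 6.0, 7.0, 8.0]

for name, sample in [("shifted", shifted), ("spread", spread)]:
    print(name,
          median_shift(baseline, sample),
          wasserstein_1d(baseline, sample),
          centered_wasserstein_1d(baseline, sample))
```

In this toy case the shifted sample moves the median and the plain distance but leaves the centered distance at zero, while the spread sample leaves the median alone and changes both distances. That separation between location and shape is the reason for reporting several metrics rather than one.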
If this is right
- Models can be compared and selected according to structural coherence in addition to predictive accuracy.
- The audit can identify reliance on dataset-specific regularities rather than domain-relevant features.
- The framework applies directly to any scientific prediction task where inputs admit structured decomposition and an external prior exists.
Where Pith is reading between the lines
- Combining coherence scores with standard accuracy could produce a joint ranking criterion for model deployment in scientific settings.
- Low coherence on specific perturbation types might guide targeted data collection or architecture adjustments.
Load-bearing premise
The external structural prior accurately encodes task-relevant input structure that can be used to generate meaningful perturbations for the audit.
What would settle it
Running the three coherence metrics on the same set of KLIFS-guided perturbations and finding that the three DTI models produce identical or statistically indistinguishable distributional response profiles.
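One way to operationalize "statistically indistinguishable" is a permutation test on the per-perturbation output shifts of two models. The sketch below is an assumption about how such a check could be run, not a procedure taken from the paper. The function names, the choice of the 1-Wasserstein distance as the test statistic, and the toy numbers are all hypothetical.

```python
import random
from statistics import mean

def w1(a, b):
    """Empirical 1-Wasserstein distance for equal-size 1-D samples."""
    return mean(abs(x - y) for x, y in zip(sorted(a), sorted(b)))

def permutation_p_value(shifts_a, shifts_b, n_perm=500, seed=0):
    """Share of label shuffles whose W1 is at least the observed W1.

    shifts_a and shifts_b hold one output shift per perturbation for two
    models and must have equal length. A large p-value means the two
    response profiles cannot be told apart by this statistic.
    """
    if len(shifts_a) != len(shifts_b):
        raise ValueError("samples must have equal size")
    rng = random.Random(seed)
    observed = w1(shifts_a, shifts_b)
    pooled = list(shifts_a) + list(shifts_b)
    n = len(shifts_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if w1(pooled[:n], pooled[n:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

# Toy shifts for two hypothetical models under the same perturbations.
model_a = [0.1 * i for i in range(10)]
model_b = [10.0 + 0.1 * i for i in range(10)]

p_same = permutation_p_value(model_a, model_a)
p_diff = permutation_p_value(model_a, model_b)
print(p_same, p_diff)
```

Identical profiles give a p-value of 1, and clearly separated profiles give a small one. A real analysis would also need to account for the three metrics and three models being tested together.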
Original abstract
Deep learning models are increasingly used in scientific prediction tasks where strong benchmark performance is often interpreted as evidence of scientifically meaningful behavior. This interpretation is fragile, as models may exploit shortcut features, dataset-specific regularities, or distributional biases that are predictive on held-out data but not aligned with domain-relevant structure. To address this limitation, we introduce the I-SAFE (Interventional Secure, Accurate, Fair and Explainable) framework, a post-hoc distributional auditing framework for scientific AI models centered on the Wasserstein Coherence Metric (WCM). Given a trained black-box predictor and an external structural prior encoding domain knowledge about task-relevant input structure, I-SAFE evaluates raw model outputs under structurally guided perturbations of the input. The proposed audit measures output-distribution coherence through three complementary metrics: a Quantile-Based Metric (QBM) for location-level coherence, the WCM for ordinal coherence, and a translation-invariant WCM variant for shape coherence. We instantiate I-SAFE on drug-target interaction (DTI) prediction using the Davis kinase benchmark, KLIFS (Kinase-Ligand Interaction Fingerprints and Structures) binding-pocket annotations, and three sequence-based DTI models: DeepConvDTI, DeepDTA, and TAPB. Although the models operate in a comparable predictive regime, I-SAFE reveals substantially different distributional response profiles, a distinction invisible to accuracy-based evaluation. The framework is model-agnostic and applicable to any domain where inputs admit a structured decomposition and an external prior is available.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the I-SAFE framework, a post-hoc auditing method for scientific AI models that applies Wasserstein Coherence Metrics (WCM) and a Quantile-Based Metric (QBM) to evaluate output-distribution coherence under perturbations generated from an external structural prior (KLIFS binding-pocket annotations). It demonstrates the approach on three sequence-based drug-target interaction models (DeepConvDTI, DeepDTA, TAPB) trained on the Davis kinase benchmark, claiming that the models exhibit substantially different distributional response profiles despite comparable predictive accuracy.
Significance. If the central claims hold, I-SAFE offers a model-agnostic tool for detecting misalignment between model behavior and domain-relevant structure that standard accuracy metrics miss. The use of Wasserstein distances for ordinal and shape coherence, combined with an external prior, provides a concrete way to audit shortcut exploitation in scientific prediction tasks.
major comments (2)
- [§3.2] §3.2 (Perturbation Generation): The paper does not report whether KLIFS-guided perturbations preserve marginal input statistics such as amino-acid composition or sequence-length distribution across the three models. Without this check, the observed differences in QBM/WCM profiles could reflect architecture-specific sensitivity to any structured input change rather than genuine misalignment with binding-pocket structure, directly undermining the central claim that I-SAFE isolates task-relevant structural coherence.
- [§4.3] §4.3 (Results and Comparison): The claim that the distinctions found by I-SAFE are 'invisible' to accuracy-based evaluation requires explicit quantification of how much of the WCM/QBM separation is explained by residual correlations with non-structural features; the current presentation leaves open the possibility that the metrics are re-detecting known architecture differences rather than new scientific misalignment.
minor comments (2)
- [§2.3] The definition of the translation-invariant WCM variant should include an explicit equation showing how translation invariance is enforced, to allow readers to verify it does not inadvertently remove shape information relevant to the audit.
- [Figure 2] Figure 2 (Distributional response profiles): Axis labels and legend entries are too small for readability; increase font size and add a brief caption explaining the color coding for the three models.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments on our manuscript. These observations help clarify how to better isolate the contribution of structural priors in the I-SAFE framework. We respond to each major comment below and indicate the revisions we will make.
Point-by-point responses
- Referee: [§3.2] §3.2 (Perturbation Generation): The paper does not report whether KLIFS-guided perturbations preserve marginal input statistics such as amino-acid composition or sequence-length distribution across the three models. Without this check, the observed differences in QBM/WCM profiles could reflect architecture-specific sensitivity to any structured input change rather than genuine misalignment with binding-pocket structure, directly undermining the central claim that I-SAFE isolates task-relevant structural coherence.
  Authors: We agree that an explicit check on marginal input statistics would strengthen the interpretation. In the revised manuscript we will add a supplementary table and brief analysis comparing amino-acid composition and sequence-length distributions between the original sequences and the KLIFS-guided perturbations for each of the three models. The perturbations are constructed by targeted residue substitutions within the binding-pocket regions annotated by KLIFS; because the changes are localized and the overall sequence length is unchanged, we expect the marginals to remain largely preserved. Including this verification will directly address the concern that the observed coherence differences could arise from generic sensitivity to any input modification.
  Revision: yes
- Referee: [§4.3] §4.3 (Results and Comparison): The claim that the distinctions found by I-SAFE are 'invisible' to accuracy-based evaluation requires explicit quantification of how much of the WCM/QBM separation is explained by residual correlations with non-structural features; the current presentation leaves open the possibility that the metrics are re-detecting known architecture differences rather than new scientific misalignment.
  Authors: We acknowledge that a quantitative separation from non-structural factors would make the claim more robust. In the revision we will add a short analysis (new panel or appendix) that reports partial correlations and a simple regression of the WCM/QBM scores against a set of non-structural covariates (model depth, embedding dimension, and basic sequence statistics). This will allow readers to see the fraction of metric separation that remains after controlling for these factors. We maintain that the primary distinction arises from differential sensitivity to the KLIFS structural prior, but the added quantification will clarify the extent to which architecture-specific traits contribute.
  Revision: yes
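The marginal-statistics check promised in the first response can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the toy sequences, the pocket positions, the alanine substitution rule, and the helper names are all hypothetical. Only the general setup (localized residue substitutions that leave sequence length unchanged) comes from the rebuttal text.

```python
from collections import Counter

def composition(seqs):
    """Pooled amino-acid frequencies over a list of sequences."""
    counts = Counter("".join(seqs))
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()}

def total_variation(p, q):
    """Total variation distance between two frequency dictionaries."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def substitute(seq, positions, new_residue="A"):
    """Replace residues at 0-based positions; sequence length is unchanged.

    Assumption: alanine substitution stands in for whatever substitution
    rule the audit actually uses.
    """
    chars = list(seq)
    for i in positions:
        chars[i] = new_residue
    return "".join(chars)

# Toy inputs, not real kinase sequences or real KLIFS annotations.
original = ["MKVLGEGAFG", "MKKLGSGQFG"]
pocket_positions = [3, 4, 5]

perturbed = [substitute(s, pocket_positions) for s in original]
tv = total_variation(composition(original), composition(perturbed))
same_lengths = [len(s) for s in original] == [len(s) for s in perturbed]
print(perturbed, round(tv, 3), same_lengths)
```

Lengths are preserved by construction, but composition need not be. On sequences this short, three substitutions move the composition noticeably, which is why reporting the measured drift is more informative than assuming it is small.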
Circularity Check
No circularity: I-SAFE metrics defined directly from external prior and Wasserstein distances
The paper defines the Quantile-Based Metric (QBM), Wasserstein Coherence Metric (WCM), and its translation-invariant variant explicitly as functions of raw model outputs under perturbations generated from the independent KLIFS binding-pocket annotations. These definitions rely on standard Wasserstein distance applied to the resulting output distributions and do not reduce to fitted parameters, self-referential quantities, or prior results by the same authors. The central empirical claim—that the three DTI models exhibit distinct distributional response profiles despite comparable accuracy—is an observation obtained by applying the externally defined metrics, not a tautology. No self-citation chains, uniqueness theorems, or smuggled ansatzes appear in the load-bearing steps of the derivation.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: External structural prior encodes domain knowledge about task-relevant input structure
invented entities (1)
- Wasserstein Coherence Metric (WCM): no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Wasserstein Coherence Metric (WCM) defined via optimal transport reordering of output profiles under mechanistic vs spurious perturbations
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.