DiSPA: Differential Substructure-Pathway Attention for Drug Response Prediction

Eunyi Jeong; Sangsoo Lim; Seokwoo Yun; Sunghyun Kim; Sungkyung Lee; Yewon Han

arxiv: 2601.14346 · v2 · submitted 2026-01-20 · 💻 cs.LG · cs.AI

Yewon Han, Sunghyun Kim, Eunyi Jeong, Sungkyung Lee, Seokwoo Yun, Sangsoo Lim

Pith reviewed 2026-05-16 12:41 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords drug response prediction, cross-attention, chemical substructure, pathway gene expression, precision medicine, generalization, GDSC benchmark, transcriptomics


The pith

DiSPA uses differential cross-attention to model interactions between drug substructures and cellular pathways for improved response prediction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

DiSPA models bidirectional interactions between chemical substructures in drugs and pathway-level gene expression in cells. It introduces differential cross-attention to suppress spurious associations while strengthening context-relevant ones. This produces stronger results on the GDSC benchmark, especially in disjoint and drug-blind splits that test generalization to new drugs. The resulting attention patterns are more selective than those from standard cross-attention, and the model shows initial promise on external datasets and spatial transcriptomics data.

Core claim

DiSPA achieves state-of-the-art performance on the GDSC benchmark by modeling bidirectional interactions between chemical substructures and pathway-level gene expression through differential cross-attention, which suppresses spurious associations and enhances context-relevant interactions, leading to more selective attention patterns and improved generalization in disjoint and drug-blind settings.

What carries the argument

Differential cross-attention mechanism that suppresses spurious associations while enhancing context-relevant interactions between chemical substructures and pathway gene expressions.
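The excerpt does not give DiSPA's exact equations. As a hedged sketch, assuming DiSPA adapts the Differential Transformer [15] to the cross-modal case, one direction (substructure tokens $S$ querying pathway tokens $P$) could be written as below; the reverse direction swaps $S$ and $P$. The projection matrices $W$, the head dimension $d$, and the $\lambda$ parameters are illustrative symbols, not taken from the paper.

```latex
\begin{aligned}
Q_i &= S\,W^{Q}_{i}, \quad K_i = P\,W^{K}_{i}, \quad V = P\,W^{V}, \qquad i \in \{1, 2\},\\
\operatorname{DiffAttn}(S, P) &= \Big(\operatorname{softmax}\!\big(Q_1 K_1^{\top}/\sqrt{d}\big) - \lambda\,\operatorname{softmax}\!\big(Q_2 K_2^{\top}/\sqrt{d}\big)\Big)\, V,\\
\lambda &= \exp(\lambda_{q_1}\cdot\lambda_{k_1}) - \exp(\lambda_{q_2}\cdot\lambda_{k_2}) + \lambda_{\text{init}}.
\end{aligned}
```

Subtracting a second, separately parameterized attention map partly cancels weight that both maps place on the same substructure-pathway pairs, which is the intended noise-suppression effect. With $\lambda = 0$ the expression reduces to ordinary cross-attention. Ye et al. additionally apply per-head normalization, omitted here.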

If this is right

Stronger performance when predicting responses for drugs absent from the training set.
More consistent results across random and drug-blind data splits.
Attention maps that concentrate on fewer, more relevant substructure-pathway pairs.
Better ranking of predefined target-related pathways.
Initial generalization to external benchmarks such as CTRP and zero-shot application to spatial transcriptomics.
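The drug-blind results hinge on how the split is built. A minimal sketch, assuming response records are `(drug_id, cell_id, ln_ic50)` tuples (a hypothetical layout), holds out whole drugs so that no test drug is seen during training. The function name and parameters are illustrative, not the paper's code.

```python
import random

def drug_blind_split(records, test_frac=0.2, seed=0):
    """Hold out whole drugs: every record of a test drug goes to the test set.

    records: iterable of (drug_id, cell_id, ln_ic50) tuples (hypothetical layout).
    """
    records = list(records)
    drugs = sorted({drug for drug, _, _ in records})
    rng = random.Random(seed)
    rng.shuffle(drugs)
    n_test = max(1, round(test_frac * len(drugs)))  # at least one held-out drug
    test_drugs = set(drugs[:n_test])
    train = [r for r in records if r[0] not in test_drugs]
    test = [r for r in records if r[0] in test_drugs]
    return train, test, test_drugs

# Toy usage with placeholder IDs and values.
toy = [("d1", "c1", 1.2), ("d1", "c2", 0.4), ("d2", "c1", 2.3),
       ("d3", "c3", -0.5), ("d4", "c2", 0.9)]
train, test, held_out = drug_blind_split(toy, test_frac=0.25, seed=1)
```

Splitting on drug identity, rather than on individual drug-cell pairs as in a random split, is what prevents a model from memorizing per-drug response levels.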

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The selective attention could help identify which chemical features drive sensitivity in specific cell states.
The same differential mechanism might transfer to other structure-omics prediction tasks.
Testing the model on patient-derived samples would reveal whether cell-line gains translate to clinical settings.
Incorporating additional modalities such as proteomics could further constrain the attention to true mechanisms.

Load-bearing premise

The differential cross-attention mechanism captures genuine biological interactions rather than fitting to dataset-specific noise.

What would settle it

If DiSPA shows no accuracy gain over standard cross-attention baselines on a new independent dataset with altered noise profiles, or if its attention weights fail to align with known drug-target pathway links in controlled experiments, the central claim would not hold.

Figures

Figures reproduced from arXiv: 2601.14346 by Eunyi Jeong, Sangsoo Lim, Seokwoo Yun, Sunghyun Kim, Sungkyung Lee, Yewon Han.

**Figure 1.** Overview of the DiSPA framework. DiSPA consists of three major stages: feature encoding, dual-view differential cross-attention, and drug response prediction. Gene expression profiles are first mapped to KEGG pathways to construct pathway-level gene embeddings, while drug SMILES are decomposed into substructures and encoded alongside drug-level representations. In the dual-view cross-attention module, (Vie…
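Figure 1 states that gene expression is mapped to KEGG pathways before attention. One simple way to do this, sketched below under the assumption that each pathway is summarized by the mean expression of its member genes (DiSPA may instead learn richer per-pathway embeddings), is a gene-to-pathway aggregation. The function name, gene symbols, and pathway names are placeholders, not real KEGG memberships.

```python
import numpy as np

def pathway_features(expr, gene_index, pathways):
    """Summarize each pathway by the mean expression of its member genes.

    expr: (n_cells, n_genes) expression matrix.
    gene_index: gene symbol -> column index in expr.
    pathways: pathway name -> list of member gene symbols.
    Returns an (n_cells, n_pathways) matrix and the pathway order used.
    """
    names = sorted(pathways)
    out = np.zeros((expr.shape[0], len(names)))
    for j, name in enumerate(names):
        cols = [gene_index[g] for g in pathways[name] if g in gene_index]
        if cols:  # genes missing from the profile are skipped; empty pathways stay 0
            out[:, j] = expr[:, cols].mean(axis=1)
    return out, names

# Placeholder genes and pathways, not real KEGG memberships.
expr = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
gene_index = {"g1": 0, "g2": 1, "g3": 2}
pathways = {"pA": ["g1", "g2"], "pB": ["g3", "gX"], "pC": ["gY"]}
feats, order = pathway_features(expr, gene_index, pathways)
```

Each column of `feats` then acts as one pathway token, which is the granularity at which substructure-pathway attention operates.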

**Figure 2.** Predictive performance and global organization of drug–cell line interactions learned by DiSPA. (a) Scatter plot comparing predicted and observed ln(IC50) values across all evaluated drug–cell line pairs under the Random split, demonstrating high regression fidelity across the full dynamic range. (b) Pairwise comparison of Pearson correlation coefficients (PCC) between DiSPA and DRPreter at the drug level …
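Figure 2(b) compares models by Pearson correlation computed per drug rather than pooled over all pairs. A minimal sketch of such a drug-level metric follows; the function name and the minimum-pair cutoff are illustrative choices, not details from the paper.

```python
from collections import defaultdict
import numpy as np

def per_drug_pcc(drug_ids, y_true, y_pred, min_pairs=3):
    """Pearson correlation between observed and predicted ln(IC50), per drug."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    groups = defaultdict(list)
    for i, drug in enumerate(drug_ids):
        groups[drug].append(i)
    scores = {}
    for drug, idx in groups.items():
        t, p = y_true[idx], y_pred[idx]
        if len(idx) < min_pairs or t.std() == 0 or p.std() == 0:
            continue  # too few cell lines, or correlation undefined
        scores[drug] = float(np.corrcoef(t, p)[0, 1])
    return scores
```

Per-drug PCC asks whether a model ranks cell lines correctly for each drug, which pooled PCC can overstate because differences in average potency between drugs dominate the pooled variance.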

**Figure 3.** Mechanistic interpretation of substructure–pathway interactions learned by DiSPA. Representative case studies of chemically similar drug pairs with divergent predicted responses. Case 1 (KELLY, peripheral nervous system): (a) Chemical structures of UNC0638 and UNC0642 with highlighted regions associated with Path2Sub attention weights and potential activity cliffs. (b) Path2Sub attention weights showing di…

**Figure 4.** Transfer of bulk-trained drug response prediction to spatial and single-cell transcriptomics. (a) Counts of spatial domain-selective drugs in an invasive ductal carcinoma spatial transcriptomics dataset. (b) Overlap of domain-selective drugs (p < 0.05). (c) Top-ranked domain-selective compounds with ln(IC50) differences. (d) Spatial maps of predicted sensitivity for representative tumor- and invasive-selec…
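Figure 4 flags drugs as domain-selective at p < 0.05, but the excerpt does not name the test. A hedged sketch, assuming a two-sided Mann-Whitney U test between the predicted ln(IC50) values of spots in two spatial domains (our choice, not necessarily the paper's), could look like this; `domain_selective_drugs` and its arguments are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def domain_selective_drugs(pred, domains, drug_names, dom_a, dom_b, alpha=0.05):
    """Flag drugs whose predicted ln(IC50) differs between two spatial domains.

    pred: (n_spots, n_drugs) predicted ln(IC50); domains: (n_spots,) labels.
    Returns (drug, median_a - median_b, p) for drugs with p < alpha,
    most negative difference (relatively more sensitive in dom_a) first.
    """
    pred = np.asarray(pred, dtype=float)
    domains = np.asarray(domains)
    in_a, in_b = domains == dom_a, domains == dom_b
    hits = []
    for j, name in enumerate(drug_names):
        x, y = pred[in_a, j], pred[in_b, j]
        p = mannwhitneyu(x, y, alternative="two-sided").pvalue
        if p < alpha:  # no multiple-testing correction in this sketch
            hits.append((name, float(np.median(x) - np.median(y)), float(p)))
    return sorted(hits, key=lambda h: h[1])
```

A lower predicted ln(IC50) means higher predicted sensitivity, so a negative difference marks a drug as relatively selective for the first domain. When screening many drugs, a correction such as Benjamini-Hochberg would be advisable before reading p < 0.05 as selective.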

Original abstract

Accurate prediction of drug response in precision medicine requires models that capture how specific chemical substructures interact with cellular pathway states. However, most existing deep learning approaches treat chemical and transcriptomic modalities independently or combine them only at late stages, limiting their ability to model fine-grained, context-dependent mechanisms of drug action. In addition, vanilla attention mechanisms are often sensitive to noise and sparsity in high-dimensional biological networks, hindering both generalization and interpretability. We present DiSPA (Differential Substructure-Pathway Attention), a framework that models bidirectional interactions between chemical substructures and pathway-level gene expression. DiSPA introduces differential cross-attention to suppress spurious associations while enhancing context-relevant interactions. On the GDSC benchmark, DiSPA achieves state-of-the-art performance, with strong improvements in the disjoint setting. These gains are consistent across random and drug-blind splits, suggesting improved robustness. Analyses of attention patterns indicate more selective and concentrated interactions compared to standard cross-attention. Exploratory evaluation shows that differential attention better prioritizes predefined target-related pathways, although this does not constitute mechanistic validation. DiSPA also shows promising generalization on external datasets (CTRP) and cross-dataset settings, although further validation is needed. It further enables zero-shot application to spatial transcriptomics, providing exploratory insights into region-specific drug sensitivity patterns without ground-truth validation.
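The abstract reports that differential attention yields "more selective and concentrated" interactions. One way to quantify that, assumed here rather than taken from the paper, is the normalized Shannon entropy of each attention row: values near 0 mean the weight sits on a few keys, values near 1 mean it is spread evenly. Because differential attention maps can contain negative entries, this sketch renormalizes absolute weights first, which is a heuristic choice; the function name is hypothetical.

```python
import numpy as np

def normalized_row_entropy(attn, eps=1e-12):
    """Entropy of each attention row, scaled to [0, 1] by log(n_keys).

    Absolute weights are renormalized first because differential attention
    rows can contain negative entries (a heuristic, not the paper's metric).
    Requires at least two keys per row.
    """
    w = np.abs(np.asarray(attn, dtype=float))
    w = w / (w.sum(axis=-1, keepdims=True) + eps)
    h = -(w * np.log(w + eps)).sum(axis=-1)
    return h / np.log(w.shape[-1])
```

Comparing the mean of this quantity for differential and vanilla attention on the same inputs and the same number of keys gives a simple, if coarse, selectivity comparison.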

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

DiSPA adds a targeted differential cross-attention layer for substructure-pathway links in drug response modeling, delivering incremental benchmark gains but without the ablations needed to isolate its contribution.


The paper introduces DiSPA, which uses a differential cross-attention setup to link chemical substructures with pathway-level expression data for predicting drug responses. The main claim is that this suppresses noise better than standard approaches, leading to SOTA results on GDSC across different splits. It does a decent job showing consistent improvements in both random and drug-blind settings, and it extends to external data such as CTRP. The attention visualizations and pathway prioritization add some interpretability, which is useful even if they remain exploratory. The novelty is tailoring the attention to be differential for bidirectional modeling, which seems like a solid incremental tweak over late-fusion or basic attention in this domain.

Where it falls short is the missing ablations. There's no direct comparison replacing the differential attention with vanilla cross-attention to show what the differential part actually adds. Without that, it's hard to rule out that the gains come from overall model capacity or specific split characteristics rather than the noise suppression. The abstract also skips details on architectures, losses, and statistical tests, which makes the robustness claims harder to assess. Circularity looks fine, since they use external benchmarks and disjoint splits, and there are no obvious fitting issues.

This paper is aimed at ML researchers in computational pharmacology who want to push multimodal integration further. A reader interested in attention variants for biological data might find the formulation worth a look, but it won't change the field on its own. I'd recommend sending it for peer review: the core idea is worth testing with more rigorous controls, and the benchmarks are relevant.

Referee Report

3 major / 1 minor

Summary. The manuscript introduces DiSPA, a multimodal deep learning model for drug response prediction that employs differential cross-attention to model bidirectional interactions between chemical substructures and pathway-level gene expression profiles. The central claims are state-of-the-art performance on the GDSC benchmark (with gains in both random and drug-blind disjoint splits), improved robustness relative to prior methods, more selective attention patterns, and exploratory evidence of better prioritization of target-related pathways, plus generalization tests on CTRP and zero-shot application to spatial transcriptomics.

Significance. If the performance gains and robustness claims are substantiated, DiSPA would advance multimodal integration in precision oncology by addressing noise sensitivity in high-dimensional biological data and late-fusion limitations. The differential attention design and reported improvements in unseen-drug settings could inform more generalizable models, though the explicitly exploratory status of the interpretability results constrains immediate biological or clinical impact.

major comments (3)

[Results] The SOTA and robustness claims on GDSC (random and drug-blind splits) are presented without error bars, statistical significance tests against baselines, or full details on exact architectures, loss functions, and hyperparameter selection, preventing assessment of whether the reported gains exceed what would be expected from capacity differences alone.
[Methods and Results] No ablation experiments are described that replace differential cross-attention with vanilla cross-attention while holding other components fixed; without such controls, it is impossible to attribute the disjoint-set improvements specifically to the noise-suppression mechanism rather than overall model expressivity or split-specific correlations.
[Interpretability analysis] The claim that differential attention better prioritizes target-related pathways rests on qualitative attention pattern observations and is explicitly labeled exploratory without mechanistic validation; quantitative enrichment against external drug-pathway databases or permutation tests on pathway labels are absent, leaving open the possibility that patterns reflect dataset artifacts.
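The first major comment asks for error bars and significance tests. A minimal sketch of seed-matched reporting, assuming each model is trained once per random seed and evaluated on the same split (so scores pair up by seed), uses `scipy.stats.ttest_rel`; the function name and dictionary keys are hypothetical.

```python
import numpy as np
from scipy import stats

def compare_over_seeds(model_scores, baseline_scores):
    """Mean and SD for two models plus a paired t-test, with scores matched by seed."""
    a = np.asarray(model_scores, dtype=float)
    b = np.asarray(baseline_scores, dtype=float)
    if a.shape != b.shape or a.size < 2:
        raise ValueError("need one score per seed for both models, at least two seeds")
    t, p = stats.ttest_rel(a, b)  # tests whether the mean per-seed difference is zero
    return {
        "model_mean_sd": (float(a.mean()), float(a.std(ddof=1))),
        "baseline_mean_sd": (float(b.mean()), float(b.std(ddof=1))),
        "t": float(t),
        "p": float(p),
    }
```

With only a handful of seeds the test has little power and assumes roughly normal per-seed differences. Seed-to-seed variance also does not capture split-to-split variance, so repeating the comparison over several drug-blind splits is a stronger control.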

minor comments (1)

[Abstract and Methods] The description of the differential cross-attention formulation would benefit from an explicit equation or pseudocode block to clarify how the suppression of spurious associations is implemented.
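The minor comment asks for explicit pseudocode. The following NumPy sketch is a single-head, bidirectional version assuming the Differential Transformer formulation [15]. The weight shapes, the fixed scalar `lam` (learned via reparameterization in [15]), and the function names are assumptions, and per-head normalization is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def diff_cross_attention(x_q, x_kv, weights, lam):
    """Single-head differential cross-attention (sketch after [15]).

    x_q:  (n_q, d_model) query tokens, e.g. substructure embeddings.
    x_kv: (n_kv, d_model) key/value tokens, e.g. pathway embeddings.
    weights: (W_q1, W_q2, W_k1, W_k2, W_v), each of shape (d_model, d).
    lam: scalar weight on the second map; lam = 0 gives vanilla cross-attention.
    """
    w_q1, w_q2, w_k1, w_k2, w_v = weights
    d = w_q1.shape[1]
    a1 = softmax((x_q @ w_q1) @ (x_kv @ w_k1).T / np.sqrt(d))
    a2 = softmax((x_q @ w_q2) @ (x_kv @ w_k2).T / np.sqrt(d))
    attn = a1 - lam * a2  # (n_q, n_kv); weight shared by both maps is partly cancelled
    return attn @ (x_kv @ w_v), attn

# Toy bidirectional usage with random tokens and hypothetical sizes.
rng = np.random.default_rng(0)
subs, paths = rng.normal(size=(5, 8)), rng.normal(size=(7, 8))
w_s2p = tuple(rng.normal(size=(8, 4)) for _ in range(5))
w_p2s = tuple(rng.normal(size=(8, 4)) for _ in range(5))
sub_out, sub_attn = diff_cross_attention(subs, paths, w_s2p, lam=0.5)
path_out, path_attn = diff_cross_attention(paths, subs, w_p2s, lam=0.5)
```

Calling the function once with substructures as queries over pathways and once with the roles swapped gives the two views. Setting `lam=0` reduces it to standard cross-attention while leaving the rest of the computation untouched, which is the kind of controlled ablation the second major comment requests.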

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback. We address each major comment below and have revised the manuscript to strengthen the statistical reporting, add necessary controls, and enhance the interpretability analysis while preserving its exploratory framing.


Referee: [Results] The SOTA and robustness claims on GDSC (random and drug-blind splits) are presented without error bars, statistical significance tests against baselines, or full details on exact architectures, loss functions, and hyperparameter selection, preventing assessment of whether the reported gains exceed what would be expected from capacity differences alone.

Authors: We agree that the original presentation lacked sufficient statistical detail. In the revised manuscript we now report all GDSC metrics as mean ± standard deviation across five independent runs with different random seeds. We have added paired t-tests (with p-values) comparing DiSPA to each baseline on both random and drug-blind splits. The Methods section has been expanded with complete architecture specifications, the loss function (MSE plus L2 regularization), and the full hyperparameter search protocol together with the final selected values.

Revision: yes
Referee: [Methods and Results] No ablation experiments are described that replace differential cross-attention with vanilla cross-attention while holding other components fixed; without such controls, it is impossible to attribute the disjoint-set improvements specifically to the noise-suppression mechanism rather than overall model expressivity or split-specific correlations.

Authors: We concur that an isolated ablation is required. We have added the requested experiment in which differential cross-attention is replaced by standard vanilla cross-attention while every other architectural component, the loss, the optimizer, and all hyperparameters are held fixed. The new results, presented in a dedicated subsection and table, show that the differential formulation yields further gains specifically in the drug-blind setting, consistent with a noise-suppression benefit beyond raw expressivity.

Revision: yes
Referee: [Interpretability analysis] The claim that differential attention better prioritizes target-related pathways rests on qualitative attention pattern observations and is explicitly labeled exploratory without mechanistic validation; quantitative enrichment against external drug-pathway databases or permutation tests on pathway labels are absent, leaving open the possibility that patterns reflect dataset artifacts.

Authors: The manuscript already qualifies the analysis as exploratory and explicitly states that it does not provide mechanistic validation. To further address the concern we have supplemented the section with quantitative enrichment: hypergeometric tests against curated drug-pathway associations from KEGG/Reactome and permutation tests with 1,000 label permutations to evaluate whether the observed prioritization exceeds chance. These statistics are reported alongside the original qualitative visualizations while retaining the exploratory designation.

Revision: partial
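The simulated rebuttal describes hypergeometric tests against curated pathway sets and 1,000 label permutations. A hedged sketch of both follows, assuming pathways are ranked by attention score and a drug's target pathways come from an external annotation; the function names and inputs are hypothetical.

```python
import numpy as np
from scipy.stats import hypergeom

def topk_enrichment(ranked_pathways, target_pathways, universe_size, k):
    """Hypergeometric P(X >= hits) for target pathways among the top-k ranked."""
    targets = set(target_pathways)
    hits = len(set(ranked_pathways[:k]) & targets)
    # scipy's hypergeom(M, n, N): M population, n successes, N draws; sf(x) = P(X > x)
    p = hypergeom.sf(hits - 1, universe_size, len(targets), k)
    return hits, float(p)

def permutation_p(scores, is_target, n_perm=1000, seed=0):
    """One-sided permutation p-value: are target pathways scored higher than chance?"""
    scores = np.asarray(scores, dtype=float)
    is_target = np.asarray(is_target, dtype=bool)
    observed = scores[is_target].mean()
    rng = np.random.default_rng(seed)
    null = np.array([scores[rng.permutation(is_target)].mean() for _ in range(n_perm)])
    return float((1 + np.sum(null >= observed)) / (1 + n_perm))  # +1 avoids p = 0
```

`universe_size` should be the number of pathways the model can attend to, not every pathway in the database, and `k` should be fixed before looking at the results; otherwise both p-values are optimistic.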

Circularity Check

0 steps flagged

No circularity detected; model and claims are empirically grounded on external benchmarks

full rationale

The paper defines DiSPA via an explicit architecture (differential cross-attention between substructures and pathways) and evaluates its performance claims solely through comparisons on held-out and external data (held-out GDSC random and drug-blind splits, CTRP, and cross-dataset settings). No equation or result reduces by construction to a fitted parameter, self-referential definition, or load-bearing self-citation; attention analyses are presented as exploratory without being used to justify the performance numbers. The derivation chain is therefore self-contained against independent benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

The central claim rests on the effectiveness of the introduced differential cross-attention component and standard deep learning training assumptions; no new physical entities or ad-hoc axioms are introduced beyond typical neural network inductive biases.

free parameters (1)

model hyperparameters
Standard deep learning hyperparameters (learning rate, layer sizes, attention dimensions) are fitted during training on GDSC data.

pith-pipeline@v0.9.0 · 5546 in / 1135 out tokens · 53324 ms · 2026-05-16T12:41:02.308452+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Passage: "differential cross-attention module that suppresses spurious pathway-substructure associations while amplifying contextually relevant interactions"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Training distribution determines the ceiling of drug-blind cancer sensitivity prediction
cs.LG 2026-05 unverdicted novelty 6.0

Drug-blind cancer sensitivity prediction is limited by evaluation metric and training distribution rather than drug representation complexity.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages · cited by 1 Pith paper

[1] G. Adam, L. Rampášek, Z. Safikhani, P. Smirnov, B. Haibe-Kains, and A. Goldenberg. Machine learning approaches to drug response prediction: challenges and recent progress. NPJ Precision Oncology, 4(1):19, 2020.

[2] D. Baptista, P. G. Ferreira, and M. Rocha. Deep learning for drug response prediction in cancer. Briefings in Bioinformatics, 22(1):360–379, 2021.

[3] A. Bellver-Sanchis, Q. Geng, G. Navarro, P. A. Ávila-López, J. Companys-Alemany, L. Marsal-García, R. Larramona-Arcas, L. Miró, A. Perez-Bosque, D. Ortuño-Sahagún, et al. G9a inhibition promotes neuroprotection through GMFB regulation in Alzheimer's disease. Aging and Disease, 15(1):311, 2024.

[4] J. Degen, C. Wegscheid-Gerlach, A. Zaliani, and M. Rarey. On the art of compiling and using 'drug-like' chemical fragment spaces. ChemMedChem, 3(10):1503, 2008.

[5] Z. Fan, W.-H. Lin, C. Liang, Y. Li, C.-J. Peng, J.-S. Luo, W.-Y. Tang, L.-M. Zheng, D.-P. Huang, Z.-Y. Ke, et al. MG132 inhibits proliferation and induces apoptosis of acute lymphoblastic leukemia via akt/foxo3a/bim pathway. Human & Experimental Toxicology, 43:09603271241303030, 2024.

[6] F. Firoozbakht, B. Yousefi, and B. Schwikowski. An overview of machine learning methods for monotherapy drug response prediction. Briefings in Bioinformatics, 23(1), 2022.

[7] M. Kanehisa and S. Goto. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research, 28(1):27–30, 2000.

[8] H.-J. Lee, Y. Hong, H. E. Etlioglu, Y. B. Cho, V. Pomella, B. Van den Bosch, J. Vanhecke, S. Verbandt, D. W. Hong, J.-W. Min, et al. Lineage-dependent gene expression programs influence the immune landscape of colorectal cancer. Nature Genetics, 52(6):594–603, 2020.

[9] M. Pak, S. Lee, I. Sung, B. Koo, and S. Kim. Improved drug response prediction by drug target data integration via network-based profiling. Briefings in Bioinformatics, 24(2):bbad034, 2023.

[10] M. Schubert, B. Klinger, M. Klünemann, A. Sieber, F. Uhlitz, S. Sauer, M. J. Garnett, N. Blüthgen, and J. Saez-Rodriguez. Perturbation-response genes reveal signaling footprints in cancer gene expression. Nature Communications, 9(1):20, 2018.

[11] B. Shen, F. Feng, K. Li, P. Lin, L. Ma, and H. Li. A systematic assessment of deep learning methods for drug response prediction: from in vitro to clinical applications. Briefings in Bioinformatics, 24(1):bbac605, 2023.

[12] D. Stumpfe and J. Bajorath. Exploring activity cliffs in medicinal chemistry: miniperspective. Journal of Medicinal Chemistry, 55(7):2932–2942, 2012.

[13] F. Xia, J. Allen, P. Balaprakash, T. Brettin, C. Garcia-Cardona, A. Clyde, J. Cohn, J. Doroshow, X. Duan, V. Dubinkina, et al. A cross-study analysis of drug response prediction in cancer cell lines. Briefings in Bioinformatics, 23(1):bbab356, 2022.

[14] Z. Xun, X. Ding, Y. Zhang, B. Zhang, S. Lai, D. Zou, J. Zheng, G. Chen, B. Su, L. Han, et al. Reconstruction of the tumor spatial microenvironment along the malignant-boundary-nonmalignant axis. Nature Communications, 14(1):933, 2023.

[15] T. Ye, L. Dong, Y. Xia, Y. Sun, Y. Zhu, G. Huang, and F. Wei. Differential transformer. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=OvoCm1gGhN.
