arxiv: 2604.13254 · v1 · submitted 2026-04-14 · 💻 cs.GR

Calibrated Abstention for Reliable TCR--pMHC Binding Prediction under Epitope Shift

Arman Bekov, Timur Bekzhanov, Bekzat Sadykov

Pith reviewed 2026-05-10 13:16 UTC · model grok-4.3

keywords TCR-pMHC binding · selective prediction · conformal prediction · model calibration · epitope shift · abstention · protein language models · T-cell receptor

The pith

A dual-encoder model with temperature scaling and conformal abstention delivers reliable TCR-pMHC binding predictions with coverage guarantees under epitope shift.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats TCR-pMHC binding prediction as a selective task in which the model must either produce a trustworthy score or abstain. It combines a dual-encoder architecture that processes CDR3 and peptide sequences through a pre-trained protein language model, temperature scaling to correct miscalibration, and a conformal abstention rule that supplies finite-sample coverage guarantees at any chosen target error rate. This setup is evaluated on random, epitope-held-out, and distance-aware splits. The approach matters because deployed models are routinely overconfident on unseen epitopes, which undermines prioritization in vaccine design and T-cell therapies. Under the hardest protocol, the epitope-held-out split, the method reports an AUROC of 0.813 and an ECE of 0.043, a 69.7 percent reduction in calibration error relative to the uncalibrated baseline, and it lowers the error rate from 18.7 percent to 10.9 percent at 80 percent coverage.

Core claim

We frame TCR-pMHC binding as a selective prediction problem and equip a dual-encoder architecture with temperature scaling plus a conformal abstention rule. The rule uses calibration data to set per-instance thresholds that guarantee the error rate among non-abstained predictions stays at or below a user-specified target with finite-sample coverage, even when epitopes are held out. On the epitope-held-out protocol the resulting model reaches AUROC 0.813 and ECE 0.043, a 69.7 percent reduction in ECE versus the uncalibrated baseline, and at 80 percent coverage reduces error rate from 18.7 percent to 10.9 percent.
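The temperature-scaling step named in the claim can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a binary classifier that emits one logit per TCR-pMHC pair, fits a single scalar T by minimizing negative log-likelihood on held-out data, and uses a hypothetical helper, make_overconfident_logits, to produce toy data. All function names are illustrative.

```python
import math
import random

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def nll(logits, labels, temperature):
    # Mean binary cross-entropy of sigmoid(logit / T) against 0/1 labels.
    total = 0.0
    for z, y in zip(logits, labels):
        s = z / temperature
        total += softplus(s) - y * s
    return total / len(logits)

def fit_temperature(logits, labels, lo=0.05, hi=20.0, iters=80):
    # Golden-section search over log T. The NLL is convex in 1/T,
    # hence unimodal in T, so a bracketing search is sufficient.
    a, b = math.log(lo), math.log(hi)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc = nll(logits, labels, math.exp(c))
    fd = nll(logits, labels, math.exp(d))
    for _ in range(iters):
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = nll(logits, labels, math.exp(c))
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = nll(logits, labels, math.exp(d))
    return math.exp((a + b) / 2.0)

def make_overconfident_logits(n, scale, seed=0):
    # Hypothetical toy data: labels follow sigmoid(z), but the "model"
    # reports scale * z, i.e. logits that are too extreme by a factor of scale.
    rng = random.Random(seed)
    logits, labels = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.5)
        p = 1.0 / (1.0 + math.exp(-z))
        labels.append(1 if rng.random() < p else 0)
        logits.append(scale * z)
    return logits, labels
```

On such toy data the fitted T should land near the inflation factor; dividing logits by T leaves their ranking, and therefore AUROC, unchanged while softening the probabilities.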

What carries the argument

The conformal abstention rule, which converts nonconformity scores computed on a calibration set into instance-specific thresholds that enforce finite-sample coverage of the target error rate on the accepted predictions.
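The text here does not spell out the paper's exact rule, so the following is only one standard construction, given as an assumption: split conformal prediction with nonconformity score 1 - p(true label), where the model abstains whenever the resulting prediction set is not a single label. Note that this textbook variant controls how often the prediction set misses the true label, which is related to but not the same as the error rate among accepted predictions. Function names are illustrative.

```python
import math

def conformal_quantile(cal_scores, alpha):
    # Finite-sample corrected quantile: the ceil((n + 1) * (1 - alpha))-th
    # smallest calibration score. If that rank exceeds n, no finite
    # threshold achieves the target, so return infinity.
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return float("inf")
    return sorted(cal_scores)[k - 1]

def calibration_scores(p_bind, labels):
    # Nonconformity score: one minus the probability assigned to the true label.
    return [1.0 - (p if y == 1 else 1.0 - p) for p, y in zip(p_bind, labels)]

def predict_or_abstain(p_bind, qhat):
    # Build the prediction set over {1: binder, 0: non-binder} and
    # return a label only when exactly one label is plausible.
    candidates = ((1, p_bind), (0, 1.0 - p_bind))
    pred_set = [y for y, p in candidates if 1.0 - p <= qhat]
    return pred_set[0] if len(pred_set) == 1 else None  # None means abstain
```

A caller would compute calibration_scores on a held-out calibration split, pass them to conformal_quantile with the target alpha, and then apply predict_or_abstain to each calibrated test probability.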

Load-bearing premise

The calibration data remains exchangeable with the test data despite the deliberate epitope shift between them.

What would settle it

On a fresh epitope-held-out test set the observed error rate among accepted predictions exceeds the target error rate by more than the conformal guarantee allows.
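One way to operationalize this check is an exact one-sided binomial test on the accepted predictions. The sketch below is a simplification under stated assumptions: it treats each accepted prediction as an independent trial whose error probability is at most the target, whereas a conformal guarantee is marginal over calibration draws. The significance level and function names are illustrative choices, not taken from the paper.

```python
import math

def binomial_upper_tail(n, k, p):
    # P(X >= k) for X ~ Binomial(n, p), summed exactly.
    # Suitable for n up to roughly a thousand before floats overflow.
    return sum(
        math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1)
    )

def guarantee_refuted(n_accepted, n_errors, target_error, significance=0.05):
    # True when the observed error count would be very unlikely
    # if the true selective error rate were at most target_error.
    if n_accepted == 0:
        return False
    return binomial_upper_tail(n_accepted, n_errors, target_error) < significance
```

A small excess over the target on a modest test set would not trip this check; only an error count well outside sampling noise would.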

Original abstract

Predicting T-cell receptor (TCR)--peptide-MHC (pMHC) binding is central to vaccine design and T-cell therapy, yet deployed models frequently encounter epitopes unseen during training, causing silent overconfidence and unreliable prioritization. We address this by framing TCR--pMHC prediction as a \emph{selective prediction} problem: a calibrated model should either output a trustworthy confidence score or explicitly abstain. Concretely, we (1) introduce a dual-encoder architecture encoding both CDR3$\alpha$/CDR3$\beta$ and peptide sequences via a pre-trained protein language model; (2) apply temperature scaling to correct systematic probability miscalibration; and (3) impose a conformal abstention rule that provides finite-sample coverage guarantees at a user-specified target error rate. Evaluated under three split strategies -- random, epitope-held-out, and distance-aware -- our method achieves AUROC 0.813 and ECE 0.043 under the challenging epitope-held-out protocol, reducing ECE by 69.7\% relative to an uncalibrated baseline. At 80\% coverage, the selective model further reduces error rate from 18.7\% to 10.9\%, demonstrating that calibrated abstention enables principled coverage-risk trade-offs aligned with practical screening budgets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague

The paper shows solid empirical gains in calibration and selective prediction for TCR-pMHC binding under epitope shift, but the claimed finite-sample conformal guarantees do not hold because the splits break exchangeability.

This work takes a dual-encoder protein LM, adds temperature scaling, and layers on conformal abstention to let the model output a score or abstain on TCR-pMHC binding. The headline result is that they cut ECE by about 70% and drop error from 18.7% to 10.9% at 80% coverage on the epitope-held-out split, while keeping AUROC at 0.813. Those numbers are the practical takeaway for anyone screening candidates for vaccines or T-cell therapies who needs to avoid overconfident calls on new epitopes.
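The headline figures can be cross-checked with simple arithmetic. In the sketch below, the implied baseline ECE is derived from the two reported numbers and is not a value stated in the text; variable names are illustrative.

```python
# Figures as reported for the epitope-held-out split.
ece_calibrated = 0.043
ece_relative_reduction = 0.697
err_before, err_after = 0.187, 0.109  # error rates quoted at 80% coverage

# Derived quantities (assumptions of this check, not reported values).
implied_baseline_ece = ece_calibrated / (1.0 - ece_relative_reduction)
absolute_error_reduction = err_before - err_after
relative_error_reduction = absolute_error_reduction / err_before

print(f"implied baseline ECE: {implied_baseline_ece:.3f}")
print(f"absolute error reduction: {absolute_error_reduction:.3f}")
print(f"relative error reduction: {relative_error_reduction:.1%}")
```

Under these figures the uncalibrated ECE would be roughly 0.14, and the drop in error at 80 percent coverage is about 7.8 percentage points, or roughly 42 percent in relative terms.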

Referee Report

1 major / 1 minor

Summary. The paper presents a method for TCR-pMHC binding prediction that combines a dual-encoder architecture based on pre-trained protein language models, temperature scaling for calibration, and conformal prediction for abstention. It claims finite-sample coverage guarantees and reports strong performance metrics under random, epitope-held-out, and distance-aware splits, including AUROC 0.813 and ECE 0.043 on the challenging epitope-held-out setting with substantial error reduction at high coverage.

Significance. If the conformal guarantees hold under the reported shifts, this work would significantly advance reliable and trustworthy predictions for TCR-pMHC interactions, enabling better prioritization in vaccine design and T-cell therapy by allowing principled abstention. The empirical improvements in calibration (69.7% ECE reduction) and selective error rates provide practical value even without the guarantees.
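For readers unfamiliar with the calibration metric, a binned ECE can be sketched as below. The paper's bin count and exact ECE variant are not given here, so this assumes 10 equal-width bins over the predicted binding probability; the function name is illustrative.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    # Each bin tracks: count, sum of predicted probabilities, sum of labels.
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[b][0] += 1
        bins[b][1] += p
        bins[b][2] += y
    n = len(probs)
    # Weighted gap between mean confidence and observed frequency per bin:
    # |mean(p) - mean(y)| * (count / n) simplifies to |sum(p) - sum(y)| / n.
    return sum(abs(sp - sy) / n for c, sp, sy in bins if c > 0)
```

A score of 0 means predicted probabilities match observed binding frequencies within every bin; larger values indicate over- or under-confidence.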

major comments (1)

[Abstract] The claim that the conformal abstention rule provides finite-sample coverage guarantees at a user-specified target error rate under the epitope-held-out protocol is not justified. Standard conformal prediction relies on exchangeability between calibration and test points, which is violated by the deliberate epitope shift that partitions unseen epitopes. No weighted or adaptive conformal methods are described to address this non-exchangeability.
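For reference, the standard split-conformal statement behind this objection can be written as follows. The symbols are introduced here for illustration: $s_1, \dots, s_n$ are nonconformity scores on the calibration set, $s_{n+1}$ is the score of a test point, and $\alpha$ is the target miscoverage rate.

```latex
% Assumption: (s_1, ..., s_n, s_{n+1}) are exchangeable.
\hat{q} \;=\; \text{the } \lceil (n+1)(1-\alpha) \rceil\text{-th smallest of } s_1, \dots, s_n
\quad\Longrightarrow\quad
\Pr\!\left( s_{n+1} \le \hat{q} \right) \;\ge\; 1 - \alpha .
```

The inequality follows because, under exchangeability, the rank of $s_{n+1}$ among all $n+1$ scores is uniformly distributed. When test epitopes are deliberately excluded from calibration, the test score is drawn from a different distribution, the rank is no longer uniform, and the bound need not hold.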

minor comments (1)

[Abstract] The description of the dual-encoder architecture could benefit from more detail on how CDR3α/β and peptide sequences are encoded and combined.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and for identifying a point that requires clarification in the abstract and methods. We address the major comment below and will revise the manuscript accordingly.

Point-by-point responses

Referee: [Abstract] The claim that the conformal abstention rule provides finite-sample coverage guarantees at a user-specified target error rate under the epitope-held-out protocol is not justified. Standard conformal prediction relies on exchangeability between calibration and test points, which is violated by the deliberate epitope shift that partitions unseen epitopes. No weighted or adaptive conformal methods are described to address this non-exchangeability.

Authors: We agree that the finite-sample coverage guarantee of standard conformal prediction requires exchangeability between the calibration and test points. The epitope-held-out protocol partitions on unseen epitopes and therefore induces a distribution shift that violates exchangeability. The manuscript applies the standard (non-weighted, non-adaptive) conformal abstention rule and reports empirical coverage under all three splits, but does not derive or claim theoretical guarantees under non-exchangeability. We will revise the abstract to remove the implication that guarantees extend to the shifted protocols, and we will add explicit language in the methods and results sections stating that (i) coverage guarantees hold only under exchangeability (e.g., the random split) and (ii) under epitope and distance-aware shifts we provide only empirical coverage validation. These changes will be made in the revised version.

revision: yes

Circularity Check

0 steps flagged

No circularity detected; empirical results from standard calibration and conformal methods

full rationale

The paper introduces a dual-encoder model, applies temperature scaling for calibration, and uses a conformal abstention rule for selective prediction with claimed finite-sample guarantees. All reported metrics (AUROC, ECE, coverage-risk trade-offs) are presented as empirical outcomes evaluated on held-out splits including epitope-held-out and distance-aware protocols. No equations, derivations, or steps in the provided abstract reduce any performance number or guarantee to a fitted parameter or input by construction. No self-citations are shown as load-bearing for the central claims, and the techniques are described as standard applications rather than novel derivations that loop back on themselves. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that conformal prediction coverage transfers across epitope shift and that the pre-trained protein language model provides useful representations for both CDR3 and peptide sequences.

free parameters (2)

temperature parameter
Scalar used in temperature scaling to correct miscalibration; fitted on held-out data.
coverage level
User-chosen fraction of predictions to retain (e.g., 80%); directly controls the abstention threshold.
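How the coverage level translates into an abstention threshold can be illustrated with a simple risk-at-coverage calculation. This sketch assumes confidence is max(p, 1 - p) for a calibrated binding probability p and that the most confident fraction of predictions is retained; the paper's actual selection rule may differ, and the function name is illustrative.

```python
import math

def risk_at_coverage(probs, labels, coverage):
    # Keep the most confident `coverage` fraction of predictions and
    # return the error rate among the retained ones.
    n = len(probs)
    keep = min(n, max(1, math.ceil(coverage * n)))
    ranked = sorted(
        zip(probs, labels),
        key=lambda t: max(t[0], 1.0 - t[0]),  # confidence in the predicted class
        reverse=True,
    )
    errors = sum(1 for p, y in ranked[:keep] if (1 if p >= 0.5 else 0) != y)
    return errors / keep
```

Sweeping coverage from 1.0 downward traces the coverage-risk curve; the confidence of the last retained prediction at a given coverage is the implied abstention threshold.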

axioms (1)

domain assumption: Calibration and test sets are exchangeable enough for conformal prediction to deliver the stated finite-sample coverage even under epitope shift.
Invoked when the paper claims coverage guarantees on epitope-held-out and distance-aware splits.
