Calibrated Abstention for Reliable TCR-pMHC Binding Prediction under Epitope Shift
Pith reviewed 2026-05-10 13:16 UTC · model grok-4.3
The pith
A dual-encoder model with temperature scaling and conformal abstention delivers reliable TCR-pMHC binding predictions with coverage guarantees under epitope shift.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We frame TCR-pMHC binding as a selective prediction problem and equip a dual-encoder architecture with temperature scaling plus a conformal abstention rule. The rule uses calibration data to set per-instance thresholds that guarantee the error rate among non-abstained predictions stays at or below a user-specified target with finite-sample coverage, even when epitopes are held out. On the epitope-held-out protocol the resulting model reaches AUROC 0.813 and ECE 0.043, a 69.7 percent reduction in ECE versus the uncalibrated baseline, and at 80 percent coverage reduces error rate from 18.7 percent to 10.9 percent.
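The calibration step named above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a binary binder/non-binder logit per TCR-pMHC pair, fits a single temperature by grid search on held-out calibration data, and measures ECE with ten equal-width confidence bins. The function names, the grid, and the binning scheme are all assumptions made for the example.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def nll(logits, labels, T):
    """Mean binary negative log-likelihood of logits divided by temperature T."""
    eps = 1e-12
    total = 0.0
    for z, y in zip(logits, labels):
        p = min(max(sigmoid(z / T), eps), 1.0 - eps)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(logits)

def fit_temperature(logits, labels, grid=None):
    """Pick T by grid search (assumed grid: 0.25 to 5.0) minimising calibration NLL."""
    grid = grid or [0.25 + 0.05 * i for i in range(96)]
    return min(grid, key=lambda T: nll(logits, labels, T))

def ece(probs, labels, n_bins=10):
    """Expected calibration error over equal-width bins of predicted-class confidence."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        conf = p if pred == 1 else 1.0 - p
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, pred == y))
    total = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(ok for _, ok in b) / len(b)
            total += len(b) / len(probs) * abs(avg_conf - accuracy)
    return total
```

A single scalar temperature leaves the ranking of predictions unchanged, so it can lower ECE without affecting AUROC.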
What carries the argument
The conformal abstention rule, which converts nonconformity scores computed on a calibration set into instance-specific thresholds that enforce finite-sample coverage of the target error rate on the accepted predictions.
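One way such a rule can work is sketched below using standard split conformal prediction, which is swapped in here for illustration because the page does not spell out the paper's exact rule. The nonconformity score (one minus the calibrated probability of the true label), the function names, and the "abstain unless the prediction set is a singleton" policy are assumptions.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal quantile of scores s = 1 - p(true label) on calibration data."""
    scores = sorted(
        1.0 - (p if y == 1 else 1.0 - p) for p, y in zip(cal_probs, cal_labels)
    )
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))  # finite-sample corrected rank
    # With too few calibration points no finite threshold meets the target.
    return float("inf") if k > n else scores[k - 1]

def predict_or_abstain(p, qhat):
    """Return 1 (binder), 0 (non-binder), or None (abstain) for binder probability p."""
    label_set = [y for y, py in ((1, p), (0, 1.0 - p)) if 1.0 - py <= qhat]
    # Accept only when exactly one label survives; empty or two-label sets abstain.
    return label_set[0] if len(label_set) == 1 else None
```

Under exchangeability this construction guarantees that the true label falls in the prediction set with probability at least 1 - alpha, marginally over calibration and test draws. That is a statement about set coverage; it does not by itself bound the error rate conditional on a prediction being accepted.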
Load-bearing premise
The calibration data remains exchangeable with the test data despite the deliberate epitope shift between them.
What would settle it
On a fresh epitope-held-out test set the observed error rate among accepted predictions exceeds the target error rate by more than the conformal guarantee allows.
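A check of this kind could be sketched as a one-sided binomial test on the accepted predictions. This is an illustrative assumption rather than a procedure from the paper: it treats accepted test predictions as independent, which is optimistic when many TCRs share an epitope, and the significance level of 0.05 is arbitrary.

```python
from math import exp, lgamma, log

def binomial_tail(k, n, alpha):
    """P(X >= k) for X ~ Binomial(n, alpha), computed in log space (0 < alpha < 1)."""
    def log_pmf(i):
        log_choose = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        return log_choose + i * log(alpha) + (n - i) * log(1.0 - alpha)
    return min(1.0, sum(exp(log_pmf(i)) for i in range(k, n + 1)))

def violates_target(errors, accepted, alpha, level=0.05):
    """True when `errors` among `accepted` predictions is implausibly high for rate alpha."""
    if accepted == 0:
        return False  # nothing accepted, nothing to test
    return binomial_tail(errors, accepted, alpha) < level
```

For example, 30 errors among 100 accepted predictions at a 10 percent target would be flagged, whereas 10 errors among 100 would not.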
Original abstract
Predicting T-cell receptor (TCR)-peptide-MHC (pMHC) binding is central to vaccine design and T-cell therapy, yet deployed models frequently encounter epitopes unseen during training, causing silent overconfidence and unreliable prioritization. We address this by framing TCR-pMHC prediction as a selective prediction problem: a calibrated model should either output a trustworthy confidence score or explicitly abstain. Concretely, we (1) introduce a dual-encoder architecture encoding both CDR3α/CDR3β and peptide sequences via a pre-trained protein language model; (2) apply temperature scaling to correct systematic probability miscalibration; and (3) impose a conformal abstention rule that provides finite-sample coverage guarantees at a user-specified target error rate. Evaluated under three split strategies (random, epitope-held-out, and distance-aware), our method achieves AUROC 0.813 and ECE 0.043 under the challenging epitope-held-out protocol, reducing ECE by 69.7% relative to an uncalibrated baseline. At 80% coverage, the selective model further reduces error rate from 18.7% to 10.9%, demonstrating that calibrated abstention enables principled coverage-risk trade-offs aligned with practical screening budgets.
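The coverage-risk trade-off mentioned in the abstract can be illustrated with a small helper that keeps the most confident fraction of predictions and reports the error rate on that subset. This is a generic sketch of selective risk at a fixed coverage, assuming calibrated binder probabilities and a 0.5 decision threshold; it is not taken from the paper, and the function name is hypothetical.

```python
def selective_error(probs, labels, coverage):
    """Error rate on the most confident `coverage` fraction of predictions."""
    scored = sorted(
        ((max(p, 1.0 - p), (1 if p >= 0.5 else 0) != y) for p, y in zip(probs, labels)),
        key=lambda t: t[0],
        reverse=True,  # most confident first
    )
    keep = max(1, round(coverage * len(scored)))
    return sum(err for _, err in scored[:keep]) / keep
```

Sweeping `coverage` from 1.0 downward traces a risk-coverage curve; a screening budget then corresponds to choosing one point on that curve.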
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a method for TCR-pMHC binding prediction that combines a dual-encoder architecture based on pre-trained protein language models, temperature scaling for calibration, and conformal prediction for abstention. It claims finite-sample coverage guarantees and reports strong performance metrics under random, epitope-held-out, and distance-aware splits, including AUROC 0.813 and ECE 0.043 on the challenging epitope-held-out setting with substantial error reduction at high coverage.
Significance. If the conformal guarantees hold under the reported shifts, this work would significantly advance reliable and trustworthy predictions for TCR-pMHC interactions, enabling better prioritization in vaccine design and T-cell therapy by allowing principled abstention. The empirical improvements in calibration (69.7% ECE reduction) and selective error rates provide practical value even without the guarantees.
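For orientation, the reported figures imply two derived quantities that the page does not state directly. The short calculation below works them out from the rounded numbers quoted above, so the results are approximate and are not values reported by the authors.

```python
ece_calibrated = 0.043      # reported ECE after calibration
ece_reduction = 0.697       # reported relative ECE reduction
error_full = 0.187          # reported error rate without abstention
error_at_80 = 0.109         # reported error rate at 80% coverage

# Baseline ECE implied by "0.043 is a 69.7% reduction".
implied_baseline_ece = ece_calibrated / (1.0 - ece_reduction)
# Relative drop in error rate when abstaining on 20% of inputs.
relative_error_reduction = (error_full - error_at_80) / error_full

print(f"{implied_baseline_ece:.3f}")      # 0.142
print(f"{relative_error_reduction:.1%}")  # 41.7%
```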
major comments (1)
- [Abstract] The claim that the conformal abstention rule provides finite-sample coverage guarantees at a user-specified target error rate under the epitope-held-out protocol is not justified. Standard conformal prediction relies on exchangeability between calibration and test points, which is violated by the deliberate epitope shift that partitions unseen epitopes. No weighted or adaptive conformal methods are described to address this non-exchangeability.
minor comments (1)
- [Abstract] The description of the dual-encoder architecture could benefit from more detail on how CDR3α/β and peptide sequences are encoded and combined.
Simulated Author's Rebuttal
We thank the referee for their careful reading and for identifying a point that requires clarification in the abstract and methods. We address the major comment below and will revise the manuscript accordingly.
Point-by-point responses
- Referee: [Abstract] The claim that the conformal abstention rule provides finite-sample coverage guarantees at a user-specified target error rate under the epitope-held-out protocol is not justified. Standard conformal prediction relies on exchangeability between calibration and test points, which is violated by the deliberate epitope shift that partitions unseen epitopes. No weighted or adaptive conformal methods are described to address this non-exchangeability.
  Authors: We agree that the finite-sample coverage guarantee of standard conformal prediction requires exchangeability between the calibration and test points. The epitope-held-out protocol partitions on unseen epitopes and therefore induces a distribution shift that violates exchangeability. The manuscript applies the standard (non-weighted, non-adaptive) conformal abstention rule and reports empirical coverage under all three splits, but does not derive or claim theoretical guarantees under non-exchangeability. We will revise the abstract to remove the implication that guarantees extend to the shifted protocols, and we will add explicit language in the methods and results sections stating that (i) coverage guarantees hold only under exchangeability (e.g., the random split) and (ii) under epitope and distance-aware shifts we provide only empirical coverage validation. These changes will be made in the revised version.
  Revision: yes
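For context on the weighted methods the referee mentions, the sketch below shows the general shape of a weighted split-conformal threshold for covariate shift. It is not something the paper or the rebuttal describes. It assumes each calibration point and the test point come with a likelihood-ratio weight (test density over calibration density); obtaining such weights under epitope shift is itself a hard problem that this example does not address, and all names are hypothetical.

```python
def weighted_conformal_threshold(cal_scores, cal_weights, test_weight, alpha):
    """Weighted (1 - alpha) quantile of calibration scores, with the test point's
    own weight treated as mass placed at +infinity."""
    total = sum(cal_weights) + test_weight
    cumulative = 0.0
    for score, weight in sorted(zip(cal_scores, cal_weights)):
        cumulative += weight / total
        if cumulative >= 1.0 - alpha:
            return score
    # Calibration mass alone cannot reach 1 - alpha: no finite threshold exists.
    return float("inf")
```

With all weights equal to one, this reduces to the usual rank ceil((n + 1)(1 - alpha)) of standard split conformal prediction. A large test weight relative to the calibration weights drives the threshold to infinity, which signals that the calibration set says too little about that test point.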
Circularity Check
No circularity detected; empirical results from standard calibration and conformal methods
full rationale
The paper introduces a dual-encoder model, applies temperature scaling for calibration, and uses a conformal abstention rule for selective prediction with claimed finite-sample guarantees. All reported metrics (AUROC, ECE, coverage-risk trade-offs) are presented as empirical outcomes evaluated on held-out splits including epitope-held-out and distance-aware protocols. No equations, derivations, or steps in the provided abstract reduce any performance number or guarantee to a fitted parameter or input by construction. No self-citations are shown as load-bearing for the central claims, and the techniques are described as standard applications rather than novel derivations that loop back on themselves. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- temperature parameter
- coverage level
axioms (1)
- domain assumption: Calibration and test sets are exchangeable enough for conformal prediction to deliver the stated finite-sample coverage even under epitope shift.