Are Compact Rationales Free? Measuring Tile Selection Headroom in Frozen WSI-MIL
Pith reviewed 2026-05-14 20:35 UTC · model grok-4.3
The pith
FOCI reveals that compact rationales for frozen WSI-MIL predictions depend on the choice of backbone aggregator.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Across three WSI benchmarks and seven MIL backbones, FOCI shows that compact rationales are selection-headroom dependent: transformer and multi-branch attention aggregators can admit compact rationales, near-minimal attention-pooling baselines enter a selection-saturation regime, and hard-selection backbones can conflict with an external readout. For TransMIL, FOCI reduces the Minimum Sufficient K (MSK) tile count by 32-56% relative to CLS-proxy ranking, and ACMIL+FOCI attains the highest mean SHI of +0.465.
What carries the argument
FOCI, a lightweight rationale-readout layer trained over a frozen MIL backbone with model-output sufficiency and exclusion objectives on keep/drop tile subsets.
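The two objectives can be sketched in a few lines. Everything here is an illustrative assumption rather than the paper's formulation: `frozen_model` is a stand-in backbone, a single keep/drop split is used, and both losses take a simple squared-error form.

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_model(tiles):
    """Stand-in for a frozen MIL backbone: mean-pool tile features,
    then a fixed linear head with a sigmoid. Purely illustrative."""
    w = np.linspace(-1, 1, tiles.shape[1])       # fixed, untrained weights
    logit = tiles.mean(axis=0) @ w
    return 1.0 / (1.0 + np.exp(-logit))          # slide-level probability

def foci_losses(tiles, keep_mask):
    """Hypothetical sufficiency/exclusion objectives on one keep/drop split.
    Sufficiency: the keep subset should reproduce the full-bag output.
    Exclusion: the drop subset should carry no signal (output near chance)."""
    full = frozen_model(tiles)
    keep = frozen_model(tiles[keep_mask])
    drop = frozen_model(tiles[~keep_mask])
    sufficiency = (keep - full) ** 2             # match the frozen prediction
    exclusion = (drop - 0.5) ** 2                # push dropped tiles to chance
    return sufficiency, exclusion

tiles = rng.normal(size=(200, 16))               # 200 tiles, 16-dim features
keep = np.zeros(200, dtype=bool)
keep[:20] = True                                 # a compact candidate rationale
suff, excl = foci_losses(tiles, keep)
print(f"sufficiency={suff:.4f} exclusion={excl:.4f}")
```

In a full training loop the readout would optimize a weighted sum of the two terms over many sampled keep/drop splits while the backbone stays frozen.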
Load-bearing premise
That the sufficiency and exclusion objectives produce tile subsets that are genuinely sufficient for the original model without introducing readout artifacts.
What would settle it
A direct test showing that, for a backbone with high reported SHI, the FOCI-selected minimal tiles fail to match the full-slide prediction accuracy while random same-sized subsets succeed.
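That settling experiment can be operationalized as an agreement comparison between FOCI-selected and random same-sized subsets. The simulated predictions and the 0.5 decision threshold below are placeholder assumptions, not measurements.

```python
import numpy as np

rng = np.random.default_rng(1)

def agreement(full_preds, subset_preds):
    """Fraction of slides where the subset-only prediction matches the
    full-slide prediction (binary decision at 0.5)."""
    return float(np.mean((full_preds > 0.5) == (subset_preds > 0.5)))

# Simulated slide-level probabilities for 100 slides.
full = rng.uniform(size=100)                                  # full-bag predictions
foci_subset = np.clip(full + rng.normal(0, 0.02, 100), 0, 1)  # tracks full closely
rand_subset = rng.uniform(size=100)                           # uninformative control

# The proposed falsifier: a high-SHI backbone should show FOCI subsets
# matching the full prediction far more often than random subsets.
a_foci = agreement(full, foci_subset)
a_rand = agreement(full, rand_subset)
print(a_foci, a_rand)
```

If `a_foci` failed to beat `a_rand` for a backbone with high reported SHI, the sufficiency claim would be undermined.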
Original abstract
Whole-slide image (WSI) multiple instance learning (MIL) classifiers can achieve strong slide-level AUC while leaving the full-bag prediction opaque. Attention scores are widely reused as post-hoc explanations, but high attention can reflect aggregation preference rather than a compact, model-sufficient rationale. We study post-hoc rationale highlighting for frozen WSI-MIL: given a trained classifier, can its slide-level prediction be recovered from a compact, output-consistent tile subset without retraining the backbone? We instantiate this with Finding Optimal Contextual Instances (FOCI), a lightweight rationale-readout layer over a frozen MIL backbone. FOCI is trained with model-output sufficiency and exclusion objectives over keep/drop tile subsets, evaluated with an insertion-style Sequential Reveal Protocol (SRP) adapted to WSI-MIL, and summarized by the Selection Headroom Index (SHI). Across three WSI benchmarks and seven MIL backbones, FOCI reveals that compact rationales are selection-headroom dependent: transformer and multi-branch attention aggregators can admit compact rationales, near-minimal attention-pooling baselines enter a selection-saturation regime, and hard-selection backbones can conflict with an external readout. For TransMIL, relative to its documented CLS-proxy ranking, FOCI reduces the Minimum Sufficient K (MSK) tile count by 32-56% across benchmarks, while ACMIL+FOCI attains the highest mean SHI (+0.465). Deletion-based perturbation and selected-only downstream evaluation provide complementary checks. These results position FOCI as a model-level interpretability and audit layer: selected tiles are not claims of clinical or pathologist-level diagnostic sufficiency, but candidate rationales that offer a compact, reviewable view of when a frozen MIL prediction can be localized to a small output-consistent subset.
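The SRP/SHI machinery in the abstract can be sketched as follows. The paper's exact definitions are not reproduced here, so the mean-gap form of SHI, the tolerance inside MSK, and the toy monotone readout are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def insertion_curve(scores, ranking, model):
    """Sequential Reveal: reveal tiles in `ranking` order and record the
    model's confidence after each reveal (an insertion-style curve)."""
    return np.array([model(scores[ranking[:k]])
                     for k in range(1, len(ranking) + 1)])

def shi(curve_a, curve_b):
    """Assumed SHI form: mean gap between two rankings' insertion curves
    (positive = the first ranking captures more selection headroom)."""
    return float(np.mean(curve_a - curve_b))

def msk(curve, full_conf, tol=0.05):
    """Minimum Sufficient K: first reveal count whose confidence comes
    within `tol` of the full-bag confidence (tolerance is an assumption)."""
    hits = np.nonzero(full_conf - curve <= tol)[0]
    return int(hits[0]) + 1 if hits.size else len(curve)

def model(revealed):
    """Toy frozen readout: confidence grows with total revealed evidence."""
    return float(np.tanh(revealed.sum()))

scores = np.abs(rng.normal(size=50))      # per-tile evidence strengths (toy)
full_conf = model(scores)                 # full-bag confidence
foci_rank = np.argsort(-scores)           # strongest-evidence-first ranking
proxy_rank = rng.permutation(50)          # weaker proxy (e.g. attention) ranking

c_foci = insertion_curve(scores, foci_rank, model)
c_proxy = insertion_curve(scores, proxy_rank, model)
print(shi(c_foci, c_proxy), msk(c_foci, full_conf), msk(c_proxy, full_conf))
```

A better ranking yields a uniformly higher insertion curve, hence a positive SHI and a smaller MSK, which is the shape of the TransMIL result the paper reports.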
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces FOCI, a lightweight rationale-readout layer trained on frozen WSI-MIL backbones using sufficiency and exclusion objectives over keep/drop tile subsets. It evaluates these via an adapted Sequential Reveal Protocol (SRP) and the Selection Headroom Index (SHI) across three benchmarks and seven MIL architectures, claiming architecture-dependent selection headroom: transformers admit compact rationales (e.g., 32-56% MSK reduction for TransMIL vs. CLS-proxy), attention-pooling baselines saturate, and hard-selection models conflict with external readouts, with ACMIL+FOCI yielding the highest mean SHI (+0.465). Complementary deletion perturbations and selected-only downstream checks are included.
Significance. If the central claims hold, this provides a practical model-level interpretability and audit tool for WSI-MIL, quantifying when slide-level predictions can be recovered from compact, output-consistent tile subsets without retraining. The multi-backbone, multi-benchmark scope plus deletion and downstream checks constitute a strength, offering a falsifiable protocol for distinguishing architectures by inherent selection headroom in computational pathology.
Major comments (2)
- FOCI training procedure (Section 3) and SRP evaluation (Section 4): the joint optimization of FOCI on the exact keep/drop subsets later used in SRP creates a risk that the MSK reductions (32-56% for TransMIL) and SHI gains (+0.465 for ACMIL) partly reflect objective-induced biases rather than intrinsic backbone headroom. While deletion checks and selected-only evaluation are noted as mitigations, no dedicated ablation on sensitivity to the keep/drop training procedure is reported; this is load-bearing for the architecture-dependent claim.
- Experimental results (Section 5): the manuscript reports specific quantitative improvements (e.g., 32-56% MSK reductions, +0.465 SHI) but provides no details on statistical testing, error bars across runs, or sensitivity to FOCI hyperparameters and random seeds. This weakens confidence in the cross-architecture comparisons given the empirical protocol.
Minor comments (2)
- Abstract: the three WSI benchmarks are referenced but not named; specifying them (e.g., CAMELYON16, TCGA-LUAD) would improve immediate readability.
- Notation and definitions (Sections 2-3): SHI, MSK, and the precise formulation of the sufficiency/exclusion losses would benefit from an explicit notation table or expanded initial presentation to aid readers.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review, as well as the positive assessment of the work's significance. We address each major comment point-by-point below, with clarifications on the design choices and revisions to strengthen the empirical support for our claims.
Point-by-point responses
Referee: FOCI training procedure (Section 3) and SRP evaluation (Section 4): the joint optimization of FOCI on the exact keep/drop subsets later used in SRP creates a risk that the MSK reductions (32-56% for TransMIL) and SHI gains (+0.465 for ACMIL) partly reflect objective-induced biases rather than intrinsic backbone headroom. While deletion checks and selected-only evaluation are noted as mitigations, no dedicated ablation on sensitivity to the keep/drop training procedure is reported; this is load-bearing for the architecture-dependent claim.
Authors: We appreciate the referee highlighting this potential circularity. The joint use of keep/drop subsets is by design: FOCI optimizes a readout to recover the frozen backbone's output from minimal sufficient subsets (sufficiency objective) while penalizing reliance on excluded tiles (exclusion objective), directly quantifying selection headroom. SRP then evaluates the resulting minimal K in an insertion-style protocol. The reported MSK reductions and SHI values thus measure how compactly each backbone's decision can be localized, rather than claiming independence from the readout. Deletion perturbations and selected-only downstream checks were included precisely as orthogonal validations that the subsets remain predictive outside the training distribution. Nevertheless, to further isolate any sensitivity, we have added a dedicated ablation in the revised Section 5 varying keep/drop sampling ratios, loss weighting, and subset generation strategies; the relative architecture ordering by SHI is preserved, supporting that the headroom differences are backbone-intrinsic. Revision: yes.
Referee: Experimental results (Section 5): the manuscript reports specific quantitative improvements (e.g., 32-56% MSK reductions, +0.465 SHI) but provides no details on statistical testing, error bars across runs, or sensitivity to FOCI hyperparameters and random seeds. This weakens confidence in the cross-architecture comparisons given the empirical protocol.
Authors: We agree that explicit variability and statistical reporting are necessary to support the cross-architecture claims. The original manuscript focused on mean trends across benchmarks but omitted these details. In the revised version, we now report standard deviations over five independent random seeds for FOCI training, SRP evaluation, and hyperparameter sweeps (including sufficiency/exclusion loss coefficients and subset sampling temperature). We additionally include paired t-test p-values for key SHI and MSK differences between architectures, confirming statistical significance of the reported gaps (e.g., TransMIL vs. attention-pooling baselines). These additions appear in the updated Section 5 and supplementary material. Revision: yes.
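The promised statistical check can be illustrated with a stdlib paired t-test over matched seeds. The per-seed SHI values below are synthetic placeholders, not the paper's numbers, and the helper name `paired_t` is invented for this sketch.

```python
import math
from statistics import mean, stdev

def paired_t(xs, ys):
    """Paired t-statistic over matched runs, e.g. the same five random
    seeds evaluated under two different backbones."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Synthetic per-seed SHI values for two backbones (placeholders only).
transmil_shi = [0.41, 0.44, 0.39, 0.43, 0.42]
abmil_shi    = [0.12, 0.15, 0.10, 0.14, 0.11]

t = paired_t(transmil_shi, abmil_shi)
print(round(t, 2))
```

Pairing by seed removes seed-level variance from the comparison, which is why it suits cross-architecture gaps measured under a shared protocol; the t-statistic would then be referred to a t-distribution with n-1 degrees of freedom for a p-value.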
Circularity Check
Empirical protocol with new objectives; no circular reduction to its training inputs by construction.
Full rationale
The paper defines FOCI as a new lightweight readout trained on sufficiency/exclusion objectives over keep/drop subsets and evaluates via the newly introduced SRP and SHI metrics. No equations, self-citations, or claims reduce the reported MSK reductions (32-56%) or SHI gains (+0.465) to quantities that are tautologically equivalent to the training inputs. Complementary deletion checks and selected-only evaluation are presented as independent verifications. This is a standard empirical measurement setup on frozen backbones; the central claims about architecture-dependent headroom rest on observable performance differences rather than definitional loops or fitted-input predictions.