TRACE: A Concept Bottleneck Model for Longitudinal 3D Glioblastoma Response Assessment

Abdulrahman M. Selim; Alia Tarek; Hamsa Saberr; Hamza Elghonemy; Hasan Md Tusfiqur Alam; Daniel Sonntag; Omair Shahzad Bhatti; Tamer Basha; Youssef Afify

arxiv: 2606.30313 · v1 · pith:OENWB2XWnew · submitted 2026-06-29 · 💻 cs.CV · cs.LG


Pith reviewed 2026-06-30 06:49 UTC · model grok-4.3


keywords: concept bottleneck model · glioblastoma · RANO criteria · longitudinal MRI · response assessment · interpretable AI · 3D vision encoder · tumor measurements


The pith

TRACE frames glioblastoma response assessment as structured concept reasoning using RANO-aligned bottlenecks on longitudinal 3D MRI.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces TRACE, a concept bottleneck model that predicts clinically meaningful tumor measurements from paired baseline and follow-up MRI scans, then applies deterministic RANO rules to classify response into four categories. This approach aims to make predictions interpretable and correctable by clinicians, unlike direct image-to-label deep learning methods. It reports performance on the LUMIERE dataset that improves on a concept bottleneck baseline and stays competitive with non-interpretable models. A sympathetic reader would care because it suggests a path toward transparent AI tools that align with existing clinical criteria rather than replacing them with opaque predictions.

Core claim

TRACE processes paired multimodal MRI scans with a shared 3D vision encoder to predict root concepts of tumor measurements, computes downstream RANO-derived concepts through deterministic rules, and incorporates scan interval and new-lesion information as passthrough concepts. On 5-fold patient-wise cross-validation, it achieves a 4-class macro F1 of 0.4769 and binary progression F1 of 0.7085, with ablations confirming the value of the expert RANO graph and intervention-consistency training. Intervention experiments show that correcting concepts can improve predictions.
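The root-to-response pathway described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Concepts` container, the field names, and the two percentage thresholds are assumptions chosen for illustration, and the scan-interval concept is left out for brevity. The point is only that once the root concepts are fixed, the response label follows from fixed rules with no learned parameters.

```python
from dataclasses import dataclass

# Illustrative thresholds (assumptions, not values taken from the paper).
PD_INCREASE_PCT = 25.0  # growth at or above this flags progression
PR_DECREASE_PCT = 50.0  # shrinkage at or above this flags partial response


@dataclass
class Concepts:
    baseline_size: float  # root concept: baseline tumor measurement
    followup_size: float  # root concept: follow-up tumor measurement
    new_lesion: bool      # passthrough concept


def percent_change(baseline: float, followup: float) -> float:
    """Downstream concept: relative change in percent."""
    if baseline <= 0:
        # No measurable baseline disease: treat any follow-up disease as growth.
        return 100.0 if followup > 0 else 0.0
    return 100.0 * (followup - baseline) / baseline


def classify_response(c: Concepts) -> str:
    """Map concepts to CR / PR / SD / PD using fixed rules only."""
    delta = percent_change(c.baseline_size, c.followup_size)
    if c.new_lesion or delta >= PD_INCREASE_PCT:
        return "PD"
    if c.followup_size == 0 and c.baseline_size > 0:
        return "CR"
    if delta <= -PR_DECREASE_PCT:
        return "PR"
    return "SD"
```

Because the rule layer is a pure function of the concepts, any error in the final label can be traced back to a specific measurement or flag.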

What carries the argument

The RANO 2.0-aligned concept bottleneck that separates root tumor measurement concepts from deterministic downstream reasoning and passthrough concepts.

If this is right

The expert RANO graph and intervention-consistency training are important for performance.
Correcting predicted concepts can improve downstream response predictions.
Structured concept bottlenecks offer a transparent direction for longitudinal response assessment.
Larger protocol-aligned datasets and external validation are needed.
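The claim that correcting predicted concepts can improve downstream predictions can be sketched as follows. This is a hedged illustration, not the paper's code: the concept names, the `intervene` helper, the `toy_rule` function, and the 40% threshold are all hypothetical stand-ins for the real concept set and reasoning layer.

```python
from typing import Dict, Optional

ConceptDict = Dict[str, float]


def intervene(predicted: ConceptDict,
              corrections: Optional[ConceptDict] = None) -> ConceptDict:
    """Return a copy of the predicted concepts with corrections applied."""
    merged = dict(predicted)
    for name, value in (corrections or {}).items():
        if name not in merged:
            raise KeyError(f"unknown concept: {name}")
        merged[name] = value
    return merged


def toy_rule(c: ConceptDict) -> str:
    """Hypothetical stand-in for the deterministic reasoning layer."""
    change = 100.0 * (c["followup_volume"] - c["baseline_volume"]) / c["baseline_volume"]
    # 40.0 is an illustrative threshold, not a value from the paper.
    return "progression" if change >= 40.0 else "non-progression"


# Placeholder values: the model underestimates the follow-up volume.
predicted = {"baseline_volume": 10.0, "followup_volume": 11.0}
before = toy_rule(intervene(predicted))
after = toy_rule(intervene(predicted, {"followup_volume": 16.0}))
```

Here `before` is `"non-progression"` and `after` is `"progression"`: the correction is made at the measurement level and the label changes only because the rule is re-evaluated on the corrected input.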

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This structured approach could extend to other standardized response criteria such as RECIST in different cancers.
Improving root concept accuracy with better segmentation would raise final label performance without altering the reasoning layer.
The model enables clinician corrections at the measurement level rather than only at the final label.
It demonstrates value in aligning AI outputs with existing clinical workflows instead of bypassing them.

Load-bearing premise

The predicted tumor measurements from imaging are accurate enough that applying the deterministic RANO rules yields clinically valid response labels.

What would settle it

A validation set where the model's concept predictions match expert tumor measurements but the resulting response labels still disagree with expert consensus would show that the deterministic RANO rules do not fully capture clinical judgment.

Figures

Figures reproduced from arXiv: 2606.30313 by Abdulrahman M. Selim, Alia Tarek, Hamsa Saberr, Hamza Elghonemy, Hasan Md Tusfiqur Alam, Daniel Sonntag, Omair Shahzad Bhatti, Tamer Basha, Youssef Afify.

**Figure 1.** Overview of our proposed Trace CBM. (1) Baseline and follow-up MRI with segmentation masks are provided as input; (2) shared siamese 3D encoders extract MRI- and segmentation-based features and explicit delta representations; (3) these features are fused into a joint representation; (4) a causal concept bottleneck combines predicted root concepts, rule-based deterministic nodes, and passthrough clinical me…

**Figure 2.** Representative segmentation case from LUMIERE. Top row: three MRI modalities with the HD-GLIOAUTO overlay. Bottom row: non-enhancing (yellow) and enhancing (red) tumor labels in isolation. These regions are used to compute volumetric concepts for Trace.

**Figure 3.** RANO label distribution across all 361 baseline–follow-up pairs in LUMIERE. PD accounts for 63.7% of samples, leading to a strongly imbalanced 4-class classification task.

**Figure 4.** Per-class performance and pooled errors on LUMIERE. Despite using ground-truth concepts, the Rule-based RANO oracle (GT concept) reaches only 0.346 ± 0.038 4-class macro F1 and 0.681 ± 0.058 binary macro F1, which is 13 percentage points below Trace on the 4-class task. It also recovers none of the 26 ground-truth CR cases and misses 41% of PD cases across the five folds. This suggests that the LUMIERE labe…

**Figure 5.** Signed CaCE per concept and per RANO class. Longitudinal-change concepts shift probability mass in directions consistent with RANO response categories. … single-concept contribution in that fold; the random policy reveals random subsets of size 𝑘, averaged over 12 random subsets per 𝑘. Across folds, OISord = 0.576 ± 0.085 versus OISrnd = 0.392 ± 0.066, a per-fold gap of +0.184 ± 0.048 macro-F1. The order…

**Figure 6.** Oracle concept-intervention policy on LUMIERE (5-fold CV, 𝑘max=8). (a) Mean 4-class macro-F1 across folds when the top-𝑘 concepts are replaced with ground-truth values, ranked by per-fold single-concept lift (blue, ordered) or chosen at random (grey, mean over 12 random subsets). Shaded bands are ±1 std across folds. The ordered policy improves macro-F1 monotonically and saturates around 𝑘=5, while the ran…

**Figure 7.** Concept-intervention walkthrough on a misclassified case (Patient-004, validation fold 1). Left: default forward pass; the model predicts CR while the ground-truth label is PD. Right: same input with intervention_index = 1 on enhancing_tumor_volume_cm3 and followup_non_enhancing_volume_cm3 (yellow outlines). The corrected concept values propagate deterministically along the RANO chain (Δ% → flags → response) …

**Figure 8.** Preprocessing pipeline applied to the raw MRI data. The four MRI modalities undergo atlas registration to MNI152 space, N4 bias field correction, histogram matching, and z-score normalization. The HD-GLIOAUTO segmentation mask is mapped to the same template space using the transformation estimated from the corresponding MRI, with nearest-neighbor interpolation used to preserve discrete label values.

**Figure 9.** Figure 9 summarizes the overall magnitude of concept influence by ranking concepts according to their CaCE-TV scores, independent of the direction of their class-specific effects.

Original abstract

Longitudinal glioblastoma response assessment requires comparing subtle tumor changes across MRI time points using structured clinical criteria such as RANO. However, most deep learning methods predict response labels directly from imaging features, which limits clinical inspection, verification, and correction. We introduce TRACE, a RANO 2.0-aligned concept bottleneck model for interpretable 4-class glioblastoma response classification on longitudinal 3D MRI. TRACE processes paired baseline and follow-up multimodal MRI scans with a shared 3D vision encoder, predicts clinically meaningful tumor measurements as root concepts, computes downstream RANO-derived concepts through deterministic rules, and incorporates scan interval and new-lesion information as passthrough concepts. This design frames response assessment as structured concept reasoning rather than direct image-to-label prediction. Using 5-fold patient-wise cross-validation on the LUMIERE dataset, TRACE achieves a 4-class macro F1 of 0.4769 and a binary progression-versus-non-progression macro F1 of 0.7085. It improves over a concept bottleneck baseline and remains within the range of published non-interpretable deep learning approaches. Ablation studies show that the expert RANO graph and intervention-consistency training are important for performance, while intervention experiments demonstrate that correcting concepts can improve downstream predictions. These results suggest that structured concept bottlenecks offer a transparent and clinically aligned direction for longitudinal glioblastoma response assessment, while highlighting the need for larger protocol-aligned datasets and external validation.
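For readers unfamiliar with the two headline metrics, the following Python sketch shows one common way to compute a macro F1 over the four response classes and over the binary progression-versus-non-progression task. It is an illustration under assumptions: the function names are hypothetical, the class codes CR/PR/SD/PD are the standard RANO abbreviations, and scoring an absent class as 0 is a convention chosen here rather than something the abstract specifies.

```python
from typing import List, Sequence


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """F1 for one class; returns 0.0 when the class never appears."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             labels: Sequence[str]) -> float:
    """Unweighted mean of per-class F1, so rare classes count as much as PD."""
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


def to_binary(labels: Sequence[str], positive: str = "PD") -> List[str]:
    """Collapse the four classes into progression versus non-progression."""
    return ["PD" if lab == positive else "non-PD" for lab in labels]
```

Because the mean is unweighted, a model that misses every case of a rare class loses a full quarter of the achievable 4-class score, which is why macro F1 is a demanding metric on an imbalanced label distribution.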

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

TRACE applies RANO-aligned concept bottlenecks to longitudinal 3D GBM response but the missing root-concept accuracy numbers leave the main claim hard to verify.


The main thing to know is that TRACE is a concept bottleneck model that predicts root tumor measurements from longitudinal 3D MRI and then uses deterministic RANO 2.0 rules to classify response into four classes. This framing is new.

The paper does well by aligning the model structure with existing clinical criteria and by showing that the expert graph and intervention-consistency training contribute to the reported F1 scores. The 4-class macro F1 of 0.4769 and binary of 0.7085 are given from patient-wise cross-validation, and the intervention experiments indicate that correcting concepts can change the predictions.

The main soft spot is the absence of accuracy metrics for the root concepts themselves. No Dice scores, volume errors, or diameter accuracies are provided, so it is not possible to verify that the RANO rules are being applied to accurate inputs rather than the model learning a shortcut. The abstract also lacks error bars and statistical tests on the F1 numbers. These are real gaps, but not fatal if the full paper supplies the details.

The assumption that the LUMIERE dataset is representative is noted as a limitation in the abstract itself.

This paper is for researchers interested in interpretable models for oncology imaging. It shows honest engagement with the literature on concept bottlenecks and RANO.

I would rate this a "maybe" for a reading group, mainly to discuss the design choices. I would not cite it in my own work yet. It deserves peer review so that the authors can address the concept accuracy question.

Referee Report

3 major / 2 minor

Summary. The paper introduces TRACE, a RANO 2.0-aligned concept bottleneck model for 4-class glioblastoma response classification on longitudinal 3D MRI. It uses a shared 3D vision encoder to predict root concepts (tumor measurements), applies deterministic rules to compute downstream RANO-derived concepts, incorporates passthrough concepts for scan interval and new lesions, and reports 4-class macro F1 of 0.4769 and binary progression F1 of 0.7085 via 5-fold patient-wise CV on LUMIERE, with ablations claiming importance of the expert graph and intervention-consistency training plus intervention experiments showing concept correction benefits.
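The patient-wise cross-validation mentioned in the summary matters because each patient contributes several baseline and follow-up pairs; splitting by pair would leak a patient's anatomy across train and validation. A minimal sketch of such a split follows. The function name and the greedy balancing strategy are assumptions made for illustration, and the sketch does not stratify by response class, which the paper may or may not do.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def patient_wise_folds(patient_ids: Sequence[str],
                       n_folds: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices so that no patient spans train and validation."""
    by_patient: Dict[str, List[int]] = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)
    if n_folds > len(by_patient):
        raise ValueError("more folds than patients")

    # Greedy balancing: largest patients first, each into the smallest fold.
    fold_members: List[List[int]] = [[] for _ in range(n_folds)]
    for pid in sorted(by_patient, key=lambda p: (-len(by_patient[p]), p)):
        smallest = min(range(n_folds), key=lambda f: len(fold_members[f]))
        fold_members[smallest].extend(by_patient[pid])

    splits = []
    for f in range(n_folds):
        val = sorted(fold_members[f])
        train = sorted(i for g in range(n_folds) if g != f for i in fold_members[g])
        splits.append((train, val))
    return splits
```

Each returned tuple holds the training indices and the validation indices for one fold, and every sample appears in exactly one validation set.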

Significance. If root-concept fidelity is verified, the approach supplies a clinically aligned, inspectable alternative to direct image-to-label models in a domain where RANO criteria are standard; the deterministic rule pathway and intervention mechanism are genuine strengths that could enable verification and correction. The reported F1 values sit within published ranges for non-interpretable methods, but the absence of concept-level metrics prevents assessing whether the gains arise from structured reasoning.

major comments (3)

[Results] Results section: no quantitative metrics (volume MAE, diameter error, Dice overlap, or equivalent) are supplied for the root-concept predictions of tumor measurements on the same folds used for the final F1 scores. Because the central claim is that clinically valid labels are produced by accurate concept prediction followed by deterministic RANO rules, the lack of these numbers leaves open the possibility that the vision encoder is learning a direct image-to-label mapping that merely correlates with the derived labels.
[Methods and Experiments] Methods and Experiments: the intervention-consistency training objective and the ablation studies that supposedly demonstrate the importance of the expert RANO graph are described only at high level; no table or figure quantifies the performance drop when either component is removed, so their claimed contribution to the reported 0.4769 / 0.7085 F1 scores cannot be evaluated.
[Abstract and Results] Abstract and Results: the 4-class and binary macro F1 figures are given without error bars, standard deviations across folds, or statistical tests against the concept-bottleneck baseline, making it impossible to judge whether the stated improvement is reliable or within the variability of the 5-fold patient-wise split.
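The concept-level metrics requested in the first major comment are simple to compute once predicted and reference measurements are available. The sketch below shows a mean absolute error for a scalar concept such as volume and a Dice overlap for binary masks. The function names and the flattened-list mask representation are assumptions for illustration; whether Dice applies depends on whether the model outputs masks at all, which the abstract does not say.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float],
                        reference: Sequence[float]) -> float:
    """Average absolute gap between predicted and reference concept values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def dice(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Dice overlap for two flattened binary masks of equal length."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have equal length")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Two empty masks agree perfectly; this is a convention chosen here.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

Reporting these per fold, next to the classification F1, would show whether label accuracy tracks measurement accuracy or diverges from it.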

minor comments (2)

[Figures and Captions] Figure captions and text should explicitly state how concept-prediction accuracy was (or was not) measured during training and evaluation.
[Dataset] The LUMIERE dataset description would benefit from a table summarizing the distribution of response classes and scan intervals to allow readers to assess class balance and temporal coverage.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important aspects of validating the concept-bottleneck design. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core claims.


Referee: [Results] Results section: no quantitative metrics (volume MAE, diameter error, Dice overlap, or equivalent) are supplied for the root-concept predictions of tumor measurements on the same folds used for the final F1 scores. Because the central claim is that clinically valid labels are produced by accurate concept prediction followed by deterministic RANO rules, the lack of these numbers leaves open the possibility that the vision encoder is learning a direct image-to-label mapping that merely correlates with the derived labels.

Authors: We agree that root-concept fidelity metrics are necessary to substantiate the claim that performance derives from structured reasoning rather than direct image-to-label correlation. The current manuscript reports only downstream classification F1 and does not include volume MAE, diameter error, or Dice scores for the root tumor measurements on the same 5-fold splits. We will add these metrics (computed on held-out folds) to the Results section and an expanded supplementary table in the revision. revision: yes
Referee: [Methods and Experiments] Methods and Experiments: the intervention-consistency training objective and the ablation studies that supposedly demonstrate the importance of the expert RANO graph are described only at high level; no table or figure quantifies the performance drop when either component is removed, so their claimed contribution to the reported 0.4769 / 0.7085 F1 scores cannot be evaluated.

Authors: The manuscript states that ablation studies show the expert RANO graph and intervention-consistency training are important, yet provides no numerical performance drops. We will expand the Experiments section with a dedicated ablation table reporting 4-class and binary macro F1 for the full model versus variants without the graph and without the consistency loss, using the same 5-fold splits. revision: yes
Referee: [Abstract and Results] Abstract and Results: the 4-class and binary macro F1 figures are given without error bars, standard deviations across folds, or statistical tests against the concept-bottleneck baseline, making it impossible to judge whether the stated improvement is reliable or within the variability of the 5-fold patient-wise split.

Authors: We acknowledge that the reported F1 scores lack fold-wise variability measures and statistical comparison. In the revision we will add standard deviations across the five patient-wise folds to all reported metrics, include error bars on relevant figures, and report paired statistical tests (e.g., McNemar or Wilcoxon) against the concept-bottleneck baseline. revision: yes
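The fold-wise variability and paired comparison promised in this response can be sketched in a few lines of Python. Note the swap: instead of the McNemar or Wilcoxon tests the authors name, this sketch uses an exact paired sign-flip permutation test, chosen only because it needs nothing beyond the standard library. The function names are hypothetical and no scores from the paper are used.

```python
from itertools import product
from statistics import mean, stdev
from typing import Sequence, Tuple


def summarize(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across folds."""
    return mean(scores), stdev(scores)


def sign_flip_p_value(scores_a: Sequence[float],
                      scores_b: Sequence[float]) -> float:
    """Two-sided exact sign-flip test on paired per-fold differences."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    extreme = 0
    total = 0
    # Under the null, each fold's difference is equally likely to flip sign.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(mean([s * d for s, d in zip(signs, diffs)]))
        if flipped >= observed - 1e-12:
            extreme += 1
    return extreme / total
```

With five folds there are only 32 sign patterns, so the smallest two-sided p-value this test can return is 2/32 = 0.0625. A fold-level paired test on five folds therefore has limited power, which is worth keeping in mind when interpreting any reported significance.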

Circularity Check

0 steps flagged

No significant circularity; derivation uses external deterministic rules


The paper's central chain predicts root concepts (tumor measurements) from a 3D vision encoder, then applies fixed external RANO 2.0 deterministic rules to produce downstream concepts and final labels. This separation means the output labels are not equivalent to the model inputs by construction, nor are any fitted parameters renamed as predictions. No load-bearing self-citations, uniqueness theorems from prior author work, or ansatzes smuggled via citation are present in the provided text. The intervention-consistency objective is noted at high level but does not create a self-definitional loop. The reported F1 scores therefore reflect an independent evaluation of the structured pipeline rather than a tautological reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review performed on abstract only; full model architecture, training losses, and exact concept definitions are unavailable, limiting enumeration of fitted parameters and background assumptions.

axioms (1)

domain assumption RANO response categories can be faithfully recovered from a small set of tumor measurements via deterministic rules without loss of clinical validity
The paper states that downstream RANO-derived concepts are computed through deterministic rules.



Reference graph

Works this paper leans on

52 extracted references · 38 canonical work pages · 2 internal anchors

[1]

J. P. Thakkar, T. A. Dolecek, C. Horbinski, Q. T. Ostrom, D. D. Lightner, J. S. Barnholtz-Sloan, J. L. Villano, Epidemiologic and molecular prognostic review of glioblastoma, Cancer Epidemiology, Biomarkers & Prevention 23 (2014) 1985–1996. doi:10.1158/1055-9965.EPI-14-0275

work page doi:10.1158/1055-9965.epi-14-0275 2014
[2]

P. Y. Wen, M. van den Bent, G. Youssef, T. F. Cloughesy, B. M. Ellingson, M. Weller, E. Galanis, D. P. Barboriak, J. de Groot, M. R. Gilbert, R. Huang, A. B. Lassman, M. Mehta, A. M. Molinaro, M. Preusser, R. Rahman, L. K. Shankar, R. Stupp, J. E. Villanueva-Meyer, W. Wick, D. R. Macdonald, D. A. Reardon, M. A. Vogelbaum, S. M. Chang, RANO 2.0: Update to ...

work page doi:10.1200/jco.23.01059 2023
[3]

Zhang, Y

S. Zhang, Y. Sun, Y. Ao, X. Zhang, R. Yang, J. Xu, Z. Ai, H. Zhang, X. Yang, Y. Xu, K. Li, D. Chen, Glomia-pro: A generalizable longitudinal medical image analysis framework for disease progres- sion prediction, 2025.arXiv:2507.12500

work page arXiv 2025
[4]

Moassefi, S

M. Moassefi, S. Faghani, G. M. Conte, R. O. Kowalchuk, S. Vahdati, D. J. Crompton, C. Perez-Vega, R. A. D. Cabreja, S. A. Vora, A. Quiñones-Hinojosa, I. F. Parney, D. M. Trifiletti, B. J. Erickson, A deep learning model for discriminating true progression from pseudoprogression in glioblastoma patients, Journal of Neuro-Oncology 159 (2022) 447–455. doi: 1...

work page doi:10.1007/s11060-022-04080-x 2022
[5]

Khalighi, K

S. Khalighi, K. Reddy, A. Midya, K. B. Pandav, A. Madabhushi, M. Abedalthagafi, Artificial intelligence in neuro-oncology: advances and challenges in brain tumor diagnosis, prognosis, and precision treatment, npj Precision Oncology 8 (2024) 80. doi:10.1038/s41698-024-00575-0

work page doi:10.1038/s41698-024-00575-0 2024
[6]

Rončević, N

A. Rončević, N. Koruga, A. S. Koruga, R. Rončević, Artificial intelligence in glioblastoma — transforming diagnosis and treatment, Chinese Neurosurgical Journal 11 (2025) 6. doi: 10.1186/ s41016-025-00399-2

2025
[7]

D. J. Ghadimi, A. M. Vahdani, H. Karimi, P. Ebrahimi, M. Fathi, F. Moodi, A. Habibzadeh, F. Kho- dadadi Shoushtari, G. Valizadeh, H. Mobarak Salari, H. Saligheh Rad, Deep Learning-Based Techniques in Glioma Brain Tumor Segmentation Using Multi-Parametric MRI: A Review on Clinical Applications and Future Outlooks, Journal of Magnetic Resonance Imaging 61 (...

work page doi:10.1002/jmri.29543 2024
[8]

Hagenbuchner, The black box problem of ai in oncology, Journal of Physics: Conference Series 1662 (2020) 012012

M. Hagenbuchner, The black box problem of ai in oncology, Journal of Physics: Conference Series 1662 (2020) 012012. doi:10.1088/1742-6596/1662/1/012012

work page doi:10.1088/1742-6596/1662/1/012012 2020
[9]

M. A. Gulum, C. M. Trombley, M. Kantardzic, A review of explainable deep learning cancer detec- tion models in medical imaging, Applied Sciences 11 (2021) 4573. doi:10.3390/app11104573

work page doi:10.3390/app11104573 2021
[10]

Charaabi, H

H. Charaabi, H. Mzoughi, R. E. Hamdi, M. Njah, EXplainable Artificial Intelligence (XAI) for MRI Brain Tumor Diagnosis: A Survey, in: Proceedings of the International Conference on Cyberworlds, 2023. doi:10.1109/CW58918.2023.00033

work page doi:10.1109/cw58918.2023.00033 2023
[11]

Desai, P

K. Desai, P. K. Patel, A. Barve, Enhancing trust in ai-driven diagnostics: A review of brain tumor classification using cnns with a hybrid grad-cam and counterfactual xai framework, in: 2025 4th International Conference on Applied Artificial Intelligence and Computing (ICAAIC), IEEE, Salem, India, 2025, pp. 1592–1598. doi:10.1109/ICAAIC64647.2025.11330252

work page doi:10.1109/icaaic64647.2025.11330252 2025
[12]

R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization, in: 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 618–626. doi:10.1109/ICCV.2017.74

work page doi:10.1109/iccv.2017.74 2017
[13]

P. W. Koh, T. Nguyen, Y. S. Tang, S. Mussmann, E. Pierson, B. Kim, P. Liang, Concept bottleneck models, in: Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, PMLR, 2020, pp. 5338–5348. URL: https://proceedings. mlr.press/v119/koh20a.html

2020
[14]

H. M. T. Alam, D. Srivastav, M. A. Kadir, D. Sonntag, Towards interpretable radiology report generation via concept bottlenecks using a multi-agentic RAG, in: C. Hauff, C. Macdonald, D. Jannach, G. Kazai, F. M. Nardini, F. Pinelli, F. Silvestri, N. Tonellotto (Eds.), Advances in Information Retrieval - 47th European Conference on Information Retrieval, EC...

work page doi:10.1007/978-3-031-88714-7_18 2025
[15]

S. Shin, Y. Jo, S. Ahn, N. Lee, A closer look at the intervention procedure of concept bottleneck models, in: Workshop on Trustworthy and Socially Responsible Machine Learning, NeurIPS 2022,

2022
[16]

URL: https://openreview.net/forum?id=PUspzfGsgY
[17]

H. M. T. Alam, D. Srivastav, A. Mohamed Selim, M. A. Kadir, M. M. H. Shuvo, D. Sonntag, Cbm-rag: Demonstrating enhanced interpretability in radiology report generation with multi-agent rag and concept bottleneck models, in: Companion Proceedings of the 17th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS ’25 Companion, Association ...

work page doi:10.1145/3731406.3731970 2025
[18]

G. D. Felice, A. C. Flores, F. D. Santis, S. Santini, J. Schneider, P. Barbiero, A. Termine, Causally reliable concept bottleneck models, in: The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URL: https://openreview.net/forum?id=UX143QGvb8

2026
[19]

Suter, U

Y. Suter, U. Knecht, W. Valenzuela, M. Notter, E. Hewer, P. Schucht, R. Wiest, M. Reyes, The lumiere dataset: Longitudinal glioblastoma mri with expert rano evaluation, Scientific Data 9 (2022) 768. doi:10.1038/s41597-022-01881-7

work page doi:10.1038/s41597-022-01881-7 2022
[20]

Abu Khalaf, A

N. Abu Khalaf, A. Desjardins, J. J. Vredenburgh, D. P. Barboriak, Repeatability of automated image segmentation with BraTumIA in patients with recurrent glioblastoma, AJNR. American Journal of Neuroradiology 42 (2021) 1080–1086. doi:10.3174/ajnr.A7071

work page doi:10.3174/ajnr.a7071 2021
[21]

Kickingereder, F

P. Kickingereder, F. Isensee, I. Tursunova, J. Petersen, U. Neuberger, D. Bonekamp, G. Brugnara, M. Schell, T. Kessler, M. Foltyn, et al., Automated quantitative tumour response assessment of mri in neuro-oncology with artificial neural networks: a multicentre, retrospective study, The Lancet Oncology 20 (2019) 728–740. doi:10.1016/S1470-2045(19)30098-1

work page doi:10.1016/s1470-2045(19)30098-1 2019
[22]

Isensee, P

F. Isensee, P. F. Jaeger, S. A. A. Kohl, J. Petersen, K. H. Maier-Hein, nnu-net: a self-configuring method for deep learning-based biomedical image segmentation, Nature Methods 18 (2020) 203–211. doi:10.1038/s41592-020-01008-z

work page doi:10.1038/s41592-020-01008-z 2020
[23]

Isensee, M

F. Isensee, M. Schell, I. Pflueger, G. Brugnara, D. Bonekamp, U. Neuberger, A. Wick, H.-P. Schlemmer, S. Heiland, W. Wick, M. Bendszus, K. H. Maier-Hein, P. Kickingereder, Automated brain extraction of multisequence MRI using artificial neural networks, Human Brain Mapping 40 (2019) 4952–4964. doi:10.1002/hbm.24750

work page doi:10.1002/hbm.24750 2019
[24]

Suter, M

Y. Suter, M. Notter, R. Meier, T. Loosli, P. Schucht, R. Wiest, M. Reyes, U. Knecht, Evaluating automated longitudinal tumor measurements for glioblastoma response assessment, Frontiers in Radiology 3 (2023). doi:10.3389/fradi.2023.1211859

work page doi:10.3389/fradi.2023.1211859 2023
[25]

Matoso, C

A. Matoso, C. Passarinho, M. P. Loureiro, J. M. Moreira, P. Figueiredo, R. G. Nunes, Towards a deep learning approach for classifying treatment response in glioblastomas, 2025.arXiv:2504.18268

work page arXiv 2025
[26]

2025.11357220

D. Amato, S. Calderaro, L. D. Reitano, G. Lo Bosco, R. Rizzo, F. Vella, Integrating deep learning and radiomic features for glioblastoma treatment response classification, in: 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE, 2025. doi:10.1109/BIBM66473. 2025.11356532

work page doi:10.1109/bibm66473 2025
[27]

Tikhonov, M

D. Tikhonov, M. Scatolin, M. Banerjee, Q. Ji, A. Jaheen, M. Salem, A. Elsayed, H. Wang, S. Hashmi, M. Yaqub, Predicting Brain Tumor Response to Therapy using a Hybrid Deep Learning and Radiomics Approach, 2025. doi:10.48550/arXiv.2509.06511

work page doi:10.48550/arxiv.2509.06511 2025
[28]

J. V. Jeyakumar, A. Sarker, L. A. Garcia, M. Srivastava, X-CHAR: A Concept-based Explainable Complex Human Activity Recognition Model, Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 7 (2023) 17:1–17:28. doi:10.1145/3580804

work page doi:10.1145/3580804 2023
[29]

P. Knab, S. Marton, P. J. Schubert, D. Guggiana, C. Bartelt, Concepts in Motion: Temporal Con- cept Bottleneck Model for Interpretable Video Classification, 2026. doi:10.48550/arXiv.2509. 20899, arXiv:2509.20899 [cs.CV] version: 3

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2509 2026
[30]

F. Bai, Y. Du, T. Huang, M. Q.-H. Meng, B. Zhao, M3d: Advancing 3d medical image analysis with multi-modal large language models, 2024. doi: 10.48550/arXiv.2404.00578. arXiv:2404.00578

work page doi:10.48550/arxiv.2404.00578 2024
[31]

A. Yan, Y. Wang, Y. Zhong, Z. He, P. Karypis, Z. Wang, C. Dong, A. Gentili, C.-N. Hsu, J. Shang, J. McAuley, Robust and interpretable medical image classifiers via concept bottleneck models, 2023. arXiv:2310.03182

work page arXiv 2023
[32]

Bunnell, Y

A. Bunnell, Y. Glaser, D. Valdez, T. Wolfgruber, A. Altamirano, C. Zamora González, B. Y. Her- nandez, P. Sadowski, J. A. Shepherd, Learning a Clinically-Relevant Concept Bottleneck for Lesion Detection in Breast Ultrasound, in: M. G. Linguraru, Q. Dou, A. Feragen, S. Gian- narou, B. Glocker, K. Lekadir, J. A. Schnabel (Eds.), Medical Image Computing and ...

work page doi:10.1007/978-3-031-72384-1_61 2024
[33]

S. J. Magny, R. Shikhman, A. L. Keppke, Breast Imaging Reporting and Data System, 2023. URL: http://www.ncbi.nlm.nih.gov/books/NBK459169/

2023
[34]

J. Kim, Z. Wang, Q. Qiu, Constructing Concept-Based Models to Mitigate Spurious Correlations with Minimal Human Effort, in: Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part LXXX, Springer-Verlag, Berlin, Heidelberg, 2024, pp. 137–153. doi:10.1007/978-3-031-72989-8_8

work page doi:10.1007/978-3-031-72989-8_8 2024
[35]

Fokkema, T

H. Fokkema, T. van Erven, S. Magliacane, Sample-efficient learning of concepts with theoretical guarantees: from data to concepts without interventions, in: The Thirty-ninth Annual Confer- ence on Neural Information Processing Systems, 2025. URL: https://openreview.net/forum?id= RCXF0UEmuE

2025
[36]

Y. Wu, Y. Liu, Y. Yang, M. S. Yao, W. Yang, X. Shi, L. Yang, D. Li, Y. Liu, S. Yin, C. Lei, M. Zhang, J. C. Gee, X. Yang, W. Wei, S. Gu, A concept-based interpretable model for the diagnosis of choroid neoplasias using multimodal data, Nature Communications 16 (2025) 3504. doi: 10.1038/ s41467-025-58801-7

2025
[37]

Oikarinen, S

T. Oikarinen, S. Das, L. M. Nguyen, T.-W. Weng, Label-free concept bottleneck models, in: The Eleventh International Conference on Learning Representations, 2023. URL: https://openreview. net/forum?id=FlCg47MNvBA

2023
[38]

Prasse, P

K. Prasse, P. Knab, S. Marton, C. Bartelt, M. Keuper, Dcbm: Data-efficient visual concept bottleneck models, in: Forty-second International Conference on Machine Learning, 2025. URL: https: //openreview.net/forum?id=BdO4R6XxUH

2025
[39]

Shrier, R

I. Shrier, R. W. Platt, Reducing bias through directed acyclic graphs, BMC Medical Research Methodology 8 (2008) 70. doi:10.1186/1471-2288-8-70

work page doi:10.1186/1471-2288-8-70 2008
[40]

Evans, B

D. Evans, B. Chaix, T. Lobbedez, C. Verger, A. Flahault, Combining directed acyclic graphs and the change-in-estimate procedure as a novel approach to adjustment-variable selection in epidemiology, BMC Medical Research Methodology 12 (2012) 156. doi:10.1186/1471-2288-12-156

work page doi:10.1186/1471-2288-12-156 2012
[41]

Piccininni, S

M. Piccininni, S. Konigorski, J. L. Rohmann, T. Kurth, Directed acyclic graphs and causal thinking in clinical risk prediction modeling, BMC Medical Research Methodology 20 (2020) 179. doi:10. 1186/s12874-020-01058-z

2020
[42]

Barbiero, M

P. Barbiero, M. E. Zarlenga, F. Giannini, A. Termine, F. Bonchi, M. Jamnik, G. Marra, Actionable Interpretability Must Be Defined in Terms of Symmetries, 2026. doi: 10.48550/arXiv.2601. 12913, arXiv:2601.12913 [cs.AI]

[43]

A. Sun, Y. Yuan, P. Ma, S. Wang, Eliminating information leakage in hard concept bottleneck models with supervised, hierarchical concept learning, 2024. arXiv:2402.05945

[44]

S. Chen, K. Ma, Y. Zheng, Med3D: Transfer Learning for 3D Medical Image Analysis, 2019. doi:10.48550/arXiv.1904.00625, arXiv:1904.00625

[45]

T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, Focal Loss for Dense Object Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence 42 (2020) 318–327. doi:10.1109/TPAMI.2018.2858826

[46]

M. Havasi, S. Parbhoo, F. Doshi-Velez, Addressing leakage in concept bottleneck models, in: Proceedings of the 36th International Conference on Neural Information Processing Systems, NIPS ’22, Curran Associates Inc., Red Hook, NY, USA, 2022, pp. 23386–23397. URL: https://openreview.net/forum?id=tglniD_fn9

[47]

L. Gagnon, D. Gupta, U. Nguyen, M. Correia de Verdier, R. Saluja, G. Mastorakos, N. White, V. Goodwill, C. R. McDonald, T. Beaumont, C. Conlin, T. M. Seibert, J. Hattangadi-Gluth, S. Kesari, J. D. Schulte, D. Piccioni, K. M. Schmainda, N. Farid, A. M. Dale, J. D. Rudie, The University of California San Diego Post-Treatment Glioblastoma (UCSD-PTGBM) annota...

doi:10.1038/s41597-025-06499-z

[48]

B. K. K. Fields, E. Calabrese, J. Mongan, S. Cha, C. P. Hess, L. P. Sugrue, S. M. Chang, T. L. Luks, J. E. Villanueva-Meyer, A. M. Rauschecker, J. D. Rudie, The university of california san francisco adult longitudinal post-treatment diffuse glioma mri dataset, Radiology: Artificial Intelligence 6 (2024) e230182. doi:10.1148/ryai.230182

[49]

N. J. Tustison, B. B. Avants, P. A. Cook, Y. Zheng, A. Egan, P. A. Yushkevich, J. C. Gee, N4itk: improved n3 bias correction, IEEE Transactions on Medical Imaging 29 (2010) 1310–1320. doi:10.1109/TMI.2010.2046908

[50]

I. Loshchilov, F. Hutter, Decoupled weight decay regularization, in: International Conference on Learning Representations, 2019. URL: https://openreview.net/forum?id=Bkg6RiCqY7

[51]

Y. Goyal, A. Feder, U. Shalit, B. Kim, Explaining Classifiers with Causal Concept Effect (CaCE). doi:10.48550/arXiv.1907.07165, arXiv:1907.07165 [cs.LG].

A. Additional Implementation Details

A.1. Clinical Background on SPD

The Sum of Products of Diameters (SPD) is defined as the product of the two largest perpendicular tumor diameters measured on a single imaging slice. It is retained in RANO-based response assessment because the response thresholds ...

[1]

J. P. Thakkar, T. A. Dolecek, C. Horbinski, Q. T. Ostrom, D. D. Lightner, J. S. Barnholtz-Sloan, J. L. Villano, Epidemiologic and molecular prognostic review of glioblastoma, Cancer Epidemiology, Biomarkers & Prevention 23 (2014) 1985–1996. doi:10.1158/1055-9965.EPI-14-0275

[2]

P. Y. Wen, M. van den Bent, G. Youssef, T. F. Cloughesy, B. M. Ellingson, M. Weller, E. Galanis, D. P. Barboriak, J. de Groot, M. R. Gilbert, R. Huang, A. B. Lassman, M. Mehta, A. M. Molinaro, M. Preusser, R. Rahman, L. K. Shankar, R. Stupp, J. E. Villanueva-Meyer, W. Wick, D. R. Macdonald, D. A. Reardon, M. A. Vogelbaum, S. M. Chang, RANO 2.0: Update to ...

doi:10.1200/jco.23.01059

[3]

S. Zhang, Y. Sun, Y. Ao, X. Zhang, R. Yang, J. Xu, Z. Ai, H. Zhang, X. Yang, Y. Xu, K. Li, D. Chen, Glomia-pro: A generalizable longitudinal medical image analysis framework for disease progression prediction, 2025. arXiv:2507.12500

[4]

M. Moassefi, S. Faghani, G. M. Conte, R. O. Kowalchuk, S. Vahdati, D. J. Crompton, C. Perez-Vega, R. A. D. Cabreja, S. A. Vora, A. Quiñones-Hinojosa, I. F. Parney, D. M. Trifiletti, B. J. Erickson, A deep learning model for discriminating true progression from pseudoprogression in glioblastoma patients, Journal of Neuro-Oncology 159 (2022) 447–455. doi:10.1007/s11060-022-04080-x

[5]

S. Khalighi, K. Reddy, A. Midya, K. B. Pandav, A. Madabhushi, M. Abedalthagafi, Artificial intelligence in neuro-oncology: advances and challenges in brain tumor diagnosis, prognosis, and precision treatment, npj Precision Oncology 8 (2024) 80. doi:10.1038/s41698-024-00575-0

[6]

A. Rončević, N. Koruga, A. S. Koruga, R. Rončević, Artificial intelligence in glioblastoma: transforming diagnosis and treatment, Chinese Neurosurgical Journal 11 (2025) 6. doi:10.1186/s41016-025-00399-2

[7]

D. J. Ghadimi, A. M. Vahdani, H. Karimi, P. Ebrahimi, M. Fathi, F. Moodi, A. Habibzadeh, F. Khodadadi Shoushtari, G. Valizadeh, H. Mobarak Salari, H. Saligheh Rad, Deep Learning-Based Techniques in Glioma Brain Tumor Segmentation Using Multi-Parametric MRI: A Review on Clinical Applications and Future Outlooks, Journal of Magnetic Resonance Imaging 61 (...

doi:10.1002/jmri.29543

[8]

M. Hagenbuchner, The black box problem of ai in oncology, Journal of Physics: Conference Series 1662 (2020) 012012. doi:10.1088/1742-6596/1662/1/012012

[9]

M. A. Gulum, C. M. Trombley, M. Kantardzic, A review of explainable deep learning cancer detection models in medical imaging, Applied Sciences 11 (2021) 4573. doi:10.3390/app11104573

[10]

H. Charaabi, H. Mzoughi, R. E. Hamdi, M. Njah, EXplainable Artificial Intelligence (XAI) for MRI Brain Tumor Diagnosis: A Survey, in: Proceedings of the International Conference on Cyberworlds, 2023. doi:10.1109/CW58918.2023.00033

[11]

K. Desai, P. K. Patel, A. Barve, Enhancing trust in ai-driven diagnostics: A review of brain tumor classification using cnns with a hybrid grad-cam and counterfactual xai framework, in: 2025 4th International Conference on Applied Artificial Intelligence and Computing (ICAAIC), IEEE, Salem, India, 2025, pp. 1592–1598. doi:10.1109/ICAAIC64647.2025.11330252

[12]

R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization, in: 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 618–626. doi:10.1109/ICCV.2017.74

[13]

P. W. Koh, T. Nguyen, Y. S. Tang, S. Mussmann, E. Pierson, B. Kim, P. Liang, Concept bottleneck models, in: Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, PMLR, 2020, pp. 5338–5348. URL: https://proceedings.mlr.press/v119/koh20a.html

[14]

H. M. T. Alam, D. Srivastav, M. A. Kadir, D. Sonntag, Towards interpretable radiology report generation via concept bottlenecks using a multi-agentic RAG, in: C. Hauff, C. Macdonald, D. Jannach, G. Kazai, F. M. Nardini, F. Pinelli, F. Silvestri, N. Tonellotto (Eds.), Advances in Information Retrieval - 47th European Conference on Information Retrieval, EC...

doi:10.1007/978-3-031-88714-7_18

[15]

S. Shin, Y. Jo, S. Ahn, N. Lee, A closer look at the intervention procedure of concept bottleneck models, in: Workshop on Trustworthy and Socially Responsible Machine Learning, NeurIPS 2022, 2022. URL: https://openreview.net/forum?id=PUspzfGsgY

[17]

H. M. T. Alam, D. Srivastav, A. Mohamed Selim, M. A. Kadir, M. M. H. Shuvo, D. Sonntag, Cbm-rag: Demonstrating enhanced interpretability in radiology report generation with multi-agent rag and concept bottleneck models, in: Companion Proceedings of the 17th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS ’25 Companion, Association ...

doi:10.1145/3731406.3731970

[18]

G. D. Felice, A. C. Flores, F. D. Santis, S. Santini, J. Schneider, P. Barbiero, A. Termine, Causally reliable concept bottleneck models, in: The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL: https://openreview.net/forum?id=UX143QGvb8

[19]

Y. Suter, U. Knecht, W. Valenzuela, M. Notter, E. Hewer, P. Schucht, R. Wiest, M. Reyes, The lumiere dataset: Longitudinal glioblastoma mri with expert rano evaluation, Scientific Data 9 (2022) 768. doi:10.1038/s41597-022-01881-7

[20]

N. Abu Khalaf, A. Desjardins, J. J. Vredenburgh, D. P. Barboriak, Repeatability of automated image segmentation with BraTumIA in patients with recurrent glioblastoma, AJNR. American Journal of Neuroradiology 42 (2021) 1080–1086. doi:10.3174/ajnr.A7071

[21]

P. Kickingereder, F. Isensee, I. Tursunova, J. Petersen, U. Neuberger, D. Bonekamp, G. Brugnara, M. Schell, T. Kessler, M. Foltyn, et al., Automated quantitative tumour response assessment of mri in neuro-oncology with artificial neural networks: a multicentre, retrospective study, The Lancet Oncology 20 (2019) 728–740. doi:10.1016/S1470-2045(19)30098-1

[22]

F. Isensee, P. F. Jaeger, S. A. A. Kohl, J. Petersen, K. H. Maier-Hein, nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation, Nature Methods 18 (2020) 203–211. doi:10.1038/s41592-020-01008-z

[23]

F. Isensee, M. Schell, I. Pflueger, G. Brugnara, D. Bonekamp, U. Neuberger, A. Wick, H.-P. Schlemmer, S. Heiland, W. Wick, M. Bendszus, K. H. Maier-Hein, P. Kickingereder, Automated brain extraction of multisequence MRI using artificial neural networks, Human Brain Mapping 40 (2019) 4952–4964. doi:10.1002/hbm.24750

[24]

Y. Suter, M. Notter, R. Meier, T. Loosli, P. Schucht, R. Wiest, M. Reyes, U. Knecht, Evaluating automated longitudinal tumor measurements for glioblastoma response assessment, Frontiers in Radiology 3 (2023). doi:10.3389/fradi.2023.1211859

[25]

A. Matoso, C. Passarinho, M. P. Loureiro, J. M. Moreira, P. Figueiredo, R. G. Nunes, Towards a deep learning approach for classifying treatment response in glioblastomas, 2025. arXiv:2504.18268

[26]

D. Amato, S. Calderaro, L. D. Reitano, G. Lo Bosco, R. Rizzo, F. Vella, Integrating deep learning and radiomic features for glioblastoma treatment response classification, in: 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE, 2025. doi:10.1109/BIBM66473.2025.11356532

[27]

D. Tikhonov, M. Scatolin, M. Banerjee, Q. Ji, A. Jaheen, M. Salem, A. Elsayed, H. Wang, S. Hashmi, M. Yaqub, Predicting Brain Tumor Response to Therapy using a Hybrid Deep Learning and Radiomics Approach, 2025. doi:10.48550/arXiv.2509.06511

[28]

J. V. Jeyakumar, A. Sarker, L. A. Garcia, M. Srivastava, X-CHAR: A Concept-based Explainable Complex Human Activity Recognition Model, Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 7 (2023) 17:1–17:28. doi:10.1145/3580804

[29]

P. Knab, S. Marton, P. J. Schubert, D. Guggiana, C. Bartelt, Concepts in Motion: Temporal Concept Bottleneck Model for Interpretable Video Classification, 2026. doi:10.48550/arXiv.2509.20899, arXiv:2509.20899 [cs.CV] version: 3

[30]

F. Bai, Y. Du, T. Huang, M. Q.-H. Meng, B. Zhao, M3d: Advancing 3d medical image analysis with multi-modal large language models, 2024. doi:10.48550/arXiv.2404.00578, arXiv:2404.00578

[31]

A. Yan, Y. Wang, Y. Zhong, Z. He, P. Karypis, Z. Wang, C. Dong, A. Gentili, C.-N. Hsu, J. Shang, J. McAuley, Robust and interpretable medical image classifiers via concept bottleneck models, 2023. arXiv:2310.03182

[32]

A. Bunnell, Y. Glaser, D. Valdez, T. Wolfgruber, A. Altamirano, C. Zamora González, B. Y. Hernandez, P. Sadowski, J. A. Shepherd, Learning a Clinically-Relevant Concept Bottleneck for Lesion Detection in Breast Ultrasound, in: M. G. Linguraru, Q. Dou, A. Feragen, S. Giannarou, B. Glocker, K. Lekadir, J. A. Schnabel (Eds.), Medical Image Computing and ...

doi:10.1007/978-3-031-72384-1_61

[33]

S. J. Magny, R. Shikhman, A. L. Keppke, Breast Imaging Reporting and Data System, 2023. URL: http://www.ncbi.nlm.nih.gov/books/NBK459169/

[34]

J. Kim, Z. Wang, Q. Qiu, Constructing Concept-Based Models to Mitigate Spurious Correlations with Minimal Human Effort, in: Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part LXXX, Springer-Verlag, Berlin, Heidelberg, 2024, pp. 137–153. doi:10.1007/978-3-031-72989-8_8
