Recognition: 2 Lean theorem links
JANUS: Anatomy-Conditioned Gating for Robust CT Triage Under Distribution Shift
Pith reviewed 2026-05-14 19:20 UTC · model grok-4.3
The pith
A dual-stream model conditions visual CT embeddings on macro-radiomic priors via anatomically guided gating to improve triage accuracy and reduce false positives under distribution shift.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
JANUS is a physiology-guided dual-stream architecture that conditions visual embeddings on macro-radiomic priors via Anatomically Guided Gating. On the MERLIN test set of 5082 cases it attains macro-AUROC 0.88 and AUPRC 0.74, outperforming all reproduced baselines. It generalizes to an external dataset of 2000 cases (AUROC 0.87), with the largest gains on size- and attenuation-defined findings and improved calibration.
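The headline macro-AUROC is conventionally the unweighted mean of per-finding AUROCs. The sketch below assumes that convention and computes AUROC directly as a pairwise ranking probability; the paper's exact averaging and its handling of findings without positives are not stated on this page, and the toy arrays are invented for illustration.

```python
from typing import List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Binary AUROC: probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auroc(labels: Sequence[Sequence[int]], scores: Sequence[Sequence[float]]) -> float:
    """Unweighted mean of per-finding AUROCs; rows are cases, columns are findings."""
    n_findings = len(labels[0])
    per_finding: List[float] = []
    for j in range(n_findings):
        per_finding.append(auroc([row[j] for row in labels], [row[j] for row in scores]))
    return sum(per_finding) / n_findings


# Toy example: 4 cases, 2 findings (illustrative values, not from the paper).
y = [[1, 0], [0, 1], [1, 1], [0, 0]]
p = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.4], [0.1, 0.5]]
print(macro_auroc(y, p))  # per-finding AUROCs 1.0 and 0.75 → 0.875
```

The pairwise loop is quadratic in the number of cases; it is chosen here for transparency, not speed.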
What carries the argument
Anatomically Guided Gating, which fuses the visual embedding stream with macro-radiomic priors extracted by a parallel stream, supplying physically grounded conditioning that modulates the final prediction.
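One way to picture Anatomically Guided Gating, taking the paper's own description of "a disease-specific multiplicative bottleneck that modulates visual evidence through scalar priors" at face value, is a per-finding gate computed from the scalar priors and multiplied onto the visual probability. This is a minimal sketch under that assumption: the gate form (a logistic function of a linear combination of priors), the names `gated_probability` and `sigmoid`, and all numbers are hypothetical, and the actual model may gate embeddings or logits rather than probabilities.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_probability(
    visual_logit: float,
    priors: Sequence[float],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Hypothetical per-finding gate: visual probability times a prior-driven gate in (0, 1).

    Because the gate lies strictly between 0 and 1, it can only attenuate (veto)
    the visual evidence, never amplify it.
    """
    if len(priors) != len(weights):
        raise ValueError("priors and weights must have the same length")
    gate = sigmoid(sum(w * r for w, r in zip(weights, priors)) + bias)
    return sigmoid(visual_logit) * gate


# Illustrative splenomegaly-style case with one scalar prior, e.g. a normalized
# spleen volume (hypothetical units and weights). Same confident visual logit in both.
enlarged = gated_probability(2.0, priors=[1.5], weights=[4.0], bias=-2.0)  # gate ~0.98
normal = gated_probability(2.0, priors=[0.0], weights=[4.0], bias=-2.0)    # gate ~0.12
print(round(enlarged, 3), round(normal, 3))
```

A purely suppressive gate is one design choice among several; FiLM-style conditioning [16] would instead allow both scaling and shifting of the visual features.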
If this is right
- Findings defined by quantitative size and attenuation receive the largest accuracy gains.
- Calibration improves on both the internal MERLIN set and the external set.
- Under domain shift, high-confidence false positives are suppressed substantially more often than true positives, as quantified by the Physiological Veto Rate (PVR).
- The gating operation provides a measurable mechanism for physically grounded prediction suppression.
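The PVR's exact definition is not given on this page (the referee raises this under §5), so the following is only one plausible reading: among predictions the baseline makes with high confidence, count how often the gated model drops them below a decision threshold, separately for false and true positives. The 0.9 high-confidence cutoff echoes the example threshold in the authors' rebuttal; the 0.5 decision threshold, the name `veto_rates`, and the toy data are assumptions.

```python
import math
from typing import Sequence, Tuple


def veto_rates(
    labels: Sequence[int],
    baseline_probs: Sequence[float],
    janus_probs: Sequence[float],
    high_conf: float = 0.9,
    decision: float = 0.5,
) -> Tuple[float, float]:
    """Return (FP veto rate, TP veto rate) over high-confidence baseline positives.

    A prediction counts as vetoed when the baseline is confident (p >= high_conf)
    but the gated model falls below the decision threshold. Both thresholds are assumptions.
    """
    fp_total = fp_vetoed = tp_total = tp_vetoed = 0
    for y, pb, pj in zip(labels, baseline_probs, janus_probs):
        if pb < high_conf:
            continue  # only cases the baseline flags with high confidence
        vetoed = int(pj < decision)
        if y == 0:
            fp_total += 1
            fp_vetoed += vetoed
        else:
            tp_total += 1
            tp_vetoed += vetoed
    fp_rate = fp_vetoed / fp_total if fp_total else math.nan
    tp_rate = tp_vetoed / tp_total if tp_total else math.nan
    return fp_rate, tp_rate


# Toy data (illustrative only): the last case is not high-confidence and is ignored.
labels = [0, 0, 0, 1, 1, 1, 0]
baseline = [0.95, 0.92, 0.97, 0.93, 0.99, 0.91, 0.40]
janus = [0.30, 0.60, 0.20, 0.85, 0.95, 0.45, 0.10]
fp_pvr, tp_pvr = veto_rates(labels, baseline, janus)
print(fp_pvr, tp_pvr)  # 2 of 3 FPs vetoed vs 1 of 3 TPs
```

Under this reading, the reliability claim corresponds to the FP veto rate clearly exceeding the TP veto rate on shifted data.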
Where Pith is reading between the lines
- Similar prior-conditioned gating could be tested on other quantitative imaging tasks such as PET or MRI where attenuation or size biomarkers are available.
- The approach may lower overdiagnosis rates in screening programs by preferentially vetoing spurious high-confidence detections.
- Extending the second stream to include additional quantitative biomarkers could further tighten performance on rare or low-contrast pathologies.
Load-bearing premise
The macro-radiomic priors extracted in the second stream stay accurate and unbiased under the same distribution shifts that degrade the visual stream.
What would settle it
A new external CT dataset in which JANUS either fails to exceed baseline AUROC or shows a higher rate of true-positive suppression than false-positive suppression would falsify the central claim.
Original abstract
Automated CT triage requires models that are simultaneously accurate across diverse pathologies and reliable under institutional shift. While Vision Transformers provide strong visual representations, many clinically significant findings are defined by quantitative imaging biomarkers rather than appearance alone. We introduce JANUS, a physiology-guided dual-stream architecture that conditions visual embeddings on macro-radiomic priors via Anatomically Guided Gating. On the MERLIN test set (N=5082), JANUS attains macro-AUROC 0.88 and AUPRC 0.74, outperforming all reproduced baselines. It generalizes to an external dataset (N=2000; AUROC 0.87), with the largest gains on findings defined by size and attenuation, as well as improved calibration on both datasets. We further quantify prediction suppression using the Physiological Veto Rate (PVR), showing that under domain shift JANUS suppresses high-confidence false positives substantially more often than true positives. Together, these results are consistent with physically grounded conditioning that improves both discrimination and reliability in CT triage. Code is publicly available at https://github.com/lavsendahal/janus, and model weights are at https://huggingface.co/lavsendahal/janus.
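The abstract reports improved calibration without naming the metric on this page. Expected calibration error (ECE) with equal-width bins, in the spirit of Guo et al. [9], is a common choice; the sketch below is a binary, per-finding variant (observed positive rate versus mean predicted probability) and is an assumption, not necessarily the paper's metric. The bin count and example values are illustrative.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    labels: Sequence[int], probs: Sequence[float], n_bins: int = 10
) -> float:
    """Binary ECE with equal-width bins.

    For each bin, compare the observed positive rate with the mean predicted
    probability, then weight the absolute gap by the bin's share of cases.
    """
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and of equal length")
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))
    n = len(labels)
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        observed = sum(y for y, _ in b) / len(b)
        predicted = sum(p for _, p in b) / len(b)
        ece += (len(b) / n) * abs(observed - predicted)
    return ece


# Overconfident toy predictions: half the 0.9s are wrong, and the 0.2s are all positive.
print(expected_calibration_error([1, 0, 1, 1], [0.9, 0.9, 0.2, 0.2]))
```

For a multi-finding triage model, one would typically compute this per finding and then average, mirroring macro-AUROC.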
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces JANUS, a physiology-guided dual-stream architecture for CT triage that uses Anatomically Guided Gating to condition Vision Transformer visual embeddings on macro-radiomic priors extracted by a second stream. It reports macro-AUROC 0.88 and AUPRC 0.74 on the MERLIN test set (N=5082), outperforming reproduced baselines, with generalization to an external dataset (N=2000, AUROC 0.87). The largest gains occur on size- and attenuation-defined findings, alongside improved calibration and, under domain shift, a larger reduction in high-confidence false positives than in true positives, as quantified by the new Physiological Veto Rate (PVR) metric. Code and model weights are released publicly.
Significance. If the central claims hold, the work offers a promising direction for robust medical imaging models by incorporating quantitative imaging biomarkers to mitigate distribution shift in CT triage. The empirical gains on external data and the public code release are strengths that could support reproducibility and clinical translation. PVR adds a useful lens on reliability, though as a new metric it needs careful validation before its significance beyond standard AUROC/AUPRC is established.
major comments (3)
- [Methods (dual-stream architecture and prior extraction)] The accuracy of the macro-radiomic priors under distribution shift is assumed rather than measured: no standalone metrics (e.g., Dice scores for anatomical segmentation or regression error on radiomic quantities such as size/attenuation) are reported for the prior-extraction stream on the external N=2000 dataset. This is load-bearing for the claim that gating is physiologically grounded rather than a generic regularizer.
- [Results (ablation studies and external validation)] No ablation isolates the gating operation by corrupting the priors (e.g., via controlled noise or shift on the second stream) while leaving the visual stream intact. Without this, the reported reductions in false positives via PVR and gains on size/attenuation findings cannot be attributed specifically to the anatomy-conditioned mechanism.
- [§5 (PVR definition and results)] The Physiological Veto Rate (PVR) is a newly introduced metric central to the reliability claims under domain shift, yet its exact definition, computation (including thresholds for 'high-confidence' and veto criteria), and pseudocode are not provided in sufficient detail for independent verification or reproduction.
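The standalone checks requested in the first major comment are standard: Dice overlap for the segmentation behind the priors and mean absolute error for scalar quantities such as size or attenuation. A minimal sketch follows, assuming masks are flattened binary lists and biomarkers are plain floats; treating two empty masks as Dice 1.0 is a convention choice, and the toy values are invented.

```python
from typing import Sequence


def dice(pred_mask: Sequence[int], ref_mask: Sequence[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for flattened binary masks of equal length."""
    if len(pred_mask) != len(ref_mask):
        raise ValueError("masks must have the same number of voxels")
    inter = sum(1 for a, b in zip(pred_mask, ref_mask) if a and b)
    total = sum(1 for a in pred_mask if a) + sum(1 for b in ref_mask if b)
    return 1.0 if total == 0 else 2.0 * inter / total  # both empty: perfect agreement


def mean_absolute_error(pred: Sequence[float], ref: Sequence[float]) -> float:
    """MAE between estimated and reference scalar biomarkers (e.g. organ volume)."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


# Toy values (illustrative): a 5-voxel mask and two volume estimates in mL.
print(dice([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # 2*2 / (3+3)
print(mean_absolute_error([510.0, 220.0], [500.0, 250.0]))  # (10 + 30) / 2
```

Reporting both on the external set would show whether prior quality degrades under the same shift as the visual stream.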
minor comments (2)
- [Abstract] Abstract contains a minor formatting error: 'external dataset N=2000; AUROC 0.87)' is missing the opening parenthesis before N.
- [Methods (experimental setup)] Ensure all data splits, exclusion criteria, and hyperparameter choices are explicitly labeled as pre-specified (vs. post-hoc) in the Methods to strengthen the external validation claims.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments, which highlight important aspects of our methodology and evaluation. We address each major comment below and will incorporate revisions to strengthen the manuscript.
- Referee: [Methods (dual-stream architecture and prior extraction)] The accuracy of the macro-radiomic priors under distribution shift is assumed rather than measured: no standalone metrics (e.g., Dice scores for anatomical segmentation or regression error on radiomic quantities such as size/attenuation) are reported for the prior-extraction stream on the external N=2000 dataset. This is load-bearing for the claim that gating is physiologically grounded rather than a generic regularizer.
  Authors: We agree that standalone evaluation of the prior-extraction stream on the external dataset is necessary to substantiate the physiological grounding of the gating mechanism. In the revised manuscript, we will add Dice scores for anatomical segmentation accuracy and regression errors (MAE) for size and attenuation estimates computed on the N=2000 external set. These metrics will be reported in a new subsection of the Methods or Results to directly address this point.
  revision: yes
- Referee: [Results (ablation studies and external validation)] No ablation isolates the gating operation by corrupting the priors (e.g., via controlled noise or shift on the second stream) while leaving the visual stream intact. Without this, the reported reductions in false positives via PVR and gains on size/attenuation findings cannot be attributed specifically to the anatomy-conditioned mechanism.
  Authors: We acknowledge the value of an ablation that specifically isolates the gating operation. In the revision, we will introduce a controlled ablation where macro-radiomic priors are corrupted with Gaussian noise or simulated distribution shift while the visual ViT stream remains unchanged. We will report the resulting changes in PVR, AUROC on size/attenuation findings, and calibration metrics to demonstrate the specific contribution of the anatomy-conditioned gating.
  revision: yes
- Referee: [§5 (PVR definition and results)] The Physiological Veto Rate (PVR) is a newly introduced metric central to the reliability claims under domain shift, yet its exact definition, computation (including thresholds for 'high-confidence' and veto criteria), and pseudocode are not provided in sufficient detail for independent verification or reproduction.
  Authors: We agree that the PVR metric requires fuller specification for reproducibility. In the revised manuscript, we will expand §5 with the precise mathematical definition of PVR, explicit thresholds for high-confidence predictions (e.g., probability > 0.9), the veto criteria, and a step-by-step computation procedure. Pseudocode will be added to the Methods section or as an appendix to enable independent verification.
  revision: yes
Circularity Check
No circularity: claims rest on held-out empirical metrics
The paper reports macro-AUROC 0.88 / AUPRC 0.74 on the MERLIN test set (N=5082) and AUROC 0.87 on an external set (N=2000). These are direct performance measurements on independent data, not quantities obtained by fitting parameters to the same observations and then re-deriving the metric. The dual-stream gating architecture is described as a design choice conditioned on macro-radiomic priors; no equation or result is shown to reduce to its own inputs by construction, nor does any central claim rely on a self-citation chain that itself lacks external verification. The derivation chain is therefore self-contained against external benchmarks.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: "JANUS integrates the full macro-radiomic phenotype space with a visual stream via Anatomically Guided Gating: a disease-specific multiplicative bottleneck that modulates visual evidence through scalar priors"
- IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: "We introduce the Physiological Veto Rate (PVR) to quantify suppression of high-confidence baseline false positives"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Agrawal, K.K., Liu, L., Lian, L., Nercessian, M., Harguindeguy, N., Wu, Y., Mikhael, P., Lin, G., Sequist, L.V., Fintelmann, F., et al.: Pillar-0: A new frontier for radiology foundation models. arXiv preprint arXiv:2511.17803 (2025)
- [2] Blankemeier, L., Cohen, J.P., Kumar, A., Van Veen, D., Gardezi, S.J.S., Paschali, M., Chen, Z., Delbrouck, J.B., Reis, E., Truyts, C., et al.: Merlin: A vision language foundation model for 3d computed tomography. Research Square pp. rs–3 (2024)
- [3] Dahal, L., Bhandari, Y., Rubin, G.D., Lo, J.Y.: Organ-aware attention improves ct triage and classification. arXiv preprint arXiv:2601.13385 (2026)
- [4] Dahal, L., Ghojoghnejad, M., Vancoillie, L., Ghosh, D., Bhandari, Y., Kim, D., Ho, F.C., Tushar, F.I., Luo, S., Lafata, K.J., et al.: Xcat 3.0: A comprehensive library of personalized digital twins derived from ct scans. Medical Image Analysis 103, 103636 (2025)
- [5] Dahal, L., Lo, J.Y.: Ct-idp: Segmentation-derived quantitative phenotypes for interpretable abdominal ct disease classification (2026), https://arxiv.org/abs/2605.09002
- [6] Geirhos, R., Jacobsen, J.H., Michaelis, C., Zemel, R., Brendel, W., Bethge, M., Wichmann, F.A.: Shortcut learning in deep neural networks. Nature Machine Intelligence 2(11), 665–673 (2020)
- [7] Gücük, A., Üyetürk, U.: Usefulness of hounsfield unit and density in the assessment and treatment of urinary stones. World journal of nephrology 3(4), 282 (2014)
- [8] Guo, B., Lu, D., Szumel, G., Gui, R., Wang, T., Konz, N., Mazurowski, M.A.: The impact of scanner domain shift on deep learning performance in medical imaging: an experimental study. arXiv preprint arXiv:2409.04368 (2024)
- [9] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International conference on machine learning. pp. 1321–1330. PMLR (2017)
- [10] Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 7132–7141 (2018)
- [11] Li, J., Chen, J., Tang, Y., Wang, C., Landman, B.A., Zhou, S.K.: Transforming medical imaging with transformers? A comparative review of key properties, current progresses, and future perspectives. Medical image analysis 85, 102762 (2023)
- [12] Linguraru, M.G., Sandberg, J.K., Jones, E.C., Petrick, N., Summers, R.M.: Assessing hepatomegaly: automated volumetric analysis of the liver. Academic radiology 19(5), 588–598 (2012)
- [13] Momin, E., Cook, T., Gershon, G., Barr, J., De Cecco, C.N., van Assen, M.: Systematic review on the impact of deep learning-driven worklist triage on radiology workflow and clinical outcomes. European radiology 35(11), 6879–6893 (2025)
- [14] Pai, S., Hadzic, I., Bontempi, D., Bressem, K., Kann, B.H., Fedorov, A., Mak, R.H., Aerts, H.J.: Vision foundation models for computed tomography. arXiv preprint arXiv:2501.09001 (2025)
- [15] Paudyal, R., Shah, A.D., Akin, O., Do, R.K., Konar, A.S., Hatzoglou, V., Mahmood, U., Lee, N., Wong, R.J., Banerjee, S., et al.: Artificial intelligence in ct and mr imaging for oncological applications. Cancers 15(9), 2573 (2023)
- [16] Perez, E., Strub, F., De Vries, H., Dumoulin, V., Courville, A.: FiLM: Visual reasoning with a general conditioning layer. In: Proceedings of the AAAI conference on artificial intelligence. vol. 32 (2018)
- [17] Radiopaedia.org: Splenomegaly, https://radiopaedia.org/articles/splenomegaly, accessed: 2026-02-15
- [18] Sellergren, A., Kazemzadeh, S., Jaroensri, T., Kiraly, A., Traverse, M., Kohlberger, T., Xu, S., Jamil, F., Hughes, C., Lau, C., et al.: Medgemma technical report. arXiv preprint arXiv:2507.05201 (2025)
- [19] Shui, Z., Zhang, J., Cao, W., Wang, S., Guo, R., Lu, L., Yang, L., Ye, X., Liang, T., Zhang, Q., et al.: Large-scale and fine-grained vision-language pre-training for enhanced ct image understanding. arXiv preprint arXiv:2501.14548 (2025)
- [20] Siméoni, O., Vo, H.V., Seitzer, M., Baldassarre, F., Oquab, M., Jose, C., Khalidov, V., Szafraniec, M., Yi, S., Ramamonjisoa, M., et al.: Dinov3. arXiv preprint arXiv:2508.10104 (2025)
- [21] Wasserthal, J., Breit, H.C., Meyer, M.T., Pradella, M., Hinck, D., Sauter, A.W., Heye, T., Boll, D.T., Cyriac, J., Yang, S., et al.: Totalsegmentator: robust segmentation of 104 anatomic structures in ct images. Radiology: Artificial Intelligence 5(5), e230024 (2023)
- [22] Winder, M., Owczarek, A., Chudek, J., Pilch-Kowalczyk, J., Baron, J.: Are we overdoing it? changes in diagnostic imaging workload during the years 2010-2020 including the impact of the sars-cov-2 pandemic. healthcare (basel) (2021)
- [23] Yang, A., Li, A., Yang, B., Zhang, B., Hui, B., Zheng, B., Yu, B., Gao, C., Huang, C., Lv, C., et al.: Qwen3 technical report. arXiv preprint arXiv:2505.09388 (2025)