arxiv: 2605.13813 · v1 · submitted 2026-05-13 · 💻 cs.CV

JANUS: Anatomy-Conditioned Gating for Robust CT Triage Under Distribution Shift

Lavsen Dahal, Yubraj Bhandari, Geoffrey Rubin, Joseph Y. Lo

Pith reviewed 2026-05-14 19:20 UTC · model grok-4.3

keywords CT triage · distribution shift · dual-stream architecture · anatomically guided gating · macro-radiomic priors · medical imaging · Vision Transformer · calibration

The pith

A dual-stream model conditions visual CT embeddings on macro-radiomic priors via anatomically guided gating to improve triage accuracy and reduce false positives under distribution shift.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents JANUS as a physiology-guided architecture that pairs a visual stream with a second stream of macro-radiomic priors and uses gating to condition the visual embeddings on anatomical information. On a large internal test set the method reaches macro-AUROC 0.88 and AUPRC 0.74 while outperforming reproduced baselines; the same model maintains 0.87 AUROC on an external dataset of 2000 cases. Gains are largest for findings defined by size and attenuation, calibration improves on both datasets, and the Physiological Veto Rate shows that high-confidence false positives are suppressed more often than true positives when institutional shift occurs.

Core claim

JANUS is a physiology-guided dual-stream architecture that conditions visual embeddings on macro-radiomic priors via Anatomically Guided Gating. On the MERLIN test set of 5082 cases it attains macro-AUROC 0.88 and AUPRC 0.74, outperforming all reproduced baselines, and generalizes to an external dataset of 2000 cases with AUROC 0.87, with the largest gains on size- and attenuation-defined findings plus improved calibration.
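For readers unfamiliar with the headline metric, the following is a minimal sketch of macro-AUROC, assuming "macro" has its usual meaning of an unweighted mean of per-finding AUROCs. The finding names and toy scores are hypothetical and are not taken from the paper; the pairwise AUROC is the standard Mann-Whitney formulation, not necessarily the implementation the authors used.

```python
from typing import Dict, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUROC; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auroc(labels_by_finding: Dict[str, Sequence[int]],
                scores_by_finding: Dict[str, Sequence[float]]) -> float:
    """Unweighted mean of per-finding AUROCs (assumed meaning of 'macro')."""
    per_finding = [auroc(labels_by_finding[name], scores_by_finding[name])
                   for name in labels_by_finding]
    return sum(per_finding) / len(per_finding)


# Hypothetical toy data: two findings, four cases each.
labels = {"finding_a": [1, 0, 1, 0], "finding_b": [0, 0, 1, 1]}
scores = {"finding_a": [0.9, 0.2, 0.4, 0.6], "finding_b": [0.1, 0.3, 0.8, 0.7]}
print(macro_auroc(labels, scores))  # finding_a = 0.75, finding_b = 1.0
```

Because every finding carries equal weight, a rare finding with poor discrimination lowers the macro score as much as a common one would.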

What carries the argument

Anatomically Guided Gating, which fuses a visual embedding stream with macro-radiomic priors extracted from a parallel stream to supply physically grounded conditioning that modulates the final prediction.
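The gating step can be sketched as follows, based only on the Figure 1 caption: scalar priors are projected, squashed by a sigmoid into a gate g, and multiplied elementwise with the visual feature z_v. The single linear projection, the dimensions, and all numeric values here are assumptions for illustration; the paper's actual projection may differ.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def physiological_gate(priors: Sequence[float],
                       weights: Sequence[Sequence[float]],
                       bias: Sequence[float]) -> List[float]:
    """g = sigmoid(W @ priors + b): one gate value per embedding dimension.

    A single linear layer is an assumption; the source only says the priors
    are 'projected and sigmoid-bounded'.
    """
    return [sigmoid(sum(w * s for w, s in zip(row, priors)) + b)
            for row, b in zip(weights, bias)]


def apply_gate(z_v: Sequence[float], gate: Sequence[float]) -> List[float]:
    """Hadamard (elementwise) product of the gate with the visual feature."""
    if len(z_v) != len(gate):
        raise ValueError("gate and visual feature must have the same length")
    return [g * z for g, z in zip(gate, z_v)]


# Hypothetical standardized priors, e.g. an organ volume and a mean attenuation.
priors = [1.5, -0.8]
weights = [[2.0, 0.0], [0.0, 2.0], [-1.0, 1.0]]  # illustrative 3x2 projection
bias = [0.0, 0.0, 0.0]
z_v = [0.7, -1.2, 0.4]                           # illustrative visual feature

gate = physiological_gate(priors, weights, bias)
gated = apply_gate(z_v, gate)
print(gate, gated)
```

Since each gate value lies strictly between 0 and 1, the gate can only attenuate a visual feature dimension, never amplify it, which is what makes a "veto" reading of the mechanism plausible.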

If this is right

Findings defined by quantitative size and attenuation receive the largest accuracy gains.
Calibration improves on both the internal MERLIN set and the external set.
High-confidence false positives are reduced substantially more often than true positives under domain shift, as quantified by the Physiological Veto Rate.
The gating operation provides a measurable mechanism for physically grounded prediction suppression.
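This page does not say which calibration metric the paper reports. As one common choice, the sketch below computes expected calibration error (ECE) for a single binary finding with equal-width probability bins; the metric choice, the bin count, and the toy data are all assumptions.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(labels: Sequence[int],
                               probs: Sequence[float],
                               n_bins: int = 10) -> float:
    """Binary ECE: weighted mean gap between predicted and observed positive rate."""
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and equally long")
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    total = len(labels)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for _, p in members) / len(members)
        positive_rate = sum(y for y, _ in members) / len(members)
        ece += (len(members) / total) * abs(mean_prob - positive_rate)
    return ece


# Hypothetical toy cases: same scores, different label agreement.
well_calibrated = expected_calibration_error([1, 1, 0, 0], [0.95, 0.95, 0.05, 0.05])
overconfident = expected_calibration_error([1, 0, 1, 0], [0.95, 0.95, 0.05, 0.05])
print(well_calibrated, overconfident)
```

A lower value means predicted probabilities track observed frequencies more closely; "improved calibration" in the sense above would correspond to a drop in a quantity of this kind on both datasets.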

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar prior-conditioned gating could be tested on other quantitative imaging tasks such as PET or MRI where attenuation or size biomarkers are available.
The approach may lower overdiagnosis rates in screening programs by preferentially vetoing spurious high-confidence detections.
Extending the second stream to include additional quantitative biomarkers could further tighten performance on rare or low-contrast pathologies.

Load-bearing premise

The macro-radiomic priors extracted in the second stream stay accurate and unbiased under the same distribution shifts that degrade the visual stream.

What would settle it

A new external CT dataset in which JANUS either fails to exceed baseline AUROC or shows a higher rate of true-positive suppression than false-positive suppression would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.13813 by Geoffrey Rubin, Joseph Y. Lo, Lavsen Dahal, Yubraj Bhandari.

**Figure 1.** JANUS Architecture. (a) A 3D CT volume is sampled into N 2.5D trislices; segmentation masks yield macro-radiomic scalar priors. (b) A DINOv3 backbone extracts patch embeddings condensed into a label-specific visual feature zv via OrganMasked Attention Pooling. (c) Scalar priors are projected and sigmoid-bounded to form a physiological gate g. (d) g modulates zv via Hadamard product (⊙), acting as a Physi…

**Figure 2.** External Dataset: Gate behavior and physiological veto.

**Figure 3.** Robustness to scalar corruption. AUROC under 10%, 20%, and 50% corruption of macro-radiomic priors on internal and external datasets. JANUS degrades gracefully and remains above the ViT-Baseline (dotted) at all corruption levels, including 50%, suggesting the multiplicative gate bounds the influence of corrupted inputs. selectivity). Because PVR is computed per label, operating points could be set per pa…
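The Figure 3 caption suggests that the multiplicative gate bounds the influence of corrupted priors. The sketch below illustrates that bound under stated assumptions: corruption is modeled as replacing a fraction of prior entries with large random values, and the gate is a single sigmoid-bounded linear projection. Neither the corruption protocol nor the projection is specified on this page, and all values are synthetic.

```python
import math
import random
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function with outputs in [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def corrupt_priors(priors: Sequence[float], fraction: float,
                   rng: random.Random, scale: float = 10.0) -> List[float]:
    """Replace a fraction of entries with large Gaussian noise (assumed protocol)."""
    corrupted = list(priors)
    k = round(fraction * len(priors))
    for i in rng.sample(range(len(priors)), k):
        corrupted[i] = rng.gauss(0.0, scale)
    return corrupted


def gated_embedding(z_v: Sequence[float], priors: Sequence[float],
                    weights: Sequence[Sequence[float]],
                    bias: Sequence[float]) -> List[float]:
    """Sigmoid gate from priors, applied elementwise to the visual feature."""
    gate = [sigmoid(sum(w * s for w, s in zip(row, priors)) + b)
            for row, b in zip(weights, bias)]
    return [g * z for g, z in zip(gate, z_v)]


rng = random.Random(0)
priors = [rng.gauss(0.0, 1.0) for _ in range(20)]   # synthetic scalar priors
z_v = [rng.gauss(0.0, 1.0) for _ in range(8)]       # synthetic visual feature
weights = [[rng.gauss(0.0, 0.5) for _ in range(20)] for _ in range(8)]
bias = [0.0] * 8

bounded = {}
for fraction in (0.1, 0.2, 0.5):
    noisy = corrupt_priors(priors, fraction, rng)
    out = gated_embedding(z_v, noisy, weights, bias)
    # However extreme the corrupted priors are, each gated dimension
    # stays within the magnitude of the ungated visual feature.
    bounded[fraction] = all(abs(o) <= abs(z) for o, z in zip(out, z_v))
print(bounded)
```

The bound only limits how far corrupted priors can push each feature dimension; it does not by itself show that accuracy is preserved, which is what the AUROC curves in the figure are meant to measure.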

Original abstract

Automated CT triage requires models that are simultaneously accurate across diverse pathologies and reliable under institutional shift. While Vision Transformers provide strong visual representations, many clinically significant findings are defined by quantitative imaging biomarkers rather than appearance alone. We introduce JANUS, a physiology-guided dual-stream architecture that conditions visual embeddings on macro-radiomic priors via Anatomically Guided Gating. On the MERLIN test set (N=5082), JANUS attains macro-AUROC 0.88 and AUPRC 0.74, outperforming all reproduced baselines. It generalizes to an external dataset N=2000; AUROC 0.87), with the largest gains on findings defined by size and attenuation as well as improved calibration on both datasets. We further quantify prediction suppression using the Physiological Veto Rate (PVR), showing that under domain shift JANUS reduces high-confidence false positives substantially more often than true positives. Together, these results are consistent with physically grounded conditioning that improves both discrimination and reliability in CT triage. Code is made publicly available at github repository https://github.com/lavsendahal/janus and model weights are at https://huggingface.co/lavsendahal/janus.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

JANUS adds gating of ViT embeddings by macro-radiomic priors and reports external validation gains on CT triage, but the priors' accuracy under shift is not directly measured.

The main point is that JANUS uses a second stream to extract macro-radiomic priors and gates the Vision Transformer embeddings with them, aiming for better handling of distribution shift in CT triage. On the internal MERLIN test set it reaches macro-AUROC 0.88 and AUPRC 0.74, beating reproduced baselines, and holds at 0.87 AUROC on an external set of 2000 cases. The biggest lifts appear on size- and attenuation-based findings, with improved calibration and a new Physiological Veto Rate showing more suppression of high-confidence false positives than true positives under shift. Public code and weights are available, which helps reproducibility.

Referee Report

3 major / 2 minor

Summary. The paper introduces JANUS, a physiology-guided dual-stream architecture for CT triage that conditions visual embeddings from Vision Transformers on macro-radiomic priors extracted via a second stream using Anatomically Guided Gating. It reports macro-AUROC 0.88 and AUPRC 0.74 on the MERLIN test set (N=5082), outperforming reproduced baselines, with generalization to an external dataset (N=2000, AUROC 0.87). Largest gains occur on size- and attenuation-defined findings, with improved calibration and reduced high-confidence false positives (more than true positives) under domain shift as quantified by the new Physiological Veto Rate (PVR) metric. Code and model weights are released publicly.

Significance. If the central claims hold, the work offers a promising direction for robust medical imaging models by incorporating quantitative imaging biomarkers to mitigate distribution shift in CT triage. The empirical gains on external data and public code release are strengths that could support reproducibility and clinical translation. The introduction of PVR as a reliability metric adds a useful lens, though its novelty requires careful validation to establish broader significance beyond standard AUROC/AUPRC.

major comments (3)

[Methods (dual-stream architecture and prior extraction)] The accuracy of the macro-radiomic priors under distribution shift is assumed rather than measured: no standalone metrics (e.g., Dice scores for anatomical segmentation or regression error on radiomic quantities such as size/attenuation) are reported for the prior-extraction stream on the external N=2000 dataset. This is load-bearing for the claim that gating is physiologically grounded rather than a generic regularizer.
[Results (ablation studies and external validation)] No ablation isolates the gating operation by corrupting the priors (e.g., via controlled noise or shift on the second stream) while leaving the visual stream intact. Without this, the reported reductions in false positives via PVR and gains on size/attenuation findings cannot be attributed specifically to the anatomy-conditioned mechanism.
[§5 (PVR definition and results)] The Physiological Veto Rate (PVR) is a newly introduced metric central to the reliability claims under domain shift, yet its exact definition, computation (including thresholds for 'high-confidence' and veto criteria), and pseudocode are not provided in sufficient detail for independent verification or reproduction.
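The standalone checks requested in the first major comment can be sketched as below: a Dice score for a predicted organ mask against a reference mask, and a mean absolute error for a derived scalar such as size or attenuation. Masks are flattened to 0/1 lists for simplicity, and all values are hypothetical; nothing here reflects measurements from the paper.

```python
from typing import Sequence


def dice_score(pred_mask: Sequence[int], true_mask: Sequence[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for flattened binary masks."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    # Convention: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_absolute_error(pred: Sequence[float], true: Sequence[float]) -> float:
    """MAE between predicted and reference scalar quantities."""
    if len(pred) != len(true) or not pred:
        raise ValueError("inputs must be non-empty and equally long")
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)


# Hypothetical flattened masks and scalar priors (e.g. attenuation in HU).
dice = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
mae = mean_absolute_error([10.0, 20.0], [12.0, 17.0])
print(dice, mae)
```

Reporting both on the external set would show whether the prior stream itself drifts under institutional shift, independently of any downstream triage metric.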

minor comments (2)

[Abstract] Abstract contains a minor formatting error: 'external dataset N=2000; AUROC 0.87)' is missing the opening parenthesis before N.
[Methods (experimental setup)] Ensure all data splits, exclusion criteria, and hyperparameter choices are explicitly labeled as pre-specified (vs. post-hoc) in the Methods to strengthen the external validation claims.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments, which highlight important aspects of our methodology and evaluation. We address each major comment below and will incorporate revisions to strengthen the manuscript.

Referee: [Methods (dual-stream architecture and prior extraction)] The accuracy of the macro-radiomic priors under distribution shift is assumed rather than measured: no standalone metrics (e.g., Dice scores for anatomical segmentation or regression error on radiomic quantities such as size/attenuation) are reported for the prior-extraction stream on the external N=2000 dataset. This is load-bearing for the claim that gating is physiologically grounded rather than a generic regularizer.

Authors: We agree that standalone evaluation of the prior-extraction stream on the external dataset is necessary to substantiate the physiological grounding of the gating mechanism. In the revised manuscript, we will add Dice scores for anatomical segmentation accuracy and regression errors (MAE) for size and attenuation estimates computed on the N=2000 external set. These metrics will be reported in a new subsection of the Methods or Results to directly address this point. revision: yes
Referee: [Results (ablation studies and external validation)] No ablation isolates the gating operation by corrupting the priors (e.g., via controlled noise or shift on the second stream) while leaving the visual stream intact. Without this, the reported reductions in false positives via PVR and gains on size/attenuation findings cannot be attributed specifically to the anatomy-conditioned mechanism.

Authors: We acknowledge the value of an ablation that specifically isolates the gating operation. In the revision, we will introduce a controlled ablation where macro-radiomic priors are corrupted with Gaussian noise or simulated distribution shift while the visual ViT stream remains unchanged. We will report the resulting changes in PVR, AUROC on size/attenuation findings, and calibration metrics to demonstrate the specific contribution of the anatomy-conditioned gating. revision: yes
Referee: [§5 (PVR definition and results)] The Physiological Veto Rate (PVR) is a newly introduced metric central to the reliability claims under domain shift, yet its exact definition, computation (including thresholds for 'high-confidence' and veto criteria), and pseudocode are not provided in sufficient detail for independent verification or reproduction.

Authors: We agree that the PVR metric requires fuller specification for reproducibility. In the revised manuscript, we will expand §5 with the precise mathematical definition of PVR, explicit thresholds for high-confidence predictions (e.g., probability > 0.9), the veto criteria, and a step-by-step computation procedure. Pseudocode will be added to the Methods section or as an appendix to enable independent verification. revision: yes
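Since the exact PVR definition is not given on this page, the following is only one plausible reading, built from the pieces that are stated: PVR concerns high-confidence baseline false positives, and the rebuttal offers probability > 0.9 as an example threshold. The veto criterion used here (the gated model's probability falls to or below the same threshold) and all toy values are assumptions, not the authors' definition.

```python
import math
from typing import Sequence


def veto_rate(labels: Sequence[int], baseline_probs: Sequence[float],
              gated_probs: Sequence[float], target_label: int,
              threshold: float = 0.9) -> float:
    """Fraction of high-confidence baseline predictions that the gated model vetoes.

    target_label=0 restricts to baseline false positives (the assumed PVR);
    target_label=1 gives the matching rate for true positives.
    """
    flagged = [g for y, b, g in zip(labels, baseline_probs, gated_probs)
               if y == target_label and b > threshold]
    if not flagged:
        return math.nan  # no high-confidence baseline predictions of this kind
    vetoed = sum(1 for g in flagged if g <= threshold)
    return vetoed / len(flagged)


# Hypothetical single-finding example with six cases.
labels         = [0,    0,    0,    1,    1,    0]
baseline_probs = [0.95, 0.92, 0.97, 0.96, 0.93, 0.30]
gated_probs    = [0.40, 0.50, 0.95, 0.94, 0.60, 0.20]

pvr_false_pos = veto_rate(labels, baseline_probs, gated_probs, target_label=0)
veto_true_pos = veto_rate(labels, baseline_probs, gated_probs, target_label=1)
print(pvr_false_pos, veto_true_pos)
```

Under this reading, the paper's reliability claim corresponds to the false-positive rate being substantially higher than the true-positive rate on the external set, computed per finding.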

Circularity Check

0 steps flagged

No circularity: claims rest on held-out empirical metrics

The paper reports macro-AUROC 0.88 / AUPRC 0.74 on the MERLIN test set (N=5082) and AUROC 0.87 on an external set (N=2000). These are direct performance measurements on independent data, not quantities obtained by fitting parameters to the same observations and then re-deriving the metric. The dual-stream gating architecture is described as a design choice conditioned on macro-radiomic priors; no equation or result is shown to reduce to its own inputs by construction, nor does any central claim rely on a self-citation chain that itself lacks external verification. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper introduces one new architectural component (anatomically guided gating) and one new evaluation metric (PVR). No explicit free parameters beyond standard neural-network hyperparameters are mentioned in the abstract. No new physical entities are postulated.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

JANUS integrates the full macro-radiomic phenotype space with a visual stream via Anatomically Guided Gating: a disease-specific multiplicative bottleneck that modulates visual evidence through scalar priors

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear

We introduce the Physiological Veto Rate (PVR) to quantify suppression of high-confidence baseline false positives

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
