Enhancing Fine-Grained Spatial Grounding in 3D CT Report Generation via Discriminative Guidance
Pith reviewed 2026-05-10 15:51 UTC · model grok-4.3
The pith
DCP-PD distills fine-grained cues from text reports and applies prompt dropout to improve spatial grounding in 3D CT report generation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DCP-PD distills fine-grained cues from free-text reports to guide report generation while using prompt dropout to mitigate shortcut learning, achieving state-of-the-art macro F1 of 0.603 on CT-RATE and raising out-of-distribution F1 from 0.266 to 0.503 on Rad-ChestCT; the same framework introduces a presence-laterality-lobe question protocol that reveals persistent challenges in fine-grained spatial localization even among high-scoring models.
What carries the argument
Discriminative Cue-Prompting with Prompt Dropout (DCP-PD), a plug-and-play framework that extracts fine-grained pathology cues from reports to supply location-specific supervision and prevents shortcut learning through prompt dropout.
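The framework is described only at this level of abstraction, but the two ingredients can be sketched. In the toy version below, the cue vocabulary, the regex-based extraction, and the 0.3 dropout rate are illustrative assumptions standing in for the paper's actual distillation and injection machinery:

```python
import random
import re

LATERALITY = ("left", "right", "bilateral")
LOBES = ("upper lobe", "middle lobe", "lower lobe", "lingula")

def extract_cues(report: str) -> list[str]:
    """Pull coarse location cues (laterality, lobe) from free-text findings.
    A real distillation step would use an LLM or a trained parser; keyword
    matching stands in for it here."""
    text = report.lower()
    cues = [w for w in LATERALITY if re.search(rf"\b{w}\b", text)]
    cues += [l for l in LOBES if l in text]
    return cues

def build_prompt(cues: list[str], p_drop: float = 0.3, training: bool = True) -> str:
    """Prompt dropout: during training, each distilled cue is independently
    dropped with probability p_drop, so the generator cannot lean on the
    text-side shortcut alone and must consult the volume."""
    kept = [c for c in cues if not (training and random.random() < p_drop)]
    return "Locations to verify in the volume: " + (", ".join(kept) or "none")

report = "Ground-glass opacity in the right upper lobe; left pleural effusion."
print(extract_cues(report))  # ['left', 'right', 'upper lobe']
```

At inference (`training=False`) all cues pass through; the regularization only perturbs the training-time prompt.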
If this is right
- Macro F1 on CT-RATE rises from 0.501 to 0.603.
- Out-of-distribution F1 on Rad-ChestCT nearly doubles from 0.266 to 0.503.
- Models show measurable improvement on hierarchical location questions covering presence, laterality, and lobe.
- Shortcut reliance is reduced without harming overall report quality.
- The hierarchical evaluation protocol provides a more diagnostic check for spatial grounding than existing lexical or entity-overlap metrics.
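For concreteness, macro F1 (the metric behind both headline numbers) is the unweighted mean of per-label F1 scores, so a rare pathology weighs as much as a common one; that is why distribution shift moves it so sharply. A minimal sketch with made-up counts:

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from true-positive, false-positive, false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(per_label_counts: dict[str, tuple[int, int, int]]) -> float:
    """Unweighted mean of per-label F1: every label counts equally,
    regardless of how often it occurs."""
    scores = [f1(*c) for c in per_label_counts.values()]
    return sum(scores) / len(scores)

counts = {
    "effusion":    (80, 10, 10),  # common label, high F1
    "atelectasis": (5, 5, 15),    # rare label, low F1 drags the macro mean
}
print(round(macro_f1(counts), 3))  # 0.611
```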
Where Pith is reading between the lines
- The cue-distillation idea could be transferred to other volumetric imaging modalities such as MRI or PET where free-text reports also contain unexploited location detail.
- Prompt dropout may serve as a general regularizer for other vision-language report generators to reduce text-only bias.
- The presence-laterality-lobe protocol offers a template for creating location-specific test suites in any medical VLM benchmark.
Load-bearing premise
Fine-grained cues distilled from free-text reports supply accurate, unbiased supervision for the actual locations of pathologies in the corresponding CT volumes.
What would settle it
A test set in which report text is deliberately edited to contain incorrect laterality or lobe information while the CT volumes remain unchanged; if the model still produces reports that match the altered text instead of the image, the cue-distillation claim is falsified.
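Such a counterfactual set could be built mechanically by swapping location tokens in the reports while leaving the volumes untouched. The swap table below is an illustrative assumption; a real protocol would also need to handle case, negation, and phrases like "bilateral":

```python
import re

# Symmetric swap table: applying the corruption twice restores the original.
SWAP = {"left": "right", "right": "left",
        "upper lobe": "lower lobe", "lower lobe": "upper lobe"}

def corrupt_locations(report: str) -> str:
    """Swap laterality/lobe tokens in a single regex pass, so that
    'left' -> 'right' is not immediately undone by 'right' -> 'left'."""
    pattern = re.compile(r"\b(" + "|".join(sorted(SWAP, key=len, reverse=True)) + r")\b")
    return pattern.sub(lambda m: SWAP[m.group(1)], report)

print(corrupt_locations("Nodule in the left upper lobe."))
# Nodule in the right lower lobe.
```

If a model trained with distilled cues reproduces the corrupted locations from such reports, it is copying text rather than grounding in the image.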
Original abstract
Vision-language models (VLMs) for radiology report generation (RRG) can produce long-form chest CT reports from volumetric scans and show strong potential to improve radiology workflow efficiency and consistency. However, existing methods face two key limitations: (i) training supervision is often coarse, aligning a whole CT volume with a full free-text report without explicit alignment for fine-grained attributes or pathology locations; and (ii) evaluation is typically holistic (lexical overlap, entity matching, or LLM-as-a-judge scores) and not diagnostic for spatial grounding. We propose Discriminative Cue-Prompting with Prompt Dropout (DCP-PD), a plug-and-play framework that distills fine-grained cues from free-text reports and uses them to guide report generation while mitigating shortcut reliance via prompt dropout. DCP-PD achieves state-of-the-art performance on CT-RATE, improving macro F1 from 0.501 to 0.603 (20% relative), and substantially boosts out-of-distribution performance on Rad-ChestCT, from F1 = 0.266 to 0.503 (89% relative). Finally, we introduce a hierarchical, location-aware question-set protocol (presence → laterality → lobe) to directly assess pathology-location grounding, showing that fine-grained spatial localization remains challenging even for models that score highly on current benchmarks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes DCP-PD (Discriminative Cue-Prompting with Prompt Dropout), a plug-and-play framework that distills fine-grained cues from free-text radiology reports to provide explicit supervision for pathology locations and laterality in 3D CT report generation. Prompt dropout is introduced to reduce shortcut learning. The work reports a state-of-the-art macro F1 on CT-RATE (0.501 → 0.603) and large out-of-distribution gains on Rad-ChestCT (0.266 → 0.503), while introducing a new hierarchical location-aware question-set evaluation protocol (presence, laterality, lobe) to diagnose spatial grounding.
Significance. If the reported gains are attributable to improved spatial grounding rather than incidental effects, the framework and especially the new evaluation protocol could become useful tools for developing and assessing fine-grained VLMs in radiology. The large relative OOD improvement is noteworthy and, if reproducible, would strengthen claims about robustness.
major comments (3)
- [Abstract / Methods] The central performance claims rest on the unvalidated assumption that cues automatically distilled from free-text reports supply accurate, unbiased supervision for pathology locations and laterality. Free-text reports frequently omit explicit laterality or lobe information or use ambiguous phrasing; without an independent validation (e.g., comparison of distilled cues against expert-annotated bounding boxes or a held-out set of location labels), the 20% and 89% relative F1 gains cannot be confidently attributed to better spatial grounding.
- [Experiments / Ablation studies] Prompt dropout is presented as the mechanism that prevents shortcut learning, yet the manuscript provides no ablation that isolates its contribution to both the new hierarchical grounding metrics and standard report-quality scores (RadGraph F1, clinical correctness). Without this, it remains possible that the observed improvements stem from generic regularization rather than the intended discriminative guidance.
- [Evaluation Protocol] The hierarchical question-set protocol is a constructive addition, but its reliability depends on how questions are generated from reports. The manuscript should detail the exact prompting or parsing procedure used to create presence/laterality/lobe questions and report inter-annotator or consistency statistics; otherwise the protocol itself risks inheriting the same ambiguities present in the original reports.
minor comments (2)
- [Abstract] The abstract states 'macro F1' without specifying the exact label set or averaging procedure; this should be clarified in the main text and tables for reproducibility.
- [Methods] Implementation details (exact distillation prompt templates, dropout rate schedule, and how the cue embeddings are injected into the VLM) are referenced but not fully specified; adding them would aid replication.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below with clarifications and commitments to revisions that strengthen the attribution of gains to spatial grounding and the reliability of our contributions.
Point-by-point responses
Referee: [Abstract / Methods] The central performance claims rest on the unvalidated assumption that cues automatically distilled from free-text reports supply accurate, unbiased supervision for pathology locations and laterality. Free-text reports frequently omit explicit laterality or lobe information or use ambiguous phrasing; without an independent validation (e.g., comparison of distilled cues against expert-annotated bounding boxes or a held-out set of location labels), the 20% and 89% relative F1 gains cannot be confidently attributed to better spatial grounding.
Authors: We recognize that free-text reports can contain omissions and ambiguities regarding laterality and lobe. DCP-PD distills cues directly from these reports to supply explicit location and laterality supervision during training, and the hierarchical evaluation protocol (presence → laterality → lobe) is introduced precisely to diagnose whether these cues translate into improved spatial grounding in generated reports. The large relative OOD gains on Rad-ChestCT support that the improvements generalize beyond dataset-specific patterns. We agree that direct comparison to expert bounding boxes would provide stronger evidence; since such annotations are unavailable in the benchmarks, we will add an explicit limitations paragraph discussing reliance on report-derived cues. revision: partial
Referee: [Experiments / Ablation studies] Prompt dropout is presented as the mechanism that prevents shortcut learning, yet the manuscript provides no ablation that isolates its contribution to both the new hierarchical grounding metrics and standard report-quality scores (RadGraph F1, clinical correctness). Without this, it remains possible that the observed improvements stem from generic regularization rather than the intended discriminative guidance.
Authors: We agree that an ablation isolating prompt dropout is required to confirm its specific role. In the revised manuscript we will add ablation experiments that separately quantify the contribution of prompt dropout to the hierarchical location-aware metrics (presence, laterality, lobe) as well as to standard report-quality metrics including RadGraph F1 and clinical correctness. These results will clarify whether gains arise from the discriminative guidance mechanism rather than generic regularization. revision: yes
Referee: [Evaluation Protocol] The hierarchical question-set protocol is a constructive addition, but its reliability depends on how questions are generated from reports. The manuscript should detail the exact prompting or parsing procedure used to create presence/laterality/lobe questions and report inter-annotator or consistency statistics; otherwise the protocol itself risks inheriting the same ambiguities present in the original reports.
Authors: We will expand the methods section with the exact prompting templates and parsing rules used to derive the presence, laterality, and lobe questions from the source reports. We will also report consistency statistics (e.g., agreement across repeated parsing runs and sensitivity to prompt variations) to demonstrate the protocol's reliability and mitigate concerns about inheriting report ambiguities. revision: yes
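One such consistency statistic is exact agreement across repeated parsing runs (a Cohen's-kappa-style chance correction would be stricter). A minimal sketch, with invented run data:

```python
def agreement(runs: list[list[str]]) -> float:
    """Fraction of items on which every repeated parsing run produced the
    same answer: the most basic consistency statistic a revision could report."""
    n = len(runs[0])
    return sum(len({r[i] for r in runs}) == 1 for i in range(n)) / n

# Two parsing runs over three laterality questions; they disagree on the last.
run_a = ["right", "left", "right"]
run_b = ["right", "left", "left"]
print(round(agreement([run_a, run_b]), 3))  # 0.667
```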
- Not committed to revision: independent validation of distilled cues against expert-annotated bounding boxes or held-out location labels, as no such annotations exist in the CT-RATE or Rad-ChestCT datasets.
Circularity Check
No significant circularity in derivation or claims
Full rationale
The paper proposes an empirical framework (DCP-PD) that distills cues from reports and applies prompt dropout for CT report generation, with all central claims consisting of measured performance gains on external benchmarks (CT-RATE macro F1 0.501→0.603; Rad-ChestCT F1 0.266→0.503) and a new evaluation protocol. No equations, derivations, or self-citations are present that reduce any result to fitted inputs by construction, rename known patterns, or make the core improvement self-definitional. The method is presented as a plug-and-play addition whose value is demonstrated through ablation and out-of-distribution testing rather than analytic closure.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: vision-language models trained on paired CT volumes and reports can be improved by additional fine-grained cue supervision.
- Domain assumption: prompt dropout during training prevents shortcut learning while preserving report quality.
invented entities (1)
- DCP-PD framework: no independent evidence