Recognition: 2 theorem links · Lean theorem
Multi-Modal Multi-Agent Reinforcement Learning for Radiology Report Generation
Pith reviewed 2026-05-15 21:48 UTC · model grok-4.3
The pith
Multi-agent reinforcement learning optimizes region-specific and global agents to generate radiology reports with better clinical accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MARL-Rad trains the entire agentic system on-policy within the radiology workflow. Chest X-ray interpretation is decomposed into region-specific agents and a global integrating agent whose outputs are jointly optimized by reinforcement learning driven by clinically verifiable reward signals. On the MIMIC-CXR and IU X-ray datasets the method reaches state-of-the-art clinical efficacy on RadGraph, CheXbert, and GREEN metrics, raises laterality consistency, produces more accurate and detailed reports, and yields outputs that a blinded clinician evaluation finds clinically comparable to ground-truth reports.
What carries the argument
Decomposition into region-specific multi-modal agents coordinated by a global integrating agent, jointly optimized on-policy via reinforcement learning with clinically verifiable rewards.
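The decomposition described above can be sketched in miniature. This is a hypothetical illustration, not the paper's implementation: `RegionAgent`, `GlobalAgent`, and the toy `reward_fn` are invented names, and the scalar reward stands in for the clinically verifiable RadGraph/CheXbert/GREEN signals; the point is only that every agent shares one on-policy episode and one clinically grounded reward, so a policy-gradient update can credit local and global decisions jointly.

```python
# Minimal sketch of region-specific agents plus a global integrating agent
# sharing one clinically grounded scalar reward (names are illustrative).
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class RegionAgent:
    """Drafts findings for one anatomical region (e.g. 'lungs', 'heart')."""
    region: str

    def act(self, image_features: List[float]) -> str:
        # Placeholder policy: a real agent would decode text from a VLM.
        return f"[{self.region}] no acute finding"

@dataclass
class GlobalAgent:
    """Integrates region drafts into one coherent report."""
    def act(self, drafts: List[str]) -> str:
        return " ".join(drafts)

def rollout(regions: List[RegionAgent], integrator: GlobalAgent,
            image_features: List[float],
            reward_fn: Callable[[str], float]) -> float:
    """One on-policy episode: region drafts -> integrated report -> reward.

    All agents receive the same scalar, so a policy-gradient method
    (REINFORCE/PPO-style) can update local and global policies jointly.
    """
    drafts = [agent.act(image_features) for agent in regions]
    report = integrator.act(drafts)
    return reward_fn(report)

agents = [RegionAgent("lungs"), RegionAgent("heart"), RegionAgent("pleura")]
score = rollout(agents, GlobalAgent(), [0.1, 0.2],
                reward_fn=lambda r: float("no acute finding" in r))
```

The shared-reward design is what makes the optimization "joint": no agent is trained against a private objective, which is the contrast the paper draws with post-hoc agentization of fixed LLMs.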
If this is right
- Achieves state-of-the-art clinical efficacy scores on RadGraph, CheXbert, and GREEN for MIMIC-CXR and IU X-ray.
- Improves laterality consistency in generated reports.
- Produces more accurate and detailed radiology reports.
- Yields outputs judged clinically comparable to ground-truth reports in blinded clinician evaluation.
Where Pith is reading between the lines
- The same region-plus-global decomposition may improve agentic systems for other medical imaging modalities where local detail and global coherence must be balanced.
- Joint policy optimization could reduce the inconsistencies often seen when fixed language models are assembled into medical report pipelines after training.
- Scaling the number or granularity of region agents offers a testable route to finer report quality on complex or multi-finding cases.
Load-bearing premise
Clinically verifiable rewards can be defined to accurately guide joint optimization of the multi-agent system without introducing biases or failing to capture key aspects of report quality.
What would settle it
A large-scale blinded study in which expert radiologists rate MARL-Rad reports no better than non-optimized agent baselines on diagnostic utility and error rate would falsify the central performance claim.
Original abstract
We propose MARL-Rad, a multi-modal multi-agent reinforcement learning framework for radiology report generation that trains the entire agentic system on policy within its deployed radiology workflow. MARL-Rad addresses the limitation of post-hoc agentization, where fixed LLMs are organized into hand-designed agentic workflows without being optimized for their assigned roles. Our framework decomposes chest X-ray interpretation into region-specific agents and a global integrating agent, and jointly optimizes them using clinically verifiable rewards. Experiments on the MIMIC-CXR and IU X-ray datasets show that MARL-Rad consistently improves clinical efficacy metrics such as RadGraph, CheXbert, and GREEN scores, achieving state-of-the-art clinical efficacy performance. Further analyses show that MARL-Rad improves laterality consistency and produces more accurate and detailed reports. A blinded clinician evaluation further suggests that MARL-Rad produces reports clinically comparable to ground-truth reports.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes MARL-Rad, a multi-modal multi-agent reinforcement learning framework for radiology report generation from chest X-rays. It decomposes interpretation into region-specific agents plus a global integrating agent and jointly optimizes the system end-to-end using clinically verifiable rewards. Experiments on MIMIC-CXR and IU X-ray report state-of-the-art results on RadGraph, CheXbert, and GREEN scores, plus gains in laterality consistency, report detail, and blinded clinician equivalence to ground-truth reports.
Significance. If the experimental claims are substantiated, the work would be significant as one of the first demonstrations of end-to-end multi-agent RL optimization for medical report generation, moving beyond post-hoc LLM agent workflows. The region-specific agents, clinically grounded rewards, and clinician evaluation are positive elements that could influence future agentic systems in radiology.
major comments (3)
- [Abstract / Methods] Abstract and Methods: The central claim that clinically verifiable rewards enable joint optimization of region-specific and global agents rests on unspecified reward definitions, weighting, shaping, and handling of sparse signals. Without these details it is impossible to assess whether reported gains on RadGraph/CheXbert/GREEN reflect genuine clinical improvement or metric-specific optimization.
- [Experiments] Experiments: The manuscript does not report statistical significance tests, confidence intervals, or ablation studies isolating the contribution of the multi-agent RL component versus single-agent or supervised baselines, undermining the SOTA and laterality-consistency claims.
- [Experiments] Experiments: Potential circularity between reward signals and evaluation metrics (both drawn from entity-extraction and label-accuracy tools) is not addressed; explicit discussion or an independent held-out clinical metric is required to rule out reward hacking.
minor comments (2)
- Clarify the precise multi-modal fusion mechanism between image features and text tokens inside each agent.
- Add dataset split statistics, preprocessing steps, and hyper-parameter tables to support reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment point by point below. Revisions will be incorporated into the next version of the manuscript to improve clarity, statistical rigor, and discussion of potential limitations.
Point-by-point responses
-
Referee: [Abstract / Methods] Abstract and Methods: The central claim that clinically verifiable rewards enable joint optimization of region-specific and global agents rests on unspecified reward definitions, weighting, shaping, and handling of sparse signals. Without these details it is impossible to assess whether reported gains on RadGraph/CheXbert/GREEN reflect genuine clinical improvement or metric-specific optimization.
Authors: We agree that additional detail on the reward formulation is necessary. In the revised Methods section we will explicitly define each component of the clinically verifiable rewards (entity-level matching from RadGraph, label accuracy from CheXbert, and GREEN score contributions), specify the weighting coefficients used to combine them, describe the reward-shaping functions applied to address sparsity, and explain how the composite reward is back-propagated through the multi-agent policy gradient updates. These additions will allow readers to evaluate whether the reported gains reflect genuine clinical improvement.
Revision: yes
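A composite reward of the kind the rebuttal promises to specify could be sketched as follows. The component scorers here are stubs and the weights are invented (one quoted paper passage even describes an unweighted sum over CheXbert, RadGraph F1, and ROUGE-L), so this illustrates only the mechanics of weighting and shaping a sparse clinical reward, not the published formula.

```python
# Illustrative composite clinical reward: weighted sum of component scores
# plus a simple shaping term. Component functions are stand-ins for the real
# RadGraph / CheXbert / GREEN scorers.

def radgraph_f1(pred: str, ref: str) -> float:
    # Stub: token-set F1 as a stand-in for entity-level RadGraph F1.
    p, r = set(pred.split()), set(ref.split())
    if not p or not r:
        return 0.0
    prec, rec = len(p & r) / len(p), len(p & r) / len(r)
    return 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)

def chexbert_acc(pred: str, ref: str) -> float:
    # Stub: agreement on whether a single label ("effusion") is mentioned.
    return float(("effusion" in pred) == ("effusion" in ref))

def green_score(pred: str, ref: str) -> float:
    # Stub for the GREEN LLM-based error score (higher is better).
    return 1.0 if pred == ref else 0.5

def composite_reward(pred: str, ref: str,
                     w=(0.4, 0.4, 0.2), length_floor: int = 3) -> float:
    """Weighted sum of clinically grounded components, with a shaping
    penalty that discourages degenerate near-empty reports (one common
    way to densify a sparse reward)."""
    base = (w[0] * radgraph_f1(pred, ref)
            + w[1] * chexbert_acc(pred, ref)
            + w[2] * green_score(pred, ref))
    shaping = -0.5 if len(pred.split()) < length_floor else 0.0
    return base + shaping
```

Making the weights, stubs, and shaping explicit like this is exactly the disclosure the referee asks for: without it, readers cannot tell whether gains on the corresponding evaluation metrics are genuine or reward-specific.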
-
Referee: [Experiments] Experiments: The manuscript does not report statistical significance tests, confidence intervals, or ablation studies isolating the contribution of the multi-agent RL component versus single-agent or supervised baselines, undermining the SOTA and laterality-consistency claims.
Authors: We accept this criticism. The revised Experiments section will include paired statistical significance tests (with p-values), 95% confidence intervals for all metrics on both MIMIC-CXR and IU X-ray, and a set of ablation studies that isolate the multi-agent RL component against single-agent RL and supervised-learning baselines. These results will be presented in new tables and will directly support the SOTA and laterality-consistency claims.
Revision: yes
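One standard way to supply the promised significance evidence is a paired bootstrap over per-report metric differences. The sketch below uses synthetic scores (not the paper's data) to produce a mean difference, a 95% confidence interval, and an approximate p-value against the null that model A does not beat baseline B.

```python
# Paired bootstrap for a per-report metric difference (model A vs. baseline B).
# Scores are synthetic; in practice they would be per-report RadGraph F1, etc.
import random

def paired_bootstrap(a, b, n_boot=2000, seed=0):
    """Return (mean_diff, (ci_lo, ci_hi), p) for paired score lists a, b.

    p approximates the probability that a resampled mean difference is <= 0,
    i.e. evidence against 'A beats B'.
    """
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    means = []
    for _ in range(n_boot):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    ci = (means[int(0.025 * n_boot)], means[int(0.975 * n_boot)])
    p = sum(m <= 0 for m in means) / n_boot
    return sum(diffs) / n, ci, p

# Synthetic per-report scores where A is consistently slightly better.
a = [0.62, 0.71, 0.58, 0.66, 0.73, 0.69, 0.64, 0.70, 0.61, 0.68] * 5
b = [0.58, 0.65, 0.55, 0.63, 0.70, 0.64, 0.60, 0.66, 0.59, 0.65] * 5
mean_diff, ci, p = paired_bootstrap(a, b)
```

Pairing by report matters: per-report differences remove between-case variance, which is why this test is more sensitive than comparing two corpus-level averages.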
-
Referee: [Experiments] Experiments: Potential circularity between reward signals and evaluation metrics (both drawn from entity-extraction and label-accuracy tools) is not addressed; explicit discussion or an independent held-out clinical metric is required to rule out reward hacking.
Authors: We acknowledge the need for explicit discussion of this issue. The revised manuscript will add a dedicated paragraph in the Experiments section that analyzes the overlap between reward signals and evaluation metrics and explains why the clinical grounding of the rewards reduces the risk of pure metric hacking. We will also expand the existing blinded clinician evaluation (already performed on a held-out set) to serve as an independent validation metric not used during reward computation.
Revision: partial
Circularity Check
No significant circularity; claims rest on external empirical validation
Full rationale
The paper's derivation chain consists of a multi-agent RL framework whose policy is optimized via clinically verifiable rewards and then evaluated on standard public datasets (MIMIC-CXR, IU X-ray) using independent automated metrics (RadGraph, CheXbert, GREEN) plus blinded clinician review. No equation or step reduces a claimed prediction to a fitted input by construction, nor does any load-bearing premise collapse to a self-citation whose content is itself unverified. The reward design is presented as an external modeling choice rather than a tautological re-expression of the evaluation scores.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Clinically verifiable rewards can be defined to measure and optimize report quality for the multi-agent system.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
The relation between this paper passage and the cited Recognition theorem is unclear.
Paper passage: "The final reward is defined as the unweighted sum of these three components [CheXbert, RadGraph F1, ROUGE-L]"
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
The relation between this paper passage and the cited Recognition theorem is unclear.
Paper passage: "MARL-Rad consists of region-specific agents ... and a global integrating agent ... jointly optimized through reinforcement learning"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
ARDGen: Augmentation regularization for domain- generalized medical report generation
Syed Bilal Ahsan, Muhammad Ikhalas, Muhammad Muza- mil Khan, Sana Ullah, and Muhammad Zaigham Za- heer. ARDGen: Augmentation regularization for domain- generalized medical report generation. InIEEE/CVF Con- ference on Computer Vision and Pattern Recognition Work- shops, pages 6526–6535, 2025. 1, 3, 4
work page 2025
-
[2]
Daniel Joseph Alapat, Malavika Venu Menon, and Sharmila Ashok. A review on detection of pneumonia in chest X- ray images using neural networks.Journal of Biomedical Physics and Engineering, 12(6):551–558, 2022. 1
work page 2022
-
[3]
Multi-resolution pathology-language pre-training model with text-guided visual representation
Shahad Albastaki, Anabia Sohail, Iyyakutti Iyappan Gana- pathi, Basit Alawode, Asim Khan, Sajid Javed, Naoufel Werghi, Mohammed Bennamoun, and Arif Mahmood. Multi-resolution pathology-language pre-training model with text-guided visual representation. InIEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 25907–25919, 2025. 3
work page 2025
-
[4]
Kaito Baba, Ryota Yagi, Junichiro Takahashi, Risa Kishikawa, and Satoshi Kodera. JRadiEvo: A japanese radiology report generation model enhanced by evolu- tionary optimization of model merging.arXiv preprint arXiv:2411.09933, 2024. 1, 4
-
[5]
Kaito Baba, Chaoran Liu, Shuhei Kurita, and Akiyoshi San- nai. Prover Agent: An agent-based framework for formal mathematical proofs.arXiv preprint arXiv:2506.19923,
-
[6]
METEOR: An auto- matic metric for MT evaluation with improved correlation with human judgments
Satanjeev Banerjee and Alon Lavie. METEOR: An auto- matic metric for MT evaluation with improved correlation with human judgments. InProceedings of the ACL Work- shop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 65–72. Association for Computational Linguistics, 2005. 4
work page 2005
-
[7]
Shruthi Bannur, Kenza Bouzid, Daniel C. Castro, An- ton Schwaighofer, Anja Thieme, Sam Bond-Taylor, Max- imilian Ilse, Fernando P ´erez-Garc´ıa, Valentina Salvatelli, Harshita Sharma, Felix Meissen, Mercy Ranjit, Shaury Sri- vastav, Julia Gong, Noel C. F. Codella, Fabian Falck, Ozan Oktay, Matthew P. Lungren, Maria Teodora Wetscherek, Javier Alvarez-Valle...
-
[8]
Xiaolei Bo, Feiyang Yang, Feilong Xu, and Xiaoli Zhang. Cross-counter-repeat attention for enhanced understanding of visual semantics in radiology report generation. InPro- ceedings of the 33rd ACM International Conference on Multimedia, pages 4242–4250. Association for Computing Machinery, 2025. 3
work page 2025
-
[9]
Baselines for chest X-ray report generation
William Boag, Tzu-Ming Harry Hsu, Matthew Mcdermott, Gabriela Berner, Emily Alesentzer, and Peter Szolovits. Baselines for chest X-ray report generation. InProceed- ings of the Machine Learning for Health NeurIPS Work- shop, pages 126–140. PMLR, 2020. 5, 6
work page 2020
-
[10]
G. W. L. Boland, A. S. Guimaraes, and P. R. Mueller. Ra- diology report turnaround: expectations and solutions.Eu- ropean Radiology, 18(7):1326–1328, 2008. 1
work page 2008
-
[11]
Imaging the chest: The chest radiograph
Joshua Broder. Imaging the chest: The chest radiograph. InDiagnostic Imaging for the Emergency Physician, pages 185–296. Elsevier, 2011. 1
work page 2011
-
[12]
Sema Candemir and Sameer Antani. A review on lung boundary detection in chest X-rays.International Journal of Computer Assisted Radiology and Surgery, 14(4):563– 576, 2019. 1
work page 2019
-
[13]
Spatialvlm: Endow- ing vision-language models with spatial reasoning capabil- ities
Boyuan Chen, Zhuo Xu, Sean Kirmani, Brain Ichter, Dorsa Sadigh, Leonidas Guibas, and Fei Xia. Spatialvlm: Endow- ing vision-language models with spatial reasoning capabil- ities. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14455–14465, 2024. 7
work page 2024
-
[14]
Pan, Wen Zhang, Huajun Chen, Fan Yang, Zenan Zhou, and Weipeng Chen
Mingyang Chen, Linzhuang Sun, Tianpeng Li, sunhaoze, ZhouYijie, Chenzheng Zhu, Haofen Wang, Jeff Z. Pan, Wen Zhang, Huajun Chen, Fan Yang, Zenan Zhou, and Weipeng Chen. ReSearch: Learning to reason with search for LLMs via reinforcement learning. InThe Thirty-ninth Annual Conference on Neural Information Processing Sys- tems, 2025. 2
work page 2025
-
[15]
Generating radiology reports via memory- driven transformer
Zhihong Chen, Yan Song, Tsung-Hui Chang, and Xi- ang Wan. Generating radiology reports via memory- driven transformer. InProceedings of the 2020 Confer- ence on Empirical Methods in Natural Language Process- ing (EMNLP), pages 1439–1449. Association for Compu- tational Linguistics, 2020. 2, 4, 5, 6
work page 2020
-
[16]
Cross-modal memory networks for radiology report gener- ation
Zhihong Chen, Yaling Shen, Yan Song, and Xiang Wan. Cross-modal memory networks for radiology report gener- ation. InProceedings of the 59th Annual Meeting of the As- sociation for Computational Linguistics and the 11th Inter- national Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 5904–5914. Association for Computational L...
work page 2021
-
[17]
CheXa- gent: Towards a foundation model for chest X-ray interpre- tation
Zhihong Chen, Maya Varma, Jean-Benoit Delbrouck, Mag- dalini Paschali, Louis Blankemeier, Dave Van Veen, Jeya Maria Jose Valanarasu, Alaa Youssef, Joseph Paul Co- hen, Eduardo Pontes Reis, Emily Tsai, Andrew Johnston, Cameron Olsen, Tanishq Mathew Abraham, Sergios Ga- tidis, Akshay S Chaudhari, and Curtis Langlotz. CheXa- gent: Towards a foundation model ...
work page 2024
-
[18]
Zhuoxiao Chen, Hongyang Yu, Ying Xu, Yadan Luo, Long Duong, and Yuan-Fang Li. OraPO: Oracle-educated rein- forcement learning for data-efficient and factual radiology report generation.arXiv preprint arXiv:2509.18600, 2025. 1, 3
-
[19]
SpatialRGPT: Grounded spatial reasoning in vision- language models
An-Chieh Cheng, Hongxu Yin, Yang Fu, Qiushan Guo, Ruihan Yang, Jan Kautz, Xiaolong Wang, and Sifei Liu. SpatialRGPT: Grounded spatial reasoning in vision- language models. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. 7
work page 2024
-
[20]
Noel C. F. Codella, Ying Jin, Shrey Jain, Yu Gu, Ho Hin Lee, Asma Ben Abacha, Alberto Santamaria-Pang, Will Guyman, Naiteek Sangani, Sheng Zhang, Hoifung Poon, Stephanie Hyland, Shruthi Bannur, Javier Alvarez-Valle, Xue Li, John Garrett, Alan McMillan, Gaurav Rajguru, Madhu Maddi, Nilesh Vijayrania, Rehaan Bhimai, Nick Mecklenburg, Rupal Jain, Daniel Hols...
-
[21]
Ian A. Cowan, Sharyn L. S. MacDonald, and Richard A. Floyd. Measuring and managing radiologist workload: measuring radiologist reporting times using data from a ra- diology information system.Journal of Medical Imaging and Radiation Oncology, 57(5):558–566, 2013. 1
work page 2013
-
[22]
Daniel Coelho de Castro, Aurelia Bustos, Shruthi Ban- nur, Stephanie L. Hyland, Kenza Bouzid, Maria Teodora Wetscherek, Maria Dolores S ´anchez-Valverde, Lara Jaques-P´erez, Lourdes P ´erez-Rodr´ıguez, Kenji Takeda, Jos´e Mar´ıa Salinas-Serrano, Javier Alvarez-Valle, Joaqu´ın Galant-Herrero, and Antonio Pertusa. PadChest-GR: A bilingual chest X-ray datase...
work page 2025
-
[23]
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI. DeepSeek-R1: Incentivizing reasoning ca- pability in LLMs via reinforcement learning.arXiv preprint arXiv:2501.12948, 2025. 1
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[24]
Automated structured radiology report generation
Jean-Benoit Delbrouck, Justin Xu, Johannes Moll, Alois Thomas, Zhihong Chen, Sophie Ostmeier, Asfandyar Azhar, Kelvin Zhenghao Li, Andrew Johnston, Christian Bluethgen, Eduardo Pontes Reis, Mohamed S Muneer, Maya Varma, and Curtis Langlotz. Automated structured radiology report generation. InProceedings of the 63rd An- nual Meeting of the Association for ...
work page 2025
-
[25]
Dina Demner-Fushman, Marc D. Kohli, Marc B. Rosen- man, Sonya E. Shooshan, Laritza Rodriguez, Sameer An- tani, George R. Thoma, and Clement J. McDonald. Prepar- ing a collection of radiology examinations for distribution and retrieval.Journal of the American Medical Informatics Association, 23(2):304–310, 2015. 1, 3, 4, 6, 7
work page 2015
-
[26]
Fei Dong, Shouping Nie, Manling Chen, Fangfang Xu, and Qian Li. Keyword-based ai assistance in the generation of radiology reports: A pilot study.npj Digital Medicine, 8 (1):490, 2025. 1, 3
work page 2025
-
[27]
Guanting Dong, Yifei Chen, Xiaoxi Li, Jiajie Jin, Hongjin Qian, Yutao Zhu, Hangyu Mao, Guorui Zhou, Zhicheng Dou, and Ji-Rong Wen. Tool-Star: Empowering LLM- brained multi-tool reasoner via reinforcement learning. arXiv preprint arXiv:2505.16410, 2025. 2
-
[28]
Elboardy, Ghada Khoriba, and Essam A
Ahmed T. Elboardy, Ghada Khoriba, and Essam A. Rashed. Medical AI consensus: A multi-agent framework for ra- diology report generation and evaluation.arXiv preprint arXiv:2509.17353, 2025. 1, 3
-
[29]
ReTool: Reinforcement learning for strate- gic tool use in LLMs, 2025
Jiazhan Feng, Shijue Huang, Xingwei Qu, Ge Zhang, Yu- jia Qin, Baoquan Zhong, Chengquan Jiang, Jinxin Chi, and Wanjun Zhong. ReTool: Reinforcement learning for strate- gic tool use in LLMs, 2025. 2
work page 2025
-
[30]
Anna Fink, Alexander Rau, Marco Reisert, Fabian Bam- berg, and Maximilian F. Russe. Retrieval-augmented gen- eration with large language models in radiology: From the- ory to practice.Radiology: Artificial Intelligence, 7(4): e240790, 2025. 3
work page 2025
-
[31]
Google. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next genera- tion agentic capabilities.arXiv preprint arXiv:2507.06261,
work page internal anchor Pith review Pith/arXiv arXiv
-
[32]
FactCheXcker: Mitigating measurement hallucinations in chest X-ray report genera- tion models
Alice Heiman, Xiaoman Zhang, Emma Chen, Sung Eun Kim, and Pranav Rajpurkar. FactCheXcker: Mitigating measurement hallucinations in chest X-ray report genera- tion models. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 30787–30796, 2025. 3
work page 2025
-
[33]
MetaGPT: Meta programming for a multi- agent collaborative framework
Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and J ¨urgen Schmidhuber. MetaGPT: Meta programming for a multi- agent collaborative framework. InThe Twelfth Interna- tional Conference on Learning Representations...
work page 2024
-
[34]
RADAR: Enhancing radiology report generation with supplementary knowledge injection
Wenjun Hou, Yi Cheng, Kaishuai Xu, Heng Li, Yan Hu, Wenjie Li, and Jiang Liu. RADAR: Enhancing radiology report generation with supplementary knowledge injection. InProceedings of the 63rd Annual Meeting of the Associ- ation for Computational Linguistics (Volume 1: Long Pa- pers), pages 26366–26381. Association for Computational Linguistics, 2025. 3
work page 2025
-
[35]
RRG-Mamba: Efficient radiology report gener- ation with state space model
Xiaodi Hou, Xiaobo Li, Mingyu Lu, Simiao Wang, and Yi- jia Zhang. RRG-Mamba: Efficient radiology report gener- ation with state space model. InProceedings of the Thirty- Fourth International Joint Conference on Artificial Intel- ligence, IJCAI-25, pages 7410–7418. International Joint Conferences on Artificial Intelligence Organization, 2025
work page 2025
-
[36]
Xuege Hou, Yali Li, and Shengjin Wang. Knowledge- driven query network with adaptive cross-view attention for structured radiology report generation. InIEEE/CVF Inter- national Conference on Computer Vision Workshops, pages 1234–1243, 2025. 3, 4, 5, 6
work page 2025
-
[37]
OWL: Optimized workforce learning for general multi-agent assistance in real-world task automation
Mengkang Hu, Yuhang Zhou, Wendong Fan, Yuzhou Nie, Ziyu Ye, Bowei Xia, Tao Sun, Zhaoxuan Jin, Yingru Li, Zeyu Zhang, Yifeng Wang, Qianshuo Ye, Bernard Ghanem, Ping Luo, and Guohao Li. OWL: Optimized workforce learning for general multi-agent assistance in real-world task automation. InThe Thirty-ninth Annual Conference on Neural Information Processing Sys...
work page 2025
-
[38]
Shih-Cheng Huang, Liyue Shen, Matthew P. Lungren, and Serena Yeung. GLoRIA: A multimodal global-local rep- resentation learning framework for label-efficient medical image recognition. In2021 IEEE/CVF International Con- ference on Computer Vision (ICCV), pages 3922–3931,
-
[39]
Xiaofei Huang, Wenting Chen, Jie Liu, Qisheng Lu, Xi- aoling Luo, and Linlin Shen. DAMPER: A dual-stage medical report generation framework with coarse-grained mesh alignment and fine-grained hypergraph matching. AAAI Conference on Artificial Intelligence, 39(4):3769– 3778, 2025. 4, 5, 6
work page 2025
-
[40]
CmEAA: Cross-modal enhancement and alignment adapter for radiology report generation
Xiyang Huang, Yingjie Han, Yx L, Runzhi Li, Pengcheng Wu, and Kunli Zhang. CmEAA: Cross-modal enhancement and alignment adapter for radiology report generation. In 10 Proceedings of the 31st International Conference on Com- putational Linguistics, pages 8546–8556. Association for Computational Linguistics, 2025. 3
work page 2025
-
[41]
Kiut: Knowledge-injected u-transformer for radiology re- port generation
Zhongzhen Huang, Xiaofan Zhang, and Shaoting Zhang. Kiut: Knowledge-injected u-transformer for radiology re- port generation. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19809–19818, 2023. 4, 5, 6
work page 2023
-
[42]
Hyland, Shruthi Bannur, Kenza Bouzid, Daniel C
Stephanie L. Hyland, Shruthi Bannur, Kenza Bouzid, Daniel C. Castro, Mercy Ranjit, Anton Schwaighofer, Fernando P ´erez-Garc´ıa, Valentina Salvatelli, Shaury Sri- vastav, Anja Thieme, Noel Codella, Matthew P. Lun- gren, Maria Teodora Wetscherek, Ozan Oktay, and Javier Alvarez-Valle. MAIRA-1: A specialised large multimodal model for radiology report genera...
-
[43]
RadGraph: Extracting clinical entities and relations from radiology reports
Saahil Jain, Ashwin Agrawal, Adriel Saporta, Steven Truong, Du Nguyen Duong Nguyen Duong, Tan Bui, Pierre Chambon, Yuhao Zhang, Matthew Lungren, Andrew Ng, Curtis Langlotz, Pranav Rajpurkar, and Pranav Rajpurkar. RadGraph: Extracting clinical entities and relations from radiology reports. InProceedings of the Neural Information Processing Systems Track on...
-
[44]
Feibo Jiang, Cunhua Pan, Li Dong, Kezhi Wang, Octavia A. Dobre, and Merouane Debbah. From large AI models to agentic AI: A tutorial on future intelligent communications. arXiv preprint arXiv:2505.22311, 2025. 1, 2
-
[45]
Hanqi Jiang, Xixuan Hao, Yuzhou Huang, Chong Ma, Jiaxun Zhang, Yi Pan, and Ruimao Zhang. Advanc- ing medical radiograph representation learning: A hybrid pre-training paradigm with multilevel semantic granularity. InEuropean Conference on Computer Vision Workshops, pages 16–33, 2025. 3
work page 2025
-
[46]
CoMT: Chain-of-medical-thought reduces hallucination in medical report generation
Yue Jiang, Jiawei Chen, Dingkang Yang, Mingcheng Li, Shunli Wang, Tong Wu, Ke Li, and Lihua Zhang. CoMT: Chain-of-medical-thought reduces hallucination in medical report generation. InIEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5, 2025. 3
work page 2025
-
[47]
Peiyuan Jing, Kinhei Lee, Zhenxuan Zhang, Huichi Zhou, Zhengqing Yuan, Zhifan Gao, Lei Zhu, Giorgos Papanasta- siou, Yingying Fang, and Guang Yang. Reason like a radiol- ogist: Chain-of-thought and reinforcement learning for ver- ifiable report generation.arXiv preprint arXiv:2504.18453,
-
[48]
Alistair E. W. Johnson, Tom J. Pollard, Seth J. Berkowitz, Nathaniel R. Greenbaum, Matthew P. Lungren, Chih-ying Deng, Roger G. Mark, and Steven Horng. MIMIC-CXR, a de-identified publicly available database of chest radio- graphs with free-text reports.Scientific Data, 6(1):317,
-
[49]
Hamza Kalisch, Fabian H ¨orst, Jens Kleesiek, Ken Her- rmann, and Constantin Seibold. CT-GRAPH: Hierarchical graph attention network for anatomy-guided CT report gen- eration.arXiv preprint arXiv:2508.05375, 2025. 3
-
[50]
MDA- gents: An adaptive collaboration of LLMs for medical decision-making
Yubin Kim, Chanwoo Park, Hyewon Jeong, Yik Siu Chan, Xuhai Xu, Daniel McDuff, Hyeonhoon Lee, Marzyeh Ghassemi, Cynthia Breazeal, and Hae Won Park. MDA- gents: An adaptive collaboration of LLMs for medical decision-making. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. 1
work page 2024
-
[51]
Yunsoo Kim, Jinge Wu, Su Hwan Kim, Pardeep Vasudev, Jiashu Shen, and Honghan Wu. Look & mark: Leveraging radiologist eye fixations and bounding boxes in multimodal large language models for chest X-ray report generation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 17680–17694. Association for Computa- tional Linguistics, 2025. 3
work page 2025
-
[52]
Anis Koubaa. From pre-trained language models to agen- tic AI: Evolution and architectures for autonomous intelli- gence.Preprints, 2025. 1, 2
work page 2025
-
[53]
Efficient memory management for large language model serving with pagedattention
Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph Gonzalez, Hao Zhang, and Ion Stoica. Efficient memory management for large language model serving with pagedattention. In Proceedings of the 29th Symposium on Operating Systems Principles, pages 611–626. Association for Computing Ma- chinery, 2023. 5
work page 2023
-
[54]
Yuxiang Lai, Jike Zhong, Ming Li, Shitian Zhao, Yuheng Li, Konstantinos Psounis, and Xiaofeng Yang. Med- R1: Reinforcement learning for generalizable medical reasoning in vision-language models.arXiv preprint arXiv:2503.13939, 2025. 3
-
[55]
Kyeongkyu Lee, Seonghwan Yoon, and Hongki Lim. CLARIFID: Improving radiology report generation by re- inforcing clinically accurate impressions and enforcing de- tailed findings.arXiv preprint arXiv:2507.17234, 2025. 1, 3, 4, 5
-
[56]
Seowoo Lee, Jiwon Youn, Hyungjin Kim, Mansu Kim, and Soon Ho Yoon. CXR-LLaV A: a multimodal large language model for interpreting chest X-ray images.European Radi- ology, 35(7):4374–4386, 2025. 3
work page 2025
-
[57]
CAMEL: Com- municative agents for ”mind” exploration of large language model society
Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. CAMEL: Com- municative agents for ”mind” exploration of large language model society. InThirty-seventh Conference on Neural In- formation Processing Systems, 2023. 1, 2
work page 2023
-
[58]
Con- trastive learning with counterfactual explanations for radi- ology report generation
Mingjie Li, Haokun Lin, Liang Qiu, Xiaodan Liang, Ling Chen, Abdulmotaleb Elsaddik, and Xiaojun Chang. Con- trastive learning with counterfactual explanations for radi- ology report generation. InEuropean Conference on Com- puter Vision, pages 162–180, 2024. 4, 5, 6
work page 2024
-
[59]
Xinyi Li, Sai Wang, Siqi Zeng, Yu Wu, and Yi Yang. A survey on llm-based multi-agent systems: workflow, infras- tructure, and challenges.Vicinagearth, 1(1):9, 2024. 1, 2
work page 2024
-
[60]
ToRL: Scal- ing tool-integrated RL.arXiv preprint arXiv:2503.23383,
Xuefeng Li, Haoyang Zou, and Pengfei Liu. ToRL: Scal- ing tool-integrated RL.arXiv preprint arXiv:2503.23383,
-
[61]
Yilin Li, Chao Kong, Guosheng Zhao, and Zijian Zhao. Au- tomatic radiology report generation with deep learning: a comprehensive review of methods and advances.Artificial Intelligence Review, 58(11):344, 2025. 2
work page 2025
-
[62]
Yingshu Li, Yunyi Liu, Zhanyu Wang, Xinyu Liang, Lingqiao Liu, Lei Wang, and Luping Zhou. S-RRG-Bench: 11 Structured radiology report generation with fine-grained evaluation framework.Meta-Radiology, page 100171,
-
[63]
Zhuofeng Li, Haoxiang Zhang, Seungju Han, Sheng Liu, Jianwen Xie, Yu Zhang, Yejin Choi, James Zou, and Pan Lu. In-the-flow agentic system optimization for effective planning and tool use.arXiv preprint arXiv:2510.05592,
-
[64]
Encouraging divergent thinking in large language mod- els through multi-agent debate
Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Shuming Shi, and Zhaopeng Tu. Encouraging divergent thinking in large language mod- els through multi-agent debate. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 17889–17904. Association for Compu- tational Linguistics, 2024. 2
work page 2024
-
[65]
ROUGE: A package for automatic evalua- tion of summaries
Chin-Yew Lin. ROUGE: A package for automatic evalua- tion of summaries. InText Summarization Branches Out, pages 74–81. Association for Computational Linguistics,
-
[66]
Qika Lin, Yifan Zhu, Bin Pu, Ling Huang, Haoran Luo, Jingying Ma, Zhen Peng, Tianzhe Zhao, Fangzhi Xu, Jian Zhang, Kai He, Zhonghong Ou, Swapnil Mishra, and Mengling Feng. A foundation model for chest X-ray inter- pretation with grounded reasoning via online reinforcement learning.arXiv preprint arXiv:2509.03906, 2025. 1, 3, 4, 5
[67] Chang Liu, Yuanhe Tian, Weidong Chen, Yan Song, and Yongdong Zhang. Bootstrapping large language models for radiology report generation. AAAI Conference on Artificial Intelligence, 38(17):18635–18643, 2024. 4, 5, 6
[68] Fenglin Liu, Shen Ge, and Xian Wu. Competence-based multimodal curriculum learning for medical report generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 3001–3012. Association for Computational Linguistics, 2021.
[69] Fenglin Liu, Xian Wu, Shen Ge, Wei Fan, and Yuexian Zou. Exploring and distilling posterior and prior knowledge for radiology report generation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13748–13757, 2021. 5, 6
[70] Guanxiong Liu, Tzu-Ming Harry Hsu, Matthew McDermott, Willie Boag, Wei-Hung Weng, Peter Szolovits, and Marzyeh Ghassemi. Clinically accurate chest X-ray report generation. In Proceedings of the 4th Machine Learning for Healthcare Conference, pages 249–269. PMLR, 2019. 5, 6
[71] Kang Liu, Zhuoqi Ma, Xiaolu Kang, Zhusi Zhong, Zhicheng Jiao, Grayson Baird, Harrison Bai, and Qiguang Miao. Structural entities extraction and patient indications incorporation for chest X-ray report generation. In Proceedings of Medical Image Computing and Computer Assisted Intervention. Springer Nature Switzerland, 2024. 4, 5
[72] Kang Liu, Zhuoqi Ma, Xiaolu Kang, Yunan Li, Kun Xie, Zhicheng Jiao, and Qiguang Miao. Enhanced contrastive learning with multi-view longitudinal data for chest X-ray report generation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10348–10359, 2025. 1, 3, 5
[73] Rui Liu, Mingjie Li, Shen Zhao, Ling Chen, Xiaojun Chang, and Lina Yao. In-context learning for zero-shot medical report generation. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 8721–8730. Association for Computing Machinery, 2024. 4, 5, 6
[74] Tengfei Liu, Jiapu Wang, Yongli Hu, Mingjie Li, Junfei Yi, Xiaojun Chang, Junbin Gao, and Baocai Yin. HC-LLM: Historical-constrained large language models for radiology report generation. In AAAI Conference on Artificial Intelligence, pages 5595–5603, 2025. 3
[75] Xiaohong Liu, Hao Liu, Guoxing Yang, Zeyu Jiang, Shuguang Cui, Zhaoze Zhang, Huan Wang, Liyuan Tao, Yongchang Sun, Zhu Song, Tianpei Hong, Jin Yang, Tianrun Gao, Jiangjiang Zhang, Xiaohu Li, Jing Zhang, Ye Sang, Zhao Yang, Kanmin Xue, Song Wu, Ping Zhang, Jian Yang, Chunli Song, and Guangyu Wang. A generalist medical language model for disease diagnos...
[76] Zhizhe Liu, Zhenfeng Zhu, Shuai Zheng, Yawei Zhao, Kunlun He, and Yao Zhao. From observation to concept: A flexible multi-view paradigm for medical report generation. IEEE Transactions on Multimedia, 26:5987–5995, 2024. 4, 5, 6
[77] Zichen Liu, Changyu Chen, Wenjun Li, Penghui Qi, Tianyu Pang, Chao Du, Wee Sun Lee, and Min Lin. Understanding R1-Zero-like training: A critical perspective. In Second Conference on Language Modeling, 2025. 2
[78] Zihe Liu, Jiashun Liu, Yancheng He, Weixun Wang, Jiaheng Liu, Ling Pan, Xinyu Hu, Shaopan Xiong, Ju Huang, Jian Hu, Shengyi Huang, Johan Obando-Ceron, Siran Yang, Jiamang Wang, Wenbo Su, and Bo Zheng. Part I: Tricks or traps? A deep dive into RL for LLM reasoning. arXiv preprint arXiv:2508.08221, 2025. 2
[79] Jinhui Lou, Yan Yang, Zhou Yu, Zhenqi Fu, Weidong Han, Qingming Huang, and Jun Yu. CXRAgent: Director-orchestrated multi-stage reasoning for chest X-ray interpretation. arXiv preprint arXiv:2510.21324, 2025. 1, 3
[80] Chong Ma, Hanqi Jiang, Wenting Chen, Yiwei Li, Zihao Wu, Xiaowei Yu, Zhengliang Liu, Lei Guo, Dajiang Zhu, Tuo Zhang, Dinggang Shen, Tianming Liu, and Xiang Li. Eye-gaze guided multi-modal alignment for medical representation learning. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. 3