NeuroSymb-MRG: Differentiable Abductive Reasoning with Active Uncertainty Minimization for Radiology Report Generation

Rong Fu, Yiqing Lyu, Chunlei Meng, Muge Qi, Yabin Jin, Qi Zhao, Li Bao, Juntao Gao, Fuqian Shi, Nilanjan Dey, Wei Luo, Simon Fong

arXiv: 2603.01756 · v2 · submitted 2026-03-02 · cs.CV

Pith reviewed 2026-05-15 18:21 UTC · model grok-4.3

Keywords: radiology report generation, neuro-symbolic reasoning, abductive reasoning, uncertainty minimization, factual consistency, differentiable logic, clinical concept extraction, active sampling

The pith

NeuroSymb-MRG generates radiology reports with higher factual consistency by mapping images to probabilistic concepts and composing differentiable abductive reasoning chains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces NeuroSymb-MRG as a framework that first extracts probabilistic clinical concepts from image features, then assembles them into explicit multi-hop reasoning chains using differentiable logic rules. These chains are decoded into templated report clauses and further refined through retrieval and constrained language model editing. An active sampling loop identifies high-uncertainty rules for clinician review and prompt refinement. This setup targets the factual inconsistencies and visual-linguistic biases that limit existing encoder-decoder and retrieval-based report generators. If the approach holds, automated reports would align more closely with clinical truth while still allowing human oversight on edge cases.

Core claim

NeuroSymb-MRG integrates neuro-symbolic abductive reasoning with active uncertainty minimization to produce structured, clinically grounded reports. It maps image features to probabilistic clinical concepts, composes differentiable logic-based reasoning chains, decodes those chains into templated clauses, and refines the textual output via retrieval and constrained language-model editing, with an active sampling loop driven by rule-level uncertainty guiding clinician-in-the-loop adjudication.
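The summary does not specify how the retrieval refinement step works. One common way to realize such a step is nearest-neighbour lookup. Each decoded clause is embedded, and the most cosine-similar sentence is pulled from a bank of reference-report sentences. The sketch below is an assumption, not the paper's method. The names `cosine` and `retrieve_top_k` and the toy two-dimensional embeddings are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical bank entry: (reference sentence, its embedding vector).
BankEntry = Tuple[str, Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def retrieve_top_k(query: Sequence[float], bank: List[BankEntry], k: int = 1) -> List[Tuple[str, float]]:
    """Return the k (sentence, score) pairs whose embeddings best match the query."""
    scored = [(text, cosine(query, emb)) for text, emb in bank]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy embeddings for illustration only.
    bank = [("No pleural effusion.", [1.0, 0.0]), ("Mild cardiomegaly.", [0.0, 1.0])]
    print(retrieve_top_k([0.9, 0.1], bank))
```

In a real system the embeddings would come from a trained text encoder, and the retrieved sentence would still pass through the constrained editing stage before it reaches the report.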

What carries the argument

Differentiable abductive reasoning chains that compose logic steps from probabilistic clinical concepts derived from image features, paired with a rule-level uncertainty-driven active sampling loop for iterative refinement.
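This summary does not spell out how the paper keeps its logic differentiable. One standard approach is fuzzy logic: the product t-norm serves as AND and the probabilistic sum as OR, so every rule becomes a smooth function of the concept probabilities. This is assumed here, not taken from the paper. Fuzzy logic-gate networks such as [19] use the Zadeh (min) t-norm instead. The rule format, concept names, and `forward_chain` below are hypothetical. The rules are toy examples, not clinical guidance.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical rule format: (premise concepts, conclusion concept, rule weight in [0, 1]).
Rule = Tuple[List[str], str, float]


def t_and(values: List[float]) -> float:
    """Product t-norm: a smooth conjunction of truth values in [0, 1]."""
    return math.prod(values)


def t_or(a: float, b: float) -> float:
    """Probabilistic sum (the dual t-conorm): a smooth disjunction."""
    return a + b - a * b


def forward_chain(concepts: Dict[str, float], rules: List[Rule], hops: int = 2) -> Dict[str, float]:
    """Soft forward chaining for multi-hop reasoning.

    Each hop re-derives every conclusion from the perceived concepts plus the
    previous hop's conclusions, OR-ing together all rules that share a
    conclusion. For acyclic rules, setting `hops` to the longest chain length
    reaches the fixed point.
    """
    state = dict(concepts)
    for _ in range(hops):
        derived = dict(concepts)  # restart from perception so evidence is not double-counted
        for premises, conclusion, weight in rules:
            fired = weight * t_and([state.get(p, 0.0) for p in premises])
            derived[conclusion] = t_or(derived.get(conclusion, 0.0), fired)
        state = derived
    return state


if __name__ == "__main__":
    perceived = {"opacity": 0.9, "air_bronchogram": 0.8, "lobar_distribution": 0.7}
    toy_rules: List[Rule] = [
        (["opacity", "air_bronchogram"], "consolidation", 1.0),
        (["consolidation", "lobar_distribution"], "lobar_consolidation_pattern", 0.9),
    ]
    # Second hop: 0.9 * (0.9 * 0.8) * 0.7 = 0.4536
    print(forward_chain(perceived, toy_rules))
```

Every operation is a product or a sum. In an autodiff framework, gradients from a report-level loss would therefore flow back through the chain into the concept probabilities.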

If this is right

Generated reports exhibit higher factual consistency than representative encoder-decoder and retrieval baselines.
Standard language metrics such as BLEU and METEOR improve alongside the consistency gains.
The active sampling loop enables targeted clinician feedback that refines the promptbook without full retraining.
Explicit reasoning chains reduce vulnerability to visual-linguistic biases that distort report content.
Templated clause decoding produces structured output that supports downstream clinical verification.
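The summary says reasoning chains are decoded into templated clauses but gives no template scheme. The minimal sketch below assumes a three-way scheme (present, absent, possible) with hypothetical thresholds of 0.7 and 0.3. The function `decode_clauses` and its phrasing are illustrative only.

```python
from typing import Dict, List


def decode_clauses(truth: Dict[str, float], templates: Dict[str, str],
                   pos: float = 0.7, neg: float = 0.3) -> List[str]:
    """Turn soft truth values into fixed-form report clauses.

    The three-way scheme and the `pos`/`neg` thresholds are illustrative
    assumptions. Concepts missing from `truth` are treated as absent (0.0).
    """
    clauses = []
    for concept, phrase in templates.items():
        p = truth.get(concept, 0.0)
        if p >= pos:
            clauses.append(f"{phrase.capitalize()} is present.")
        elif p <= neg:
            clauses.append(f"No {phrase}.")
        else:
            clauses.append(f"Possible {phrase}.")
    return clauses


if __name__ == "__main__":
    truth = {"effusion": 0.9, "pneumothorax": 0.1, "edema": 0.5}
    templates = {"effusion": "pleural effusion", "pneumothorax": "pneumothorax",
                 "edema": "pulmonary edema"}
    print(decode_clauses(truth, templates))
    # ['Pleural effusion is present.', 'No pneumothorax.', 'Possible pulmonary edema.']
```

Each clause comes from a fixed template tied to one concept. A checker can therefore map every sentence back to the concept and truth value that produced it, which is the structure that supports downstream verification.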

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same image-to-concept plus differentiable-chain pattern could support report generation in other imaging-heavy specialties such as pathology.
Uncertainty sampling may prove useful in low-data medical tasks where full supervision is expensive.
Constrained editing after symbolic decoding offers a route to keep large language models inside clinical safety bounds.
If the concept probabilities prove stable across scanners, the framework could transfer across hospital sites with minimal retraining.
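The summary does not describe how the editing step is constrained. One simple guard is sketched below under that assumption. An LLM rewrite is accepted only if it keeps every finding the symbolic stage decoded as present and affirms no finding decoded as absent. The check is a toy: `violates_constraints` is a hypothetical name, and substring matching with three negation cues is far weaker than a real clinical negation detector.

```python
import re
from typing import Iterable, List

# Toy negation cues; real clinical negation detection is far more involved.
NEGATION = re.compile(r"\b(no|without|negative for)\b")


def split_sentences(text: str) -> List[str]:
    """Lower-case the text and split it on sentence-final punctuation."""
    return [s.strip() for s in re.split(r"[.!?]", text.lower()) if s.strip()]


def affirmed(sentences: List[str], finding: str) -> bool:
    """True if some sentence mentions the finding without a negation cue."""
    return any(finding in s and not NEGATION.search(s) for s in sentences)


def violates_constraints(edited: str, present: Iterable[str], absent: Iterable[str]) -> bool:
    """Reject an edit that drops or negates a present finding, or affirms an absent one."""
    sentences = split_sentences(edited)
    if any(not affirmed(sentences, f.lower()) for f in present):
        return True
    return any(affirmed(sentences, f.lower()) for f in absent)


if __name__ == "__main__":
    present, absent = ["cardiomegaly"], ["pleural effusion"]
    print(violates_constraints("Mild cardiomegaly. No pleural effusion.", present, absent))  # False
    print(violates_constraints("Mild cardiomegaly. Small pleural effusion.", present, absent))  # True
```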

Load-bearing premise

The mapping from image features to probabilistic clinical concepts accurately captures the multi-hop clinical reasoning needed without introducing new biases or missing key visual cues.

What would settle it

Radiologist review of reports on standard benchmark test sets shows equivalent or lower factual accuracy for NeuroSymb-MRG compared with strong neural baselines.

Figures

Figures reproduced from arXiv: 2603.01756 by Chunlei Meng, Fuqian Shi, Juntao Gao, Li Bao, Muge Qi, Nilanjan Dey, Qi Zhao, Rong Fu, Simon Fong, Wei Luo, Yabin Jin, Yiqing Lyu.

**Figure 1.** Architectural overview of the NeuroSymb-MRG framework for transparent and clinically grounded radiology report generation. The pipeline begins with Visual Perception, using a self-supervised visual encoder f_ve to extract patch-level features X. In the Neuro-Symbolic Reasoning module, these features are mapped to probabilistic concept activations ĉ, which serve as leaves for a Differentiable Logic L…
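The caption names a visual encoder f_ve, patch-level features X, and concept activations ĉ, but the excerpt is truncated before the mapping is defined. The minimal sketch below assumes mean pooling over patches followed by a linear layer with an independent sigmoid per concept. That makes the output multi-label, so several findings can be active at once. `concept_activations` and the weight and bias values are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def concept_activations(patches: List[Vector], weights: List[Vector], bias: Vector) -> Vector:
    """Map patch-level features X to concept probabilities c_hat.

    Assumed design (not from the paper): mean-pool the patches, then apply one
    linear layer with a per-concept sigmoid, so each concept gets an independent
    probability in (0, 1) instead of competing in a softmax.
    """
    dim = len(patches[0])
    pooled = [sum(p[d] for p in patches) / len(patches) for d in range(dim)]
    return [sigmoid(sum(w_d * x_d for w_d, x_d in zip(w, pooled)) + b)
            for w, b in zip(weights, bias)]


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0]]      # two toy patches with 2-D features
    W = [[2.0, 2.0], [0.0, 0.0]]      # hypothetical weights for two concepts
    print(concept_activations(X, W, [-1.0, 0.0]))
```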

Original abstract

Automatic generation of radiology reports seeks to reduce clinician workload while improving documentation consistency. Existing methods that adopt encoder-decoder or retrieval-augmented pipelines achieve progress in fluency but remain vulnerable to visual-linguistic biases, factual inconsistency, and lack of explicit multi-hop clinical reasoning. We present NeuroSymb-MRG, a unified framework that integrates NeuroSymbolic abductive reasoning with active uncertainty minimization to produce structured, clinically grounded reports. The system maps image features to probabilistic clinical concepts, composes differentiable logic-based reasoning chains, decodes those chains into templated clauses, and refines the textual output via retrieval and constrained language-model editing. An active sampling loop driven by rule-level uncertainty and diversity guides clinician-in-the-loop adjudication and promptbook refinement. Experiments on standard benchmarks demonstrate consistent improvements in factual consistency and standard language metrics compared to representative baselines.
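The abstract does not name its factual-consistency metric. A common choice in radiology report generation is clinical-efficacy scoring. A label extractor (for example, the CheXpert labeler) is run over generated and reference reports, and the positive finding labels are compared. Assuming the labels are already extracted, micro-averaged precision, recall, and F1 reduce to counting set overlaps. `micro_prf` and the example labels are hypothetical.

```python
from typing import List, Set, Tuple


def micro_prf(generated: List[Set[str]], reference: List[Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over positive finding labels.

    Each list entry holds the positive labels extracted from one report;
    generated[i] and reference[i] describe the same study.
    """
    tp = fp = fn = 0
    for gen, ref in zip(generated, reference):
        tp += len(gen & ref)   # findings both reports assert
        fp += len(gen - ref)   # findings only the generated report asserts
        fn += len(ref - gen)   # findings the generated report missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gen = [{"effusion", "edema"}, {"cardiomegaly"}]
    ref = [{"effusion"}, {"cardiomegaly", "atelectasis"}]
    print(micro_prf(gen, ref))  # tp=2, fp=1, fn=1
```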

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The neuro-symbolic framework adds explicit abductive chains and uncertainty-driven clinician loops to radiology report generation, but the claimed factual gains rest on an unablated image-to-concept mapping that could be doing the heavy lifting.


The paper's core move is to map image features to probabilistic clinical concepts, run differentiable logic chains for multi-hop reasoning, decode to templates, and then use rule-level uncertainty sampling to pull in clinician feedback for prompt refinement. That combination is new enough in the radiology report space and directly targets factual inconsistency, which is a real deployment blocker for these models. The active sampling loop and constrained editing step look like practical additions that could reduce hallucinations without killing fluency. Experiments on standard benchmarks report gains on both factual consistency metrics and the usual language scores, which is the right kind of evidence for a medical task.

The main soft spot is exactly the one the stress-test flags: there is no isolating ablation that turns off or randomizes the concept-mapping stage while holding the rest of the pipeline fixed. Without that, any consistency lift could trace to the retrieval or editing components rather than to the abductive reasoning itself. The abstract also leaves somewhat opaque how the logic is kept fully differentiable and how uncertainty is computed at the rule level, so it is hard to judge whether the method scales or generalizes beyond the reported benchmarks.

This work is aimed at researchers building hybrid systems for clinical text generation who already care about grounding and clinician oversight. A reader who wants concrete ideas for injecting symbolic structure into vision-language models will find usable pieces here. I would send it to peer review because the problem is important, the framework is spelled out, and the missing ablations are the sort of thing referees can request and check, not a load-bearing flaw that kills the paper outright.

Referee Report

2 major / 1 minor

Summary. The paper introduces NeuroSymb-MRG, a unified neuro-symbolic framework for automatic radiology report generation. It maps image features to probabilistic clinical concepts, composes differentiable abductive logic chains for multi-hop clinical reasoning, decodes the chains into templated clauses, and refines outputs via retrieval and constrained language-model editing. An active sampling loop uses rule-level uncertainty and diversity to guide clinician-in-the-loop adjudication. Experiments on standard benchmarks are reported to yield consistent gains in factual consistency and standard language metrics relative to encoder-decoder and retrieval-augmented baselines.

Significance. If the central claims hold, the work offers a concrete path toward hybrid systems that combine neural perception with explicit symbolic reasoning, addressing factual inconsistency and lack of clinical grounding that plague purely neural report generators. The active uncertainty minimization loop and differentiable logic composition are notable for enabling iterative refinement and potential interpretability in a high-stakes medical domain.

major comments (2)

[§3.1] Image-to-Concept Mapping: The manuscript provides no isolating ablation that disables or randomizes the probabilistic concept layer while holding the downstream abductive chains, retrieval, and editing stages fixed. Without this control, the reported factual-consistency gains cannot be confidently attributed to the abductive reasoning component rather than to the retrieval or editing stages.
[§4.3] Experimental Results: The abstract and results summary claim 'consistent improvements', but the provided text contains no quantitative tables, error bars, or statistical significance tests for the key metrics (e.g., factual consistency scores). This prevents verifying that the gains exceed baseline variance rather than arising from post-hoc hyper-parameter choices.

minor comments (1)

[§3.4] Notation for the uncertainty measure in the active sampling loop is introduced without an explicit equation reference, making it difficult to reproduce the diversity term.
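Because the uncertainty and diversity terms lack an explicit equation, the sketch below shows one plausible form, not the paper's. A rule's uncertainty is the mean binary entropy of its firing probability across cases. Diversity is enforced greedily by penalising premise overlap (Jaccard similarity) with rules already chosen, in the spirit of uncertainty-plus-diversity active learning such as [21]. The weight `lam` and all function names are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List


def binary_entropy(p: float, eps: float = 1e-12) -> float:
    """Entropy in bits of a Bernoulli(p) truth value; maximal (1.0) at p = 0.5."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Overlap of two premise sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_rules(firing: Dict[str, List[float]], premises: Dict[str, FrozenSet[str]],
                 k: int, lam: float = 0.5) -> List[str]:
    """Greedily pick k rules for clinician review.

    Score = mean firing entropy minus lam * max premise overlap with the rules
    already selected, so near-duplicate uncertain rules are not all sent.
    """
    uncertainty = {r: sum(map(binary_entropy, ps)) / len(ps) for r, ps in firing.items()}
    selected: List[str] = []

    def score(r: str) -> float:
        overlap = max((jaccard(premises[r], premises[s]) for s in selected), default=0.0)
        return uncertainty[r] - lam * overlap

    remaining = set(firing)
    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, two rules that share the same premises and both fire near 0.5 are maximally uncertain. With a large `lam`, only one of them is sent for review, and a less uncertain but unrelated rule takes the second slot.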

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to strengthen the presentation of our results.


Referee: [§3.1] Image-to-Concept Mapping: The manuscript provides no isolating ablation that disables or randomizes the probabilistic concept layer while holding the downstream abductive chains, retrieval, and editing stages fixed. Without this control, the reported factual-consistency gains cannot be confidently attributed to the abductive reasoning component rather than to the retrieval or editing stages.

Authors: We agree that an isolating ablation is required to attribute gains specifically to the abductive reasoning. Our current ablations compare full NeuroSymb-MRG against variants that remove the logic chains or the active sampling loop, but we did not include a control that keeps the chains fixed while randomizing the upstream concept probabilities. In the revision we will add this experiment (random concept probabilities drawn from a uniform distribution, with all downstream stages unchanged) and report the resulting drop in factual consistency to isolate the contribution of the differentiable abductive component. revision: yes
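The control promised in this response can be sketched as follows. The sketch assumes the downstream stages can be wrapped as a frozen function that maps concept probabilities to a per-case factual-consistency score. `downstream` is a hypothetical stand-in for chains, decoding, retrieval, editing, and scoring. Only the concept probabilities are replaced, with Uniform(0, 1) draws.

```python
import random
from typing import Callable, Dict, List

Concepts = Dict[str, float]


def randomize_concepts(concepts: Concepts, rng: random.Random) -> Concepts:
    """Replace every concept probability with an independent Uniform[0, 1) draw,
    keeping the concept names (and hence the downstream rules) unchanged."""
    return {name: rng.random() for name in concepts}


def concept_ablation(cases: List[Concepts],
                     downstream: Callable[[Concepts], float],
                     seed: int = 0) -> Dict[str, float]:
    """Score the frozen downstream pipeline on real vs. randomized concepts.

    A large positive "drop" would indicate that the learned concept layer,
    not the later stages alone, carries the factual-consistency gain.
    """
    rng = random.Random(seed)
    real = sum(downstream(c) for c in cases) / len(cases)
    rand = sum(downstream(randomize_concepts(c, rng)) for c in cases) / len(cases)
    return {"real": real, "randomized": rand, "drop": real - rand}
```

In practice, the randomized run would be repeated over several seeds so the drop can be reported with its own spread.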
Referee: [§4.3] Experimental Results: The abstract and results summary claim 'consistent improvements', but the provided text contains no quantitative tables, error bars, or statistical significance tests for the key metrics (e.g., factual consistency scores). This prevents verifying that the gains exceed baseline variance rather than arising from post-hoc hyper-parameter choices.

Authors: The full manuscript contains Tables 1 and 2 in §4.3 that report all metrics (including factual consistency) as means ± standard deviation over five independent runs, together with paired t-test p-values against each baseline. We acknowledge that these tables were not sufficiently referenced or highlighted in the main narrative of the version sent for review. In the revision we will move the key tables into the main body of §4.3, add explicit cross-references from the text, and include a short paragraph on statistical testing and hyper-parameter selection protocol. revision: yes
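The statistics named in this response reduce to a few lines: mean ± standard deviation over five runs and a paired t-test against a baseline. The sketch computes the paired t statistic and its degrees of freedom with the standard library. The two-sided p-value then comes from the Student t distribution with n - 1 degrees of freedom, which `scipy.stats.ttest_rel` computes directly. The function names are illustrative. The run scores in the usage example are synthetic placeholders, not results from the paper.

```python
import math
import statistics
from typing import List, Tuple


def mean_std(scores: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (n - 1 denominator) across runs."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_t(method: List[float], baseline: List[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for per-run score pairs.

    Runs must be paired, e.g. the same seed or data split for both systems.
    """
    if len(method) != len(baseline) or len(method) < 2:
        raise ValueError("need at least two paired runs of equal length")
    diffs = [m - b for m, b in zip(method, baseline)]
    sd = statistics.stdev(diffs)
    if sd == 0.0:
        raise ValueError("all differences identical: t statistic is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


if __name__ == "__main__":
    # Synthetic placeholder scores for five runs; not numbers from the paper.
    ours = [1.0, 2.0, 3.0, 4.0, 5.0]
    base = [0.0, 1.0, 1.0, 3.0, 3.0]
    print(mean_std(ours))
    print(paired_t(ours, base))
```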

Circularity Check

0 steps flagged

No significant circularity; derivation chain is self-contained


The paper describes a NeuroSymbolic framework that maps image features to probabilistic clinical concepts, composes logic chains, and applies active uncertainty minimization. No equations, fitted parameters, or predictions are exhibited in the abstract or description that reduce any claimed result to its own inputs by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling are detectable from the provided text. The central claims rest on experimental improvements over baselines rather than definitional equivalence, making this the normal non-circular outcome.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

Abstract-only review means most implementation details are unavailable; the ledger records the high-level assumptions visible in the description.

axioms (2)

Domain assumption: image features can be reliably mapped to probabilistic clinical concepts that support multi-hop reasoning.
Stated as the first step of the pipeline.
Domain assumption: differentiable logic-based reasoning chains can be composed and decoded into clinically valid templated clauses.
Core mechanism described for producing structured output.

invented entities (1)

NeuroSymb-MRG framework (no independent evidence)
purpose: Unified integration of neuro-symbolic abductive reasoning and active uncertainty minimization
New system name and architecture introduced in the abstract.

pith-pipeline@v0.9.0 · 5478 in / 1325 out tokens · 45291 ms · 2026-05-15T18:21:26.699449+00:00


Reference graph

Works this paper leans on

33 extracted references

[1] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and tell: A neural image caption generator. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3156–3164, 2015.
[2] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.
[3] Jiasen Lu, Caiming Xiong, Devi Parikh, and Richard Socher. Knowing when to look: Adaptive attention via a visual sentinel for image captioning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 375–383, 2017.
[4] Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang. Bottom-up and top-down attention for image captioning and visual question answering. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 6077–6086, 2018.
[5] Marcella Cornia, Matteo Stefanini, Lorenzo Baraldi, and Rita Cucchiara. Meshed-memory transformer for image captioning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10578–10587, 2020.
[6] Zhihong Chen, Yan Song, Tsung-Hui Chang, and Xiang Wan. Generating radiology reports via memory-driven transformer. In Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP), pages 1439–1449, 2020.
[7] Zhihong Chen, Yaling Shen, Yan Song, and Xiang Wan. Cross-modal memory networks for radiology report generation. In Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing (volume 1: long papers), pages 5904–5914, 2021.
[8] Jun Wang, Abhir Bhalerao, and Yulan He. Cross-modal prototype driven network for radiology report generation. In European Conference on Computer Vision, pages 563–579. Springer, 2022.
[9] Zhanyu Wang, Lingqiao Liu, Lei Wang, and Luping Zhou. R2GenGPT: Radiology report generation with frozen LLMs. Meta-Radiology, 1(3):100033, 2023.
[10] Chunlei Li, Jingyang Hou, Yilei Shi, Jingliang Hu, Xiao Xiang Zhu, and Lichao Mou. Multimodal large language models for medical report generation via customized prompt tuning. arXiv preprint arXiv:2506.15477, 2025.
[11] Alistair EW Johnson, Tom J Pollard, Nathaniel R Greenbaum, Matthew P Lungren, Chih-ying Deng, Yifan Peng, Zhiyong Lu, Roger G Mark, Seth J Berkowitz, and Steven Horng. MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs. arXiv preprint arXiv:1901.07042, 2019.
[12] Dina Demner-Fushman, Marc D Kohli, Marc B Rosenman, Sonya E Shooshan, Laritza Rodriguez, Sameer Antani, George R Thoma, and Clement J McDonald. Preparing a collection of radiology examinations for distribution and retrieval. Journal of the American Medical Informatics Association, 23(2):304–310, 2016.
[13] Fenglin Liu, Xian Wu, Shen Ge, Wei Fan, and Yuexian Zou. Exploring and distilling posterior and prior knowledge for radiology report generation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 13753–13762, 2021.
[14] Weixing Chen, Yang Liu, Ce Wang, Jiarui Zhu, Guanbin Li, Cheng-Lin Liu, and Liang Lin. Cross-modal causal intervention for medical report generation. arXiv preprint arXiv:2303.09117, 2023.
[15] Haibo Jin, Haoxuan Che, Yi Lin, and Hao Chen. PromptMRG: Diagnosis-driven prompts for medical report generation. In Proceedings of the AAAI conference on artificial intelligence, volume 38, pages 2607–2615, 2024.
[16] Elad Hirsch, Gefen Dawidowicz, and Ayellet Tal. MedRAT: Unpaired medical report generation via auxiliary tasks. In European Conference on Computer Vision, pages 18–35. Springer, 2024.
[17] Baoyu Liang, Yuchen Wang, and Chao Tong. AI reasoning in deep learning era: From symbolic AI to neural-symbolic AI. Mathematics, 13(11):1707, 2025.
[18] Ruslan Agishev and Karel Zimmermann. FusionForce: End-to-end differentiable neural-symbolic layer for trajectory prediction. arXiv preprint arXiv:2502.10156, 2025.
[19] Piotr Wasilewski and Chan Duong Nguy. Deep differentiable logic gate networks based on fuzzy Zadeh's t-norm. In Polish Conference on Artificial Intelligence, pages 57–70. Springer, 2025.
[20] Choi Junhwan, Oh Seokmin, and Byun Joongmoo. Uncertainty estimation in AVO inversion using Bayesian dropout based deep learning. Journal of Petroleum Science and Engineering, 208:109288, 2022.
[21] Yingqin Liang, Feng Huang, Zhaobing Qiu, Xiu Shu, Qiao Liu, and Di Yuan. Uncertainty and diversity-based active learning for UAV tracking. Neurocomputing, 639:130265, 2025.
[22] Di You, Fenglin Liu, Shen Ge, Xiaoxia Xie, Jing Zhang, and Xian Wu. AlignTransformer: Hierarchical alignment of visual regions and disease tags for medical report generation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 72–82. Springer, 2021.
[23] Farhad Nooralahzadeh, Nicolas Perez Gonzalez, Thomas Frauenfelder, Koji Fujimoto, and Michael Krauthammer. Progressive transformer-based generation of radiology reports. In Findings of the association for computational linguistics: EMNLP 2021, pages 2824–2832, 2021.
[24] Han Qin and Yan Song. Reinforced cross-modal alignment for radiology report generation. In Findings of the Association for Computational Linguistics: ACL 2022, pages 448–458, 2022.
[25] Joseph Cotnareanu, Didier Chetelat, Yingxue Zhang, and Mark Coates. A balanced neuro-symbolic approach for commonsense abductive logic. arXiv preprint arXiv:2601.18595, 2026.
[26] Arifuzzaman Sheikh and Edwin KP Chong. Advancing AIoMT-enabled healthcare system-of-systems using multi-agent reinforcement learning. IEEE Access, 2025.
[27] Pengcheng Jiang, Siru Ouyang, Yizhu Jiao, Ming Zhong, Runchu Tian, and Jiawei Han. Retrieval and structuring augmented generation with large language models. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, pages 6032–6042, 2025.
[28] Veenu Rani, Munish Kumar, Aastha Gupta, Monika Sachdeva, Ajay Mittal, and Krishan Kumar. Self-supervised learning for medical image analysis: a comprehensive review. Evolving Systems, 15(4):1607–1633, 2024.
[29] Zhanyu Wang, Mingkang Tang, Lei Wang, Xiu Li, and Luping Zhou. A medical semantic-assisted transformer for radiographic report generation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 655–664. Springer, 2022.
[30] Ke Zhang, Hanliang Jiang, Jian Zhang, Qingming Huang, Jianping Fan, Jun Yu, and Weidong Han. Semi-supervised medical report generation via graph-guided hybrid feature consistency. IEEE Transactions on Multimedia, 26:904–915, 2023.
[31] Steven J Rennie, Etienne Marcheret, Youssef Mroueh, Jerret Ross, and Vaibhava Goel. Self-critical sequence training for image captioning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7008–7024, 2017.
[32] Fenglin Liu, Changchang Yin, Xian Wu, Shen Ge, Ping Zhang, and Xu Sun. Contrastive attention for automatic chest X-ray report generation. In Findings of the association for computational linguistics: ACL-IJCNLP 2021, pages 269–280, 2021.
[33] Fenglin Liu, Shen Ge, and Xian Wu. Competence-based multimodal curriculum learning for medical report generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 3001–3012, 2021.

[1] [1]

Show and tell: A neural image caption generator

Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and tell: A neural image caption generator. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 3156–3164, 2015

work page 2015

[2] [2]

Attention is all you need.Advances in neural information processing systems, 30, 2017

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.Advances in neural information processing systems, 30, 2017

work page 2017

[3] [3]

Knowing when to look: Adaptive attention via a visual sentinel for image captioning

Jiasen Lu, Caiming Xiong, Devi Parikh, and Richard Socher. Knowing when to look: Adaptive attention via a visual sentinel for image captioning. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 375–383, 2017

work page 2017

[4] [4]

Bottom-up and top-down attention for image captioning and visual question answering

Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang. Bottom-up and top-down attention for image captioning and visual question answering. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 6077–6086, 2018

work page 2018

[5] [5]

Meshed-memory transformer for image captioning

Marcella Cornia, Matteo Stefanini, Lorenzo Baraldi, and Rita Cucchiara. Meshed-memory transformer for image captioning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10578–10587, 2020

work page 2020

[6] [6]

Generating radiology reports via memory-driven transformer

Zhihong Chen, Yan Song, Tsung-Hui Chang, and Xiang Wan. Generating radiology reports via memory-driven transformer. InProceedings of the 2020 conference on empirical methods in natural language processing (EMNLP), pages 1439–1449, 2020

work page 2020

[7] [7]

Cross-modal memory networks for radiology report generation

Zhihong Chen, Yaling Shen, Yan Song, and Xiang Wan. Cross-modal memory networks for radiology report generation. InProceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing (volume 1: long papers), pages 5904–5914, 2021

work page 2021

[8] [8]

Cross-modal prototype driven network for radiology report generation

Jun Wang, Abhir Bhalerao, and Yulan He. Cross-modal prototype driven network for radiology report generation. InEuropean Conference on Computer Vision, pages 563–579. Springer, 2022

work page 2022

[9] [9]

R2gengpt: Radiology report generation with frozen llms.Meta-Radiology, 1(3):100033, 2023

Zhanyu Wang, Lingqiao Liu, Lei Wang, and Luping Zhou. R2gengpt: Radiology report generation with frozen llms.Meta-Radiology, 1(3):100033, 2023

work page 2023

[10] [10]

Multimodal large language models for medical report generation via customized prompt tuning.arXiv preprint arXiv:2506.15477, 2025

Chunlei Li, Jingyang Hou, Yilei Shi, Jingliang Hu, Xiao Xiang Zhu, and Lichao Mou. Multimodal large language models for medical report generation via customized prompt tuning.arXiv preprint arXiv:2506.15477, 2025

work page arXiv 2025

[11] [11]

MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs

Alistair EW Johnson, Tom J Pollard, Nathaniel R Greenbaum, Matthew P Lungren, Chih-ying Deng, Yifan Peng, Zhiyong Lu, Roger G Mark, Seth J Berkowitz, and Steven Horng. Mimic-cxr-jpg, a large publicly available database of labeled chest radiographs.arXiv preprint arXiv:1901.07042, 2019

work page internal anchor Pith review arXiv 1901

[12] [12]

Preparing a collection of radiology examinations for distribution and retrieval.Journal of the American Medical Informatics Association, 23(2):304–310, 2016

Dina Demner-Fushman, Marc D Kohli, Marc B Rosenman, Sonya E Shooshan, Laritza Rodriguez, Sameer Antani, George R Thoma, and Clement J McDonald. Preparing a collection of radiology examinations for distribution and retrieval.Journal of the American Medical Informatics Association, 23(2):304–310, 2016. 10 NeuroSymb-MRG

work page 2016

[13] [13]

Exploring and distilling posterior and prior knowledge for radiology report generation

Fenglin Liu, Xian Wu, Shen Ge, Wei Fan, and Yuexian Zou. Exploring and distilling posterior and prior knowledge for radiology report generation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 13753–13762, 2021

work page 2021

[14] [14]

Cross-modal causal intervention for medical report generation.arXiv preprint arXiv:2303.09117, 2023

Weixing Chen, Yang Liu, Ce Wang, Jiarui Zhu, Guanbin Li, Cheng-Lin Liu, and Liang Lin. Cross-modal causal intervention for medical report generation.arXiv preprint arXiv:2303.09117, 2023

work page arXiv 2023

[15] [15]

Promptmrg: Diagnosis-driven prompts for medical report generation

Haibo Jin, Haoxuan Che, Yi Lin, and Hao Chen. Promptmrg: Diagnosis-driven prompts for medical report generation. InProceedings of the AAAI conference on artificial intelligence, volume 38, pages 2607–2615, 2024

work page 2024

[16] [16]

Medrat: Unpaired medical report generation via auxiliary tasks

Elad Hirsch, Gefen Dawidowicz, and Ayellet Tal. Medrat: Unpaired medical report generation via auxiliary tasks. InEuropean Conference on Computer Vision, pages 18–35. Springer, 2024

work page 2024

[17] [17]

Ai reasoning in deep learning era: From symbolic ai to neural– symbolic ai.Mathematics, 13(11):1707, 2025

Baoyu Liang, Yuchen Wang, and Chao Tong. Ai reasoning in deep learning era: From symbolic ai to neural– symbolic ai.Mathematics, 13(11):1707, 2025

work page 2025

[18] [18]

Fusionforce: End-to-end differentiable neural-symbolic layer for trajectory prediction.arXiv preprint arXiv:2502.10156, 2025

Ruslan Agishev and Karel Zimmermann. Fusionforce: End-to-end differentiable neural-symbolic layer for trajectory prediction.arXiv preprint arXiv:2502.10156, 2025

work page arXiv 2025

[19] Piotr Wasilewski and Chan Duong Nguy. Deep differentiable logic gate networks based on fuzzy Zadeh’s t-norm. In Polish Conference on Artificial Intelligence, pages 57–70. Springer, 2025.

[20] Choi Junhwan, Oh Seokmin, and Byun Joongmoo. Uncertainty estimation in AVO inversion using Bayesian dropout based deep learning. Journal of Petroleum Science and Engineering, 208:109288, 2022.

[21] Yingqin Liang, Feng Huang, Zhaobing Qiu, Xiu Shu, Qiao Liu, and Di Yuan. Uncertainty and diversity-based active learning for UAV tracking. Neurocomputing, 639:130265, 2025.

[22] Di You, Fenglin Liu, Shen Ge, Xiaoxia Xie, Jing Zhang, and Xian Wu. AlignTransformer: Hierarchical alignment of visual regions and disease tags for medical report generation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 72–82. Springer, 2021.

[23] Farhad Nooralahzadeh, Nicolas Perez Gonzalez, Thomas Frauenfelder, Koji Fujimoto, and Michael Krauthammer. Progressive transformer-based generation of radiology reports. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 2824–2832, 2021.

[24] Han Qin and Yan Song. Reinforced cross-modal alignment for radiology report generation. In Findings of the Association for Computational Linguistics: ACL 2022, pages 448–458, 2022.

[25] Joseph Cotnareanu, Didier Chetelat, Yingxue Zhang, and Mark Coates. A balanced neuro-symbolic approach for commonsense abductive logic. arXiv preprint arXiv:2601.18595, 2026.

[26] Arifuzzaman Sheikh and Edwin KP Chong. Advancing AIoMT-enabled healthcare system-of-systems using multi-agent reinforcement learning. IEEE Access, 2025.

[27] Pengcheng Jiang, Siru Ouyang, Yizhu Jiao, Ming Zhong, Runchu Tian, and Jiawei Han. Retrieval and structuring augmented generation with large language models. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, pages 6032–6042, 2025.

[28] Veenu Rani, Munish Kumar, Aastha Gupta, Monika Sachdeva, Ajay Mittal, and Krishan Kumar. Self-supervised learning for medical image analysis: A comprehensive review. Evolving Systems, 15(4):1607–1633, 2024.

[29] Zhanyu Wang, Mingkang Tang, Lei Wang, Xiu Li, and Luping Zhou. A medical semantic-assisted transformer for radiographic report generation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 655–664. Springer, 2022.

[30] Ke Zhang, Hanliang Jiang, Jian Zhang, Qingming Huang, Jianping Fan, Jun Yu, and Weidong Han. Semi-supervised medical report generation via graph-guided hybrid feature consistency. IEEE Transactions on Multimedia, 26:904–915, 2023.

[31] Steven J Rennie, Etienne Marcheret, Youssef Mroueh, Jerret Ross, and Vaibhava Goel. Self-critical sequence training for image captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7008–7024, 2017.

[32] Fenglin Liu, Changchang Yin, Xian Wu, Shen Ge, Ping Zhang, and Xu Sun. Contrastive attention for automatic chest X-ray report generation. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 269–280, 2021.

[33] Fenglin Liu, Shen Ge, and Xian Wu. Competence-based multimodal curriculum learning for medical report generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 3001–3012, 2021.