NeuroAgent: LLM Agents for Multimodal Neuroimaging Analysis and Research
Pith reviewed 2026-05-08 09:33 UTC · model grok-4.3
The pith
NeuroAgent uses LLM agents to automate multimodal neuroimaging preprocessing and achieves an AUC of 0.9518 for Alzheimer's classification with four modalities.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NeuroAgent's hierarchical multi-agent architecture with a feedback-driven Generate-Execute-Validate engine autonomously generates executable preprocessing code for heterogeneous neuroimaging data, detects and recovers from runtime errors, validates output integrity, and enables end-to-end multimodal analysis that yields an AUC of 0.9518 for Alzheimer's classification on pooled ADNI data, outperforming single-modality baselines.
What carries the argument
The hierarchical multi-agent architecture with a feedback-driven Generate-Execute-Validate engine that generates, executes, and validates preprocessing code across modalities while limiting human intervention to edge cases.
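The control flow of a feedback-driven Generate-Execute-Validate loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `StepResult`, `generate_execute_validate`, the `max_attempts` limit, and the toy generator are all hypothetical names, and a plain Python callable stands in for LLM-generated preprocessing code.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class StepResult:
    ok: bool                      # True once an attempt executed and validated
    attempts: int                 # number of generate/execute rounds used
    output: Optional[object] = None
    errors: List[str] = field(default_factory=list)
    needs_human: bool = False     # escalate to a reviewer when retries run out


def generate_execute_validate(
    generate: Callable[[Optional[str]], Callable[[], object]],
    validate: Callable[[object], bool],
    max_attempts: int = 3,        # assumed retry budget, not from the paper
) -> StepResult:
    """Run one preprocessing step with feedback-driven retries.

    generate(feedback) returns a zero-argument callable standing in for
    generated code; feedback is the previous error message, or None.
    """
    errors: List[str] = []
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = generate(feedback)            # Generate
        try:
            output = candidate()                  # Execute
        except Exception as exc:                  # runtime error: feed it back
            feedback = f"{type(exc).__name__}: {exc}"
            errors.append(feedback)
            continue
        if validate(output):                      # Validate
            return StepResult(True, attempt, output, errors)
        feedback = "validation failed"
        errors.append(feedback)
    return StepResult(False, max_attempts, None, errors, needs_human=True)


# Toy generator: the first attempt raises, the retry succeeds.
def toy_generate(feedback: Optional[str]) -> Callable[[], object]:
    if feedback is None:
        def broken() -> object:
            raise FileNotFoundError("T1w.nii.gz")
        return broken
    return lambda: {"shape": (256, 256, 176)}


result = generate_execute_validate(toy_generate, lambda out: len(out["shape"]) == 3)
print(result.ok, result.attempts, result.errors)
```

In this sketch, a step that exhausts its retries is flagged with `needs_human`, which is one way the "human intervention only for edge cases" behavior could be wired in.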
If this is right
- Multimodal data preprocessed by the system produces higher Alzheimer's classification AUC than any single modality.
- The architecture reduces manual effort for preprocessing to only edge cases via automated error recovery.
- Natural language queries become feasible for downstream statistical analysis once data passes validation.
- Pipeline performance scales with the capability of the underlying LLM backend, reaching up to 100% intent-parsing accuracy and 84.8% end-to-end preprocessing step correctness with the strongest backend.
Where Pith is reading between the lines
- Similar agent ensembles could be adapted to automate preprocessing in other medical imaging domains that face comparable toolchain complexity.
- Wider adoption might enable labs with limited programming resources to conduct reproducible multimodal studies.
- The reliance on automated validation raises the need for ongoing checks against gold-standard pipelines to maintain scientific trust.
Load-bearing premise
That code produced and validated by the LLM agents yields scientifically valid neuroimaging data without introducing systematic artifacts or biases that would alter research conclusions.
What would settle it
A side-by-side comparison by neuroimaging experts showing that NeuroAgent-processed outputs differ meaningfully from established manual pipelines in standard quality metrics such as tissue segmentation accuracy or lead to different classification performance.
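One ingredient of such a comparison, the Dice overlap between an agent-produced segmentation and a reference segmentation, can be sketched as below. The function name, the label coding, and the toy 8-voxel maps are illustrative assumptions; a real check would load 3D label volumes from the two pipelines and flatten them first.

```python
from typing import Sequence


def dice(a: Sequence[int], b: Sequence[int], label: int) -> float:
    """Dice overlap for one label between two flattened label maps."""
    if len(a) != len(b):
        raise ValueError("label maps must have the same number of voxels")
    in_a = sum(1 for v in a if v == label)
    in_b = sum(1 for v in b if v == label)
    both = sum(1 for x, y in zip(a, b) if x == label and y == label)
    if in_a + in_b == 0:
        return 1.0  # convention chosen here: label absent from both maps
    return 2.0 * both / (in_a + in_b)


# Toy maps (hypothetical coding): 0 = background, 1 = gray matter, 2 = white matter
agent_seg = [0, 1, 1, 1, 2, 2, 0, 0]
reference_seg = [0, 1, 1, 2, 2, 2, 0, 1]

for label in (1, 2):
    print(label, round(dice(agent_seg, reference_seg, label), 3))
# prints:
# 1 0.667
# 2 0.8
```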
Original abstract
Multimodal neuroimaging analysis often involves complex, modality-specific preprocessing workflows that require careful configuration, quality control, and coordination across heterogeneous toolchains. Beyond preprocessing, downstream statistical analysis and disease classification commonly require task-specific code, evaluation protocols, and data-format conventions, creating additional barriers between raw acquisitions and reproducible scientific analysis. We present NeuroAgent, an LLM-driven agentic framework that automates key preprocessing and analysis steps for heterogeneous neuroimaging data, including sMRI, fMRI, dMRI, and PET, and supports interactive downstream analysis through natural-language queries. NeuroAgent employs a hierarchical multi-agent architecture with a feedback-driven Generate-Execute-Validate engine: agents autonomously generate executable preprocessing code, detect and recover from runtime errors, and validate output integrity. We evaluate the system on 1,470 subjects pooled across all ADNI phases (CN=1,000, AD=470), where all subjects have sMRI and tabular data, with subsets also having Tau-PET (n=469), fMRI (n=278), and DTI (n=620). Pipeline ablation studies across multiple LLM backends show that capable models reach up to 100% intent-parsing accuracy, with the strongest backend (Qwen3.5-27B) reaching 84.8% end-to-end preprocessing step correctness. Automated recovery limits manual intervention to edge cases where human review is required via the Human-In-The-Loop interface. For Alzheimer's Disease classification using automatically preprocessed multimodal data, our agent ensemble achieves an AUC of 0.9518 with four modalities, outperforming all single-modality baselines. These results show that NeuroAgent can reduce the manual effort required for neuroimaging preprocessing and enable end-to-end automated analysis pipelines for neuroimaging research.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces NeuroAgent, an LLM-powered hierarchical multi-agent system for automating multimodal neuroimaging preprocessing (sMRI, fMRI, dMRI, PET) and downstream analysis via natural language. It employs a Generate-Execute-Validate engine for code generation, runtime error recovery, and output validation. On 1,470 ADNI subjects (with subsets having additional modalities), the paper reports up to 100% intent-parsing accuracy, up to 84.8% end-to-end preprocessing step correctness (Qwen3.5-27B backend), and an AUC of 0.9518 for Alzheimer's classification using four modalities, outperforming single-modality baselines. Automated recovery is said to limit human intervention to edge cases via a Human-In-The-Loop interface.
Significance. If the agent-generated preprocessing yields data scientifically equivalent to expert pipelines, NeuroAgent would meaningfully lower barriers to reproducible multimodal neuroimaging research by handling complex toolchains and enabling natural-language analysis. The evaluation on a large public ADNI cohort with concrete metrics across multiple LLM backends and modality ablations is a positive aspect. However, the significance is limited by the absence of direct evidence that execution success translates to valid scientific outputs.
major comments (3)
- [§5] §5 (AD classification results): The central claim of AUC 0.9518 with four modalities outperforming single-modality baselines is load-bearing for the paper's contribution. This performance is reported on agent-preprocessed data, yet no quantitative validation against standard pipelines (e.g., fMRIPrep for registration accuracy, FreeSurfer for segmentation Dice scores, or DTIPrep) or blinded expert QC is provided; the 84.8% step correctness measures execution and recovery success only.
- [§4.1] §4.1 (Generate-Execute-Validate engine description): The automated error recovery and output validation steps are presented as sufficient to produce usable data with minimal human correction. However, the evaluation provides no analysis of recovered error types, potential systematic artifacts introduced by LLM-generated code, or downstream impact on metrics such as SNR or alignment quality.
- [Abstract and §5] Abstract and §5 (ADNI cohort details): The 1,470-subject evaluation pools data across ADNI phases with modality subsets (Tau-PET n=469, fMRI n=278, DTI n=620). Data exclusion rules, preprocessing validation criteria, and statistical tests for the multimodal AUC improvement are not specified, undermining assessment of whether the reported gains reflect valid multimodal signal or preprocessing artifacts.
minor comments (3)
- [Abstract and §3] The abstract and §3 use 'end-to-end preprocessing step correctness' without a precise definition or breakdown by modality or error category; a table clarifying this metric would improve clarity.
- [Related Work] Related work section omits several recent LLM-agent frameworks for scientific code generation; adding 2-3 key citations would better situate the hierarchical architecture.
- [Figures] Figure captions for the agent architecture and Human-In-The-Loop interface are brief; expanding them to describe the feedback loop would aid readers unfamiliar with agentic systems.
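The per-modality breakdown requested in the first minor comment could be tabulated along these lines. The paper's definition of step correctness is not given here, so this sketch assumes one possible reading: the fraction of individual pipeline steps judged correct. The record layout, step names, and values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (modality, step name, judged correct?); hypothetical record layout
StepRecord = Tuple[str, str, bool]


def step_correctness(records: Iterable[StepRecord]) -> Dict[str, float]:
    """Fraction of correct steps per modality, plus an 'overall' entry."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for modality, _step, ok in records:
        for key in (modality, "overall"):
            total[key] += 1
            correct[key] += int(ok)
    return {key: correct[key] / total[key] for key in total}


# Illustrative records only; these are not results from the paper.
records = [
    ("sMRI", "skull_strip", True),
    ("sMRI", "segmentation", True),
    ("fMRI", "motion_correction", True),
    ("fMRI", "coregistration", False),
    ("dMRI", "eddy_correction", True),
]

for key, value in step_correctness(records).items():
    print(f"{key}: {value:.2f}")
```

Extending the key to a (modality, error category) pair would give the error-category breakdown in the same pass.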
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review, which identifies key areas where additional transparency and rigor would strengthen the manuscript. We address each major comment point by point below, committing to revisions that clarify our evaluation protocol while honestly noting the boundaries of the current study.
- Referee: [§5] §5 (AD classification results): The central claim of AUC 0.9518 with four modalities outperforming single-modality baselines is load-bearing for the paper's contribution. This performance is reported on agent-preprocessed data, yet no quantitative validation against standard pipelines (e.g., fMRIPrep for registration accuracy, FreeSurfer for segmentation Dice scores, or DTIPrep) or blinded expert QC is provided; the 84.8% step correctness measures execution and recovery success only.
  Authors: We acknowledge that the 84.8% end-to-end correctness metric primarily captures successful code execution, error recovery, and basic output validation rather than direct quantitative equivalence to expert pipelines. In the revised manuscript, we will expand the Methods and Results sections to detail the exact criteria used for the correctness assessment (including visual and automated checks for gross artifacts and data integrity). We will also add an explicit limitations paragraph noting the absence of metrics such as Dice scores, registration accuracy, or blinded expert QC, and clarify that the reported multimodal AUC gains provide supporting but indirect evidence of data usability. Comprehensive quantitative benchmarking against tools like FreeSurfer or fMRIPrep was outside the scope of this work focused on agent automation.
  Revision: partial
- Referee: [§4.1] §4.1 (Generate-Execute-Validate engine description): The automated error recovery and output validation steps are presented as sufficient to produce usable data with minimal human correction. However, the evaluation provides no analysis of recovered error types, potential systematic artifacts introduced by LLM-generated code, or downstream impact on metrics such as SNR or alignment quality.
  Authors: We agree that greater detail on error recovery and potential artifacts would improve the description of the Generate-Execute-Validate engine. In the revision, we will add a dedicated paragraph and supplementary table in §4.1 categorizing the observed error types (e.g., syntax errors, neuroimaging-tool-specific runtime failures, and format inconsistencies) along with recovery success rates. We will also discuss the risk of LLM-induced artifacts and report any available downstream quality indicators (such as basic alignment or intensity statistics) from the preprocessed outputs to address concerns about systematic effects on SNR or registration quality.
  Revision: yes
- Referee: [Abstract and §5] Abstract and §5 (ADNI cohort details): The 1,470-subject evaluation pools data across ADNI phases with modality subsets (Tau-PET n=469, fMRI n=278, DTI n=620). Data exclusion rules, preprocessing validation criteria, and statistical tests for the multimodal AUC improvement are not specified, undermining assessment of whether the reported gains reflect valid multimodal signal or preprocessing artifacts.
  Authors: We will revise both the Abstract and §5 to explicitly state the data exclusion rules (subjects removed due to missing modalities, failed initial quality checks, or incomplete tabular data), the precise validation criteria applied during the agent's output validation step, and the statistical tests used to evaluate multimodal AUC improvement (including DeLong's test for paired AUC comparisons with p-values). These additions will allow readers to better assess whether the performance gains arise from genuine multimodal signal rather than preprocessing artifacts.
  Revision: yes
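For readers unfamiliar with the paired AUC comparison mentioned in the last response, DeLong's test for two classifiers scored on the same subjects can be sketched in pure Python as follows. This is an illustrative sketch, not the authors' code: the function names and the toy labels and scores are assumptions, and the O(m·n) pairwise loop is only suitable for small samples.

```python
import math
from statistics import mean, variance
from typing import List, Sequence, Tuple


def _psi(pos: float, neg: float) -> float:
    # Pairwise kernel: 1 if the positive outranks the negative, 0.5 on ties
    return 1.0 if pos > neg else 0.5 if pos == neg else 0.0


def _components(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[List[float], List[float]]:
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    v10 = [mean(_psi(p, q) for q in neg) for p in pos]  # one value per positive
    v01 = [mean(_psi(p, q) for p in pos) for q in neg]  # one value per negative
    return v10, v01


def delong_test(
    labels: Sequence[int], scores_a: Sequence[float], scores_b: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Return (auc_a, auc_b, z, two-sided p) for two correlated ROC curves."""
    a10, a01 = _components(scores_a, labels)
    b10, b01 = _components(scores_b, labels)
    auc_a, auc_b = mean(a10), mean(b10)
    # Variance of the AUC difference from the paired structural components
    d10 = [x - y for x, y in zip(a10, b10)]
    d01 = [x - y for x, y in zip(a01, b01)]
    var = variance(d10) / len(d10) + variance(d01) / len(d01)
    if var == 0.0:
        raise ValueError("zero variance in the AUC difference; test undefined")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return auc_a, auc_b, z, p


# Toy data (illustrative only): 1 = AD, 0 = CN
labels = [1, 1, 1, 1, 0, 0, 0, 0]
multimodal = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
single = [0.9, 0.4, 0.7, 0.3, 0.5, 0.6, 0.2, 0.1]

auc_a, auc_b, z, p = delong_test(labels, multimodal, single)
print(f"AUC multimodal={auc_a:.2f}, single={auc_b:.2f}, z={z:.3f}, p={p:.3f}")
```

Because the test is paired, it requires both classifiers to be scored on the same subjects, so with the modality subsets reported above it would have to run on the subjects shared by the two models being compared.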
Circularity Check
No circularity: empirical evaluation on external dataset
The paper describes an LLM agent framework evaluated empirically on the public ADNI dataset (1,470 subjects). Reported metrics (up to 100% intent-parsing accuracy, 84.8% preprocessing step correctness, AUC 0.9518) are direct performance measurements from running the system, not mathematical derivations, fitted parameters renamed as predictions, or self-referential definitions. No equations, uniqueness theorems, or ansatzes appear; the central claims rest on observed outcomes rather than reducing to inputs by construction. This is a standard systems paper with independent empirical content.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: LLMs can generate executable, modality-specific preprocessing code that integrates with existing neuroimaging toolchains
- domain assumption: Automated validation can detect and allow recovery from most runtime and output-integrity errors without introducing bias
invented entities (1)
- Hierarchical multi-agent Generate-Execute-Validate engine: no independent evidence