pith. machine review for the scientific record.

arxiv: 2605.06584 · v1 · submitted 2026-05-07 · 💻 cs.AI

NeuroAgent: LLM Agents for Multimodal Neuroimaging Analysis and Research


Pith reviewed 2026-05-08 09:33 UTC · model grok-4.3

classification 💻 cs.AI
keywords neuroimaging analysis · LLM agents · multimodal preprocessing · Alzheimer's classification · agentic framework · automated pipelines · ADNI dataset

The pith

NeuroAgent uses LLM agents to automate multimodal neuroimaging preprocessing and achieves an AUC of 0.9518 for Alzheimer's classification with four modalities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents NeuroAgent as a system of LLM-powered agents that takes on the complex task of preparing and analyzing brain scans from multiple modalities including sMRI, fMRI, dMRI, and PET. Agents generate code for preprocessing steps, run it, recover from errors, and validate outputs through a Generate-Execute-Validate loop, cutting down on manual configuration and quality checks. Tested on 1,470 subjects from the ADNI dataset, the approach reaches high correctness rates with strong LLM backends and delivers better disease classification when all modalities are combined than when any one is used alone. This setup supports natural-language queries for further analysis after the data is ready.

Core claim

NeuroAgent's hierarchical multi-agent architecture with a feedback-driven Generate-Execute-Validate engine autonomously generates executable preprocessing code for heterogeneous neuroimaging data, detects and recovers from runtime errors, validates output integrity, and enables end-to-end multimodal analysis that yields an AUC of 0.9518 for Alzheimer's classification on pooled ADNI data, outperforming single-modality baselines.

What carries the argument

The hierarchical multi-agent architecture with a feedback-driven Generate-Execute-Validate engine that generates, executes, and validates preprocessing code across modalities while limiting human intervention to edge cases.
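
The loop that carries this claim can be sketched in miniature. Everything below is hypothetical scaffolding — the function names, retry budget, and stand-in generator are invented for illustration, not NeuroAgent's actual interfaces:

```python
import os
import subprocess
import sys
import tempfile

def generate_code(task, feedback):
    """Stand-in for the LLM Generate step; NeuroAgent would prompt a backend
    here, optionally conditioned on error feedback from a prior attempt."""
    return f"print('preprocessing step for: {task}')"

def validate_output(stdout):
    """Stand-in for the Validate step (output-integrity checks)."""
    return "preprocessing step" in stdout

def generate_execute_validate(task, max_attempts=3):
    """Feedback-driven Generate-Execute-Validate loop with error recovery."""
    feedback = None
    for _ in range(max_attempts):
        code = generate_code(task, feedback)              # Generate
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        try:
            result = subprocess.run([sys.executable, path],
                                    capture_output=True, text=True)  # Execute
        finally:
            os.unlink(path)
        if result.returncode != 0:
            feedback = result.stderr                      # recover from error
            continue
        if validate_output(result.stdout):                # Validate
            return True
        feedback = "output failed integrity checks"
    return False  # exhausted retries: escalate to the Human-In-The-Loop interface
```

The essential design point is that the stderr of a failed Execute step feeds back into the next Generate call, which is what turns a one-shot code generator into a recovery loop.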

If this is right

  • Multimodal data preprocessed by the system produces higher Alzheimer's classification AUC than any single modality.
  • The architecture reduces manual effort for preprocessing to only edge cases via automated error recovery.
  • Natural language queries become feasible for downstream statistical analysis once data passes validation.
  • Pipeline performance scales with the capability of the underlying LLM backend up to 100% intent parsing and 84.8% end-to-end step correctness.
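
The first and last bullets can be illustrated with a toy late-fusion experiment on synthetic scores. Nothing here is ADNI data; the prevalence, noise level, and averaging rule are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000
labels = (rng.random(n) < 0.32).astype(int)   # roughly the AD fraction (470/1470)

def auc(scores, y):
    """Rank-based (Mann-Whitney) AUC for binary labels."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# each modality is a noisy view of the same underlying disease signal
signal = labels.astype(float)
modalities = {m: signal + rng.normal(0.0, 1.2, n)
              for m in ("sMRI", "PET", "fMRI", "DTI")}

fused = np.mean(list(modalities.values()), axis=0)   # naive late fusion

single_aucs = {m: auc(s, labels) for m, s in modalities.items()}
fused_auc = auc(fused, labels)
```

Averaging four independently noisy views shrinks the noise variance, so the fused AUC exceeds every single-modality AUC — the same qualitative pattern the paper reports, though the paper's 0.9518 comes from real preprocessed data, not this toy.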

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar agent ensembles could be adapted to automate preprocessing in other medical imaging domains that face comparable toolchain complexity.
  • Wider adoption might enable labs with limited programming resources to conduct reproducible multimodal studies.
  • The reliance on automated validation raises the need for ongoing checks against gold-standard pipelines to maintain scientific trust.

Load-bearing premise

That code produced and validated by the LLM agents yields scientifically valid neuroimaging data without introducing systematic artifacts or biases that would alter research conclusions.

What would settle it

A side-by-side comparison by neuroimaging experts testing whether NeuroAgent-processed outputs differ meaningfully from established manual pipelines on standard quality metrics, such as tissue segmentation accuracy, or lead to different downstream classification performance.

Original abstract

Multimodal neuroimaging analysis often involves complex, modality-specific preprocessing workflows that require careful configuration, quality control, and coordination across heterogeneous toolchains. Beyond preprocessing, downstream statistical analysis and disease classification commonly require task-specific code, evaluation protocols, and data-format conventions, creating additional barriers between raw acquisitions and reproducible scientific analysis. We present NeuroAgent, an LLM-driven agentic framework that automates key preprocessing and analysis steps for heterogeneous neuroimaging data, including sMRI, fMRI, dMRI, and PET, and supports interactive downstream analysis through natural-language queries. NeuroAgent employs a hierarchical multi-agent architecture with a feedback-driven Generate-Execute-Validate engine: agents autonomously generate executable preprocessing code, detect and recover from runtime errors, and validate output integrity. We evaluate the system on 1,470 subjects pooled across all ADNI phases (CN=1,000, AD=470), where all subjects have sMRI and tabular data, with subsets also having Tau-PET (n=469), fMRI (n=278), and DTI (n=620). Pipeline ablation studies across multiple LLM backends show that capable models reach up to 100% intent-parsing accuracy, with the strongest backend (Qwen3.5-27B) reaching 84.8% end-to-end preprocessing step correctness. Automated recovery limits manual intervention to edge cases where human review is required via the Human-In-The-Loop interface. For Alzheimer's Disease classification using automatically preprocessed multimodal data, our agent ensemble achieves an AUC of 0.9518 with four modalities, outperforming all single-modality baselines. These results show that NeuroAgent can reduce the manual effort required for neuroimaging preprocessing and enable end-to-end automated analysis pipelines for neuroimaging research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper introduces NeuroAgent, an LLM-powered hierarchical multi-agent system for automating multimodal neuroimaging preprocessing (sMRI, fMRI, dMRI, PET) and downstream analysis via natural language. It employs a Generate-Execute-Validate engine for code generation, runtime error recovery, and output validation. On 1,470 ADNI subjects (with subsets having additional modalities), the system reports 100% intent parsing, up to 84.8% end-to-end preprocessing step correctness (Qwen3.5-27B backend), and an AUC of 0.9518 for Alzheimer's classification using four modalities, outperforming single-modality baselines. Automated recovery is said to limit human intervention to edge cases via a Human-In-The-Loop interface.

Significance. If the agent-generated preprocessing yields data scientifically equivalent to expert pipelines, NeuroAgent would meaningfully lower barriers to reproducible multimodal neuroimaging research by handling complex toolchains and enabling natural-language analysis. The evaluation on a large public ADNI cohort with concrete metrics across multiple LLM backends and modality ablations is a positive aspect. However, the significance is limited by the absence of direct evidence that execution success translates to valid scientific outputs.

major comments (3)
  1. [§5] §5 (AD classification results): The central claim of AUC 0.9518 with four modalities outperforming single-modality baselines is load-bearing for the paper's contribution. This performance is reported on agent-preprocessed data, yet no quantitative validation against standard pipelines (e.g., fMRIPrep for registration accuracy, FreeSurfer for segmentation Dice scores, or DTIPrep) or blinded expert QC is provided; the 84.8% step correctness measures execution and recovery success only.
  2. [§4.1] §4.1 (Generate-Execute-Validate engine description): The automated error recovery and output validation steps are presented as sufficient to produce usable data with minimal human correction. However, the evaluation provides no analysis of recovered error types, potential systematic artifacts introduced by LLM-generated code, or downstream impact on metrics such as SNR or alignment quality.
  3. [Abstract and §5] Abstract and §5 (ADNI cohort details): The 1,470-subject evaluation pools data across ADNI phases with modality subsets (Tau-PET n=469, fMRI n=278, DTI n=620). Data exclusion rules, preprocessing validation criteria, and statistical tests for the multimodal AUC improvement are not specified, undermining assessment of whether the reported gains reflect valid multimodal signal or preprocessing artifacts.
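
The validation gap the first two comments describe is concrete: a Validate step can only catch what its checks encode. A minimal sketch of the kind of output-integrity checks involved — the thresholds and the SNR proxy are illustrative inventions, not criteria from the paper:

```python
import numpy as np

def qc_checks(volume, expected_ndim=3):
    """Hypothetical automated output-integrity checks of the sort a Validate
    step might run on a preprocessed volume; returns a list of issue strings
    (empty means the volume passed). Thresholds are illustrative only."""
    issues = []
    vol = np.asarray(volume, dtype=float)
    if vol.ndim != expected_ndim:
        issues.append(f"expected {expected_ndim}D volume, got {vol.ndim}D")
    nan_frac = np.isnan(vol).mean()
    if nan_frac > 0.0:
        issues.append(f"{nan_frac:.1%} NaN voxels")
    if np.nanstd(vol) == 0:
        issues.append("constant image (zero variance)")
    # crude SNR proxy over nonzero voxels: mean / std
    nonzero = vol[np.nan_to_num(vol) != 0]
    if nonzero.size and np.std(nonzero) > 0:
        snr = np.mean(nonzero) / np.std(nonzero)
        if snr < 1.0:
            issues.append(f"low SNR proxy ({snr:.2f})")
    return issues
```

Checks like these catch gross failures (NaNs, empty images) but, as the referee notes, they cannot by themselves establish equivalence to expert pipelines on metrics such as segmentation Dice or registration accuracy.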
minor comments (3)
  1. [Abstract and §3] The abstract and §3 use 'end-to-end preprocessing step correctness' without a precise definition or breakdown by modality or error category; a table clarifying this metric would improve clarity.
  2. [Related Work] Related work section omits several recent LLM-agent frameworks for scientific code generation; adding 2-3 key citations would better situate the hierarchical architecture.
  3. [Figures] Figure captions for the agent architecture and Human-In-The-Loop interface are brief; expanding them to describe the feedback loop would aid readers unfamiliar with agentic systems.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed review, which identifies key areas where additional transparency and rigor would strengthen the manuscript. We address each major comment point by point below, committing to revisions that clarify our evaluation protocol while honestly noting the boundaries of the current study.

Point-by-point responses
  1. Referee: [§5] §5 (AD classification results): The central claim of AUC 0.9518 with four modalities outperforming single-modality baselines is load-bearing for the paper's contribution. This performance is reported on agent-preprocessed data, yet no quantitative validation against standard pipelines (e.g., fMRIPrep for registration accuracy, FreeSurfer for segmentation Dice scores, or DTIPrep) or blinded expert QC is provided; the 84.8% step correctness measures execution and recovery success only.

    Authors: We acknowledge that the 84.8% end-to-end correctness metric primarily captures successful code execution, error recovery, and basic output validation rather than direct quantitative equivalence to expert pipelines. In the revised manuscript, we will expand the Methods and Results sections to detail the exact criteria used for the correctness assessment (including visual and automated checks for gross artifacts and data integrity). We will also add an explicit limitations paragraph noting the absence of metrics such as Dice scores, registration accuracy, or blinded expert QC, and clarify that the reported multimodal AUC gains provide supporting but indirect evidence of data usability. Comprehensive quantitative benchmarking against tools like FreeSurfer or fMRIPrep was outside the scope of this work focused on agent automation. revision: partial

  2. Referee: [§4.1] §4.1 (Generate-Execute-Validate engine description): The automated error recovery and output validation steps are presented as sufficient to produce usable data with minimal human correction. However, the evaluation provides no analysis of recovered error types, potential systematic artifacts introduced by LLM-generated code, or downstream impact on metrics such as SNR or alignment quality.

    Authors: We agree that greater detail on error recovery and potential artifacts would improve the description of the Generate-Execute-Validate engine. In the revision, we will add a dedicated paragraph and supplementary table in §4.1 categorizing the observed error types (e.g., syntax errors, neuroimaging-tool-specific runtime failures, and format inconsistencies) along with recovery success rates. We will also discuss the risk of LLM-induced artifacts and report any available downstream quality indicators (such as basic alignment or intensity statistics) from the preprocessed outputs to address concerns about systematic effects on SNR or registration quality. revision: yes

  3. Referee: [Abstract and §5] Abstract and §5 (ADNI cohort details): The 1,470-subject evaluation pools data across ADNI phases with modality subsets (Tau-PET n=469, fMRI n=278, DTI n=620). Data exclusion rules, preprocessing validation criteria, and statistical tests for the multimodal AUC improvement are not specified, undermining assessment of whether the reported gains reflect valid multimodal signal or preprocessing artifacts.

    Authors: We will revise both the Abstract and §5 to explicitly state the data exclusion rules (subjects removed due to missing modalities, failed initial quality checks, or incomplete tabular data), the precise validation criteria applied during the agent's output validation step, and the statistical tests used to evaluate multimodal AUC improvement (including DeLong's test for paired AUC comparisons with p-values). These additions will allow readers to better assess whether the performance gains arise from genuine multimodal signal rather than preprocessing artifacts. revision: yes
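
The DeLong comparison the authors commit to uses a closed-form variance estimate for paired AUCs; as a rough stand-in, a paired bootstrap over subjects conveys the same idea. The scores below are synthetic and the resampling scheme is an illustrative substitute, not the paper's test:

```python
import numpy as np

def auc(scores, y):
    """Rank-based (Mann-Whitney) AUC for binary labels."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def paired_bootstrap_auc(s_multi, s_single, y, n_boot=1000, seed=0):
    """Resample subjects with replacement, preserving the pairing between the
    two classifiers, and return (observed AUC gap, two-sided bootstrap p-value).
    DeLong's test would replace the resampling with an analytic variance."""
    rng = np.random.default_rng(seed)
    observed = auc(s_multi, y) - auc(s_single, y)
    n = len(y)
    deltas = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        if 0 < y[idx].sum() < n:          # resample must contain both classes
            deltas.append(auc(s_multi[idx], y[idx]) - auc(s_single[idx], y[idx]))
    deltas = np.asarray(deltas)
    tail = min((deltas <= 0).mean(), (deltas >= 0).mean())
    return observed, min(1.0, 2.0 * tail)
```

The key property either test must respect is the pairing: the multimodal and single-modality AUCs are computed on the same subjects, so their sampling errors are correlated and an unpaired comparison would overstate the variance of the gap.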

Circularity Check

0 steps flagged

No circularity: empirical evaluation on external dataset

Full rationale

The paper describes an LLM agent framework evaluated empirically on the public ADNI dataset (1,470 subjects). Reported metrics (100% intent parsing, 84.8% preprocessing correctness, AUC 0.9518) are direct performance measurements from running the system, not mathematical derivations, fitted parameters renamed as predictions, or self-referential definitions. No equations, uniqueness theorems, or ansatzes appear; the central claims rest on observed outcomes rather than reducing to inputs by construction. This is a standard systems paper with independent empirical content.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The framework rests on the assumption that current LLMs can reliably produce and debug neuroimaging-specific code; no free parameters are explicitly fitted in the abstract, but model choice and prompt engineering act as implicit tunable elements.

axioms (2)
  • domain assumption LLMs can generate executable, modality-specific preprocessing code that integrates with existing neuroimaging toolchains
    Invoked in the Generate step of the engine and central to the 84.8% correctness claim
  • domain assumption Automated validation can detect and allow recovery from most runtime and output-integrity errors without introducing bias
    Required for the claim that manual intervention is limited to edge cases
invented entities (1)
  • Hierarchical multi-agent Generate-Execute-Validate engine no independent evidence
    purpose: Autonomously handle code generation, execution, error recovery, and validation for neuroimaging pipelines
    Core novel component introduced by the paper; no independent evidence provided beyond reported metrics

pith-pipeline@v0.9.0 · 5634 in / 1603 out tokens · 72744 ms · 2026-05-08T09:33:38.071538+00:00 · methodology

