
arxiv: 2606.28684 · v1 · pith:KTX6NMPGnew · submitted 2026-06-27 · 📡 eess.IV · cs.LG

A Neuroimaging Simulation Framework for Developing and Evaluating Causal AI

Pith reviewed 2026-06-30 09:06 UTC · model grok-4.3

classification 📡 eess.IV cs.LG
keywords neuroimaging simulation · causal AI · synthetic MRI · ground-truth data · causal discovery · T1-weighted images · volumetric changes · brain region control

The pith

A simulation framework produces realistic 3D brain scans with precisely controlled causal effects to supply ground-truth data for causal AI methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method that generates synthetic T1-weighted magnetic resonance images while enforcing a user-specified causal structure between non-image variables and image features. Anatomical differences across subjects are created by sampling a subspace from real scans and warping a template, and causal relationships are imposed through targeted volume adjustments in chosen brain regions. These adjustments avoid global side effects, yielding low error rates in both targeted and non-targeted areas. The resulting datasets allow objective testing of causal discovery algorithms, which the authors show still produce many spurious links when applied to image data. This addresses the absence of known causal ground truth that has slowed progress on causal AI for neuroimaging.
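The subspace-sampling step can be pictured with a small sketch. It assumes the subspace is a PCA basis fitted to flattened displacement fields and that new subjects are drawn from the Gaussian implied by the retained components; the summary does not say which subspace model the authors use, so `fit_subspace`, `sample_field`, and the toy random training data are all hypothetical stand-ins.

```python
import numpy as np

def fit_subspace(fields, n_components):
    """PCA on flattened displacement fields (one row per subject)."""
    mean = fields.mean(axis=0)
    centered = fields - mean
    # Rows of vt are the principal directions; s holds the singular values.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = (s ** 2) / (fields.shape[0] - 1)
    return mean, vt[:n_components], variances[:n_components]

def sample_field(mean, components, variances, rng):
    """Draw one displacement field from the Gaussian implied by the subspace."""
    coeffs = rng.normal(0.0, np.sqrt(variances))
    return mean + coeffs @ components

rng = np.random.default_rng(0)
# Toy stand-in for real data: 20 subjects, 4x4x4 grid, 3 displacement components.
grid_shape = (4, 4, 4, 3)
training = rng.normal(size=(20, int(np.prod(grid_shape))))
mean, comps, var = fit_subspace(training, n_components=5)
new_field = sample_field(mean, comps, var, rng).reshape(grid_shape)
```

In a real pipeline the sampled field would then be used to warp the template image; that resampling step is left out here to keep the sketch dependency-light.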

Core claim

The framework generates realistic synthetic 3D neuroimages that adhere to a user-specified causal structure by encoding relationships through precise volumetric changes of any region-of-interest without unwanted global artifacts, while anatomical variability is modeled by sampling from a subspace estimated from real data and deforming a template image, thereby creating the first source of ground-truth datasets for benchmarking and developing causal AI methods in neuroimaging.

What carries the argument

Encoding causal relationships via precise volumetric changes of any region-of-interest without unwanted global artifacts, combined with subspace sampling from real data for subject variability.
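One standard way to make "precise volumetric change" concrete is the change-of-variables identity for a deformation, written below. The notation is an assumption introduced for illustration and is not taken from the paper: the deformation is denoted by phi, the target region by R_t, a non-target region by R_j, and the desired relative change by delta.

```latex
% Assumed notation (not from the paper): \varphi is the spatial deformation,
% R_t the target region, R_j any non-target region, \delta the desired relative change.
\operatorname{vol}\bigl(\varphi(R)\bigr) = \int_{R} \bigl|\det D\varphi(x)\bigr| \, dx

% Localized causal encoding then asks for
\operatorname{vol}\bigl(\varphi(R_t)\bigr) = (1+\delta)\,\operatorname{vol}(R_t),
\qquad
\operatorname{vol}\bigl(\varphi(R_j)\bigr) \approx \operatorname{vol}(R_j) \quad (j \neq t)
```

Under this reading, the reported target-region errors measure how closely the first condition is met and the non-target errors measure the second.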

If this is right

  • Enables creation of unlimited ground-truth datasets with known causal structures for objective benchmarking of causal AI.
  • Demonstrates that current causal discovery methods applied to these images produce many spurious connections.
  • Supports development of new causal methods adapted to the statistical properties of medical images.
  • Achieves relative volume errors of 0.3 to 2.66 percent in targeted regions while keeping mean absolute errors in non-target regions between 0.034 and 0.397 ml.
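The two error figures in the last bullet can be read against a minimal sketch of how such metrics are commonly computed. The exact definitions used in the paper are not given in this summary, so the formulas below (percentage deviation from the intended target volume, and mean absolute volume difference across non-target regions) are assumptions, and the volumes are made-up illustrative numbers.

```python
def relative_volume_error(achieved_ml, target_ml):
    """Percent deviation of the achieved volume from the intended volume."""
    return abs(achieved_ml - target_ml) / target_ml * 100.0

def mean_absolute_error(before_ml, after_ml):
    """Mean absolute volume change (ml) across regions that should not move."""
    diffs = [abs(a - b) for a, b in zip(after_ml, before_ml)]
    return sum(diffs) / len(diffs)

# Hypothetical target region: intended 10.0 ml, simulation produced 10.1 ml.
target_error = relative_volume_error(achieved_ml=10.1, target_ml=10.0)

# Hypothetical non-target regions before and after the intervention.
nontarget_mae = mean_absolute_error(
    before_ml=[5.0, 7.5, 3.2],
    after_ml=[5.1, 7.4, 3.2],
)

print(f"target relative error: {target_error:.2f} %")
print(f"non-target MAE: {nontarget_mae:.3f} ml")
```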

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same volumetric-control approach could be adapted to test causal models of disease progression by varying the strength of the imposed effects.
  • Synthetic datasets from this framework could serve as a common benchmark for comparing different causal AI architectures on identical causal ground truth.
  • Integration with existing image-registration tools might allow the framework to incorporate real patient covariates as additional causal nodes.

Load-bearing premise

Precise volumetric changes to chosen brain regions without creating global artifacts produce images realistic enough to stand in for real causal structures in neuroimaging.

What would settle it

Apply existing causal discovery algorithms to the generated images and check whether they recover the known causal edges at rates substantially above those expected from random guessing while keeping false-positive rates low.
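The check described above amounts to comparing an estimated adjacency matrix with the known one. The sketch below shows one way to score that comparison; the three-node chain, the variable ordering, and the spurious edge in the "estimated" graph are illustrative assumptions, not results from the paper.

```python
def edge_metrics(true_adj, est_adj):
    """Compare directed edges of two adjacency matrices, ignoring the diagonal."""
    n = len(true_adj)
    tp = fp = fn = tn = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            truth, guess = bool(true_adj[i][j]), bool(est_adj[i][j])
            if truth and guess:
                tp += 1
            elif guess:
                fp += 1
            elif truth:
                fn += 1
            else:
                tn += 1
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }

# Hypothetical ground truth: chain A -> B -> C (row = cause, column = effect).
true_chain = [[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]]
# Hypothetical estimate that recovers the chain but adds a spurious A -> C edge.
estimated = [[0, 1, 1],
             [0, 0, 1],
             [0, 0, 0]]

scores = edge_metrics(true_chain, estimated)
print(scores)
```

A method that suppresses spurious connections would show a false-positive rate near zero while keeping recall high; the random-guessing baseline depends on the graph density and would need to be computed for each structure.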

Figures

Figures reproduced from arXiv: 2606.28684 by Emma A.M. Stanley, Erik Y. Ohara, Eryn Libert-Scott, Matthias Wilms, Nils D. Forkert, Vibujithan Vigneshwaran.

Figure 1: Directed graphs of three causal structures.
Figure 2: Original, intervened, and difference map images for left lateral ...
Figure 3: Volume change vs. causal variable across various target VOIs for the ...
Figure 4: Adjacency matrices for the three ground-truth structures, chain (left), ...
Figure 5: Visualization of causal graphs of the ground-truth (chain) and causal ...
Original abstract

Causally linking disease-related factors to image-derived biomarkers provides a powerful pathway to understanding disease mechanisms. Despite growing interest in applying causal artificial intelligence (AI) approaches for this task, these methods still need to be adapted for complex medical images, and especially, neuroimaging. However, the lack of ground-truth data presents a barrier to development. To bridge this gap, we developed and tested a method for generating synthetic neuroimages, which adhere to a user-specified causal structure describing the non-image to image variable relationships, permitting the creation of ground-truth neuroimaging datasets. In the simulated T1-weighted magnetic resonance images, anatomical variability is modeled by sampling from a subspace estimated from real data and deforming a template image to create unique simulated subjects. Causal relationships are encoded via precise volumetric changes of any region-of-interest without unwanted global artifacts. We achieved relative volume errors of 0.3-2.66% for the targeted regions-of-interest and demonstrate their statistically significant causal relationships, while maintaining mean absolute errors for non-target brain regions between 0.034-0.397ml. An initial evaluation of causal discovery methods exposes their limited ability to suppress spurious connections, highlighting the need for image-appropriate methods. Our framework is the first to enable the generation of realistic synthetic 3D neuroimages with explicit causal control that can serve as the missing ground-truth data necessary for the objective benchmarking and development of causal AI methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper introduces a simulation framework for generating synthetic 3D T1-weighted MRI neuroimages that follow user-specified causal structures. Anatomical variability is modeled by sampling from a real-data subspace and deforming a template; causality is encoded exclusively through precise volumetric deformations of user-specified ROIs. The authors report relative volume errors of 0.3-2.66% on target ROIs, non-target MAE of 0.034-0.397 ml, statistically significant causal relationships, and an initial demonstration that existing causal discovery methods produce spurious connections on the generated data. They position the framework as the first to supply realistic ground-truth 3D neuroimages with explicit causal control for benchmarking causal AI methods.

Significance. If the generated images prove representative of real causal neuroimaging structures beyond volume metrics, the framework would address a genuine bottleneck in causal AI development for medical imaging by supplying controllable ground-truth data. The reported quantitative volume-preservation metrics constitute a concrete, falsifiable strength. However, the central utility claim hinges on an untested assumption that volumetric ROI changes alone produce intensity patterns, textures, and higher-order features whose joint distributions match those arising from the same causal factors in real data.

major comments (1)
  1. [Abstract] The central claim that the framework produces 'realistic synthetic 3D neuroimages' suitable for 'objective benchmarking' of causal AI rests on the assumption that precise volumetric ROI changes without global artifacts suffice to reproduce the relevant causal image structure. The only quantitative support provided is relative volume error (0.3-2.66%) and non-target MAE (0.034-0.397 ml); no metrics on intensity histograms, texture features, or higher-order statistics are reported to test whether the simulated images match the joint distributions that would arise from the same causal factors in real data. This is load-bearing for the utility claim.
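As an illustration of the kind of intensity-level check the comment asks for, the sketch below compares normalized intensity histograms of two images with the Jensen-Shannon divergence. This is one possible realism metric chosen here for illustration, not one the paper or the referee names; the function names, the bin count, and the toy intensity lists are hypothetical.

```python
import math

def histogram(values, bins, lo, hi):
    """Normalized histogram of the values that fall inside [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v <= hi:
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values inside the histogram range")
    return [c / total for c in counts]

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy voxel intensities standing in for a real and a simulated scan.
real_voxels = [0.10, 0.15, 0.40, 0.45, 0.80, 0.85]
simulated_voxels = [0.12, 0.18, 0.42, 0.70, 0.82, 0.88]

p = histogram(real_voxels, bins=4, lo=0.0, hi=1.0)
q = histogram(simulated_voxels, bins=4, lo=0.0, hi=1.0)
print(f"JS divergence: {js_divergence(p, q):.3f} bits")
```

A histogram comparison only addresses first-order intensity statistics; texture and higher-order features would need separate measures.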

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback. We address the major comment below.

  1. Referee: [Abstract] The central claim that the framework produces 'realistic synthetic 3D neuroimages' suitable for 'objective benchmarking' of causal AI rests on the assumption that precise volumetric ROI changes without global artifacts suffice to reproduce the relevant causal image structure. The only quantitative support provided is relative volume error (0.3-2.66%) and non-target MAE (0.034-0.397 ml); no metrics on intensity histograms, texture features, or higher-order statistics are reported to test whether the simulated images match the joint distributions that would arise from the same causal factors in real data. This is load-bearing for the utility claim.

    Authors: The framework models anatomical variability by sampling from a real-data subspace and deforming a template, which is intended to reproduce intensity patterns and textures consistent with real distributions. The reported volume metrics demonstrate precise causal control without global artifacts. We agree that the absence of explicit metrics on intensity histograms, texture features, or higher-order statistics leaves the broader realism claim less fully supported than it could be. We will revise the abstract and add such analyses to the manuscript.

    revision: yes

Circularity Check

0 steps flagged

No circularity: generative simulation is self-contained construction


The paper presents a forward generative procedure: sample anatomical variability from a real-data subspace, deform a template, and encode user-specified causal relationships exclusively via targeted volumetric ROI changes. All reported quantities (relative volume errors 0.3-2.66%, non-target MAE 0.034-0.397 ml, statistically significant volume correlations) are direct measurements of the controlled deformations themselves rather than predictions or inferences that reduce to fitted parameters presupposing the target result. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing justification; the method is offered as an explicit construction for producing ground-truth data. The derivation chain therefore contains no reduction of outputs to inputs by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on domain assumptions about modeling anatomical variability and encoding causality through localized volume changes; no explicit free parameters fitted to the target causal results are described in the abstract.

free parameters (1)
  • subspace estimated from real data
    Used to sample anatomical variability for creating unique simulated subjects.
axioms (2)
  • domain assumption: Anatomical variability can be modeled by sampling from a subspace estimated from real data and deforming a template image
    Invoked to create unique simulated subjects while maintaining realism.
  • domain assumption: Causal relationships can be encoded via precise volumetric changes of any region-of-interest without unwanted global artifacts
    Central to permitting creation of ground-truth datasets with specified causal structure.

pith-pipeline@v0.9.1-grok · 5807 in / 1305 out tokens · 30664 ms · 2026-06-30T09:06:35.273011+00:00 · methodology

