arxiv: 2604.12294 · v1 · submitted 2026-04-14 · 🧬 q-bio.QM · stat.AP

The IQ-Motion Confound in Multi-Site Autism fMRI May Be Inflated by Site-Correlated Measurement Uncertainty

Kareem Soliman

Pith reviewed 2026-05-10 14:34 UTC · model grok-4.3

classification 🧬 q-bio.QM stat.AP

keywords: errors-in-variables · OLS bias · head motion · autism fMRI · multi-site studies · ABIDE · IQ confound · measurement uncertainty

The pith

Pooled OLS overestimates the IQ-motion slope by a factor of 4.67 in multi-site autism fMRI data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether ordinary least squares regression gives an unbiased estimate of how full-scale IQ relates to head motion in pooled multi-site autism neuroimaging datasets. It applies an errors-in-variables regression that incorporates published test-retest reliability for IQ scores and a site-level proxy for motion measurement uncertainty. On the ABIDE-I sample of 935 subjects across 19 sites, the analysis shows that standard OLS produces a slope more than four times steeper than the corrected estimate, that a single pooled model yields negative out-of-sample R-squared at every site, and that the corrected slope direction holds across wide ranges of the noise parameters.

Core claim

Ordinary least squares regression overestimates the negative IQ-motion association by a factor of 4.67 (OLS coefficient -0.00125 mm per IQ point versus EIV coefficient -0.00027 mm per IQ point). Leave-site-out cross-validation shows that a pooled OLS predictor of raw framewise displacement produces negative R-squared at all 19 sites (overall R-squared = -0.074). The direction of the errors-in-variables corrected slope remains negative and stable across an 8-by-8 grid of noise-parameter values spanning 12-fold ranges.
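The 4.67 factor and the two displayed slopes can be cross-checked with simple arithmetic. The sketch below uses only the rounded coefficients shown on this page; the `half_step` rounding width is an assumption (five displayed decimal places), and the unrounded coefficients are not available here.

```python
# Rounded coefficients as displayed on this page (mm per IQ point).
ols_slope = -0.00125
eiv_slope = -0.00027

# Ratio implied by the rounded values alone.
ratio_rounded = ols_slope / eiv_slope

# Range of ratios compatible with rounding each slope to five decimal places
# (assumed display precision; the full-precision values are not given here).
half_step = 0.000005
ratio_min = (abs(ols_slope) - half_step) / (abs(eiv_slope) + half_step)
ratio_max = (abs(ols_slope) + half_step) / (abs(eiv_slope) - half_step)

print(f"ratio from rounded values: {ratio_rounded:.2f}")
print(f"range compatible with rounding: {ratio_min:.2f} to {ratio_max:.2f}")
```

The rounded values give a ratio of about 4.63, and the reported 4.67 falls inside the interval compatible with rounding, so the two figures are not in conflict under this assumption.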

What carries the argument

Probability Cloud Regression, an errors-in-variables estimator that models per-observation measurement uncertainty in both the IQ predictor (from Wechsler test-retest reliability) and the motion response (from within-site standard deviation of mean framewise displacement).
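One common way to turn a test-retest reliability into a per-score uncertainty is the classical-test-theory standard error of measurement, SEM = SD × sqrt(1 − r). The sketch below illustrates that conversion only; whether the paper uses exactly this formula is not stated on this page, and the reliability value is a hypothetical placeholder rather than the published Wechsler coefficient.

```python
import math


def standard_error_of_measurement(scale_sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return scale_sd * math.sqrt(1.0 - reliability)


# Wechsler full-scale IQ is normed to a standard deviation of 15 points.
iq_sd = 15.0
# HYPOTHETICAL reliability, used only for illustration; not the paper's value.
assumed_reliability = 0.95

sem = standard_error_of_measurement(iq_sd, assumed_reliability)
print(f"IQ standard error of measurement: {sem:.2f} points")
```

A value of this kind would play the role of the predictor-side noise parameter in an errors-in-variables fit.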

If this is right

Pooled OLS may overstate the size of the IQ-motion confound that needs to be removed in multi-site autism fMRI analyses.
A single IQ-based predictor of motion does not generalize across scanning sites once site identity is withheld.
The direction of the corrected slope is insensitive to plausible ranges of the two noise parameters.
Formal errors-in-variables methods remain rarely used for confound estimation in multi-site neuroimaging.
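The leave-site-out result in the list above can be illustrated on synthetic data. The sketch below is not the paper's pipeline: the site baselines, slope, noise level, and sample sizes are all made up, and the pooled model is a plain `np.polyfit` line. It shows how site-level offsets in motion alone can push out-of-sample R² below zero, because R² is scored against each held-out site's own mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# SYNTHETIC data: six hypothetical sites with different baseline motion.
site_baselines = np.linspace(0.05, 0.30, 6)  # mean FD in mm, made up
n_per_site = 200
true_slope = -0.0003                          # mm per IQ point, made up

site = np.repeat(np.arange(site_baselines.size), n_per_site)
iq = rng.normal(100.0, 15.0, size=site.size)
noise = rng.normal(0.0, 0.03, size=site.size)
fd = site_baselines[site] + true_slope * (iq - 100.0) + noise


def leave_site_out_r2(iq, fd, site):
    """Fit pooled OLS on all other sites, then score R^2 on the held-out site."""
    scores = {}
    for held_out in np.unique(site):
        train = site != held_out
        test = ~train
        slope, intercept = np.polyfit(iq[train], fd[train], deg=1)
        pred = intercept + slope * iq[test]
        ss_res = np.sum((fd[test] - pred) ** 2)
        ss_tot = np.sum((fd[test] - fd[test].mean()) ** 2)
        scores[int(held_out)] = 1.0 - ss_res / ss_tot
    return scores


r2_by_site = leave_site_out_r2(iq, fd, site)
for s, r2 in r2_by_site.items():
    print(f"site {s}: out-of-sample R^2 = {r2:.2f}")
```

In this toy setup every held-out site scores below zero even though the simulated slope is identical at all sites, which is why a negative R² says more about transport of the intercept than about the slope itself.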

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar measurement-error biases could affect other pooled confound regressions that mix IQ or age with site-specific variables.
Connectivity-level re-analyses that substitute EIV-corrected motion covariates might change reported group differences in autism studies.
Reporting both OLS and EIV estimates side-by-side would give readers a direct sense of the inflation factor in future multi-site work.

Load-bearing premise

The within-site standard deviation of mean framewise displacement accurately represents the per-subject measurement uncertainty in the motion variable.

What would settle it

Repeating the analysis on a dataset that supplies empirical repeat-scan motion traces or repeat IQ assessments for the same subjects, then checking whether the EIV slope matches the observed within-subject change.

Figures

Figures reproduced from arXiv: 2604.12294 by Kareem Soliman.

**Figure 2.** Within-tier OLS slopes with pooled OLS (red dashed) and PCR EIV-corrected (blue … [caption truncated in source]

**Figure 3.** Full scatter of all 935 ABIDE-I subjects with OLS (solid red) and PCR EIV-corrected … [caption truncated in source]

**Figure 4.** Sensitivity of the attenuation estimate to … [caption truncated in source]

Original abstract

Multi-site autism neuroimaging studies routinely control for the confound between full-scale IQ and head motion by regressing framewise displacement against IQ scores and removing shared variance. This procedure assumes that ordinary least squares (OLS) provides an unbiased estimate of the confound magnitude. We tested this assumption on the ABIDE-I phenotypic dataset (n=935 subjects across 19 international scanning sites) using Probability Cloud Regression, an errors-in-variables (EIV) estimator that models per-observation measurement uncertainty in both variables. IQ measurement error was derived from published Wechsler test-retest reliability coefficients; response-side uncertainty was represented by a site-level proxy equal to the within-site standard deviation of mean framewise displacement. Three findings emerged. First, OLS overestimates the IQ-motion slope by a factor of 4.67 relative to the EIV-corrected estimate when the bias factor is computed from the full-precision fitted coefficients (OLS -0.00125, EIV -0.00027 mm per IQ point after rounding for display). Second, under leave-site-out cross-validation a single pooled predictor of raw FD produces negative out-of-sample R^2 at all 19 sites (overall R^2 = -0.074), indicating that the pooled predictor does not transport cleanly across sites once site information is removed. Third, the direction of the EIV-corrected slope is robust across all 64 configurations of an 8x8 sensitivity grid spanning 12-fold ranges of each noise parameter. These results suggest that pooled OLS may overstate the IQ-motion association in ABIDE-I, but direct downstream consequences for motion-correction pipelines remain to be quantified using raw motion traces and connectivity-level re-analysis. Formal EIV methods appear to remain uncommon in multi-site neuroimaging confound estimation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows OLS overestimates the IQ-motion slope in ABIDE-I by a factor of 4.67 versus EIV, with pooled models failing cross-site prediction, but the FD uncertainty proxy likely mixes true variance and error.

The main point is that OLS regression on ABIDE-I data gives an IQ-to-framewise-displacement slope about 4.67 times steeper than an errors-in-variables fit, and that a single pooled model produces negative out-of-sample R-squared at every site in leave-site-out checks. The EIV slope stays negative but is much smaller in magnitude, and the sign holds across their 64-point sensitivity grid on the noise parameters. They take IQ error from published test-retest numbers and set the motion-side uncertainty to the within-site SD of mean FD.

Those concrete coefficients and the cross-validation result are the clearest new pieces here. Prior work on this confound mostly stopped at OLS, so the direct comparison and the transport failure are useful to see quantified. The leave-site-out negative R-squared in particular is a simple, interpretable warning against assuming one model works everywhere.

The soft spot is the motion uncertainty proxy. The within-site SD of mean FD is the square root of the sum of two variances: true between-subject motion variance and measurement-error variance. True motion differences across subjects within a site are substantial for behavioral and clinical reasons, so the proxy is probably larger than the actual per-subject error in FD. In EIV that overstates sigma_y and pulls the slope closer to zero than a correct error model would, which inflates the reported OLS-EIV ratio. Their grid changes the size of the proxy but does not test whether it measures error variance rather than total variance. They also stop short of re-running any connectivity analyses, so the practical effect on downstream results is still open.

This is for labs that do multi-site autism fMRI and routinely regress motion against IQ or similar variables. The numbers are specific enough that a referee could check the proxy choice or ask for a simulation to bound the over-correction. I would send it to peer review rather than desk reject.
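The proxy concern in the letter can be made concrete with a small simulation. This is a sketch on synthetic data, assuming Gaussian true motion and Gaussian measurement error with made-up standard deviations; none of the numbers come from ABIDE-I.

```python
import numpy as np

rng = np.random.default_rng(1)

# SYNTHETIC single site; both standard deviations are made-up values.
n_subjects = 50_000
true_sd = 0.08   # spread of subjects' true mean FD (mm)
error_sd = 0.03  # per-subject measurement error in mean FD (mm)

# Gaussian draws are a simplification; real FD is non-negative and skewed.
true_fd = rng.normal(0.20, true_sd, size=n_subjects)
observed_fd = true_fd + rng.normal(0.0, error_sd, size=n_subjects)

observed_sd = observed_fd.std(ddof=1)
expected_sd = np.sqrt(true_sd**2 + error_sd**2)

print(f"observed within-site SD:    {observed_sd:.4f}")
print(f"sqrt(true var + error var): {expected_sd:.4f}")
print(f"proxy / actual error SD:    {observed_sd / error_sd:.2f}")
```

Under these assumed values the within-site SD is nearly three times the actual error SD, which is the sense in which the proxy may overstate sigma_y.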

Referee Report

2 major / 2 minor

Summary. The manuscript claims that ordinary least squares (OLS) regression overestimates the IQ-head motion (mean framewise displacement) slope by a factor of 4.67 relative to an errors-in-variables (EIV) correction via Probability Cloud Regression on the ABIDE-I dataset (n=935, 19 sites). IQ error is taken from published test-retest reliabilities while response uncertainty uses a site-level proxy; the pooled OLS model yields negative leave-site-out R² at every site (overall -0.074), and the EIV slope sign remains stable across an 8x8 sensitivity grid spanning 12-fold noise ranges.

Significance. If the EIV specification holds, the result indicates that conventional confound regression in multi-site autism fMRI may substantially inflate the apparent IQ-motion association, with potential consequences for how motion is partialled out prior to connectivity analyses. The explicit sensitivity grid, consistent negative cross-validation R², and use of external reliability coefficients for the predictor error are strengths that enhance reproducibility and internal checks.

major comments (2)

[Abstract (EIV model) and Methods (proxy definition)] The response-side uncertainty proxy (within-site SD of mean FD) is set equal to the total observed variance rather than measurement error alone. Because this quantity includes substantial true inter-subject motion variance (behavioral, scanner, and autism-related), it overstates per-observation sigma_y in the EIV model. This directly amplifies the attenuation correction and produces the reported 4.67-fold inflation (OLS -0.00125 vs. EIV -0.00027 mm/IQ point). The 8x8 grid only scales the magnitude of this already-inflated proxy; it does not test whether the proxy isolates error variance.
[Results (leave-site-out paragraph)] The leave-site-out result (negative R² at all 19 sites) demonstrates that a single pooled OLS predictor does not transport across sites, but the manuscript does not show how this finding quantitatively supports or modifies the central OLS-vs-EIV slope comparison. Clarify the logical link between the cross-validation exercise and the bias-factor claim.

minor comments (2)

[Abstract] The abstract states the 4.67 factor is computed from 'full-precision fitted coefficients' yet reports rounded values; provide the unrounded coefficients and the exact arithmetic in the main text or a supplementary table so readers can reproduce the ratio.
[Methods] Notation for the Probability Cloud Regression estimator and the precise form of the EIV likelihood should be stated explicitly (e.g., as an equation) rather than referenced only by name.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which have prompted us to re-examine our error modeling choices and the presentation of our cross-validation results. We address each major comment below with honest assessment and plans for revision.

Referee: [Abstract (EIV model) and Methods (proxy definition)] The response-side uncertainty proxy (within-site SD of mean FD) is set equal to the total observed variance rather than measurement error alone. Because this quantity includes substantial true inter-subject motion variance (behavioral, scanner, and autism-related), it overstates per-observation sigma_y in the EIV model. This directly amplifies the attenuation correction and produces the reported 4.67-fold inflation (OLS -0.00125 vs. EIV -0.00027 mm/IQ point). The 8x8 grid only scales the magnitude of this already-inflated proxy; it does not test whether the proxy isolates error variance.

Authors: We agree that the within-site SD of mean FD captures both measurement error and true inter-subject variability in motion, and that treating the full SD as sigma_y overstates the pure error component in the EIV model. This was an intentionally conservative proxy given the absence of repeated FD measures in ABIDE-I, but the referee is correct that it risks inflating the attenuation correction. In the revised manuscript we will update the Methods to explicitly state this limitation, reframe the proxy as an upper-bound estimate, and add a new sensitivity analysis that scales the error fraction of the SD (e.g., 25% and 50% of the observed SD treated as error). This will show how the OLS-to-EIV slope ratio varies when a smaller share of the proxy is attributed to error and will directly address the concern that the existing 8x8 grid does not isolate error variance. revision: yes
Referee: [Results (leave-site-out paragraph)] The leave-site-out result (negative R² at all 19 sites) demonstrates that a single pooled OLS predictor does not transport across sites, but the manuscript does not show how this finding quantitatively supports or modifies the central OLS-vs-EIV slope comparison. Clarify the logical link between the cross-validation exercise and the bias-factor claim.

Authors: The leave-site-out analysis was included to demonstrate that a single pooled OLS model fails to generalize, yielding negative out-of-sample R² at every site. This finding is conceptually related to the EIV results because both expose limitations of standard OLS pooling in multi-site data: OLS assumes error-free predictors and a homogeneous slope, while EIV corrects for measurement error within that pooled framework. The poor transportability indicates substantial site-driven variance that may interact with measurement error. However, we acknowledge that the manuscript does not provide a direct quantitative decomposition showing how much of the 4.67-fold bias factor is attributable to non-transportability versus pure measurement error. In revision we will add explicit text in the Results and Discussion linking the two analyses and will note that a fully integrated quantitative bridge would require additional site-specific EIV simulations, which we can outline as future work. revision: partial
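The sensitivity analysis proposed in the first response scales the error fraction of the SD, which is not the same as the error fraction of the variance. The sketch below shows the conversion for the fractions named in the rebuttal plus the original full-SD case; the `site_sd` value is hypothetical, and no EIV fit is run here.

```python
# HYPOTHETICAL within-site SD of mean FD (mm); not a value from the paper.
site_sd = 0.10

# Fractions of the observed SD treated as measurement error.
error_fractions = [0.25, 0.50, 1.00]

variance_share = {}
for f in error_fractions:
    sigma_y = f * site_sd                         # error SD passed to an EIV fit
    variance_share[f] = sigma_y**2 / site_sd**2   # share of observed variance
    print(
        f"fraction {f:.2f}: sigma_y = {sigma_y:.3f} mm, "
        f"error share of variance = {variance_share[f]:.4f}"
    )
```

Treating 50% of the SD as error attributes 25% of the observed variance to error, and 25% of the SD attributes about 6%, so the proposed settings are considerably more cautious than the percentages might suggest at first glance.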

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

The paper applies standard OLS regression and Probability Cloud Regression (EIV) to the ABIDE-I dataset, sourcing IQ error variance from external published reliability coefficients and using within-site SD of mean FD as a proxy for response uncertainty. The reported 4.67x overestimation factor is the direct numerical ratio of the two fitted slopes (OLS -0.00125 vs. EIV -0.00027); it is not obtained by fitting a parameter to a subset and renaming it as a prediction, nor does any equation reduce to its inputs by definition. Leave-site-out cross-validation and the 8x8 sensitivity grid are independent computations on the same data that do not presuppose the target ratio. No self-citations are load-bearing, and the central claim rests on the application of an external EIV estimator rather than tautological re-expression.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The claim depends on external reliability coefficients for IQ error and on the authors' choice of site-level SD as FD error proxy; standard EIV assumptions of independent additive errors are invoked without direct per-subject validation.

free parameters (1)

FD uncertainty proxy
Within-site standard deviation of mean framewise displacement, chosen as proxy and varied across 12-fold range in sensitivity grid.

axioms (2)

[domain assumption] Measurement errors in IQ and framewise displacement are independent of the true values and of each other.
Core assumption of the errors-in-variables model applied to the regression.
[ad hoc to paper] The site-level SD of mean FD serves as a valid stand-in for per-observation response uncertainty.
Introduced in the abstract as a proxy without direct measurement or external validation.
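For orientation, the two axioms above correspond to the generic structural form of a linear errors-in-variables model, written out below. This is the textbook formulation, not the paper's Probability Cloud Regression likelihood, which this page does not state; the symbols ξ, η, u, v, α, β are notation assumed here, with x the observed IQ and y the observed mean FD.

```latex
\begin{aligned}
x_i &= \xi_i + u_i, & u_i &\sim (0,\ \sigma_{x,i}^2), \\
y_i &= \eta_i + v_i, & v_i &\sim (0,\ \sigma_{y,i}^2), \\
\eta_i &= \alpha + \beta\,\xi_i, & u_i &\perp v_i, \quad (u_i, v_i) \perp \xi_i .
\end{aligned}
```

The first axiom is the independence condition in the last line; the second axiom is the choice to set every σ_{y,i} within a site to that site's SD of mean FD.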

pith-pipeline@v0.9.0 · 5625 in / 1470 out tokens · 48230 ms · 2026-05-10T14:34:52.197034+00:00 · methodology
