Principal Components Decomposition of Fraction of Variance Explained in High Dimensional Linear Models with Strong Correlation
Pith reviewed 2026-06-28 08:42 UTC · model grok-4.3
The pith
Decomposing fraction of variance explained into strong and weak correlation parts reduces bias for high-dimensional estimators.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The fraction of variance explained can be partitioned into a low-dimensional component that captures strong correlations among predictors via their dominant principal components and a high-dimensional component consisting of weak residual correlations, allowing separate estimation that reduces overall bias.
What carries the argument
Principal components decomposition of the fraction of variance explained that separates strong from weak correlations among predictors.
If this is right
- Simulations demonstrate improved bias reduction when dominant PCs are decomposed before applying GWASH or LMM-REML.
- The estimator remains consistent as both the number of predictors and the number of samples increase.
- Application to the ABCD brain imaging dataset captures nuanced heritability signals in the FVE of cognitive measures.
- The low-dimensional component is estimable by ordinary low-dimensional methods while the high-dimensional component uses existing high-dimensional tools.
Where Pith is reading between the lines
- The same split could be tested with other high-dimensional FVE estimators beyond GWASH and LMM-REML.
- Fields such as genomics that also feature blocks of strong predictor correlation might see similar bias reductions.
- If the number of dominant PCs required grows with dimension, the separation benefit could shrink.
Load-bearing premise
The dominant principal components capture essentially all strong correlations, leaving only weak residual correlations that high-dimensional estimators can handle without bias.
What would settle it
A simulation or dataset in which removing the dominant principal components still leaves residual correlations strong enough to bias GWASH or LMM-REML estimates, or where the decomposed FVE shows no bias reduction relative to direct application of those estimators.
Figures
read the original abstract
The fraction of variance explained (FVE) in a linear model quantifies the extent to which predictors account for outcome variability. In high-dimensional settings, where traditional FVE estimators do not apply, modern FVE estimators such as GWASH or linear mix-effect model estimated through the restricted maximum likelihood (LMM-REML) struggle with strong correlation among predictors, often found, for example, in brain imaging data. We propose a decomposition framework that partitions the FVE into two components: a low-dimensional component capturing the strong correlation, estimable by low dimensional methods, and a high-dimensional component with remaining weak correlation, estimable by high dimensional methods. Simulations demonstrate that decomposing dominant principal components (PCs) and estimating the high-dimensional FVE using GWASH or LMM-REML leads to improved bias reduction compared to directly applying standard approaches such as GWASH and LMM-REML. Our method shows consistent performance asymptotically as both the number of predictors and the number of samples increase. We illustrate the method in an analysis of the Adolescent Brain Cognitive Development (ABCD) brain imaging dataset, capturing nuanced heritability signals in the FVE of cognitive measures predicted by high-resolution brain imaging data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a principal components decomposition framework for estimating the fraction of variance explained (FVE) in high-dimensional linear models where predictors exhibit strong correlations. It partitions the total FVE into a low-dimensional component captured by dominant principal components (addressing strong correlations via standard low-dimensional methods) and a residual high-dimensional component with weak correlations (estimated via GWASH or LMM-REML). Simulations are used to demonstrate bias reduction relative to direct application of GWASH or LMM-REML, along with asymptotic consistency as both p and n grow; the approach is illustrated on the ABCD brain imaging dataset for heritability signals in cognitive measures.
Significance. If the partitioning isolates strong correlations without introducing new bias, the framework could meaningfully extend reliable FVE estimation to correlated high-dimensional regimes common in neuroimaging and genomics. The explicit use of established high-dimensional estimators on the residual, combined with simulation evidence of bias reduction and consistency, and the real-data application, constitute concrete strengths. The approach is presented as a methodological framework rather than a tautological identity.
minor comments (4)
- [§3.1] §3.1: The precise definition of the two FVE components (low-dimensional vs. high-dimensional) should include an explicit equation showing how the total FVE is recovered as their sum; the current description leaves the additivity implicit.
- [Figure 2] Figure 2: The simulation panels comparing bias for the proposed method versus direct GWASH/LMM-REML would benefit from error bars or standard errors across replicates to allow visual assessment of variability.
- [§4.2] §4.2: The criterion used to choose the number of dominant PCs is described only qualitatively; a short algorithmic statement or pseudocode would improve reproducibility.
- [Table 1] Table 1: The asymptotic consistency claim is stated for p,n → ∞, but the table reports finite-sample results only; a brief note on the rate or conditions would clarify the link to theory.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our manuscript, the accurate summary of the proposed PC-based FVE decomposition framework, and the recommendation for minor revision. No specific major comments were provided in the report.
Circularity Check
No significant circularity identified
full rationale
The paper proposes a decomposition of FVE into low-dimensional (strong correlation via dominant PCs) and high-dimensional (weak residual correlation) components, with claims of bias reduction validated via simulations and asymptotic consistency as p and n grow. No equations, derivations, or self-referential definitions appear in the provided text that reduce any prediction or result to a fitted quantity defined by the same inputs. The framework is presented as a methodological partitioning rather than a self-definitional identity, and no load-bearing self-citations or ansatzes are invoked. This is a standard non-circular finding for a simulation-supported proposal.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Brain Communications , author =
The role of critical immune genes in brain disorders: insights from neuroimaging immunogenetics , volume =. Brain Communications , author =. 2022 , pages =. doi:10.1093/braincomms/fcac078 , abstract =
-
[2]
Multi-ancestry phenome-wide association of complement component 4 variation with psychiatric and brain phenotypes in youth , volume =. Genome Biology , author =. 2023 , pages =. doi:10.1186/s13059-023-02878-0 , abstract =
-
[3]
Proceedings of the National Academy of Sciences , author =
Morphometricity as a measure of the neuroanatomical signature of a trait , volume =. Proceedings of the National Academy of Sciences , author =. 2016 , file =. doi:10.1073/pnas.1604378113 , abstract =
-
[4]
, author =
Methods of correlation analysis. , author =. 1930 , annote =
1930
-
[5]
Tutorial: a statistical genetics guide to identifying. Nature Protocols , author =. 2023 , pages =. doi:10.1038/s41596-023-00853-4 , language =
-
[6]
Genotype. Behavior Genetics , author =. 2023 , pages =. doi:10.1007/s10519-023-10143-0 , language =
-
[7]
Applied Occupational and Environmental Hygiene , author =
Estimation of. Applied Occupational and Environmental Hygiene , author =. 1990 , pages =. doi:10.1080/1047322X.1990.10389587 , language =
-
[8]
The Annals of Occupational Hygiene , month = dec, year =
Much. The Annals of Occupational Hygiene , month = dec, year =. doi:10.1093/annhyg/mep092 , language =
-
[9]
Journal of Chromatography A , author =
How to estimate moments and quantiles of environmental data sets with non-detected observations?. Journal of Chromatography A , author =. 2002 , pages =. doi:10.1016/S0021-9673(02)01327-4 , language =
-
[10]
Environmental Science & Technology , author =
Evaluation of. Environmental Science & Technology , author =. 2008 , pages =. doi:10.1021/es071301c , language =
-
[11]
Water Resources Research , author =
Estimation of descriptive statistics for multiply censored water quality data , volume =. Water Resources Research , author =. 1988 , pages =. doi:10.1029/WR024i012p01997 , abstract =
-
[12]
Environmental Science & Technology , author =
Less than obvious - statistical treatment of data below the detection limit , volume =. Environmental Science & Technology , author =. 1990 , pages =. doi:10.1021/es00082a001 , language =
-
[13]
American Journal of Epidemiology , author =
Effects of. American Journal of Epidemiology , author =. 2003 , pages =. doi:10.1093/aje/kwf217 , number =
-
[14]
American Journal of Epidemiology , author =
The. American Journal of Epidemiology , author =. 2006 , pages =. doi:10.1093/aje/kwj039 , language =
-
[15]
Statistics in Biosciences , author =
Statistical. Statistics in Biosciences , author =. 2015 , pages =. doi:10.1007/s12561-013-9099-4 , language =
-
[16]
Cold Spring Harbor Perspectives in Medicine , author =
Brain. Cold Spring Harbor Perspectives in Medicine , author =. 2012 , pages =. doi:10.1101/cshperspect.a006213 , language =
-
[17]
Biological Psychiatry , author =
Structural brain imaging in schizophrenia: a selective review , volume =. Biological Psychiatry , author =. 1999 , pages =. doi:10.1016/S0006-3223(99)00071-2 , language =
-
[18]
Proceedings of the National Academy of Sciences , author =
Cortex mapping reveals regionally specific patterns of genetic and disease-specific gray-matter deficits in twins discordant for schizophrenia , volume =. Proceedings of the National Academy of Sciences , author =. 2002 , pages =. doi:10.1073/pnas.052023499 , abstract =
-
[19]
Cortical abnormalities in bipolar disorder investigated with. NeuroImage , author =. 2006 , pages =. doi:10.1016/j.neuroimage.2005.09.029 , language =
-
[20]
Developmental Cognitive Neuroscience , author =
The. Developmental Cognitive Neuroscience , author =. 2018 , pages =. doi:10.1016/j.dcn.2018.03.001 , language =
-
[21]
Nature Communications , author =
Multivariate genome-wide association study on tissue-sensitive diffusion metrics highlights pathways that shape the human brain , volume =. Nature Communications , author =. 2022 , pages =. doi:10.1038/s41467-022-30110-3 , abstract =
-
[22]
Brain, Behavior, and Immunity , author =. 2024 , pages =. doi:10.1016/j.bbi.2023.10.019 , language =
-
[23]
Protective. EBioMedicine , author =. 2018 , pages =. doi:10.1016/j.ebiom.2018.02.005 , language =
-
[24]
Image processing and analysis methods for the. NeuroImage , author =. 2019 , pages =. doi:10.1016/j.neuroimage.2019.116091 , language =
-
[25]
Azriel, David and Davenport, Samuel and Schwartzman, Armin , month = feb, year =. Consistency of heritability estimation from summary statistics in high-dimensional linear models , url =. doi:10.48550/arXiv.2502.11144 , keywords =
-
[26]
The American Journal of Human Genetics , author =
Maximizing the. The American Journal of Human Genetics , author =. 2014 , pages =. doi:10.1016/j.ajhg.2014.03.016 , language =
-
[27]
Briefings in Bioinformatics , author =
A review of. Briefings in Bioinformatics , author =. 2022 , pages =. doi:10.1093/bib/bbac067 , abstract =
-
[28]
Computational and Structural Biotechnology Journal , author =
Statistical methods for. Computational and Structural Biotechnology Journal , author =. 2020 , pages =. doi:10.1016/j.csbj.2020.06.011 , language =
-
[29]
Nature Reviews Genetics , author =
Heritability in the genomics era — concepts and misconceptions , volume =. Nature Reviews Genetics , author =. 2008 , pages =. doi:10.1038/nrg2322 , language =
-
[30]
The Annals of Applied Statistics , author =
A simple, consistent estimator of. The Annals of Applied Statistics , author =. 2019 , file =. doi:10.1214/19-AOAS1291 , number =
-
[31]
Human Brain Mapping , author =
A unified framework for association and prediction from vertex‐wise grey‐matter structure , volume =. Human Brain Mapping , author =. 2020 , pages =. doi:10.1002/hbm.25109 , abstract =
-
[32]
The American Journal of Human Genetics , author =. 2011 , pages =. doi:10.1016/j.ajhg.2010.11.011 , language =
-
[33]
Monographs of the Society for Research in Child Development , author =. 2013 , pages =. doi:10.1111/mono.12038 , abstract =
-
[34]
Proceedings of the National Academy of Sciences , author =
Limitations of. Proceedings of the National Academy of Sciences , author =. 2016 , file =. doi:10.1073/pnas.1520109113 , abstract =
-
[35]
Journal of the American Statistical Association , volume =
The. Journal of the American Statistical Association , author =. 2015 , pages =. doi:10.1080/01621459.2014.958156 , language =
-
[36]
The American Journal of Human Genetics , author =
Significance tests for. The American Journal of Human Genetics , author =. 2023 , pages =. doi:10.1016/j.ajhg.2023.01.004 , language =
-
[37]
2010 , publisher=
Kendall's advanced theory of statistics, distribution theory , author=. 2010 , publisher=
2010
-
[38]
Biometrika , volume=
The mean and second moment coefficient of the multiple correlation coefficient, in samples from a normal population , author=. Biometrika , volume=. 1931 , publisher=
1931
-
[39]
Collabra: Psychology , author =
Improving on. Collabra: Psychology , author =. 2020 , pages =. doi:10.1525/collabra.343 , abstract =
-
[40]
Electronic Journal of Statistics , author =
Estimation of linear projections of non-sparse coefficients in high-dimensional regression , volume =. Electronic Journal of Statistics , author =. 2020 , file =. doi:10.1214/19-EJS1656 , number =
-
[41]
Archives of General Psychiatry , author =
Brain. Archives of General Psychiatry , author =. 2011 , pages =. doi:10.1001/archgenpsychiatry.2011.117 , language =
-
[42]
The Journal of Neuroscience , author =
A unifying model for discordant and concordant results in human neuroimaging studies of facial viewpoint selectivity , copyright =. The Journal of Neuroscience , author =. 2024 , pages =. doi:10.1523/JNEUROSCI.0296-23.2024 , abstract =
-
[43]
The Annals of Mathematical Statistics , author =
Unbiased. The Annals of Mathematical Statistics , author =. 1958 , pages =. doi:10.1214/aoms/1177706717 , language =
-
[44]
PLoS Med.12(3), e1001779 (2015).https://doi.org/10.1371/journal.pmed.1001779
PLOS Medicine , author =. 2015 , pages =. doi:10.1371/journal.pmed.1001779 , language =
-
[45]
The WU-Minn Human Connectome Project: An overview , journal =
The. NeuroImage , author =. 2013 , pages =. doi:10.1016/j.neuroimage.2013.05.041 , language =
-
[46]
, author=
Methods of correlation analysis. , author=. 1930 , publisher=
1930
-
[47]
Palmer, C E and Zhao, W and Loughnan, R and Zou, J and Fan, C C and Thompson, W K and Dale, A M and Jernigan, T L , month = feb, year =. Distinct regionalization patterns of cortical morphology are associated with cognitive performance across different domains , copyright =. doi:10.1101/2020.02.13.948596 , abstract =
-
[48]
Individual. Cerebral Cortex , author =. 2021 , pages =. doi:10.1093/cercor/bhaa290 , abstract =
-
[49]
Accurate estimation of. Nature Genetics , author =. 2019 , pages =. doi:10.1038/s41588-019-0465-0 , language =
-
[50]
Reevaluation of. Nature Genetics , author =. 2017 , pages =. doi:10.1038/ng.3865 , language =
-
[51]
The Annals of Statistics , author =
On the. The Annals of Statistics , author =. 2001 , note =
2001
-
[52]
The Annals of Probability , author =
Tracy–. The Annals of Probability , author =. 2007 , file =. doi:10.1214/009117906000000917 , number =
-
[53]
, year =
Bai, Zhidong and Silverstein, Jack W. , year =. Spectral
-
[54]
Phase transition of the largest eigenvalue for non-null complex sample covariance matrices
Baik, Jinho and Arous, Gerard Ben and Peche, Sandrine , year =. Phase transition of the largest eigenvalue for non-null complex sample covariance matrices , copyright =. doi:10.48550/ARXIV.MATH/0403022 , abstract =
work page internal anchor Pith review doi:10.48550/arxiv.math/0403022
-
[55]
A useful variant of the. Biometrika , author =. 2015 , pages =. doi:10.1093/biomet/asv008 , language =
-
[56]
Biological Psychiatry: Cognitive Neuroscience and Neuroimaging , author =
Double. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging , author =. 2020 , pages =. doi:10.1016/j.bpsc.2019.09.003 , language =
-
[57]
Nature Neuroscience , author =
Circular analysis in systems neuroscience: the dangers of double dipping , volume =. Nature Neuroscience , author =. 2009 , pages =. doi:10.1038/nn.2303 , language =
-
[58]
Estimation of the
Gupta, A K , year =. Estimation of the
-
[59]
Gupta, A.K. and Nagar, D.K. , month = may, year =. Matrix. doi:10.1201/9780203749289 , language =
-
[60]
The Annals of Statistics , year =
On the distribution of the largest eigenvalue in principal components analysis , author =. The Annals of Statistics , year =
-
[61]
The Annals of Statistics , year =
Edge universality of correlation matrices , author =. The Annals of Statistics , year =
-
[62]
1997 , series =
Matrix Analysis , author =. 1997 , series =
1997
-
[63]
Nature Genetics , author =. 2015 , pages =. doi:10.1038/ng.3211 , language =
-
[64]
Journal of Educational and Behavioral Statistics , author =
A. Journal of Educational and Behavioral Statistics , author =. 2002 , pages =. doi:10.3102/10769986027003223 , abstract =
-
[65]
Representational similarity analysis – connecting the branches of systems neuroscience , issn =. Frontiers in Systems Neuroscience , author =. 2008 , file =. doi:10.3389/neuro.06.004.2008 , urldate =
-
[66]
2019 , eprint=
Spiked sample covariance matrices with possibly multiple bulk components , author=. 2019 , eprint=
2019
-
[67]
Journal of Multivariate Analysis , author =
Eigenvalues of large sample covariance matrices of spiked population models , volume =. Journal of Multivariate Analysis , author =. 2006 , keywords =. doi:https://doi.org/10.1016/j.jmva.2005.08.003 , abstract =
-
[68]
Common. Nature Genetics , author =. 2010 , pages =. doi:10.1038/ng.608 , language =
-
[69]
Random Matrices: Theory and Applications , author =
Efficient computation of limit spectra of sample covariance matrices , volume =. Random Matrices: Theory and Applications , author =. 2015 , pages =. doi:10.1142/S2010326315500197 , abstract =
-
[70]
Journal of Multivariate Analysis , author =
Analysis of the. Journal of Multivariate Analysis , author =. 1995 , pages =. doi:10.1006/jmva.1995.1058 , language =
-
[71]
A quantified comparison of cortical atlases on the basis of trait morphometricity , volume =. Cortex , author =. 2023 , pages =. doi:10.1016/j.cortex.2022.11.001 , language =
-
[72]
Mathematics of the USSR-Sbornik , author =. 1967 , pages =. doi:10.1070/SM1967v001n04ABEH001994 , number =
-
[73]
Variance estimation in high-dimensional linear models , volume =. Biometrika , author =. 2014 , pages =. doi:10.1093/biomet/ast065 , language =
-
[74]
Bioinformatics , volume =
LASER server: ancestry tracing with genotypes or sequence reads , author =. Bioinformatics , volume =. 2017 , doi =
2017
-
[75]
Reproducible brain-wide association studies require thousands of individuals , volume =. Nature , author =. 2022 , pages =. doi:10.1038/s41586-022-04492-9 , abstract =
-
[76]
Pham, Benjamin K and Davenport, Samuel and Azriel, David and Schwartzman, Armin , month = may, year =. When can whole-genome. doi:10.64898/2026.05.13.724972 , abstract =
-
[77]
Improved ancestry inference using weights from external reference panels , volume =. Bioinformatics , author =. 2013 , pages =. doi:10.1093/bioinformatics/btt144 , abstract =
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.