Recognition: 1 theorem link · Lean Theorem
The Generalised Kernel Covariance Measure
Pith reviewed 2026-05-13 16:55 UTC · model grok-4.3
The pith
The Generalised Kernel Covariance Measure tests conditional independence using flexible regression estimators instead of kernel ridge regression.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose the Generalised Kernel Covariance Measure as a regression-model-agnostic kernel-based test for conditional independence, building on the Generalised Hilbertian Covariance Measure framework, and characterise conditions under which it satisfies uniform asymptotic level guarantees while demonstrating strong empirical performance with tree-based models.
What carries the argument
The generalised kernel covariance measure, computed from residuals of regressing variable embeddings on conditioning variables using arbitrary regression estimators.
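To make the recipe concrete, here is a minimal sketch of the embed-regress-test pattern described above: embed X and Y (here via random Fourier features in the style of Rahimi and Recht, cited in the reference graph below, standing in for exact RKHS embeddings), regress each embedding coordinate on Z with an arbitrary plug-in regressor, and aggregate the normalised residual cross-covariances. The function names (rff, residualise, gcm_style_statistic), the sup-norm aggregation, and the use of in-sample residuals are illustrative assumptions, not the paper's GKCM statistic.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestRegressor

def rff(V, num_features=50, gamma=1.0, seed=0):
    """Random Fourier features approximating a Gaussian-kernel embedding."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(V.shape[1], num_features))
    b = rng.uniform(0, 2 * np.pi, num_features)
    return np.sqrt(2.0 / num_features) * np.cos(V @ W + b)

def residualise(F, Z, regressor):
    """Residuals from regressing each embedding coordinate of F on Z."""
    R = np.empty_like(F)
    for j in range(F.shape[1]):
        R[:, j] = F[:, j] - clone(regressor).fit(Z, F[:, j]).predict(Z)
    return R

def gcm_style_statistic(X, Y, Z, regressor):
    """Sup-norm over studentised residual cross-covariances (GCM-style)."""
    n = X.shape[0]
    Rx = residualise(rff(X, seed=1), Z, regressor)
    Ry = residualise(rff(Y, seed=2), Z, regressor)
    prods = Rx[:, :, None] * Ry[:, None, :]      # n x p x q residual products
    mean, sd = prods.mean(axis=0), prods.std(axis=0) + 1e-12
    return np.sqrt(n) * np.abs(mean / sd).max()  # large => evidence against CI

# Example: any regressor exposing fit/predict can be plugged in, e.g.
# stat = gcm_style_statistic(X, Y, Z, RandomForestRegressor(n_estimators=100))
```

Under the null, each studentised entry is approximately standard normal when the regressions are accurate enough, but the distribution of the maximum still needs calibration (e.g., by bootstrap), which this sketch omits.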
Load-bearing premise
The regression estimators must satisfy the approximation or consistency conditions required for the uniform asymptotic level guarantees to hold.
What would settle it
Observing inflated type I error rates in finite samples when using a regression estimator that violates the consistency conditions characterized in the paper.
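As a hedged illustration of this falsification criterion, the sketch below (our construction, not the paper's experiment) simulates a null where X and Y each depend on Z but not on each other, computes a scalar GCM-type statistic that is approximately N(0,1) when the plug-in regressions estimate the conditional means well, and records the rejection rate at the 5% level. The names gcm_statistic and type_one_error and the data-generating process are illustrative; in-sample residuals are used for brevity where sample splitting would be more faithful.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor

def gcm_statistic(x, y, Z, regressor):
    """Scalar GCM-type statistic: sqrt(n) * mean(eps*xi) / sd(eps*xi)."""
    eps = x - regressor.fit(Z, x).predict(Z)  # residual of X given Z
    xi = y - regressor.fit(Z, y).predict(Z)   # residual of Y given Z
    r = eps * xi
    return np.sqrt(len(r)) * r.mean() / (r.std() + 1e-12)

def type_one_error(regressor, n=300, reps=200, seed=0):
    """Rejection rate at nominal level 0.05 under X independent of Y given Z."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        Z = rng.normal(size=(n, 1))
        x = Z[:, 0] ** 2 + rng.normal(size=n)  # X depends on Z only
        y = Z[:, 0] ** 2 + rng.normal(size=n)  # Y depends on Z only
        rejections += abs(gcm_statistic(x, y, Z, regressor)) > 1.96
    return rejections / reps

# A flexible regressor should land near the nominal 0.05; a constant fit,
# which clearly violates any consistency condition here, leaves the shared
# Z^2 signal in both residuals and should reject far too often.
print(type_one_error(RandomForestRegressor(n_estimators=50)))
print(type_one_error(DummyRegressor()))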
Original abstract
We consider the problem of conditional independence (CI) testing and adopt a kernel-based approach. Kernel-based CI tests embed variables in reproducing kernel Hilbert spaces, regress their embeddings on the conditioning variables, and test the resulting residuals for marginal independence. This approach yields tests that are sensitive to a broad range of conditional dependencies. Existing methods, however, rely heavily on kernel ridge regression, which is computationally expensive when properly tuned and yields poorly calibrated tests when left untuned, which limits their practical usefulness. We propose the Generalised Kernel Covariance Measure (GKCM), a regression-model-agnostic kernel-based CI test that accommodates a broad class of regression estimators. Building on the Generalised Hilbertian Covariance Measure framework (Lundborg et al., 2022), we characterise conditions under which GKCM satisfies uniform asymptotic level guarantees. In simulations, GKCM paired with tree-based regression models frequently outperforms state-of-the-art CI tests across a diverse range of data-generating processes, achieving better type I error control and competitive or superior power.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the Generalised Kernel Covariance Measure (GKCM), a kernel-based conditional independence test that is agnostic to the choice of regression estimator. Building on the Generalised Hilbertian Covariance Measure (GHCM) of Lundborg et al. (2022), it characterizes conditions under which GKCM achieves uniform asymptotic level guarantees. Simulations demonstrate that when paired with tree-based regression models such as random forests and gradient boosting, GKCM often achieves superior type I error control and competitive power compared to state-of-the-art CI tests across various data-generating processes.
Significance. If the theoretical conditions are satisfied by the regression estimators used in practice, this work provides a flexible and computationally efficient framework for kernel-based CI testing that broadens the applicability beyond kernel ridge regression. The empirical results suggest practical advantages in diverse settings, potentially advancing the field by allowing integration with modern machine learning regressors while maintaining theoretical guarantees.
major comments (2)
- [§3] §3 (Theoretical Characterization): The uniform asymptotic level guarantees require the regression estimators to satisfy specific approximation error rates or consistency conditions in the relevant RKHS norms (as inherited from the GHCM framework). It is not shown that the tree-based estimators (random forests, gradient boosting) used in the simulations meet these rates, particularly under high-dimensional or non-smooth data-generating processes.
- [Simulations] Simulation results (e.g., type I error tables): The better type I error control reported for GKCM with tree-based models is presented as evidence of practical performance, but without verification that these estimators satisfy the paper's conditions, the asymptotic justification does not apply to the experiments; the performance claims therefore remain purely empirical observations.
minor comments (2)
- [Abstract] Abstract: The claim of 'uniform asymptotic level guarantees' could briefly note the dependence on the regression estimator satisfying the characterized conditions to avoid implying unconditional guarantees.
- [Notation] Notation and references: Ensure explicit cross-references to the exact conditions from Lundborg et al. (2022) when stating the generalization; some RKHS embedding notation could be clarified with a short appendix table.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed report. We address each major comment below and outline the revisions we will make to the manuscript.
Point-by-point responses
-
Referee: [§3] §3 (Theoretical Characterization): The uniform asymptotic level guarantees require the regression estimators to satisfy specific approximation error rates or consistency conditions in the relevant RKHS norms (as inherited from the GHCM framework). It is not shown that the tree-based estimators (random forests, gradient boosting) used in the simulations meet these rates, particularly under high-dimensional or non-smooth data-generating processes.
Authors: We agree that the uniform asymptotic level guarantees in Section 3 are conditional on the regression estimators satisfying the specified approximation error rates or consistency conditions in the relevant RKHS norms, as derived from the GHCM framework of Lundborg et al. (2022). The manuscript does not verify or demonstrate that tree-based estimators such as random forests and gradient boosting meet these rates, particularly in high-dimensional or non-smooth settings. This is a genuine limitation in bridging the theory to the specific estimators used in the simulations. In the revised manuscript, we will add explicit discussion in Section 3 clarifying the conditional nature of the guarantees and noting that verification of the rates for tree-based methods is left for future work, as establishing such rates is technically challenging and outside the current scope. We will also cross-reference this in the simulation section. revision: partial
-
Referee: [Simulations] Simulation results (e.g., type I error tables): The better type I error control reported for GKCM with tree-based models is presented as evidence of practical performance, but without verification that these estimators satisfy the paper's conditions, the asymptotic justification does not apply to the experiments; the performance claims therefore remain purely empirical observations.
Authors: We acknowledge that, without verification that the tree-based estimators satisfy the paper's conditions, the asymptotic level guarantees do not apply to the simulation experiments, and the observed improvements in type I error control should be viewed strictly as empirical findings. In the revised version, we will update the simulation section (including the description of the type I error tables and any interpretive text) to explicitly state that these results are empirical observations and do not rely on the asymptotic theory unless the relevant conditions are met. This revision will more accurately separate the theoretical contributions from the practical performance results. revision: yes
Circularity Check
Minor reliance on the cited GHCM framework; the central generalisation remains independent
Full rationale
The paper explicitly builds on the GHCM framework of Lundborg et al. (2022) to characterize conditions for uniform asymptotic level guarantees under arbitrary regressors. This citation provides the base Hilbertian covariance structure but does not reduce the new claims (model-agnostic extension and condition characterization) to fitted parameters or self-referential definitions from the present work. Simulations with tree-based estimators are presented as empirical validation rather than as the source of the asymptotic guarantees. No self-definitional, fitted-input-as-prediction, or load-bearing self-citation patterns are exhibited in the derivation chain.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Regression estimators satisfy the approximation or consistency conditions needed for uniform asymptotic level guarantees (a sketch of the typical shape of such a condition follows below).
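For orientation, conditions of this kind typically take a product-of-errors form in the covariance-measure literature. The display below is a hedged sketch of that typical shape, not a quotation of the paper's condition; the symbols f, g, and their estimates are our notation for the embedding regressions on Z.

```latex
% Typical shape of such a condition (a sketch, not the paper's statement):
% f(z) and g(z) denote the conditional means of the X- and Y-embeddings
% given Z = z, and \hat f, \hat g their estimates. Level control follows
% when the *product* of estimation errors vanishes faster than root-n:
\[
  \sqrt{n}\,\cdot\,\frac{1}{n}\sum_{i=1}^{n}
    \bigl\|\hat f(Z_i)-f(Z_i)\bigr\|\,
    \bigl\|\hat g(Z_i)-g(Z_i)\bigr\|
  \;\xrightarrow{\;P\;}\; 0.
\]
% Neither regression needs to be root-n consistent on its own.
```

This trade-off is what makes slower nonparametric learners, such as the tree-based models used in the simulations, admissible in principle.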
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tagged: unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
We characterise conditions under which GKCM satisfies uniform asymptotic level guarantees... regression estimators must satisfy the approximation or consistency conditions
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Céline Brouard, Marie Szafranski, and Florence d'Alché-Buc. Input Output Kernel Regression: Supervised and Semi-Supervised Structured Output Prediction with Operator-Valued Kernels. Journal of Machine Learning Research, 17(176):1–48, 2016.
- [2] Donald L. Cohn. Measure Theory. Birkhäuser, New York, 2nd edition, 2013. ISBN 978-1-4614-6955-1. URL https://link.springer.com/10.1007/978-1-4614-6956-8.
- [3] Panayiota Constantinou and A. Philip Dawid. Extended conditional independence and applications in causal inference. The Annals of Statistics, 45(6):2618–2653, December 2017. URL https://doi.org/10.1214/16-AOS1537.
- [4] Giuseppe Da Prato and Jerzy Zabczyk. Stochastic Equations in Infinite Dimensions. Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 2nd edition, 2014. ISBN 978-1-107-05584-1. URL https://doi.org/10.1017/CBO9781107295513.
- [5] Jean-Jacques Daudin. Partial association measures and an application to qualitative regression. Biometrika, 67(3):581–590, 1980. URL https://doi.org/10.1093/biomet/67.3.
- [6] Damien Garreau, Wittawat Jitkrittum, and Motonobu Kanagawa. Large sample analysis of the median heuristic. arXiv:1707.07269 [math]. URL http://arxiv.org/abs/1707.07269.
- [7] Pierre Geurts, Louis Wehenkel, and Florence d'Alché-Buc. Kernelizing the Output of Tree-Based Methods. In Proceedings of the 23rd International Conference on Machine Learning, ICML'06, pages 345–352, Pittsburgh, Pennsylvania, 2006. Association for Computing Machinery. URL https://doi.org/10.1145/1143844.1143888.
- [8] Pierre Geurts, Louis Wehenkel, and Florence d'Alché-Buc. Gradient Boosting for Kernelized Output Spaces. In Proceedings of the 24th International Conference on Machine Learning, ICML'07, pages 289–296, June 2007. Association for Computing Machinery. URL https://doi.org/10.1145/1273496.1273533.
- [9] Clark Glymour, Kun Zhang, and Peter Spirtes. Review of Causal Discovery Methods Based on Graphical Models. Frontiers in Genetics, 10, June 2019. URL https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2019.00524/full.
- [10] Steffen Grünewälder, Guy Lever, Luca Baldassarre, Sam Patterson, Arthur Gretton, and Massimiliano Pontil. Conditional mean embeddings as regressors. In Proceedings of the 29th International Conference on Machine Learning, 2012.
- [11] Tailen Hsing and Randall L. Eubank. Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators. Wiley, 2015. ISBN 978-1-118-76257-8. URL https://onlinelibrary.wiley.com/doi/book/10.1002/9781118762547.
- [12] Olav Kallenberg. Foundations of Modern Probability, volume 99 of Probability Theory and Stochastic Modelling. Springer International Publishing, Cham, 2021. ISBN 978-3-030-61871-1. URL https://link.springer.com/10.1007/978-3-030-61871-1.
- [13] Lucas Kook. comets: Covariance Measure Tests for Conditional Independence. R package.
- [14] Anton Rask Lundborg, Rajen D. Shah, and Jonas Peters. Conditional independence testing in Hilbert spaces with applications to functional data analysis. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2022. URL https://doi.org/10.1111/rssb.12544.
- [15] Anton Rask Lundborg, Ilmun Kim, Rajen D. Shah, and Richard J. Samworth. The projected covariance measure for assumption-lean variable significance testing. The Annals of Statistics, 52(6):2851–2878, December 2024. URL https://doi.org/10.1214/24-AOS2447.
- [16] Afsaneh Mastouri, Yuchen Zhu, Limor Gultchin, Anna Korba, Ricardo Silva, Matt Kusner, Arthur Gretton, and Krikamol Muandet. Proximal Causal Learning with Kernels: Two-Stage Estimation and Moment Restriction. In Proceedings of the 38th International Conference on Machine Learning, 2021.
- [17] Vern I. Paulsen and Mrinal Raghupathi. An Introduction to the Theory of Reproducing Kernel Hilbert Spaces. Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2016.
- [18] Roman Pogodin, Antonin Schrab, Yazhe Li, Danica J. Sutherland, and Arthur Gretton. Practical Kernel Tests of Conditional Independence. arXiv:2402.13196 [cs]. URL http://arxiv.org/abs/2402.13196.
- [19] Ali Rahimi and Benjamin Recht. Random Features for Large-Scale Kernel Machines. In Advances in Neural Information Processing Systems, volume 20, 2007. URL https://papers.nips.cc/paper_files/paper/2007/hash/013a006f03dbc5392effeb8f18fda755-Abstract.html.
- [20] Cyrill Scheidegger, Julia Hörrmann, and Peter Bühlmann. The Weighted Generalised Covariance Measure. Journal of Machine Learning Research, 23(273):1–68, 2022. URL https://dl.acm.org/doi/10.5555/3692070.3693903.
- [21] Bharath K. Sriperumbudur, Kenji Fukumizu, and Gert R. G. Lanckriet. Universality, Characteristic Kernels and RKHS Embedding of Measures. Journal of Machine Learning Research, 12(70):2389–2410, 2011.
- [22] Ingo Steinwart and Andreas Christmann. Support Vector Machines. Information Science and Statistics. Springer, New York, 2008. ISBN 978-0-387-77241-7. URL https://link.springer.com/10.1007/978-0-387-77242-4.
- [23] Eric V. Strobl, Kun Zhang, and Shyam Visweswaran. Approximate Kernel-Based Conditional Independence Tests for Fast Non-Parametric Causal Discovery. Journal of Causal Inference, 7(1), March 2019. URL https://doi.org/10.1515/jci-2018-0017.
- [24] Zoltán Szabó and Bharath K. Sriperumbudur. Characteristic and Universal Tensor Product Kernels. Journal of Machine Learning Research, 18(233):1–29, 2018.
- [25] Kun Zhang and Aapo Hyvärinen. On the identifiability of the post-nonlinear causal model. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI'09, pages 647–655, Arlington, Virginia, USA, June 2009. AUAI Press. URL https://dl.acm.org/doi/10.5555/1795114.1795190.
- [26] Kun Zhang, Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. Kernel-based Conditional Independence Test and Application in Causal Discovery. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, UAI'11, pages 804–813, Barcelona, Spain, 2011. AUAI Press.