pith. machine review for the scientific record.

arxiv: 2604.18653 · v1 · submitted 2026-04-20 · 📊 stat.ME · physics.soc-ph

How to quantify direct correlations between variables

Jeffery Wu, Shengjun Wu

Pith reviewed 2026-05-10 04:39 UTC · model grok-4.3

classification 📊 stat.ME physics.soc-ph
keywords direct correlation · Jensen-Shannon divergence · Kullback-Leibler divergence · distributional shift · do-calculus · causal inference · bounded measures

The pith

Jensen-Shannon regularization yields bounded [0,1] measures of direct correlation for distributional-shift and do-calculus families.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper organizes existing direct correlation measures into two families: distributional-shift measures that remove the direct link and quantify the change, and do-calculus measures that intervene on one variable to observe the response in the other. For each Kullback-Leibler version in these families, it proposes a Jensen-Shannon regularized analogue that is bounded in [0,1] because the square root of Jensen-Shannon divergence forms a metric. This avoids the singularity problem where KL divergence becomes infinite for non-overlapping supports. The authors derive the maximum attainable value for each such measure given only the observed marginal p(x,z), showing it depends on alphabet size and is typically below 1, which provides the proper reference scale for interpretation. These properties are illustrated using a toy decision-making model and three real-world datasets with bootstrap confidence intervals.

Core claim

By replacing the Kullback-Leibler divergence with its Jensen-Shannon counterpart in both the distributional-shift family and the do-calculus family, the resulting regularized measures of direct correlation take values in the interval [0,1], are free from singularities, and have explicit upper bounds determined by the alphabet sizes under the observed marginal distribution p(x,z).
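To make the boundedness claim concrete, here is a minimal Python sketch (standard library only) that compares the Kullback-Leibler divergence with the square root of the Jensen-Shannon divergence. The function names `kl_divergence`, `js_divergence`, and `js_distance` are illustrative and not taken from the paper; logarithms are base 2, which is the convention under which the Jensen-Shannon divergence lies in [0,1].

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in bits; returns math.inf when q is zero where p is positive."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # the term 0 * log(0 / q) is taken to be 0
        if qi == 0.0:
            return math.inf  # the KL singularity on non-overlapping support
        total += pi * math.log2(pi / qi)
    return total

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, computed against the mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence (a metric)."""
    return math.sqrt(max(js_divergence(p, q), 0.0))

# Two distributions with disjoint supports (illustrative values).
p_disjoint = [1.0, 0.0]
q_disjoint = [0.0, 1.0]
kl_value = kl_divergence(p_disjoint, q_disjoint)  # infinite
js_value = js_distance(p_disjoint, q_disjoint)    # finite, at the top of [0, 1]
```

On disjoint supports the KL divergence diverges while the Jensen-Shannon distance stays finite; the mixture `m` is positive wherever either input is, which is why the singularity cannot occur.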

What carries the argument

Jensen-Shannon regularization applied to Kullback-Leibler-based direct correlation measures from the distributional-shift and do-calculus families.

If this is right

  • The scale of direct correlation is set by an upper bound that depends on the variable alphabet sizes and is in general strictly below 1.
  • Numerical values from the measures can be compared across different variable pairs once each value is read against its own alphabet-dependent upper bound.
  • The approach applies to empirical data with uncertainty estimates via bootstrapping, as demonstrated on the Titanic survival, UCI Adult income, and UC Berkeley admissions datasets.
  • Direct and indirect correlations can be distinguished more reliably in the presence of confounders.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Such measures might help in building more interpretable causal graphs by attaching quantitative direct strengths to edges.
  • The upper bound analysis highlights the need for alphabet-aware normalization when alphabets are small.
  • Future work could test these measures in continuous variable settings by suitable discretization or kernel approximations.
  • Different applications may favor one family over the other depending on whether associational or interventional semantics are preferred.

Load-bearing premise

The Jensen-Shannon regularized analogues retain the intended interpretation of direct correlation as defined by the original Kullback-Leibler constructions in the two families.

What would settle it

A joint distribution over discrete variables where the regularized measure exceeds its derived upper bound under the fixed marginal p(x,z), or where it reports positive direct correlation despite known independence after conditioning on confounders.
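The second half of this test can be sketched numerically. The snippet below assumes, as a stand-in for the paper's definitions (which are not reproduced on this page), that a Jensen-Shannon analogue of conditional mutual information is the square root of the Jensen-Shannon divergence between p(x,y,z) and the reference p(x|z)p(y|z)p(z). The helper names and the toy probabilities 0.9 and 0.8 are hypothetical.

```python
import math

def js_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence between two dicts
    that map outcomes to probabilities."""
    total = 0.0
    for key in set(p) | set(q):
        pi, qi = p.get(key, 0.0), q.get(key, 0.0)
        mi = (pi + qi) / 2
        if pi > 0:
            total += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            total += 0.5 * qi * math.log2(qi / mi)
    return math.sqrt(max(total, 0.0))  # clamp tiny negative rounding errors

def ci_reference(joint):
    """Build q(x, y, z) = p(x|z) p(y|z) p(z) from a joint p(x, y, z)."""
    pz, pxz, pyz = {}, {}, {}
    for (x, y, z), prob in joint.items():
        if prob == 0.0:
            continue
        pz[z] = pz.get(z, 0.0) + prob
        pxz[(x, z)] = pxz.get((x, z), 0.0) + prob
        pyz[(y, z)] = pyz.get((y, z), 0.0) + prob
    ref = {}
    for (x, z), p_xz in pxz.items():
        for (y, z2), p_yz in pyz.items():
            if z2 == z:
                ref[(x, y, z)] = p_xz * p_yz / pz[z]
    return ref

def toy_joint(direct):
    """Hypothetical binary model: Z influences X; Y follows X when `direct`
    is True and follows Z otherwise."""
    joint = {}
    for x in (0, 1):
        for y in (0, 1):
            for z in (0, 1):
                p_x = 0.9 if x == z else 0.1
                parent = x if direct else z
                p_y = 0.8 if y == parent else 0.2
                joint[(x, y, z)] = 0.5 * p_x * p_y
    return joint

no_direct = toy_joint(direct=False)
with_direct = toy_joint(direct=True)
score_no_direct = js_distance(no_direct, ci_reference(no_direct))
score_with_direct = js_distance(with_direct, ci_reference(with_direct))
```

When X and Y are linked only through Z, the score is zero up to rounding; when Y follows X, it is strictly positive. A measure that reported a clearly positive value in the first case would be the kind of counter-example described above.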

Figures

Figures reproduced from arXiv: 2604.18653 by Jeffery Wu, Shengjun Wu.

Figure 1: The CMI measures the change of the joint distribu…
Figure 2: The PMI measures the change of the joint distri…
Figure 3: The one-way ICMI $C^{D}_{\mathrm{icmi}}(X \to Y)$ measures the distance between a joint distribution $p_1$ with the correlation between Z and X removed and another joint distribution $p_2$ with the correlation between X and Y further removed as well. Similarly, the one-way ICMI from Y to X is defined as $C^{D}_{\mathrm{icmi}}(X \leftarrow Y) = D_{\mathrm{KL}}\big(p(x|y,z)\,p(y)\,p(z) \,\|\, p(y)\,p(x,z)\big)$ (22). The average of the two one-way correlations is a measure of the direc…
Figure 4: The reconstructed conditional do probability
Figure 5: Two Markov-equivalent [18] influence diagrams
Figure 6: A decision-making model. Here $q_1$ denotes the strength of Z's influence on X, similarly for $q_2$ and $q_3$. … with probability $p(z)$ given by $p(z) = \begin{cases} \frac{1+q_0}{2} & (z = 0) \\ \frac{1-q_0}{2} & (z = 1) \end{cases}$ (47). Xie usually seeks advice from Zhang, and let $q_1$ denote Zhang's influence strength on Xie's choice. $q_1 = 1$ means a complete influence and Xie's choice is the same as Zhang's ($x = z$); $q_1 = 0$ means Xie's choice is independe…
Figure 7: Various measures of direct correlation in the
Figure 8: Three models of different relationships indistin
Figure 9: A simplified decision-making model. $\lambda_0$ is the influence strength from Z to X and $\lambda_1$ parametrises the influence from X to Y. … (vote). Zhang makes an unbiased choice $Z = z$ with an equal probability $p(z) = 1/2$. Xie usually seeks advice from Zhang, and let $\lambda_0$ denote Zhang's influence strength on Xie's choice. $\lambda_0 = 1$ means a complete influence and Xie's choice is always the same as Zhang's ($x = z$); $\lambda_0 = 0$ me…
Figure 10: Direct-correlation measures in the simplified
Figure 11: Regularized direct-correlation measures on Tit
Original abstract

Analyzing correlation between variables is often both the tool and the goal of modern science. A crucial question is whether the correlation between two variables is a direct correlation or only an indirect correlation through a confounder. We review the existing measures of direct correlation and organize them into two families, each corresponding to a systematic construction: (i) removing the direct correlation from the original joint distribution and quantifying the resulting distributional shift, and (ii) intervening on one variable via do-calculus and quantifying how the distribution of the other variable responds. For every Kullback-Leibler-based measure in either family, we propose a Jensen-Shannon-based regularized analogue. Since the square root of the Jensen-Shannon divergence is a bounded metric, the regularized measures take values in $[0,1]$ and are free of the singularity of the Kullback-Leibler divergence. We further analyze the achievable upper bound of each regularized measure under the observed marginal $p(x,z)$, which depends on the alphabet size and is in general strictly below $1$; this sets the correct scale against which observed values should be read. The properties and the differences of the proposed measures are illustrated on a decision-making toy model and on three public real datasets: Titanic survival, UCI Adult (Census Income), and the UC Berkeley 1973 graduate admissions. Bootstrap 95% confidence intervals are reported for every numerical value.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper reviews measures of direct correlation between variables and organizes them into two families: (i) distributional-shift constructions that remove direct correlation from the joint and quantify the shift via divergences, and (ii) do-calculus constructions that intervene on one variable and quantify the response of the other. For every KL-based measure in either family the authors propose a Jensen-Shannon regularized analogue; because sqrt(JS) is a metric the new quantities lie in [0,1] and avoid KL singularities. They derive the achievable upper bound of each regularized measure under the observed marginal p(x,z), showing that the bound depends on alphabet size and is typically strictly less than 1. The measures are illustrated on a decision-making toy model and on the Titanic, UCI Adult, and UC Berkeley admissions datasets, with bootstrap 95% confidence intervals reported for all numerical values.

Significance. If the upper-bound derivations and semantic preservation hold for both families, the work supplies a practical, bounded scale for quantifying direct versus indirect correlation that is directly usable in applied causal and statistical analyses. The combination of theoretical regularization, explicit upper-bound analysis, and empirical demonstration with confidence intervals strengthens the contribution for methodology journals.

major comments (2)
  1. [Section 4] Section 4 (upper-bound analysis): the claim that the achievable upper bound of each regularized measure is characterized under the observed marginal p(x,z) does not uniformly apply to the do-calculus family. Interventional distributions p(·|do(·)) are fixed by the causal mechanism, not solely by the observational marginal; maximizing only over joints consistent with p(x,z) may therefore produce bounds that are either unattainable or not the least upper bound under the actual interventional semantics. This directly affects the recommended scaling interpretation for the do-calculus regularized measures.
  2. [Section 3] Section 3 (regularization construction): while the JS analogues are shown to be bounded and singularity-free, the manuscript does not verify that they preserve the original KL measures' invariance properties or direct-correlation semantics under the two distinct constructions. A short proof or counter-example check that the ordering or zero/non-zero behavior is retained would be needed to confirm the analogues remain faithful to the intended interpretation.
minor comments (2)
  1. [Methods / Experiments] The bootstrap procedure (number of replicates, handling of discrete alphabet size, and whether resampling respects the marginal constraint) is mentioned but not detailed; adding a short algorithmic description or pseudocode would improve reproducibility.
  2. [Notation] Notation for the two families is introduced clearly in the abstract but the transition between distributional-shift and do-calculus notation in the main text occasionally re-uses symbols (e.g., p(x,z) for both observational and interventional contexts); a brief reminder table would reduce reader confusion.
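The first minor comment asks for an algorithmic description of the bootstrap. A minimal sketch of one common choice, the nonparametric percentile bootstrap, is given here. The replicate count, the percentile method, the seed, and the function name `bootstrap_ci` are all assumptions made for illustration; the page does not state which variant the paper uses, and the demo records are synthetic.

```python
import random

def bootstrap_ci(records, statistic, n_replicates=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `statistic` over a list of records.

    Each replicate resamples len(records) rows with replacement and
    recomputes the statistic on the resample.
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(records)
    estimates = []
    for _ in range(n_replicates):
        resample = [records[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    estimates.sort()
    lower_index = int((alpha / 2) * n_replicates)
    upper_index = min(int((1 - alpha / 2) * n_replicates), n_replicates - 1)
    return estimates[lower_index], estimates[upper_index]

def admit_rate(rows):
    """Fraction of rows whose second field equals 1 (stand-in statistic)."""
    return sum(y for _, y in rows) / len(rows)

# Synthetic (x, y) records, not data from the paper.
records = [(0, 1)] * 120 + [(1, 0)] * 80
point_estimate = admit_rate(records)
lower, upper = bootstrap_ci(records, admit_rate, n_replicates=500, seed=1)
```

Any of the direct-correlation measures could be passed as `statistic`, provided it accepts a list of records and rebuilds the empirical joint distribution from it. Note that resampling rows changes the empirical marginal p(x,z) in each replicate, which is the point the referee raises about the marginal constraint.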

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and constructive comments. We address each major comment below and will revise the manuscript to incorporate the necessary clarifications and additions.

Point-by-point responses
  1. Referee: [Section 4] Section 4 (upper-bound analysis): the claim that the achievable upper bound of each regularized measure is characterized under the observed marginal p(x,z) does not uniformly apply to the do-calculus family. Interventional distributions p(·|do(·)) are fixed by the causal mechanism, not solely by the observational marginal; maximizing only over joints consistent with p(x,z) may therefore produce bounds that are either unattainable or not the least upper bound under the actual interventional semantics. This directly affects the recommended scaling interpretation for the do-calculus regularized measures.

    Authors: We agree with the referee that the upper-bound derivation does not apply uniformly. The bounds were obtained by maximizing over joints consistent with the observed marginal p(x,z), but interventional distributions in the do-calculus family are fixed by the causal mechanisms, which are not identifiable from p(x,z) alone. This means the reported bounds may not be attainable or tight under the interventional semantics. We will revise Section 4 to distinguish the two families explicitly, state the limitation for the do-calculus measures, and qualify the scaling interpretation as the best achievable under the observational marginal without further assumptions on the structural causal model. revision: yes

  2. Referee: [Section 3] Section 3 (regularization construction): while the JS analogues are shown to be bounded and singularity-free, the manuscript does not verify that they preserve the original KL measures' invariance properties or direct-correlation semantics under the two distinct constructions. A short proof or counter-example check that the ordering or zero/non-zero behavior is retained would be needed to confirm the analogues remain faithful to the intended interpretation.

    Authors: We acknowledge that explicit verification of semantic preservation was omitted. Because the Jensen-Shannon divergence satisfies D_JS(P||Q)=0 if and only if P=Q, the regularized measures are zero precisely when the original KL-based measures are zero, thereby retaining the zero/non-zero behavior that identifies absence of direct correlation in each construction. Invariance to relabeling of categories is inherited directly from the divergence and the shared functional form of the constructions. We will add a short appendix containing these arguments together with numerical checks on the decision-making toy model confirming that the relative ordering of values is preserved across the examples. This will establish that the analogues remain faithful to the intended interpretations. revision: yes
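For reference, the standard facts the zero/non-zero argument relies on can be written out as follows. The notation is the usual one and is not copied from the paper. The argument carries over to a regularized measure only if it compares the same pair of distributions as its Kullback-Leibler counterpart.

```latex
\begin{align}
M &= \tfrac{1}{2}\,(P + Q), \\
D_{\mathrm{JS}}(P \,\|\, Q) &= \tfrac{1}{2}\, D_{\mathrm{KL}}(P \,\|\, M)
                              + \tfrac{1}{2}\, D_{\mathrm{KL}}(Q \,\|\, M), \\
0 \le D_{\mathrm{JS}}(P \,\|\, Q) &\le \log 2
  \quad (\text{so } D_{\mathrm{JS}} \in [0,1] \text{ with base-2 logarithms}), \\
D_{\mathrm{JS}}(P \,\|\, Q) = 0 &\iff P = Q \iff D_{\mathrm{KL}}(P \,\|\, Q) = 0 .
\end{align}
```

These identities establish only that the two kinds of measure vanish together. They say nothing about whether the ordering of non-zero values is preserved, which is why the authors propose numerical checks for that part.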

Circularity Check

0 steps flagged

No circularity: derivations rest on standard divergences and do-calculus without self-referential reductions

Full rationale

The paper organizes existing KL-based direct-correlation measures into distributional-shift and do-calculus families, then explicitly constructs JS-regularized analogues using the standard square-root JS metric. Upper bounds are derived mathematically under the fixed observational marginal p(x,z) as a separate analysis step. No equation reduces a proposed measure to a fitted parameter, a self-definition, or a prior result by the same authors that itself lacks independent verification. Self-citations, if present, are not load-bearing for the central constructions or bounds. The chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The constructions rely on standard axioms of probability and the do-calculus framework; no new entities are postulated and no free parameters are introduced in the abstract.

axioms (2)
  • domain assumption The variables under study are discrete with finite alphabets
    Upper-bound analysis is stated to depend on alphabet size.
  • domain assumption Do-calculus interventions correctly isolate direct effects in the observed joint distribution
    Second family is defined via intervening on one variable via do-calculus.
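To illustrate what the second assumption buys, here is a sketch of the back-door adjustment formula p(y | do(x)) = Σ_z p(y | x, z) p(z). It assumes that Z is observed, that it blocks every back-door path between X and Y, and that p(x,z) > 0 for all z. Whether this matches the adjustment strategy used in the paper is not stated on this page. The function names and the toy joint are hypothetical.

```python
def backdoor_adjust(joint, x_value):
    """Return p(y | do(X = x_value)) as a dict, via sum_z p(y | x, z) p(z).

    `joint` maps (x, y, z) tuples to probabilities.
    """
    pz, pxz = {}, {}
    for (x, y, z), prob in joint.items():
        pz[z] = pz.get(z, 0.0) + prob
        pxz[(x, z)] = pxz.get((x, z), 0.0) + prob
    result = {}
    for (x, y, z), prob in joint.items():
        if x != x_value or pxz[(x, z)] == 0.0:
            continue  # skip other x values and strata that violate positivity
        result[y] = result.get(y, 0.0) + prob / pxz[(x, z)] * pz[z]
    return result

def observational(joint, x_value):
    """Return the ordinary conditional p(y | X = x_value) for comparison."""
    px = sum(prob for (x, _, _), prob in joint.items() if x == x_value)
    result = {}
    for (x, y, _), prob in joint.items():
        if x == x_value:
            result[y] = result.get(y, 0.0) + prob / px
    return result

# Toy joint with pure confounding: Z drives both X and Y, no X -> Y link.
joint = {}
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            p_x = 0.9 if x == z else 0.1
            p_y = 0.8 if y == z else 0.2
            joint[(x, y, z)] = 0.5 * p_x * p_y

p_do = backdoor_adjust(joint, x_value=1)
p_obs = observational(joint, x_value=1)
```

In this toy model the observational conditional p(y=1 | x=1) is well above one half, while the adjusted p(y=1 | do(x=1)) equals the marginal p(y=1), reflecting that X has no direct effect on Y. If Z failed the back-door criterion, the adjusted quantity would not have this interventional meaning.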

pith-pipeline@v0.9.0 · 5545 in / 1542 out tokens · 44459 ms · 2026-05-10T04:39:51.583877+00:00 · methodology

Reference graph

Works this paper leans on

42 extracted references · 17 canonical work pages · 1 internal anchor

  1. [1]

    How to quantify direct correlations between variables

    5% of the 2691 male applicants but only 30.4% of the 1835 female applicants were admitted, an apparent 14-point gender gap. Broken down by the six largest departments, however, the admission rates are close for the two sexes and in four of the six departments in fact slightly favour women; the overall gap arises because women applied disproportionat...

  2. [2]

    how correlated is correlated

    58 and p(Y=1|do(Pclass=3)) ≈ 0.26 (computed under strategy (b) using the observed p(sex)). B. Dataset II: UCI Adult (Census Income). The UCI Adult dataset [39] contains n = 32561 records extracted from the 1994 US Census, with income dichotomised at $50 000/year. We take X as the education level binned into four ordinal groups (0: at most 12th grade, 1:...

  3. [3]

    P. J. Bickel, E. A. Hammel, J. W. O’Connell, Sex bias in graduate admissions: data from Berkeley, Science 187 (4175) (1975) 398–404. doi:10.1126/science.187.4175.398. URL https://vincentarelbundock.github.io/Rdatasets/csv/datasets/UCBAdmissions.csv

  4. [4]

    Kaggle, Inc., Titanic - Machine Learning from Disaster, online competition, 2012. URL https://www.kaggle.com/c/titanic. Training-split CSV (used in this paper) mirrored at https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv. Passenger-level details compiled originally from Encyclopedia Titanica and Eaton & Haas, Titanic: Triu...

  5. [5]

    E. H. Simpson, The interpretation of interaction in contingency tables, Journal of the Royal Statistical Society, Series B 13 (2) (1951) 238–241. doi:10.1111/j.2517-6161.1951.tb00088.x

  6. [6]

    P. Sedgwick, Spearman’s rank correlation coefficient, BMJ 349 (2014) g7327. doi:10.1136/bmj.g7327

  7. [7]

    C. E. Shannon, A mathematical theory of communication, The Bell System Technical Journal 27 (3) (1948) 379–423

  8. [8]

    C. E. Shannon, A mathematical theory of communication, The Bell System Technical Journal 27 (4) (1948) 623–656

  9. [9]

    S. Kullback, R. A. Leibler, On information and sufficiency, Annals of Mathematical Statistics 22 (1) (1951) 79–86. doi:10.1214/aoms/1177729694

  10. [10]

    G. J. Székely, M. L. Rizzo, N. K. Bakirov, Measuring and testing dependence by correlation of distances, The Annals of Statistics 35 (6) (2007) 2769–2794

  11. [11]

    M. R. Kosorok, On Brownian distance covariance and high dimensional data, Annals of Applied Statistics 3 (4) (2009) 1266–1269. doi:10.1214/09-AOAS312B

  12. [12]

    D. N. Reshef, Y. A. Reshef, H. K. Finucane, S. R. Grossman, G. McVean, P. J. Turnbaugh, E. S. Lander, M. Mitzenmacher, P. C. Sabeti, Detecting novel associations in large data sets, Science 334 (6062) (2011) 1518–1524

  13. [13]

    J. B. Kinney, G. S. Atwal, Equitability, mutual information, and the maximal information coefficient, Proceedings of the National Academy of Sciences 111 (9) (2014) 3354–3359

  14. [14]

    A. Gretton, O. Bousquet, A. Smola, B. Schölkopf, Measuring statistical dependence with Hilbert-Schmidt norms, in: Algorithmic Learning Theory, Vol. 3734 of Lecture Notes in Computer Science, Springer, 2005, pp. 63–77. doi:10.1007/11564089_7

  15. [15]

    N. Altman, M. Krzywinski, Points of significance: association, correlation and causation, Nature Methods 12 (10) (2015) 899–900. doi:10.1038/nmeth.3587

  16. [16]

    H. Reichenbach, The Direction of Time, University of California Press, Berkeley, 1956

  17. [17]

    D. M. Hausman, J. Woodward, Independence, invariance and the causal markov condition, The British journal for the philosophy of science 50 (4) (1999) 521–583

  18. [18]

    D. Geiger, J. Pearl, On the logic of causal models, in: R. D. Shachter, T. S. Levitt, L. N. Kanal, J. F. Lemmer (Eds.), Uncertainty in Artificial Intelligence, Vol. 9 of Machine Intelligence and Pattern Recognition, North-Holland, 1990, pp. 3–14. doi:10.1016/B978-0-444-88650-7.50006-8

  19. [19]

    S. L. Lauritzen, A. P. Dawid, B. N. Larsen, H.-G. Leimer, Independence properties of directed Markov fields, Networks 20 (5) (1990) 491–505

  20. [20]

    T. Verma, J. Pearl, Equivalence and synthesis of causal models, in: Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence (UAI’90), Cambridge, MA, 1990, pp. 220–227

  21. [21]

    D. Koller, N. Friedman, Probabilistic graphical models: Principles and techniques, MIT Press, 2009

  22. [22]

    A. De La Fuente, N. Bing, I. Hoeschele, P. Mendes, Discovery of meaningful associations in genomic data using partial correlation coefficients, Bioinformatics 20 (18) (2004) 3565–3574

  23. [23]

    J. M. Stuart, E. Segal, D. Koller, S. K. Kim, A gene-coexpression network for global discovery of conserved genetic modules, Science 302 (5643) (2003) 249–255

  24. [24]

    X. Zhang, X.-M. Zhao, K. He, L. Lu, Y. Cao, J. Liu, J.-K. Hao, Z.-P. Liu, L. Chen, Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information, Bioinformatics 28 (1) (2012) 98–104

  25. [25]

    K.-C. Liang, X. Wang, Gene regulatory network reconstruction using conditional mutual information, EURASIP Journal on Bioinformatics and Systems Biology 2008 (2008) 253894. doi:10.1155/2008/253894

  26. [26]

    X. Zhang, J. Zhao, J.-K. Hao, X.-M. Zhao, L. Chen, Conditional mutual inclusive information enables accurate quantification of associations in gene regulatory networks, Nucleic Acids Research 43 (5) (2015) e31

  27. [27]

    J. Zhao, Y. Zhou, X. Zhang, L. Chen, Part mutual information for quantifying direct associations in networks, Proceedings of the National Academy of Sciences 113 (18) (2016) 5130–5135

  28. [28]

    J. Shi, J. Zhao, X. Liu, L. Chen, T. Li, Quantifying direct dependencies in biological networks by multiscale association analysis, IEEE/ACM Transactions on Computational Biology and Bioinformatics 17 (2) (2020) 449–458. doi:10.1109/TCBB.2018.2846536

  29. [29]

    M. Zhao, Y. Chen, Q. Liu, S. Wu, Quantifying direct associations between variables, Fundamental Research 5 (4) (2025) 1538–1546. doi:10.1016/j.fmre.2023.06.012

  30. [30]

    J. Pearl, Probabilistic reasoning in intelligent systems: networks of plausible inference, Morgan Kaufmann, 1988

  31. [31]

    J. Pearl, Causality, Cambridge University Press, 2009

  32. [32]

    J. Pearl, Causal inference in statistics: An overview, Statistics Surveys 3 (2009) 96–146

  33. [33]

    J. Pearl, Direct and indirect effects, in: H. Geffner, R. Dechter, J. Y. Halpern (Eds.), Probabilistic and Causal Inference: The Works of Judea Pearl, ACM Books, 2022, pp. 373–392. doi:10.1145/3501714.3501736

  34. [34]

    P. W. Holland, Statistics and causal inference, Journal of the American Statistical Association 81 (396) (1986) 945–960. doi:10.1080/01621459.1986.10478354

  35. [35]

    D. Janzing, D. Balduzzi, M. Grosse-Wentrup, B. Schölkopf, Quantifying causal influences, The Annals of Statistics 41 (5) (2013) 2324–2358

  36. [36]

    D. A. Freedman, Statistical models: theory and practice, Cambridge University Press, 2009

  37. [37]

    R. McNamee, Confounding and confounders, Occupational and Environmental Medicine 60 (3) (2003) 227–234

  38. [38]

    J. Peters, D. Janzing, B. Schölkopf, Elements of Causal Inference: Foundations and Learning Algorithms, MIT Press, Cambridge, MA, 2017

  39. [39]

    J. Lin, Divergence measures based on the Shannon entropy, IEEE Transactions on Information Theory 37 (1) (1991) 145–151. doi:10.1109/18.61115

  40. [40]

    D. M. Endres, J. E. Schindelin, A new metric for probability distributions, IEEE Transactions on Information Theory 49 (7) (2003) 1858–1860. doi:10.1109/TIT.2003.813506

  41. [41]

    R. Kohavi, B. Becker, Adult Data Set (Census Income), UCI Machine Learning Repository (1996). doi:10.24432/C5XW20. URL https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data

  42. [42]

    B. Efron, Bootstrap methods: another look at the jackknife, Annals of Statistics 7 (1) (1979) 1–26. doi:10.1214/aos/1176344552