Causal Density Functions

Sridhar Mahadevan

arXiv: 2606.00754 · v1 · pith:FYKZAC3Vnew · submitted 2026-05-30 · stat.ME · cs.AI · cs.LG

Pith reviewed 2026-06-28 18:24 UTC · model grok-4.3

classification stat.ME · cs.AI · cs.LG

keywords causal density functions · Radon-Nikodym derivatives · interventional laws · observational laws · causal effects · density ratios · directed influence · do-calculus

The pith

Causal density functions are Radon-Nikodym derivatives that let observational expectations be reweighted to recover interventional ones through the identity E_do[f(Y)] equals E_obs[f(Y) times rho(X,Y)].

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines causal density functions as the Radon-Nikodym derivatives that compare interventional probability laws to observational ones. These derivatives act as pointwise density ratios that quantify how interventions change local probabilities. The central identity shows that any observational expectation of a function can be turned into the corresponding interventional expectation simply by multiplying by the density ratio, which makes the functions directly testable and estimable from data. Practical estimators follow for do-curves and scores on directed edges, and the construction is related to existing change-of-measure semantics for conditioning and intervention.

Core claim

Causal density functions are Radon-Nikodym derivatives that compare interventional laws to observational laws and therefore act as local density ratios for causal effects. The basic identity E_do[f(Y)] = E_obs[f(Y) rho(X,Y)] makes causal density directly testable: if the estimated density ratio is correct, observational expectations reweighted by rho reproduce interventional expectations. The paper derives practical estimators for do-curves and directed edge scores and relates the construction to Radon-Nikodym and Kan semantics for conditioning and intervention.
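The identity can be checked numerically in a case where the density ratio is known in closed form. The sketch below is an illustration under stated assumptions, not the paper's estimator: the toy model (X ~ N(0, 1), Y = 2X + noise), the soft intervention that shifts X to N(1, 1) while leaving the mechanism for Y unchanged, and the helper names `reweighted_expectation`, `simulate_observational`, and `rho_shift` are all hypothetical. In this setting the ratio depends on X only and equals exp(x - 1/2).

```python
import numpy as np

def reweighted_expectation(f, x_obs, y_obs, rho):
    """Monte Carlo estimate of E_obs[f(Y) * rho(X, Y)]."""
    return float(np.mean(f(y_obs) * rho(x_obs, y_obs)))

def simulate_observational(n, rng):
    # Assumed toy model: X ~ N(0, 1), Y = 2 X + N(0, 1).
    x = rng.normal(0.0, 1.0, size=n)
    y = 2.0 * x + rng.normal(0.0, 1.0, size=n)
    return x, y

def rho_shift(x, y):
    # Soft intervention X ~ N(1, 1), mechanism Y | X unchanged:
    # N(x; 1, 1) / N(x; 0, 1) = exp(x - 1/2), which does not depend on y.
    return np.exp(x - 0.5)

rng = np.random.default_rng(0)
x_obs, y_obs = simulate_observational(200_000, rng)
estimate = reweighted_expectation(lambda y: y, x_obs, y_obs, rho_shift)
print(estimate)  # close to E_do[Y] = 2, up to Monte Carlo error
```

Only observational samples enter the estimate; the interventional mean of 2 is recovered through the weights alone.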

What carries the argument

The causal density function rho, the Radon-Nikodym derivative between interventional and observational laws, which supplies the local density ratio used to reweight observational data into interventional expectations.

If this is right

If the estimated density ratio is correct, observational expectations reweighted by rho reproduce interventional expectations.
Estimators can be derived for do-curves and for scores on individual directed edges.
The construction supplies a change-of-measure semantics that connects intervention to conditioning via Radon-Nikodym derivatives.
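A do-curve can be sketched by repeating the reweighting over a grid of intervention targets. The example below assumes the same kind of toy setting (X ~ N(0, 1) observationally, Y = 2X + noise) and a family of soft interventions X ~ N(a, 1), for which the ratio is exp(a x - a^2 / 2). The function name `do_curve` and the grid `targets` are hypothetical, and the closed-form ratio stands in for whatever estimator the paper actually fits.

```python
import numpy as np

def do_curve(x_obs, y_obs, targets):
    """Estimate a -> E_do(a)[Y] by reweighting one observational sample."""
    curve = []
    for a in targets:
        # Density ratio N(x; a, 1) / N(x; 0, 1) = exp(a x - a^2 / 2).
        weights = np.exp(a * x_obs - 0.5 * a**2)
        curve.append(float(np.mean(y_obs * weights)))
    return np.array(curve)

rng = np.random.default_rng(1)
x_obs = rng.normal(size=200_000)
y_obs = 2.0 * x_obs + rng.normal(size=200_000)

targets = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
curve = do_curve(x_obs, y_obs, targets)
print(np.round(curve, 2))  # should track the line 2 * a
```

The same observational sample is reused for every target, so the curve degrades as the target moves away from the region the observational data covers.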

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same reweighting identity could be used to validate causal claims on data sets where only observational samples are available but limited interventional benchmarks exist for checking.
Because the ratio is pointwise, it might be combined with kernel or neural density estimators to handle continuous or high-dimensional variables without requiring full distribution comparisons.
The approach suggests a route to causal feature importance that scores each variable's directed contribution locally rather than through global distribution distances.
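When the ratio is not known in closed form, one standard technique (named here as a substitute, not as the paper's method) is classifier-based density-ratio estimation: train a probabilistic classifier to separate interventional from observational samples and convert its odds into a ratio. The sketch below uses a hand-written logistic regression on a one-dimensional Gaussian shift, where the true log-ratio x - 1/2 happens to be linear. The names `fit_logistic` and `estimated_ratio` are hypothetical.

```python
import numpy as np

def fit_logistic(x, labels, steps=25):
    """Newton's method for 1-D logistic regression with an intercept."""
    design = np.column_stack([np.ones_like(x), x])
    theta = np.zeros(2)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-design @ theta))
        grad = design.T @ (p - labels)
        hess = (design * (p * (1.0 - p))[:, None]).T @ design
        theta -= np.linalg.solve(hess, grad)
    return theta

def estimated_ratio(x, theta, n_obs, n_do):
    # The classifier odds p / (1 - p) estimate (n_do / n_obs) * dP_do/dP_obs,
    # so rescale by n_obs / n_do to obtain the density ratio.
    return (n_obs / n_do) * np.exp(theta[0] + theta[1] * x)

rng = np.random.default_rng(2)
n = 20_000
x_obs = rng.normal(0.0, 1.0, size=n)  # observational samples, label 0
x_do = rng.normal(1.0, 1.0, size=n)   # interventional samples, label 1
x_all = np.concatenate([x_obs, x_do])
labels = np.concatenate([np.zeros(n), np.ones(n)])

theta = fit_logistic(x_all, labels)
print(theta)  # intercept near -0.5 and slope near 1 in this toy setting
```

A linear classifier is enough only because the toy log-ratio is linear; for the continuous or high-dimensional variables mentioned above, a more flexible classifier would be needed.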

Load-bearing premise

The Radon-Nikodym derivative between the interventional and observational laws exists as a well-defined, estimable function that can be used to score directed influence.

What would settle it

A controlled experiment in which known interventional outcomes are compared against observational data reweighted by an estimated causal density ratio; systematic mismatch between the two would show that the ratio does not correctly capture the change of measure.
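Such a check can be sketched as a calibration gap between an interventional sample mean and a reweighted observational mean. The example assumes a toy model (Y = 2X + noise, with X shifted from N(0, 1) to N(1, 1) under the soft intervention) and compares a correct ratio against a deliberately wrong one. The function `calibration_gap` and all variable names are hypothetical.

```python
import numpy as np

def calibration_gap(f, x_obs, y_obs, y_do, rho):
    """|E_do[f(Y)] - E_obs[f(Y) rho(X, Y)]| estimated from finite samples."""
    interventional = np.mean(f(y_do))
    reweighted = np.mean(f(y_obs) * rho(x_obs, y_obs))
    return float(abs(interventional - reweighted))

rng = np.random.default_rng(3)
n = 200_000
x_obs = rng.normal(0.0, 1.0, size=n)
y_obs = 2.0 * x_obs + rng.normal(size=n)
x_do = rng.normal(1.0, 1.0, size=n)       # samples from the intervened regime
y_do = 2.0 * x_do + rng.normal(size=n)

identity = lambda y: y
correct_rho = lambda x, y: np.exp(x - 0.5)   # true ratio for this toy shift
wrong_rho = lambda x, y: np.ones_like(x)     # ignores the intervention

good_gap = calibration_gap(identity, x_obs, y_obs, y_do, correct_rho)
bad_gap = calibration_gap(identity, x_obs, y_obs, y_do, wrong_rho)
print(good_gap, bad_gap)  # small gap for the correct ratio, large for the wrong one
```

A gap that stays large across several test functions f is the systematic mismatch described above.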

Figures

Figures reproduced from arXiv: 2606.00754 by Sridhar Mahadevan.

**Figure 1.** Sachs causal-density results. Left: causal-density edge-score heatmap ($s_{ij}$; lower is better). Right: regime divergence (max MMD per variable across environments). Low-score entries include known signaling relations, while MMD measures cross-regime overlap.

7 Experimental Results. We evaluate causal density functions as estimable change-of-measure objects for interventional response and directed structure di…

**Figure 2.** Multi-regime chain causal-density scores. Left: ground-truth chain structure ($X_0 \to X_1 \to \cdots \to X_7$). Middle: predicted adjacency from the pairwise causal-density scorer. Right: causal-density edge-score matrix $s_{ij}$ (lower is better). Extra edges remain, but the correct chain directionality is visible in the low-score band near the diagonal.

**Figure 3.** Triangle illustrating the relationship between Kan Extensions, Radon-Nikodym derivatives, …

**Figure 4.** Comparison between Judo and Kan–Do calculi.

**Figure 5.** Beck–Chevalley square: exchanging intervention (left Kan) and conditioning (right Kan).

**Figure 6.** PISA 2022 causal-density estimates. Left: causal-density edge scores ($s_{ij}$; lower is better). Middle: predicted adjacency (Top-k = 2). Right: cross-country regime divergence (max MMD).

Model selection and reporting. Because no gold-standard DAG is available, we select sparsity/thresholds by validation log-likelihood computed on held-out env regimes (60/20/20 split stratified by country), mirroring the sel…

**Figure 7.** PISA do-curve: hisei_trend → escs_trend. Interventional response computed via causal-density reweighting.

E.2 Additional Results: LINCS L1000 Panel. We use the panel_coarse.csv version of the LINCS L1000 data from the Judo Calculus experiments, restricted here to a 30-gene subset for tractability. Each row corresponds to a perturbagen regime, encoded by the column env, and the remaining columns are normali…

**Figure 8.** RN calibration for hisei_trend → escs_trend. Large calibration gaps reflect substantial cross-country divergence.

**Figure 9.** LINCS L1000 (Kan–Do) results. Left: RN–Kan edge-score heatmap ($s_{ij}$; lower is better). Middle: predicted adjacency (Top-k = 3 parents per target). Right: j-stability (max MMD across perturbagens) for the 20 most unstable genes.

… perturbagens for each gene. Parent selection uses a Top-k = 3 criterion per target with a greedy DAG breaker to enforce acyclicity.

**Figure 10.** LINCS do-curve for HSPA8. Estimated causal response of the best-scored downstream gene (CDC25B) under soft interventions on HSPA8.

RN calibration for HSPA8. To assess how well the Radon-Nikodym causal density transports observational expectations to interventional ones in a high-variance biological setting, we evaluated the RN calibration identity for the pair (HSPA8, CDC25B). Using the fitted RN-flow on …

**Figure 11.** Figure 11 reports the resulting absolute gaps. Unlike the synthetic and S9 settings (where gaps are typically below $10^{-2}$), the LINCS calibration errors are noticeably larger ($\approx 0.12$ for $f(y) = y$ and $\approx 0.32$ for $f(y) = y^2$), reflecting the substantial regime heterogeneity of the perturbagen panel (App. E.2). This illustrates a key theme of Kan–Do calculus: when local regimes differ strongly, RN densities remain inf…

**Figure 12.** Ground-truth chain (left), predicted adjacency (middle), and RN–Kan edge-score matrix …

Original abstract

We introduce causal density functions: Radon-Nikodym derivatives that compare interventional laws to observational laws and therefore act as local density ratios for causal effects. Whereas many causal-strength measures compare whole distributions after graph surgery, causal density functions provide a pointwise change-of-measure object that can be estimated, calibrated, and used to score directed influence. The basic identity \[ \mathbb{E}_{\mathrm{do}}[f(Y)] = \mathbb{E}_{\mathrm{obs}}\!\left[f(Y)\rho(X,Y)\right] \] makes causal density directly testable: if the estimated density ratio is correct, observational expectations reweighted by $\rho$ reproduce interventional expectations. We derive practical estimators for do-curves and directed edge scores, relate the construction to Radon-Nikodym/Kan semantics for conditioning and intervention, and evaluate the resulting estimators on synthetic and real perturbation benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The causal density construction runs into a basic problem with the existence of the Radon-Nikodym derivative under hard interventions.

The paper's main contribution is an attempt to define causal density functions as Radon-Nikodym derivatives between interventional and observational distributions. This is meant to give a local, pointwise way to measure causal influence that can be estimated from data.

What stands out as new is the basic identity that lets you recover interventional expectations by reweighting observational ones with the density ratio ρ. The author also claims to derive estimators for do-curves and edge scores, links the construction to existing semantics for intervention, and tests the estimators on synthetic and real perturbation data.

The work does a decent job of making the idea concrete and testable through that reweighting identity. Connecting it to Radon-Nikodym and Kan semantics shows some engagement with the measure-theoretic side of causal inference.

The soft spot is the assumption that the Radon-Nikodym derivative exists. Standard hard interventions create singular measures, so the derivative is not defined in the usual cases the field cares about. The abstract does not spell out restrictions to soft interventions or other settings where absolute continuity holds. If the full paper does not address this directly with conditions or examples, the central claim is on shaky ground.

This paper is for people working on causal effect estimation who are comfortable with density ratio methods. A reader looking for new tools in that niche might find the estimators useful to try out.

It deserves a serious referee because it brings a fresh object to the table with some empirical checks, even though the foundational issue needs sorting out.

I would recommend sending it for peer review with the expectation that reviewers will press on the measure existence question.

Referee Report

1 major / 0 minor

Summary. The paper introduces causal density functions as Radon-Nikodym derivatives between interventional and observational laws, serving as pointwise density ratios for causal effects. It presents the identity E_do[f(Y)] = E_obs[f(Y) ρ(X,Y)] as a testable relation allowing observational expectations reweighted by ρ to recover interventional ones, derives estimators for do-curves and directed edge scores, relates the approach to Radon-Nikodym semantics for intervention, and evaluates on synthetic and real benchmarks.

Significance. If the identity and estimators are well-defined, the construction supplies a local, estimable, and directly testable alternative to global post-intervention distribution comparisons, with potential utility for scoring directed influence. The reweighting identity is a strength if the Radon-Nikodym derivative exists and can be estimated reliably.

major comments (1)

[Abstract / central identity] The equality E_do[f(Y)] = E_obs[f(Y) ρ(X,Y)] requires the interventional measure to be absolutely continuous with respect to the observational measure; otherwise the Radon-Nikodym derivative ρ does not exist as a measurable function. A standard hard intervention on a continuous variable (e.g., do(X=x) inducing a Dirac mass at a point of observational probability zero) produces an interventional law that is singular with respect to the observational law, so ρ is undefined. The manuscript must state explicit conditions on intervention type (soft vs. hard) and measure classes guaranteeing absolute continuity, or demonstrate via a counter-example where the construction fails.
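One way to make the failure case concrete, offered as an illustrative sketch rather than as the paper's own counter-example: suppose X has a Lebesgue density under the observational law, and the hard intervention do(X = x_0) puts all the mass of X at the single point x_0. The symbols x_0, A, the outcome space \mathcal{Y}, and the outcome kernel \kappa are introduced only for this illustration.

\[
P_{\mathrm{do}}(dx, dy) = \delta_{x_0}(dx)\,\kappa(dy \mid x_0),
\qquad
A = \{x_0\} \times \mathcal{Y},
\]
\[
P_{\mathrm{obs}}(A) = P_{\mathrm{obs}}(X = x_0) = 0,
\qquad
P_{\mathrm{do}}(A) = 1 .
\]
\[
\text{If } \rho = \frac{dP_{\mathrm{do}}}{dP_{\mathrm{obs}}} \text{ existed, then }
1 = P_{\mathrm{do}}(A) = \int_A \rho \, dP_{\mathrm{obs}} = 0,
\]

which is a contradiction. The set A carries all of the interventional mass and none of the observational mass, so the two laws are mutually singular and no reweighting of observational samples can reproduce the interventional expectation.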

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for highlighting the important technical requirement of absolute continuity for the Radon-Nikodym derivative. We address this point directly below and will revise the manuscript accordingly.

Referee: [Abstract / central identity] The equality E_do[f(Y)] = E_obs[f(Y) ρ(X,Y)] requires the interventional measure to be absolutely continuous with respect to the observational measure; otherwise the Radon-Nikodym derivative ρ does not exist as a measurable function. A standard hard intervention on a continuous variable (e.g., do(X=x) inducing a Dirac mass at a point of observational probability zero) produces an interventional law that is singular with respect to the observational law, so ρ is undefined. The manuscript must state explicit conditions on intervention type (soft vs. hard) and measure classes guaranteeing absolute continuity, or demonstrate via a counter-example where the construction fails.

Authors: We agree that the identity holds only when the interventional law is absolutely continuous with respect to the observational law. The manuscript's development of causal density functions is intended for regimes in which this absolute continuity obtains (e.g., soft or randomized interventions that do not introduce atoms on sets of observational measure zero). We will add an explicit statement of this assumption in the abstract, introduction, and methods, together with a short paragraph clarifying the distinction between soft and hard interventions and noting that the construction does not apply to deterministic hard interventions on continuous variables that produce mutually singular measures. A brief counter-example illustrating failure under hard intervention will also be included.

revision: yes
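The boundary between soft and hard interventions can also be seen numerically: as a soft intervention concentrates toward a point, the density ratio still exists, but the effective sample size of the reweighted observational data shrinks. The sketch below is an illustration under assumptions, not part of the manuscript. It assumes X ~ N(0, 1) observationally and soft interventions X ~ N(1, s^2) with decreasing scale s, and the names `effective_sample_size` and `gaussian_ratio` are hypothetical.

```python
import numpy as np

def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    return float(weights.sum() ** 2 / np.sum(weights ** 2))

def gaussian_ratio(x, mean, scale):
    # Ratio of the N(mean, scale^2) density to the N(0, 1) density;
    # the shared 1 / sqrt(2 pi) factor cancels.
    log_q = -0.5 * ((x - mean) / scale) ** 2 - np.log(scale)
    log_p = -0.5 * x ** 2
    return np.exp(log_q - log_p)

rng = np.random.default_rng(4)
x_obs = rng.normal(size=100_000)

scales = (1.0, 0.3, 0.1, 0.03)
ess_by_scale = {
    s: effective_sample_size(gaussian_ratio(x_obs, 1.0, s)) for s in scales
}
for s, ess in ess_by_scale.items():
    print(f"scale={s}: effective sample size ~ {ess:.0f}")
```

In the limit of zero scale the intervention becomes a point mass and the ratio no longer exists, which is the singular case the referee describes.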

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper defines causal density functions explicitly as Radon-Nikodym derivatives between interventional and observational measures, then states the change-of-measure identity that holds by the definition of the RN derivative. This is presented as the basic identity enabling testability, but the equality is tautological once ρ is defined as dP_do/dP_obs; no claimed prediction or first-principles result is shown to reduce to a fitted parameter or prior self-citation by construction. The paper proceeds to derive estimators for do-curves and edge scores and evaluates them on synthetic and real benchmarks, supplying independent empirical content. No load-bearing step matches the enumerated circularity patterns.
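For reference, the change-of-measure step can be written out in one line. This is the standard Radon-Nikodym argument, stated under the assumptions that P_do is absolutely continuous with respect to P_obs and that f(Y) is integrable under P_do; the notation P_do, P_obs for the joint laws of (X, Y) is chosen here for illustration.

\[
\mathbb{E}_{\mathrm{do}}[f(Y)]
= \int f(y)\, dP_{\mathrm{do}}(x,y)
= \int f(y)\, \frac{dP_{\mathrm{do}}}{dP_{\mathrm{obs}}}(x,y)\, dP_{\mathrm{obs}}(x,y)
= \mathbb{E}_{\mathrm{obs}}\!\left[f(Y)\,\rho(X,Y)\right].
\]

The middle equality is the defining property of the derivative, so the identity carries empirical content only once ρ is replaced by an estimate fitted from data.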

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central construction rests on the existence of the Radon-Nikodym derivative between interventional and observational measures; no free parameters or invented entities are described in the abstract.

axioms (1)

Domain assumption: existence of the Radon-Nikodym derivative between interventional and observational probability measures.
Required for the causal density function ρ to be defined.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Infinitesimal Causality
math.CT 2026-06 unverdicted novelty 7.0

Infinitesimal causality is defined via compatibility of categorical and geometric Frobenius structures in Markov categories, with interventions as tangent vectors deforming copy/discard operations and Lie brackets mea...
Latent Confounded Causal Discovery via Lie Bracket Geometry
cs.LG 2026-06 unverdicted novelty 6.0

Introduces BRIDGE and SKFM algorithms that detect latent confounders via non-closing Lie brackets in interventional vector fields derived from density ratios.

Reference graph

Works this paper leans on

21 extracted references · 5 canonical work pages · cited by 2 Pith papers

[1]

Probability and Measure

Patrick Billingsley. Probability and Measure. Wiley, 3rd edition, 1995

1995
[2]

Differentiable causal discovery from interventional data, 2020a

Philippe Brouillard, Sébastien Lachapelle, Alexandre Lacoste, Simon Lacoste-Julien, and Alexandre Drouin. Differentiable causal discovery from interventional data, 2020a. URL https://arxiv.org/abs/2007.01754

work page arXiv 2020
[3]

Differentiable causal discovery from interventional data

Pierre Brouillard, Benoit Lachapelle, Simon Lacoste-Julien, Alexandre Lacoste, and Boris Oreshkin. Differentiable causal discovery from interventional data. In Advances in Neural Information Processing Systems, 2020b

2020
[4]

Optimal structure identification with greedy search

David Maxwell Chickering. Optimal structure identification with greedy search. Journal of Machine Learning Research, 3:507-554, Nov 2002

2002
[5]

A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics

Tobias Fritz. A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics. Advances in Mathematics, 370:107239, August 2020. ISSN 0001-8708. doi:10.1016/j.aim.2020.107239. URL http://dx.doi.org/10.1016/j.aim.2020.107239

work page doi:10.1016/j.aim.2020.107239 2020
[6]

Paul R. Halmos. Measure Theory. Van Nostrand, 1950

1950
[7]

Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction

Guido W. Imbens and Donald B. Rubin. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press, USA, 2015. ISBN 0521885884

2015
[8]

Quantifying causal influences

Dominik Janzing, David Balduzzi, Moritz Grosse-Wentrup, and Bernhard Schölkopf. Quantifying causal influences. The Annals of Statistics, 41(5), October 2013. ISSN 0090-5364. doi:10.1214/13-aos1145. URL http://dx.doi.org/10.1214/13-AOS1145

work page doi:10.1214/13-aos1145 2013
[9]

Learning causal effects via causal inference theory

Nan Ke, Alexander Didic, Xinyu Chen, Seungjin Kim, and Yoshua Bengio. Learning causal effects via causal inference theory. In International Conference on Learning Representations, 2019

2019
[10]

Gradient-based neural DAG learning

Sébastien Lachapelle, Pierre Brouillard, Tristan Deleu, and Simon Lacoste-Julien. Gradient-based neural DAG learning. In International Conference on Learning Representations, 2020

2020
[11]

Categories for the Working Mathematician

Saunders MacLane. Categories for the Working Mathematician. Springer-Verlag, New York, 1971. Graduate Texts in Mathematics, Vol. 5

1971
[12]

Intuitionistic j-do-calculus in topos causal models, 2025

Sridhar Mahadevan. Intuitionistic j-do-calculus in topos causal models, 2025. URL https://arxiv.org/abs/2510.17944

work page arXiv 2025
[13]

Mahadevan, S. (2026). Categories for AGI, https://people.cs.umass.edu/ mahadeva/papers/catagi.pdf

2026
[14]

Causality: Models, Reasoning and Inference

Judea Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, USA, 2nd edition, 2009. ISBN 052189560X

2009
[15]

E. Riehl. Category Theory in Context. Aurora: Dover Modern Math Originals. Dover Publications, 2017. ISBN 9780486820804. URL https://books.google.com/books?id=6B9MDgAAQBAJ

2017
[16]

Real and Complex Analysis

Walter Rudin. Real and Complex Analysis. McGraw-Hill, 3rd edition, 1987

1987
[17]

Causal protein-signaling networks derived from multiparameter single-cell data

Karen Sachs, Diego Perez, Dana Pe'er, Douglas A Lauffenburger, and Garry P Nolan. Causal protein-signaling networks derived from multiparameter single-cell data. Science, 308(5721):523-529, 2005. doi:10.1126/science.1105809

work page doi:10.1126/science.1105809 2005
[18]

Causation, Prediction, and Search

Peter Spirtes, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search. MIT Press, 2000

2000
[19]

Kan Extensions in Probability Theory

Ruben van Belle. Kan Extensions in Probability Theory. PhD thesis, University of Edinburgh, 2024

2024
[20]

Permutation-based causal inference algorithms with interventions

Yue Wang and Mathias Drton. Permutation-based causal inference algorithms with interventions. In Advances in Neural Information Processing Systems, 2017

2017
[21]

DAGs with NO TEARS: Continuous optimization for structure learning

Xun Zheng, Bryon Aragam, Pradeep Ravikumar, and Eric P Xing. DAGs with NO TEARS: Continuous optimization for structure learning. In Advances in Neural Information Processing Systems, 2018

2018
