Recognition: 2 Lean theorem links
Order-based Rehearsal Learning
Pith reviewed 2026-05-08 17:46 UTC · model grok-4.3
The pith
Order structure alone can be sufficient to identify, from observational data, the decision influences needed to avoid undesired future events.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We demonstrate that the order structure can be sufficient for AUF decision-making, and propose the first order-based rehearsal learning method. Our information-theoretic order learning imposes no restrictions on the form of structural functions or the type of noise distributions. For decision-making, we construct an order-based sampler to approximate the influence of decisions and reduce the AUF task to a differentiable optimization problem using a surrogate objective for maximizing post-decision success probability. Experiments show that our approach not only surpasses methods relying on learned graphs or learned orders, but also matches or even exceeds oracle baselines that are given the true graph.
What carries the argument
The order-based sampler that approximates the causal influence of decisions on the undesired event using only the learned order from observational data.
If this is right
- Rehearsal learning for AUF can proceed without estimating a complete causal graph, lowering the risk of estimation errors.
- The information-theoretic order learner works under general assumptions about the data-generating process.
- AUF decision-making reduces to differentiable optimization, making it compatible with standard machine learning pipelines.
- Order-based methods can outperform graph-based ones in practice when graph learning is inaccurate.
Where Pith is reading between the lines
- This approach might extend to other intervention-based decision tasks where only partial causal information is needed.
- Testing the method on real-world datasets with hidden confounders could reveal limits of order sufficiency.
- Combining order learning with other causal discovery techniques might yield hybrid methods for even better performance.
Load-bearing premise
The order extracted from observational data captures the necessary information to distinguish the impact of different decisions on the probability of the undesired event.
What would settle it
A controlled experiment using data generated from a known structural causal model where the order-based method's achieved success probability is substantially lower than that of an oracle method provided with the true graph.
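The settling experiment hinges on a data-generating process whose interventional distributions are known exactly. A minimal sketch of that side of the setup (variable names, coefficients, and the undesired-event threshold are illustrative, not taken from the paper):

```python
import random

def sample_scm(n, rng, do_x=None):
    """Draw n samples from the chain X -> M -> Y; do_x clamps X (an intervention)."""
    rows = []
    for _ in range(n):
        x = do_x if do_x is not None else rng.gauss(0.0, 1.0)
        m = 0.8 * x + rng.gauss(0.0, 0.5)   # M := 0.8 X + noise
        y = 1.5 * m + rng.gauss(0.0, 0.5)   # Y := 1.5 M + noise
        rows.append((x, m, y))
    return rows

rng = random.Random(0)
observational = sample_scm(5000, rng)             # all any learner may see
interventional = sample_scm(5000, rng, do_x=1.0)  # oracle's ground truth
# Undesired event: {Y >= 2}. Under do(X = 1), E[Y] = 1.5 * 0.8 = 1.2,
# so most of the interventional mass lies below the threshold.
oracle_success = sum(y < 2.0 for _, _, y in interventional) / len(interventional)
```

An order-based method would receive only `observational`, while the oracle evaluates p(Y | do(X = x)) directly from the structural equations; comparing the two achieved success probabilities is the proposed test.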
Original abstract
When a machine learning (ML) model forecasts an undesired event, one often seeks a decision to avoid it, known as the avoiding undesired future (AUF) problem. Many rehearsal learning methods have been proposed for AUF, but they rely on an underlying graph structure; learning such a graph from observational data is challenging and can incur substantial estimation error. In this work, we demonstrate that the order structure can be sufficient for AUF decision-making, and propose the first order-based rehearsal learning method. Although an order is less informative than a graph, it can be sufficient to identify the influence of decisions from observational data, suggesting that learning the entire graph is not always necessary. To learn the order, we develop an information-theoretic method that imposes no restrictions on the form of structural functions or the type of noise distributions. For AUF decision-making, we construct an order-based sampler to approximate the influence of decisions and, combined with a surrogate objective for maximizing the post-decision success probability, reduce the AUF task to a differentiable optimization problem. Experiments show that our order learning method outperforms existing methods, and that our AUF approach not only surpasses methods relying on learned graphs or learned orders, but also matches or even exceeds oracle baselines that are given the true graph.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that order structure (rather than full causal graphs) is sufficient for solving the avoiding undesired future (AUF) decision-making problem from observational data. It proposes an information-theoretic order learner that imposes no restrictions on structural functions or noise distributions, an order-based sampler to approximate decision influence, and a surrogate objective that reduces AUF to differentiable optimization. Experiments reportedly show the order learner outperforming existing methods and the full AUF approach surpassing learned-graph and learned-order baselines while matching or exceeding oracle baselines supplied with the true graph.
Significance. If the central claims hold, the work would be significant for causal decision-making and rehearsal learning: it suggests that learning full graphs is not always necessary, potentially reducing estimation error in high-dimensional settings. The restriction-free order learner and the reduction to differentiable optimization are potentially valuable if they are shown to be parameter-free or free of hidden fitting steps. Reproducible code or machine-checked elements are not mentioned.
major comments (3)
- [Abstract / Experiments] Abstract and experimental section: the claim that the order-based AUF approach 'matches or even exceeds oracle baselines that are given the true graph' is load-bearing for the sufficiency argument. Because an order is strictly less informative than a graph, this requires explicit verification that the oracle baseline fully exploits the graph (e.g., via exact interventional computations rather than the same order-based sampler and surrogate objective). Without that, the comparison risks being an artifact of suboptimal oracle implementation.
- [Order Learning Method] Method section on the order learner: the information-theoretic order learner is asserted to impose 'no restrictions on the form of structural functions or the type of noise distributions.' Please provide the explicit estimator or objective (including any finite-sample approximations) and demonstrate that it remains consistent without parametric assumptions; otherwise the 'parameter-free' character is unclear.
- [Order-based Sampler and Surrogate Objective] AUF decision-making section: the order-based sampler and surrogate objective are used to approximate influence and reduce the task to differentiable optimization. Clarify whether the sampler is derived from the learned order alone or incorporates additional fitted parameters; any hidden fitting would undermine the claim that order structure alone suffices.
minor comments (2)
- [Abstract] The abstract states experimental superiority but supplies no details on protocols, error bars, data splits, or number of runs. These should be added for reproducibility.
- [Preliminaries] Notation for the order (e.g., how partial orders are represented and sampled) should be introduced with a clear definition early in the methods.
Simulated Author's Rebuttal
Thank you for the referee's thoughtful and constructive comments. We address each major comment point by point below. We agree that additional clarity is needed in several places and will revise the manuscript accordingly to strengthen the presentation of our results without altering the core claims.
Point-by-point responses
-
Referee: [Abstract / Experiments] Abstract and experimental section: the claim that the order-based AUF approach 'matches or even exceeds oracle baselines that are given the true graph' is load-bearing for the sufficiency argument. Because an order is strictly less informative than a graph, this requires explicit verification that the oracle baseline fully exploits the graph (e.g., via exact interventional computations rather than the same order-based sampler and surrogate objective). Without that, the comparison risks being an artifact of suboptimal oracle implementation.
Authors: We agree this clarification is essential to support the sufficiency argument. In the current experiments the oracle baselines are implemented by leveraging the true graph to derive exact interventional distributions for computing decision influences, rather than reusing the order-based sampler. This is what enables the comparison to demonstrate that order structure alone can match or exceed a graph-based approach in the tested settings. To address the concern directly, we will revise the experimental section with an expanded description of the oracle implementation, including the mathematical steps used to exploit the full graph (e.g., via do-calculus or direct simulation from the structural equations). revision: yes
-
Referee: [Order Learning Method] Method section on the order learner: the information-theoretic order learner is asserted to impose 'no restrictions on the form of structural functions or the type of noise distributions.' Please provide the explicit estimator or objective (including any finite-sample approximations) and demonstrate that it remains consistent without parametric assumptions; otherwise the 'parameter-free' character is unclear.
Authors: The information-theoretic order learner identifies the order by maximizing an objective based on conditional mutual information estimated directly from the observational data. The estimator uses a non-parametric k-nearest-neighbor approach for finite samples and imposes no functional or distributional assumptions beyond standard i.i.d. sampling. We will revise the method section to include the precise mathematical form of the objective, the finite-sample estimator, and a short consistency argument under mild regularity conditions on the joint distribution. This will make the parameter-free nature explicit. revision: yes
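As a concrete illustration of the kind of non-parametric estimator this response describes (a sketch, not the paper's exact objective): a Kozachenko–Leonenko k-nearest-neighbour estimate of differential entropy in one dimension, which is distribution-free in the same sense.

```python
import math
import random

def digamma(n: int) -> float:
    """Digamma at a positive integer: psi(n) = -gamma + H_{n-1} (exact)."""
    euler_gamma = 0.5772156649015329
    return -euler_gamma + sum(1.0 / j for j in range(1, n))

def knn_entropy_1d(xs, k: int = 1) -> float:
    """Kozachenko-Leonenko estimate of differential entropy (in nats).

    1-D shortcut: after sorting, the k nearest neighbours of xs[i] are
    among its k predecessors and k successors in the sorted order.
    """
    n = len(xs)
    xs = sorted(xs)
    log_vol = []
    for i, x in enumerate(xs):
        dists = sorted(abs(xs[j] - x)
                       for j in range(max(0, i - k), min(n, i + k + 1)) if j != i)
        log_vol.append(math.log(2.0 * dists[k - 1]))  # 1-D ball of radius r has "volume" 2r
    return digamma(n) - digamma(k) + sum(log_vol) / n

random.seed(0)
sample = [random.random() for _ in range(2000)]  # Uniform(0,1): true entropy 0
est = knn_entropy_1d(sample)                     # should land close to 0
```

Conditional mutual information can then be assembled from joint-entropy estimates of this kind via the identity I(X;Y|Z) = h(X,Z) + h(Y,Z) − h(Z) − h(X,Y,Z), with no parametric assumptions beyond those of the entropy estimator itself.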
-
Referee: [Order-based Sampler and Surrogate Objective] AUF decision-making section: the order-based sampler and surrogate objective are used to approximate influence and reduce the task to differentiable optimization. Clarify whether the sampler is derived from the learned order alone or incorporates additional fitted parameters; any hidden fitting would undermine the claim that order structure alone suffices.
Authors: The sampler generates trajectories by respecting only the precedence constraints encoded in the learned order; no additional parameters or auxiliary models are fitted. The surrogate objective is then a direct, differentiable function of the success probability approximated from these order-constrained samples. We will revise the AUF decision-making section to state this explicitly, add the formal definition of the sampler, and include pseudocode confirming that the procedure uses the order as its sole input. revision: yes
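A minimal sketch of what such an order-constrained sampler could look like (our reading of the description above, not the authors' code): variables are generated in the learned order, the decision variable is clamped, and each later variable is resampled from the observational data conditioned on its realised predecessors via nearest neighbours.

```python
import random

def sample_trajectory(data, order, decision_var, decision_value, rng, k=5):
    """data: list of {variable: value} rows; order: variables from first to last."""
    traj = {}
    for pos, var in enumerate(order):
        if var == decision_var:
            traj[var] = decision_value         # intervention: clamp the decision
            continue
        preds = order[:pos]
        if not preds:
            traj[var] = rng.choice(data)[var]  # no predecessors: draw marginally
            continue
        # Condition on realised predecessors by nearest-neighbour resampling:
        # rank rows by distance to the partial trajectory, draw from the k closest.
        scored = sorted(data, key=lambda row: sum((row[p] - traj[p]) ** 2
                                                  for p in preds))
        traj[var] = rng.choice(scored[:k])[var]
    return traj

# Toy check on data where y is roughly 2x: clamping x to 1.0 should push y toward 2.0.
rng = random.Random(1)
data = []
for _ in range(300):
    x = rng.uniform(0.0, 2.0)
    data.append({"x": x, "y": 2.0 * x + rng.gauss(0.0, 0.1)})
draws = [sample_trajectory(data, ["x", "y"], "x", 1.0, rng)["y"]
         for _ in range(200)]
```

The nearest-neighbour step is itself non-parametric, consistent with the claim that no auxiliary model is fitted; the neighbourhood size `k` is a smoothing choice rather than a learned parameter.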
Circularity Check
No circularity: derivation chain is self-contained
full rationale
The paper develops an information-theoretic order learner with no restrictions on structural functions or noise, then constructs an order-based sampler and surrogate objective to reduce AUF to differentiable optimization. These steps are presented as independent constructions from observational data, not as reparameterizations or fits of the target quantities. The empirical claim of matching/exceeding a true-graph oracle is an experimental outcome rather than a definitional reduction. No self-definitional, fitted-input-as-prediction, or self-citation load-bearing steps appear in the abstract or described chain; the method is not forced by its own inputs.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Order structure is sufficient to identify the influence of decisions from observational data
- domain assumption: Information-theoretic method can learn the order without restrictions on structural functions or noise distributions
Lean theorems connected to this paper
-
Cost.FunctionalEquation / Foundation.LogicAsFunctionalEquation · washburn_uniqueness_aczel (J = ½(x+x⁻¹)−1 uniqueness)
unclear: Relation between the paper passage and the cited Recognition theorem.
we develop OLEM (Order Learning via conditional Entropy Maximization). OLEM learns the order recursively by repeatedly identifying and removing the 'last' variable from the remaining set of variables. Each step is achieved by maximizing a conditional entropy objective
-
Foundation.ArithmeticFromLogic (orbit/order on LogicNat) · embed_strictMono_of_one_lt
unclear: Relation between the paper passage and the cited Recognition theorem.
i* = arg max_{i∈π[1:m]} h(V_i | V^{π[1:m]}_{-i}) implies i* is a sink in G^{π[1:m]}
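The quoted sink-identification step can be sanity-checked in a narrow special case (a sketch under assumptions the paper does not need: linear-Gaussian structural equations with equal noise variances, the setting of the cited Ghoshal and Honorio result). There, h(V_i | V_{-i}) is increasing in the conditional variance var(V_i | V_{-i}) = 1/Θ_ii for precision matrix Θ, so the conditional-entropy arg-max coincides with the precision-diagonal arg-min, which is a sink.

```python
import random

def sample_chain(n, rng):
    """X -> M -> Y with all noise variances equal to 1 (the EqVar special case)."""
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        m = 0.8 * x + rng.gauss(0.0, 1.0)
        y = 1.5 * m + rng.gauss(0.0, 1.0)
        data.append([x, m, y])
    return data

def covariance(data):
    n, d = len(data), len(data[0])
    mu = [sum(row[j] for row in data) / n for j in range(d)]
    return [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in data) / n
             for j in range(d)] for i in range(d)]

def invert(mat):
    """Gauss-Jordan inverse of a small square matrix."""
    d = len(mat)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(d)]
           for i, row in enumerate(mat)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(d):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]

rng = random.Random(0)
theta = invert(covariance(sample_chain(4000, rng)))
# Entropy arg-max = conditional-variance arg-max = precision-diagonal arg-min:
sink = min(range(3), key=lambda i: theta[i][i])
```

For this chain the arg-min lands on index 2, i.e. Y, the true sink; the paper's entropy objective is claimed to achieve this without the Gaussian or equal-variance assumptions used here.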
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
-
Non-Parametric Rehearsal Learning via Conditional Mean Embeddings
A non-parametric rehearsal learning framework using conditional mean embeddings and a Probit surrogate for avoiding undesired outcomes, with consistency guarantees.
Reference graph
Works this paper leans on
-
[1]
Andreas Andersson and Nicholas Bates. In situ measurements used for coral and reef-scale calcification structural equation modeling including environmental and chemical measurements, and coral calcification rates in Bermuda from 2010 to 2012 (BEACON project), 2018
2018
-
[2]
Framework for Easily Invertible Architectures (FrEIA)
Lynton Ardizzone, Till Bungert, Felix Draxler, Ullrich Köthe, Jakob Kruse, Robert Schmier, and Peter Sorrenson. Framework for Easily Invertible Architectures (FrEIA). 2018–2022
2018
-
[3]
Guided image generation with conditional invertible neural networks
Lynton Ardizzone, Carsten Lüth, Jakob Kruse, Carsten Rother, and Ullrich Köthe. Guided image generation with conditional invertible neural networks. arXiv preprint arXiv:1907.02392, 2019
2019
-
[4]
A review on deep learning for recommender systems: challenges and remedies
Zeynep Batmaz, Ali Yurekli, Alper Bilge, and Cihan Kaleli. A review on deep learning for recommender systems: challenges and remedies. Artificial Intelligence Review, 52(1): 1–37, 2019
2019
-
[5]
Efficient multivariate entropy estimation via k-nearest neighbour distances
Thomas B. Berrett, Richard J. Samworth, and Ming Yuan. Efficient multivariate entropy estimation via k-nearest neighbour distances. The Annals of Statistics, 47(1): 288–318, 2019
2019
-
[6]
Convex optimization
Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge University Press, 2004
2004
-
[7]
CAM: Causal additive models, high-dimensional order search and penalized regression
Peter Bühlmann, Jonas Peters, and Jan Ernest. CAM: Causal additive models, high-dimensional order search and penalized regression. The Annals of Statistics, 42(6): 2526–2556, 2014
2014
-
[8]
Optimal structure identification with greedy search
David Maxwell Chickering. Optimal structure identification with greedy search. Journal of Machine Learning Research, 3: 507–554, 2002
2002
-
[9]
Environmental controls on modern scleractinian coral and reef-scale calcification
Travis A Courtney, Mario Lebrato, Nicholas R Bates, Andrew Collins, Samantha J De Putron, Rebecca Garley, Rod Johnson, Juan-Carlos Molinero, Timothy J Noyes, Christopher L Sabine, and Andreas J Andersson. Environmental controls on modern scleractinian coral and reef-scale calcification. Science Advances, 3(11): e1701356, 2017
2017
-
[10]
Variance-reduced long-term rehearsal learning with quadratic programming reformulation
Wen-Bo Du, Tian Qin, Tian-Zuo Wang, and Zhi-Hua Zhou. Variance-reduced long-term rehearsal learning with quadratic programming reformulation. In Advances in Neural Information Processing Systems 38, 2025
2025
-
[11]
On the evolution of random graphs
P. Erdős and A. Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci., 5: 17–61, 1960
1960
-
[12]
Bayesian optimization with inequality constraints
Jacob R Gardner, Matt J Kusner, Zhixiang Eddie Xu, Kilian Q Weinberger, and John P Cunningham. Bayesian optimization with inequality constraints. In Proceedings of the 31st International Conference on Machine Learning, pages 937–945, 2014
2014
-
[13]
Learning linear structural equation models in polynomial time and sample complexity
Asish Ghoshal and Jean Honorio. Learning linear structural equation models in polynomial time and sample complexity. In Proceedings of the 21st International Conference on Artificial Intelligence and Statistics, pages 1466–1475, 2018
2018
-
[14]
Review of causal discovery methods based on graphical models
Clark Glymour, Kun Zhang, and Peter Spirtes. Review of causal discovery methods based on graphical models. Frontiers in Genetics, 10: 524, 2019
2019
-
[15]
Nonlinear causal discovery with additive noise models
Patrik Hoyer, Dominik Janzing, Joris M Mooij, Jonas Peters, and Bernhard Schölkopf. Nonlinear causal discovery with additive noise models. In Advances in Neural Information Processing Systems 21, pages 689–696, 2008
2008
-
[16]
Theories of causal ordering: Reply to de Kleer and Brown
Yumi Iwasaki and Herbert A. Simon. Theories of causal ordering: Reply to de Kleer and Brown. Artificial Intelligence, 29(1): 63–72, 1986
1986
-
[17]
Adam: A method for stochastic optimization
Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations, 2015
2015
-
[18]
Sample estimate of the entropy of a random vector
L. F. Kozachenko and N. N. Leonenko. Sample estimate of the entropy of a random vector. Probl. Pered. Inform., 23: 95–101, 1987
1987
-
[19]
Deep learning for natural language processing: advantages and challenges
Hang Li. Deep learning for natural language processing: advantages and challenges. National Science Review, 5(1): 24–26, 2017
2017
-
[20]
Causality: Models, Reasoning, and Inference
Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000
2000
-
[21]
Structural intervention distance for evaluating causal graphs
Jonas Peters and Peter Bühlmann. Structural intervention distance for evaluating causal graphs. Neural Computation, 27(3): 771–799, 2015
2015
-
[22]
Rehearsal learning for avoiding undesired future
Tian Qin, Tian-Zuo Wang, and Zhi-Hua Zhou. Rehearsal learning for avoiding undesired future. In Advances in Neural Information Processing Systems 36, pages 80517–80542, 2023
2023
-
[23]
Gradient-based nonlinear rehearsal learning with multivariate alterations
Tian Qin, Tian-Zuo Wang, and Zhi-Hua Zhou. Gradient-based nonlinear rehearsal learning with multivariate alterations. In Proceedings of the 39th AAAI Conference on Artificial Intelligence, pages 26859–26867, 2025
2025
-
[24]
Score matching enables causal discovery of nonlinear additive noise models
Paul Rolland, Volkan Cevher, Matthäus Kleindessner, Chris Russell, Dominik Janzing, Bernhard Schölkopf, and Francesco Locatello. Score matching enables causal discovery of nonlinear additive noise models. In Proceedings of the 39th International Conference on Machine Learning, pages 18741–18753, 2022
2022
-
[25]
Causal protein-signaling networks derived from multiparameter single-cell data
Karen Sachs, Omar Perez, Dana Pe'er, Douglas A Lauffenburger, and Garry P Nolan. Causal protein-signaling networks derived from multiparameter single-cell data. Science, 308(5721): 523–529, 2005
2005
-
[26]
Diffusion models for causal discovery via topological ordering
Pedro Sanchez, Xiao Liu, Alison Q O'Neil, and Sotirios A Tsaftaris. Diffusion models for causal discovery via topological ordering. In Proceedings of the 11th International Conference on Learning Representations, pages 20468–20487, 2023
2023
-
[27]
A linear non-Gaussian acyclic model for causal discovery
Shohei Shimizu, Patrik O Hoyer, Aapo Hyvärinen, Antti Kerminen, and Michael Jordan. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7: 2003–2030, 2006
2006
-
[28]
Causal ordering and identifiability
Herbert A. Simon. Causal ordering and identifiability. In Wm. C. Hood and Tjalling C. Koopmans, editors, Studies in Econometric Methods, pages 49–74. John Wiley & Sons, New York, 1953
1953
-
[29]
Causation, Prediction, and Search
Peter Spirtes, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search. MIT Press, 2000
2000
-
[30]
Ensemble estimators for multivariate entropy estimation
Kumar Sricharan, Dennis Wei, and Alfred O Hero. Ensemble estimators for multivariate entropy estimation. IEEE Transactions on Information Theory, 59(7): 4374–4388, 2013
2013
-
[31]
Avoiding undesired future with sequential decisions
Lue Tao, Tian-Zuo Wang, Yuan Jiang, and Zhi-Hua Zhou. Avoiding undesired future with sequential decisions. In Proceedings of the 34th International Joint Conference on Artificial Intelligence, pages 6245–6253, 2025
2025
-
[32]
Ordering-based search: A simple and effective algorithm for learning Bayesian networks
Marc Teyssier and Daphne Koller. Ordering-based search: A simple and effective algorithm for learning Bayesian networks. In Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence, pages 584–590, 2005
2005
-
[33]
Geometry of the faithfulness assumption in causal inference
Caroline Uhler, Garvesh Raskutti, Peter Bühlmann, and Bin Yu. Geometry of the faithfulness assumption in causal inference. The Annals of Statistics, pages 436–463, 2013
2013
-
[34]
Deep learning for computer vision: A brief review
Athanasios Voulodimos, Anastasios Doulamis, Nikolaos Doulamis, and Evangelos Protopapadakis. Deep learning for computer vision: A brief review. Computational Intelligence and Neuroscience, pages 1–13, 2018
2018
-
[35]
Learning likelihoods with conditional normalizing flows
Christina Winkler, Daniel Worrall, Emiel Hoogeboom, and Max Welling. Learning likelihoods with conditional normalizing flows. arXiv preprint arXiv:1912.00042, 2019
2019
-
[36]
Local causal structure learning in the presence of latent variables
Feng Xie, Zheng Li, Peng Wu, Yan Zeng, Chunchen Liu, and Zhi Geng. Local causal structure learning in the presence of latent variables. arXiv preprint arXiv:2405.16225, 2024
2024
-
[37]
Ordering-based causal discovery for linear and nonlinear relations
Zhuopeng Xu, Yujie Li, Cheng Liu, and Ning Gui. Ordering-based causal discovery for linear and nonlinear relations. In Advances in Neural Information Processing Systems 37, pages 4315–4340, 2024
2024
-
[38]
A survey on causal discovery: theory and practice
Alessio Zanga, Elif Ozkirimli, and Fabio Stella. A survey on causal discovery: theory and practice. International Journal of Approximate Reasoning, 151: 101–129, 2022
2022
-
[39]
On the completeness of orientation rules for causal discovery in the presence of latent confounders
Jiji Zhang. On the completeness of orientation rules for causal discovery in the presence of latent confounders. Artificial Intelligence, 172(16–17): 1873–1896, 2008
2008
-
[40]
Causal representation learning from multiple distributions: A general setting
Kun Zhang, Shaoan Xie, Ignavier Ng, and Yujia Zheng. Causal representation learning from multiple distributions: A general setting. arXiv preprint arXiv:2402.05052, 2024
2024
-
[41]
DAGs with no tears: Continuous optimization for structure learning
Xun Zheng, Bryon Aragam, Pradeep K Ravikumar, and Eric P Xing. DAGs with no tears: Continuous optimization for structure learning. In Advances in Neural Information Processing Systems 31, pages 9472–9483, 2018
2018
-
[42]
Rehearsal: learning from prediction to decision
Zhi-Hua Zhou. Rehearsal: learning from prediction to decision. Frontiers of Computer Science, 16(4): 164352, 2022
2022