pith. machine review for the scientific record.

arxiv: 2605.04955 · v1 · submitted 2026-05-06 · 💻 cs.LG

Recognition: 2 Lean theorem links

Order-based Rehearsal Learning

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 17:46 UTC · model grok-4.3

classification 💻 cs.LG
keywords avoiding undesired future · order learning · rehearsal learning · causal inference · observational data · decision making · information theory · surrogate objective

The pith

The order structure alone can be sufficient to identify decision influences for avoiding undesired future events from observational data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Many rehearsal learning methods for avoiding undesired futures rely on learning a full causal graph, which is difficult to recover from observational data and prone to estimation error. This paper argues that a simpler order among the variables can provide enough information to make effective decisions without the full graph. The authors introduce an information-theoretic approach that learns the order without assuming specific forms for the structural functions or the noise. Using the learned order, they build a sampler to estimate how decisions change the chance of avoiding the bad event, turning the problem into an optimization task that can be solved with gradients. If this holds, it simplifies these decision problems by skipping the full graph-learning step.

Core claim

We demonstrate that the order structure can be sufficient for AUF decision-making, and propose the first order-based rehearsal learning method. Our information-theoretic order learning imposes no restrictions on the form of structural functions or the type of noise distributions. For decision-making, we construct an order-based sampler to approximate the influence of decisions and reduce the AUF task to a differentiable optimization problem using a surrogate objective for maximizing post-decision success probability. Experiments show that our approach not only surpasses methods relying on learned graphs or learned orders, but also matches or even exceeds oracle baselines given the true graph.
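The reduction to differentiable optimization can be illustrated on a toy problem. This is a hedged sketch, not the paper's objective: the mechanism `g`, the threshold `TAU`, the sigmoid temperature `KAPPA`, and the noise scale are all invented here. The key moves it shows are (i) fixing noise draws so the Monte Carlo success estimate becomes a deterministic function of the decision (reparameterization), and (ii) replacing the indicator of success with a sigmoid so the objective has a usable gradient.

```python
import math
import random

random.seed(0)

TAU = 1.0     # the event is "undesired" when the outcome Y exceeds TAU
KAPPA = 1.0   # temperature of the sigmoid relaxation
# Fixed noise draws: the reparameterization that makes the objective
# a deterministic, differentiable function of the decision a.
NOISE = [random.gauss(0.0, 0.5) for _ in range(500)]

def g(a):
    """Toy mechanism (hypothetical): decision a -> mean outcome Y."""
    return (a - 2.0) ** 2

def g_prime(a):
    return 2.0 * (a - 2.0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def surrogate(a):
    """Smooth stand-in for the success probability P(Y <= TAU)."""
    return sum(sigmoid(KAPPA * (TAU - g(a) - e)) for e in NOISE) / len(NOISE)

def surrogate_grad(a):
    """Exact gradient of the surrogate via the chain rule."""
    total = 0.0
    for e in NOISE:
        s = sigmoid(KAPPA * (TAU - g(a) - e))
        total += s * (1.0 - s) * (-KAPPA * g_prime(a))
    return total / len(NOISE)

a = 0.0
for _ in range(300):           # plain gradient ascent on the surrogate
    a += 0.2 * surrogate_grad(a)
```

Under these assumptions the ascent drives the decision toward a = 2, where the mean outcome is smallest and the surrogate success probability is highest; any gradient-based optimizer from a standard ML pipeline could replace the hand-rolled loop.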

What carries the argument

The order-based sampler that approximates the causal influence of decisions on the undesired event using only the learned order from observational data.
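The paper's sampler construction is not reproduced on this page. As a hedged illustration only, one non-parametric way a sampler could consume nothing but a variable order is: clamp decided variables, and draw every other variable from observed rows that are close in predecessor space. The helper name `order_based_sample`, the chain X → Y, and the k = 10 neighborhood are all choices made for this sketch, not the paper's design.

```python
import random

random.seed(3)

def order_based_sample(data, order, decisions, k=10):
    """Draw one post-decision sample using only a variable order.

    data:      observed rows, each a dict {var: value}
    order:     variable names listed causes-before-effects
    decisions: dict {var: clamped value} for intervened variables
    """
    draw = {}
    for i, var in enumerate(order):
        if var in decisions:               # intervention: sever the mechanism
            draw[var] = decisions[var]
            continue
        preds = order[:i]
        if not preds:                      # a root: resample marginally
            draw[var] = random.choice(data)[var]
            continue
        # 1-of-k-nearest conditional draw in predecessor space:
        # the order is the only structural input used here.
        def dist(row):
            return sum((row[p] - draw[p]) ** 2 for p in preds)
        neighbors = sorted(data, key=dist)[:k]
        draw[var] = random.choice(neighbors)[var]
    return draw

# Tiny demo on data from a hypothetical chain X -> Y with Y = 2X + noise.
data = []
for _ in range(2000):
    x = random.gauss(0.0, 1.0)
    data.append({"X": x, "Y": 2.0 * x + random.gauss(0.0, 0.3)})

samples = [order_based_sample(data, ["X", "Y"], {"X": 1.0}) for _ in range(200)]
mean_y = sum(s["Y"] for s in samples) / len(samples)
```

In this toy chain the post-decision draws of Y concentrate near 2.0, as a graph-based intervention would predict; whether such an order-only scheme matches the paper's sampler is exactly what the referee report below asks the authors to pin down.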

If this is right

  • Rehearsal learning for AUF can proceed without estimating a complete causal graph, lowering the risk of estimation errors.
  • The information-theoretic order learner works under general assumptions about data-generating processes.
  • AUF decision making reduces to differentiable optimization, making it compatible with standard machine learning pipelines.
  • Order-based methods can outperform graph-based ones in practice when graph learning is inaccurate.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach might extend to other intervention-based decision tasks where only partial causal information is needed.
  • Testing the method on real-world datasets with hidden confounders could reveal limits of order sufficiency.
  • Combining order learning with other causal discovery techniques might yield hybrid methods for even better performance.

Load-bearing premise

The order extracted from observational data captures the necessary information to distinguish the impact of different decisions on the probability of the undesired event.

What would settle it

A controlled experiment using data generated from a known structural causal model where the order-based method's achieved success probability is substantially lower than that of an oracle method provided with the true graph.
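The oracle half of such a controlled experiment is easy to mock up. In this hedged sketch the structural equations (a three-variable chain with invented coefficients) are known to the oracle, a do-intervention is simulated by severing the intervened variable's assignment and propagating forward, and the post-decision success probability is a Monte Carlo average; the order-based side would be run on the same generated data for comparison.

```python
import random

random.seed(1)

# Hypothetical true SCM (chain X1 -> X2 -> Y), known only to the oracle.
def sample_y(do_x2=None):
    x1 = random.gauss(0.0, 1.0)
    # do(X2 := a) severs X2's structural assignment entirely.
    x2 = do_x2 if do_x2 is not None else 0.8 * x1 + random.gauss(0.0, 0.5)
    return 1.5 * x2 + random.gauss(0.0, 0.5)  # outcome Y

def oracle_success_prob(decision, tau=0.0, n=20000):
    """P(Y <= tau | do(X2 := decision)), by direct simulation from the
    true structural equations (no order-based machinery involved)."""
    hits = sum(1 for _ in range(n) if sample_y(do_x2=decision) <= tau)
    return hits / n

p_passive = oracle_success_prob(decision=0.0)   # leave X2 at a neutral value
p_active = oracle_success_prob(decision=-1.0)   # push X2 down to avoid Y > 0
```

If an order-based method's achieved success probability fell well below `p_active`-style oracle numbers across such known-graph testbeds, the sufficiency claim would be refuted; the paper reports the opposite.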

Figures

Figures reproduced from arXiv: 2605.04955 by Tian-Zuo Wang, Yu-Xuan Tao, Zhi-Hua Zhou.

Figure 1. Results on the synthetic datasets with beta or Gaussian noise. (view at source ↗)
Figure 2. (a) Relation between the success probability and the sample size on the Bermuda dataset. (b) Relation between the performance gap p_Grad-Rh − p_OLEM-Rh and the sample size on the Bermuda dataset. Both methods improve as sample size increases. (view at source ↗)
Figure 3. d = 5, p = 0.3 with beta noise: DIV, SID, and SHD versus linear rate for CAM, LISTEN, SCORE, DiffAN, CaPS, and OLEM. (view at source ↗)
Figure 5. d = 5, p = 0.8 with beta noise: same metrics and methods. (view at source ↗)
Figure 7. d = 10, p = 0.5 with beta noise: same metrics and methods. (view at source ↗)
Figure 9. d = 20, p = 0.3 with beta noise: same metrics and methods. (view at source ↗)
Figure 11. d = 5, p = 0.3 with Gaussian noise: same metrics and methods. (view at source ↗)
Figure 13. d = 5, p = 0.8 with Gaussian noise: same metrics and methods. (view at source ↗)
Figure 15. d = 10, p = 0.5 with Gaussian noise: same metrics and methods. (view at source ↗)
Figure 17. d = 20, p = 0.3 with Gaussian noise: same metrics and methods. (view at source ↗)
Figure 19. d = 5, p = 0.3 with exponential noise: same metrics and methods. (view at source ↗)
Figure 21. d = 5, p = 0.8 with exponential noise: same metrics and methods. (view at source ↗)
Figure 23. d = 10, p = 0.5 with exponential noise: same metrics and methods. (view at source ↗)
Figure 25. d = 20, p = 0.3 with exponential noise: same metrics and methods. (view at source ↗)
Original abstract

When a machine learning (ML) model forecasts an undesired event, one often seeks a decision to avoid it, known as the avoiding undesired future (AUF) problem. Many rehearsal learning methods have been proposed for AUF, but they rely on an underlying graph structure; learning such a graph from observational data is challenging and can incur substantial estimation error. In this work, we demonstrate that the order structure can be sufficient for AUF decision-making, and propose the first order-based rehearsal learning method. Although an order is less informative than a graph, it can be sufficient to identify the influence of decisions from observational data, suggesting that learning the entire graph is not always necessary. To learn the order, we develop an information-theoretic method that imposes no restrictions on the form of structural functions or the type of noise distributions. For AUF decision-making, we construct an order-based sampler to approximate the influence of decisions and, combined with a surrogate objective for maximizing the post-decision success probability, reduce the AUF task to a differentiable optimization problem. Experiments show that our order learning method outperforms existing methods, and that our AUF approach not only surpasses methods relying on learned graphs or learned orders, but also matches or even exceeds oracle baselines that are given the true graph.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that order structure (rather than full causal graphs) is sufficient for solving the avoiding undesired future (AUF) decision-making problem from observational data. It proposes an information-theoretic order learner that imposes no restrictions on structural functions or noise distributions, an order-based sampler to approximate decision influence, and a surrogate objective that reduces AUF to differentiable optimization. Experiments reportedly show the order learner outperforming existing methods and the full AUF approach surpassing learned-graph and learned-order baselines while matching or exceeding oracle baselines supplied with the true graph.

Significance. If the central claims hold, the work would be significant for causal decision-making and rehearsal learning: it suggests that learning full graphs is not always necessary, potentially reducing estimation error in high-dimensional settings. The restriction-free order learner and the reduction to differentiable optimization are potentially valuable if they are shown to be parameter-free or free of hidden fitting steps. Reproducible code or machine-checked elements are not mentioned.

major comments (3)
  1. [Abstract / Experiments] Abstract and experimental section: the claim that the order-based AUF approach 'matches or even exceeds oracle baselines that are given the true graph' is load-bearing for the sufficiency argument. Because an order is strictly less informative than a graph, this requires explicit verification that the oracle baseline fully exploits the graph (e.g., via exact interventional computations rather than the same order-based sampler and surrogate objective). Without that, the comparison risks being an artifact of suboptimal oracle implementation.
  2. [Order Learning Method] Method section on the order learner: the information-theoretic order learner is asserted to impose 'no restrictions on the form of structural functions or the type of noise distributions.' Please provide the explicit estimator or objective (including any finite-sample approximations) and demonstrate that it remains consistent without parametric assumptions; otherwise the 'parameter-free' character is unclear.
  3. [Order-based Sampler and Surrogate Objective] AUF decision-making section: the order-based sampler and surrogate objective are used to approximate influence and reduce the task to differentiable optimization. Clarify whether the sampler is derived from the learned order alone or incorporates additional fitted parameters; any hidden fitting would undermine the claim that order structure alone suffices.
minor comments (2)
  1. [Abstract] The abstract states experimental superiority but supplies no details on protocols, error bars, data splits, or number of runs. These should be added for reproducibility.
  2. [Preliminaries] Notation for the order (e.g., how partial orders are represented and sampled) should be introduced with a clear definition early in the methods.

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the referee's thoughtful and constructive comments. We address each major comment point by point below. We agree that additional clarity is needed in several places and will revise the manuscript accordingly to strengthen the presentation of our results without altering the core claims.

Point-by-point responses
  1. Referee: [Abstract / Experiments] Abstract and experimental section: the claim that the order-based AUF approach 'matches or even exceeds oracle baselines that are given the true graph' is load-bearing for the sufficiency argument. Because an order is strictly less informative than a graph, this requires explicit verification that the oracle baseline fully exploits the graph (e.g., via exact interventional computations rather than the same order-based sampler and surrogate objective). Without that, the comparison risks being an artifact of suboptimal oracle implementation.

    Authors: We agree this clarification is essential to support the sufficiency argument. In the current experiments the oracle baselines are implemented by leveraging the true graph to derive exact interventional distributions for computing decision influences, rather than reusing the order-based sampler. This is what enables the comparison to demonstrate that order structure alone can match or exceed a graph-based approach in the tested settings. To address the concern directly, we will revise the experimental section with an expanded description of the oracle implementation, including the mathematical steps used to exploit the full graph (e.g., via do-calculus or direct simulation from the structural equations). revision: yes

  2. Referee: [Order Learning Method] Method section on the order learner: the information-theoretic order learner is asserted to impose 'no restrictions on the form of structural functions or the type of noise distributions.' Please provide the explicit estimator or objective (including any finite-sample approximations) and demonstrate that it remains consistent without parametric assumptions; otherwise the 'parameter-free' character is unclear.

    Authors: The information-theoretic order learner identifies the order by maximizing an objective based on conditional mutual information estimated directly from the observational data. The estimator uses a non-parametric k-nearest-neighbor approach for finite samples and imposes no functional or distributional assumptions beyond standard i.i.d. sampling. We will revise the method section to include the precise mathematical form of the objective, the finite-sample estimator, and a short consistency argument under mild regularity conditions on the joint distribution. This will make the parameter-free nature explicit. revision: yes

  3. Referee: [Order-based Sampler and Surrogate Objective] AUF decision-making section: the order-based sampler and surrogate objective are used to approximate influence and reduce the task to differentiable optimization. Clarify whether the sampler is derived from the learned order alone or incorporates additional fitted parameters; any hidden fitting would undermine the claim that order structure alone suffices.

    Authors: The sampler generates trajectories by respecting only the precedence constraints encoded in the learned order; no additional parameters or auxiliary models are fitted. The surrogate objective is then a direct, differentiable function of the success probability approximated from these order-constrained samples. We will revise the AUF decision-making section to state this explicitly, add the formal definition of the sampler, and include pseudocode confirming that the procedure uses the order as its sole input. revision: yes
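The k-nearest-neighbor information estimates invoked in the second response belong to a well-known family. As a hedged sketch (the paper's actual objective is not reproduced here), the classic Kozachenko–Leonenko 1-NN estimator of differential entropy, reference [18] in the bibliography below and the usual building block of such conditional mutual information estimates, can be written as follows; the function name and the restriction to scalar samples are choices made for this illustration.

```python
import math
import random

random.seed(2)

def kl_entropy(samples):
    """Kozachenko-Leonenko 1-nearest-neighbor estimate of differential
    entropy for scalar samples (d = 1, Euclidean distance)."""
    n = len(samples)
    xs = sorted(samples)
    log_rho = []
    for i, x in enumerate(xs):
        left = x - xs[i - 1] if i > 0 else float("inf")
        right = xs[i + 1] - x if i < n - 1 else float("inf")
        rho = max(min(left, right), 1e-12)  # nearest-neighbor distance
        log_rho.append(math.log(rho))
    euler_gamma = 0.5772156649015329
    # H_hat = mean(log rho) + log 2 + log(n - 1) + gamma   (for d = 1)
    return sum(log_rho) / n + math.log(2.0) + math.log(n - 1) + euler_gamma

data = [random.gauss(0.0, 1.0) for _ in range(4000)]
h_hat = kl_entropy(data)
h_true = 0.5 * math.log(2 * math.pi * math.e)  # entropy of N(0, 1)
```

The estimator uses no parametric model of the density, which is the property the rebuttal leans on when claiming the order learner imposes no functional or distributional restrictions; its finite-sample consistency conditions are what the referee asks to see spelled out.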

Circularity Check

0 steps flagged

No circularity: derivation chain is self-contained

full rationale

The paper develops an information-theoretic order learner with no restrictions on structural functions or noise, then constructs an order-based sampler and surrogate objective to reduce AUF to differentiable optimization. These steps are presented as independent constructions from observational data, not as reparameterizations or fits of the target quantities. The empirical claim of matching/exceeding a true-graph oracle is an experimental outcome rather than a definitional reduction. No self-definitional, fitted-input-as-prediction, or self-citation load-bearing steps appear in the abstract or described chain; the method is not forced by its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that an order structure suffices to identify decision influences from observational data, plus the claim that an unrestricted information-theoretic learner can recover such an order. No free parameters or invented entities are mentioned in the abstract.

axioms (2)
  • domain assumption Order structure is sufficient to identify the influence of decisions from observational data
    Explicitly asserted as a demonstration in the abstract.
  • domain assumption Information-theoretic method can learn the order without restrictions on structural functions or noise distributions
    Stated as part of the developed learning method in the abstract.

pith-pipeline@v0.9.0 · 5520 in / 1435 out tokens · 32484 ms · 2026-05-08T17:46:26.171359+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Non-Parametric Rehearsal Learning via Conditional Mean Embeddings

    cs.LG · 2026-05 · unverdicted · novelty 7.0

    A non-parametric rehearsal learning framework using conditional mean embeddings and a Probit surrogate for avoiding undesired outcomes, with consistency guarantees.

Reference graph

Works this paper leans on

42 extracted references · 4 canonical work pages · cited by 1 Pith paper

  1. [1] Andreas Andersson and Nicholas Bates. In situ measurements used for coral and reef-scale calcification structural equation modeling including environmental and chemical measurements, and coral calcification rates in Bermuda from 2010 to 2012 (BEACON project), 2018.

  2. [2] Lynton Ardizzone, Till Bungert, Felix Draxler, Ullrich Köthe, Jakob Kruse, Robert Schmier, and Peter Sorrenson. Framework for Easily Invertible Architectures (FrEIA), 2018–2022.

  3. [3] Lynton Ardizzone, Carsten Lüth, Jakob Kruse, Carsten Rother, and Ullrich Köthe. Guided image generation with conditional invertible neural networks. arXiv preprint arXiv:1907.02392, 2019.

  4. [4] Zeynep Batmaz, Ali Yurekli, Alper Bilge, and Cihan Kaleli. A review on deep learning for recommender systems: challenges and remedies. Artificial Intelligence Review, 52(1):1–37, 2019.

  5. [5] Thomas B. Berrett, Richard J. Samworth, and Ming Yuan. Efficient multivariate entropy estimation via k-nearest neighbour distances. The Annals of Statistics, 47(1):288–318, 2019.

  6. [6] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

  7. [7] Peter Bühlmann, Jonas Peters, and Jan Ernest. CAM: Causal additive models, high-dimensional order search and penalized regression. The Annals of Statistics, 42(6):2526–2556, 2014.

  8. [8] David Maxwell Chickering. Optimal structure identification with greedy search. Journal of Machine Learning Research, 3:507–554, 2002.

  9. [9] Travis A. Courtney, Mario Lebrato, Nicholas R. Bates, Andrew Collins, Samantha J. De Putron, Rebecca Garley, Rod Johnson, Juan-Carlos Molinero, Timothy J. Noyes, Christopher L. Sabine, and Andreas J. Andersson. Environmental controls on modern scleractinian coral and reef-scale calcification. Science Advances, 3(11):e1701356, 2017.

  10. [10] Wen-Bo Du, Tian Qin, Tian-Zuo Wang, and Zhi-Hua Zhou. Variance-reduced long-term rehearsal learning with quadratic programming reformulation. In Advances in Neural Information Processing Systems 38, 2025.

  11. [11] P. Erdős and A. Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci., 5:17–61, 1960.

  12. [12] Jacob R. Gardner, Matt J. Kusner, Zhixiang Eddie Xu, Kilian Q. Weinberger, and John P. Cunningham. Bayesian optimization with inequality constraints. In Proceedings of the 31st International Conference on Machine Learning, pages 937–945, 2014.

  13. [13] Asish Ghoshal and Jean Honorio. Learning linear structural equation models in polynomial time and sample complexity. In Proceedings of the 21st International Conference on Artificial Intelligence and Statistics, pages 1466–1475, 2018.

  14. [14] Clark Glymour, Kun Zhang, and Peter Spirtes. Review of causal discovery methods based on graphical models. Frontiers in Genetics, 10:524, 2019.

  15. [15] Patrik Hoyer, Dominik Janzing, Joris M. Mooij, Jonas Peters, and Bernhard Schölkopf. Nonlinear causal discovery with additive noise models. In Advances in Neural Information Processing Systems 21, pages 689–696, 2008.

  16. [16] Yumi Iwasaki and Herbert A. Simon. Theories of causal ordering: Reply to de Kleer and Brown. Artificial Intelligence, 29(1):63–72, 1986.

  17. [17] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations, 2015.

  18. [18] L. F. Kozachenko and N. N. Leonenko. Sample estimate of the entropy of a random vector. Probl. Pered. Inform., 23:95–101, 1987.

  19. [19] Hang Li. Deep learning for natural language processing: advantages and challenges. National Science Review, 5(1):24–26, 2017.

  20. [20] Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.

  21. [21] Jonas Peters and Peter Bühlmann. Structural intervention distance for evaluating causal graphs. Neural Computation, 27(3):771–799, 2015.

  22. [22] Tian Qin, Tian-Zuo Wang, and Zhi-Hua Zhou. Rehearsal learning for avoiding undesired future. In Advances in Neural Information Processing Systems 36, pages 80517–80542, 2023.

  23. [23] Tian Qin, Tian-Zuo Wang, and Zhi-Hua Zhou. Gradient-based nonlinear rehearsal learning with multivariate alterations. In Proceedings of the 39th AAAI Conference on Artificial Intelligence, pages 26859–26867, 2025.

  24. [24] Paul Rolland, Volkan Cevher, Matthäus Kleindessner, Chris Russell, Dominik Janzing, Bernhard Schölkopf, and Francesco Locatello. Score matching enables causal discovery of nonlinear additive noise models. In Proceedings of the 39th International Conference on Machine Learning, pages 18741–18753, 2022.

  25. [25] Karen Sachs, Omar Perez, Dana Pe'er, Douglas A. Lauffenburger, and Garry P. Nolan. Causal protein-signaling networks derived from multiparameter single-cell data. Science, 308(5721):523–529, 2005.

  26. [26] Pedro Sanchez, Xiao Liu, Alison Q. O'Neil, and Sotirios A. Tsaftaris. Diffusion models for causal discovery via topological ordering. In Proceedings of the 11th International Conference on Learning Representations, pages 20468–20487, 2023.

  27. [27] Shohei Shimizu, Patrik O. Hoyer, Aapo Hyvärinen, Antti Kerminen, and Michael Jordan. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7:2003–2030, 2006.

  28. [28] Herbert A. Simon. Causal ordering and identifiability. In Wm. C. Hood and Tjalling C. Koopmans, editors, Studies in Econometric Methods, pages 49–74. John Wiley & Sons, New York, 1953.

  29. [29] Peter Spirtes, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search. MIT Press, 2000.

  30. [30] Kumar Sricharan, Dennis Wei, and Alfred O. Hero. Ensemble estimators for multivariate entropy estimation. IEEE Transactions on Information Theory, 59(7):4374–4388, 2013.

  31. [31] Lue Tao, Tian-Zuo Wang, Yuan Jiang, and Zhi-Hua Zhou. Avoiding undesired future with sequential decisions. In Proceedings of the 34th International Joint Conference on Artificial Intelligence, pages 6245–6253, 2025.

  32. [32] Marc Teyssier and Daphne Koller. Ordering-based search: A simple and effective algorithm for learning Bayesian networks. In Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence, pages 584–590, 2005.

  33. [33] Caroline Uhler, Garvesh Raskutti, Peter Bühlmann, and Bin Yu. Geometry of the faithfulness assumption in causal inference. The Annals of Statistics, pages 436–463, 2013.

  34. [34] Athanasios Voulodimos, Anastasios Doulamis, Nikolaos Doulamis, and Evangelos Protopapadakis. Deep learning for computer vision: A brief review. Computational Intelligence and Neuroscience, pages 1–13, 2018.

  35. [35] Christina Winkler, Daniel Worrall, Emiel Hoogeboom, and Max Welling. Learning likelihoods with conditional normalizing flows. arXiv preprint arXiv:1912.00042, 2019.

  36. [36] Feng Xie, Zheng Li, Peng Wu, Yan Zeng, Chunchen Liu, and Zhi Geng. Local causal structure learning in the presence of latent variables. arXiv preprint arXiv:2405.16225, 2024.

  37. [37] Zhuopeng Xu, Yujie Li, Cheng Liu, and Ning Gui. Ordering-based causal discovery for linear and nonlinear relations. In Advances in Neural Information Processing Systems 37, pages 4315–4340, 2024.

  38. [38] Alessio Zanga, Elif Ozkirimli, and Fabio Stella. A survey on causal discovery: theory and practice. International Journal of Approximate Reasoning, 151:101–129, 2022.

  39. [39] Jiji Zhang. On the completeness of orientation rules for causal discovery in the presence of latent confounders. Artificial Intelligence, 172(16–17):1873–1896, 2008.

  40. [40] Kun Zhang, Shaoan Xie, Ignavier Ng, and Yujia Zheng. Causal representation learning from multiple distributions: A general setting. arXiv preprint arXiv:2402.05052, 2024.

  41. [41] Xun Zheng, Bryon Aragam, Pradeep K. Ravikumar, and Eric P. Xing. DAGs with NO TEARS: Continuous optimization for structure learning. In Advances in Neural Information Processing Systems 31, pages 9472–9483, 2018.

  42. [42] Zhi-Hua Zhou. Rehearsal: learning from prediction to decision. Frontiers of Computer Science, 16(4):164352, 2022.