Recognition: 2 theorem links
· Lean TheoremIntegrating Causal DAGs in Deep RL: Activating Minimal Markovian States with Multi-Order Exposure
Pith reviewed 2026-05-11 02:29 UTC · model grok-4.3
The pith
Given a longitudinal causal graph over observations, a procedure builds a provably minimal Markov state for RL, yet deep networks require multi-order historical exposures to realize any gains.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Given a longitudinal causal graph over observed variables, a procedure constructs a provably minimal state representation that satisfies the Markov property. In deep RL, the minimal representation alone fails to improve performance, indicating that neural networks cannot directly exploit Markovian minimality. MOSE addresses this by feeding multi-order historical state constructions into the same Q-function. MOSE consistently outperforms both the minimal state construction and single-window policies on common benchmarks and synthetic datasets. Including the minimal representation alongside MOSE can further improve performance. These results establish that minimal sufficiency is not enough and
What carries the argument
MOSE (Multi-Order State Exposure), the mechanism that augments a minimal Markov state derived from a causal DAG with multiple historical orders and supplies them jointly to a standard Q-network.
If this is right
- A provably minimal Markovian state can be derived directly from any accurate longitudinal causal DAG over observed variables.
- Standard deep Q-networks cannot exploit the minimality of a state without additional structure such as multi-order histories.
- Multi-order exposure of historical states produces higher performance than either the pure minimal state or single-window policies.
- Combining the minimal state with MOSE yields further gains beyond MOSE alone.
- The performance pattern holds on common RL benchmarks and on synthetic datasets with known causal structure.
Where Pith is reading between the lines
- If the causal graph must be learned from data rather than provided exactly, small errors could turn the derived state non-Markovian and erase the theoretical guarantee.
- The same principle of controlled redundancy might apply to other deep RL architectures such as actor-critic or model-based methods.
- An adaptive choice of which historical orders to expose could replace the fixed multi-order scheme and reduce unnecessary computation.
- The construction could be tested in partially observable settings where some variables in the causal graph are hidden.
Load-bearing premise
An accurate longitudinal causal graph over the observed variables is supplied as input, and standard neural Q-networks can directly exploit the minimal state when it is augmented with multi-order histories without further architectural changes.
What would settle it
If, on a controlled benchmark where the true causal graph is known exactly, MOSE produces no improvement or produces worse performance than a non-causal baseline that ignores the graph, the claim that multi-order exposure unlocks the benefit of causal states would be refuted.
Figures
read the original abstract
Online reinforcement learning (RL) relies on the Markov property for guaranteed performance, but real-world applications often lack well-defined states given raw observed variables. While causal RL has attracted growing interest, existing work typically assumes Markovian states are provided and focuses on using causality to accelerate learning, leaving a fundamental gap: \emph{given a longitudinal causal graph over observed variables, how does one construct MDP states that provably satisfy the Markov property?} We address this by providing a procedure that constructs a provably minimal state representation. In deep RL, we observe that the minimal representation alone empirically fails to improve performance, indicating that neural networks cannot directly exploit Markovian minimality. To address this, we propose \textbf{MOSE} (Multi-Order State Exposure), which feeds multi-order historical state constructions into the same $Q$-function. MOSE consistently outperforms both the minimal state construction and single-window policies on common benchmarks and synthetic datasets. Including the minimal representation alongside MOSE can further improve performance. Our results establish a core principle for causal deep RL: minimal sufficiency is not enough, and \emph{controlled redundancy} is necessary to unlock the benefit of causal state information.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to provide a procedure that constructs a provably minimal Markovian state representation from a longitudinal causal graph over observed variables for use in RL. It observes that this minimal representation alone does not improve performance in deep RL, and proposes MOSE which feeds multi-order historical state constructions into the Q-function. MOSE is reported to consistently outperform the minimal state construction and single-window policies on common benchmarks and synthetic datasets. The paper concludes that minimal sufficiency is not enough and controlled redundancy is necessary to unlock the benefit of causal state information in deep RL.
Significance. If the results hold, this work is significant for causal deep RL as it provides a principled way to derive minimal Markov states from causal DAGs and highlights a key practical issue with using minimal representations in neural RL agents. The proposal of MOSE as a simple way to add controlled redundancy is a useful contribution. The paper explicitly credits the idea of using causal graphs but extends it to state construction and empirical validation of the redundancy principle. However, the significance depends on the rigor of the proof and experiments, which are not detailed in the abstract.
major comments (2)
- [Construction procedure (likely §3)] The claim that the procedure constructs a 'provably minimal state representation' is central but the manuscript does not supply the algorithm steps or a proof sketch in the provided text, preventing assessment of whether the construction indeed satisfies the Markov property without additional assumptions.
- [Empirical evaluation (likely §5)] The assertion that 'MOSE consistently outperforms' both minimal and single-window policies lacks any quantitative results, error bars, baseline details, or statistical tests in the abstract, which is load-bearing for the claim that minimal sufficiency is not enough.
minor comments (2)
- [Abstract] The abstract mentions 'common benchmarks and synthetic datasets' but does not specify which ones, reducing clarity.
- [Notation] The term 'multi-order historical state constructions' is introduced without a formal definition or equation in the summary text.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment below by referencing the relevant sections of the full manuscript and outlining the revisions we will make to improve clarity and accessibility of the key claims.
read point-by-point responses
-
Referee: [Construction procedure (likely §3)] The claim that the procedure constructs a 'provably minimal state representation' is central but the manuscript does not supply the algorithm steps or a proof sketch in the provided text, preventing assessment of whether the construction indeed satisfies the Markov property without additional assumptions.
Authors: Section 3 of the full manuscript presents the complete construction procedure as an algorithm that extracts a minimal set of variables from the longitudinal causal DAG such that the resulting state satisfies the Markov property for the RL process. The section includes pseudocode for the procedure and a proof sketch based on d-separation and the definition of minimal sufficient statistics for the transition and reward functions. The proof requires no assumptions beyond the given causal graph being a faithful representation of the data-generating process. We will revise the manuscript to move the proof sketch into the main text (currently in the appendix) and add an explicit statement that the construction is minimal by construction. revision: yes
-
Referee: [Empirical evaluation (likely §5)] The assertion that 'MOSE consistently outperforms' both minimal and single-window policies lacks any quantitative results, error bars, baseline details, or statistical tests in the abstract, which is load-bearing for the claim that minimal sufficiency is not enough.
Authors: Section 5 reports the full experimental results on standard RL benchmarks and synthetic datasets, including mean returns with standard error bars over 10 random seeds, explicit baseline implementations (minimal state, fixed-window history, and standard DQN), and statistical significance tests confirming MOSE's improvements. To make the abstract self-contained and address the load-bearing nature of the claim, we will revise it to include concise quantitative highlights such as average performance gains and significance levels. revision: yes
Circularity Check
No significant circularity detected in derivation chain
full rationale
The paper's core claim is a procedure that, given an external longitudinal causal graph over observed variables as input, constructs a provably minimal state representation satisfying the Markov property. This is presented as derived from causal graph properties rather than fitted parameters or self-referential definitions. The subsequent observation that the minimal state alone fails to improve deep RL performance (leading to the MOSE multi-order augmentation) is an empirical finding, not a mathematical reduction to the input. No load-bearing equations, uniqueness theorems, or ansatzes are shown to collapse by construction to the provided causal graph or to self-citations; the central result remains independent of the fitted Q-networks and rests on the external graph plus experimental validation. This is the expected honest non-finding for a method whose inputs are stated as given and whose outputs are not tautological renamings of those inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption A longitudinal causal graph over observed variables is given and correctly encodes the temporal dependencies.
invented entities (1)
-
MOSE (Multi-Order State Exposure)
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We address this by providing a procedure that constructs a provably minimal state representation... minimal sufficiency is not enough, and controlled redundancy is necessary
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Theorem 4.1 (Graphical criterion for valid causal state space construction)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Richard S Sutton and Andrew G Barto.Reinforcement learning: An introduction, volume 1. MIT press Cambridge, 1998
work page 1998
-
[2]
Matthieu Komorowski, Leo A Celi, Omar Badawi, Anthony C Gordon, and A Aldo Faisal. The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care. Nature medicine, 24(11):1716–1720, 2018
work page 2018
-
[3]
Rethinking progression of memory state in robotic manipulation: An object-centric perspective
Nhat Chung, Taisei Hanyu, Toan Nguyen, Huy Le, Frederick Bumgarner, Duy Minh Ho Nguyen, Khoa V o, Kashu Yamazaki, Chase Rainwater, Tung Kieu, et al. Rethinking progression of memory state in robotic manipulation: An object-centric perspective. InProceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 3407–3415, 2026
work page 2026
-
[4]
George E Monahan. State of the art—a survey of partially observable markov decision processes: theory, models, and algorithms.Management science, 28(1):1–16, 1982
work page 1982
-
[5]
Deep recurrent q-learning for partially observable mdps
Matthew J Hausknecht and Peter Stone. Deep recurrent q-learning for partially observable mdps. InAAAI fall symposia, volume 45, page 141, 2015
work page 2015
-
[6]
Recurrent experience replay in distributed reinforcement learning
Steven Kapturowski, Georg Ostrovski, John Quan, Remi Munos, and Will Dabney. Recurrent experience replay in distributed reinforcement learning. InInternational conference on learning representations, 2018
work page 2018
-
[7]
Agent57: Outperforming the atari human benchmark
Adrià Puigdomènech Badia, Bilal Piot, Steven Kapturowski, Pablo Sprechmann, Alex Vitvit- skyi, Zhaohan Daniel Guo, and Charles Blundell. Agent57: Outperforming the atari human benchmark. InInternational conference on machine learning, pages 507–517. PMLR, 2020
work page 2020
-
[8]
Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Misha Laskin, Pieter Abbeel, Aravind Srinivas, and Igor Mordatch. Decision transformer: Reinforcement learning via sequence modeling.Advances in neural information processing systems, 34:15084–15097, 2021
work page 2021
-
[9]
Qinqing Zheng, Amy Zhang, and Aditya Grover. Online decision transformer. Ininternational conference on machine learning, pages 27042–27059. PMLR, 2022
work page 2022
-
[10]
Chengchun Shi, Runzhe Wan, Rui Song, Wenbin Lu, and Ling Leng. Does the markov decision process fit the data: Testing for the markov property in sequential decision making. In International Conference on Machine Learning, pages 8807–8817. PMLR, 2020
work page 2020
-
[11]
Human-level control through deep reinforcement learning.nature, 518(7540):529–533, 2015
V olodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning.nature, 518(7540):529–533, 2015
work page 2015
-
[12]
Rainbow: Combining improve- ments in deep reinforcement learning
Matteo Hessel, Joseph Modayil, Hado Van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, and David Silver. Rainbow: Combining improve- ments in deep reinforcement learning. InProceedings of the AAAI conference on artificial intelligence, volume 32, 2018
work page 2018
-
[13]
Model based reinforcement learning for atari
Łukasz Kaiser, Mohammad Babaeizadeh, Piotr Miłos, Bła˙zej Osi´nski, Roy H Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, et al. Model based reinforcement learning for atari. InInternational Conference on Learning Representations, 2020
work page 2020
-
[14]
Finding the framestack: Learning what to remember for non-markovian reinforcement learning
Geraud Nangue Tasse, Matthew Riemer, Benjamin Rosman, and Tim Klinger. Finding the framestack: Learning what to remember for non-markovian reinforcement learning. InFinding the Frame Workshop at RLC 2025, 2025
work page 2025
-
[15]
Automatic reward shaping from confounded offline data.arXiv preprint arXiv:2505.11478, 2025
Mingxuan Li, Junzhe Zhang, and Elias Bareinboim. Automatic reward shaping from confounded offline data.arXiv preprint arXiv:2505.11478, 2025
-
[16]
Confounding robust deep reinforcement learning: A causal approach
Mingxuan Li, Junzhe Zhang, and Elias Bareinboim. Confounding robust deep reinforcement learning: A causal approach. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. 10
work page 2025
-
[17]
arXiv preprint arXiv:1804.06893 , year=
Chiyuan Zhang, Oriol Vinyals, Remi Munos, and Samy Bengio. A study on overfitting in deep reinforcement learning.arXiv preprint arXiv:1804.06893, 2018
-
[18]
The primacy bias in deep reinforcement learning
Evgenii Nikishin, Max Schwarzer, Pierluca D’Oro, Pierre-Luc Bacon, and Aaron Courville. The primacy bias in deep reinforcement learning. InInternational conference on machine learning, pages 16828–16847. PMLR, 2022
work page 2022
-
[19]
The dormant neuron phenomenon in deep reinforcement learning
Ghada Sokar, Rishabh Agarwal, Pablo Samuel Castro, and Utku Evci. The dormant neuron phenomenon in deep reinforcement learning. InInternational Conference on Machine Learning, pages 32145–32168. PMLR, 2023
work page 2023
-
[20]
Quantifying generalization in reinforcement learning
Karl Cobbe, Oleg Klimov, Chris Hesse, Taehoon Kim, and John Schulman. Quantifying generalization in reinforcement learning. InInternational conference on machine learning, pages 1282–1289. PMLR, 2019
work page 2019
-
[21]
Noisy networks for exploration
M Fortunato, MG Azar, B Piot, J Menick, M Hessel, I Osband, A Graves, V Mnih, R Munos, D Hassabis, O Pietquin, and S Blundell, C Legg. Noisy networks for exploration. InInterna- tional Conference on Learning Representations (ICLR), 2018
work page 2018
-
[22]
Misha Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, and Aravind Srinivas. Reinforcement learning with augmented data.Advances in Neural Information Processing Systems, 33:19884–19895, 2020
work page 2020
-
[23]
Curl: Contrastive unsupervised repre- sentations for reinforcement learning
Michael Laskin, Aravind Srinivas, and Pieter Abbeel. Curl: Contrastive unsupervised repre- sentations for reinforcement learning. InInternational conference on machine learning, pages 5639–5650. PMLR, 2020
work page 2020
-
[24]
Decoupling representation learning from reinforcement learning
Adam Stooke, Kimin Lee, Pieter Abbeel, and Michael Laskin. Decoupling representation learning from reinforcement learning. InInternational conference on machine learning, pages 9870–9879. PMLR, 2021
work page 2021
-
[25]
Multi-view reinforcement learning.Advances in Neural Information Processing Systems, 32, 2019
Minne Li, Lisheng Wu, Jun Wang, and Haitham Bou Ammar. Multi-view reinforcement learning.Advances in Neural Information Processing Systems, 32, 2019
work page 2019
-
[26]
Unsupervised learning of visual 3d keypoints for control
Boyuan Chen, Pieter Abbeel, and Deepak Pathak. Unsupervised learning of visual 3d keypoints for control. InInternational Conference on Machine Learning, pages 1539–1549. PMLR, 2021
work page 2021
-
[27]
Look closer: Bridging egocentric and third-person views with transformers for robotic manipulation
Rishabh Jangir, Nicklas Hansen, Sambaran Ghosal, Mohit Jain, and Xiaolong Wang. Look closer: Bridging egocentric and third-person views with transformers for robotic manipulation. IEEE Robotics and Automation Letters, 7(2):3046–3053, 2022
work page 2022
-
[28]
Information-theoretic state space model for multi-view reinforcement learning
HyeongJoo Hwang, Seokin Seo, Youngsoo Jang, Sungyoon Kim, Geon-Hyeong Kim, Se- unghoon Hong, and Kee-Eung Kim. Information-theoretic state space model for multi-view reinforcement learning. InProceedings of the 40th International Conference on Machine Learning, pages 14249–14282, 2023
work page 2023
-
[29]
Testing for the markov property in timeseries.Econometric Theory, 28(1):130–178, 2012
Bin Chen and Yongmiao Hong. Testing for the markov property in timeseries.Econometric Theory, 28(1):130–178, 2012
work page 2012
-
[30]
Yunzhe Zhou, Chengchun Shi, Lexin Li, and Qiwei Yao. Testing for the markov property in time series via deep conditional generative learning.Journal of the Royal Statistical Society Series B: Statistical Methodology, 85(4):1204–1222, 2023
work page 2023
-
[31]
Causal directed acyclic graph-informed reward design
Lutong Zou, Ziping Xu, Daiqi Gao, and Susan Murphy. Causal directed acyclic graph-informed reward design. InThe Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM), 2025
work page 2025
-
[32]
Robust reward modeling via causal rubrics
Pragya Srivastava, Harman Singh, Rahul Madhavan, Gandharv Patil, Sravanti Addepalli, Arun Suggala, Rengarajan Aravamudhan, Soumya Sharma, Anirban Laha, Aravindan Raghuveer, et al. Robust reward modeling via causal rubrics. InICML 2025 Workshop on Models of Human Feedback for AI Alignment, 2025
work page 2025
-
[33]
Mateo Juliani, Mingxuan Li, and Elias Bareinboim. Confounding robust continuous control via automatic reward shaping.Proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems, 2026. 11
work page 2026
-
[34]
Designing optimal dynamic treatment regimes: A causal reinforcement learning approach
Junzhe Zhang and Elias Bareinboim. Designing optimal dynamic treatment regimes: A causal reinforcement learning approach. InInternational Conference on Machine Learning. PMLR, 2020
work page 2020
-
[35]
Causal dynamics learning for task-independent state abstraction
Zizhao Wang, Xuesu Xiao, Zifan Xu, Yuke Zhu, and Peter Stone. Causal dynamics learning for task-independent state abstraction. InInternational Conference on Machine Learning, pages 23151–23180. PMLR, 2022
work page 2022
-
[36]
Invariant causal prediction for block mdps
Amy Zhang, Clare Lyle, Shagun Sodhani, Angelos Filos, Marta Kwiatkowska, Joelle Pineau, Yarin Gal, and Doina Precup. Invariant causal prediction for block mdps. InInternational Conference on Machine Learning, pages 11214–11224. PMLR, 2020
work page 2020
-
[37]
Building minimal and reusable causal state abstractions for reinforcement learning
Zizhao Wang, Caroline Wang, Xuesu Xiao, Yuke Zhu, and Peter Stone. Building minimal and reusable causal state abstractions for reinforcement learning. InProceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 15778–15786, 2024
work page 2024
-
[38]
Harnessing causality in reinforce- ment learning with bagged decision times
Daiqi Gao, Hsin-Yu Lai, Predrag Klasnja, and Susan Murphy. Harnessing causality in reinforce- ment learning with bagged decision times. InThe 28th International Conference on Artificial Intelligence and Statistics, 2025
work page 2025
-
[39]
State abstraction for programmable reinforcement learning agents
David Andre, Stuart J Russell, et al. State abstraction for programmable reinforcement learning agents. InAnnual AAAI Conference on Artificial Intelligence, 2002
work page 2002
-
[40]
State abstractions for lifelong reinforcement learning
David Abel, Dilip Arumugam, Lucas Lehnert, and Michael Littman. State abstractions for lifelong reinforcement learning. InInternational conference on machine learning, pages 10–19. PMLR, 2018
work page 2018
-
[41]
Bridging State and History Represen- tations: Understanding Self-Predictive RL, April 2024
Tianwei Ni, Benjamin Eysenbach, Erfan Seyedsalehi, Michel Ma, Clement Gehring, Aditya Mahajan, and Pierre-Luc Bacon. Bridging state and history representations: Understanding self-predictive rl.arXiv preprint arXiv:2401.08898, 2024
-
[42]
State representation learning for control: An overview.Neural Networks, 108:379–392, 2018
Timothée Lesort, Natalia Díaz-Rodríguez, Jean-Franois Goudou, and David Filliat. State representation learning for control: An overview.Neural Networks, 108:379–392, 2018
work page 2018
-
[43]
Kei Ota, Tomoaki Oiki, Devesh Jha, Toshisada Mariyama, and Daniel Nikovski. Can increasing input dimensionality improve deep reinforcement learning? InInternational conference on machine learning, pages 7424–7433. PMLR, 2020
work page 2020
-
[44]
Curiosity-driven exploration by self-supervised prediction
Deepak Pathak, Pulkit Agrawal, Alexei A Efros, and Trevor Darrell. Curiosity-driven exploration by self-supervised prediction. InInternational Conference on Machine Learning, pages 2778–
-
[45]
Image augmentation is all you need: Regu- larizing deep reinforcement learning from pixels
Denis Yarats, Ilya Kostrikov, and Rob Fergus. Image augmentation is all you need: Regu- larizing deep reinforcement learning from pixels. InInternational Conference on Learning Representations (ICLR), 2021
work page 2021
-
[46]
Time-contrastive networks: Self-supervised learning from video
Pierre Sermanet, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, Sergey Levine, and Google Brain. Time-contrastive networks: Self-supervised learning from video. In2018 IEEE international conference on robotics and automation (ICRA), pages 1134–1141. IEEE, 2018
work page 2018
-
[47]
Michael Laskin, Hao Liu, Xue Bin Peng, Denis Yarats, Aravind Rajeswaran, and Pieter Abbeel. Unsupervised reinforcement learning with contrastive intrinsic control.Advances in Neural Information Processing Systems, 35:34478–34491, 2022
work page 2022
-
[48]
Benjamin Eysenbach, Tianjun Zhang, Sergey Levine, and Russ R Salakhutdinov. Contrastive learning as goal-conditioned reinforcement learning.Advances in Neural Information Process- ing Systems, 35:35603–35620, 2022
work page 2022
-
[49]
Causality: Models, reasoning, and inference.Cambridge, UK: Cambridge University Press, 19(2):3, 2000
Judea Pearl. Causality: Models, reasoning, and inference.Cambridge, UK: Cambridge University Press, 19(2):3, 2000. 12
work page 2000
-
[50]
Elias Bareinboim, Juan D. Correa, Duligur Ibeling, and Thomas Icard.On Pearl’s Hierarchy and the Foundations of Causal Inference, page 507–556. Association for Computing Machinery, New York, NY , USA, 1 edition, 2022. ISBN 9781450395861. URLhttps://doi.org/10. 1145/3501714.3501743
-
[51]
Judea Pearl. Probabilities of Causation: Three Counterfactual Interpretations and Their Identifi- cation.Synthese, 121:93–149, 1999
work page 1999
-
[52]
Causal inference: A tale of three frameworks.arXiv preprint arXiv:2511.21516, 2025
Linbo Wang, Thomas Richardson, and James Robins. Causal inference: A tale of three frameworks.arXiv preprint arXiv:2511.21516, 2025
-
[53]
Jakob Runge. Causal network reconstruction from time series: From theoretical assumptions to practical estimation.Chaos: An Interdisciplinary Journal of Nonlinear Science, 28(7), 2018
work page 2018
-
[54]
Charles K Assaad, Emilie Devijver, and Eric Gaussier. Survey and evaluation of causal discovery methods for time series.Journal of Artificial Intelligence Research, 73:767–819, 2022
work page 2022
-
[55]
Uzma Hasan, Emam Hossain, and Md Osman Gani. A survey on causal discovery methods for iid and time series data.Transactions on Machine Learning Research, 2023
work page 2023
-
[56]
Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. Causal inference on time series using restricted structural equation models.Advances in neural information processing systems, 26, 2013
work page 2013
-
[57]
Near-optimal regret bounds for reinforcement learning
Peter Auer, Thomas Jaksch, and Ronald Ortner. Near-optimal regret bounds for reinforcement learning. InAdvances in Neural Information Processing Systems, volume 21. Curran Associates, Inc., 2008
work page 2008
-
[58]
An introduction to causal reinforcement learning.arXiv preprint arXiv:2101.06498, 2025
Elias Bareinboim, Sanghack Lee, and Junzhe Zhang. An introduction to causal reinforcement learning.arXiv preprint arXiv:2101.06498, 2025
-
[59]
Soft actor-critic: Off- policy maximum entropy deep reinforcement learning with a stochastic actor
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off- policy maximum entropy deep reinforcement learning with a stochastic actor. InInternational conference on machine learning, pages 1861–1870. Pmlr, 2018
work page 2018
-
[60]
Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew Hausknecht, and Michael Bowling. Revisiting the arcade learning environment: evaluation protocols and open problems for general agents (extended abstract). InProceedings of the 27th International Joint Conference on Artificial Intelligence, page 5573–5577, 2018. ISBN 9780999241127
work page 2018
-
[61]
Deep Reinforcement Learning with Double Q-learning
Hado van Hasselt, Arthur Guez, and David Silver. Deep reinforcement learning with double q-learning, 2015. URLhttps://arxiv.org/abs/1509.06461
work page Pith review arXiv 2015
-
[62]
Patrik Hoyer, Dominik Janzing, Joris M Mooij, Jonas Peters, and Bernhard Schölkopf. Nonlinear causal discovery with additive noise models.Advances in neural information processing systems, 21, 2008
work page 2008
-
[63]
Shohei Shimizu, Takanori Inazumi, Yasuhiro Sogawa, Aapo Hyvarinen, Yoshinobu Kawahara, Takashi Washio, Patrik O Hoyer, Kenneth Bollen, and Patrik Hoyer. Directlingam: A direct method for learning a linear non-gaussian structural equation model.Journal of Machine Learning Research-JMLR, 12(Apr):1225–1248, 2011
work page 2011
-
[64]
Jonas Peters, Joris M Mooij, Dominik Janzing, and Bernhard Schölkopf. Causal discovery with continuous additive noise models.Journal of Machine Learning Research, 15:2009–2053, 2014
work page 2009
-
[65]
On the identifiability of the post-nonlinear causal model
K Zhang and A Hyvärinen. On the identifiability of the post-nonlinear causal model. In25th Conference on Uncertainty in Artificial Intelligence (UAI 2009), pages 647–655. AUAI Press, 2009
work page 2009
-
[66]
Clive WJ Granger. Investigating causal relations by econometric models and cross-spectral methods.Econometrica: journal of the Econometric Society, pages 424–438, 1969
work page 1969
-
[67]
Measurement of linear dependence and feedback between multiple time series
John Geweke. Measurement of linear dependence and feedback between multiple time series. Journal of the American statistical association, 77(378):304–313, 1982. 13
work page 1982
-
[68]
Riˇcards Marcinkevi ˇcs and Julia E V ogt. Interpretable models for granger causality using self-explaining neural networks.arXiv preprint arXiv:2101.07600, 2021
-
[69]
Alex Tank, Ian Covert, Nicholas Foti, Ali Shojaie, and Emily B Fox. Neural granger causality. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(8):4267–4279, 2022
work page 2022
-
[70]
Amortized causal discovery: Learning to infer causal graphs from time-series data
Sindy Löwe, David Madras, Richard Zemel, and Max Welling. Amortized causal discovery: Learning to infer causal graphs from time-series data. InConference on Causal Learning and Reasoning, pages 509–525. PMLR, 2022
work page 2022
-
[71]
Causal discovery for non-stationary non-linear time series data using just-in-time modeling
Daigo Fujiwara, Kazuki Koyama, Keisuke Kiritoshi, Tomomi Okawachi, Tomonori Izumitani, and Shohei Shimizu. Causal discovery for non-stationary non-linear time series data using just-in-time modeling. InConference on Causal Learning and Reasoning, pages 880–894. PMLR, 2023
work page 2023
-
[72]
On causal discovery from time series data using fci.Proba- bilistic graphical models, 16, 2010
Doris Entner and Patrik O Hoyer. On causal discovery from time series data using fci.Proba- bilistic graphical models, 16, 2010
work page 2010
-
[73]
Jakob Runge, Peer Nowack, Marlene Kretschmer, Seth Flaxman, and Dino Sejdinovic. Detecting and quantifying causal associations in large nonlinear time series datasets.Science advances, 5 (11):eaau4996, 2019
work page 2019
-
[74]
Jakob Runge. Discovering contemporaneous and lagged causal relations in autocorrelated nonlinear time series datasets. InProceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), pages 1388–1397, 2020
work page 2020
-
[75]
Causal discovery for time series from multiple datasets with latent contexts
Wiebke Günther, Urmi Ninad, and Jakob Runge. Causal discovery for time series from multiple datasets with latent contexts. InUncertainty in Artificial Intelligence, pages 766–776. PMLR, 2023
work page 2023
-
[76]
Dynotears: Structure learning from time-series data
Roxana Pamfil, Nisara Sriwattanaworachai, Shaan Desai, Philip Pilgerstorfer, Konstantinos Georgatzis, Paul Beaumont, and Bryon Aragam. Dynotears: Structure learning from time-series data. InInternational conference on artificial intelligence and statistics, pages 1595–1605. PMLR, 2020
work page 2020
-
[77]
Neural graphical modelling in continuous-time: consistency guarantees and algorithms
Alexis Bellot, Kim Branson, and Mihaela van der Schaar. Neural graphical modelling in continuous-time: consistency guarantees and algorithms. InInternational Conference on Learning Representations, 2022
work page 2022
-
[78]
Nts-notears: Learning nonpara- metric dbns with prior knowledge
Xiangyu Sun, Oliver Schulte, Guiliang Liu, and Pascal Poupart. Nts-notears: Learning nonpara- metric dbns with prior knowledge. InInternational Conference on Artificial Intelligence and Statistics, pages 1942–1964. PMLR, 2023
work page 1942
-
[79]
Mingzhou Liu, Xinwei Sun, and Yizhou Wang. Conditional local independence testing for it\ˆ o processes with applications to dynamic causal discovery.arXiv preprint arXiv:2506.07844, 2025
-
[80]
Finnian Lattimore, Tor Lattimore, and Mark D Reid. Causal bandits: Learning good interven- tions via causal inference.Advances in neural information processing systems, 29, 2016
work page 2016
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.