Self-Evolving Cognitive Framework via Causal World Modeling for Embodied Scientific Intelligence

Tetsunari Inamura; Yi Yu

arXiv: 2606.22449 · v1 · pith:SLLH5XOHnew · submitted 2026-06-21 · 💻 cs.AI · cs.RO

Pith reviewed 2026-06-26 11:03 UTC · model grok-4.3

keywords: causal world modeling, embodied intelligence, self-evolving systems, causal discovery, intervention-driven reasoning, counterfactual reasoning, epistemic intelligence, continual cognitive refinement

The pith

Embodied agents evolve their cognition by continually revising causal world models through discovery, interventions, and counterfactual reasoning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that embodied intelligence should shift from optimizing predictive world models, which falter on distribution shifts and unseen cases, to self-evolving systems that build and update internal causal representations. It proposes a framework combining causal world modeling, intervention-driven causal reasoning, and continual cognitive refinement to achieve this. The central idea is that agents revise their causal models via causal discovery, feedback from interventions, and counterfactual thinking, allowing cognition itself to improve over time. Embodied interaction is reframed as an epistemic process for generating hypotheses, conducting experiments, and acquiring knowledge rather than merely optimizing trajectories. This supplies a conceptual basis for moving toward epistemic intelligence grounded in ongoing causal model construction and refinement.

Core claim

The proposed self-evolving cognitive framework integrates causal world modeling, intervention-driven causal reasoning, and continual cognitive refinement. The framework continuously revises and expands its internal causal world model through causal discovery, intervention-driven feedback, and counterfactual reasoning, supporting continual cognitive refinement and enabling cognition itself to evolve over time. Embodied interaction is reinterpreted as an epistemic process for causal hypothesis generation, intervention-driven experimentation, and continual knowledge acquisition, providing a foundation for a transition from predictive intelligence to epistemic intelligence.
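
The paper gives no algorithm for its counterfactual step, so the following is only a minimal sketch of the standard abduction, action, prediction recipe on a toy linear structural causal model. The model Y = coef * X + U_y, the coefficient value, and the function name counterfactual_y are assumptions made for illustration and do not come from the paper.

```python
def counterfactual_y(x_obs, y_obs, x_cf, coef=2.0):
    """Counterfactual Y in a hypothetical SCM  Y = coef * X + U_y."""
    # Abduction: recover the unobserved noise term from what was actually seen.
    u_y = y_obs - coef * x_obs
    # Action + prediction: set X to the counterfactual value, keep the same noise.
    return coef * x_cf + u_y


# "We saw X=1 and Y=3. What would Y have been had X been 2?"
y_cf = counterfactual_y(x_obs=1.0, y_obs=3.0, x_cf=2.0)
print(y_cf)
```

The point of the sketch is that the counterfactual answer reuses the noise inferred from the actual episode, which a purely predictive model of P(Y | X) does not do.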

What carries the argument

The self-evolving cognitive framework that integrates causal world modeling, intervention-driven causal reasoning, and continual cognitive refinement to revise internal causal representations.

If this is right

Embodied interaction functions as hypothesis generation and experimentation rather than trajectory optimization alone.
Systems achieve better generalization under distribution shifts through causal rather than predictive modeling.
An intervention-driven causal-epistemic benchmarking paradigm evaluates progress in self-evolving embodied scientific intelligence.
Cognition emerges and improves through repeated cycles of causal model construction, revision, and refinement via environment interaction.
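
The second point above, that causal modeling generalizes better under distribution shift than predictive modeling, can be illustrated with a toy confounded system. This is a hedged sketch, not the paper's method: the structural equations (a hidden Z driving both X and Y), the coefficients, and the helper names sample and slope are all assumptions chosen for illustration.

```python
import random


def sample(n, rng, do_x=None):
    """Draw (x, y) pairs from a hypothetical SCM: Z -> X, Z -> Y, X -> Y."""
    data = []
    for _ in range(n):
        z = rng.gauss(0, 1)              # hidden confounder
        if do_x is None:
            x = z + rng.gauss(0, 1)      # observational regime: X depends on Z
        else:
            x = do_x(rng)                # interventional regime: X set independently of Z
        y = 2.0 * x + 3.0 * z + rng.gauss(0, 1)
        data.append((x, y))
    return data


def slope(data):
    """Ordinary least-squares slope of y on x."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    cov = sum((x - mx) * (y - my) for x, y in data)
    var = sum((x - mx) ** 2 for x, _ in data)
    return cov / var


rng = random.Random(0)
obs_slope = slope(sample(20000, rng))                                 # fitted on passive data
do_slope = slope(sample(20000, rng, do_x=lambda r: r.gauss(0, 1)))    # fitted under do(X)
print(round(obs_slope, 2), round(do_slope, 2))
```

Under these assumed equations the observational slope converges to 3.5 while the interventional slope converges to the structural coefficient 2, so a predictor fitted on passive data mispredicts what happens once the agent itself sets X.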

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The framework could extend to active learning settings where agents select interventions to maximize causal information gain.
In robotics applications, this might enable adaptation to novel objects or tasks by updating causal beliefs rather than retraining predictors.
Simulated environments with controlled noise levels could test whether the refinement loop remains stable when observations are incomplete.
This view connects to questions in cognitive science about how humans form and revise causal theories through play and experimentation.
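
The first extension above, selecting interventions to maximize causal information gain, can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions rather than anything the paper specifies: two candidate graphs (A causes B, B causes A), a uniform prior, and hand-picked outcome probabilities. The names expected_info_gain and candidates are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy in bits."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def expected_info_gain(prior, likelihoods):
    """Prior entropy minus expected posterior entropy.

    likelihoods[h][o] = P(outcome o | hypothesis h) for one candidate action.
    """
    gain = entropy(prior)
    for o in range(len(likelihoods[0])):
        p_o = sum(prior[h] * likelihoods[h][o] for h in range(len(prior)))
        if p_o == 0:
            continue
        posterior = [prior[h] * likelihoods[h][o] / p_o for h in range(len(prior))]
        gain -= p_o * entropy(posterior)
    return gain


prior = [0.5, 0.5]  # [A causes B, B causes A]
candidates = {
    # Passive observation: both graphs imply the same joint distribution here,
    # so the outcome probabilities are identical under each hypothesis.
    "observe": [[0.5, 0.5], [0.5, 0.5]],
    # do(A=1): B responds only if A causes B (assumed P(B=1) of 0.9 vs 0.5).
    "do(A=1)": [[0.1, 0.9], [0.5, 0.5]],
}
gains = {name: expected_info_gain(prior, lik) for name, lik in candidates.items()}
best = max(gains, key=gains.get)
print(gains, best)
```

With these assumed numbers, passive observation yields zero expected gain because the two graphs are observationally equivalent, and only the intervention can separate them.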

Load-bearing premise

Causal discovery and intervention-driven feedback can be integrated into embodied systems to produce genuine continual refinement despite real-world noise and partial observability.

What would settle it

An experiment in which an embodied agent executes an intervention, receives noisy or partial feedback, and fails to revise its causal model to correctly predict subsequent outcomes.
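
One way such a test could be instrumented is sketched below, under loud assumptions: a two-hypothesis world, a sensor that flips its reading with a known probability, and a Bayesian belief update. None of this is specified in the paper; the names observe_b, likelihood, and run_trial and every numeric value are hypothetical.

```python
import random


def observe_b(rng, p_b=0.9, flip=0.2):
    """Assumed true world: do(A=1) yields B=1 w.p. p_b; the sensor flips it w.p. flip."""
    b = 1 if rng.random() < p_b else 0
    return 1 - b if rng.random() < flip else b


def likelihood(obs, p_b, flip):
    """P(noisy reading | hypothesis that predicts B=1 with probability p_b)."""
    p_one = p_b * (1 - flip) + (1 - p_b) * flip
    return p_one if obs == 1 else 1 - p_one


def run_trial(n_steps=500, flip=0.2, seed=0):
    rng = random.Random(seed)
    belief = [0.5, 0.5]        # [A causes B, A does not cause B]
    p_b_under = [0.9, 0.5]     # what each hypothesis predicts under do(A=1)
    for _ in range(n_steps):
        obs = observe_b(rng, flip=flip)
        weights = [belief[h] * likelihood(obs, p_b_under[h], flip) for h in range(2)]
        total = sum(weights)
        belief = [w / total for w in weights]   # Bayes update after each intervention
    return belief


belief = run_trial()
revised_correctly = belief[0] > 0.99
print(belief, revised_correctly)
```

In this toy setting the agent knows the noise model, which is the favorable case. The settling experiment described above would count as a failure if revised_correctly came out false, or if the belief stopped converging once the noise rate or observability assumptions were relaxed.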

Figures

Figures reproduced from arXiv: 2606.22449 by Tetsunari Inamura, Yi Yu.

Original abstract

Current embodied world models are primarily optimized for predictive objectives, limiting their ability to generalize under distribution shifts and reason systematically about unseen situations and hypothetical interventions. We argue that embodied intelligence should move beyond predictive world modeling toward self-evolving cognitive systems that continually construct and refine internal causal representations through interaction with the environment. To this end, we propose a self-evolving cognitive framework via causal world modeling for embodied scientific intelligence, which integrates three complementary components: causal world modeling, intervention-driven causal reasoning, and continual cognitive refinement. The proposed framework continuously revises and expands its internal causal world model through causal discovery, intervention-driven feedback, and counterfactual reasoning, supporting continual cognitive refinement and enabling cognition itself to evolve over time. Furthermore, we reinterpret embodied interaction not merely as a means of trajectory optimization, but as an epistemic process for causal hypothesis generation, intervention-driven experimentation, and continual knowledge acquisition. This work provides a conceptual and theoretical foundation for a transition from predictive intelligence toward epistemic intelligence, in which intelligence emerges through the continual construction, revision, and refinement of causal world models via interaction with the environment. Accordingly, an intervention-driven causal-epistemic benchmarking paradigm is suggested for evaluating self-evolving embodied scientific intelligence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This is a high-level conceptual proposal for causal self-evolving embodied AI with no algorithms, equations, or results to back the claims.

The paper's core idea is a framework that shifts embodied AI from predictive world models to ones that continually revise causal representations through discovery, interventions, and counterfactuals. It frames interaction itself as an epistemic process for knowledge building rather than just control.

What it does is synthesize causal AI concepts with embodied settings and suggest an intervention-based benchmark. The motivation section makes a reasonable case that pure prediction limits generalization under shifts, and the three-component breakdown (causal modeling, intervention reasoning, continual refinement) is presented cleanly.

The problems are straightforward and central. The entire argument stays at the level of naming components and stating that they integrate via causal discovery and feedback. There are no update rules, no treatment of real-world issues like partial observability or sensor noise, and no example of how the model would actually expand or revise itself over time. The self-evolution claim therefore rests on the assumption that these processes will work as described, without any demonstration or even pseudocode. That leaves the main contribution as a restatement of existing causal ideas rather than a workable advance.

This is aimed at researchers who like high-level architectural thinking in robotics and scientific AI. Someone looking for methods they could implement or test will not find them here. The thinking is coherent as a proposal, but the absence of technical substance means it does not reach the threshold for serious refereeing.

I would not send it to peer review. It needs concrete mechanisms or at least a worked example before it could be evaluated as research.

Referee Report

2 major / 2 minor

Summary. The paper claims that embodied world models are limited by predictive objectives and proposes a self-evolving cognitive framework for embodied scientific intelligence. This framework integrates causal world modeling, intervention-driven causal reasoning, and continual cognitive refinement to enable continuous revision and expansion of internal causal representations via causal discovery, intervention-driven feedback, and counterfactual reasoning. Embodied interaction is reinterpreted as an epistemic process for hypothesis generation and knowledge acquisition, with a suggested intervention-driven causal-epistemic benchmarking paradigm for evaluation.

Significance. If operationalized, the framework could shift embodied AI from predictive to epistemic intelligence, enabling systems that actively construct, test, and refine causal models through interaction. This has potential implications for scientific discovery in robotics and autonomous agents by grounding cognition in causal understanding rather than trajectory optimization.

major comments (2)

[Abstract] The central claim that the framework 'continuously revises and expands its internal causal world model through causal discovery, intervention-driven feedback, and counterfactual reasoning' is stated without any formal update rules, pseudocode, or integration mechanisms, which leaves the claim that 'cognition itself [can] evolve over time' an ungrounded assertion rather than a derivable property.
[The proposed framework (main text description of components)] No mechanisms are specified for integrating the three components under partial observability, sensor noise, or non-stationary dynamics, which are load-bearing for the feasibility of continual refinement in embodied settings.

minor comments (2)

The abstract and introduction would benefit from explicit comparison to related work in causal reinforcement learning and active learning to clarify novelty.
Terminology such as 'epistemic intelligence' and 'self-evolving' is used without precise definitions, which could be clarified for readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. The comments correctly identify that the manuscript is a high-level conceptual proposal rather than an implemented system with executable mechanisms. We address each point below and indicate planned revisions.

Referee: [Abstract] The central claim that the framework 'continuously revises and expands its internal causal world model through causal discovery, intervention-driven feedback, and counterfactual reasoning' is stated without any formal update rules, pseudocode, or integration mechanisms, which leaves the claim that 'cognition itself [can] evolve over time' an ungrounded assertion rather than a derivable property.

Authors: We agree that the manuscript is a theoretical framework paper and does not supply formal update rules or pseudocode. The central claim is presented as a high-level description of the intended epistemic process rather than a derivable algorithmic property. In the revision we will add an explicit limitations paragraph in the abstract and a new subsection in the framework description that outlines candidate formalization routes (e.g., iterative causal discovery operators and intervention feedback loops) drawn from existing causal inference literature, while clarifying that concrete pseudocode belongs to future empirical instantiations.

revision: partial

Referee: [The proposed framework (main text description of components)] No mechanisms are specified for integrating the three components under partial observability, sensor noise, or non-stationary dynamics, which are load-bearing for the feasibility of continual refinement in embodied settings.

Authors: The comment is accurate: the current text does not detail integration mechanisms under partial observability, sensor noise, or non-stationary dynamics. Because the manuscript is positioned as a conceptual foundation, these engineering considerations were left for subsequent work. We will revise the framework section to include a dedicated paragraph sketching how each component can be realized under those conditions (e.g., robust causal discovery under noisy observations and adaptive intervention scheduling for non-stationarity), referencing relevant literature on causal inference in uncertain environments.

revision: yes

Circularity Check

0 steps flagged

No circularity: purely conceptual proposal with no equations, derivations, or fitted predictions

The paper contains no equations, parameter fittings, or formal derivations. Its central claims consist of definitional statements about a proposed framework that integrates named components (causal world modeling, intervention-driven reasoning, continual refinement) via the same processes it describes. Because there are no load-bearing mathematical steps, no self-citation chains invoking uniqueness theorems, and no 'predictions' that reduce to fitted inputs, the text does not exhibit any of the enumerated circularity patterns. The framework is presented as a conceptual foundation rather than a derived result, so the check passes only because there is no derivation chain to inspect.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The ledger reflects the abstract-only nature of the review; the proposal introduces the framework as a new entity without independent evidence or derivations.

axioms (2)

domain assumption: Predictive objectives inherently limit generalization under distribution shifts in embodied settings.
Stated as the core argument against current embodied world models.
domain assumption: Causal representations can be continually constructed and refined through interaction.
Central premise of the proposed framework.

invented entities (1)

self-evolving cognitive framework via causal world modeling (no independent evidence)
purpose: To enable continual cognitive refinement and epistemic intelligence in embodied systems.
Introduced as the main contribution; no independent evidence or falsifiable predictions provided.

pith-pipeline@v0.9.1-grok · 5736 in / 1419 out tokens · 28381 ms · 2026-06-26T11:03:57.865370+00:00 · methodology

Reference graph

Works this paper leans on

24 extracted references · 17 canonical work pages · 7 internal anchors

[1] Konstantinos Bousmalis, Giulia Vezzani, et al. 2023. RoboCat: A Self-Improving Foundation Agent for Robotic Manipulation. arXiv preprint arXiv:2306.11706 (2023). doi:10.48550/arXiv.2306.11706
[2] Anthony Brohan et al. 2023. RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control. arXiv preprint arXiv:2307.15818 (2023). doi:10.48550/arXiv.2307.15818
[3] Anthony Brohan, Noah Brown, et al. 2022. RT-1: Robotics Transformer for Real-World Control at Scale. arXiv preprint arXiv:2212.06817 (2022).
[4] Lars Buesing, Theophane Weber, et al. 2019. Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search. arXiv preprint arXiv:1811.06272 (2019). doi:10.48550/arXiv.1811.06272
[5] Jingtao Ding, Yunke Zhang, et al. 2025. Understanding World or Predicting Future? A Comprehensive Survey of World Models. Comput. Surveys 58, 3 (2025). doi:10.1145/3746449
[6] Danny Driess, Fei Xia, et al. 2023. PaLM-E: An Embodied Multimodal Language Model. arXiv preprint arXiv:2303.03378 (2023). doi:10.48550/arXiv.2303.03378
[7] Richard E. Fikes and Nils J. Nilsson. 1971. STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving. Artificial Intelligence 2, 3–4 (1971), 189–208. doi:10.1016/0004-3702(71)90010-5
[8] David Ha and Jürgen Schmidhuber. 2018. World Models. arXiv preprint arXiv:1803.10122 (2018). doi:10.48550/arXiv.1803.10122
[9] Danijar Hafner et al. 2023. Mastering Diverse Control Tasks through World Models. arXiv preprint arXiv:2301.04104 (2023). doi:10.48550/arXiv.2301.04104
[10] Danijar Hafner, Timothy Lillicrap, Jimmy Ba, and Mohammad Norouzi. 2019. Learning Latent Dynamics for Planning from Pixels. Proceedings of the 36th International Conference on Machine Learning 97 (2019).
[11] Danijar Hafner, Timothy Lillicrap, Jimmy Ba, and Mohammad Norouzi. 2020. Dream to Control: Learning Behaviors by Latent Imagination. International Conference on Learning Representations (2020).
[12] Tom He, Jasmina Gajcin, and Ivana Dusparic. 2022. Causal Counterfactuals for Improving the Robustness of Reinforcement Learning. arXiv preprint arXiv:2211.05551 (2022). doi:10.48550/arXiv.2211.05551
[13] Ruofei Ju, Xinrui Wang, et al. 2026. EmbodiSkill: Skill-Aware Reflection for Self-Evolving Embodied Agents. arXiv preprint arXiv:2605.10332 (2026).
[14] Leslie Pack Kaelbling and Tomas Lozano-Perez. 2011. Hierarchical Task and Motion Planning in the Now. IEEE International Conference on Robotics and Automation (2011).
[15] Moo Jin Kim et al. 2024. OpenVLA: An Open-Source Vision-Language-Action Model. arXiv preprint arXiv:2406.09246 (2024).
[16] Matthias De Lange, Rahaf Aljundi, et al. 2021. A Continual Learning Survey: Defying Forgetting in Classification Tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 7 (2021), 3366–3385. doi:10.1109/TPAMI.2021.3057446
[17] German I. Parisi, Ronald Kemker, Jose L. Part, Christopher Kanan, and Stefan Wermter. 2019. Continual Lifelong Learning with Neural Networks: A Review. Neural Networks 113 (2019), 54–71. doi:10.1016/j.neunet.2019.01.012
[18] Judea Pearl. 2009. Causality: Models, Reasoning, and Inference (2nd ed.). Cambridge University Press. doi:10.1017/CBO9780511803161
[19] Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. 2017. Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press.
[20] Scott Reed et al. 2022. A Generalist Agent. arXiv preprint arXiv:2205.06175 (2022). doi:10.48550/arXiv.2205.06175
[21] Bernhard Schölkopf, Francesco Locatello, Stefan Bauer, et al. 2021. Toward Causal Representation Learning. Proc. IEEE 109, 5 (2021), 612–634. doi:10.1109/JPROC.2021.3058954
[22] Julian Schrittwieser, Ioannis Antonoglou, et al. 2020. Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model. Nature 588, 7839 (2020), 604–609. doi:10.1038/s41586-020-03051-4
[23] Zhongwei Yu, Jingqing Ruan, and Dengpeng Xing. 2023. Explainable Reinforcement Learning via a Causal World Model. arXiv preprint arXiv:2305.02749 (2023). doi:10.48550/arXiv.2305.02749
[24] Yan Zeng, Ruichu Cai, Fuchun Sun, Libo Huang, and Zhifeng Hao. 2025. A Survey on Causal Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems 36 (2025), 5942–5962. doi:10.1109/TNNLS.2024.3403001
