Topology-Aware State Abstraction with Tangle Cores for Markov Decision Processes

Anuj Sharma; Ibne Farabi Shihab; Sanjeda Akter

arxiv: 2606.00427 · v1 · pith:7D7DHRMAnew · submitted 2026-05-29 · 💻 cs.LG

Topology-Aware State Abstraction with Tangle Cores for Markov Decision Processes

Ibne Farabi Shihab , Sanjeda Akter , Anuj Sharma This is my paper

Pith reviewed 2026-06-28 22:49 UTC · model grok-4.3

classification 💻 cs.LG

keywords state abstractionMarkov decision processesgraph tanglesoverlapping abstractionsreinforcement learningbottleneck statesvalue preservationtopology aware abstraction

0 comments

The pith

Tangle cores yield value-preserving overlapping abstractions for MDPs

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces tangle-core abstraction as a way to handle shared interface states in navigation and graph-based decision problems that hard partitions miss. It builds abstract states from consistently oriented low-order separations identified by graph tangles and represents overlaps through a membership kernel. Value-preservation guarantees are given for the resulting abstract MDP under an action-consistency condition, along with an error decomposition separating interior homogeneity from boundary leakage. A quantitative result shows when hard partitions pay an avoidable boundary error. Experiments demonstrate better compression to return tradeoffs than several baselines in tabular domains, mazes, and MiniGrid, with a noted failure case when topology carries no information.

Core claim

Tangle-core abstraction constructs abstract states from consistently oriented low-order separations in the empirical transition graph and represents shared interfaces through a membership kernel rather than a hard partition, yielding value-preservation guarantees for the induced overlapping abstract MDP under an explicit action-consistency condition, an interior-homogeneity/boundary-leakage error decomposition, and a quantitative interface-overlap result on avoidable boundary error in hard partitions.

What carries the argument

Tangle-core abstraction, which identifies abstract states via graph tangles on the empirical transition graph and uses a membership kernel to encode overlapping participation in multiple regions.

If this is right

Value preservation holds in the abstract MDP whenever the action-consistency condition is satisfied.
Overlapping abstractions avoid boundary errors that hard partitions incur at interface states.
Favorable compression-return tradeoffs are achieved in bottlenecked tabular domains, procedurally generated mazes, and MiniGrid environments.
The method offers little benefit in regimes where transition topology is uninformative.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Applying tangle cores to continuous or high-dimensional state spaces would require approximating the empirical transition graph from samples.
Combining tangle-core abstractions with function approximation could extend the guarantees to deep reinforcement learning settings.
The interior-boundary error split suggests a diagnostic for choosing between topology-aware and reward-aware abstraction methods.
Similar tangle-based overlaps might apply to hierarchical task decomposition in multi-agent MDPs.

Load-bearing premise

The empirical transition graph must exhibit consistently oriented low-order separations that graph tangles can extract, and the action-consistency condition must hold for the abstract MDP to preserve value.

What would settle it

A counterexample MDP with interface states where the action-consistency condition fails, leading to a measurable difference in value between the original and the overlapping abstract MDP, or an empirical evaluation in a bottleneck domain showing no compression-return advantage over hard-partition baselines.

Figures

Figures reproduced from arXiv: 2606.00427 by Anuj Sharma, Ibne Farabi Shihab, Sanjeda Akter.

**Figure 2.** Figure 2: Schematic illustration, not an empirical output of the algorithm. Three densely connected [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

read the original abstract

State abstraction in reinforcement learning is usually formulated as a partition of states based on reward and transition similarity. This excludes a common structural pattern in navigation, graph, and hierarchical decision problems: interface states such as doors, hubs, and bottlenecks naturally participate in more than one region. We introduce \emph{tangle-core abstraction}, an overlapping state-abstraction framework based on graph tangles of empirical transition graphs. The method constructs abstract states from consistently oriented low-order separations and represents shared interfaces through a membership kernel rather than a hard partition. We give value-preservation guarantees for the induced overlapping abstract MDP under an explicit action-consistency condition, identify an interior-homogeneity/boundary-leakage error decomposition, and prove a quantitative interface-overlap result showing when hard partitions incur an avoidable boundary error. Empirically, tangle-core abstractions achieve favorable compression--return tradeoffs against reward-aware, learned, topological-map, and graph-partitioning baselines across bottlenecked tabular domains, procedurally generated mazes, and MiniGrid representations. We also identify a clear failure regime in which transition topology is uninformative, where tangles predictably offer little benefit. These results position graph tangles as an effective topology-aware abstraction prior for decision problems with shared interface structure.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Tangle cores give a way to build overlapping MDP abstractions from graph tangles on transition structure, with conditional value preservation and an explicit error split, but the gains are scoped to domains where interfaces dominate and the action-consistency condition holds.

read the letter

The paper's core move is to treat shared interface states (doors, bottlenecks) as overlap rather than forcing a hard partition, then use graph tangles on the empirical transition graph to define abstract states via a membership kernel. It states value-preservation for the resulting abstract MDP when an action-consistency condition is met, decomposes error into interior homogeneity versus boundary leakage, and gives a quantitative result on when hard partitions pay an avoidable cost. Empirically it reports better compression-return curves than reward-aware, learned, topological-map, and graph-partition baselines on bottlenecked tabular problems, procedural mazes, and MiniGrid.

What is new is the tangle-based construction for overlap and the explicit interface-overlap theorem; the failure regime where topology is uninformative is also called out plainly. The work is grounded in a concrete structural pattern that shows up in navigation and hierarchical settings.

The main limitation is that everything rests on the action-consistency condition actually holding in the target domain; when it does not, the guarantees do not apply. The abstract supplies no proof sketches, so the tightness of the decomposition or the overlap result cannot be checked from the given text. Experiments are limited to the domains where the structural assumption is plausible, and implementation details for extracting the tangles or building the kernel are not visible here.

This is for readers already working on state abstraction or hierarchical RL who need to handle interface states without losing value. It is not a universal method. A serious referee should see it because the framing is honest about its scope, the claims are stated conditionally rather than universally, and the empirical comparison is against relevant baselines. It is worth the review time even if the proofs need expansion.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces tangle-core abstraction, an overlapping state-abstraction method for MDPs that extracts abstract states from consistently oriented low-order separations in empirical transition graphs and represents shared interfaces (e.g., bottlenecks) via a membership kernel rather than hard partitions. It states value-preservation guarantees for the induced overlapping abstract MDP under an explicit action-consistency condition, identifies an interior-homogeneity/boundary-leakage error decomposition, proves a quantitative interface-overlap result on when hard partitions incur avoidable boundary error, and reports favorable compression-return tradeoffs versus reward-aware, learned, topological-map, and graph-partitioning baselines on bottlenecked tabular domains, procedurally generated mazes, and MiniGrid environments, while identifying a failure regime where transition topology is uninformative.

Significance. If the stated guarantees and decomposition are rigorously derived, the work supplies a topology-aware prior that directly addresses overlapping interface structure common in navigation and hierarchical MDPs, together with an explicit conditioning on action-consistency and a documented failure case. The empirical scope across multiple environment classes and the comparison to several baseline families would make the contribution practically relevant for abstraction design.

major comments (2)

[Abstract] Abstract (guarantees paragraph): the value-preservation claim, interior-homogeneity/boundary-leakage decomposition, and quantitative interface-overlap result are asserted without any proof sketch, assumption list, or derivation outline; because these are load-bearing for the central theoretical contribution, the manuscript must supply at least a high-level proof strategy and the precise statement of the action-consistency condition.
[Empirical evaluation] Empirical section (comparison tables): the reported compression-return tradeoffs are presented without quantitative verification that the action-consistency condition held in the evaluated domains; if the condition is violated, the observed gains cannot be attributed to the value-preservation guarantee, weakening the theory-practice link.

minor comments (2)

[Notation] Notation for the membership kernel should be introduced once and used uniformly; currently the abstract and later sections appear to employ slightly varying symbols.
[Discussion] The failure-regime paragraph would benefit from a short illustrative example (e.g., a fully connected graph) to make the boundary condition concrete for readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the presentation of the theoretical results and the empirical validation. We address the two major comments point by point below.

read point-by-point responses

Referee: [Abstract] Abstract (guarantees paragraph): the value-preservation claim, interior-homogeneity/boundary-leakage decomposition, and quantitative interface-overlap result are asserted without any proof sketch, assumption list, or derivation outline; because these are load-bearing for the central theoretical contribution, the manuscript must supply at least a high-level proof strategy and the precise statement of the action-consistency condition.

Authors: The manuscript already states the precise action-consistency condition (Definition 3.1) and contains the full proofs of the value-preservation guarantee (Theorem 4.1), the interior-homogeneity/boundary-leakage decomposition (Theorem 4.2), and the quantitative interface-overlap result (Theorem 4.3) in Section 4, together with a high-level proof strategy at the start of that section. To make the abstract self-contained as requested, we will revise the guarantees paragraph to include an explicit one-sentence statement of the action-consistency condition and a brief outline of the proof strategy for the main preservation result. revision: yes
Referee: [Empirical evaluation] Empirical section (comparison tables): the reported compression-return tradeoffs are presented without quantitative verification that the action-consistency condition held in the evaluated domains; if the condition is violated, the observed gains cannot be attributed to the value-preservation guarantee, weakening the theory-practice link.

Authors: We agree that explicit verification would tighten the theory-practice connection. In the bottlenecked tabular domains and procedurally generated mazes the condition holds by construction (deterministic transitions with uniform action semantics). For the MiniGrid environments we will add a post-hoc quantitative check in the revised version, reporting the percentage of abstract states satisfying action consistency; any violations will be discussed with respect to the observed returns. revision: partial

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper presents value-preservation guarantees for the overlapping abstract MDP explicitly conditioned on an action-consistency requirement, together with an interior-homogeneity/boundary-leakage error decomposition and a quantitative interface-overlap result. These are framed as theorems under stated external conditions rather than reductions of outputs to fitted inputs or self-referential definitions. The abstract identifies a clear failure regime where transition topology is uninformative and reports empirical comparisons against external baselines without asserting that the method is forced by construction. No self-citation chains, ansatz smuggling, or renaming of known results appear as load-bearing steps in the provided claims.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claims rest on the newly introduced tangle-core construction and the action-consistency condition; no free parameters are mentioned and the only explicit assumption is the action-consistency requirement needed for the value-preservation result.

axioms (1)

domain assumption action-consistency condition
Required for the value-preservation guarantees on the induced overlapping abstract MDP as stated in the abstract.

invented entities (1)

tangle cores no independent evidence
purpose: To construct abstract states from consistently oriented low-order separations in the empirical transition graph and to represent shared interfaces via a membership kernel
New postulated structure introduced to enable overlapping rather than hard-partition abstractions.

pith-pipeline@v0.9.1-grok · 5759 in / 1441 out tokens · 24184 ms · 2026-06-28T22:49:12.324541+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

21 extracted references · 4 canonical work pages · 2 internal anchors

[1]

A theory of state abstraction for reinforcement learning

David Abel. A theory of state abstraction for reinforcement learning. InProceedings of the AAAI conference on artificial intelligence, volume 33, pages 9876–9877, 2019

2019
[2]

Fast unfolding of communities in large networks.Journal of statistical mechanics: theory and experiment, 2008(10):P10008, 2008

Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of communities in large networks.Journal of statistical mechanics: theory and experiment, 2008(10):P10008, 2008. 9

2008
[3]

Scalable methods for computing state similarity in deterministic markov decision processes

Pablo Samuel Castro. Scalable methods for computing state similarity in deterministic markov decision processes. InProceedings of the AAAI Conference on Artificial Intelligence, vol- ume 34, pages 10069–10076, 2020

2020
[5]

Minigrid & miniworld: Mod- ular & customizable reinforcement learning environments for goal-oriented tasks.Advances in Neural Information Processing Systems, 36:73383–73394, 2023

Maxime Chevalier-Boisvert, Bolun Dai, Mark Towers, Rodrigo Perez-Vicente, Lucas Willems, Salem Lahlou, Suman Pal, Pablo Samuel Castro, and J K Terry. Minigrid & miniworld: Mod- ular & customizable reinforcement learning environments for goal-oriented tasks.Advances in Neural Information Processing Systems, 36:73383–73394, 2023

2023
[7]

URLhttps://arxiv.org/abs/2006.01830

work page arXiv 2006
[8]

Hierarchical reinforcement learning with the maxq value function de- composition.Journal of artificial intelligence research, 13:227–303, 2000

Thomas G Dietterich. Hierarchical reinforcement learning with the maxq value function de- composition.Journal of artificial intelligence research, 13:227–303, 2000

2000
[9]

Metrics for finite markov decision pro- cesses

Norm Ferns, Prakash Panangaden, and Doina Precup. Metrics for finite markov decision pro- cesses. InProceedings of the 20th Conference on Uncertainty in Artificial Intelligence, UAI ’04, page 162–169, Arlington, Virginia, USA, 2004. AUAI Press. ISBN 0974903906

2004
[10]

Deep- mdp: Learning continuous latent space models for representation learning

Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, and Marc G Bellemare. Deep- mdp: Learning continuous latent space models for representation learning. InInternational conference on machine learning, pages 2170–2179. PMLR, 2019

2019
[11]

Impact of Connectivity on Laplacian Representations in Reinforcement Learning

Tommaso Giorgi, Pierriccardo Olivieri, Keyue Jiang, Laura Toni, and Matteo Papini. Impact of connectivity on laplacian representations in reinforcement learning, 2026. URLhttps: //arxiv.org/abs/2603.08558

work page internal anchor Pith review Pith/arXiv arXiv 2026
[12]

Equivalence notions and model minimiza- tion in markov decision processes.Artificial intelligence, 147(1-2):163–223, 2003

Robert Givan, Thomas Dean, and Matthew Greig. Equivalence notions and model minimiza- tion in markov decision processes.Artificial intelligence, 147(1-2):163–223, 2003

2003
[14]

A fast and high quality multilevel scheme for partitioning irregular graphs.SIAM Journal on scientific Computing, 20(1):359–392, 1998

George Karypis and Vipin Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs.SIAM Journal on scientific Computing, 20(1):359–392, 1998

1998
[15]

Clustering with tangles: Algorithmic framework and theoretical guarantees.Journal of Machine Learning Research, 24(190):1–56, 2023

Solveig Klepper, Christian Elbracht, Diego Fioravanti, Jakob Kneip, Luca Rendsburg, Maxi- milian Teegen, and Ulrike V on Luxburg. Clustering with tangles: Algorithmic framework and theoretical guarantees.Journal of Machine Learning Research, 24(190):1–56, 2023

2023
[16]

Towards a unified theory of state abstrac- tion for mdps.AI&M, 1(2):3, 2006

Lihong Li, Thomas J Walsh, and Michael L Littman. Towards a unified theory of state abstrac- tion for mdps.AI&M, 1(2):3, 2006

2006
[17]

A laplacian framework for option discovery in reinforcement learning

Marlos C Machado, Marc G Bellemare, and Michael Bowling. A laplacian framework for option discovery in reinforcement learning. InInternational conference on machine learning, pages 2295–2304. PMLR, 2017

2017
[18]

Proto-value functions: A laplacian framework for learning representation and control in markov decision processes.Journal of Machine Learn- ing Research, 8(10), 2007

Sridhar Mahadevan and Mauro Maggioni. Proto-value functions: A laplacian framework for learning representation and control in markov decision processes.Journal of Machine Learn- ing Research, 8(10), 2007

2007
[19]

Graph minors

Neil Robertson and Paul D Seymour. Graph minors. x. obstructions to tree-decomposition. Journal of Combinatorial Theory, Series B, 52(2):153–190, 1991

1991
[20]

Semi-parametric Topological Memory for Navigation

Nikolay Savinov, Alexey Dosovitskiy, and Vladlen Koltun. Semi-parametric topological mem- ory for navigation.CoRR, abs/1803.00653, 2018. URLhttp://arxiv.org/abs/1803. 00653. 10

work page internal anchor Pith review Pith/arXiv arXiv 2018
[21]

Identifying useful subgoals in reinforce- ment learning by local graph partitioning

Özgür ¸ Sim¸ sek, Alicia P Wolfe, and Andrew G Barto. Identifying useful subgoals in reinforce- ment learning by local graph partitioning. InProceedings of the 22nd international conference on Machine learning, pages 816–823, 2005

2005
[22]

Between mdps and semi-mdps: A frame- work for temporal abstraction in reinforcement learning.Artificial intelligence, 112(1-2):181– 211, 1999

Richard S Sutton, Doina Precup, and Satinder Singh. Between mdps and semi-mdps: A frame- work for temporal abstraction in reinforcement learning.Artificial intelligence, 112(1-2):181– 211, 1999

1999
[23]

A tutorial on spectral clustering.Statistics and computing, 17(4):395– 416, 2007

Ulrike V on Luxburg. A tutorial on spectral clustering.Statistics and computing, 17(4):395– 416, 2007

2007
[24]

arXiv preprint arXiv:2006.10742 , year=

Amy Zhang, Rowan McAllister, Roberto Calandra, Yarin Gal, and Sergey Levine. Learn- ing invariant representations for reinforcement learning without reconstruction.CoRR, abs/2006.10742, 2020. URLhttps://arxiv.org/abs/2006.10742. A Related Work State abstraction in reinforcement learning has a long history as a way to trade representational com- plexity fo...

work page arXiv 2006

[1] [1]

A theory of state abstraction for reinforcement learning

David Abel. A theory of state abstraction for reinforcement learning. InProceedings of the AAAI conference on artificial intelligence, volume 33, pages 9876–9877, 2019

2019

[2] [2]

Fast unfolding of communities in large networks.Journal of statistical mechanics: theory and experiment, 2008(10):P10008, 2008

Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of communities in large networks.Journal of statistical mechanics: theory and experiment, 2008(10):P10008, 2008. 9

2008

[3] [3]

Scalable methods for computing state similarity in deterministic markov decision processes

Pablo Samuel Castro. Scalable methods for computing state similarity in deterministic markov decision processes. InProceedings of the AAAI Conference on Artificial Intelligence, vol- ume 34, pages 10069–10076, 2020

2020

[4] [5]

Minigrid & miniworld: Mod- ular & customizable reinforcement learning environments for goal-oriented tasks.Advances in Neural Information Processing Systems, 36:73383–73394, 2023

Maxime Chevalier-Boisvert, Bolun Dai, Mark Towers, Rodrigo Perez-Vicente, Lucas Willems, Salem Lahlou, Suman Pal, Pablo Samuel Castro, and J K Terry. Minigrid & miniworld: Mod- ular & customizable reinforcement learning environments for goal-oriented tasks.Advances in Neural Information Processing Systems, 36:73383–73394, 2023

2023

[5] [7]

URLhttps://arxiv.org/abs/2006.01830

work page arXiv 2006

[6] [8]

Hierarchical reinforcement learning with the maxq value function de- composition.Journal of artificial intelligence research, 13:227–303, 2000

Thomas G Dietterich. Hierarchical reinforcement learning with the maxq value function de- composition.Journal of artificial intelligence research, 13:227–303, 2000

2000

[7] [9]

Metrics for finite markov decision pro- cesses

Norm Ferns, Prakash Panangaden, and Doina Precup. Metrics for finite markov decision pro- cesses. InProceedings of the 20th Conference on Uncertainty in Artificial Intelligence, UAI ’04, page 162–169, Arlington, Virginia, USA, 2004. AUAI Press. ISBN 0974903906

2004

[8] [10]

Deep- mdp: Learning continuous latent space models for representation learning

Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, and Marc G Bellemare. Deep- mdp: Learning continuous latent space models for representation learning. InInternational conference on machine learning, pages 2170–2179. PMLR, 2019

2019

[9] [11]

Impact of Connectivity on Laplacian Representations in Reinforcement Learning

Tommaso Giorgi, Pierriccardo Olivieri, Keyue Jiang, Laura Toni, and Matteo Papini. Impact of connectivity on laplacian representations in reinforcement learning, 2026. URLhttps: //arxiv.org/abs/2603.08558

work page internal anchor Pith review Pith/arXiv arXiv 2026

[10] [12]

Equivalence notions and model minimiza- tion in markov decision processes.Artificial intelligence, 147(1-2):163–223, 2003

Robert Givan, Thomas Dean, and Matthew Greig. Equivalence notions and model minimiza- tion in markov decision processes.Artificial intelligence, 147(1-2):163–223, 2003

2003

[11] [14]

A fast and high quality multilevel scheme for partitioning irregular graphs.SIAM Journal on scientific Computing, 20(1):359–392, 1998

George Karypis and Vipin Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs.SIAM Journal on scientific Computing, 20(1):359–392, 1998

1998

[12] [15]

Clustering with tangles: Algorithmic framework and theoretical guarantees.Journal of Machine Learning Research, 24(190):1–56, 2023

Solveig Klepper, Christian Elbracht, Diego Fioravanti, Jakob Kneip, Luca Rendsburg, Maxi- milian Teegen, and Ulrike V on Luxburg. Clustering with tangles: Algorithmic framework and theoretical guarantees.Journal of Machine Learning Research, 24(190):1–56, 2023

2023

[13] [16]

Towards a unified theory of state abstrac- tion for mdps.AI&M, 1(2):3, 2006

Lihong Li, Thomas J Walsh, and Michael L Littman. Towards a unified theory of state abstrac- tion for mdps.AI&M, 1(2):3, 2006

2006

[14] [17]

A laplacian framework for option discovery in reinforcement learning

Marlos C Machado, Marc G Bellemare, and Michael Bowling. A laplacian framework for option discovery in reinforcement learning. InInternational conference on machine learning, pages 2295–2304. PMLR, 2017

2017

[15] [18]

Proto-value functions: A laplacian framework for learning representation and control in markov decision processes.Journal of Machine Learn- ing Research, 8(10), 2007

Sridhar Mahadevan and Mauro Maggioni. Proto-value functions: A laplacian framework for learning representation and control in markov decision processes.Journal of Machine Learn- ing Research, 8(10), 2007

2007

[16] [19]

Graph minors

Neil Robertson and Paul D Seymour. Graph minors. x. obstructions to tree-decomposition. Journal of Combinatorial Theory, Series B, 52(2):153–190, 1991

1991

[17] [20]

Semi-parametric Topological Memory for Navigation

Nikolay Savinov, Alexey Dosovitskiy, and Vladlen Koltun. Semi-parametric topological mem- ory for navigation.CoRR, abs/1803.00653, 2018. URLhttp://arxiv.org/abs/1803. 00653. 10

work page internal anchor Pith review Pith/arXiv arXiv 2018

[18] [21]

Identifying useful subgoals in reinforce- ment learning by local graph partitioning

Özgür ¸ Sim¸ sek, Alicia P Wolfe, and Andrew G Barto. Identifying useful subgoals in reinforce- ment learning by local graph partitioning. InProceedings of the 22nd international conference on Machine learning, pages 816–823, 2005

2005

[19] [22]

Between mdps and semi-mdps: A frame- work for temporal abstraction in reinforcement learning.Artificial intelligence, 112(1-2):181– 211, 1999

Richard S Sutton, Doina Precup, and Satinder Singh. Between mdps and semi-mdps: A frame- work for temporal abstraction in reinforcement learning.Artificial intelligence, 112(1-2):181– 211, 1999

1999

[20] [23]

A tutorial on spectral clustering.Statistics and computing, 17(4):395– 416, 2007

Ulrike V on Luxburg. A tutorial on spectral clustering.Statistics and computing, 17(4):395– 416, 2007

2007

[21] [24]

arXiv preprint arXiv:2006.10742 , year=

Amy Zhang, Rowan McAllister, Roberto Calandra, Yarin Gal, and Sergey Levine. Learn- ing invariant representations for reinforcement learning without reconstruction.CoRR, abs/2006.10742, 2020. URLhttps://arxiv.org/abs/2006.10742. A Related Work State abstraction in reinforcement learning has a long history as a way to trade representational com- plexity fo...

work page arXiv 2006