Compositional Behavioral Semantics for State Abstraction in Reinforcement Learning

Manuel Baltieri; Yivan Zhang; Ziyan Luo

arXiv: 2606.25357 · v1 · pith:TAIJAPL7 · submitted 2026-06-24 · cs.LG · cs.AI · math.CT


Yivan Zhang, Ziyan Luo, Manuel Baltieri

Pith reviewed 2026-06-25 21:13 UTC · model grok-4.3

classification: cs.LG, cs.AI, math.CT

keywords: state abstraction, reinforcement learning, behavioral semantics, compositional definitions, bisimulation, value functions, behavioral metrics, invariants


The pith

A compositional framework defines behavioral structures from local one-step dynamics so that they can be transferred safely under state abstraction in reinforcement learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish a general principle for determining which behavioral structures remain valid when states are abstracted in reinforcement learning. It does so by giving a framework that specifies these structures compositionally from local, one-step descriptions of system dynamics. A sympathetic reader would care because this makes it possible to move value functions, invariants, bisimulations, and metrics between concrete and abstract models while preserving their meaning. The same framework also yields quantitative metrics from logical descriptions that come with soundness guarantees.

Core claim

Our framework provides a compositional way to specify behavioral semantics based on local, one-step descriptions of system dynamics. Using this framework, we establish results showing how behavioral structures can be safely transferred between abstract and concrete systems. We further show how to construct quantitative metrics from logical behavioral semantics with soundness guarantees. Together, these results provide a principled foundation for reasoning about behaviors under state abstraction in reinforcement learning and offer reusable definition and proof principles for a broad class of behavioral structures in reinforcement learning.

What carries the argument

Compositional specification of behavioral semantics from local one-step descriptions of system dynamics.
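As an illustration of what a "local one-step description" can mean, here is a minimal Python sketch (our own, not the paper's construction) in which two familiar behavioral structures are obtained as fixed points of operators that only look one transition ahead. The first is the optimal value function, the unique fixed point of the Bellman operator. The second is the largest controlled invariant inside a safe set, the greatest fixed point of a one-step predicate transformer. The finite-MDP encoding (nested dicts of `(next_state, probability)` pairs) and all names are hypothetical.

```python
# Hypothetical finite-MDP encoding used for illustration only:
#   T[s][a] -> list of (next_state, probability), R[s][a] -> reward.

def bellman_step(v, T, R, gamma):
    """One-step operator: look one transition ahead and back up values."""
    return {
        s: max(R[s][a] + gamma * sum(p * v[t] for t, p in T[s][a]) for a in T[s])
        for s in T
    }

def value_function(T, R, gamma=0.9, tol=1e-10):
    """Iterate the one-step operator to its unique fixed point (gamma < 1)."""
    v = {s: 0.0 for s in T}
    while True:
        new_v = bellman_step(v, T, R, gamma)
        if max(abs(new_v[s] - v[s]) for s in T) < tol:
            return new_v
        v = new_v

def invariant_step(inv, T):
    """Keep s if some action leads, with positive probability, only into inv."""
    return frozenset(
        s for s in inv
        if any(all(t in inv for t, p in T[s][a] if p > 0) for a in T[s])
    )

def largest_invariant(T, safe):
    """Greatest fixed point below the safe set, by downward iteration."""
    inv = frozenset(s for s in T if s in safe)
    while True:
        new_inv = invariant_step(inv, T)
        if new_inv == inv:
            return inv
        inv = new_inv

T = {
    "s0": {"stay": [("s0", 1.0)], "go": [("s1", 1.0)]},
    "s1": {"stay": [("s1", 0.5), ("s2", 0.5)], "go": [("s0", 1.0)]},
    "s2": {"stay": [("s2", 1.0)], "go": [("s2", 1.0)]},
}
R = {s: {"stay": 0.0, "go": 0.0} for s in T}
R["s0"]["go"] = 1.0

V = value_function(T, R)                   # V["s0"] is about 1 / (1 - 0.9**2)
INV = largest_invariant(T, {"s0", "s1"})   # frozenset({"s0", "s1"})
```

Both operators are defined pointwise from `T` and `R` alone; the global object only appears as the fixed point, which is, as we read the summary, the pattern the framework generalizes.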

If this is right

Value functions and invariants transfer safely between abstract and concrete systems.
Bisimulation relations are preserved under state abstraction via the compositional definitions.
Quantitative metrics can be built from logical behavioral semantics while retaining soundness.
Reusable definition and proof principles apply across a broad class of behavioral structures.
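To make the first two bullets concrete for value functions, here is a minimal sketch under a standard assumption from the abstraction literature, which is not necessarily the paper's exact condition. The abstraction φ may merge only states that agree on one-step rewards and on next-state probabilities aggregated over abstract blocks (a bisimulation-style or model-irrelevance abstraction). Under that assumption, the abstract value function pulled back along φ coincides with the concrete one. The MDP, the block names, and the helper functions are hypothetical.

```python
from collections import defaultdict

def value_iteration(T, R, gamma=0.9, tol=1e-10):
    """Optimal values as the fixed point of the one-step Bellman operator."""
    v = {s: 0.0 for s in T}
    while True:
        new_v = {
            s: max(R[s][a] + gamma * sum(p * v[t] for t, p in T[s][a]) for a in T[s])
            for s in T
        }
        if max(abs(new_v[s] - v[s]) for s in T) < tol:
            return new_v
        v = new_v

def push_forward(dist, phi):
    """Image of a next-state distribution under the abstraction phi."""
    out = defaultdict(float)
    for t, p in dist:
        out[phi[t]] += p
    return dict(out)

def same_dist(d1, d2, tol=1e-12):
    return all(abs(d1.get(k, 0.0) - d2.get(k, 0.0)) <= tol for k in set(d1) | set(d2))

def quotient_mdp(T, R, phi):
    """Abstract MDP induced by phi (all states assumed to share one action set).
    Raises ValueError if two states in a block disagree on the one-step data."""
    T_bar, R_bar = {}, {}
    for s in T:
        b = phi[s]
        step = {a: push_forward(T[s][a], phi) for a in T[s]}
        if b not in T_bar:
            T_bar[b], R_bar[b] = step, dict(R[s])
        elif any(abs(R[s][a] - R_bar[b][a]) > 1e-12 or not same_dist(step[a], T_bar[b][a])
                 for a in T[s]):
            raise ValueError(f"{s!r} violates the one-step condition for block {b!r}")
    # back to the list-of-pairs format expected by value_iteration
    T_list = {b: {a: list(d.items()) for a, d in acts.items()} for b, acts in T_bar.items()}
    return T_list, R_bar

T = {
    "a1": {"left": [("g", 1.0)], "right": [("z1", 1.0)]},
    "a2": {"left": [("g", 1.0)], "right": [("z1", 0.5), ("z2", 0.5)]},
    "g": {"left": [("g", 1.0)], "right": [("g", 1.0)]},
    "z1": {"left": [("z1", 1.0)], "right": [("z2", 1.0)]},
    "z2": {"left": [("z2", 1.0)], "right": [("z1", 1.0)]},
}
R = {
    "a1": {"left": 0.0, "right": 0.5}, "a2": {"left": 0.0, "right": 0.5},
    "g": {"left": 1.0, "right": 1.0},
    "z1": {"left": 0.0, "right": 0.0}, "z2": {"left": 0.0, "right": 0.0},
}
phi = {"a1": "A", "a2": "A", "g": "G", "z1": "Z", "z2": "Z"}

V = value_iteration(T, R)
T_bar, R_bar = quotient_mdp(T, R, phi)
V_bar = value_iteration(T_bar, R_bar)
# The abstract value function, pulled back along phi, matches the concrete one.
assert all(abs(V[s] - V_bar[phi[s]]) < 1e-6 for s in T)
```

Note that `a1` and `a2` have different concrete successor distributions; only their block-level images agree, which is all the one-step condition asks for.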

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Designers of state abstractions for large MDPs may need fewer global checks when using these local definitions.
The same local-to-global transfer could support safety verification in abstracted RL policies.
The approach might extend naturally to settings with continuous states or partial observability by keeping the one-step locality.

Load-bearing premise

Behavioral structures admit compositional definitions from purely local one-step dynamics that remain invariant or transferable under arbitrary state abstractions without extra global constraints.

What would settle it

An MDP and state abstraction pair where a standard behavioral structure such as a bisimulation or value function fails to transfer under the compositional local-dynamics definition.
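A candidate counterexample of this kind can be checked mechanically. The sketch below (hypothetical names, finite MDPs only) tests the usual one-step bisimulation condition for a given abstraction φ. It also computes the coarsest partition that satisfies the condition, using partition refinement. In the example, φ merges states `x` and `y` because their immediate rewards agree, but their successors fall in different blocks. Their values differ (γ/(1 − γ) versus 0), so value transfer fails. This only illustrates what a failed transfer looks like when the one-step condition is violated. It is not by itself a refutation of the paper's results, whose exact hypotheses are not visible here.

```python
from collections import defaultdict

def push_forward(dist, phi):
    """Image of a next-state distribution under the abstraction phi."""
    out = defaultdict(float)
    for t, p in dist:
        out[phi[t]] += p
    return dict(out)

def signature(s, T, R, phi, digits=9):
    """One-step behaviour of s seen through phi: per-action reward and
    block-level next-state distribution, rounded so that it can be hashed."""
    return tuple(sorted(
        (a, round(R[s][a], digits),
         tuple(sorted((b, round(p, digits)) for b, p in push_forward(T[s][a], phi).items())))
        for a in T[s]
    ))

def violations(T, R, phi):
    """Pairs of states merged by phi whose one-step signatures differ."""
    first, bad = {}, []
    for s in T:
        sig, b = signature(s, T, R, phi), phi[s]
        if b in first and first[b][1] != sig:
            bad.append((first[b][0], s))
        first.setdefault(b, (s, sig))
    return bad

def coarsest_bisimulation(T, R):
    """Start from a single block and split by signature until the partition is stable."""
    phi = {s: 0 for s in T}
    while True:
        sigs = {s: (phi[s], signature(s, T, R, phi)) for s in T}
        ids = {sig: i for i, sig in enumerate(sorted(set(sigs.values()), key=repr))}
        new_phi = {s: ids[sigs[s]] for s in T}
        # new_phi refines phi, so an unchanged block count means a fixed point
        if len(set(new_phi.values())) == len(set(phi.values())):
            return new_phi
        phi = new_phi

T = {
    "x": {"go": [("good", 1.0)]},
    "y": {"go": [("bad", 1.0)]},
    "good": {"go": [("good", 1.0)]},
    "bad": {"go": [("bad", 1.0)]},
}
R = {"x": {"go": 0.0}, "y": {"go": 0.0}, "good": {"go": 1.0}, "bad": {"go": 0.0}}
phi = {"x": "XY", "y": "XY", "good": "G", "bad": "B"}

print(violations(T, R, phi))   # [('x', 'y')]
blocks = coarsest_bisimulation(T, R)
```

In the coarsest bisimulation, `y` ends up in the same block as `bad` (both earn 0 forever), while `x` is separated from both.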

Figures

Figures reproduced from arXiv: 2606.25357 by Manuel Baltieri, Yivan Zhang, Ziyan Luo.

**Figure 1.** Policy-dependent transition obtained by closing the loop between the observation and action spaces via a policy π : O → A. Here t : S×A → PS is the stochastic transition map of the environment, o : S → O is the observation map, and dots represent copying.
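The closed loop in Figure 1 can be read as an ordinary composite of maps. Copy s, observe one copy with o, choose an action with π, then feed s and that action to t. The result is a kernel S → PS with s ↦ t(s, π(o(s))). Below is a minimal Python sketch, assuming finite distributions represented as dicts from outcomes to probabilities and a deterministic policy as in the caption. The toy environment is hypothetical. Kleisli composition of such kernels gives multi-step dynamics.

```python
# Finite distributions are dicts {outcome: probability}; a kernel X -> PY is a
# function returning such a dict. The toy environment below is hypothetical.

def close_loop(t, o, pi):
    """Kernel S -> PS, s |-> t(s, pi(o(s))): s is copied and one copy is observed."""
    return lambda s: t(s, pi(o(s)))

def kleisli(k1, k2):
    """Run kernel k1, then kernel k2 from wherever k1 landed."""
    def k(s):
        out = {}
        for s1, p1 in k1(s).items():
            for s2, p2 in k2(s1).items():
                out[s2] = out.get(s2, 0.0) + p1 * p2
        return out
    return k

def t(s, a):
    """Positions 0..4; a move succeeds with probability 0.9, else the agent stays."""
    target = min(max(s + (1 if a == "right" else -1), 0), 4)
    return {s: 1.0} if target == s else {target: 0.9, s: 0.1}

def o(s):
    return s >= 2          # the agent only observes which half it is in

def pi(obs):
    return "left" if obs else "right"

step = close_loop(t, o, pi)
print(step(0))             # {1: 0.9, 0: 0.1}
two_steps = kleisli(step, step)
```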

**Figure 2.** Three ways to lift a relation/distance A × A → V to probability distributions PA × PA → V.
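The extracted caption does not say which three liftings the figure shows. As a hedged illustration, here are two standard ways to lift a distance d : A × A → ℝ to finite distributions, which may or may not coincide with the paper's. The first is the Kantorovich (optimal-coupling) lifting. The second is the expected distance under the independent (product) coupling. For points on the real line with d(x, y) = |x − y|, the Kantorovich lifting is the area between the two CDFs; for the discrete metric, it reduces to total variation. The function names are ours.

```python
def kantorovich_line(mu, nu):
    """Kantorovich lifting of d(x, y) = |x - y| for finite distributions on the
    real line, computed as the area between the two CDFs."""
    points = sorted(set(mu) | set(nu))
    total = cdf_mu = cdf_nu = 0.0
    for x, x_next in zip(points, points[1:]):
        cdf_mu += mu.get(x, 0.0)
        cdf_nu += nu.get(x, 0.0)
        total += abs(cdf_mu - cdf_nu) * (x_next - x)
    return total

def total_variation(mu, nu):
    """Kantorovich lifting of the discrete metric (distance 1 between distinct points)."""
    support = set(mu) | set(nu)
    return 1.0 - sum(min(mu.get(x, 0.0), nu.get(x, 0.0)) for x in support)

def product_lifting(mu, nu, d):
    """Expected distance when mu and nu are sampled independently."""
    return sum(p * q * d(x, y) for x, p in mu.items() for y, q in nu.items())

def abs_dist(x, y):
    return abs(x - y)

coin = {0: 0.5, 1: 0.5}
dirac0, dirac2 = {0: 1.0}, {2: 1.0}
print(kantorovich_line(coin, coin), product_lifting(coin, coin, abs_dist))  # 0.0 0.5
print(kantorovich_line(dirac0, dirac2), total_variation(dirac0, dirac2))    # 2.0 1.0
```

The product-coupling lifting assigns 0.5 to a fair coin on {0, 1} compared with itself. Unlike the Kantorovich lifting, it therefore does not send a pseudometric to a pseudometric.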

Original abstract

State abstraction plays a key role in scaling reinforcement learning to complex but structured systems. In studying such systems, a wide range of behavioral structures have been studied in reinforcement learning, including value functions, invariants, bisimulation relations, and behavioral metrics. However, a general principle for determining what structures are provably preserved under state abstraction is still lacking. In this paper, we present a unified framework for defining and analyzing behavioral structures in reinforcement learning. Our framework provides a compositional way to specify behavioral semantics based on local, one-step descriptions of system dynamics. Using this framework, we establish results showing how behavioral structures can be safely transferred between abstract and concrete systems. We further show how to construct quantitative metrics from logical behavioral semantics with soundness guarantees. Together, these results provide a principled foundation for reasoning about behaviors under state abstraction in reinforcement learning and offer reusable definition and proof principles for a broad class of behavioral structures in reinforcement learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper sketches a compositional framework for defining behavioral structures like bisimulations and metrics from local one-step dynamics in RL, with claims on safe transfer under abstraction, but the abstract alone gives no derivations to check the claims.


The one thing to know is that this paper tries to give a general compositional principle for behavioral structures in RL that survive state abstraction, using a framework built from local one-step system descriptions.

Its novelty lies in attempting to unify value functions, invariants, bisimulations, and behavioral metrics under one compositional semantics, and in its claimed results on safe transfer and on deriving metrics from logical semantics with soundness guarantees.

The paper clearly states the problem (the lack of a general principle for what is preserved under abstraction) and outlines how the framework could provide reusable definition and proof principles for many structures.

The soft spot is the absence of any proof sketch or example in the abstract, which makes it difficult to verify whether the local one-step approach really allows arbitrary abstractions without extra constraints. The weakest assumption appears to be that these structures can be defined compositionally from local dynamics alone and remain transferable, whereas many similar results in the literature require some form of global consistency or specific properties of the abstraction mapping.

This paper is for people in the RL community working on state abstraction and representation learning. A reader who wants a more principled way to think about what behavioral properties are preserved would get value from it, assuming the full paper delivers on the claims.

It deserves a serious referee because the claims are specific and the area is active.

I would recommend sending it to peer review so the technical details can be checked.

Referee Report

2 major / 0 minor

Summary. The paper presents a unified framework for defining and analyzing behavioral structures in reinforcement learning (value functions, invariants, bisimulations, behavioral metrics) via compositional specifications based on local one-step system dynamics. It claims to establish results on safe transfer of these structures between abstract and concrete systems under state abstraction, and to construct quantitative metrics from logical behavioral semantics with soundness guarantees, providing reusable definition and proof principles.

Significance. If the central claims hold, the framework would supply a principled, compositional foundation for state abstraction in RL and unify a range of behavioral structures under local dynamics. No machine-checked proofs, reproducible code, or falsifiable predictions are mentioned, so these strengths cannot be credited.

major comments (2)

The abstract asserts results on safe transfer of behavioral structures and soundness guarantees for metrics, yet the manuscript supplies no derivations, theorems, proof sketches, or examples. Without these, the central claim that local one-step compositional definitions remain invariant or transferable under arbitrary abstractions cannot be evaluated.
The weakest assumption (behavioral structures admit compositional definitions from purely local dynamics that transfer without additional global MDP constraints) is stated but receives no supporting construction or counter-example analysis in any visible section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed review and for highlighting issues of clarity and evidential support in the manuscript. We address each major comment below and will revise the paper to strengthen the presentation of our results.


Referee: The abstract asserts results on safe transfer of behavioral structures and soundness guarantees for metrics, yet the manuscript supplies no derivations, theorems, proof sketches, or examples. Without these, the central claim that local one-step compositional definitions remain invariant or transferable under arbitrary abstractions cannot be evaluated.

Authors: We agree that the current manuscript does not contain explicit theorem statements, derivations, proof sketches, or concrete examples in the sections provided. This makes the transfer claims hard to evaluate. In the revised version we will add a new section presenting the main theorems on safe transfer of behavioral structures (including value functions, invariants, bisimulations, and metrics), together with proof sketches that rely on the compositional one-step definitions and small illustrative examples demonstrating invariance under state abstraction. (Revision: yes.)
Referee: The weakest assumption (behavioral structures admit compositional definitions from purely local dynamics that transfer without additional global MDP constraints) is stated but receives no supporting construction or counter-example analysis in any visible section.

Authors: The assumption is introduced via the category-theoretic framework that defines behavioral semantics from local dynamics alone. We acknowledge, however, that the manuscript lacks both an explicit construction showing transfer without global constraints and any counter-example analysis. The revision will include a dedicated subsection that supplies the construction (via functorial composition of local specifications) and a counter-example illustrating when global MDP constraints become necessary if the local-compositionality assumption is dropped. (Revision: yes.)

Circularity Check

0 steps flagged

No significant circularity

full rationale

The abstract and described claims introduce a compositional framework defined from local one-step dynamics, with transfer results and metric constructions presented as following from that framework. No equations, self-citations, or fitted quantities are supplied that would reduce any central claim to its inputs by construction. The derivation chain therefore remains independent of the target results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no concrete free parameters, axioms, or invented entities can be extracted.

pith-pipeline@v0.9.1-grok · 5686 in / 1086 out tokens · 23807 ms · 2026-06-25T21:13:02.202339+00:00 · methodology


Definitions extracted from Appendix B

Define a category of bundles. A bundle over a C-object V is simply a C-object A equipped with a C-morphism h_A : A → V. An n-ary bundle A^n → V is a generalization where the domain is the n-fold product A^n.

Definition B.2 (Bundle). An n-ary bundle over V is a pair (A, h_A : A^n → V) of a C-object A and a C-morphism h_A : A^n → V from the product A^n to V. A lax bundle morphism f:...

Define a forgetful functor.

Definition B.3. A forgetful functor U : C^n_V → C is given by

U : C^n_V → C, (A, h_A : A^n → V) ↦ A, (f : (A, h_A) → (B, h_B)) ↦ (f : A → B). (40)

Define a functor lifting.

Definition B.4 (Bundle lifting). The lifting C^n_V(F) : C^n_V → C^n_V of an endofunctor F : C → C along U : C^n_V → C must have the form

C^n_V(F) : C^n_V → C^n_V, (A, h_A : A^n → V) ↦ (FA, λ_A(h_A) : (FA)^n → V), (f : (A, h_A) → (B, h_B)) ↦ (Ff : (FA, λ_A(h_A)) → (FB, λ_B(h_B))), (41)

where λ_A is a family of functions indexed by C-objects A, mapping each C-morp...
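Definition B.4 leaves λ abstract, and its required properties are cut off in the extraction. Below is a minimal sketch of one familiar instance, under assumptions that are ours rather than the paper's: n = 1, C the category of finite sets, F = P the finite distribution functor, V the real numbers, and λ_A(h) the expectation of h. With a value function as h, λ_A(h) applied to t(s, a) is the expected next-state value used in a one-step Bellman backup. The expectation lifting satisfies the change-of-variables identity λ_A(h ∘ f) = λ_B(h) ∘ Pf. So whenever h_A = h_B ∘ f (a strict bundle morphism), Pf is again a strict morphism between the lifted bundles. The code checks this on a hypothetical example.

```python
# Assumed instance of Definition B.4 with n = 1: C = finite sets, F = P (finite
# distributions), V = real numbers, lambda_A(h) = expectation. Names hypothetical.

def P_map(f, mu):
    """Action of the distribution functor on a map f: A -> B (push-forward)."""
    out = {}
    for a, p in mu.items():
        out[f(a)] = out.get(f(a), 0.0) + p
    return out

def lift(h):
    """lambda_A(h): PA -> V, sending a distribution to the expected value of h."""
    return lambda mu: sum(p * h(a) for a, p in mu.items())

def parity(a):
    return "even" if a % 2 == 0 else "odd"

def h_B(b):
    return {"even": 1.0, "odd": -1.0}[b]

def h_A(a):
    return h_B(parity(a))   # parity is a strict bundle morphism (A, h_A) -> (B, h_B)

mu = {0: 0.2, 1: 0.3, 4: 0.5}
lhs = lift(h_A)(mu)                   # lambda_A(h_A)(mu)
rhs = lift(h_B)(P_map(parity, mu))    # lambda_B(h_B)(P parity (mu))
```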
