arxiv: 2509.01728 · v4 · submitted 2025-09-01 · 💻 cs.RO · cs.LG· cs.LO

Constrained Decoding for Safe Robot Navigation Foundation Models

Parv Kapoor , Akila Ganlath , Michael Clifford , Changliu Liu , Sebastian Scherer , Eunsuk Kang This is my paper

Pith reviewed 2026-05-18 19:03 UTC · model grok-4.3

classification 💻 cs.RO cs.LGcs.LO

keywords constrained decodingrobot foundation modelssignal temporal logicsafe navigationembodied AItransformer policiesSTL specifications

0 comments

The pith

SafeDec uses constrained decoding to make generated actions from robot navigation foundation models provably satisfy STL safety specifications without retraining.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents SafeDec, a framework that adds safety checks to how transformer-based robot models generate action sequences. The models are powerful but can produce unsafe behaviors because they lack explicit correctness guarantees. SafeDec enforces Signal Temporal Logic rules during the decoding process itself, using assumed robot dynamics to verify that actions will meet safety specs. This allows safe operation at runtime while working with any existing policy and without needing to retrain the model. Readers should care as it bridges data-driven AI with formal safety assurances for physical robots in navigation tasks.

Core claim

The central discovery is a constrained decoding approach for autoregressive robot foundation models that enforces STL formulas so that the output action sequences provably satisfy the safety specifications under the assumed dynamics, all at inference time and independent of the specific policy model used.

What carries the argument

SafeDec constrained decoding framework, which integrates STL verification into the autoregressive generation process using assumed system dynamics to prune or guide unsafe action sequences.

If this is right

Action sequences generated by the model will satisfy the given STL specifications by design.
The method applies to any underlying navigation policy since it is policy-agnostic.
Safety is ensured at runtime without any need for model retraining or fine-tuning.
Decoding interventions can both filter out unsafe actions and generate conditional safe actions for tasks in the CHORES benchmark.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the assumed dynamics match reality closely, this could allow reliable safe navigation in diverse environments without custom safety training.
Extensions might include handling uncertainty in dynamics or applying similar constraints to other robot tasks like manipulation.
The approach highlights a way to combine formal methods with large foundation models for embodied systems.

Load-bearing premise

The verification step during decoding depends on having an accurate model of the robot's dynamics to check STL satisfaction.

What would settle it

Executing the generated actions on a physical robot and finding that an STL specification is violated under the dynamics model used during decoding would show the claim does not hold.

Figures

Figures reproduced from arXiv: 2509.01728 by Akila Ganlath, Changliu Liu, Eunsuk Kang, Michael Clifford, Parv Kapoor, Sebastian Scherer.

**Figure 2.** Figure 2: Qualitative comparison of decoded trajectories for a sample scene. Each plot shows a top [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

**Figure 3.** Figure 3: Ablation studies. (a) STL satisfaction (%) for HCD and RCD under baseline vs noisy dynamics across base models. (b) Effect of β on success rate and safety satisfaction. robustness and base logits. To investigate the impact of β on the success rate and STL satisfaction, we performed an ablation with varying values for β for Flare and PoliFormer. We observe that as β increases for PoliFormer, both STL satisf… view at source ↗

**Figure 4.** Figure 4: Top-down views across indoor environments with [PITH_FULL_IMAGE:figures/full_fig_p014_4.png] view at source ↗

read the original abstract

Recent advances in the development of robotic foundation models have led to promising end-to-end and general-purpose capabilities in robotic systems. Trained on vast datasets of simulated and real-world trajectories, these policies map multimodal observations directly to action sequences for physical execution. Despite promising real-world capabilities, these models are still data-driven and, therefore, lack explicit notions of behavioral correctness. We address this gap by introducing SafeDec, a constrained decoding framework for autoregressive, transformer-based robot navigation foundation models that enforces safety specifications expressed as Signal Temporal Logic (STL) formulas. Our method ensures that generated actions provably satisfy STL specifications under assumed dynamics at runtime without retraining while remaining agnostic of the underlying policy. We evaluate SafeDec on tasks from the CHORES benchmark for state-of-the-art embodied navigation policies across hundreds of procedurally generated environments and show that our decoding-time interventions are useful not only for filtering unsafe actions but also for conditional action generation. Videos are available at constrained-robot-fms.github.io

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SafeDec wraps constrained decoding around STL specs to enforce safety on robot foundation model outputs at runtime without retraining, but the guarantees stay tied to how well the assumed dynamics match reality.

read the letter

The main takeaway is that this paper gives a practical way to layer formal safety checks onto existing autoregressive robot navigation models by constraining what the decoder can output. SafeDec uses STL formulas and forward simulation under assumed dynamics to reject unsafe action sequences on the fly. That approach is new in this specific setting for foundation-model policies, and it keeps the original model untouched, which is a real advantage for deployment on top of pretrained systems. The evaluation on the CHORES benchmark across hundreds of procedurally generated environments shows the method can both filter bad actions and support conditional generation, which is a useful demonstration even if the numbers are not detailed in the abstract. The work is policy-agnostic, so it could apply to other transformer-based navigation policies without extra training. The central limitation is that the provable STL satisfaction only holds inside the assumed dynamics used for simulation during decoding. Any gap between those assumptions and actual robot kinematics, sensor noise, or unmodeled effects would break the guarantee, and the paper does not appear to include robustness checks or sensitivity analysis for that mismatch. The abstract also skips quantitative metrics and error breakdowns, which makes it harder to judge the performance cost of the constraints. This paper is aimed at researchers working on safe learned robotics and runtime verification for foundation models. A reader already thinking about STL or constrained decoding in robotics would find the concrete integration worth looking at. I would send it to peer review because the core method addresses a clear gap and the evaluation setup is large enough to be informative, even though revisions for more robustness analysis and concrete results would help.

Referee Report

2 major / 2 minor

Summary. The paper introduces SafeDec, a constrained decoding framework for autoregressive, transformer-based robot navigation foundation models. It enforces safety specifications expressed as Signal Temporal Logic (STL) formulas during decoding to ensure that generated actions provably satisfy the STL specifications under assumed dynamics at runtime, without retraining the model and while remaining agnostic to the underlying policy. The approach is evaluated on tasks from the CHORES benchmark for state-of-the-art embodied navigation policies across hundreds of procedurally generated environments, demonstrating utility for both filtering unsafe actions and conditional action generation.

Significance. If the central claims hold, this work offers a practical runtime mechanism to add formal safety guarantees to black-box robot foundation models using STL, which could be significant for safe real-world deployment of general-purpose navigation policies. The policy-agnostic and no-retraining properties are strengths that could enable broad adoption. The scale of the evaluation across hundreds of environments is a positive indicator of generality, though the absence of detailed quantitative results limits the demonstrated impact.

major comments (2)

[Abstract and Evaluation section] Abstract and Evaluation section: The claim that generated actions 'provably satisfy STL specifications under assumed dynamics' is load-bearing on the fidelity of the assumed dynamics model used to simulate future states during decoding. The manuscript provides no robustness analysis, sensitivity experiments, or discussion of mismatches due to parameter errors, unmodeled effects, or disturbances, which is required to establish that the guarantee transfers beyond simulation.
[Evaluation section] Evaluation section: The abstract states that SafeDec is evaluated on the CHORES benchmark across hundreds of environments and shows usefulness for filtering and conditional generation, but no quantitative metrics (e.g., success rates, violation rates, or comparison baselines), error analysis, or details on how post-hoc dynamics assumptions affect the reported outcomes are provided. This absence makes it difficult to substantiate the practical effectiveness of the decoding interventions.

minor comments (2)

[Abstract] The video link is given as constrained-robot-fms.github.io; provide a complete, clickable URL and ensure supplementary material includes example STL formulas and decoding traces for reproducibility.
Clarify the exact form of the STL formulas employed in the CHORES tasks and the precise mechanism by which they are checked during autoregressive token generation (e.g., horizon length, predicate definitions).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential significance of SafeDec in providing runtime safety guarantees for robot foundation models. We address each major comment below and outline the revisions planned for the next version of the manuscript.

read point-by-point responses

Referee: [Abstract and Evaluation section] Abstract and Evaluation section: The claim that generated actions 'provably satisfy STL specifications under assumed dynamics' is load-bearing on the fidelity of the assumed dynamics model used to simulate future states during decoding. The manuscript provides no robustness analysis, sensitivity experiments, or discussion of mismatches due to parameter errors, unmodeled effects, or disturbances, which is required to establish that the guarantee transfers beyond simulation.

Authors: We agree that the provable satisfaction of STL specifications is conditional on the fidelity of the assumed dynamics model. The manuscript already states this assumption explicitly, but we acknowledge the value of additional analysis. In the revised version, we will add a dedicated paragraph in the Evaluation section (and a brief note in the abstract if space permits) discussing potential mismatches arising from parameter errors, unmodeled effects, and disturbances. We will also include a sensitivity study that perturbs key dynamics parameters (e.g., friction coefficients and mass) within realistic ranges and reports the resulting STL satisfaction rates. This will clarify the scope of the guarantees without overstating transfer to real-world conditions. revision: partial
Referee: [Evaluation section] Evaluation section: The abstract states that SafeDec is evaluated on the CHORES benchmark across hundreds of environments and shows usefulness for filtering and conditional generation, but no quantitative metrics (e.g., success rates, violation rates, or comparison baselines), error analysis, or details on how post-hoc dynamics assumptions affect the reported outcomes are provided. This absence makes it difficult to substantiate the practical effectiveness of the decoding interventions.

Authors: We accept that the current presentation would be strengthened by explicit quantitative reporting. Although the evaluation spans hundreds of procedurally generated CHORES environments and demonstrates both filtering and conditional generation through concrete examples, we will revise the Evaluation section to include aggregated metrics such as task success rates, STL violation rates (with and without SafeDec), and comparisons against unconstrained baselines. We will also add a short error analysis and a discussion of how the post-hoc dynamics assumptions influence the observed outcomes. These additions will provide clearer evidence of practical effectiveness. revision: yes

Circularity Check

0 steps flagged

No circularity: enforcement uses external STL formulas and assumed dynamics independent of model parameters

full rationale

The paper presents SafeDec as a runtime constrained decoding method that filters or generates action sequences to satisfy given STL specifications under separately assumed robot dynamics. This construction does not define any quantity in terms of itself, rename a fitted parameter as a prediction, or rely on a load-bearing self-citation chain whose prior result is unverified. The central guarantee is explicitly conditional on the accuracy of the external dynamics model and the provided STL formula; neither is derived from the foundation model's weights or training data by construction. The derivation therefore remains self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on standard domain assumptions about robot dynamics and the ability of STL to capture relevant safety properties; no free parameters or new entities are introduced in the abstract.

axioms (1)

domain assumption Robot follows known dynamics for forward simulation during decoding
Invoked to provably check STL satisfaction at runtime.

pith-pipeline@v0.9.0 · 5713 in / 1092 out tokens · 55163 ms · 2026-05-18T19:03:29.781690+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

we propose SafeDec: A constrained decoding strategy that integrates STL specifications into the foundation model action selection process itself... using the quantitative semantics of STL specifications (robustness)
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

For our dynamics model, we assume a unicycle model, an approximate first-order dynamics abstraction

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 9 internal anchors

[1]

Y . Hu, Q. Xie, V . Jain, J. Francis, J. Patrikar, N. V . Keetha, S. Kim, Y . Xie, T. Zhang, S. Zhao, et al. Toward general-purpose robots via foundation models: A survey and meta-analysis. CoRR, 2023

work page 2023
[2]

Ehsani, T

K. Ehsani, T. Gupta, R. Hendrix, J. Salvador, L. Weihs, K.-H. Zeng, K. P. Singh, Y . Kim, W. Han, A. Herrasti, et al. Spoc: Imitating shortest paths in simulation enables effective navigation and manipulation in the real world. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16238–16250, 2024

work page 2024
[3]

K.-H. Zeng, Z. Zhang, K. Ehsani, R. Hendrix, J. Salvador, A. Herrasti, R. Girshick, A. Kem- bhavi, and L. Weihs. Poliformer: Scaling on-policy rl with transformers results in masterful navigators. InConference on Robot Learning, pages 408–432. PMLR, 2025

work page 2025
[4]

J. Hu, R. Hendrix, A. Farhadi, A. Kembhavi, R. Mart ´ın-Mart´ın, P. Stone, K.-H. Zeng, and K. Ehsani. Flare: Achieving masterful and adaptive robot policies with large-scale reinforce- ment learning fine-tuning. In2025 IEEE International Conference on Robotics and Automation (ICRA), pages 3617–3624. IEEE, 2025

work page 2025
[5]

M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, G. Lam, P. Sanketi, Q. Vuong, T. Kollar, B. Burchfiel, R. Tedrake, D. Sadigh, S. Levine, P. Liang, and C. Finn. Openvla: An open-source vision-language-action model, 2024. URL https://arxiv.org/abs/2406.09246

work page internal anchor Pith review Pith/arXiv arXiv 2024
[6]

Specification Patterns for Robotic Missions

C. Menghi, C. Tsigkanos, P. Pelliccione, C. Ghezzi, and T. Berger. Specification patterns for robotic missions.CoRR, abs/1901.02077, 2019. URLhttp://arxiv.org/abs/1901. 02077

work page internal anchor Pith review Pith/arXiv arXiv 1901
[7]

Farrell, M

M. Farrell, M. Luckcuck, and M. Fisher.Robotics and Integrated Formal Methods: Ne- cessity Meets Opportunity, page 161–171. Springer International Publishing, 2018. ISBN 9783319989389. doi:10.1007/978-3-319-98938-9 10. URLhttp://dx.doi.org/10. 1007/978-3-319-98938-9_10

work page doi:10.1007/978-3-319-98938-9 2018
[8]

A. Pnueli. The temporal logic of programs. In18th Annual Symposium on Foundations of Computer Science (sfcs 1977), pages 46–57, 1977. doi:10.1109/SFCS.1977.32

work page doi:10.1109/sfcs.1977.32 1977
[9]

Kapoor, S

P. Kapoor, S. Vemprala, and A. Kapoor. Logically constrained robotics transformers for en- hanced perception-action planning.arXiv preprint arXiv:2408.05336, 2024. 10

work page arXiv 2024
[10]

B. T. Willard and R. Louf. Efficient guided generation for large language models.arXiv preprint arXiv:2307.09702, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[11]

Beurer-Kellner, M

L. Beurer-Kellner, M. Fischer, and M. Vechev. Prompting is programming: A query language for large language models.Proc. ACM Program. Lang., 7(PLDI), June 2023. doi:10.1145/ 3591300. URLhttps://doi.org/10.1145/3591300

work page doi:10.1145/3591300 2023
[12]

G. AI. Guidance: A language model control framework.https://github.com/ guidance-ai/guidance, 2023. Accessed: April 27, 2025

work page 2023
[13]

Maler and D

O. Maler and D. Nickovic. Monitoring Temporal Properties of Continuous Signals. (3253): 152–166, 2004

work page 2004
[14]

Kapoor, K

P. Kapoor, K. Mizuta, E. Kang, and K. Leung. Stlcg++: A masking approach for differentiable signal temporal logic specification.IEEE Robotics and Automation Letters, 10(9):9240–9247, Sept. 2025. ISSN 2377-3774. doi:10.1109/lra.2025.3588389. URLhttp://dx.doi.org/ 10.1109/LRA.2025.3588389

work page doi:10.1109/lra.2025.3588389 2025
[15]

Maler and D

O. Maler and D. Nickovic. Monitoring temporal properties of continuous signals. InFOR- MATS/FTRTFT, 2004. URLhttps://api.semanticscholar.org/CorpusID:15642684

work page 2004
[16]

The Curious Case of Neural Text Degeneration

A. Holtzman, J. Buys, L. Du, M. Forbes, and Y . Choi. The curious case of neural text degen- eration.arXiv preprint arXiv:1904.09751, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1904
[17]

A. Fan, M. Lewis, and Y . Dauphin. Hierarchical neural story generation.arXiv preprint arXiv:1805.04833, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[18]

In: Proceedings of the 3rd Workshop on Figurative Language Processing (FLP), pp

C. Hokamp and Q. Liu. Lexically constrained decoding for sequence generation using grid beam search. In R. Barzilay and M.-Y . Kan, editors,Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1535–1546, Vancouver, Canada, July 2017. Association for Computational Linguistics. doi:10.18653/v...

work page doi:10.18653/v1/ 2017
[19]

Welleck, A

S. Welleck, A. Bertsch, M. Finlayson, H. Schoelkopf, A. Xie, G. Neubig, I. Kulikov, and Z. Harchaoui. From decoding to meta-generation: Inference-time algorithms for large lan- guage models, 2024. URLhttps://arxiv.org/abs/2406.16838

work page arXiv 2024
[20]

Peyrard, M

M. Peyrard, M. Josifoski, and R. West. The era of semantic decoding, 2024. URLhttps: //arxiv.org/abs/2403.14562

work page arXiv 2024
[21]

K. Park, J. Wang, T. Berg-Kirkpatrick, N. Polikarpova, and L. D' Antoni. Grammar- aligned decoding. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Pa- quet, J. Tomczak, and C. Zhang, editors,Advances in Neural Information Pro- cessing Systems, volume 37, pages 24547–24568. Curran Associates, Inc., 2024. URLhttps://proceedings.neurips.cc/paper_files/p...

work page 2024
[22]

A. Liu, M. Sap, X. Lu, S. Swayamdipta, C. Bhagavatula, N. A. Smith, and Y . Choi. Dexperts: Decoding-time controlled text generation with experts and anti-experts, 2021. URLhttps: //arxiv.org/abs/2105.03023

work page arXiv 2021
[23]

AI2-THOR: An Interactive 3D Environment for Visual AI

E. Kolve, R. Mottaghi, W. Han, E. VanderBilt, L. Weihs, A. Herrasti, M. Deitke, K. Ehsani, D. Gordon, Y . Zhu, A. Kembhavi, A. Gupta, and A. Farhadi. Ai2-thor: An interactive 3d environment for visual ai, 2022. URLhttps://arxiv.org/abs/1712.05474

work page internal anchor Pith review Pith/arXiv arXiv 2022
[24]

T. He, C. Zhang, W. Xiao, G. He, C. Liu, and G. Shi. Agile but safe: Learning collision-free high-speed legged locomotion. InRobotics: Science and Systems (RSS), 2024. 11

work page 2024
[25]

K. S. Yun, R. Chen, C. Dunaway, J. M. Dolan, and C. Liu. Safe control of quadruped in varying dynamics via safety index adaptation. In2025 IEEE International Conference on Robotics and Automation (ICRA), pages 7771–7777. IEEE, 2025

work page 2025
[26]

Kapoor, I

P. Kapoor, I. Higgins, N. Keetha, J. Patrikar, B. Moon, Z. Ye, Y . He, I. Cisneros, Y . Hu, C. Liu, E. Kang, and S. Scherer. Demonstrating visafe: Vision-enabled safety for high-speed detect and avoid. 2025

work page 2025
[27]

W. Zhao, Y . Sun, F. Li, R. Chen, R. Liu, T. Wei, and C. Liu. GUARD: A safe reinforcement learning benchmark.Transactions on Machine Learning Research, 2024. ISSN 2835-8856. URLhttps://openreview.net/forum?id=kZFKwApeQO

work page 2024
[28]

M. H. Cohen, T. G. Molnar, and A. D. Ames. Safety-critical control for autonomous systems: Control barrier functions via reduced-order models.Annual Reviews in Control, 57:100947,

work page
[29]

doi:https://doi.org/10.1016/j.arcontrol.2024.100947

ISSN 1367-5788. doi:https://doi.org/10.1016/j.arcontrol.2024.100947. URLhttps: //www.sciencedirect.com/science/article/pii/S1367578824000166

work page doi:10.1016/j.arcontrol.2024.100947 2024
[30]

L. Sha. Using simplicity to control complexity.IEEE Software, 18(4):20–28, 2001. doi: 10.1109/MS.2001.936213

work page doi:10.1109/ms.2001.936213 2001
[31]

A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada. Control barrier functions: Theory and applications, 2019. URLhttps://arxiv.org/abs/1903. 11199

work page 2019
[32]

S. Gu, L. Yang, Y . Du, G. Chen, F. Walter, J. Wang, and A. Knoll. A review of safe reinforce- ment learning: Methods, theory and applications, 2024. URLhttps://arxiv.org/abs/ 2205.10330

work page arXiv 2024
[33]

Safe Reinforcement Learning via Shielding

M. Alshiekh, R. Bloem, R. Ehlers, B. K ¨onighofer, S. Niekum, and U. Topcu. Safe reinforce- ment learning via shielding, 2017. URLhttps://arxiv.org/abs/1708.08611

work page internal anchor Pith review Pith/arXiv arXiv 2017
[34]

Kurtz and H

V . Kurtz and H. Lin. Mixed-integer programming for signal temporal logic with fewer binary variables.arXiv preprint arXiv:2204.06367, 2022

work page arXiv 2022
[35]

Kapoor, E

P. Kapoor, E. Kang, and R. Meira-G ´oes. Safe planning through incremental decomposition of signal temporal logic specifications. InNASA Formal Methods Symposium, pages 377–396. Springer, 2024

work page 2024
[36]

SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning

B. Zhang, Y . Zhang, J. Ji, Y . Lei, J. Dai, Y . Chen, and Y . Yang. Safevla: Towards safety alignment of vision-language-action model via safe reinforcement learning, 2025. URL https://arxiv.org/abs/2503.03480

work page internal anchor Pith review Pith/arXiv arXiv 2025
[37]

Sermanet, A

P. Sermanet, A. Majumdar, A. Irpan, D. Kalashnikov, and V . Sindhwani. Generating robot constitutions & benchmarks for semantic safety.arXiv preprint arXiv:2503.08663, 2025

work page arXiv 2025
[38]

Y . Wu, Z. Xiong, Y . Hu, S. S. Iyengar, N. Jiang, A. Bera, L. Tan, and S. Jagannathan. Selp: Generating safe and efficient task plans for robot agents with large language models, 2025. URLhttps://arxiv.org/abs/2409.19471

work page arXiv 2025
[39]

J. J. Aloor, J. Patrikar, P. Kapoor, J. Oh, and S. Scherer. Follow the rules: Online signal temporal logic tree search for guided imitation learning in stochastic domains. In2023 IEEE International Conference on Robotics and Automation (ICRA), pages 1320–1326, 2023. doi: 10.1109/ICRA48891.2023.10160953

work page doi:10.1109/icra48891.2023.10160953 2023
[40]

Luckcuck, M

M. Luckcuck, M. Farrell, L. A. Dennis, C. Dixon, and M. Fisher. Formal specification and verification of autonomous robotic systems: A survey.ACM Computing Surveys, 52(5):1–41, Sept. 2019. ISSN 1557-7341. doi:10.1145/3342355. URLhttp://dx.doi.org/10.1145/ 3342355. 12

work page doi:10.1145/3342355 2019
[41]

Kapoor, A

P. Kapoor, A. Hammer, A. Kapoor, K. Leung, and E. Kang. Pretrained embeddings as a behavior specification mechanism, 2025. URLhttps://arxiv.org/abs/2503.02012

work page arXiv 2025
[42]

M. Li, S. Zhao, Q. Wang, K. Wang, Y . Zhou, S. Srivastava, C. Gokmen, T. Lee, L. E. Li, R. Zhang, W. Liu, P. Liang, L. Fei-Fei, J. Mao, and J. Wu. Embodied agent interface: Bench- marking llms for embodied decision making, 2025. URLhttps://arxiv.org/abs/2410. 07166

work page 2025
[43]

C. Hao, W. Lu, Y . Xu, and Y . Chen. Neural motion simulator: Pushing the limit of world models in reinforcement learning, 2025. URLhttps://arxiv.org/abs/2504.07095

work page arXiv 2025
[44]

G. Zhou, H. Pan, Y . LeCun, and L. Pinto. Dino-wm: World models on pre-trained visual features enable zero-shot planning, 2025. URLhttps://arxiv.org/abs/2411.04983

work page internal anchor Pith review Pith/arXiv arXiv 2025
[45]

Micheli, E

V . Micheli, E. Alonso, and F. Fleuret. Transformers are sample-efficient world models. In The Eleventh International Conference on Learning Representations, 2023. URLhttps: //openreview.net/forum?id=vhFu1Acb0xb. 13 A Appendix A.1 Quantitative Semantics of STL Given a signals t representing a signal starting at time t, the quantitative semantics of satisf...

work page 2023