Multi-Agent Reinforcement Learning Guided by Signal Temporal Logic Specifications

Fei Miao; Jiangwei Wang; Meiyi Ma; Rahul Mangharam; Shuo Yang; Songyang Han; Zhili Zhang; Ziyan An

arxiv: 2306.06808 · v2 · pith:76HG23IFnew · submitted 2023-06-11 · 💻 cs.AI

Jiangwei Wang, Shuo Yang, Ziyan An, Songyang Han, Zhili Zhang, Rahul Mangharam, Meiyi Ma, Fei Miao

classification 💻 cs.AI

keywords multi-agent · learning · reinforcement · reward · specifications · requirements · safety · agent

Reward design is a key component of deep reinforcement learning, yet some tasks and designer's objectives may be unnatural to define as a scalar cost function. Among the various techniques, formal methods integrated with DRL have garnered considerable attention due to their expressiveness and flexibility to define the reward and requirements for different states and actions of the agent. However, how to leverage Signal Temporal Logic (STL) to guide multi-agent reinforcement learning reward design remains unexplored. Complex interactions, heterogeneous goals and critical safety requirements in multi-agent systems make this problem even more challenging. In this paper, we propose a novel STL-guided multi-agent reinforcement learning framework. The STL requirements are designed to include both task specifications according to the objective of each agent and safety specifications, and the robustness values of the STL specifications are leveraged to generate rewards. We validate the advantages of our method through empirical studies. The experimental results demonstrate significant reward performance improvements compared to MARL without STL guidance, along with a remarkable increase in the overall safety rate of the multi-agent systems.
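To make the idea of turning STL robustness values into rewards concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names, the two example specifications, and the thresholds `d_safe` and `goal_radius` are hypothetical and chosen only for illustration. It relies on the standard quantitative semantics of STL, where "always" takes the minimum margin over a trace, "eventually" takes the maximum, and a conjunction takes the minimum of its parts.

```python
from typing import Sequence


def always_geq(signal: Sequence[float], threshold: float) -> float:
    """Robustness of G(signal >= threshold): the worst-case margin over the trace."""
    return min(s - threshold for s in signal)


def eventually_leq(signal: Sequence[float], threshold: float) -> float:
    """Robustness of F(signal <= threshold): the best-case margin over the trace."""
    return max(threshold - s for s in signal)


def stl_reward(
    dist_to_others: Sequence[float],
    dist_to_goal: Sequence[float],
    d_safe: float = 1.0,       # hypothetical safety distance
    goal_radius: float = 0.5,  # hypothetical goal tolerance
) -> float:
    """Reward for one agent over one trajectory (illustrative assumption).

    Safety spec: always keep at least d_safe from other agents.
    Task spec:   eventually come within goal_radius of the goal.
    The conjunction of the two specs has robustness min(safety, task).
    """
    safety = always_geq(dist_to_others, d_safe)
    task = eventually_leq(dist_to_goal, goal_radius)
    return min(safety, task)


# A trajectory that stays safe and reaches the goal gets a positive reward.
safe_reward = stl_reward([2.0, 1.5, 3.0], [4.0, 2.0, 0.25])

# A trajectory that comes too close to another agent gets a negative reward.
unsafe_reward = stl_reward([2.0, 0.5, 3.0], [4.0, 2.0, 0.25])
```

A positive value means the trajectory satisfies both specifications with some margin, and a negative value means at least one is violated, so the sign and magnitude together give a graded learning signal rather than a binary pass/fail.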

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SpecRLBench: A Benchmark for Generalization in Specification-Guided Reinforcement Learning
cs.LG 2026-04 unverdicted novelty 7.0

SpecRLBench is a new benchmark evaluating generalization of LTL-guided RL methods across navigation and manipulation domains with static/dynamic environments and varied robot dynamics.

Robust and Safe Multi-Agent Reinforcement Learning with Communication for Autonomous Vehicles: From Simulation to Hardware
cs.RO 2025-06 unverdicted novelty 5.0

RSR-RSMARL is a robust safe MARL framework with V2V communication and CBF safety shields that supports zero-shot sim-to-real transfer and improves coordination on 1/10-scale vehicle hardware.