Decentralized Non-communicating Multiagent Collision Avoidance with Deep Reinforcement Learning

Jonathan P. How; Miao Liu; Michael Everett; Yu Fan Chen

arxiv: 1609.07845 · v2 · pith:DLUKXOX3new · submitted 2016-09-26 · 💻 cs.MA

Yu Fan Chen, Miao Liu, Michael Everett, Jonathan P. How

classification 💻 cs.MA

keywords avoidance, collision, finding, goal, learning, multiagent, paths, time

Finding feasible, collision-free paths for multiagent systems can be challenging, particularly in non-communicating scenarios where each agent's intent (e.g., goal) is unobservable to the others. In particular, finding time-efficient paths often requires anticipating interactions with neighboring agents, which can be computationally prohibitive. This work presents a decentralized multiagent collision avoidance algorithm based on a novel application of deep reinforcement learning, which effectively offloads the online computation (for predicting interaction patterns) to an offline learning procedure. Specifically, the proposed approach develops a value network that encodes the estimated time to the goal given an agent's joint configuration (positions and velocities) with its neighbors. Use of the value network not only admits efficient (i.e., real-time implementable) queries for finding a collision-free velocity vector, but also accounts for the uncertainty in the other agents' motion. Simulation results show more than 26 percent improvement in path quality (i.e., time to reach the goal) when compared with optimal reciprocal collision avoidance (ORCA), a state-of-the-art collision avoidance strategy.
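The idea of querying a value function to pick a collision-free velocity can be sketched as a one-step lookahead. The following is a minimal illustration under stated assumptions, not the paper's implementation: `toy_value` is a hypothetical hand-written stand-in for the trained value network, the neighbor is assumed to keep a constant velocity over the lookahead, and the constants `GAMMA`, `DT`, the collision penalty, and the candidate velocity set are all illustrative choices.

```python
import math

GAMMA = 0.97  # assumed discount factor (illustrative)
DT = 1.0      # assumed one-step lookahead horizon, in seconds


def toy_value(pos, goal, other_pos, radius_sum, v_pref):
    """Hypothetical stand-in for a trained value network.

    Returns a penalty if the two agents overlap; otherwise a value that
    decays with the straight-line time to the goal at preferred speed.
    """
    if math.dist(pos, other_pos) < radius_sum:
        return -0.25  # assumed collision penalty
    time_to_goal = math.dist(pos, goal) / v_pref
    return GAMMA ** (time_to_goal * v_pref)


def candidate_velocities(v_pref, n_headings=16):
    """Discrete candidate set: stop, plus two speeds over evenly spaced headings."""
    candidates = [(0.0, 0.0)]
    for speed in (0.5 * v_pref, v_pref):
        for k in range(n_headings):
            theta = 2.0 * math.pi * k / n_headings
            candidates.append((speed * math.cos(theta), speed * math.sin(theta)))
    return candidates


def choose_velocity(pos, goal, other_pos, other_vel, radius_sum, v_pref,
                    value_fn=toy_value):
    """Pick the candidate velocity whose predicted next state scores highest."""
    # Assumption: the neighbor keeps its current velocity for DT seconds.
    other_next = (other_pos[0] + other_vel[0] * DT,
                  other_pos[1] + other_vel[1] * DT)
    best_velocity, best_score = (0.0, 0.0), -math.inf
    for vel in candidate_velocities(v_pref):
        next_pos = (pos[0] + vel[0] * DT, pos[1] + vel[1] * DT)
        score = value_fn(next_pos, goal, other_next, radius_sum, v_pref)
        if score > best_score:
            best_velocity, best_score = vel, score
    return best_velocity
```

With no neighbor nearby, `choose_velocity` selects the full-speed velocity pointing straight at the goal; with a stationary neighbor directly in the way, it selects a heading that steers around it. The sketch only checks for overlap at the end of the lookahead step and ignores any per-step reward, both of which are simplifications.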

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Act on What You See: Unlocking Safe Social Navigation in Vision-Language-Action Models
cs.RO 2026-06 unverdicted novelty 5.0

SALSA aligns social features and adds future-risk signals in VLA models to cut near-collisions by 86.4% and raise social accuracy from 53% to 93% on SCAND and real robots.

Enhancing the MADDPG Algorithm for Multi-Agent Learning via Action Inference and Importance Sampling
cs.LG 2026-06 unverdicted novelty 5.0

Action inference and geometric importance sampling enhance MADDPG, yielding better stability, cooperation, and exploration efficiency on the discrete Predator-Prey benchmark.