Intent-aware Multi-agent Reinforcement Learning

Siyuan Qi; Song-Chun Zhu

arXiv: 1803.02018 · v1 · pith:GDUP3L56new · submitted 2018-03-06 · 💻 cs.AI

classification 💻 cs.AI

keywords algorithm, framework, learning, planning, process, utility, agents, function

This paper proposes an intent-aware multi-agent planning framework together with a learning algorithm. Under this framework, an agent plans in the goal space to maximize its expected utility, and the planning process takes into account the agent's belief about other agents' intents. Instead of formulating the learning problem as a partially observable Markov decision process (POMDP), we propose a simple but effective linear function approximation of the utility function. It is based on the observation that, for humans, other people's intents influence the utility we assign to a goal. The proposed framework has several major advantages: i) it is computationally feasible and guaranteed to converge; ii) it can easily integrate existing intent prediction and low-level planning algorithms; iii) it does not suffer from sparse feedback in the action space. We evaluate our algorithm on a real-world problem that is non-episodic and in which the number of agents and goals can vary over time. Our algorithm is trained in a scene in which aerial robots and humans interact, and tested in a novel scene with a different environment. Experimental results show that our algorithm achieves the best performance and that human-like behaviors emerge during the dynamic process.
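
The idea of planning in the goal space with a linear utility can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the feature layout (a bias, the agent's own cost, and the believed probability that each other agent wants the same goal), the function names, the example weights, and the generic squared-error gradient step used as a stand-in for the learning rule.

```python
from typing import Dict, List, Sequence


def goal_features(own_cost: float, intent_belief: Sequence[float]) -> List[float]:
    """Hypothetical feature vector for one goal: a bias term, the agent's own
    cost to reach the goal, and the believed probability that each other
    agent intends to pursue the same goal."""
    return [1.0, own_cost, *intent_belief]


def utility(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear utility approximation: dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features))


def select_goal(weights: Sequence[float], goals: Dict[str, List[float]]) -> str:
    """Pick the goal whose approximated utility is highest."""
    return max(goals, key=lambda name: utility(weights, goals[name]))


def sgd_step(weights: Sequence[float], features: Sequence[float],
             target: float, lr: float = 0.1) -> List[float]:
    """One gradient step on the squared error between the predicted utility
    and an observed target (an assumed stand-in learning rule)."""
    error = target - utility(weights, features)
    return [w + lr * error * f for w, f in zip(weights, features)]


# Assumed weights: cost is penalized, and contested goals are penalized more.
weights = [0.0, -1.0, -2.0]
goals = {
    "A": goal_features(own_cost=1.0, intent_belief=[0.9]),  # near, contested
    "B": goal_features(own_cost=2.0, intent_belief=[0.1]),  # far, mostly free
}
chosen = select_goal(weights, goals)
updated = sgd_step([0.0, 0.0, 0.0], [1.0, 2.0, 0.5], target=1.0, lr=0.1)
```

With these assumed weights, goal A is cheaper to reach but is likely wanted by another agent, so the sketch prefers goal B. This mirrors the abstract's observation that other agents' intents influence the utility of a goal.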

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Robust Instruction Compliance in Cooperative Multi-Agent Reinforcement Learning
cs.AI 2026-05 unverdicted novelty 6.0

MAVIC corrects Bellman backups at instruction boundaries by adjusting the incoming objective and restoring continuation value, enabling consistent estimation under stochastic instruction switching in a unified policy.

Robust Instruction Compliance in Cooperative Multi-Agent Reinforcement Learning
cs.AI 2026-05 unverdicted novelty 6.0

MAVIC corrects Bellman backups at instruction boundaries by adjusting the incoming objective and restoring continuation value, enabling consistent estimation under stochastic instruction switching in cooperative MARL.