arxiv: 1803.02018 · v1 · pith:GDUP3L56new · submitted 2018-03-06 · 💻 cs.AI

Intent-aware Multi-agent Reinforcement Learning

classification 💻 cs.AI
keywords algorithm · framework · learning · planning · process · utility · agents · function

This paper proposes an intent-aware multi-agent planning framework together with a learning algorithm. Under this framework, an agent plans in the goal space to maximize its expected utility, and the planning process takes its belief about other agents' intents into account. Instead of formulating the learning problem as a partially observable Markov decision process (POMDP), we propose a simple but effective linear function approximation of the utility function. It is based on the observation that, for humans, other people's intents influence the utility we assign to a goal. The proposed framework has several major advantages: i) it is computationally feasible and guaranteed to converge; ii) it can easily integrate existing intent prediction and low-level planning algorithms; iii) it does not suffer from sparse feedback in the action space. We evaluate our algorithm on a real-world problem that is non-episodic and in which the number of agents and goals can vary over time. Our algorithm is trained in a scene in which aerial robots and humans interact, and tested in a novel scene with a different environment. Experimental results show that our algorithm achieves the best performance and that human-like behaviors emerge during the dynamic process.
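The goal-space planning idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the feature set (a bias term, negated distance to the goal, negated expected number of other agents intending the goal), the function names, and the least-mean-squares weight update are all assumptions chosen for illustration, since the abstract specifies only that the utility is a linear function influenced by the belief over other agents' intents.

```python
from typing import List, Sequence, Tuple


def features(distance: float, intent_probs: Sequence[float]) -> List[float]:
    """Hypothetical feature vector for one goal.

    intent_probs[k] is the believed probability that other agent k
    intends this goal (assumed to come from an external intent predictor).
    """
    expected_competitors = sum(intent_probs)
    return [1.0, -distance, -expected_competitors]


def utility(weights: Sequence[float], phi: Sequence[float]) -> float:
    """Linear utility approximation: U = w . phi."""
    return sum(w * x for w, x in zip(weights, phi))


def choose_goal(
    weights: Sequence[float],
    distances: Sequence[float],
    beliefs: Sequence[Sequence[float]],
) -> Tuple[int, List[float]]:
    """Pick the goal with the highest approximated utility.

    beliefs[g][k] is the believed probability that other agent k intends goal g.
    """
    scores = [utility(weights, features(d, b)) for d, b in zip(distances, beliefs)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


def lms_update(
    weights: Sequence[float],
    phi: Sequence[float],
    observed_utility: float,
    lr: float = 0.1,
) -> List[float]:
    """One least-mean-squares step toward an observed utility (an assumed rule)."""
    error = observed_utility - utility(weights, phi)
    return [w + lr * error * x for w, x in zip(weights, phi)]


# Two goals, two other agents. Goal 0 is closer but likely contested.
weights = [0.0, 1.0, 2.0]
distances = [1.0, 2.0]
beliefs = [[0.9, 0.8], [0.1, 0.0]]
best_goal, scores = choose_goal(weights, distances, beliefs)
```

With these illustrative weights the agent prefers the farther, uncontested goal 1 over the nearer goal 0, which two other agents are believed to be heading for. Because the choice is made over goals rather than primitive actions, a separate low-level planner (not shown) would be responsible for reaching the selected goal.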

This paper has not been read by Pith yet.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Robust Instruction Compliance in Cooperative Multi-Agent Reinforcement Learning

    cs.AI 2026-05 unverdicted novelty 6.0

    MAVIC corrects Bellman backups at instruction boundaries by adjusting the incoming objective and restoring continuation value, enabling consistent estimation under stochastic instruction switching in a unified policy.

  2. Robust Instruction Compliance in Cooperative Multi-Agent Reinforcement Learning

    cs.AI 2026-05 unverdicted novelty 6.0

    MAVIC corrects Bellman backups at instruction boundaries by adjusting the incoming objective and restoring continuation value, enabling consistent estimation under stochastic instruction switching in cooperative MARL.