Exploration by Learning Diverse Skills through Successor State Measures

Dennis G. Wilson; Emmanuel Rachelson; Florent Teichteil-Konigsbuch; Paul-Antoine Le Tolguenec; Yann Besse

arxiv: 2406.10127 · v1 · pith:ZVB3KHPQnew · submitted 2024-06-14 · 💻 cs.AI · cs.RO

Paul-Antoine Le Tolguenec, Yann Besse, Florent Teichteil-Konigsbuch, Dennis G. Wilson, Emmanuel Rachelson

classification 💻 cs.AI cs.RO

keywords skills, diverse, exploration, state, states, successor, approach, bonuses

The ability to perform different skills can encourage agents to explore. In this work, we aim to construct a set of diverse skills which uniformly cover the state space. We propose a formalization of this search for diverse skills, building on a previous definition based on the mutual information between states and skills. We consider the distribution of states reached by a policy conditioned on each skill and leverage the successor state measure to maximize the difference between these skill distributions. We call this approach LEADS: Learning Diverse Skills through Successor States. We demonstrate our approach on a set of maze navigation and robotic control tasks which show that our method is capable of constructing a diverse set of skills which exhaustively cover the state space without relying on reward or exploration bonuses. Our findings demonstrate that this new formalization promotes more robust and efficient exploration by combining mutual information maximization and exploration bonuses.
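
To make the mutual-information objective mentioned in the abstract concrete, here is a minimal tabular sketch. It is not the LEADS algorithm, which works with successor state measures rather than raw counts. It assumes a uniform prior over skills and a hypothetical `visit_counts` table in which entry `[z][s]` is how often skill `z` reached state `s`, with every skill visiting at least one state. The function name and the toy data are illustrative assumptions, not taken from the paper.

```python
import math


def skill_state_mutual_information(visit_counts):
    """Estimate I(S; Z) in nats from per-skill state visit counts.

    Assumes a uniform prior p(z) over skills and that every row of
    visit_counts has a positive sum (hypothetical tabular setting).
    """
    num_skills = len(visit_counts)
    num_states = len(visit_counts[0])
    p_z = 1.0 / num_skills

    # Normalize each row to get the state distribution of each skill, p(s | z).
    p_s_given_z = [[c / sum(row) for c in row] for row in visit_counts]

    # Marginal state distribution p(s) = sum_z p(z) p(s | z).
    p_s = [
        sum(p_z * p_s_given_z[z][s] for z in range(num_skills))
        for s in range(num_states)
    ]

    # I(S; Z) = sum_{z, s} p(z) p(s | z) log(p(s | z) / p(s)),
    # skipping unvisited pairs (convention: 0 * log 0 = 0).
    mi = 0.0
    for z in range(num_skills):
        for s in range(num_states):
            p = p_s_given_z[z][s]
            if p > 0:
                mi += p_z * p * math.log(p / p_s[s])
    return mi


# Two skills over four states: disjoint coverage vs. identical coverage.
disjoint = [[5, 5, 0, 0], [0, 0, 5, 5]]
overlapping = [[5, 5, 5, 5], [5, 5, 5, 5]]
mi_disjoint = skill_state_mutual_information(disjoint)
mi_overlapping = skill_state_mutual_information(overlapping)
```

In this toy case the disjoint skills reach the maximum of log 2 nats for two skills, while the identical skills score zero. A higher value means that the visited state is more informative about which skill produced it, which is the sense in which skill distributions are pushed apart.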

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Reward-free Pretraining for Reinforcement Learning via Occupancy Coverage Maximization
cs.LG · 2026-06 · unverdicted · novelty 6.0

ROVER pretrains transferable exploration policies by maximizing occupancy coverage with a learned resolvent world model and virtual sink state, outperforming baselines on sparse navigation tasks.