Exploration by Random Network Distillation
12 Pith papers cite this work. Polarity classification is still indexing.
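The hub paper's mechanism is compact enough to sketch. Below is a minimal NumPy illustration, with linear networks standing in for the paper's convolutional ones: a frozen, randomly initialized target network is distilled into a trained predictor, and the predictor's error on an observation serves as the intrinsic exploration bonus, high for novel states and decaying with repeated visits.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, feat_dim, lr = 8, 16, 0.05

# Fixed, randomly initialized target network f(s): never trained.
W_target = rng.normal(size=(obs_dim, feat_dim))
# Predictor network f_hat(s): trained to match the target's outputs.
W_pred = np.zeros((obs_dim, feat_dim))

def intrinsic_reward(obs):
    """Exploration bonus = predictor's error against the frozen target.
    Rarely seen observations are poorly predicted, so the bonus is high;
    repeated visits drive the error (and the bonus) toward zero."""
    return float(np.mean((obs @ W_pred - obs @ W_target) ** 2))

def update_predictor(obs):
    """One SGD step on the distillation loss ||f_hat(s) - f(s)||^2."""
    global W_pred
    err = obs @ W_pred - obs @ W_target        # (feat_dim,)
    W_pred -= lr * (2.0 / feat_dim) * np.outer(obs, err)

obs = rng.normal(size=obs_dim)
print("first visit bonus:", intrinsic_reward(obs))
for _ in range(200):                           # revisit the same state
    update_predictor(obs)
print("after 200 visits:", intrinsic_reward(obs))
```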
citing papers explorer
-
Quality-Aware Exploration Budget Allocation for Cooperative Multi-Agent Reinforcement Learning
A quality-aware exploration method using return-conditioned sigmoid scheduling and per-agent RSQ metrics achieves top-tier returns on seven cooperative MARL benchmarks.
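The exact schedule isn't given in the summary above; the sketch below assumes one plausible reading: a sigmoid gate over normalized episodic return that anneals a per-agent exploration budget from eps_max toward eps_min as the team improves. All names and constants are illustrative.

```python
import numpy as np

def exploration_budget(ep_return, r_low, r_high,
                       eps_min=0.05, eps_max=0.5, k=8.0):
    """Illustrative return-conditioned sigmoid schedule: while recent
    returns are low the budget stays near eps_max; as returns approach
    r_high the sigmoid gate anneals it toward eps_min."""
    progress = (ep_return - r_low) / max(r_high - r_low, 1e-8)
    gate = 1.0 / (1.0 + np.exp(-k * (progress - 0.5)))   # sigmoid in (0, 1)
    return eps_max - gate * (eps_max - eps_min)

# Budget shrinks smoothly as the team's episodic return improves.
for R in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(R, round(exploration_budget(R, r_low=0.0, r_high=1.0), 3))
```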
-
Beyond Single-Model Optimization: Preserving Plasticity in Continual Reinforcement Learning
TeLAPA maintains archives of behaviorally diverse yet competent policies aligned in a shared latent space to preserve plasticity and enable faster recovery after interference in continual reinforcement learning.
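A hypothetical sketch of the archive idea: a policy is stored only if it clears a competence bar and sits far from existing entries in a shared latent space. The thresholds and the latent embedding are assumptions, not TeLAPA's actual criteria.

```python
import numpy as np

class PolicyArchive:
    """Illustrative archive: accept a policy only if it is competent
    (return above min_return) and behaviorally novel (its latent code is
    at least min_dist from every archived entry), so earlier skills can
    be recovered after interference rather than retrained from scratch."""
    def __init__(self, min_return=0.8, min_dist=0.5):
        self.entries = []                      # (latent, policy, return)
        self.min_return, self.min_dist = min_return, min_dist

    def maybe_add(self, latent, policy, ep_return):
        if ep_return < self.min_return:
            return False                       # not competent enough
        if any(np.linalg.norm(latent - z) < self.min_dist
               for z, _, _ in self.entries):
            return False                       # too close to an archived policy
        self.entries.append((latent, policy, ep_return))
        return True

archive, rng = PolicyArchive(), np.random.default_rng(0)
for _ in range(20):
    archive.maybe_add(rng.normal(size=2), policy=None,
                      ep_return=float(rng.uniform(0.5, 1.0)))
print(len(archive.entries), "archived policies")
```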
-
Dota 2 with Large Scale Deep Reinforcement Learning
OpenAI Five achieved superhuman performance in Dota 2, defeating the world champions through large-scale self-play reinforcement learning.
-
Learning What Matters: Adaptive Information-Theoretic Objectives for Robot Exploration
QOED selects identifiable parameter directions via eigenspace analysis of the Fisher information matrix and modifies exploration objectives to approximate ideal information gain under bounded nuisance assumptions, yielding 21-35% performance gains in robotic tasks.
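The eigenspace-selection step is generic enough to sketch: estimate the Fisher information matrix from per-sample score vectors, eigendecompose it, and keep only directions with non-negligible eigenvalues as identifiable, treating the rest as nuisance. The threshold and toy data are illustrative, not QOED's settings.

```python
import numpy as np

def identifiable_directions(score_samples, rel_threshold=1e-2):
    """Estimate the Fisher information matrix as the empirical second
    moment of per-sample scores (gradients of the log-likelihood w.r.t.
    parameters), then keep eigenvectors whose eigenvalues are
    non-negligible: directions the data can actually identify.
    Everything below the threshold is treated as nuisance."""
    F = score_samples.T @ score_samples / len(score_samples)
    eigvals, eigvecs = np.linalg.eigh(F)
    keep = eigvals > rel_threshold * eigvals.max()
    return eigvecs[:, keep], eigvals[keep]

rng = np.random.default_rng(0)
# Toy scores: 3 parameters, but the data constrains only 2 directions.
scores = rng.normal(size=(500, 3)) @ np.diag([1.0, 0.5, 1e-4])
V, lam = identifiable_directions(scores)
print(V.shape, lam)        # 2 identifiable directions survive
```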
-
Shaping Zero-Shot Coordination via State Blocking
SBC generates virtual environments via state blocking to expose agents to diverse suboptimal partner policies, yielding superior zero-shot coordination performance, including with human partners.
-
Learning to Theorize the World from Observation
NEO induces compositional latent programs as world theories from observations and executes them to enable explanation-driven generalization.
-
QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL
QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-Mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.
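A toy sketch of the conditioning swap, with the backbone omitted: in a decision-transformer-style token layout, the dataset return-to-go scalar is replaced by a learned state-conditioned Q estimate. The token layout and the stand-in critic are assumptions.

```python
import numpy as np

def build_tokens(states, actions, goal, q_estimator):
    """Decision-transformer-style layout, with the dataset return-to-go
    replaced by a state-conditioned Q estimate Q(s_t, goal): each
    timestep contributes (conditioning, state, action) tokens."""
    tokens = []
    for s, a in zip(states, actions):
        q = q_estimator(s, goal)      # learned critic, not a dataset RTG
        tokens.extend([("q", q), ("state", s), ("action", a)])
    return tokens

# Stand-in critic: negative distance to goal (assumed, purely illustrative).
q_est = lambda s, g: -float(np.linalg.norm(np.asarray(s) - np.asarray(g)))
seq = build_tokens([[0, 0], [1, 0]], [0, 1], goal=[2, 0], q_estimator=q_est)
print(seq[:3])                        # tokens for the first timestep
```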
-
Breaking the Computational Barrier: Provably Efficient Actor-Critic for Low-Rank MDPs
An actor-critic RL algorithm for low-rank MDPs achieves improved sample efficiency using only a policy-evaluation oracle.
-
Distill-Belief: Closed-Loop Inverse Source Localization and Characterization in Physical Fields
Distill-Belief distills Bayesian information-gain signals from a particle-filter teacher into a compact student policy for fast closed-loop source localization and parameter estimation while avoiding reward hacking.
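The teacher signal can be sketched with a toy particle filter: for a candidate measurement location, simulate the reading under each source hypothesis and score the expected drop in belief entropy. The 1-D exponential field model and the distillation step itself are assumptions; only the expected-information-gain computation is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(w):
    w = w / w.sum()
    return -float(np.sum(w * np.log(w + 1e-12)))

# Belief over a 1-D source position, as weighted particles.
particles = rng.uniform(0, 10, size=200)
weights = np.ones(200) / 200

def signal(src, x):
    return np.exp(-np.abs(x - src))   # assumed field model

def likelihood(src, x, y, sigma=0.5):
    return np.exp(-0.5 * ((y - signal(src, x)) / sigma) ** 2)

def expected_info_gain(x):
    """Teacher signal: expected entropy drop of the belief if we measure
    at x, averaged over source hypotheses. Distill-Belief's student is
    trained to reproduce behavior driven by signals like this one."""
    h0, gain = entropy(weights), 0.0
    for src, w in zip(particles, weights):
        y = signal(src, x)                          # simulated reading
        gain += w * (h0 - entropy(weights * likelihood(particles, x, y)))
    return gain

best_x = max(np.linspace(0, 10, 21), key=expected_info_gain)
print("most informative measurement location:", best_x)
```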
-
Improving Zero-Shot Offline RL via Behavioral Task Sampling
Extracting task vectors from the offline dataset for policy training improves zero-shot offline RL performance by an average of 20% over random sampling baselines.
-
Dual-Timescale Memory in a Spiking Neuron-Astrocyte Network for Efficient Navigation
A neuron-astrocyte network with dual-timescale memory reduces median path lengths by up to a factor of six in partially observable grid-world navigation tasks.
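A hypothetical reading of the dual-timescale mechanism: a fast, neuron-like trace of recent observations alongside a slow, astrocyte-like trace that integrates over many steps, concatenated as policy input. The decay constants and feature encoding are illustrative.

```python
import numpy as np

class DualTimescaleMemory:
    """Fast, neuron-like trace of the last few observations plus a slow,
    astrocyte-like trace that integrates over many steps; concatenated,
    they tell the policy both where it just was and which regions it has
    already covered under partial observability."""
    def __init__(self, dim, fast_decay=0.5, slow_decay=0.99):
        self.fast, self.slow = np.zeros(dim), np.zeros(dim)
        self.fd, self.sd = fast_decay, slow_decay

    def update(self, obs):
        self.fast = self.fd * self.fast + (1 - self.fd) * obs
        self.slow = self.sd * self.slow + (1 - self.sd) * obs
        return np.concatenate([self.fast, self.slow])   # policy input

mem, rng = DualTimescaleMemory(dim=4), np.random.default_rng(0)
for _ in range(10):
    features = mem.update(rng.normal(size=4))
print(features)
```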
-
Learning-Based Sparsification of Dynamic Graphs in Robotic Exploration Algorithms
A PPO-trained transformer policy sparsifies dynamic graphs during RRT frontier exploration, cutting graph size by up to 96% while yielding the most consistent exploration rates across environments.
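A minimal sketch of the sparsification step, with the learned policy replaced by a stand-in scoring function: each edge of the exploration graph gets a score and only the top fraction is kept. keep_frac=0.04 mirrors the reported up-to-96% reduction; the edge features and scorer are assumptions.

```python
import numpy as np

def sparsify(edges, edge_feats, score_fn, keep_frac=0.04):
    """Score every edge of the exploration graph and keep only the
    highest-scoring fraction. In the paper the scores come from a
    PPO-trained transformer; score_fn here is a stand-in."""
    scores = np.array([score_fn(f) for f in edge_feats])
    k = max(1, int(keep_frac * len(edges)))
    keep = np.argsort(scores)[-k:]            # indices of top-k edges
    return [edges[i] for i in keep]

rng = np.random.default_rng(0)
edges = [(i, i + 1) for i in range(100)]
feats = rng.normal(size=(100, 3))
pruned = sparsify(edges, feats, score_fn=lambda f: float(f.sum()))
print(len(edges), "->", len(pruned))
```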