LEO enables efficient all-goals learning in goal-conditioned RL by jointly predicting for all goals in one network pass, yielding >250x speedup over relabelling and better performance on Craftax.
Simplifying deep temporal difference learning
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
FlashSAC improves training speed and final performance of off-policy RL on high-dimensional robot tasks by reducing update frequency, increasing model scale, and bounding norms to limit critic error accumulation.
Adaptive Batch Scaling dynamically increases batch size in on-policy RL as policy volatility drops, measured by a new Behavioral Divergence metric, and shows larger networks plus larger batches outperform on ALE with PQN.
A C++ Dec-POMDP simulator using data-oriented design and zero-copy PyTorch integration achieves up to 33 million steps per second on a 16-core CPU, enabling multi-agent policy training in minutes with PPO, DQN, and SAC.
Survey unifies the definition of plasticity loss in DRL, taxonomizes over 50 mitigations, identifies evaluation gaps, and finds general regularization often outperforms domain-specific methods.
citing papers explorer
-
Goal-Conditioned Agents that Learn Everything All at Once
LEO enables efficient all-goals learning in goal-conditioned RL by jointly predicting for all goals in one network pass, yielding >250x speedup over relabelling and better performance on Craftax.
-
FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control
FlashSAC improves training speed and final performance of off-policy RL on high-dimensional robot tasks by reducing update frequency, increasing model scale, and bounding norms to limit critic error accumulation.
-
Scalable On-Policy Reinforcement Learning via Adaptive Batch Scaling
Adaptive Batch Scaling dynamically increases batch size in on-policy RL as policy volatility drops, measured by a new Behavioral Divergence metric, and shows larger networks plus larger batches outperform on ALE with PQN.
-
A High-Throughput Compute-Efficient POMDP Hide-And-Seek-Engine (HASE) for Multi-Agent Operations
A C++ Dec-POMDP simulator using data-oriented design and zero-copy PyTorch integration achieves up to 33 million steps per second on a 16-core CPU, enabling multi-agent policy training in minutes with PPO, DQN, and SAC.
-
Plasticity Loss in Deep Reinforcement Learning: A Survey
Survey unifies the definition of plasticity loss in DRL, taxonomizes over 50 mitigations, identifies evaluation gaps, and finds general regularization often outperforms domain-specific methods.
- TABX: A High-Throughput Sandbox Battle Simulator for Multi-Agent Reinforcement Learning