Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- Not all uncertainty is alike: volatility, stochasticity, and exploration
  Volatility promotes exploration and stochasticity suppresses it in Gaussian state-space bandits, shown by extending Gittins indices and deriving the CAUSE exploration bonus via control-as-inference.
- A High-Throughput Compute-Efficient POMDP Hide-And-Seek-Engine (HASE) for Multi-Agent Operations
  A C++ Dec-POMDP simulator using data-oriented design and zero-copy PyTorch integration achieves up to 33 million steps per second on a 16-core CPU, enabling multi-agent policy training in minutes with PPO, DQN, and SAC.