Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor

Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine · 2018

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Not all uncertainty is alike: volatility, stochasticity, and exploration

cs.AI · 2026-05-19 · unverdicted · novelty 7.0

Volatility promotes exploration and stochasticity suppresses it in Gaussian state-space bandits, shown by extending Gittins indices and deriving the CAUSE exploration bonus via control-as-inference.

A High-Throughput Compute-Efficient POMDP Hide-And-Seek-Engine (HASE) for Multi-Agent Operations

cs.MA · 2026-04-29 · unverdicted · novelty 5.0

A C++ Dec-POMDP simulator using data-oriented design and zero-copy PyTorch integration achieves up to 33 million steps per second on a 16-core CPU, enabling multi-agent policy training in minutes with PPO, DQN, and SAC.

citing papers explorer

Showing 2 of 2 citing papers.

Not all uncertainty is alike: volatility, stochasticity, and exploration

cs.AI · 2026-05-19 · unverdicted · none · ref 29

Volatility promotes exploration and stochasticity suppresses it in Gaussian state-space bandits, shown by extending Gittins indices and deriving the CAUSE exploration bonus via control-as-inference.

A High-Throughput Compute-Efficient POMDP Hide-And-Seek-Engine (HASE) for Multi-Agent Operations

cs.MA · 2026-04-29 · unverdicted · none · ref 13

A C++ Dec-POMDP simulator using data-oriented design and zero-copy PyTorch integration achieves up to 33 million steps per second on a 16-core CPU, enabling multi-agent policy training in minutes with PPO, DQN, and SAC.
