Extreme q-learning: Maxent rl without entropy

Divyansh Garg, Joey Hejna, Matthieu Geist, Stefano Ermon · 2023 · arXiv 2301.02328

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

representative citing papers

Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning

cs.LG · 2026-05-03 · unverdicted · novelty 7.0

FAN achieves state-of-the-art offline RL performance on robotic tasks by anchoring flow policies and using single-sample noise-conditioned Q-learning, with proven convergence and reduced runtimes.

Beyond Autoregressive RTG: Conditioning via Injection Outside Sequential Modeling in Decision Transformer

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Injecting RTG into states outside the autoregressive sequence yields shorter, more efficient Decision Transformers that outperform the original on offline RL tasks.

Fisher Decorator: Refining Flow Policy via a Local Transport Map

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

Fisher Decorator refines flow policies in offline RL via a local transport map and Fisher-matrix quadratic approximation of the KL constraint, yielding controllable error near the optimum and SOTA benchmark results.

IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies

cs.LG · 2023-04-20 · conditional · novelty 6.0

IDQL generalizes IQL into an actor-critic framework and uses diffusion policies for robust policy extraction, outperforming prior offline RL methods.

citing papers explorer

Showing 4 of 4 citing papers.

Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning cs.LG · 2026-05-03 · unverdicted · none · ref 48
FAN achieves state-of-the-art offline RL performance on robotic tasks by anchoring flow policies and using single-sample noise-conditioned Q-learning, with proven convergence and reduced runtimes.
Beyond Autoregressive RTG: Conditioning via Injection Outside Sequential Modeling in Decision Transformer cs.LG · 2026-05-07 · unverdicted · none · ref 10
Injecting RTG into states outside the autoregressive sequence yields shorter, more efficient Decision Transformers that outperform the original on offline RL tasks.
Fisher Decorator: Refining Flow Policy via a Local Transport Map cs.LG · 2026-04-20 · unverdicted · none · ref 49
Fisher Decorator refines flow policies in offline RL via a local transport map and Fisher-matrix quadratic approximation of the KL constraint, yielding controllable error near the optimum and SOTA benchmark results.
IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies cs.LG · 2023-04-20 · conditional · none · ref 14
IDQL generalizes IQL into an actor-critic framework and uses diffusion policies for robust policy extraction, outperforming prior offline RL methods.

Extreme q-learning: Maxent rl without entropy

fields

years

verdicts

representative citing papers

citing papers explorer