DREAM: deep regret minimization with advantage baselines and model-free learning. CoRR, abs/2006.10410

DREAM: Deep regret minimization with advantage baselines and model-free learning · 2020 · arXiv 2006.10410

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

NashPG: A Policy Gradient Method with Iteratively Refined Regularization for Finding Nash Equilibria

cs.LG · 2025-10-21 · unverdicted · novelty 6.0

NashPG is a policy-gradient method with iteratively refined regularization that guarantees monotonic convergence to Nash equilibria in two-player zero-sum extensive-form games and scales to large benchmarks.

Phi-Actor-Critic: Steering General-Sum Games to Pareto-Efficient Correlated Equilibria

cs.MA · 2026-06-09 · unverdicted · novelty 5.0

Phi-Actor-Critic is a new method that steers multi-agent reinforcement learning toward Pareto-efficient correlated equilibria using regret minimization and Lagrangian selection.

Data-Augmented Game Starts for Accelerating Self-Play Exploration in Imperfect Information Games

cs.LG · 2026-05-14 · unverdicted · novelty 5.0

DAGS initializes policy-gradient self-play from human-derived intermediate states to reduce exploitability in challenging imperfect-information games, with a multi-task flag fix for resulting bias and new benchmark environments.

AlphaExploitem: Going Beyond the Nash Equilibrium in Poker by Learning to Exploit Suboptimal Play

cs.LG · 2026-05-09 · unverdicted · novelty 5.0

AlphaExploitem adds a hierarchical transformer encoder and a diverse pool of exploitable opponents to AlphaHoldem, enabling exploitation of suboptimal poker play while preserving performance against Nash-equilibrium opponents.

citing papers explorer

NashPG: A Policy Gradient Method with Iteratively Refined Regularization for Finding Nash Equilibria · cs.LG · 2025-10-21 · unverdicted · none · ref 36
Phi-Actor-Critic: Steering General-Sum Games to Pareto-Efficient Correlated Equilibria · cs.MA · 2026-06-09 · unverdicted · none · ref 44
Data-Augmented Game Starts for Accelerating Self-Play Exploration in Imperfect Information Games · cs.LG · 2026-05-14 · unverdicted · none · ref 43
AlphaExploitem: Going Beyond the Nash Equilibrium in Poker by Learning to Exploit Suboptimal Play · cs.LG · 2026-05-09 · unverdicted · none · ref 9
