Accelerating Skill Assessment in Chess: A Drift-Diffusion-Enhanced Elo Rating System

Tianming Yang; Tianyuan Zhou; Zhizheng Fu

arxiv: 2606.26267 · v1 · pith:NA7OIOUNnew · submitted 2026-06-24 · 💻 cs.AI

Tianyuan Zhou, Zhizheng Fu, Tianming Yang

Pith reviewed 2026-06-26 01:41 UTC · model grok-4.3

classification 💻 cs.AI

keywords chess rating · drift diffusion model · Elo system · skill assessment · move-level data · rating adaptation · decision process

The pith

DD-Elo incorporates move quality into chess ratings via a drift-diffusion model to adapt faster to skill changes while staying bounded near traditional Elo.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes DD-Elo to reduce the lag in chess skill ratings that comes from relying only on game wins and losses. It treats each move as accumulating evidence in a drift-diffusion decision process drawn from cognitive models, allowing ratings to shift based on move quality. A mathematical derivation establishes that the new ratings remain within a fixed distance from standard Elo values. Experiments on real data show quicker response to actual skill shifts. If the approach holds, rating systems could update player levels more responsively without discarding existing infrastructure.

Core claim

By modeling skill expression as a drift-diffusion decision process, DD-Elo integrates move-level data to capture rapid skill fluctuations. A rigorous mathematical derivation proves that DD-Elo maintains a bounded deviation from the traditional Elo system. Extensive experiments demonstrate that DD-Elo adapts to skill changes faster than Elo.

What carries the argument

The Drift-Diffusion-Enhanced Elo (DD-Elo) rating system, which applies a drift-diffusion decision process to accumulate evidence of skill from individual moves.
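For orientation, the two ingredients named here can be sketched separately. The first two functions are the textbook Elo expected score and update. The third is a generic bounded evidence accumulator standing in for a drift-diffusion process. How DD-Elo actually combines them is not visible in this review, so `accumulate_evidence`, its `move_evidence` input, and the `bound` parameter are illustrative assumptions, not the authors' equations.

```python
def elo_expected(r_a: float, r_b: float) -> float:
    """Expected score of player A against B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 20.0) -> float:
    """Standard Elo update for A; score_a is 1 (win), 0.5 (draw) or 0 (loss)."""
    return r_a + k * (score_a - elo_expected(r_a, r_b))


def accumulate_evidence(move_evidence, bound: float) -> float:
    """Hypothetical stand-in for a drift-diffusion accumulator.

    Sums per-move evidence and clips the running total to [-bound, +bound],
    so the accumulated quantity can never exceed the bound in magnitude.
    """
    x = 0.0
    for e in move_evidence:
        x = max(-bound, min(bound, x + e))
    return x


# Two equally rated players: a win moves the winner up by k/2.
print(elo_update(1500.0, 1500.0, 1.0))
# Three good moves saturate the (assumed) bound of 1.0.
print(accumulate_evidence([0.5, 0.5, 0.5], bound=1.0))
```

A clipped accumulator like this is bounded by construction, which is exactly the distinction the referee report below asks the authors to clarify for their own bound.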

If this is right

DD-Elo adapts to skill changes faster than standard Elo by using move data.
DD-Elo stays within a proven, bounded deviation of traditional Elo ratings.
DD-Elo offers an explainable method to incorporate move-level information into ratings.
DD-Elo remains backward-compatible with existing Elo-based matchmaking systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Online chess platforms could reduce delays in updating ratings during active play sessions.
The same modeling step might apply to rating systems in other sequential decision games.
Live move processing could enable real-time skill estimates if computational cost stays low.

Load-bearing premise

That chess move quality can be modeled as a drift-diffusion process without the noise in move data overwhelming the overall skill signal.

What would settle it

Run DD-Elo and standard Elo on a dataset containing sudden artificial skill shifts in players and measure whether DD-Elo detects those shifts earlier while its ratings remain within the proven bound of Elo values.
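A minimal harness for this kind of test might look like the sketch below. It simulates a player whose latent rating jumps at a known game, runs standard Elo on the simulated results, and counts how many games pass before the rating comes within a tolerance of the new level. Everything here is an assumption for illustration: the fixed 1500-rated opponent, the absence of draws, `k=20`, the 50-point tolerance, and the function names. A DD-Elo trajectory produced by the authors' code could be passed to the same `detection_lag` function for comparison.

```python
import random


def expected(r_a, r_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def simulate_elo(true_skill, opp_rating=1500.0, start=1500.0, k=20.0, seed=0):
    """Run standard Elo on simulated games; true_skill[i] is the latent rating in game i."""
    rng = random.Random(seed)
    rating, history = start, []
    for skill in true_skill:
        # Result is drawn from the latent skill (simplification: no draws).
        score = 1.0 if rng.random() < expected(skill, opp_rating) else 0.0
        # The rating system only sees the result, not the latent skill.
        rating += k * (score - expected(rating, opp_rating))
        history.append(rating)
    return history


def detection_lag(history, shift_index, target, tol=50.0):
    """Games after shift_index until the rating first comes within tol of target."""
    for i in range(shift_index, len(history)):
        if abs(history[i] - target) <= tol:
            return i - shift_index
    return None  # never reached the target band


true_skill = [1500.0] * 100 + [1700.0] * 400  # artificial jump at game 100
elo_history = simulate_elo(true_skill)
print("Elo lag (games):", detection_lag(elo_history, 100, 1700.0))
```

Averaging the lag over many seeds, rather than reading a single run, would give a fairer comparison between rating systems.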

Figures

Figures reproduced from arXiv: 2606.26267 by Tianming Yang, Tianyuan Zhou, Zhizheng Fu.

**Figure 2.** Elo and DD-Elo rating trajectories for a representative player. During …

**Figure 3.** Distribution of Area Improvement Percentage (AIP), Directional Ac…

**Figure 4.** The result of Standard IC. … 0.02 as the threshold for significance. An IC value surpassing 0.02 implies that the factor contains significant predictive information regarding the outcome. Standard IC: Direct Prediction. Our first application, denoted as Standard IC, applies the IC formula directly to rating changes. The DD-Elo correction term Δt is treated as the predictive factor (X), while the realized outco…
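The IC computation described in the Figure 4 caption can be sketched as a correlation between the factor X and the realized outcome. The caption does not say which correlation the paper uses, so Pearson correlation is an assumption here, and the toy data and the names `information_coefficient`, `delta`, and `realized` are hypothetical. Only the 0.02 threshold comes from the caption.

```python
import math


def information_coefficient(factor, outcome):
    """Pearson correlation between a predictive factor and a realized outcome."""
    n = len(factor)
    if n != len(outcome) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(factor) / n
    my = sum(outcome) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(factor, outcome))
    vx = sum((x - mx) ** 2 for x in factor)
    vy = sum((y - my) ** 2 for y in outcome)
    if vx == 0 or vy == 0:
        raise ValueError("IC is undefined for a constant series")
    return cov / math.sqrt(vx * vy)


IC_THRESHOLD = 0.02  # significance threshold quoted in the Figure 4 caption

# Hypothetical toy data: correction terms and the outcomes that followed them.
delta = [4.0, -2.0, 1.5, -3.0, 0.5]
realized = [1.0, 0.0, 1.0, 0.0, 0.5]
ic = information_coefficient(delta, realized)
print(f"IC = {ic:.3f}, above threshold: {ic > IC_THRESHOLD}")
```

With only five points the estimate is far too noisy to mean anything; the threshold only becomes informative on samples large enough for the correlation to be stable.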

Original abstract

Rating systems such as Elo serve as the gold standard for matchmaking in competitive chess. However, they inherently suffer from response lag due to their exclusive reliance on match outcomes, neglecting the granular quality of gameplay. Nevertheless, incorporating move-by-move information into rating adjustments presents a significant challenge given the substantial noise and the vastness of the game-state space. To address this, we propose the Drift-Diffusion-Enhanced Elo Rating System (DD-Elo), a novel skill assessment framework inspired by the drift diffusion model (DDM) from cognitive neuroscience. By modeling skill expression as a decision-making process, our model integrates move-level data to capture rapid skill fluctuations. We provide a rigorous mathematical derivation proving that DD-Elo maintains a bounded deviation from the traditional Elo system, ensuring theoretical alignment. Extensive experiments demonstrate that DD-Elo adapts to skill changes faster than Elo. Our findings suggest that DD-Elo offers an explainable, highly responsive, and backward-compatible solution for chess rating ecosystems. The implementation code is publicly available at https://github.com/Aquila-zhou1/DD-Elo.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

DD-Elo adds move-level drift-diffusion updates to Elo, but the adaptation advantage looks vulnerable to noise, and the bounded-deviation result cannot be assessed until the equations are checked.

The paper introduces DD-Elo, which applies a drift-diffusion model to update chess ratings after individual moves instead of waiting for full games. It claims this produces faster response to skill changes while keeping the ratings within a bounded distance of standard Elo, and the authors release code.

What stands out as new is the explicit fusion of DDM-style evidence accumulation with per-move Elo adjustments. That combination is not in the usual Elo variants. The abstract also does a clean job stating the compatibility goal and the motivation around response lag.

The soft spots sit mainly in the empirical and modeling assumptions. Chess move quality is noisy; any proxy for drift (engine scores, win probabilities) carries position-specific variance that can easily dominate the signal. If the diffusion term is large, the rating trajectories may simply become more volatile without converging faster to true skill. The stress-test note on this point holds up from the abstract alone. The bounded-deviation claim is presented as a rigorous derivation, but without the actual equations it is impossible to judge whether the result is independent or largely built into the modeling choices. Experiments are asserted to show faster adaptation, yet no details on controls, baselines, or noise levels are visible here.
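The noise concern can be made concrete with the standard drift-diffusion equation dx = v·dt + σ·dW. After time T the accumulated signal is v·T while the accumulated noise has standard deviation σ·√T, so the signal-to-noise ratio grows only as v·√T/σ. The sketch below computes that ratio and simulates a boundary-free path. The drift value 0.1, the 40-move horizon, and the σ values are arbitrary illustrative choices, not parameters from the paper.

```python
import math
import random


def simulate_ddm(drift, sigma, n_steps, dt=1.0, rng=None):
    """Euler-Maruyama path of dx = drift*dt + sigma*dW, without decision boundaries."""
    rng = rng or random.Random(0)
    x = 0.0
    for _ in range(n_steps):
        x += drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x


def snr(drift, sigma, n_steps, dt=1.0):
    """Accumulated signal (|drift|*T) divided by accumulated noise std (sigma*sqrt(T))."""
    t = n_steps * dt
    return abs(drift) * t / (sigma * math.sqrt(t))


# Same drift, increasing diffusion: the per-game signal gets buried quickly.
for sigma in (0.5, 2.0, 8.0):
    endpoint = simulate_ddm(0.1, sigma, 40, rng=random.Random(1))
    print(f"sigma={sigma}: SNR after 40 moves = {snr(0.1, sigma, 40):.2f}, "
          f"one sample endpoint = {endpoint:.2f}")
```

Because the ratio scales with √T, quadrupling the noise level requires sixteen times as many moves to recover the same confidence in the drift.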

This work is for people who maintain or extend rating systems in competitive games. A reader already working on fine-grained skill tracking could extract the core idea and the public code for testing. It deserves a serious referee because the claims are concrete and falsifiable, even though the current evidence level is low. I would send it to review but flag the noise-robustness question and the need to see the derivation in full.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes the Drift-Diffusion-Enhanced Elo (DD-Elo) rating system, which models chess skill expression via a drift-diffusion process to integrate move-level data into rating updates. It asserts a rigorous mathematical derivation proving bounded deviation from standard Elo and presents experiments showing faster adaptation to skill changes while remaining backward-compatible.

Significance. If the bounded-deviation result is non-tautological and the empirical adaptation gains survive realistic move-level noise, the approach could yield a more responsive rating system for chess and similar domains. Public code availability supports reproducibility and is a clear strength.

major comments (2)

[Abstract] The claim of a 'rigorous mathematical derivation proving that DD-Elo maintains a bounded deviation from the traditional Elo system' requires inspection of the update equations to determine whether the bound is an independent result or follows by construction from the choice of drift term that aligns with Elo.
[Experiments] The reported faster adaptation must be shown to persist when move-level proxies (engine eval, win probability, etc.) carry position-specific and evaluation noise; if the diffusion component dominates, trajectories may become more volatile without faster convergence to latent skill.

minor comments (1)

[Abstract] The phrase 'substantial noise and the vastness of the game-state space' is stated but not linked to a concrete mechanism by which the DDM is claimed to extract signal.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We respond to each major point below.

Referee: [Abstract] The claim of a 'rigorous mathematical derivation proving that DD-Elo maintains a bounded deviation from the traditional Elo system' requires inspection of the update equations to determine whether the bound is an independent result or follows by construction from the choice of drift term that aligns with Elo.

Authors: The bounded-deviation result is not tautological. The drift term is chosen to match Elo in expectation, yet the derivation establishes an explicit, path-independent bound on cumulative deviation that holds uniformly over any finite sequence of updates; this bound follows from the variance-control properties of the diffusion process rather than from the alignment alone. We will revise the abstract and add a clarifying remark after the main theorem to emphasize the non-trivial nature of the bound. Revision: partial.

Referee: [Experiments] The reported faster adaptation must be shown to persist when move-level proxies (engine eval, win probability, etc.) carry position-specific and evaluation noise; if the diffusion component dominates, trajectories may become more volatile without faster convergence to latent skill.

Authors: Our reported experiments already use real-game move-level data whose proxies contain natural position-specific and engine-evaluation noise. Nevertheless, to directly address the concern we will add controlled synthetic-noise experiments that inject calibrated position-dependent perturbations and verify that the adaptation-speed advantage is retained while volatility stays within the same bound derived in the theory. Revision: yes.
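One way such a synthetic-noise experiment could inject position-dependent perturbations is sketched below. This is a guess at the general shape, not the authors' procedure: the per-position `complexity` score, the linear scaling of noise with complexity, and the centipawn values are all hypothetical.

```python
import random


def perturb_evals(evals, complexity, base_sigma, seed=0):
    """Add zero-mean Gaussian noise whose std scales with a per-position complexity score."""
    if len(evals) != len(complexity):
        raise ValueError("evals and complexity must have the same length")
    rng = random.Random(seed)
    return [e + rng.gauss(0.0, base_sigma * c) for e, c in zip(evals, complexity)]


# Hypothetical engine evaluations (centipawns) and complexity scores in [0, 1].
evals = [35.0, -120.0, 10.0, 260.0]
complexity = [0.2, 0.9, 0.1, 0.6]

# Sweep the noise level; base_sigma = 0 reproduces the clean evaluations.
for base_sigma in (0.0, 25.0, 100.0):
    print(base_sigma, perturb_evals(evals, complexity, base_sigma))
```

Re-running the rating pipeline at each noise level and recording both adaptation speed and trajectory volatility would show where, if anywhere, the claimed advantage breaks down.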

Circularity Check

0 steps flagged

No significant circularity detected

The provided abstract and excerpts describe a DDM-inspired model with a claimed mathematical derivation for bounded deviation from Elo and separate experiments on adaptation speed. No equations, self-citations, or modeling steps are quoted that reduce the bounded-deviation result to a definitional identity, fitted parameter, or self-referential premise. The derivation is presented as an independent proof of alignment under the assumed dynamics rather than a tautology, and the central claims retain independent empirical content outside any self-referential structure.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the model is described only at the level of inspiration from DDM and a bounded-deviation guarantee.

