For How Long Should We Be Punching? Learning Action Duration in Fighting Games

Dennis J.N.J. Soemers; Hoang Hai Nguyen; Kurt Driessens

arxiv: 2605.20911 · v1 · pith:CSY737HBnew · submitted 2026-05-20 · 💻 cs.AI · cs.LG

Pith reviewed 2026-05-21 04:33 UTC · model grok-4.3

keywords: reinforcement learning · fighting games · action duration · frame skip · real-time decision making · policy learning · game AI

The pith

Reinforcement learning agents in fighting games learn both the action and how long to hold it instead of using fixed decision intervals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines an alternative to fixed frame skips in real-time fighting games where agents must decide both what to do and for how many frames to do it. By training policies that output both action and duration together, the agent can adjust its responsiveness based on the current situation rather than committing to one interval for the whole match. Tests against scripted opponents show that this learned timing reaches the same level of success as well-tuned fixed skips. The approach also pushes agents toward repeating the same move sequences, which works well against predictable bots but does not make them more robust overall. In practice the strongest results still come from policies that choose high frame skips most of the time.

Core claim

Jointly predicting action and duration lets agents reach win rates comparable to the best fixed frame-skip baselines while producing repeatable action sequences that exploit scripted opponents, although the learned policies still perform best when they default to consistently high frame skips and do not automatically gain robustness.

What carries the argument

Joint prediction of action and execution duration inside the reinforcement learning policy, allowing the agent to choose variable hold times rather than a single fixed interval.
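As a rough illustration of joint prediction with two policy heads, the sketch below samples an action index and a frame-skip value from two categorical distributions and sums their log-probabilities. It is a minimal sketch, not the authors' implementation: the logits, the three-action space, the skip options `[4, 8, 16]`, and the assumption that the two heads are sampled independently given the state are all hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert raw logits into probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_categorical(probs, rng):
    """Draw an index from a categorical distribution given its probabilities."""
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def sample_action_and_skip(action_logits, skip_logits, skip_values, rng):
    """Sample (action index, frame skip) from two heads.

    Assumption: the heads are independent given the state, so the joint
    log-probability is the sum of the per-head log-probabilities.
    """
    action_probs = softmax(action_logits)
    skip_probs = softmax(skip_logits)
    a = sample_categorical(action_probs, rng)
    k = sample_categorical(skip_probs, rng)
    log_prob = math.log(action_probs[a]) + math.log(skip_probs[k])
    return a, skip_values[k], log_prob


rng = random.Random(0)
action, skip, log_prob = sample_action_and_skip(
    action_logits=[0.1, 1.2, -0.3],  # hypothetical: 3 movement/attack combinations
    skip_logits=[0.0, 0.5, 1.0],     # hypothetical: one logit per skip option
    skip_values=[4, 8, 16],          # hypothetical frame-skip options
    rng=rng,
)
```

In a policy-gradient method, the summed log-probability would be the quantity entering the loss, so a single reward signal trains both the choice of move and the choice of hold time.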

If this is right

Learned duration policies match the win rates obtained with well-chosen fixed frame skips.
Agents converge on high frame-skip values for best performance, reducing decision frequency.
Duration learning promotes repeatable action patterns rather than varied responses.
Robustness against different opponents is not guaranteed by adding duration prediction alone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Duration learning may be more valuable against human opponents who change timing than against static scripted bots.
The method could reduce the need for manual frame-skip tuning if extended to other real-time game environments.
Pairing duration prediction with additional mechanisms might be required to obtain both performance and robustness.

Load-bearing premise

That performance against scripted built-in bots is a sufficient test for both the effectiveness and the robustness of the learned duration policies.

What would settle it

If agents using learned durations achieve lower win rates than a fixed high frame-skip baseline when matched against the same scripted bots or against more varied opponents, the claim that learned timing matches or improves on fixed intervals would be refuted.
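Whether one win rate is really lower than another depends on how many evaluation games back each figure. The sketch below computes a Wilson score interval for a win rate, one common choice for binomial proportions. The win counts are hypothetical placeholders, not results from the paper, and overlapping intervals are only a coarse indication that two agents cannot be separated at that sample size.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a win rate (z=1.96 gives roughly 95% coverage)."""
    if games <= 0:
        raise ValueError("games must be positive")
    p_hat = wins / games
    denom = 1.0 + z * z / games
    centre = (p_hat + z * z / (2 * games)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / games + z * z / (4 * games * games)
    )
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def intervals_overlap(a, b):
    """Return True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical counts for illustration only.
learned = wilson_interval(wins=80, games=100)
fixed = wilson_interval(wins=85, games=100)
cannot_separate = intervals_overlap(learned, fixed)
```

With samples of this size, the interval is wide enough that a difference of a few percentage points would likely not settle the question either way.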

Figures

Figures reproduced from arXiv: 2605.20911 by Dennis J.N.J. Soemers, Hoang Hai Nguyen, Kurt Driessens.

**Figure 1.** Policy architecture with separate heads. There is one policy head for combinations of movement and attack actions, and another for the selection of frame-skip.

**Figure 2.** Policy architecture with combined head. Every action is a combination of movement, attack type, and frame-skip value.

**Figure 3.** Average episodic reward (top) and episode length (bottom) for the Separated (4-16) training run, as functions of number of training steps.

**Figure 4.** Distribution of frame-skip choices as function of the number of training episodes, for the Separated (4-16) training run.

**Figure 5.** Distribution of button choices as function of the number of training episodes, for the Separated (4-16) training run. Note that some actions (e.g., defense + light punch) correspond to multiple simultaneous button presses.

**Figure 6.** Distribution of combo choices as function of the number of training episodes, for the Separated (4-16) training run.

**Figure 7.** Distribution of combo choices as function of training time (measured in number of training episodes), for the Sequential Finetuning training run (including the original Separated (4-16) training run against Ryu in the first part). The vertical dashed lines indicate points in time where we switch to a new opponent character for training.
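The difference between the separate-head and combined-head architectures in Figures 1 and 2 can be seen in how the action space is indexed. The sketch below flattens an (action, frame-skip) pair into a single index for a combined head and compares the number of policy outputs each design would need. The sizes `NUM_ACTIONS` and `SKIP_VALUES` are hypothetical, and the row-major encoding is one possible convention rather than the one used in the paper.

```python
NUM_ACTIONS = 5                # hypothetical number of movement/attack combinations
SKIP_VALUES = [4, 8, 12, 16]   # hypothetical frame-skip options


def encode(action_idx, skip_idx, num_skips):
    """Map an (action, skip) index pair to one flat index (row-major order)."""
    return action_idx * num_skips + skip_idx


def decode(flat_idx, num_skips):
    """Invert encode(): recover (action_idx, skip_idx) from a flat index."""
    return divmod(flat_idx, num_skips)


# A combined head needs one output per (action, skip) pair,
# while separate heads need one output per action plus one per skip option.
combined_outputs = NUM_ACTIONS * len(SKIP_VALUES)
separated_outputs = NUM_ACTIONS + len(SKIP_VALUES)

flat = encode(action_idx=3, skip_idx=2, num_skips=len(SKIP_VALUES))
action_idx, skip_idx = decode(flat, len(SKIP_VALUES))
chosen_skip = SKIP_VALUES[skip_idx]
```

The combined layout can represent any correlation between move and hold time at the cost of a larger output layer, while the separate layout stays small but cannot, on its own, condition the skip choice on the sampled move.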

Original abstract

Fighting games such as Street Fighter II present unique challenges to reinforcement learning (RL) agents due to their fast-paced, real-time nature. In most RL frameworks, agents are hard-coded to make decisions at a fixed interval, typically every frame or every N frames. Although this design ensures timely responses, it restricts the agent's ability to adjust its reaction timing. Acting every frame grants frame-perfect reflexes, which are unrealistic compared to human players, whereas longer fixed intervals reduce computational cost but hinder responsiveness. We consider an alternative decision-making framework in which the agent learns not only what action to take but also for how long to execute it. By jointly predicting both action and duration, the agent can dynamically adapt its responsiveness to different situations in the game. We implement this method using the open-source FightLadder environment with agents trained against scripted built-in bots, systematically testing different frame skip configurations to analyze their influence on performance, responsiveness, and learned behavior. Experiments show that learned timing can match the performance of well-chosen fixed frame skips and encourages repeatable action patterns, but does not ensure robustness on its own. In most cases, we see agents performing best with consistently high frame skip values (i.e., low responsiveness). This strategy makes it easier to learn exploitative strategies where the same action is repeated over and over, which the scripted bots appear to be susceptible to.
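The decision loop the abstract describes, where each decision is an action plus a duration, can be sketched as a helper that repeats one action for the chosen number of frames. This is a minimal sketch under stated assumptions: `ToyEnv` is a hypothetical stand-in with a simplified `step` signature, not the FightLadder API, and summing per-frame rewards over the hold is an assumed convention.

```python
class ToyEnv:
    """Hypothetical stand-in environment: fixed-length episode, reward 1.0 per frame."""

    def __init__(self, episode_frames=10):
        self.episode_frames = episode_frames
        self.frame = 0

    def reset(self):
        self.frame = 0
        return self.frame

    def step(self, action):
        # The action is ignored here; a real game would advance its state with it.
        self.frame += 1
        done = self.frame >= self.episode_frames
        return self.frame, 1.0, done


def hold_action(env, action, duration):
    """Repeat `action` for up to `duration` frames, summing the rewards.

    Stops early if the episode ends. Returns the last observation, the
    summed reward, the done flag, and the number of frames executed.
    """
    if duration < 1:
        raise ValueError("duration must be at least 1 frame")
    total_reward = 0.0
    obs, done, frames = None, False, 0
    for _ in range(duration):
        obs, reward, done = env.step(action)
        total_reward += reward
        frames += 1
        if done:
            break
    return obs, total_reward, done, frames


env = ToyEnv(episode_frames=10)
env.reset()
first = hold_action(env, action="punch", duration=4)
second = hold_action(env, action="kick", duration=8)  # cut short by episode end
```

A fixed frame-skip agent corresponds to calling `hold_action` with the same `duration` at every decision point, whereas the adaptive agent supplies a duration chosen by its policy each time.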

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows that jointly learning action and duration in fighting-game RL can match fixed frame-skip performance and favors repeatable patterns, but the scripted-bot testbed makes it hard to tell whether this reflects real timing adaptation.

The main thing to know is that this work tests whether an RL agent in a fighting game can predict both its action and how long to hold it, rather than sticking to a fixed decision interval. In the FightLadder environment the learned-duration agents reach performance levels comparable to well-tuned fixed skips and settle into repeatable action sequences, especially at high skip values that lower responsiveness.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a reinforcement learning approach for fighting games where agents jointly learn both the action to perform and its execution duration (frame skip) rather than relying on fixed intervals. Implemented in the FightLadder environment and evaluated against scripted built-in bots, the work systematically varies frame-skip configurations and concludes that learned timing achieves performance comparable to well-chosen fixed skips, promotes repeatable action patterns, yet does not ensure robustness, with agents performing best under consistently high frame skips.

Significance. If the empirical results hold under stronger evaluation, the approach could enable more efficient and adaptive decision-making in real-time game environments, reducing computational demands while approximating human-like timing variability. The use of an open-source testbed supports potential reproducibility, and the comparison to fixed baselines provides concrete insight into the trade-offs of learned versus static responsiveness.

major comments (2)

Abstract: The abstract states that experiments were performed and reports qualitative outcomes, but provides no quantitative results, statistical tests, ablation details, or error bars; therefore the data support for the stated claims cannot be verified from the given text.
Experiments section: The evaluation relies exclusively on scripted built-in bots, which the abstract acknowledges are susceptible to repeated actions. This creates a risk that the observed preference for high frame skips and repeatable patterns simply reflects optimization against a narrow, non-adaptive opponent rather than genuine dynamic timing learning, undermining the robustness conclusions.

minor comments (2)

Abstract: The qualifier 'in most cases' is imprecise; specify the exact frame-skip values, number of runs, or conditions under which high skips were optimal.
Introduction: Consider adding a short reference to prior RL work on variable action durations or hierarchical policies to better situate the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each major comment point by point below, indicating where revisions will be made.

Referee: Abstract: The abstract states that experiments were performed and reports qualitative outcomes, but provides no quantitative results, statistical tests, ablation details, or error bars; therefore the data support for the stated claims cannot be verified from the given text.

Authors: We agree that the abstract would benefit from the inclusion of quantitative results to support the claims. In the revised manuscript, we will update the abstract to include key quantitative findings from our experiments, such as specific performance metrics comparing learned action durations to fixed frame skips.
Revision: yes

Referee: Experiments section: The evaluation relies exclusively on scripted built-in bots, which the abstract acknowledges are susceptible to repeated actions. This creates a risk that the observed preference for high frame skips and repeatable patterns simply reflects optimization against a narrow, non-adaptive opponent rather than genuine dynamic timing learning, undermining the robustness conclusions.

Authors: We acknowledge this limitation in our evaluation setup. The use of scripted bots was to facilitate systematic and reproducible testing within the FightLadder environment. The manuscript already points out the bots' vulnerability to repeatable patterns. We will expand the discussion to highlight this as a current limitation and suggest directions for future work with more robust opponents to strengthen the robustness claims.
Revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical RL evaluation

The paper reports experimental results from training RL agents to jointly predict actions and durations in the FightLadder environment, comparing them to fixed frame-skip baselines against scripted bots. No mathematical derivations, equations, or first-principles claims are present that could reduce outputs to inputs by construction. All performance, repeatability, and robustness observations are direct empirical measurements rather than fitted parameters renamed as predictions or self-referential definitions. The work is therefore self-contained with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central empirical claim rests on the assumption that the FightLadder simulator faithfully reproduces fighting-game dynamics and that scripted bots provide a meaningful benchmark for both performance and robustness.

axioms (2)

Domain assumption: The FightLadder environment accurately models the real-time dynamics and state transitions of fighting games for the purpose of RL training. All training and evaluation occur inside this simulator.
Domain assumption: Performance against scripted built-in bots is a valid proxy for general agent capability and robustness. The paper reports results exclusively from matches against these bots.


work page 2024

[17] [17]

Ahmed and N

Z. Ahmed and N. Le Roux and M. Norouzi and D. Schuurmans. Understanding the Impact of Entropy on Policy Optimization. Proceedings of the 36th International Conference on Machine Learning. 2019

work page 2019

[18] [18]

Ahn and D

M. Ahn and D. Dwibedi and C. Finn and M. Gonzalez Arenas and K. Gopalakrishnan and K. Hausman and B. Ichter and A. Irpan and N. Joshi and R. Julian and S. Kirmani and I. Leal and E. Lee and S. Levine and Y. Lu and S. Maddineni and K. Rao and D. Sadigh and P. Sanketi and P. Sermanet and Q. Vuong and S. Welker and F. Xia and T. Xiao and P. Xu and S. Xu and ...

work page 2024

[19] [19]

2007 , publisher=

Lessons in Play: An Introduction to Combinatorial Game Theory , author=. 2007 , publisher=

work page 2007

[20] [20]

Alharin and T.-N

A. Alharin and T.-N. Doan and M. Sartipi. Reinforcement Learning Interpretation Methods: A Survey. IEEE Access. 2020

work page 2020

[21] [21]

Aljalbout and N

E. Aljalbout and N. Sotirakis and P. van der Smagt and M. Karl and N. Chen. LIMT : Language-Informed Multi-Task Visual World Models. 2024

work page 2024

[22] [22]

Allen and K

C. Allen and K. Asadi and M. Roderick and A. Mohamed and G. Konidaris and M. Littman. Mean Actor Critic. 2018

work page 2018

[23] [23]

L. V. Allis and M. van der Meulen and H. J. van den Herik. Proof-number Search. Artificial Intelligence. 1994

work page 1994

[24] [24]

Alth\"ofer, Ingo , Institution =

work page

[25] [25]

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems , pages =

Andersen, Erik and O'Rourke, Eleanor and Liu, Yun-En and Snider, Rich and Lowdermilk, Jeff and Truong, David and Cooper, Seth and Popovic, Zoran , title =. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems , pages =. 2012 , publisher =

work page 2012

[26] [26]

Andreas and D

J. Andreas and D. Klein and S. Levine. Modular Multitask Reinforcement Learning with Policy Sketches. Proceedings of the 34th International Conference on Machine Learning. 2017

work page 2017

[27] [27]

Andrychowicz and A

M. Andrychowicz and A. Raichuk and P. Sta \'n' czyk and M. Orsini and S. Girgin and R. Marinier and L. Hussenot and M. Geist and O. Pietquin and M. Michalski and S. Gelly and O. Bachem. What Matters for On-Policy Deep Actor-Critic Methods? A Large-Scale Study. 2021 International Conference on Learning Representations. 2021

work page 2021

[28] [28]

Anthony and Z

T. Anthony and Z. Tian and D. Barber. Thinking Fast and Slow with Deep Learning and Tree Search. Advances in Neural Information Processing Systems 30. 2017

work page 2017

[29] [29]

Apeldoorn and V

D. Apeldoorn and V. Volz. Measuring Strategic Depth in Games Using Hierarchical Knowledge Bases. Proceedings of the 2017 IEEE Conference on Computational Intelligence and Games. 2017

work page 2017

[30] [30]

Araki and K

N. Araki and K. Yoshida and Y. Tsuruoka and J. Tsujii. Move Prediction in G o with the Maximum Entropy Method. Proceedings of the 2007 IEEE Symposium on Computational Intelligence and Games. 2007

work page 2007

[31] [31]

Aram and G

M. Aram and G. Neumann. Multilayered analysis of co-development of business information systems. Journal of Internet Services and Applications. 2015

work page 2015

[32] [32]

M. Ascher. M u T orere: An analysis of a M aori game. Mathematics Magazine. 1987

work page 1987

[33] [33]

COLT 2010 - The 23rd Conference on Learning Theory , pages=

Best Arm Identification in Multi-Armed Bandits , author=. COLT 2010 - The 23rd Conference on Learning Theory , pages=

work page 2010

[34] [34]

Auer and N

P. Auer and N. Cesa-Bianchi and P. Fischer. Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning. 2002

work page 2002

[35] [35]

Back and D

T. Back and D. B. Fogel and Z. Michalewicz , title =. 1997 , publisher =

work page 1997

[36] [36]

Banerjee and P

B. Banerjee and P. Stone , title =. 2007

work page 2007

[37] [37]

Baier and P

H. Baier and P. D. Drake. The Power of Forgetting: Improving the Last-Good-Reply Policy in M onte C arlo G o. IEEE Transactions on Computational Intelligence and AI in Games. 2010

work page 2010

[38] [38]

Baier and P

H. Baier and P. D. Drake. The Power of Forgetting: Improving the Last-Good-Reply Policy in M onte C arlo G o. IEEE Trans. Comput. Intell. AI Games. 2010

work page 2010

[39] [39]

Baier and M

H. Baier and M. H. M. Winands. Monte C arlo Tree Search and Minimax Hybrids with Heuristic Evaluation Functions. Computer Games. 2014

work page 2014

[40] [40]

Baier and M

H. Baier and M. H. M. Winands. MCTS-M inimax Hybrids. IEEE Transactions on Computational Intelligence and AI in Games. 2015

work page 2015

[41] [41]

Baier and M

H. Baier and M. H. M. Winands. Time Management for M onte C arlo Tree Search. IEEE Transactions on Computational Intelligence and AI in Games. 2015

work page 2015

[42] [42]

Baier and M

H. Baier and M. H. M. Winands. MCTS-M inimax Hybrids with State Evaluations. Journal of Artificial Intelligence Research. 2018

work page 2018

[43] [43]

Baier and M

H. Baier and M. Kaisers. Explainable Search. 2020 IJCAI-PRICAI Workshop on Explainable Artificial Intelligence. 2020

work page 2020

[44] [44]

Baier and M

H. Baier and M. Kaisers. Towards Explainable MCTS. 2021 AAAI Workshop on Explainable Agency in AI. 2021

work page 2021

[45] [45]

Bamford and S

C. Bamford and S. Huang and S. Lucas , year=. Griddly: A platform for

work page

[46] [46]

Bansal and J

T. Bansal and J. Pachocki and S. Sidor and I. Sutskever and I. Mordatch. Emergent Complexity via Multi-Agent Competition. International Conference on Learning Representations (ICLR 2018). 2018

work page 2018

[47] [47]

Bard and J

N. Bard and J. N. Foerster and S. Chandar and N. Burch and M. Lanctot and H. F. Song and E. Parisotto and V. Dumoulin and S. Moitra and E. Hughes and I. Dunning and S. Mourad and H. Larochelle and M. G. Bellemare and M. Bowling. The H anabi challenge: A new frontier for AI research. Artificial Intelligence. 2020

work page 2020

[48] [48]

Barman and Z

D. Barman and Z. Guo and O. Conlan. The Dark Side of Language Models: Exploring the Potential of LLM s in Multimedia Disinformation Generation and Dissemination. Machine Learning with Applications. 2024

work page 2024

[49] [49]

Bartz-Beielstein and C

T. Bartz-Beielstein and C. Doerr and D. van den Berg and J. Bossek and S. Chandrasekaran and T. Eftimov and A. Fischbach and P. Kerschke and W. La Cava and M. L \'o pez-Ib \'a \ n ez and K. M. Malan and J. H. Moore and B. Naujoks and P. Orzechowski and V. Volz and M. Wagner and T. Weise. Benchmarking in Optimization: Best Practice and Open Issues. 2020

work page 2020

[50] [50]

Beal and M

D. Beal and M. Clarke. The Construction of Economical and Correct Algorithms for King and Pawn against King. Advances in Computer Chess 2. 1980

work page 1980

[51] [51]

Beal , journal=

D. Beal , journal=. 1990 , volume=

work page 1990

[52] [52]

Beck and R

J. Beck and R. Vuorio and E. Z. Liu and Z. Xiong and L. Zintgraf and C. Finn and S. Whiteson. A Survey of Meta-Reinforcement Learning. 2023

work page 2023

[53] [53]

Journal of Artificial Intelligence Research , volume = 47, number = 1, pages =

The Arcade Learning Environment: An Evaluation Platform for General Agents , author =. Journal of Artificial Intelligence Research , volume = 47, number = 1, pages =

work page

[54] [54]

M. G. Bellemare and S. Candido and P. S. Castro and J. Gong and M. C. Machado and S. Moitra and S. S. Ponda and Z. Wang. Autonomous navigation of stratospheric balloons using reinforcement learning. Nature. 2020

work page 2020

[55] [55]

R. Bellman. Dynamic Programming. 1957

work page 1957

[56] [56]

R. Bellman. An Introduction to Artificial Intelligence: Can Computers Think?. 1978

work page 1978

[57] [57]

Bettini and R

M. Bettini and R. Kortvelesy and J. Blumenkamp and A. Prorok , year =. Proceedings of the 16th International Symposium on Distributed Autonomous Robotic Systems , publisher =

work page

[58] [58]

Benjamins and T

C. Benjamins and T. Eimer and F. Schubert and A. Mohan and S. D \"o hler and A. Biedenkapp and B. Rosenhahn and F. Hutter and M. Lindauer. Contextualize Me – The Case for Context in Reinforcement Learning. Transactions on Machine Learning Research. 2023

work page 2023

[59] [59]

S. G. Bennett. The Adventures of S ir G alahad. 1949

work page 1949

[60] [60]

Beyer and P

L. Beyer and P. Izmailov and A. Kolesnikov and M. Caron and S. Kornblith and X. Zhai and M. Minderer and M. Tschannen and I. Alabdulmohsin and F. Pavetic. FlexiViT : One Model for All Patch Sizes. Proceedings of the 2023 Conference on Computer Vision and Pattern Recognition (CVPR). 2023

work page 2023

[61] [61]

IEEE Transactions on Games

CadiaPlayer: A simulation-based general game player , author =. IEEE Transactions on Games

work page

[62] [62]

Proceedings of the Twentieth European Conference on Artificial Intelligence , pages =

Learning Rules of Simplified Boardgames by Observing , author =. Proceedings of the Twentieth European Conference on Artificial Intelligence , pages =

work page

[63] [63]

Handbook of Digital Games and Entertainment Technologies , publisher =

General Game Playing , author =. Handbook of Digital Games and Entertainment Technologies , publisher =

work page

[64] [64]

Blili-Hamelin and C

B. Blili-Hamelin and C. Graziul and L. Hancox-Li and H. Hazan and E.-M. El-Mhamdi and A. Ghosh and K. Heller and J. Metcalf and F. Murai and E. Salvaggio and A. Smart and T. Snider and M. Tighanimine and T. Ringer and M. Mitchell and S. Dori-Hacohen. Position: Stop treating ` AGI ' as the north-star goal of AI research. Proceedings of the 42nd Internation...

work page 2025

[65] [65]

Blattmann and T

A. Blattmann and T. Dockhorn and S. Kulal and D. Mendelevitch and M. Kilian and D. Lorenz and Y. Levi and Z. English anda V. Voleti and A. Letts and V. Jampani and R. Rombach. Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets. 2023

work page 2023

[66] [66]

J. Bloch. Effective Java. 2008

work page 2008

[67] [67]

2021 , howpublished=

On the Opportunities and Risks of Foundation Models , author=. 2021 , howpublished=

work page 2021

[68] [68]

Bonnet and D

C. Bonnet and D. Luo and D. Byrne and S. Surana and S. Abramowitz and P. Duckworth and V. Coyette and L. I. Midgley and E. Tegegn and T. Kalloniatis and O. Mahjoub and M. Macfarlane and A. P. Smit and N. Grinsztajn and R. Bolge and C. N. Waters and M. A. Mimouni and U. A. Mbou Sob and R. de Kock and S. Singh and D. Furelos-Blanco and V. Le and A. Pretoriu...

work page

[69] [69]

AIIDE , year=

Matching Games and Algorithms for General Video Game Playing , author=. AIIDE , year=

work page

[70] [70]

A. Borvo. Anatomie D'un Jeu de Cartes: L'Aluette ou le Jeu de Vache. 1977

work page 1977

[71] [71]

Bou Ammar

H. Bou Ammar. Automated Transfer in Reinforcement Learning. 2013

work page 2013

[72] [72]

Bou Ammar and E

H. Bou Ammar and E. Eaton and P. Ruvolo and M. E. Taylor. Online Multi-Task Learning for Policy Gradient Methods. Proceedings of the 31st International Conference on Machine Learning. 2014

work page 2014

[73] [73]

Bou Ammar and E

H. Bou Ammar and E. Eaton and M. E. Taylor and D. C. Mocanu and K. Driessens and G. Weiss and K. Tuyls. An Automated Measure of MDP Similarity for Transfer in Reinforcement Learning. Proceedings of the Interactive Systems Workshop at the American Association of Artificial Intelligence (AAAI). 2014

work page 2014

[74] [74]

Bou Ammar and S

H. Bou Ammar and S. Chen and K. Tuyls and G. Weiss. Automated Transfer for Reinforcement Learning Tasks. K \"u nstliche Intelligenz. 2014

work page 2014

[75] [75]

Accounting for Variance in Machine Learning Benchmarks , volume =

Bouthillier, Xavier and Delaunay, Pierre and Bronzi, Mirko and Trofimov, Assya and Nichyporuk, Brennan and Szeto, Justin and Mohammadi Sepahvand, Nazanin and Raff, Edward and Madan, Kanika and Voleti, Vikram and Ebrahimi Kahou, Samira and Michalski, Vincent and Arbel, Tal and Pal, Chris and Varoquaux, Gael and Vincent, Pascal , booktitle =. Accounting for...

work page

[76] [76]

Bouzy and G

B. Bouzy and G. Chaslot. Bayesian Generation and Integration of K -Nearest-Neighbor Patterns for 19x19 G o. Proceedings of the 2005 IEEE Symposium on Computational Intelligence in Games. 2005

work page 2005

[77] [77]

B. Bouzy. Associating domain-dependent knowledge and M onte C arlo approaches within a G o program. Information Sciences. 2005

work page 2005

[78] [78]

Heads-Up Limit Hold

Bowling, Michael and Burch, Neil and Johanson, Michael and Tammelin, Oskari , year = 2015, journal =. Heads-Up Limit Hold

work page 2015

[79] [79]

Bradbury and R

J. Bradbury and R. Frostig and P. Hawkins and M. J. Johnson and C. Leary and D. Maclaurin and G. Necula and A. Paszke and J. Vander

work page

[80] [80]

S. R. K. Branavan and D. Silver and R. Barzilay. Learning to Win by Reading Manuals in a M onte- C arlo Framework. Journal of Artificial Intelligence Research. 2012

work page 2012