Competitor-aware Race Management for Electric Endurance Racing
Pith reviewed 2026-05-14 21:56 UTC · model grok-4.3
The pith
In electric endurance racing, exploiting aerodynamic drafting behind competitors, rather than running a solo minimum-time strategy, is what decides the winner.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Race-winning policies in electric endurance racing require a bi-level setup that jointly governs low-level driver inputs and high-level strategic decisions such as energy management and charging. The lower level solves a multi-agent game-theoretic optimal control problem over single laps, capturing aerodynamic effects and asymmetric collision-avoidance constraints. The upper level trains reinforcement learning agents on this environment to allocate battery energy and schedule pit stops over many laps, as demonstrated in a two-agent, 45-lap simulation where position-prioritizing strategies differ fundamentally from single-agent minimum-time ones.
What carries the argument
The bi-level framework combining multi-agent game-theoretic optimal control for single-lap interactions with reinforcement learning for race-long energy and pit management.
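To make the architecture concrete, here is a minimal sketch of how the two levels might compose, assuming a gym-style interface. Every name (RaceEnv, solve_lap_game) and every constant is hypothetical; the paper's actual models and interfaces are not reproduced on this page.

```python
# Minimal sketch of the bi-level coupling (all names and numbers hypothetical).
# Lower level: a single-lap multi-agent game-theoretic OCP, crudely surrogated.
# Upper level: an RL environment in which one step = one lap of the race.
from dataclasses import dataclass

@dataclass
class LapOutcome:
    lap_time: float     # ego lap time for this lap [s]
    energy_used: float  # battery energy consumed this lap [normalized]
    position_gain: int  # places gained (+) or lost (-) during the lap

def solve_lap_game(energy_budget, pit_this_lap):
    """Stand-in for the lower-level multi-agent game-theoretic OCP (which the
    paper solves with aero and asymmetric collision constraints). Here: a
    crude surrogate where spending more energy buys lap time."""
    lap_time = 90.0 - 10.0 * energy_budget + (25.0 if pit_this_lap else 0.0)
    return LapOutcome(lap_time, energy_used=energy_budget, position_gain=0)

class RaceEnv:
    """Upper level: actions are strategic decisions (energy budget, pit flag)."""
    def __init__(self, total_laps=45, battery_capacity=1.0):
        self.total_laps = total_laps
        self.battery_capacity = battery_capacity

    def reset(self):
        self.lap, self.soc, self.position = 0, self.battery_capacity, 2
        return (self.lap, self.soc, self.position)

    def step(self, action):
        energy_budget, pit = action
        out = solve_lap_game(energy_budget, pit)
        recharge = 0.3 * self.battery_capacity if pit else 0.0  # hypothetical
        self.soc = min(self.battery_capacity, self.soc - out.energy_used + recharge)
        self.position -= out.position_gain
        self.lap += 1
        done = self.lap >= self.total_laps
        reward = -float(self.position) if done else 0.0  # finishing position
        return (self.lap, self.soc, self.position), reward, done, {}

# Usage: roll out a fixed strategy for one 45-lap race.
env = RaceEnv()
obs, done = env.reset(), False
while not done:
    obs, reward, done, _ = env.step((0.02, obs[1] < 0.1))  # pit when SoC is low
```

The paper's reference list points to PPO and MATLAB's Reinforcement Learning Toolbox as the upper-level learner; any policy-gradient method could drive an interface of this shape.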
If this is right
- Effective exploitation of aerodynamic interactions is decisive for race outcome.
- Strategies prioritizing finishing position differ fundamentally from single-agent minimum-time approaches.
- Joint governance of low-level inputs and high-level decisions like charging becomes necessary.
- The single-lap multi-agent problem serves as the environment for training long-horizon policies.
Where Pith is reading between the lines
- Similar bi-level methods could apply to other multi-vehicle energy systems like truck platooning.
- Extending to more agents might reveal emergent cooperative or competitive behaviors.
- Validating transfer from simulation to real tracks would test the framework's practical value.
- Rule makers in motorsport could use such models to adjust energy limits or safety constraints.
Load-bearing premise
The simulated aerodynamic interactions and collision-avoidance constraints accurately represent real-world motorsport physics and rules so that policies transfer to actual races.
What would settle it
Running the trained policies on physical electric race cars in a multi-car endurance event and observing whether the multi-agent strategy achieves better finishing positions than single-agent alternatives under real aerodynamic conditions.
Original abstract
Electric endurance racing is characterized by severe energy constraints and strong aerodynamic interactions. Determining race-winning policies therefore becomes a fundamentally multi-agent, game-theoretic problem. These policies must jointly govern low-level driver inputs as well as high-level strategic decisions, including energy management and charging. This paper proposes a bi-level framework for competitor-aware race management that combines game-theoretic optimal control with reinforcement learning. At the lower level, a multi-agent game-theoretic optimal control problem is solved to capture aerodynamic effects and asymmetric collision-avoidance constraints inspired by motorsport rules. Using this single-lap problem as the environment, reinforcement learning agents are trained to allocate battery energy and schedule pit stops over an entire race. The framework is demonstrated in a two-agent, 45-lap simulated race. The results show that effective exploitation of aerodynamic interactions is decisive for race outcome, with strategies that prioritize finishing position differing fundamentally from single-agent, minimum-time approaches.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a bi-level framework for competitor-aware race management in electric endurance racing. The lower level solves a multi-agent game-theoretic optimal control problem to capture aerodynamic drag/wake effects and asymmetric collision-avoidance constraints for single-lap planning; the upper level uses reinforcement learning to optimize battery energy allocation and pit-stop scheduling over a full race. The approach is demonstrated in a two-agent 45-lap simulation, with the central claim that drafting-aware strategies differ fundamentally from single-agent minimum-time policies and that effective exploitation of aerodynamic interactions is decisive for race outcome.
Significance. If the simulation model is shown to be sufficiently representative of real aerodynamic coefficients, wake effects, and motorsport rules, the bi-level construction would offer a principled way to handle hierarchical multi-agent decisions under energy constraints. The explicit separation of single-lap game-theoretic interactions from multi-lap RL strategy is a clean architectural contribution that could be extended to other energy-limited competitive settings.
Major comments (2)
- [Results / Simulation Setup] The headline assertion that aerodynamic exploitation 'is decisive for race outcome' (abstract and results) rests on the fidelity of the simulated aero interactions and collision constraints; no wind-tunnel validation, telemetry comparison, or sensitivity analysis on drag/wake coefficients is reported, so the transfer from simulation ranking to real-world decisiveness remains unsupported.
- [Method / Bi-level Framework] The demonstration is limited to a two-agent, 45-lap scenario; the computational cost and convergence behavior of repeatedly solving the game-theoretic OCP inside the RL loop for larger fields or longer races is not quantified, leaving open whether the framework scales to realistic endurance fields.
Minor comments (2)
- [Problem Formulation] Notation for the asymmetric collision constraints and the precise form of the aerodynamic interaction terms (e.g., the wake velocity-deficit model) should be stated explicitly rather than described only qualitatively; one illustrative parameterization is sketched after this list.
- [Results] The abstract states that strategies 'differ fundamentally' from single-agent minimum-time approaches; a quantitative metric (e.g., lap-time difference or energy-use delta) comparing the two policies would strengthen this claim.
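For concreteness, one common parameterization from the drafting literature, offered here as an illustrative assumption rather than the paper's stated model, writes the follower's drag coefficient with an exponentially decaying wake deficit and imposes collision avoidance asymmetrically on the trailing car:

```latex
% Illustrative parameterization (an assumption of this review, not the paper's
% stated model). Wake-induced drag reduction for the follower:
\[
  F_{\mathrm{drag},i} = \tfrac{1}{2}\,\rho\,A\,c_d(\Delta s_i)\,v_i^2,
  \qquad
  c_d(\Delta s_i) = c_{d,0}\bigl(1 - \kappa\,e^{-\Delta s_i/\lambda}\bigr),
\]
% where \Delta s_i >= 0 is the gap to the car ahead, \kappa \in (0,1) the
% maximum drag reduction, and \lambda the wake decay length. An asymmetric
% collision constraint then binds only the trailing car, mirroring motorsport
% rules that place avoidance responsibility on the attacking driver:
\[
  \lVert p_i(t) - p_j(t) \rVert \ge d_{\min}
  \quad \text{imposed on car } i \text{ only while } s_i(t) < s_j(t).
\]
```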
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on simulation fidelity and framework scalability. We address each major comment below with targeted revisions to the manuscript.
Point-by-point responses
Referee: [Results / Simulation Setup] The headline assertion that aerodynamic exploitation 'is decisive for race outcome' (abstract and results) rests on the fidelity of the simulated aero interactions and collision constraints; no wind-tunnel validation, telemetry comparison, or sensitivity analysis on drag/wake coefficients is reported, so the transfer from simulation ranking to real-world decisiveness remains unsupported.
Authors: We agree that the lack of wind-tunnel validation or telemetry comparison means the real-world decisiveness claim is not fully supported by the current manuscript. The aerodynamic model uses coefficients drawn from the published vehicle-dynamics literature. In revision we will add a sensitivity analysis that varies the drag-reduction and wake coefficients over a ±20% range around their nominal values and show that the qualitative superiority of drafting-aware policies is preserved. We will also revise the abstract and results text to qualify the 'decisive' claim as holding within the simulated environment. Revision: partial.
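The promised sweep is straightforward to set up. Below is a minimal sketch, assuming a hypothetical simulate_race hook into the authors' 45-lap simulator; the nominal values and the trivial surrogate body are illustrative assumptions, not the paper's numbers.

```python
# Sketch of the promised +/-20% sensitivity sweep over drafting coefficients.
import itertools

KAPPA_NOM, LAMBDA_NOM = 0.3, 30.0  # assumed nominal drag reduction / wake decay length [m]

def simulate_race(kappa, wake_length):
    """Hypothetical hook: run the two-agent, 45-lap race and return the
    drafting-aware policy's finishing position (1 = win)."""
    return 1 if kappa * wake_length > 0 else 2  # trivial placeholder

scales = [0.8, 0.9, 1.0, 1.1, 1.2]  # +/-20% around nominal values
for sk, sl in itertools.product(scales, scales):
    pos = simulate_race(KAPPA_NOM * sk, LAMBDA_NOM * sl)
    print(f"kappa x{sk:.1f}, lambda x{sl:.1f} -> finish P{pos}")
```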
Referee: [Method / Bi-level Framework] The demonstration is limited to a two-agent, 45-lap scenario; the computational cost and convergence behavior of repeatedly solving the game-theoretic OCP inside the RL loop for larger fields or longer races is not quantified, leaving open whether the framework scales to realistic endurance fields.
Authors: The two-agent, 45-lap case was chosen to keep the exposition focused on the bi-level coupling. We acknowledge that computational cost and scaling behavior are not quantified. In the revised manuscript we will report the average wall-clock time per lower-level OCP solve and per RL training episode, together with a brief complexity discussion (the size of the game-theoretic OCP grows with the number of agents). We will also note that for larger fields the lower level can be approximated by pairwise drafting models, which we flag as future work. Revision: partial.
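The pairwise approximation the authors flag is easy to make concrete. The sketch below, a hedged illustration rather than the paper's method, computes each car's drag coefficient from its nearest leading car only, which keeps the per-lap aerodynamic coupling linear in the number of cars instead of fully coupled.

```python
# Pairwise drafting approximation: each car's drag reduction comes from its
# nearest leading car only (all constants hypothetical).
import math

def pairwise_drag_coeff(positions, cd0=0.9, kappa=0.3, wake_length=30.0):
    """positions: track progress s of each car [m]. Returns per-car drag
    coefficients assuming only the nearest car ahead sheds a usable wake."""
    coeffs = []
    for i, s_i in enumerate(positions):
        gaps = [s_j - s_i for j, s_j in enumerate(positions) if j != i and s_j > s_i]
        if gaps:
            ds = min(gaps)  # gap to the nearest leading car
            coeffs.append(cd0 * (1.0 - kappa * math.exp(-ds / wake_length)))
        else:
            coeffs.append(cd0)  # the race leader runs in clean air
    return coeffs

# Example: four cars at different track positions.
print(pairwise_drag_coeff([0.0, 12.0, 15.0, 200.0]))
```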
Circularity Check
No significant circularity; the framework is a self-contained modeling construction.
Full rationale
The paper defines a bi-level architecture (game-theoretic single-lap OCP capturing aero drag/wake plus collision constraints, followed by RL for energy/pit allocation) and evaluates it inside a 2-agent 45-lap simulation. No equations, fitted parameters, or self-citations are shown that reduce the claimed outcome (drafting-aware policies differing from minimum-time policies) to a definition or input by construction. The central result is obtained by solving the stated optimization and learning problems; the transfer claim to real racing is an explicit modeling assumption rather than a derived equality. This is the normal case of an independent algorithmic proposal.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1] S. Smith, "Superb or tedious? The verdict on Formula E's peloton racing," Sep. [Online]. Available: https://www.the-race.com/formula-e/superb-or-tedious-the-verdict-on-formula-e-peloton-racing/
- [3] A. Heilmeier, A. Wischnewski, L. Hermansdorfer, J. Betz, M. Lienkamp, and B. Lohmann, "Minimum curvature trajectory planning and control for an autonomous race car," Vehicle System Dynamics, 2020.
- [4] S. Ebbesen, M. Salazar, P. Elbert, C. Bussi, and C. H. Onder, "Time-optimal control strategies for a hybrid electric race car," IEEE Transactions on Control Systems Technology, vol. 26, no. 1, pp. 233–247, 2017.
- [5] M. Salazar, P. Elbert, S. Ebbesen, C. Bussi, and C. H. Onder, "Time-optimal control policy for a hybrid electric race car," IEEE Transactions on Control Systems Technology, vol. 25, no. 6, pp. 1921–1934, 2017.
- [6] O. Borsboom, C. A. Fahdzyana, T. Hofman, and M. Salazar, "A convex optimization framework for minimum lap time design and control of electric race cars," IEEE Transactions on Vehicular Technology, vol. 70, no. 9, pp. 8478–8489, 2021.
- [7] X. Liu, A. Fotouhi, and D. J. Auger, "Optimal energy management for Formula-E cars with regulatory limits and thermal constraints," Applied Energy, vol. 279, p. 115805, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0306261920312861
- [8] D. J. Limebeer and M. Massaro, Dynamics and Optimal Control of Road Vehicles. Oxford University Press, 2018.
- [9] C. Burger, J. Fischer, F. Bieder, Ö. Ş. Taş, and C. Stiller, "Interaction-aware game-theoretic motion planning for automated vehicles using bi-level optimization," in 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2022, pp. 3978–3985.
- [10] A. Dreves and M. Gerdts, "A generalized Nash equilibrium approach for optimal control problems of autonomous cars," Optimal Control Applications and Methods, vol. 39, no. 1, pp. 326–342, 2018.
- [11] G. Fieni, M.-P. Neumann, F. Furia, A. Caucino, A. Cerofolini, V. Ravaglioli, and C. H. Onder, "Game theory in Formula 1: Multi-agent physical and strategical interactions," arXiv preprint arXiv:2503.05421, 2025.
- [12] M. Wang, Z. Wang, J. Talbot, J. C. Gerdes, and M. Schwager, "Game-theoretic planning for self-driving cars in multivehicle competitive scenarios," IEEE Transactions on Robotics, vol. 37, no. 4, pp. 1313–1325, 2021.
- [13] P. R. Wurman, S. Barrett, K. Kawamoto, J. MacGlashan, K. Subramanian, T. J. Walsh, R. Capobianco, A. Devlic, F. Eckert, F. Fuchs et al., "Outracing champion Gran Turismo drivers with deep reinforcement learning," Nature, vol. 602, no. 7896, pp. 223–228, 2022.
- [14] A. Heilmeier, M. Graf, and M. Lienkamp, "A race simulation for strategy decisions in circuit motorsports," in 2018 21st International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2018, pp. 2986–2993.
- [15] J. Bekker and W. Lotz, "Planning Formula One race strategies using discrete-event simulation," Journal of the Operational Research Society, vol. 60, no. 7, pp. 952–961, 2009.
- [16] P. Duhr, D. Buccheri, C. Balerna, A. Cerofolini, and C. H. Onder, "Minimum-race-time energy allocation strategies for the hybrid-electric Formula 1 power unit," IEEE Transactions on Vehicular Technology, vol. 72, no. 6, pp. 7035–7050, 2023.
- [17] J. van Kampen, T. Herrmann, and M. Salazar, "Maximum-distance race strategies for a fully electric endurance race car," European Journal of Control, vol. 68, p. 100679, 2022.
- [18] X. Liu, A. Fotouhi, and D. J. Auger, "Formula-E race strategy development using distributed policy gradient reinforcement learning," Knowledge-Based Systems, vol. 216, p. 106781, 2021.
- [19] J. van Kampen, M. Moriggi, F. Braghin, and M. Salazar, "Model predictive control strategies for electric endurance race cars accounting for competitors' interactions," IEEE Control Systems Letters, 2024.
- [20] F. Aguad and C. Thraves, "Optimizing pit stop strategies in Formula 1 with dynamic programming and game theory," European Journal of Operational Research, vol. 319, no. 3, pp. 908–919, 2024.
- [21] T. Roughgarden, Best-Response Dynamics. Cambridge University Press, 2016, pp. 216–229.
- [22] I. Anagnostides and P. Penna, "Solving zero-sum games through alternating projections," arXiv preprint arXiv:2010.00109, 2020.
- [23] H. B. McMahan, G. J. Gordon, and A. Blum, "Planning in the presence of cost functions controlled by an adversary," in Proceedings of the 20th International Conference on Machine Learning (ICML-03), 2003, pp. 536–543.
- [24] X. Liu, A. Fotouhi, and D. Auger, "Formula-E multi-car race strategy development—a novel approach using reinforcement learning," IEEE Transactions on Intelligent Transportation Systems, vol. 25, no. 8, pp. 9524–9534, 2024.
- [25] H. B. Pacejka, Tire Characteristics and Vehicle Handling and Stability. Butterworth-Heinemann, Oxford, UK, 2012.
- [26]
- [27] L. Adam, R. Horčík, T. Kasl, and T. Kroupa, "Double oracle algorithm for computing equilibria in continuous games," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 6, 2021, pp. 5070–5077.
- [28] S. V. Albrecht, F. Christianos, and L. Schäfer, Multi-Agent Reinforcement Learning: Foundations and Modern Approaches. MIT Press, 2024.
- [29] A. Y. Ng, D. Harada, and S. Russell, "Policy invariance under reward transformations: Theory and application to reward shaping," in ICML, vol. 99. Citeseer, 1999, pp. 278–287.
- [30] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.
- [31] T. Bansal, J. Pachocki, S. Sidor, I. Sutskever, and I. Mordatch, "Emergent complexity via multi-agent competition," arXiv preprint arXiv:1710.03748, 2017.
- [32] S. Dohare, Q. Lan, and A. R. Mahmood, "Overcoming policy collapse in deep reinforcement learning," in Sixteenth European Workshop on Reinforcement Learning, 2023.
- [33]
- [34] "Competitor-aware race management for electric endurance racing - full lap," https://youtu.be/rUjys60b0j4?si=wiI53awyPaeB2y4r
- [35] "Competitor-aware race management for electric endurance racing - overtake," https://youtu.be/wwAB7gZU6tc?si=jxsCZ44Nr6gCIrv
- [36] J. A. E. Andersson, J. Gillis, G. Horn, J. B. Rawlings, and M. Diehl, "CasADi – A software framework for nonlinear optimization and optimal control," Mathematical Programming Computation, 2018.
- [37] A. Wächter and L. T. Biegler, "On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming," Mathematical Programming, vol. 106, no. 1, pp. 25–57, 2006.
- [38] The MathWorks, Inc., "Reinforcement Learning Toolbox version 25.2 (R2025b)," Natick, Massachusetts, United States, 2025. [Online]. Available: https://www.mathworks.com