Competitor-aware Race Management for Electric Endurance Racing
Pith reviewed 2026-05-14 21:56 UTC · model grok-4.3
The pith
In electric endurance racing, exploiting aerodynamic drafting behind competitors, rather than running a solo minimum-time strategy, is what decides the winner.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Race-winning policies in electric endurance racing require a bi-level setup that jointly governs low-level driver inputs and high-level strategic decisions such as energy management and charging. The lower level solves a multi-agent game-theoretic optimal control problem over single laps, capturing aerodynamic effects and asymmetric collision-avoidance constraints. The upper level trains reinforcement learning agents on this environment to allocate battery energy and schedule pit stops over many laps, as demonstrated in a two-agent, 45-lap simulation where position-prioritizing strategies differ fundamentally from single-agent minimum-time ones.
What carries the argument
The bi-level framework combining multi-agent game-theoretic optimal control for single-lap interactions with reinforcement learning for race-long energy and pit management.
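To make the architecture concrete, here is a minimal sketch of how the two levels might compose, assuming a gym-style interface. Every name (RaceEnv, solve_lap_game) and every constant is hypothetical; the paper's actual models and interfaces are not reproduced on this page.

```python
# Minimal sketch of the bi-level coupling (all names and numbers hypothetical).
# Lower level: a single-lap multi-agent game-theoretic OCP, crudely surrogated.
# Upper level: an RL environment in which one step = one lap of the race.
from dataclasses import dataclass

@dataclass
class LapOutcome:
    lap_time: float     # ego lap time for this lap [s]
    energy_used: float  # battery energy consumed this lap [normalized]
    position_gain: int  # places gained (+) or lost (-) during the lap

def solve_lap_game(energy_budget, pit_this_lap):
    """Stand-in for the lower-level multi-agent game-theoretic OCP (which the
    paper solves with aero and asymmetric collision constraints). Here: a
    crude surrogate where spending more energy buys lap time."""
    lap_time = 90.0 - 10.0 * energy_budget + (25.0 if pit_this_lap else 0.0)
    return LapOutcome(lap_time, energy_used=energy_budget, position_gain=0)

class RaceEnv:
    """Upper level: actions are strategic decisions (energy budget, pit flag)."""
    def __init__(self, total_laps=45, battery_capacity=1.0):
        self.total_laps = total_laps
        self.battery_capacity = battery_capacity

    def reset(self):
        self.lap, self.soc, self.position = 0, self.battery_capacity, 2
        return (self.lap, self.soc, self.position)

    def step(self, action):
        energy_budget, pit = action
        out = solve_lap_game(energy_budget, pit)
        recharge = 0.3 * self.battery_capacity if pit else 0.0  # hypothetical
        self.soc = min(self.battery_capacity, self.soc - out.energy_used + recharge)
        self.position -= out.position_gain
        self.lap += 1
        done = self.lap >= self.total_laps
        reward = -float(self.position) if done else 0.0  # finishing position
        return (self.lap, self.soc, self.position), reward, done, {}

# Usage: roll out a fixed strategy for one 45-lap race.
env = RaceEnv()
obs, done = env.reset(), False
while not done:
    obs, reward, done, _ = env.step((0.02, obs[1] < 0.1))  # pit when SoC is low
```

The paper's reference list points to PPO and MATLAB's Reinforcement Learning Toolbox as the upper-level learner; any policy-gradient method could drive an interface of this shape.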
If this is right
- Effective exploitation of aerodynamic interactions is decisive for race outcome.
- Strategies prioritizing finishing position differ fundamentally from single-agent minimum-time approaches.
- Joint governance of low-level inputs and high-level decisions like charging becomes necessary.
- The single-lap multi-agent problem serves as the environment for training long-horizon policies.
Where Pith is reading between the lines
- Similar bi-level methods could apply to other multi-vehicle energy systems like truck platooning.
- Extending to more agents might reveal emergent cooperative or competitive behaviors.
- Validating transfer from simulation to real tracks would test the framework's practical value.
- Rule makers in motorsport could use such models to adjust energy limits or safety constraints.
Load-bearing premise
The simulated aerodynamic interactions and collision-avoidance constraints accurately represent real-world motorsport physics and rules so that policies transfer to actual races.
What would settle it
Running the trained policies on physical electric race cars in a multi-car endurance event and observing whether the multi-agent strategy achieves better finishing positions than single-agent alternatives under real aerodynamic conditions.
Original abstract
Electric endurance racing is characterized by severe energy constraints and strong aerodynamic interactions. Determining race-winning policies therefore becomes a fundamentally multi-agent, game-theoretic problem. These policies must jointly govern low-level driver inputs as well as high-level strategic decisions, including energy management and charging. This paper proposes a bi-level framework for competitor-aware race management that combines game-theoretic optimal control with reinforcement learning. At the lower level, a multi-agent game-theoretic optimal control problem is solved to capture aerodynamic effects and asymmetric collision-avoidance constraints inspired by motorsport rules. Using this single-lap problem as the environment, reinforcement learning agents are trained to allocate battery energy and schedule pit stops over an entire race. The framework is demonstrated in a two-agent, 45-lap simulated race. The results show that effective exploitation of aerodynamic interactions is decisive for race outcome, with strategies that prioritize finishing position differing fundamentally from single-agent, minimum-time approaches.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a bi-level framework for competitor-aware race management in electric endurance racing. The lower level solves a multi-agent game-theoretic optimal control problem to capture aerodynamic drag/wake effects and asymmetric collision-avoidance constraints for single-lap planning; the upper level uses reinforcement learning to optimize battery energy allocation and pit-stop scheduling over a full race. The approach is demonstrated in a two-agent 45-lap simulation, with the central claim that drafting-aware strategies differ fundamentally from single-agent minimum-time policies and that effective exploitation of aerodynamic interactions is decisive for race outcome.
Significance. If the simulation model is shown to be sufficiently representative of real aerodynamic coefficients, wake effects, and motorsport rules, the bi-level construction would offer a principled way to handle hierarchical multi-agent decisions under energy constraints. The explicit separation of single-lap game-theoretic interactions from multi-lap RL strategy is a clean architectural contribution that could be extended to other energy-limited competitive settings.
Major comments (2)
- [Results / Simulation Setup] The headline assertion that aerodynamic exploitation 'is decisive for race outcome' (abstract and results) rests on the fidelity of the simulated aero interactions and collision constraints; no wind-tunnel validation, telemetry comparison, or sensitivity analysis on drag/wake coefficients is reported, so the transfer from simulation ranking to real-world decisiveness remains unsupported.
- [Method / Bi-level Framework] The demonstration is limited to a two-agent, 45-lap scenario; the computational cost and convergence behavior of repeatedly solving the game-theoretic OCP inside the RL loop for larger fields or longer races is not quantified, leaving open whether the framework scales to realistic endurance fields.
Minor comments (2)
- [Problem Formulation] Notation for the asymmetric collision constraints and the precise form of the aerodynamic interaction terms (e.g., the wake velocity-deficit model) should be stated explicitly rather than described only qualitatively; one illustrative parameterization is sketched after this list.
- [Results] The abstract states that strategies 'differ fundamentally' from single-agent minimum-time approaches; a quantitative metric (e.g., lap-time difference or energy-use delta) comparing the two policies would strengthen this claim.
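For concreteness, one common parameterization from the drafting literature, offered here as an illustrative assumption rather than the paper's stated model, writes the follower's drag coefficient with an exponentially decaying wake deficit and imposes collision avoidance asymmetrically on the trailing car:

```latex
% Illustrative parameterization (an assumption of this review, not the paper's
% stated model). Wake-induced drag reduction for the follower:
\[
  F_{\mathrm{drag},i} = \tfrac{1}{2}\,\rho\,A\,c_d(\Delta s_i)\,v_i^2,
  \qquad
  c_d(\Delta s_i) = c_{d,0}\bigl(1 - \kappa\,e^{-\Delta s_i/\lambda}\bigr),
\]
% where \Delta s_i >= 0 is the gap to the car ahead, \kappa \in (0,1) the
% maximum drag reduction, and \lambda the wake decay length. An asymmetric
% collision constraint then binds only the trailing car, mirroring motorsport
% rules that place avoidance responsibility on the attacking driver:
\[
  \lVert p_i(t) - p_j(t) \rVert \ge d_{\min}
  \quad \text{imposed on car } i \text{ only while } s_i(t) < s_j(t).
\]
```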
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on simulation fidelity and framework scalability. We address each major comment below with targeted revisions to the manuscript.
Point-by-point responses
Referee: [Results / Simulation Setup] The headline assertion that aerodynamic exploitation 'is decisive for race outcome' (abstract and results) rests on the fidelity of the simulated aero interactions and collision constraints; no wind-tunnel validation, telemetry comparison, or sensitivity analysis on drag/wake coefficients is reported, so the transfer from simulation ranking to real-world decisiveness remains unsupported.
Authors: We agree that the lack of wind-tunnel validation or telemetry comparison means the real-world decisiveness claim is not fully supported by the current manuscript. The aerodynamic model uses coefficients drawn from the published vehicle-dynamics literature. In revision we will add a sensitivity analysis that varies the drag-reduction and wake coefficients over a ±20% range around their nominal values and show that the qualitative superiority of drafting-aware policies is preserved. We will also revise the abstract and results text to qualify the 'decisive' claim as holding within the simulated environment. Revision: partial.
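The promised sweep is straightforward to set up. Below is a minimal sketch, assuming a hypothetical simulate_race hook into the authors' 45-lap simulator; the nominal values and the trivial surrogate body are illustrative assumptions, not the paper's numbers.

```python
# Sketch of the promised +/-20% sensitivity sweep over drafting coefficients.
import itertools

KAPPA_NOM, LAMBDA_NOM = 0.3, 30.0  # assumed nominal drag reduction / wake decay length [m]

def simulate_race(kappa, wake_length):
    """Hypothetical hook: run the two-agent, 45-lap race and return the
    drafting-aware policy's finishing position (1 = win)."""
    return 1 if kappa * wake_length > 0 else 2  # trivial placeholder

scales = [0.8, 0.9, 1.0, 1.1, 1.2]  # +/-20% around nominal values
for sk, sl in itertools.product(scales, scales):
    pos = simulate_race(KAPPA_NOM * sk, LAMBDA_NOM * sl)
    print(f"kappa x{sk:.1f}, lambda x{sl:.1f} -> finish P{pos}")
```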
Referee: [Method / Bi-level Framework] The demonstration is limited to a two-agent, 45-lap scenario; the computational cost and convergence behavior of repeatedly solving the game-theoretic OCP inside the RL loop for larger fields or longer races is not quantified, leaving open whether the framework scales to realistic endurance fields.
Authors: The two-agent, 45-lap case was chosen to keep the exposition focused on the bi-level coupling. We acknowledge that computational cost and scaling behavior are not quantified. In the revised manuscript we will report the average wall-clock time per lower-level OCP solve and per RL training episode, together with a brief complexity discussion (the size of the game-theoretic OCP grows with the number of agents). We will also note that for larger fields the lower level can be approximated by pairwise drafting models, which we flag as future work. Revision: partial.
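The pairwise approximation the authors flag is easy to make concrete. The sketch below, a hedged illustration rather than the paper's method, computes each car's drag coefficient from its nearest leading car only, which keeps the per-lap aerodynamic coupling linear in the number of cars instead of fully coupled.

```python
# Pairwise drafting approximation: each car's drag reduction comes from its
# nearest leading car only (all constants hypothetical).
import math

def pairwise_drag_coeff(positions, cd0=0.9, kappa=0.3, wake_length=30.0):
    """positions: track progress s of each car [m]. Returns per-car drag
    coefficients assuming only the nearest car ahead sheds a usable wake."""
    coeffs = []
    for i, s_i in enumerate(positions):
        gaps = [s_j - s_i for j, s_j in enumerate(positions) if j != i and s_j > s_i]
        if gaps:
            ds = min(gaps)  # gap to the nearest leading car
            coeffs.append(cd0 * (1.0 - kappa * math.exp(-ds / wake_length)))
        else:
            coeffs.append(cd0)  # the race leader runs in clean air
    return coeffs

# Example: four cars at different track positions.
print(pairwise_drag_coeff([0.0, 12.0, 15.0, 200.0]))
```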
Circularity Check
No significant circularity; the framework is a self-contained modeling construction.
Full rationale
The paper defines a bi-level architecture (game-theoretic single-lap OCP capturing aero drag/wake plus collision constraints, followed by RL for energy/pit allocation) and evaluates it inside a 2-agent 45-lap simulation. No equations, fitted parameters, or self-citations are shown that reduce the claimed outcome (drafting-aware policies differing from minimum-time policies) to a definition or input by construction. The central result is obtained by solving the stated optimization and learning problems; the transfer claim to real racing is an explicit modeling assumption rather than a derived equality. This is the normal case of an independent algorithmic proposal.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1] S. Smith, "Superb or tedious? The verdict on Formula E's peloton racing," Sep. [Online]. Available: https://www.the-race.com/formula-e/superb-or-tedious-the-verdict-on-formula-e-peloton-racing/
- [3] A. Heilmeier, A. Wischnewski, L. Hermansdorfer, J. Betz, M. Lienkamp, and B. Lohmann, "Minimum curvature trajectory planning and control for an autonomous race car," Vehicle System Dynamics, 2020.
- [4] S. Ebbesen, M. Salazar, P. Elbert, C. Bussi, and C. H. Onder, "Time-optimal control strategies for a hybrid electric race car," IEEE Transactions on Control Systems Technology, vol. 26, no. 1, pp. 233–247, 2017.
- [5] M. Salazar, P. Elbert, S. Ebbesen, C. Bussi, and C. H. Onder, "Time-optimal control policy for a hybrid electric race car," IEEE Transactions on Control Systems Technology, vol. 25, no. 6, pp. 1921–1934, 2017.
- [6] O. Borsboom, C. A. Fahdzyana, T. Hofman, and M. Salazar, "A convex optimization framework for minimum lap time design and control of electric race cars," IEEE Transactions on Vehicular Technology, vol. 70, no. 9, pp. 8478–8489, 2021.
- [7] X. Liu, A. Fotouhi, and D. J. Auger, "Optimal energy management for Formula-E cars with regulatory limits and thermal constraints," Applied Energy, vol. 279, p. 115805, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0306261920312861
- [8] D. J. Limebeer and M. Massaro, Dynamics and Optimal Control of Road Vehicles. Oxford University Press, 2018.
- [9] C. Burger, J. Fischer, F. Bieder, Ö. Ş. Taş, and C. Stiller, "Interaction-aware game-theoretic motion planning for automated vehicles using bi-level optimization," in 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2022, pp. 3978–3985.
- [10] A. Dreves and M. Gerdts, "A generalized Nash equilibrium approach for optimal control problems of autonomous cars," Optimal Control Applications and Methods, vol. 39, no. 1, pp. 326–342, 2018.
- [11] G. Fieni, M.-P. Neumann, F. Furia, A. Caucino, A. Cerofolini, V. Ravaglioli, and C. H. Onder, "Game theory in Formula 1: Multi-agent physical and strategical interactions," arXiv preprint arXiv:2503.05421, 2025.
- [12] M. Wang, Z. Wang, J. Talbot, J. C. Gerdes, and M. Schwager, "Game-theoretic planning for self-driving cars in multivehicle competitive scenarios," IEEE Transactions on Robotics, vol. 37, no. 4, pp. 1313–1325, 2021.
- [13] P. R. Wurman, S. Barrett, K. Kawamoto, J. MacGlashan, K. Subramanian, T. J. Walsh, R. Capobianco, A. Devlic, F. Eckert, F. Fuchs et al., "Outracing champion Gran Turismo drivers with deep reinforcement learning," Nature, vol. 602, no. 7896, pp. 223–228, 2022.
- [14] A. Heilmeier, M. Graf, and M. Lienkamp, "A race simulation for strategy decisions in circuit motorsports," in 2018 21st International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2018, pp. 2986–2993.
- [15] J. Bekker and W. Lotz, "Planning Formula One race strategies using discrete-event simulation," Journal of the Operational Research Society, vol. 60, no. 7, pp. 952–961, 2009.
- [16] P. Duhr, D. Buccheri, C. Balerna, A. Cerofolini, and C. H. Onder, "Minimum-race-time energy allocation strategies for the hybrid-electric Formula 1 power unit," IEEE Transactions on Vehicular Technology, vol. 72, no. 6, pp. 7035–7050, 2023.
- [17] J. van Kampen, T. Herrmann, and M. Salazar, "Maximum-distance race strategies for a fully electric endurance race car," European Journal of Control, vol. 68, p. 100679, 2022.
- [18] X. Liu, A. Fotouhi, and D. J. Auger, "Formula-E race strategy development using distributed policy gradient reinforcement learning," Knowledge-Based Systems, vol. 216, p. 106781, 2021.
- [19] J. van Kampen, M. Moriggi, F. Braghin, and M. Salazar, "Model predictive control strategies for electric endurance race cars accounting for competitors' interactions," IEEE Control Systems Letters, 2024.
- [20] F. Aguad and C. Thraves, "Optimizing pit stop strategies in Formula 1 with dynamic programming and game theory," European Journal of Operational Research, vol. 319, no. 3, pp. 908–919, 2024.
- [21] T. Roughgarden, Best-Response Dynamics. Cambridge University Press, 2016, pp. 216–229.
- [22] I. Anagnostides and P. Penna, "Solving zero-sum games through alternating projections," arXiv preprint arXiv:2010.00109, 2020.
- [23] H. B. McMahan, G. J. Gordon, and A. Blum, "Planning in the presence of cost functions controlled by an adversary," in Proceedings of the 20th International Conference on Machine Learning (ICML-03), 2003, pp. 536–543.
- [24] X. Liu, A. Fotouhi, and D. Auger, "Formula-E multi-car race strategy development—a novel approach using reinforcement learning," IEEE Transactions on Intelligent Transportation Systems, vol. 25, no. 8, pp. 9524–9534, 2024.
- [25] H. B. Pacejka, Tire Characteristics and Vehicle Handling and Stability. Butterworth-Heinemann, Oxford, UK, 2012.
- [26]
- [27] L. Adam, R. Horčík, T. Kasl, and T. Kroupa, "Double oracle algorithm for computing equilibria in continuous games," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 6, 2021, pp. 5070–5077.
- [28] S. V. Albrecht, F. Christianos, and L. Schäfer, Multi-Agent Reinforcement Learning: Foundations and Modern Approaches. MIT Press, 2024.
- [29] A. Y. Ng, D. Harada, and S. Russell, "Policy invariance under reward transformations: Theory and application to reward shaping," in ICML, vol. 99. Citeseer, 1999, pp. 278–287.
- [30] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.
- [31] T. Bansal, J. Pachocki, S. Sidor, I. Sutskever, and I. Mordatch, "Emergent complexity via multi-agent competition," arXiv preprint arXiv:1710.03748, 2017.
- [32] S. Dohare, Q. Lan, and A. R. Mahmood, "Overcoming policy collapse in deep reinforcement learning," in Sixteenth European Workshop on Reinforcement Learning, 2023.
- [33]
- [34] "Competitor-aware race management for electric endurance racing - full lap," https://youtu.be/rUjys60b0j4?si=wiI53awyPaeB2y4r
- [35] "Competitor-aware race management for electric endurance racing - overtake," https://youtu.be/wwAB7gZU6tc?si=jxsCZ44Nr6gCIrv
- [36] J. A. E. Andersson, J. Gillis, G. Horn, J. B. Rawlings, and M. Diehl, "CasADi – A software framework for nonlinear optimization and optimal control," Mathematical Programming Computation, 2018.
- [37] A. Wächter and L. T. Biegler, "On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming," Mathematical Programming, vol. 106, no. 1, pp. 25–57, 2006.
- [38] The MathWorks, Inc., "Reinforcement Learning Toolbox version 25.2 (R2025b)," Natick, Massachusetts, United States, 2025. [Online]. Available: https://www.mathworks.com