Learning to Route Electric Trucks Under Operational Uncertainty
Pith reviewed 2026-05-07 11:52 UTC · model grok-4.3
The pith
Reinforcement learning can route electric truck fleets under battery limits and charger competition nearly as well as optimization methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors formulate electric truck routing as an event-driven semi-Markov decision process that incorporates shared charging resources, stochastic travel and energy consumption, and nonlinear fast-charging curves. They train a reinforcement learning policy inside a simulation environment built on this formulation and report that the resulting algorithm outperforms heuristic baselines across the tested fleet sizes, approaches mathematical programming benchmarks in many cases, and sustains high success rates under charging congestion and uncertainty.
What carries the argument
An event-driven semi-Markov decision process equipped with a graph-based state representation and a rule-based action mask that restricts policies to operationally admissible decisions during reinforcement learning.
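The rule-based action mask is the piece that keeps exploration inside the feasible region. A minimal sketch of how such masking typically works in policy-gradient methods, where infeasible actions have their logits driven to negative infinity before the softmax (the mask conditions and array shapes here are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def masked_action_distribution(logits, feasible):
    """Turn raw policy logits into a distribution over feasible actions only.

    `feasible` is a boolean vector produced by rule-based checks (e.g.
    "charger occupied", "battery too low to reach node") -- the checks
    themselves are illustrative, not taken from the paper.
    """
    masked = np.where(feasible, logits, -np.inf)  # infeasible -> -inf
    z = masked - masked.max()                     # stabilize the softmax
    p = np.exp(z)                                 # exp(-inf) = 0 exactly
    return p / p.sum()

logits = np.array([2.0, 0.5, 1.0, -1.0])
feasible = np.array([True, False, True, False])
p = masked_action_distribution(logits, feasible)
# infeasible actions carry exactly zero probability, so the policy
# never samples them and gradients never flow through them
```

Because the masked actions receive zero probability rather than a penalty, the agent never wastes samples learning that they are bad, which is the training-efficiency gain the formulation section claims.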
If this is right
- The learned policy scales to different fleet sizes while remaining computationally practical for daily use.
- High success rates are preserved even when charging stations become congested.
- Performance stays competitive with optimization benchmarks in many tested settings.
- Training inside simulation allows the method to account for uncertainty that makes exact optimization intractable at scale.
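The event-driven structure underpinning these points means simulated time jumps from one discrete event to the next (a truck arriving, a charge completing) instead of ticking in fixed increments, which is what keeps large fleets tractable. A minimal sketch with a priority queue; the event types and fields are invented for illustration, not the paper's API:

```python
import heapq

# Pending events as (time, event_type, truck_id); heapq pops the earliest.
events = []
heapq.heappush(events, (0.7, "arrive", 1))
heapq.heappush(events, (0.3, "charge_done", 0))
heapq.heappush(events, (1.2, "arrive", 2))

log = []
now = 0.0
while events:
    now, kind, truck = heapq.heappop(events)  # jump straight to next event
    log.append((now, kind, truck))
    # at this point an RL policy would observe the state and pick an action,
    # possibly pushing new future events (departures, charge completions)
```

The loop does no work between events, so the cost of a simulated day scales with the number of decisions, not with the length of the horizon.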
Where Pith is reading between the lines
- Fleet operators could integrate the policy into planning software to shorten the time needed to produce feasible daily routes.
- The same simulation-plus-masking structure might be adapted for routing other electric vehicles that share charging or refueling resources.
- Adding live updates from traffic or charger status feeds could be tested to see whether robustness improves beyond the current static training.
Load-bearing premise
The event-driven simulation with its stochastic travel and energy models and nonlinear charging curves captures enough real-world uncertainty and charger competition that policies trained inside it will work when applied to actual truck fleets.
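The nonlinear fast-charging behavior this premise mentions matters because charging power drops sharply at high state of charge, so "topping up" is disproportionately time-expensive. A toy curve of the kind such simulators often use; the power limit, taper threshold, and linear taper shape are assumptions for illustration, not the paper's model:

```python
def charge_power_kw(soc, p_max=350.0, taper_start=0.8):
    """Illustrative fast-charger curve: constant power up to a state-of-charge
    threshold, then a linear taper to zero at full charge. Real charge curves
    (and the paper's) may differ; all numbers here are assumptions."""
    if soc < taper_start:
        return p_max
    return p_max * (1.0 - soc) / (1.0 - taper_start)

assert charge_power_kw(0.5) == 350.0  # full power in the flat region
assert charge_power_kw(0.9) == 175.0  # half power midway through the taper
```

Under a curve like this, charging from 80% to 100% takes as long as a much larger replenishment at low state of charge, which is why a routing policy that ignores the nonlinearity can badly misjudge charging-stop durations.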
What would settle it
Deploying the learned routing policy on a real electric truck fleet and measuring substantially lower success rates, more stranded vehicles, or higher total costs than the simulation predicted would show that the approach does not transfer.
Original abstract
Electric truck operations require routing decisions that remain feasible under limited battery range, long charging times, stochastic travel and energy consumption, and competition for shared charging infrastructure. These features make electric truck routing a coupled logistics and energy problem, limiting the practicality of heuristic methods and rendering exact optimization computationally infeasible at scale. This paper proposes a learning-based framework for the stochastic electric truck routing problem under charging constraints and operational uncertainty. The problem, solved by Reinforcement Learning, is formulated as an event-driven semi-Markov decision process with shared charging resources, stochastic travel and energy requirements, and realistic nonlinear fast-charging behavior. To support learning in this setting, a graph-based representation of system state and feasible decisions is introduced, together with a rule-based action mask that restricts policies to operationally admissible actions, thus improving training efficiency. Building on this formulation, an event-driven simulation environment is developed that supports both Reinforcement Learning and benchmarking against heuristic and mathematical programming baselines. Computational experiments across a range of fleet sizes show that the proposed learning-based algorithm consistently outperforms baselines and attains performance close to optimization benchmarks in many settings, while preserving high success rates under charging congestion and uncertainty.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a reinforcement learning framework for the stochastic electric truck routing problem under charging constraints and operational uncertainty. The problem is formulated as an event-driven semi-Markov decision process (SMDP) that incorporates shared charging resources, stochastic travel and energy consumption, and nonlinear fast-charging curves. A graph-based state representation together with a rule-based action mask is introduced to restrict policies to feasible actions. An event-driven simulation environment is developed to support RL training and benchmarking against heuristic and mathematical programming baselines. Computational experiments across fleet sizes indicate that the learned policy consistently outperforms the baselines, approaches optimization benchmarks in many cases, and maintains high success rates under charging congestion and uncertainty.
Significance. If the reported simulation results hold under the stated modeling assumptions, the work supplies a scalable learning-based method for a coupled logistics-energy routing problem that is computationally intractable for exact methods at realistic fleet sizes. The explicit incorporation of nonlinear charging dynamics, stochastic travel/energy models, and shared-resource competition in an event-driven SMDP framework is a constructive contribution to electric fleet operations research. The provision of a reproducible simulation environment for both RL and benchmark comparisons is a positive feature that supports further development in this area.
Minor comments (3)
- The abstract asserts consistent outperformance and near-optimality but supplies no numerical values, baseline definitions, or statistical measures; adding one or two key quantitative results would improve the summary's informativeness without altering the manuscript's scope.
- In the formulation section, the precise definition of the graph nodes and edges used for the system state representation would benefit from an accompanying diagram or explicit enumeration of node types to aid reader comprehension.
- The experimental section should include a brief sensitivity table or discussion on how variations in the stochastic travel and energy parameters affect the reported performance gaps, even if only for the largest fleet size examined.
Simulated Author's Rebuttal
We thank the referee for the constructive summary of our manuscript and the recommendation for minor revision. The positive assessment of the event-driven SMDP formulation, graph-based state representation, action masking, and simulation environment is appreciated. No specific major comments were provided in the report, so we have no point-by-point responses. We will incorporate any minor editorial or clarification improvements in the revised version.
Circularity Check
No significant circularity in derivation chain
full rationale
The manuscript formulates electric truck routing as an event-driven SMDP with graph state representation, rule-based action masking, stochastic travel/energy models, and nonlinear charging, then trains an RL policy inside a custom simulation and reports comparative performance against heuristics and optimization baselines. No equations, fitted parameters, or self-citations reduce the reported success rates or cost metrics to the inputs by construction; the evaluation remains an independent empirical comparison within the explicitly described simulator. The work is therefore self-contained as a standard RL application study.