arxiv: 2604.26566 · v1 · submitted 2026-04-29 · 📡 eess.SY · cs.LG· cs.SY

Recognition: unknown

Learning to Route Electric Trucks Under Operational Uncertainty

Chuchu Fan, Elenna Dugundji, Nikolay Aristov, Pedro P. Vergara, Ruixiao Yang, Stavros Orfanoudakis, Ziyan Li

Pith reviewed 2026-05-07 11:52 UTC · model grok-4.3

classification 📡 eess.SY cs.LGcs.SY

keywords electric truck routingreinforcement learningcharging constraintsoperational uncertaintyfleet managementsemi-Markov decision processstochastic routing

0 comments

The pith

Reinforcement learning can route electric truck fleets under battery limits and charger competition nearly as well as optimization methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper sets out to show that a reinforcement learning method can handle routing decisions for electric truck fleets when battery range is limited, charging takes a long time, travel and energy use are unpredictable, and chargers are shared among vehicles. It models the problem as an event-driven semi-Markov decision process, represents states with graphs, and uses rules to block invalid actions so learning stays efficient. A reader should care because exact optimization grows too slow for realistic fleet sizes while simple rule-based methods often leave trucks unable to complete routes. If the results hold, operators gain a practical way to generate feasible daily plans that succeed more often despite real variability.

Core claim

The authors formulate electric truck routing as an event-driven semi-Markov decision process that incorporates shared charging resources, stochastic travel and energy consumption, and nonlinear fast-charging curves; they train a reinforcement learning policy inside a matching simulation environment and report that the resulting algorithm outperforms heuristic baselines across tested fleet sizes, reaches performance close to mathematical programming benchmarks in many cases, and sustains high success rates under charging congestion and uncertainty.

What carries the argument

An event-driven semi-Markov decision process equipped with a graph-based state representation and a rule-based action mask that restricts policies to operationally admissible decisions during reinforcement learning.

If this is right

The learned policy scales to different fleet sizes while remaining computationally practical for daily use.
High success rates are preserved even when charging stations become congested.
Performance stays competitive with optimization benchmarks in many tested settings.
Training inside simulation allows the method to account for uncertainty that makes exact optimization intractable at scale.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Fleet operators could integrate the policy into planning software to shorten the time needed to produce feasible daily routes.
The same simulation-plus-masking structure might be adapted for routing other electric vehicles that share charging or refueling resources.
Adding live updates from traffic or charger status feeds could be tested to see whether robustness improves beyond the current static training.

Load-bearing premise

The event-driven simulation with its stochastic travel and energy models and nonlinear charging curves captures enough real-world uncertainty and charger competition that policies trained inside it will work when applied to actual truck fleets.

What would settle it

Deploying the learned routing policy on a real electric truck fleet and measuring substantially lower success rates, more stranded vehicles, or higher total costs than the simulation predicted would show that the approach does not transfer.

Figures

Figures reproduced from arXiv: 2604.26566 by Chuchu Fan, Elenna Dugundji, Nikolay Aristov, Pedro P. Vergara, Ruixiao Yang, Stavros Orfanoudakis, Ziyan Li.

**Figure 1.** Figure 1: Comparison of the classic eVRP and the eTFRP with shared charging resources. (a) In eVRP, routing is typically planned for a single vehicle (or independently across vehicles), where charging stops are inserted to maintain energy feasibility along a prescribed tour. (b) In eTFRP, multiple trucks operate concurrently and compete for limited charging capacity, so route execution is coupled through shared stat… view at source ↗

**Figure 2.** Figure 2: Event-driven truck state machine used by the simulator. Decision steps occur when a truck becomes ready; arrivals, charging, unloading, and FCFS (first-come, first-served) queuing trigger subsequent events. 0 20 40 60 80 100 State of charge (%) 0 10 20 30 40 50 Power (kW) (a) Power profile Linear (constant power) CCCV (tapered) Taper start 0 2 4 6 8 10 Time (hours) 0 20 40 60 80 100 State of charge (%) (b)… view at source ↗

**Figure 3.** Figure 3: Charging power profile and SoC evolution under constant-power (linear) charging versus tapered CCCV fast charging. Tapering reduces effective power at high SoC and changes optimal charging durations. as 𝑏∕𝑏 taking values in [0, 1]. The CCCV charging power is inspired by [38], and is modeled as a piecewise function: 𝑃𝑐 (SoC) = ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ 𝑝𝑐 ⋅ (0.6 + 3.0 ⋅ SoC), SoC ∈ [0, 0.10], 𝑝𝑐 ⋅ ( 0.9 + 0.25 (SoC − 0… view at source ↗

**Figure 5.** Figure 5: Proposed graph-based actor–critic architecture. In (a), the heterogeneous state graph is encoded through graph feature extraction, hetero-interaction layers [40], and per-node-type mean pooling to obtain a fixed-size global state embedding, invariant to the number of nodes in the state. In (b), the critic head maps this state embedding to a statevalue estimate. In (c), the actor head processes the feasibl… view at source ↗

**Figure 6.** Figure 6: Experimental transportation network used in the case study. The figure shows the California transportation graph [9] with sampled delivery locations and charging stations overlaid on the map. Node and edge colors indicate the spatial variation in energy and travel conditions across the network. avoid implausible extremes. Energy consumption is also stochastic and correlated with traffic conditions. In the … view at source ↗

**Figure 7.** Figure 7: Comparison of episode-wise winning frequency across 200 simulation episodes for different eTFRP settings. Each bar shows the percentage of episodes in which a method produced the best-performing solution among the compared methods. setting, the proposed formulation readily extends to an arbitrary number of stops, as demonstrated in the next section. Performance is evaluated using normalized reward (relativ… view at source ↗

**Figure 8.** Figure 8: Generalization performance of a GraphPPO policy trained only on the 100T3S setting and evaluated without retraining across different eTFRP configurations, averaged over 50 random scenarios per case. Panel (a) reports the normalized reward ratio relative to Math. Opt., while panel (b) reports the win ratio against Math. Opt., defined as the percentage of scenarios in which GraphPPO achieves the better outco… view at source ↗

read the original abstract

Electric truck operations require routing decisions that remain feasible under limited battery range, long charging times, travel and energy consumption, and competition for shared charging infrastructure. These features make electric truck routing a coupled logistics and energy problem, limiting the practicality of heuristics-based methods and rendering them computationally infeasible at scale. This paper proposes a learning-based framework for the stochastic electric truck routing under charging constraints and operational uncertainty. The problem, solved by Reinforcement Learning, is formulated as an event-driven semi-Markov decision process with shared charging resources, stochastic travel and energy requirements, and realistic nonlinear fast-charging behavior. To support learning in this setting, a graph-based representation of system state and feasible decisions is introduced, together with a rule-based action mask that restricts policies to operationally admissible actions; thus, improving training efficiency. Building on this formulation, an event-driven simulation environment is developed that supports both Reinforcement Learning and benchmarking against heuristic and mathematical programming baselines. Computational experiments across a range of fleet sizes show that the proposed learning-based algorithm consistently outperforms baselines and attains performance close to optimization benchmarks in many settings, while preserving high success rates under charging congestion and uncertainty.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives a workable RL setup for electric truck routing that beats baselines in simulation by using an event-driven semi-MDP with graph states and action masking, though results stay tied to how well the sim matches real operations.

read the letter

The one or two things to know: this paper sets up a reinforcement learning system for routing fleets of electric trucks that deals with battery constraints, charging delays, and uncertainty in travel and energy use, and their tests show it beats standard methods in simulation. What is new is the event-driven semi-Markov decision process combined with a graph-based state representation and a rule-based action mask that keeps the policy within operationally valid choices. They also include a nonlinear fast-charging model and stochastic elements for travel and consumption. This lets them create a simulation that supports both training and benchmarking against heuristics and math programming approaches. The paper does well in making the learning feasible for this coupled logistics-energy problem. The computational experiments across fleet sizes report consistent outperformance and near-optimality in many cases, with high success rates under congestion. The formulation looks internally consistent, and the stress on event-driven simulation helps handle the timing of charging events naturally. Soft spots are mostly around the simulation's fidelity to reality. The results depend on the chosen stochastic models for travel times and energy, plus the charging curves; if these don't capture real variability or charger competition accurately, the policies may not perform as well on the road. The claims of outperformance are relative to the baselines they chose, so the strength depends on how representative those are. No load-bearing inconsistencies appear in the setup. This work is for people in operations research or RL applied to sustainable transportation. A reader focused on decarbonizing heavy-duty fleets would pick up useful ideas on handling shared infrastructure in routing decisions. It deserves a serious referee because the problem is concrete and the method is developed enough to benefit from detailed review on the empirical validation. I would send it for peer review.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes a reinforcement learning framework for the stochastic electric truck routing problem under charging constraints and operational uncertainty. The problem is formulated as an event-driven semi-Markov decision process (SMDP) that incorporates shared charging resources, stochastic travel and energy consumption, and nonlinear fast-charging curves. A graph-based state representation together with a rule-based action mask is introduced to restrict policies to feasible actions. An event-driven simulation environment is developed to support RL training and benchmarking against heuristic and mathematical programming baselines. Computational experiments across fleet sizes indicate that the learned policy consistently outperforms the baselines, approaches optimization benchmarks in many cases, and maintains high success rates under charging congestion and uncertainty.

Significance. If the reported simulation results hold under the stated modeling assumptions, the work supplies a scalable learning-based method for a coupled logistics-energy routing problem that is computationally intractable for exact methods at realistic fleet sizes. The explicit incorporation of nonlinear charging dynamics, stochastic travel/energy models, and shared-resource competition in an event-driven SMDP framework is a constructive contribution to electric fleet operations research. The provision of a reproducible simulation environment for both RL and benchmark comparisons is a positive feature that supports further development in this area.

minor comments (3)

The abstract asserts consistent outperformance and near-optimality but supplies no numerical values, baseline definitions, or statistical measures; adding one or two key quantitative results would improve the summary's informativeness without altering the manuscript's scope.
In the formulation section, the precise definition of the graph nodes and edges used for the system state representation would benefit from an accompanying diagram or explicit enumeration of node types to aid reader comprehension.
The experimental section should include a brief sensitivity table or discussion on how variations in the stochastic travel and energy parameters affect the reported performance gaps, even if only for the largest fleet size examined.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the constructive summary of our manuscript and the recommendation for minor revision. The positive assessment of the event-driven SMDP formulation, graph-based state representation, action masking, and simulation environment is appreciated. No specific major comments were provided in the report, so we have no point-by-point responses. We will incorporate any minor editorial or clarification improvements in the revised version.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The manuscript formulates electric truck routing as an event-driven SMDP with graph state representation, rule-based action masking, stochastic travel/energy models, and nonlinear charging, then trains an RL policy inside a custom simulation and reports comparative performance against heuristics and optimization baselines. No equations, fitted parameters, or self-citations reduce the reported success rates or cost metrics to the inputs by construction; the evaluation remains an independent empirical comparison within the explicitly described simulator. The work is therefore self-contained as a standard RL application study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated. The formulation implicitly assumes standard MDP properties and simulator fidelity, but these are not enumerated.

pith-pipeline@v0.9.0 · 5524 in / 1272 out tokens · 41338 ms · 2026-05-07T11:52:19.626083+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

46 extracted references · 36 canonical work pages · 3 internal anchors

[1]

Zhang, Z

X. Zhang, Z. Lin, C. Crawford, S. Li, Techno-economic comparison of electrification for heavy-duty trucks in china by 2040, Transportation Research Part D: Transport and Environment 102 (2022) 103152. URL:https://www.sciencedirect.com/science/article/pii/ S1361920921004478. doi:https://doi.org/10.1016/j.trd.2021.103152

work page doi:10.1016/j.trd.2021.103152 2040
[2]

Kucukoglu, R

I. Kucukoglu, R. Dewil, D. Cattrysse, The electric vehicle routing problem and its variations: A literature review, Computers & Industrial Engineering161(2021)107650.URL:https://www.sciencedirect.com/science/article/pii/S0360835221005544.doi:https: //doi.org/10.1016/j.cie.2021.107650

work page doi:10.1016/j.cie.2021.107650 2021
[3]

Smith, B

D. Smith, B. Ozpineci, R. L. Graves, P. T. Jones, J. Lustbader, K. Kelly, K. Walkowicz, A. Birky, G. Payne, C. Sigler, et al., Medium- and Heavy-Duty Vehicle Electrification: An Assessment of Technology and Knowledge Gaps, Technical Report, Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States), 2020. URL:https://www.osti.gov/biblio/1615213. ...

work page doi:10.2172/1615213 2020
[4]

M. A. Bragin, Z. Ye, N. Yu, Toward efficient transportation electrification of heavy-duty trucks: Joint scheduling of truck routing and charging, Transportation Research Part C: Emerging Technologies 160 (2024) 104494. URL:https://www.sciencedirect.com/ science/article/pii/S0968090X24000159. doi:https://doi.org/10.1016/j.trc.2024.104494

work page doi:10.1016/j.trc.2024.104494 2024
[5]

URL:https://www.sciencedirect.com/science/article/pii/ S0968090X2500484X

A.Spinelli,D.Bezzi,O.Jabali,F.Maggioni,Astochasticelectricvehicleroutingproblemunderuncertainenergyconsumption,Transportation Research Part C: Emerging Technologies 183 (2026) 105480. URL:https://www.sciencedirect.com/science/article/pii/ S0968090X2500484X. doi:https://doi.org/10.1016/j.trc.2025.105480

work page doi:10.1016/j.trc.2025.105480 2026
[6]

Keskin, B

M. Keskin, B. Çatay, G. Laporte, A simulation-based heuristic for the electric vehicle routing problem with time windows and stochastic waiting times at recharging stations, Computers & Operations Research 125 (2021) 105060. URL:https://www.sciencedirect.com/ science/article/pii/S0305054820301775. doi:https://doi.org/10.1016/j.cor.2020.105060

work page doi:10.1016/j.cor.2020.105060 2021
[7]

Orfanoudakis, C

S. Orfanoudakis, C. Diaz-Londono, Y. Emre Yılmaz, P. Palensky, P. P. Vergara, Ev2gym: A flexible v2g simulator for ev smart charging research and benchmarking, IEEE Transactions on Intelligent Transportation Systems 26 (2025) 2410–2421

2025
[8]

W. Wang, Y. Adulyasak, J.-F. Cordeau, G. He, The heterogeneous-fleet electric vehicle routing problem with nonlinear charging functions, Transportation Research Part C: Emerging Technologies 170 (2025) 104932. URL:https://www.sciencedirect.com/science/ article/pii/S0968090X24004534. doi:https://doi.org/10.1016/j.trc.2024.104932. Orfanoudakis S. et al.:Pre...

work page doi:10.1016/j.trc.2024.104932 2025
[9]

2025.11196671

Z.Li,N.Aristov,A.Germain,E.R.Dugundji, Multi-stagestochasticprogrammingforheavy-dutyelectrictruckroutingunderpubliccharging congestionuncertainty,in:2025IEEEHighPerformanceExtremeComputingConference(HPEC),2025,pp.1–6.doi:10.1109/HPEC67600. 2025.11196671

work page doi:10.1109/hpec67600 2025
[10]

Cataldo-Díaz, R

C. Cataldo-Díaz, R. Linfati, J. W. Escobar, Mathematical models for the electric vehicle routing problem with time windows considering different aspects of the charging process, Operational Research 24 (2023) 1. URL:https://doi.org/10.1007/s12351-023-00806-5. doi:10.1007/s12351-023-00806-5

work page doi:10.1007/s12351-023-00806-5 2023
[11]

Amiri, H

A. Amiri, H. Zolfagharinia, S. H. Amin, A robust multi-objective routing problem for heavy-duty electric trucks with uncertain energy consumption, Computers & Industrial Engineering 178 (2023) 109108. URL:https://www.sciencedirect.com/science/article/ pii/S0360835223001328. doi:https://doi.org/10.1016/j.cie.2023.109108

work page doi:10.1016/j.cie.2023.109108 2023
[12]

URL:https://www.sciencedirect

R.Wang,P.Keyantuo,T.Zeng,J.Sandoval,A.Vishwanath,H.Borhan,S.Moura, Robustroutingforamixedfleetofheavy-dutytruckswith pickup and delivery under energy consumption uncertainty, Applied Energy 368 (2024) 123407. URL:https://www.sciencedirect. com/science/article/pii/S0306261924007906. doi:https://doi.org/10.1016/j.apenergy.2024.123407

work page doi:10.1016/j.apenergy.2024.123407 2024
[13]

C. L. Lara, J. D. Siirola, I. E. Grossmann, Electric power infrastructure planning under uncertainty: stochastic dual dynamic integer programming (SDDiP) and parallelization scheme, Optimization and Engineering 21 (2020) 1243–1281. URL:https://doi.org/10. 1007/s11081-019-09471-0. doi:10.1007/s11081-019-09471-0

work page doi:10.1007/s11081-019-09471-0 2020
[14]

Euchi, A

J. Euchi, A. Yassine, A hybrid metaheuristic algorithm to solve the electric vehicle routing problem with battery recharging stations for sustainable environmental and energy optimization, Energy Systems 14 (2023) 243–267. URL:https://doi.org/10.1007/ s12667-022-00501-y. doi:10.1007/s12667-022-00501-y

work page doi:10.1007/s12667-022-00501-y 2023
[15]

J. Dong, H. Wang, S. Zhang, Dynamic electric vehicle routing problem considering mid-route recharging and new demand arrival using an improvedmemeticalgorithm, SustainableEnergyTechnologiesandAssessments58(2023)103366.URL:https://www.sciencedirect. com/science/article/pii/S2213138823003594. doi:https://doi.org/10.1016/j.seta.2023.103366

work page doi:10.1016/j.seta.2023.103366 2023
[16]

doi:10.1109/TCYB.2021.3069942

Y.-H.Jia,Y.Mei,M.Zhang, Abilevelantcolonyoptimizationalgorithmforcapacitatedelectricvehicleroutingproblem, IEEETransactions on Cybernetics 52 (2022) 10855–10868. doi:10.1109/TCYB.2021.3069942

work page doi:10.1109/tcyb.2021.3069942 2022
[17]

Orfanoudakis, V

S. Orfanoudakis, V. Robu, E. M. Salazar, P. Palensky, P. P. Vergara, Scalable reinforcement learning for large-scale coordination of electric vehicles using graph neural networks, Communications Engineering 4 (2025) 118. doi:10.1038/s44172-025-00457-8

work page doi:10.1038/s44172-025-00457-8 2025
[18]

URL:https://openreview.net/forum?id=ByxBFsRqYm

W.Kool,H.vanHoof,M.Welling, Attention,learntosolveroutingproblems!, in:7thInternationalConferenceonLearningRepresentations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, OpenReview.net, 2019. URL:https://openreview.net/forum?id=ByxBFsRqYm

2019
[19]

URL:https://openreview.net/forum?id=UXTR6ZYV1x

R.Yang,C.Fan, Neuralcombinatorialoptimizationfortimedependenttravelingsalesmanproblem, in:TheThirty-ninthAnnualConference on Neural Information Processing Systems, 2025. URL:https://openreview.net/forum?id=UXTR6ZYV1x

2025
[20]

M. Kim, J. Park, J. Park, Sym-nco: Leveraging symmetricity for neural combinatorial optimization, in: S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh (Eds.), Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022...

2022
[21]

Basso, B

R. Basso, B. Kulcsár, I. Sanchez-Diaz, X. Qu, Dynamic stochastic electric vehicle routing with safe reinforcement learning, Transportation ResearchPartE:LogisticsandTransportationReview157(2022)102496.URL:https://www.sciencedirect.com/science/article/ pii/S1366554521002581. doi:https://doi.org/10.1016/j.tre.2021.102496

work page doi:10.1016/j.tre.2021.102496 2022
[22]

J. Lin, X. Wang, R. Niu, Y. He, A q-learning-based hyper-heuristic for capacitated electric vehicle routing problem, IEEE Transactions on Intelligent Transportation Systems 26 (2025) 15746–15757. doi:10.1109/TITS.2025.3594393

work page doi:10.1109/tits.2025.3594393 2025
[23]

doi:https://doi.org/10.1016/j.apenergy.2023.121711

M.Tang,W.Zhuang,B.Li,H.Liu,Z.Song,G.Yin, Energy-optimalroutingforelectricvehiclesusingdeepreinforcementlearningwithtrans- former, AppliedEnergy350(2023)121711.URL:https://www.sciencedirect.com/science/article/pii/S0306261923010759. doi:https://doi.org/10.1016/j.apenergy.2023.121711

work page doi:10.1016/j.apenergy.2023.121711 2023
[24]

doi:10.1109/TITS.2021.3105232

B.Lin,B.Ghaddar,J.Nathwani, Deepreinforcementlearningfortheelectricvehicleroutingproblemwithtimewindows, IEEETransactions on Intelligent Transportation Systems 23 (2022) 11528–11538. doi:10.1109/TITS.2021.3105232

work page doi:10.1109/tits.2021.3105232 2022
[25]

T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, in: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, OpenReview.net, 2017. URL:https: //openreview.net/forum?id=SJU4ayYgl

2017
[26]

N. Wang, Y. Sun, H. Wang, An adaptive memetic algorithm for dynamic electric vehicle routing problem with time-varying demands, Mathematical Problems in Engineering 2021 (2021) 6635749. URL:https:// onlinelibrary.wiley.com/doi/abs/10.1155/2021/6635749. doi:https://doi.org/10.1155/2021/6635749. arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1155/2021/6635749

work page doi:10.1155/2021/6635749 2021
[27]

B. V. Vani, D. Kishan, M. W. Ahmad, C. R. P. Reddy, An efficient optimization algorithm for electric vehicle routing problem, IET Power Electronics 19 (2023) e12555. URL:https://ietresearch.onlinelibrary.wiley.com/doi/abs/10.1049/pel2.12555. doi:https://doi.org/10.1049/pel2.12555

work page doi:10.1049/pel2.12555 2023
[28]

Nazari, A

M. Nazari, A. Oroojlooy, L. V. Snyder, M. Takáč, Reinforcement learning for solving the vehicle routing problem, 2018. URL:https: //arxiv.org/abs/1802.04240.arXiv:1802.04240

work page arXiv 2018
[29]

Y. Ma, Z. Cao, Y. M. Chee, Learning to search feasible and infeasible regions of routing problems with flexible neural k-opt, in: A. Oh, T.Naumann,A.Globerson,K.Saenko,M.Hardt,S.Levine(Eds.),AdvancesinNeuralInformationProcessingSystems36:AnnualConference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2...

2023
[30]

M. Wang, Y. Wei, X. Huang, S. Gao, An end-to-end deep reinforcement learning framework for electric vehicle routing problem, IEEE Internet of Things Journal 11 (2024) 33671–33682. doi:10.1109/JIOT.2024.3432911

work page doi:10.1109/jiot.2024.3432911 2024
[31]

C. Wang, R. Zhang, R. Hong, H. Wang, Attention-enhanced deep reinforcement learning for electric vehicle routing optimization, IEEE Transactions on Transportation Electrification 11 (2025) 11228–11242. doi:10.1109/TTE.2025.3574546. Orfanoudakis S. et al.:Preprint submitted to ElsevierPage 21 of 22 Learning to Route Electric Trucks Under Operational Uncertainty

work page doi:10.1109/tte.2025.3574546 2025
[32]

Y. Li, S. Su, M. Zhang, Q. Liu, X. Nie, M. Xia, D. D. Micu, Multi-agent graph reinforcement learning method for electric vehicle on-route charging guidance in coupled transportation electrification, IEEE Transactions on Sustainable Energy 15 (2024) 1180–1193. doi:10.1109/TSTE.2023.3330842

work page doi:10.1109/tste.2023.3330842 2024
[33]

Alqahtani, M

M. Alqahtani, M. Hu, Dynamic energy scheduling and routing of multiple electric vehicles using deep reinforcement learning, En- ergy244(2022)122626.URL:https://www.sciencedirect.com/science/article/pii/S0360544221028759.doi:https://doi. org/10.1016/j.energy.2021.122626

work page doi:10.1016/j.energy.2021.122626 2022
[34]

URL:https://www.sciencedirect.com/science/article/pii/ S0360835225004619

D.Mogale,A.Ghadge,S.K.Jena, Modellingandoptimisingamulti-depotvehicleroutingproblemforfreightdistributioninaretaillogistics network, Computers & Industrial Engineering 207 (2025) 111315. URL:https://www.sciencedirect.com/science/article/pii/ S0360835225004619. doi:https://doi.org/10.1016/j.cie.2025.111315

work page doi:10.1016/j.cie.2025.111315 2025
[35]

Lombard, S

A. Lombard, S. Tamayo, F. Fontane, Modelling the time-dependent vrp through open data, in: Proceedings of the International MultiConference of Engineers and Computer Scientists, volume 2, 2018

2018
[36]

URL:https://www.sciencedirect.com/science/article/pii/ S0968090X18300147

F.Rodrigues,F.C.Pereira, Heteroscedasticgaussianprocessesforuncertaintymodelinginlarge-scalecrowdsourcedtrafficdata, Transporta- tion Research Part C: Emerging Technologies 95 (2018) 636–651. URL:https://www.sciencedirect.com/science/article/pii/ S0968090X18300147. doi:https://doi.org/10.1016/j.trc.2018.08.007

work page doi:10.1016/j.trc.2018.08.007 2018
[37]

OpenAI Gym

G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, W. Zaremba, Openai gym, arXiv:1606.01540 (2016)

work page internal anchor Pith review arXiv 2016
[38]

C. B. Saner, J. Saha, D. Srinivasan, A charge curve and battery management system aware optimal charging scheduling framework for electric vehicle fast charging stations with heterogeneous customer mix, IEEE Transactions on Intelligent Transportation Systems 24 (2023) 14890–14902. doi:10.1109/TITS.2023.3303621

work page doi:10.1109/tits.2023.3303621 2023
[39]

Ibe, Markov Processes for Stochastic Modeling, Elsevier, 2013

O. Ibe, Markov Processes for Stochastic Modeling, Elsevier, 2013. URL:https://doi.org/10.1016/C2012-0-06106-6

work page doi:10.1016/c2012-0-06106-6 2013
[40]

Towards Generalization of Graph Neural Networks for AC Optimal Power Flow

O. Arowolo, J. L. Cremer, Towards generalization of graph neural networks for ac optimal power flow, 2025. URL:https://arxiv.org/ abs/2510.06860.arXiv:2510.06860

work page internal anchor Pith review Pith/arXiv arXiv 2025
[41]

Orfanoudakis, N

S. Orfanoudakis, N. K. Panda, P. Palensky, P. P. Vergara, A graph neural network enhanced decision transformer for efficient optimization in dynamic smart charging environments, Energy and AI 23 (2026) 100679. doi:10.1016/j.egyai.2026.100679

work page doi:10.1016/j.egyai.2026.100679 2026
[42]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal policy optimization algorithms, 2017. URL:https://arxiv.org/ abs/1707.06347.arXiv:1707.06347

work page internal anchor Pith review arXiv 2017
[43]

R. S. Sutton, A. G. Barto, Reinforcement learning: An introduction, MIT press, 2018

2018
[44]

Raffin, A

A. Raffin, A. Hill, A. Gleave, A. Kanervisto, M. Ernestus, N. Dormann, Stable-baselines3: Reliable reinforcement learning implementations, Journal of Machine Learning Research 22 (2021) 1–8. URL:http://jmlr.org/papers/v22/20-1364.html

2021
[45]

A Closer Look at Invalid Action Masking in Policy Gradient Algorithms , booktitle =

S. Huang, S. Ontañón, A closer look at invalid action masking in policy gradient algorithms, in: R. Barták, F. Keshtkar, M. Franklin (Eds.), Proceedings of the Thirty-Fifth International Florida Artificial Intelligence Research Society Conference, FLAIRS 2022, Hutchinson Island, JensenBeach,Florida,USA,May15-18,2022,FloridaOnlineJournals,2022.URL:https://...

work page doi:10.32473/flairs.v35i.130584 2022
[46]

G. A. Croes, A method for solving traveling-salesman problems, Operations Research 6 (1958) 791–812. URL:http://www.jstor.org/ stable/167074. Orfanoudakis S. et al.:Preprint submitted to ElsevierPage 22 of 22

1958