pith. machine review for the scientific record. sign in

arxiv: 2604.26566 · v1 · submitted 2026-04-29 · 📡 eess.SY · cs.LG· cs.SY

Recognition: unknown

Learning to Route Electric Trucks Under Operational Uncertainty

Chuchu Fan, Elenna Dugundji, Nikolay Aristov, Pedro P. Vergara, Ruixiao Yang, Stavros Orfanoudakis, Ziyan Li

Pith reviewed 2026-05-07 11:52 UTC · model grok-4.3

classification 📡 eess.SY cs.LGcs.SY
keywords electric truck routingreinforcement learningcharging constraintsoperational uncertaintyfleet managementsemi-Markov decision processstochastic routing
0
0 comments X

The pith

Reinforcement learning can route electric truck fleets under battery limits and charger competition nearly as well as optimization methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper sets out to show that a reinforcement learning method can handle routing decisions for electric truck fleets when battery range is limited, charging takes a long time, travel and energy use are unpredictable, and chargers are shared among vehicles. It models the problem as an event-driven semi-Markov decision process, represents states with graphs, and uses rules to block invalid actions so learning stays efficient. A reader should care because exact optimization grows too slow for realistic fleet sizes while simple rule-based methods often leave trucks unable to complete routes. If the results hold, operators gain a practical way to generate feasible daily plans that succeed more often despite real variability.

Core claim

The authors formulate electric truck routing as an event-driven semi-Markov decision process that incorporates shared charging resources, stochastic travel and energy consumption, and nonlinear fast-charging curves; they train a reinforcement learning policy inside a matching simulation environment and report that the resulting algorithm outperforms heuristic baselines across tested fleet sizes, reaches performance close to mathematical programming benchmarks in many cases, and sustains high success rates under charging congestion and uncertainty.

What carries the argument

An event-driven semi-Markov decision process equipped with a graph-based state representation and a rule-based action mask that restricts policies to operationally admissible decisions during reinforcement learning.

If this is right

  • The learned policy scales to different fleet sizes while remaining computationally practical for daily use.
  • High success rates are preserved even when charging stations become congested.
  • Performance stays competitive with optimization benchmarks in many tested settings.
  • Training inside simulation allows the method to account for uncertainty that makes exact optimization intractable at scale.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Fleet operators could integrate the policy into planning software to shorten the time needed to produce feasible daily routes.
  • The same simulation-plus-masking structure might be adapted for routing other electric vehicles that share charging or refueling resources.
  • Adding live updates from traffic or charger status feeds could be tested to see whether robustness improves beyond the current static training.

Load-bearing premise

The event-driven simulation with its stochastic travel and energy models and nonlinear charging curves captures enough real-world uncertainty and charger competition that policies trained inside it will work when applied to actual truck fleets.

What would settle it

Deploying the learned routing policy on a real electric truck fleet and measuring substantially lower success rates, more stranded vehicles, or higher total costs than the simulation predicted would show that the approach does not transfer.

Figures

Figures reproduced from arXiv: 2604.26566 by Chuchu Fan, Elenna Dugundji, Nikolay Aristov, Pedro P. Vergara, Ruixiao Yang, Stavros Orfanoudakis, Ziyan Li.

Figure 1
Figure 1. Figure 1: Comparison of the classic eVRP and the eTFRP with shared charging resources. (a) In eVRP, routing is typically planned for a single vehicle (or independently across vehicles), where charging stops are inserted to maintain energy feasibility along a prescribed tour. (b) In eTFRP, multiple trucks operate concurrently and compete for limited charging capacity, so route execution is coupled through shared stat… view at source ↗
Figure 2
Figure 2. Figure 2: Event-driven truck state machine used by the simulator. Decision steps occur when a truck becomes ready; arrivals, charging, unloading, and FCFS (first-come, first-served) queuing trigger subsequent events. 0 20 40 60 80 100 State of charge (%) 0 10 20 30 40 50 Power (kW) (a) Power profile Linear (constant power) CCCV (tapered) Taper start 0 2 4 6 8 10 Time (hours) 0 20 40 60 80 100 State of charge (%) (b)… view at source ↗
Figure 3
Figure 3. Figure 3: Charging power profile and SoC evolution under constant-power (linear) charging versus tapered CCCV fast charging. Tapering reduces effective power at high SoC and changes optimal charging durations. as 𝑏∕𝑏 taking values in [0, 1]. The CCCV charging power is inspired by [38], and is modeled as a piecewise function: 𝑃𝑐 (SoC) = ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ 𝑝𝑐 ⋅ (0.6 + 3.0 ⋅ SoC), SoC ∈ [0, 0.10], 𝑝𝑐 ⋅ ( 0.9 + 0.25 (SoC − 0… view at source ↗
Figure 5
Figure 5. Figure 5: Proposed graph-based actor–critic architecture. In (a), the heterogeneous state graph is encoded through graph feature extraction, hetero-interaction layers [40], and per-node-type mean pooling to obtain a fixed-size global state embedding, invariant to the number of nodes in the state. In (b), the critic head maps this state embedding to a state￾value estimate. In (c), the actor head processes the feasibl… view at source ↗
Figure 6
Figure 6. Figure 6: Experimental transportation network used in the case study. The figure shows the California transportation graph [9] with sampled delivery locations and charging stations overlaid on the map. Node and edge colors indicate the spatial variation in energy and travel conditions across the network. avoid implausible extremes. Energy consumption is also stochastic and correlated with traffic conditions. In the … view at source ↗
Figure 7
Figure 7. Figure 7: Comparison of episode-wise winning frequency across 200 simulation episodes for different eTFRP settings. Each bar shows the percentage of episodes in which a method produced the best-performing solution among the compared methods. setting, the proposed formulation readily extends to an arbitrary number of stops, as demonstrated in the next section. Performance is evaluated using normalized reward (relativ… view at source ↗
Figure 8
Figure 8. Figure 8: Generalization performance of a GraphPPO policy trained only on the 100T3S setting and evaluated without retraining across different eTFRP configurations, averaged over 50 random scenarios per case. Panel (a) reports the normalized reward ratio relative to Math. Opt., while panel (b) reports the win ratio against Math. Opt., defined as the percentage of scenarios in which GraphPPO achieves the better outco… view at source ↗
read the original abstract

Electric truck operations require routing decisions that remain feasible under limited battery range, long charging times, travel and energy consumption, and competition for shared charging infrastructure. These features make electric truck routing a coupled logistics and energy problem, limiting the practicality of heuristics-based methods and rendering them computationally infeasible at scale. This paper proposes a learning-based framework for the stochastic electric truck routing under charging constraints and operational uncertainty. The problem, solved by Reinforcement Learning, is formulated as an event-driven semi-Markov decision process with shared charging resources, stochastic travel and energy requirements, and realistic nonlinear fast-charging behavior. To support learning in this setting, a graph-based representation of system state and feasible decisions is introduced, together with a rule-based action mask that restricts policies to operationally admissible actions; thus, improving training efficiency. Building on this formulation, an event-driven simulation environment is developed that supports both Reinforcement Learning and benchmarking against heuristic and mathematical programming baselines. Computational experiments across a range of fleet sizes show that the proposed learning-based algorithm consistently outperforms baselines and attains performance close to optimization benchmarks in many settings, while preserving high success rates under charging congestion and uncertainty.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes a reinforcement learning framework for the stochastic electric truck routing problem under charging constraints and operational uncertainty. The problem is formulated as an event-driven semi-Markov decision process (SMDP) that incorporates shared charging resources, stochastic travel and energy consumption, and nonlinear fast-charging curves. A graph-based state representation together with a rule-based action mask is introduced to restrict policies to feasible actions. An event-driven simulation environment is developed to support RL training and benchmarking against heuristic and mathematical programming baselines. Computational experiments across fleet sizes indicate that the learned policy consistently outperforms the baselines, approaches optimization benchmarks in many cases, and maintains high success rates under charging congestion and uncertainty.

Significance. If the reported simulation results hold under the stated modeling assumptions, the work supplies a scalable learning-based method for a coupled logistics-energy routing problem that is computationally intractable for exact methods at realistic fleet sizes. The explicit incorporation of nonlinear charging dynamics, stochastic travel/energy models, and shared-resource competition in an event-driven SMDP framework is a constructive contribution to electric fleet operations research. The provision of a reproducible simulation environment for both RL and benchmark comparisons is a positive feature that supports further development in this area.

minor comments (3)
  1. The abstract asserts consistent outperformance and near-optimality but supplies no numerical values, baseline definitions, or statistical measures; adding one or two key quantitative results would improve the summary's informativeness without altering the manuscript's scope.
  2. In the formulation section, the precise definition of the graph nodes and edges used for the system state representation would benefit from an accompanying diagram or explicit enumeration of node types to aid reader comprehension.
  3. The experimental section should include a brief sensitivity table or discussion on how variations in the stochastic travel and energy parameters affect the reported performance gaps, even if only for the largest fleet size examined.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the constructive summary of our manuscript and the recommendation for minor revision. The positive assessment of the event-driven SMDP formulation, graph-based state representation, action masking, and simulation environment is appreciated. No specific major comments were provided in the report, so we have no point-by-point responses. We will incorporate any minor editorial or clarification improvements in the revised version.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The manuscript formulates electric truck routing as an event-driven SMDP with graph state representation, rule-based action masking, stochastic travel/energy models, and nonlinear charging, then trains an RL policy inside a custom simulation and reports comparative performance against heuristics and optimization baselines. No equations, fitted parameters, or self-citations reduce the reported success rates or cost metrics to the inputs by construction; the evaluation remains an independent empirical comparison within the explicitly described simulator. The work is therefore self-contained as a standard RL application study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated. The formulation implicitly assumes standard MDP properties and simulator fidelity, but these are not enumerated.

pith-pipeline@v0.9.0 · 5524 in / 1272 out tokens · 41338 ms · 2026-05-07T11:52:19.626083+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

46 extracted references · 36 canonical work pages · 3 internal anchors

  1. [1]

    Zhang, Z

    X. Zhang, Z. Lin, C. Crawford, S. Li, Techno-economic comparison of electrification for heavy-duty trucks in china by 2040, Transportation Research Part D: Transport and Environment 102 (2022) 103152. URL:https://www.sciencedirect.com/science/article/pii/ S1361920921004478. doi:https://doi.org/10.1016/j.trd.2021.103152

  2. [2]

    Kucukoglu, R

    I. Kucukoglu, R. Dewil, D. Cattrysse, The electric vehicle routing problem and its variations: A literature review, Computers & Industrial Engineering161(2021)107650.URL:https://www.sciencedirect.com/science/article/pii/S0360835221005544.doi:https: //doi.org/10.1016/j.cie.2021.107650

  3. [3]

    Smith, B

    D. Smith, B. Ozpineci, R. L. Graves, P. T. Jones, J. Lustbader, K. Kelly, K. Walkowicz, A. Birky, G. Payne, C. Sigler, et al., Medium- and Heavy-Duty Vehicle Electrification: An Assessment of Technology and Knowledge Gaps, Technical Report, Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States), 2020. URL:https://www.osti.gov/biblio/1615213. ...

  4. [4]

    M. A. Bragin, Z. Ye, N. Yu, Toward efficient transportation electrification of heavy-duty trucks: Joint scheduling of truck routing and charging, Transportation Research Part C: Emerging Technologies 160 (2024) 104494. URL:https://www.sciencedirect.com/ science/article/pii/S0968090X24000159. doi:https://doi.org/10.1016/j.trc.2024.104494

  5. [5]

    URL:https://www.sciencedirect.com/science/article/pii/ S0968090X2500484X

    A.Spinelli,D.Bezzi,O.Jabali,F.Maggioni,Astochasticelectricvehicleroutingproblemunderuncertainenergyconsumption,Transportation Research Part C: Emerging Technologies 183 (2026) 105480. URL:https://www.sciencedirect.com/science/article/pii/ S0968090X2500484X. doi:https://doi.org/10.1016/j.trc.2025.105480

  6. [6]

    Keskin, B

    M. Keskin, B. Çatay, G. Laporte, A simulation-based heuristic for the electric vehicle routing problem with time windows and stochastic waiting times at recharging stations, Computers & Operations Research 125 (2021) 105060. URL:https://www.sciencedirect.com/ science/article/pii/S0305054820301775. doi:https://doi.org/10.1016/j.cor.2020.105060

  7. [7]

    Orfanoudakis, C

    S. Orfanoudakis, C. Diaz-Londono, Y. Emre Yılmaz, P. Palensky, P. P. Vergara, Ev2gym: A flexible v2g simulator for ev smart charging research and benchmarking, IEEE Transactions on Intelligent Transportation Systems 26 (2025) 2410–2421

  8. [8]

    W. Wang, Y. Adulyasak, J.-F. Cordeau, G. He, The heterogeneous-fleet electric vehicle routing problem with nonlinear charging functions, Transportation Research Part C: Emerging Technologies 170 (2025) 104932. URL:https://www.sciencedirect.com/science/ article/pii/S0968090X24004534. doi:https://doi.org/10.1016/j.trc.2024.104932. Orfanoudakis S. et al.:Pre...

  9. [9]

    2025.11196671

    Z.Li,N.Aristov,A.Germain,E.R.Dugundji, Multi-stagestochasticprogrammingforheavy-dutyelectrictruckroutingunderpubliccharging congestionuncertainty,in:2025IEEEHighPerformanceExtremeComputingConference(HPEC),2025,pp.1–6.doi:10.1109/HPEC67600. 2025.11196671

  10. [10]

    Cataldo-Díaz, R

    C. Cataldo-Díaz, R. Linfati, J. W. Escobar, Mathematical models for the electric vehicle routing problem with time windows considering different aspects of the charging process, Operational Research 24 (2023) 1. URL:https://doi.org/10.1007/s12351-023-00806-5. doi:10.1007/s12351-023-00806-5

  11. [11]

    Amiri, H

    A. Amiri, H. Zolfagharinia, S. H. Amin, A robust multi-objective routing problem for heavy-duty electric trucks with uncertain energy consumption, Computers & Industrial Engineering 178 (2023) 109108. URL:https://www.sciencedirect.com/science/article/ pii/S0360835223001328. doi:https://doi.org/10.1016/j.cie.2023.109108

  12. [12]

    URL:https://www.sciencedirect

    R.Wang,P.Keyantuo,T.Zeng,J.Sandoval,A.Vishwanath,H.Borhan,S.Moura, Robustroutingforamixedfleetofheavy-dutytruckswith pickup and delivery under energy consumption uncertainty, Applied Energy 368 (2024) 123407. URL:https://www.sciencedirect. com/science/article/pii/S0306261924007906. doi:https://doi.org/10.1016/j.apenergy.2024.123407

  13. [13]

    C. L. Lara, J. D. Siirola, I. E. Grossmann, Electric power infrastructure planning under uncertainty: stochastic dual dynamic integer programming (SDDiP) and parallelization scheme, Optimization and Engineering 21 (2020) 1243–1281. URL:https://doi.org/10. 1007/s11081-019-09471-0. doi:10.1007/s11081-019-09471-0

  14. [14]

    Euchi, A

    J. Euchi, A. Yassine, A hybrid metaheuristic algorithm to solve the electric vehicle routing problem with battery recharging stations for sustainable environmental and energy optimization, Energy Systems 14 (2023) 243–267. URL:https://doi.org/10.1007/ s12667-022-00501-y. doi:10.1007/s12667-022-00501-y

  15. [15]

    J. Dong, H. Wang, S. Zhang, Dynamic electric vehicle routing problem considering mid-route recharging and new demand arrival using an improvedmemeticalgorithm, SustainableEnergyTechnologiesandAssessments58(2023)103366.URL:https://www.sciencedirect. com/science/article/pii/S2213138823003594. doi:https://doi.org/10.1016/j.seta.2023.103366

  16. [16]

    doi:10.1109/TCYB.2021.3069942

    Y.-H.Jia,Y.Mei,M.Zhang, Abilevelantcolonyoptimizationalgorithmforcapacitatedelectricvehicleroutingproblem, IEEETransactions on Cybernetics 52 (2022) 10855–10868. doi:10.1109/TCYB.2021.3069942

  17. [17]

    Orfanoudakis, V

    S. Orfanoudakis, V. Robu, E. M. Salazar, P. Palensky, P. P. Vergara, Scalable reinforcement learning for large-scale coordination of electric vehicles using graph neural networks, Communications Engineering 4 (2025) 118. doi:10.1038/s44172-025-00457-8

  18. [18]

    URL:https://openreview.net/forum?id=ByxBFsRqYm

    W.Kool,H.vanHoof,M.Welling, Attention,learntosolveroutingproblems!, in:7thInternationalConferenceonLearningRepresentations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, OpenReview.net, 2019. URL:https://openreview.net/forum?id=ByxBFsRqYm

  19. [19]

    URL:https://openreview.net/forum?id=UXTR6ZYV1x

    R.Yang,C.Fan, Neuralcombinatorialoptimizationfortimedependenttravelingsalesmanproblem, in:TheThirty-ninthAnnualConference on Neural Information Processing Systems, 2025. URL:https://openreview.net/forum?id=UXTR6ZYV1x

  20. [20]

    M. Kim, J. Park, J. Park, Sym-nco: Leveraging symmetricity for neural combinatorial optimization, in: S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh (Eds.), Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022...

  21. [21]

    Basso, B

    R. Basso, B. Kulcsár, I. Sanchez-Diaz, X. Qu, Dynamic stochastic electric vehicle routing with safe reinforcement learning, Transportation ResearchPartE:LogisticsandTransportationReview157(2022)102496.URL:https://www.sciencedirect.com/science/article/ pii/S1366554521002581. doi:https://doi.org/10.1016/j.tre.2021.102496

  22. [22]

    J. Lin, X. Wang, R. Niu, Y. He, A q-learning-based hyper-heuristic for capacitated electric vehicle routing problem, IEEE Transactions on Intelligent Transportation Systems 26 (2025) 15746–15757. doi:10.1109/TITS.2025.3594393

  23. [23]

    doi:https://doi.org/10.1016/j.apenergy.2023.121711

    M.Tang,W.Zhuang,B.Li,H.Liu,Z.Song,G.Yin, Energy-optimalroutingforelectricvehiclesusingdeepreinforcementlearningwithtrans- former, AppliedEnergy350(2023)121711.URL:https://www.sciencedirect.com/science/article/pii/S0306261923010759. doi:https://doi.org/10.1016/j.apenergy.2023.121711

  24. [24]

    doi:10.1109/TITS.2021.3105232

    B.Lin,B.Ghaddar,J.Nathwani, Deepreinforcementlearningfortheelectricvehicleroutingproblemwithtimewindows, IEEETransactions on Intelligent Transportation Systems 23 (2022) 11528–11538. doi:10.1109/TITS.2021.3105232

  25. [25]

    T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, in: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, OpenReview.net, 2017. URL:https: //openreview.net/forum?id=SJU4ayYgl

  26. [26]

    N. Wang, Y. Sun, H. Wang, An adaptive memetic algorithm for dynamic electric vehicle routing problem with time-varying demands, Mathematical Problems in Engineering 2021 (2021) 6635749. URL:https:// onlinelibrary.wiley.com/doi/abs/10.1155/2021/6635749. doi:https://doi.org/10.1155/2021/6635749. arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1155/2021/6635749

  27. [27]

    B. V. Vani, D. Kishan, M. W. Ahmad, C. R. P. Reddy, An efficient optimization algorithm for electric vehicle routing problem, IET Power Electronics 19 (2023) e12555. URL:https://ietresearch.onlinelibrary.wiley.com/doi/abs/10.1049/pel2.12555. doi:https://doi.org/10.1049/pel2.12555

  28. [28]

    Nazari, A

    M. Nazari, A. Oroojlooy, L. V. Snyder, M. Takáč, Reinforcement learning for solving the vehicle routing problem, 2018. URL:https: //arxiv.org/abs/1802.04240.arXiv:1802.04240

  29. [29]

    Y. Ma, Z. Cao, Y. M. Chee, Learning to search feasible and infeasible regions of routing problems with flexible neural k-opt, in: A. Oh, T.Naumann,A.Globerson,K.Saenko,M.Hardt,S.Levine(Eds.),AdvancesinNeuralInformationProcessingSystems36:AnnualConference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2...

  30. [30]

    M. Wang, Y. Wei, X. Huang, S. Gao, An end-to-end deep reinforcement learning framework for electric vehicle routing problem, IEEE Internet of Things Journal 11 (2024) 33671–33682. doi:10.1109/JIOT.2024.3432911

  31. [31]

    C. Wang, R. Zhang, R. Hong, H. Wang, Attention-enhanced deep reinforcement learning for electric vehicle routing optimization, IEEE Transactions on Transportation Electrification 11 (2025) 11228–11242. doi:10.1109/TTE.2025.3574546. Orfanoudakis S. et al.:Preprint submitted to ElsevierPage 21 of 22 Learning to Route Electric Trucks Under Operational Uncertainty

  32. [32]

    Y. Li, S. Su, M. Zhang, Q. Liu, X. Nie, M. Xia, D. D. Micu, Multi-agent graph reinforcement learning method for electric vehicle on-route charging guidance in coupled transportation electrification, IEEE Transactions on Sustainable Energy 15 (2024) 1180–1193. doi:10.1109/TSTE.2023.3330842

  33. [33]

    Alqahtani, M

    M. Alqahtani, M. Hu, Dynamic energy scheduling and routing of multiple electric vehicles using deep reinforcement learning, En- ergy244(2022)122626.URL:https://www.sciencedirect.com/science/article/pii/S0360544221028759.doi:https://doi. org/10.1016/j.energy.2021.122626

  34. [34]

    URL:https://www.sciencedirect.com/science/article/pii/ S0360835225004619

    D.Mogale,A.Ghadge,S.K.Jena, Modellingandoptimisingamulti-depotvehicleroutingproblemforfreightdistributioninaretaillogistics network, Computers & Industrial Engineering 207 (2025) 111315. URL:https://www.sciencedirect.com/science/article/pii/ S0360835225004619. doi:https://doi.org/10.1016/j.cie.2025.111315

  35. [35]

    Lombard, S

    A. Lombard, S. Tamayo, F. Fontane, Modelling the time-dependent vrp through open data, in: Proceedings of the International MultiConference of Engineers and Computer Scientists, volume 2, 2018

  36. [36]

    URL:https://www.sciencedirect.com/science/article/pii/ S0968090X18300147

    F.Rodrigues,F.C.Pereira, Heteroscedasticgaussianprocessesforuncertaintymodelinginlarge-scalecrowdsourcedtrafficdata, Transporta- tion Research Part C: Emerging Technologies 95 (2018) 636–651. URL:https://www.sciencedirect.com/science/article/pii/ S0968090X18300147. doi:https://doi.org/10.1016/j.trc.2018.08.007

  37. [37]

    OpenAI Gym

    G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, W. Zaremba, Openai gym, arXiv:1606.01540 (2016)

  38. [38]

    C. B. Saner, J. Saha, D. Srinivasan, A charge curve and battery management system aware optimal charging scheduling framework for electric vehicle fast charging stations with heterogeneous customer mix, IEEE Transactions on Intelligent Transportation Systems 24 (2023) 14890–14902. doi:10.1109/TITS.2023.3303621

  39. [39]

    Ibe, Markov Processes for Stochastic Modeling, Elsevier, 2013

    O. Ibe, Markov Processes for Stochastic Modeling, Elsevier, 2013. URL:https://doi.org/10.1016/C2012-0-06106-6

  40. [40]

    Towards Generalization of Graph Neural Networks for AC Optimal Power Flow

    O. Arowolo, J. L. Cremer, Towards generalization of graph neural networks for ac optimal power flow, 2025. URL:https://arxiv.org/ abs/2510.06860.arXiv:2510.06860

  41. [41]

    Orfanoudakis, N

    S. Orfanoudakis, N. K. Panda, P. Palensky, P. P. Vergara, A graph neural network enhanced decision transformer for efficient optimization in dynamic smart charging environments, Energy and AI 23 (2026) 100679. doi:10.1016/j.egyai.2026.100679

  42. [42]

    Proximal Policy Optimization Algorithms

    J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal policy optimization algorithms, 2017. URL:https://arxiv.org/ abs/1707.06347.arXiv:1707.06347

  43. [43]

    R. S. Sutton, A. G. Barto, Reinforcement learning: An introduction, MIT press, 2018

  44. [44]

    Raffin, A

    A. Raffin, A. Hill, A. Gleave, A. Kanervisto, M. Ernestus, N. Dormann, Stable-baselines3: Reliable reinforcement learning implementations, Journal of Machine Learning Research 22 (2021) 1–8. URL:http://jmlr.org/papers/v22/20-1364.html

  45. [45]

    A Closer Look at Invalid Action Masking in Policy Gradient Algorithms , booktitle =

    S. Huang, S. Ontañón, A closer look at invalid action masking in policy gradient algorithms, in: R. Barták, F. Keshtkar, M. Franklin (Eds.), Proceedings of the Thirty-Fifth International Florida Artificial Intelligence Research Society Conference, FLAIRS 2022, Hutchinson Island, JensenBeach,Florida,USA,May15-18,2022,FloridaOnlineJournals,2022.URL:https://...

  46. [46]

    G. A. Croes, A method for solving traveling-salesman problems, Operations Research 6 (1958) 791–812. URL:http://www.jstor.org/ stable/167074. Orfanoudakis S. et al.:Preprint submitted to ElsevierPage 22 of 22