pith. machine review for the scientific record.

arxiv: 2604.27118 · v1 · submitted 2026-04-29 · 💻 cs.RO · cs.AI


PALCAS: A Priority-Aware Intelligent Lane Change Advisory System for Autonomous Vehicles using Federated Reinforcement Learning


Pith reviewed 2026-05-07 09:44 UTC · model grok-4.3

classification 💻 cs.RO cs.AI
keywords autonomous vehicles · lane change advisory · federated reinforcement learning · priority-aware reward · multi-agent coordination · traffic efficiency · SUMO simulator · vehicle-to-vehicle communication

The pith

A priority-aware federated reinforcement learning system lets autonomous vehicles coordinate lane changes according to destination urgency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PALCAS as a multi-agent system in which autonomous vehicles learn to advise lane changes through federated reinforcement learning while weighting decisions by each vehicle's urgency to reach its destination. It introduces a custom reward function that enforces safety constraints during both mandatory and discretionary lane changes and employs the parameterized deep Q-network algorithm to support joint lateral and longitudinal control across agents. Simulations in the SUMO traffic simulator combined with the Mosaic V2X framework report gains in efficiency, safety, comfort, arrival rates, and merging success relative to baseline methods. A sympathetic reader would care because coordinated lane changes could ease congestion and reduce collisions once roads contain large numbers of self-driving cars operating alongside human drivers. The federated approach allows distributed decision making without requiring a central controller or sharing raw data.

Core claim

PALCAS is a priority-aware intelligent lane change advisory system for autonomous vehicles based on multi-agent federated reinforcement learning. It incorporates a novel priority-aware safe lane-change reward function that enables judicious decisions in mandatory and discretionary scenarios. The system leverages the parameterized deep Q-network algorithm to facilitate effective cooperation among agents for both lateral and longitudinal motion controls. Extensive simulations using the SUMO traffic simulator and Mosaic V2X communication framework demonstrate that PALCAS significantly improves traffic efficiency, driving safety, comfort, destination arrival rates, and merging success rates compared to baseline methods.
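The parameterized deep Q-network handles a hybrid action space: a discrete lane-change decision paired with a continuous longitudinal parameter, scored jointly. A minimal sketch of that selection step, with toy stand-ins for both networks (the action names, state features, and weights here are illustrative assumptions, not the paper's implementation):

```python
import random

# PDQN-style hybrid action selection (illustrative sketch, not PALCAS code).
# Discrete actions: keep lane / change left / change right; each carries a
# continuous parameter (here, a target acceleration).

DISCRETE_ACTIONS = ["keep", "left", "right"]

def param_network(state, action):
    """Stand-in for the parameter (actor) network: maps the state to a
    continuous acceleration for the given discrete action."""
    # toy linear mapping; a real PDQN uses a neural network here
    bias = {"keep": 0.0, "left": -0.5, "right": -0.5}[action]
    return max(-3.0, min(3.0, 0.1 * sum(state) + bias))

def q_network(state, action, accel):
    """Stand-in for the Q-network: scores the (state, discrete action,
    continuous parameter) triple jointly, as PDQN requires."""
    comfort = -abs(accel)   # penalize harsh acceleration
    progress = state[0]     # toy feature: normalized ego speed
    return progress + comfort + (0.2 if action == "keep" else 0.0)

def select_action(state, epsilon=0.1):
    """Epsilon-greedy over discrete actions; each candidate is scored
    together with its own learned continuous parameter."""
    if random.random() < epsilon:
        a = random.choice(DISCRETE_ACTIONS)
        return a, param_network(state, a)
    scored = [(q_network(state, a, param_network(state, a)), a)
              for a in DISCRETE_ACTIONS]
    _, best = max(scored)
    return best, param_network(state, best)
```

The point of the structure is that the Q-network never sees a discrete action in isolation: it always evaluates it with the continuous parameter the actor would attach, which is what lets one agent learn lateral and longitudinal control jointly.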

What carries the argument

The priority-aware safe lane-change reward function inside a multi-agent federated reinforcement learning setup that uses parameterized deep Q-networks to coordinate lateral and longitudinal controls across vehicles.
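The paper describes this reward function only conceptually (a gap the referee report below flags), so any concrete form is a guess. A hedged sketch of the shape such a function could take, where every term, threshold, and weight is an illustrative assumption:

```python
def lane_change_reward(urgency, ttc, jerk, speed_gain, collision,
                       w_priority=1.0, w_safety=2.0, w_comfort=0.5):
    """Hypothetical priority-aware safe lane-change reward.

    All terms and weights are assumptions for illustration:
      urgency     in [0, 1]: destination urgency of this vehicle
      ttc         time-to-collision to the nearest conflicting vehicle (s)
      jerk        jerk magnitude during the maneuver (m/s^3)
      speed_gain  speed improvement from the lane change (m/s)
      collision   True if the maneuver caused a collision
    """
    if collision:
        return -100.0                      # hard safety penalty
    safety = min(ttc, 5.0) / 5.0           # saturating TTC margin
    comfort = -min(jerk, 10.0) / 10.0      # smooth maneuvers preferred
    # urgency scales the efficiency term: urgent vehicles earn more for
    # the same speed gain, biasing coordination in their favor
    efficiency = (1.0 + w_priority * urgency) * speed_gain
    return efficiency + w_safety * safety + w_comfort * comfort
```

Under this shape, two vehicles contesting the same gap resolve in favor of the higher-urgency one because its efficiency term dominates, while the collision penalty and TTC margin keep the safety constraint binding for both.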

If this is right

  • Vehicles with urgent destinations arrive more reliably because the reward function explicitly raises their priority in lane-change decisions.
  • Overall traffic flow improves because coordinated discretionary lane changes reduce unnecessary slowing.
  • Safety and passenger comfort increase through the explicit safety constraints built into the reward function.
  • Merging success rates rise in mandatory lane-change situations such as highway exits or construction zones.
  • Decentralized cooperation among vehicles becomes feasible without transmitting raw sensor data to a central server.
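The last point rests on the standard federated pattern: agents exchange model parameters, not raw data. A minimal sketch of FedAvg-style aggregation (the canonical rule from McMahan et al.; whether PALCAS uses exactly this weighting is an assumption):

```python
def federated_average(client_weights, client_samples):
    """FedAvg-style aggregation: average client model parameters,
    weighted by each client's local sample count. The paper only says
    agents share model updates rather than raw data; this exact rule
    is an assumption for illustration."""
    total = sum(client_samples)
    n_params = len(client_weights[0])
    global_w = [0.0] * n_params
    for w, n in zip(client_weights, client_samples):
        for i in range(n_params):
            global_w[i] += (n / total) * w[i]
    return global_w  # broadcast back to all clients for the next round
```

Each round, every vehicle (or RSU) trains locally, uploads only its weight vector, and receives the weighted average back, so no trajectory or sensor data ever leaves the agent.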

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The priority weighting could be extended to other vehicle decisions such as gap acceptance or speed selection in mixed traffic.
  • Federated updates might allow the system to adapt online when new vehicles join the fleet without retraining from scratch.
  • Performance gains observed in simulation would need separate validation against real sensor noise and communication delays.
  • The approach might reduce aggregate energy use across a fleet by producing smoother collective trajectories.

Load-bearing premise

The SUMO simulator combined with the Mosaic V2X framework generates traffic dynamics and communication effects that match real-world mixed human and autonomous vehicle traffic closely enough for the reported gains to carry over.

What would settle it

A field experiment on actual roads with mixed human-driven and autonomous vehicles that measures no improvement in merging success rates or safety metrics compared with the same baseline methods would falsify the central performance claim.

Figures

Figures reproduced from arXiv: 2604.27118 by Lokesh Das, Myounggyu Won, Nhat Ha Nguyen, Yassine Ibork.

Figure 1. An operational overview of PALCAS. …aspect of PALCAS is its ability to fuse local and global traffic information: each RSU coordinates with others via I2I communication to make globally informed yet locally optimized motion-control decisions for CAVs using Fed-MARL. This integration enables scalable, cooperative control across dynamic highway environments. Particularly, each agent (i.e., RSU) independent…
Figure 3. Further demonstrates PALCAS’s ability to alleviate intermittent traffic congestion caused by merging and exiting vehicles, underscoring its effectiveness in prioritizing lane changes under dynamically varying traffic conditions. We further evaluate the impact of CAV penetration rate (PR) on traffic efficiency, with results sum… (space–time diagram; axes: time (s), longitudinal position (m), speed (m/s))
Figure 2. Changes in average speed over time. PALCAS maintains 3.12% and 6.86% higher average speeds than Baseline-1 and Baseline-2, respectively, throughout the entire simulation. The space–time diagram in…
Figure 4. Acceleration trajectories for CAVs on different…
Figure 6. Inference time. PALCAS has its own time complexity at each inference round…
original abstract

We present a priority-aware intelligent lane change advisory system based on multi-agent federated reinforcement learning, namely PALCAS, for autonomous vehicles (AVs). While existing lane-change approaches typically focus on single-agent systems or centralized multi-agent systems, we introduce a federated reinforcement learning-based multi-agent lane change system prioritizing lane changing based on vehicle destination urgency. PALCAS incorporates a novel priority-aware safe lane-change reward function to enable judicious lane-change decisions in both mandatory and discretionary scenarios. PALCAS leverages the parameterized deep Q-network (PDQN) algorithm to facilitate effective cooperation among agents, enabling both lateral and longitudinal motion controls of AVs. Extensive simulations conducted using the SUMO traffic simulator and Mosaic V2X communication framework demonstrate that PALCAS significantly improves traffic efficiency, driving safety, comfort, destination arrival rates, and merging success rates compared to baseline methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents PALCAS, a multi-agent federated reinforcement learning system for priority-aware lane-change advisory in autonomous vehicles. It employs a parameterized deep Q-network (PDQN) with a novel priority-aware safe lane-change reward function to handle both mandatory and discretionary maneuvers, claiming significant gains in traffic efficiency, safety, comfort, destination arrival rates, and merging success rates over baseline methods in SUMO simulations interfaced with the Mosaic V2X framework.

Significance. If the simulation results prove robust under detailed scrutiny and the federated updates demonstrate stable cooperation without centralization, the work could advance decentralized RL approaches for AV coordination in mixed traffic. The priority-aware mechanism addresses a practical gap in urgency-based decision making, but the exclusive reliance on idealized simulation limits immediate translational impact.

major comments (3)
  1. [Abstract and Results] Abstract and Results section: the central claim of 'significant improvements' in efficiency, safety, comfort, arrival rates, and merging success is unsupported by any reported baseline implementation details, reward function equations, number of runs, variance measures, or statistical tests, rendering the performance assertions unverifiable from the provided text.
  2. [Methodology] Methodology section: the novel priority-aware safe lane-change reward function is described conceptually but lacks an explicit mathematical formulation, parameter definitions, or weighting scheme, preventing assessment of whether it drives the reported gains or reduces to standard safety penalties.
  3. [Simulation Environment and Evaluation] Simulation Environment and Evaluation: the SUMO + Mosaic V2X setup is used without any sensitivity analysis or comparison to real-world factors such as sensor noise, actuator delays, or heterogeneous human driver models, which directly affects the validity of the state transitions and reward signals underlying the PDQN policy improvements.
minor comments (2)
  1. [Introduction] The acronym PDQN should be expanded at first use and the federated update mechanism should include a brief pseudocode or diagram for clarity.
  2. [Figures] Figure captions for simulation results could explicitly state the number of independent trials and confidence intervals to aid interpretation.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and constructive comments. We address each of the major comments below and commit to revising the manuscript to improve clarity and verifiability of our results.

point-by-point responses
  1. Referee: Abstract and Results section: the central claim of 'significant improvements' in efficiency, safety, comfort, arrival rates, and merging success is unsupported by any reported baseline implementation details, reward function equations, number of runs, variance measures, or statistical tests, rendering the performance assertions unverifiable from the provided text.

    Authors: We agree that the current manuscript lacks sufficient details to fully support and verify the performance claims. In the revised version, we will provide explicit information on the baseline implementations, include the mathematical equations for the priority-aware safe lane-change reward function, report the number of simulation runs along with variance measures, and present statistical tests demonstrating the significance of the improvements. revision: yes

  2. Referee: Methodology section: the novel priority-aware safe lane-change reward function is described conceptually but lacks an explicit mathematical formulation, parameter definitions, or weighting scheme, preventing assessment of whether it drives the reported gains or reduces to standard safety penalties.

    Authors: We recognize the need for a precise formulation. The revised manuscript will include the full mathematical definition of the reward function, with clear definitions of all parameters (including priority weights derived from destination urgency) and the weighting scheme balancing safety, comfort, and efficiency components. This will clarify how the function contributes to the results beyond standard safety penalties. revision: yes

  3. Referee: Simulation Environment and Evaluation: the SUMO + Mosaic V2X setup is used without any sensitivity analysis or comparison to real-world factors such as sensor noise, actuator delays, or heterogeneous human driver models, which directly affects the validity of the state transitions and reward signals underlying the PDQN policy improvements.

    Authors: We agree on the value of sensitivity analysis. We will incorporate sensitivity tests in the revised evaluation section, varying parameters like communication delays and traffic conditions. For real-world factors, we will add a limitations discussion acknowledging the idealized nature of the simulations and note that extensions to include sensor noise and heterogeneous driver models are part of ongoing work. This addresses the core concern while maintaining the focus of the current study. revision: partial

Circularity Check

0 steps flagged

No circularity detected in PALCAS derivation chain

full rationale

The paper introduces an algorithmic system (priority-aware reward function, PDQN-based federated RL for lane changes) and validates it via SUMO+Mosaic simulations against baselines. No equations, derivations, or self-referential reductions appear in the provided text; performance claims rest on external empirical comparisons rather than fitted inputs renamed as predictions or self-citation chains that would force the result by construction. The derivation is self-contained as a proposal of a novel RL policy evaluated independently.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Unable to fully audit due to abstract-only access; no explicit free parameters, axioms, or invented entities are stated in the provided text.

pith-pipeline@v0.9.0 · 5456 in / 1012 out tokens · 34048 ms · 2026-05-07T09:44:09.566544+00:00 · methodology

discussion (0)

