Reinforcement Learning for Testing Interdependent Requirements in Autonomous Vehicles: An Empirical Study

Aitor Arrieta; Chengjie Lu; Jiahui Wu; Shaukat Ali

arxiv: 2502.15792 · v2 · submitted 2025-02-18 · 💻 cs.SE · cs.LG · cs.RO

Jiahui Wu, Chengjie Lu, Aitor Arrieta, Shaukat Ali

Pith reviewed 2026-05-23 02:32 UTC · model grok-4.3

keywords: reinforcement learning · autonomous vehicles · scenario-based testing · multi-objective RL · single-objective RL · requirement violations · empirical study

The pith

Multi-objective RL generates more diverse violation scenarios for interdependent AV requirements than single-objective RL, though the latter finds higher-severity violations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares single-objective reinforcement learning, which folds multiple objectives into one reward, against multi-objective reinforcement learning for generating test scenarios that expose violations of interdependent requirements in autonomous vehicles. It reports that the two approaches show comparable effectiveness in many cases but differ in violation patterns: MORL produces more violating scenarios and broader scenario coverage, while SORL tends to surface higher-severity violations. These differences also vary with particular objective combinations and, less strongly, with road conditions. A reader cares because AV requirements involve explicit trade-offs, so the choice of RL method directly shapes which safety issues surface during testing.

Core claim

MORL and SORL differ mainly in how violations occur, while showing comparable effectiveness in many cases. MORL tends to generate more requirement-violation scenarios, whereas SORL produces higher-severity violations. Their relative performance also depends on specific objective combinations and, to a lesser extent, road conditions. Regarding diversity, MORL consistently covers a broader range of scenarios. Thus, MORL is preferable when scenario diversity and coverage are prioritized, whereas SORL may better expose severe violations.

What carries the argument

Empirical head-to-head comparison of single-objective RL (SORL), which merges objectives into one reward, versus multi-objective RL (MORL), which treats objectives separately, when both are used to generate critical test scenarios for interdependent AV requirements.
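The distinction between the two reward formulations can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names, the two example steps, and the equal weights are assumptions made here. The only part taken from the text is that SORL merges objectives into one reward while MORL keeps them separate.

```python
from typing import Sequence, Tuple


def sorl_reward(objective_rewards: Sequence[float],
                weights: Sequence[float]) -> float:
    """Collapse per-objective rewards into one scalar via a weighted sum."""
    if len(objective_rewards) != len(weights):
        raise ValueError("one weight per objective is required")
    return sum(w * r for w, r in zip(weights, objective_rewards))


def morl_reward(objective_rewards: Sequence[float]) -> Tuple[float, ...]:
    """Keep one reward per objective so the trade-off stays visible."""
    return tuple(objective_rewards)


# Two HYPOTHETICAL steps with opposite trade-offs between two objectives.
step_a = [1.0, 0.0]
step_b = [0.0, 1.0]
equal_weights = [0.5, 0.5]  # assumed weighting, for illustration only

scalar_a = sorl_reward(step_a, equal_weights)
scalar_b = sorl_reward(step_b, equal_weights)
vector_a = morl_reward(step_a)
vector_b = morl_reward(step_b)
```

With equal weights both steps receive the same scalar reward, so a single-objective learner cannot tell them apart, whereas the vector form still distinguishes them. That is one way the choice of formulation can shape which violations surface.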

If this is right

MORL is preferable when scenario diversity and coverage are prioritized.
SORL may better expose severe violations.
Relative performance depends on the specific objective combinations chosen.
Road conditions affect the two methods to a lesser extent.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the simulator results transfer, practitioners could select MORL when the testing budget allows broad exploration and SORL when the priority is depth of severity.
The observed dependence on objective combinations suggests that requirement trade-offs should be mapped explicitly before choosing an RL variant.

Load-bearing premise

The high-fidelity simulator and end-to-end AV controller used in the experiments provide a faithful proxy for real-world interdependent requirement violations and their severity.

What would settle it

Repeating the comparison on a different simulator or with a different AV controller and observing that MORL no longer produces more violations or that SORL no longer yields higher severity would falsify the reported distinction.

Figures

Figures reproduced from arXiv: 2502.15792 by Aitor Arrieta, Chengjie Lu, Jiahui Wu, Shaukat Ali.

**Figure 1.** Overview of AV Testing with MOEQT. … the testing environment, which contains the state of both the AV and its operating environment. The agent then samples a vector of multi-objective weights ω_t based on the number of requirements targeted for violation. Taking the concatenation of s_t and ω_t as input, the MQ-network computes the corresponding Q-values. Based on these Q-values and the weight vector ω_t, the beh…

**Figure 2.** Driving roads for specifying driving tasks.

**Figure 3.** Convergence Trends of TTC and RC achieved by MOEQT and SORLW on Different Roads. … short, MOEQT achieved an overall stable performance during training and well-balanced the two objectives, indicating that MORL with an adaptive weighting mechanism can better handle these two objectives compared to single-objective RL with fixed equal weights. Analyzing violations of requirements. Recall that our goal is to ge…
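The weight-conditioned step described in the Figure 1 caption (sample a weight vector, concatenate it with the state, read off Q-values) can be sketched as follows. This is a hedged illustration only: the sampling scheme, the helper names, the example state, and the Q-table are hypothetical. The caption is cut off before it says how Q-values and weights are combined, so the weighted-sum greedy choice below is an assumption rather than the authors' rule.

```python
import random
from typing import List, Sequence


def sample_weights(num_objectives: int, rng: random.Random) -> List[float]:
    """Draw non-negative weights that sum to 1 (assumed sampling scheme)."""
    raw = [rng.random() + 1e-9 for _ in range(num_objectives)]
    total = sum(raw)
    return [value / total for value in raw]


def build_network_input(state: Sequence[float],
                        weights: Sequence[float]) -> List[float]:
    """Concatenate the state s_t with the weight vector w_t."""
    return list(state) + list(weights)


def greedy_action(q_values: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> int:
    """Pick the action with the largest weighted sum of per-objective Q-values.

    ASSUMPTION: illustrative selection rule, not taken from the paper.
    """
    scores = [sum(w * q for w, q in zip(weights, row)) for row in q_values]
    return max(range(len(scores)), key=scores.__getitem__)


rng = random.Random(0)
weights = sample_weights(2, rng)
network_input = build_network_input([0.3, 12.0, -1.5], weights)

# HYPOTHETICAL per-action, per-objective Q-values (rows are actions).
q_table = [[1.0, 0.0],
           [0.0, 1.0]]
action_favouring_first = greedy_action(q_table, [0.9, 0.1])
action_favouring_second = greedy_action(q_table, [0.1, 0.9])
```

The same Q-values lead to different actions under different weight vectors, which is the intuition behind conditioning the network on the sampled weights.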

Original abstract

Autonomous vehicles (AVs) make driving decisions without humans, making dependability assurance critical. Scenario-based testing is widely used to evaluate AVs under diverse conditions, with reinforcement learning (RL) generating test scenarios that identify violations of functional and safety requirements. Many requirements are interdependent and involve trade-offs, making it unclear whether single-objective RL (SORL), which combines objectives into a single reward, can reliably reveal violations or whether multi-objective RL (MORL), which explicitly considers multiple objectives, is necessary. We present an empirical evaluation comparing SORL and MORL for generating critical scenarios that simultaneously test interdependent requirements using an end-to-end AV controller and high-fidelity simulator. Results suggest that MORL and SORL differ mainly in how violations occur, while showing comparable effectiveness in many cases. MORL tends to generate more requirement-violation scenarios, whereas SORL produces higher-severity violations. Their relative performance also depends on specific objective combinations and, to a lesser extent, road conditions. Regarding diversity, MORL consistently covers a broader range of scenarios. Thus, MORL is preferable when scenario diversity and coverage are prioritized, whereas SORL may better expose severe violations. Our empirical evaluation addresses a gap by systematically comparing SORL and MORL, highlighting the importance of requirement dependencies in RL-based AV testing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper compares SORL and MORL on interdependent AV requirements and reports MORL finds more violations with better diversity while SORL hits higher severity, but the results sit entirely inside one unvalidated simulator.

The central result is that MORL generates more requirement-violation scenarios and covers a broader range, while SORL tends to produce higher-severity violations; the gap depends on which objectives are combined and, to a smaller degree, on road conditions. The authors ran this as a head-to-head empirical comparison using an end-to-end controller in a high-fidelity simulator. That setup directly targets the case where requirements trade off against each other rather than treating them independently.

The work is new in performing that systematic comparison for AV testing, and it does a straightforward job of measuring violation count, severity, and diversity to show the practical differences. The guidance on when to prefer one approach over the other follows from the reported patterns.

The main soft spot is the simulator. Judging from the abstract, the stress-test concern holds: there is no cross-validation against physical tests, alternative simulators, or real sensor traces, so any mismatch in dynamics or perception noise could flip which method appears stronger. The abstract also stays light on run counts, statistical tests, and controls for training stochasticity, which leaves the robustness of the differences unclear. No free parameters or invented entities appear in the description.

This paper is for researchers working on RL-based scenario generation for safety-critical systems who want concrete data on single-objective versus multi-objective variants. A reader already familiar with the AV testing literature would get usable pointers on objective choice, though they would still need to adapt the setup. It deserves a serious referee because the comparison addresses a stated gap with actual experimental results that others can examine or replicate. I would send it to peer review.

Referee Report

2 major / 2 minor

Summary. The paper presents an empirical study comparing single-objective reinforcement learning (SORL) and multi-objective reinforcement learning (MORL) for generating test scenarios that expose violations of interdependent requirements in autonomous vehicles. Experiments use an end-to-end AV controller in a high-fidelity simulator; results indicate MORL produces more violation scenarios and greater diversity while SORL yields higher-severity violations, with relative performance depending on objective combinations and road conditions.

Significance. If the simulator and controller faithfully capture real-world violation dynamics and interdependencies, the work supplies concrete guidance on selecting RL methods for AV scenario-based testing and highlights the impact of requirement trade-offs. The systematic comparison of existing RL variants on a realistic controller is a positive contribution to empirical software engineering for safety-critical systems.

major comments (2)

[Abstract / Experimental Setup] Abstract and Experimental Setup section: the central comparative claims (MORL generates more violations, SORL higher severity, MORL broader diversity) rest on the unvalidated assumption that the chosen high-fidelity simulator produces counts, severity metrics, and diversity measures that reflect actual AV requirement trade-offs. No cross-validation against physical tests, alternative simulators, or real sensor data is described; mismatches in dynamics (tire models, perception noise, controller latency) could invert the reported MORL/SORL differences.
[Results] Results section: the abstract reports comparative results on violation count, severity, and diversity, yet the manuscript provides no details on statistical tests, effect sizes, controls for confounding factors (random seeds, hyperparameter sensitivity), or raw data availability. This undermines confidence that observed differences are robust rather than artifacts of the experimental configuration.

minor comments (2)

[Methods] Clarify the precise definitions and weighting schemes used for the multi-objective reward functions and severity metrics; these are referenced but not fully formalized in the provided abstract.
[Figures] Ensure all figures reporting scenario diversity include axis labels, legends, and error bars or confidence intervals consistent with the statistical analysis.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments point-by-point below and will revise the manuscript to incorporate the suggested improvements.

Referee: [Abstract / Experimental Setup] Abstract and Experimental Setup section: the central comparative claims (MORL generates more violations, SORL higher severity, MORL broader diversity) rest on the unvalidated assumption that the chosen high-fidelity simulator produces counts, severity metrics, and diversity measures that reflect actual AV requirement trade-offs. No cross-validation against physical tests, alternative simulators, or real sensor data is described; mismatches in dynamics (tire models, perception noise, controller latency) could invert the reported MORL/SORL differences.

Authors: We agree this is a valid concern regarding external validity. Our work is positioned as a simulation-based empirical study, consistent with standard practice in AV testing literature where physical validation is often infeasible due to safety and cost. In the revision we will add a dedicated 'Threats to Validity' subsection that explicitly discusses simulator fidelity limitations, potential mismatches with real-world dynamics, and the scope of our comparative claims. We will also reference existing validation studies of the simulator where available. This addresses the comment without requiring new experiments. revision: yes
Referee: [Results] Results section: the abstract reports comparative results on violation count, severity, and diversity, yet the manuscript provides no details on statistical tests, effect sizes, controls for confounding factors (random seeds, hyperparameter sensitivity), or raw data availability. This undermines confidence that observed differences are robust rather than artifacts of the experimental configuration.

Authors: We acknowledge the omission of these methodological details. The revised manuscript will expand the Results section to report: statistical tests performed (including p-values), effect sizes, the use of multiple independent runs (with different random seeds) to control for stochasticity, a brief hyperparameter sensitivity analysis, and a public repository link for raw data and replication scripts. These additions will be included in the next version. revision: yes
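As an illustration of the kind of effect-size reporting discussed in this exchange, the sketch below computes the Vargha-Delaney A12 statistic, a non-parametric effect size often used when comparing randomized algorithms across repeated runs. It is an assumption that such a measure would suit this study; the text here does not say which tests or effect sizes the authors use. The per-seed counts are made-up placeholders, not results from the paper.

```python
from typing import Sequence


def vargha_delaney_a12(first: Sequence[float],
                       second: Sequence[float]) -> float:
    """Probability that a value from `first` exceeds one from `second`.

    Ties count as half a win. 0.5 means no difference between the groups.
    """
    if not first or not second:
        raise ValueError("both samples must be non-empty")
    wins = sum(1 for x in first for y in second if x > y)
    ties = sum(1 for x in first for y in second if x == y)
    return (wins + 0.5 * ties) / (len(first) * len(second))


# HYPOTHETICAL violation counts, one per independent run (random seed).
morl_counts = [14, 17, 15, 16, 18]
sorl_counts = [12, 15, 13, 14, 16]

effect = vargha_delaney_a12(morl_counts, sorl_counts)
```

A value above 0.5 would indicate that the first method tends to produce larger counts across runs. An effect size like this is normally reported alongside a significance test rather than in place of one.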

Circularity Check

0 steps flagged

No circularity: purely empirical comparison without derivations or self-referential predictions

The paper is an empirical study that runs SORL and MORL algorithms on a high-fidelity simulator to generate and compare test scenarios for interdependent AV requirements. No equations, fitted parameters renamed as predictions, or derivation chains appear in the abstract or described methodology. Central claims rest on observed differences in violation counts, severity, and diversity from experimental runs, which are falsifiable against the simulator outputs and do not reduce to self-definition or self-citation. Self-citations, if present, are not load-bearing for any uniqueness theorem or ansatz. This is a standard non-circular empirical evaluation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The study is purely empirical and rests on domain assumptions about the fidelity of the simulator and the representativeness of the chosen AV controller rather than new mathematical constructs.

axioms (2)

domain assumption: The high-fidelity simulator accurately reproduces real-world AV dynamics and the effects of requirement violations. Invoked implicitly as the basis for all generated scenarios and severity measurements.
standard math: Standard RL training procedures can be configured to optimize for requirement-violation objectives. Background assumption required to treat SORL and MORL as valid test generators.

pith-pipeline@v0.9.0 · 5775 in / 1403 out tokens · 29546 ms · 2026-05-23T02:32:23.859797+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

From Research to Practice: An Interactive Rapid Review of Autonomous Driving System Testing in Industry
cs.SE · 2026-05 · unverdicted · novelty 5.0

Industry practitioners identified 12 ADS testing challenges, prioritized two for end-to-end systems, and found that most of the 17 examined research studies lack direct applicability to real industrial contexts.


[1] Xinhai Zhang, Jianbo Tao, Kaige Tan, Martin Törngren, José Manuel Gaspar Sánchez, Muhammad Rusyadi Ramli, Xin Tao, Magnus Gyllenhammar, Franz Wotawa, Naveen Mohan, et al. Finding critical scenarios for automated driving systems: A systematic mapping study. IEEE Transactions on Software Engineering, 49(3):991–1026, 2022.

[2] Wenhao Ding, Chejian Xu, Mansur Arief, Haohong Lin, Bo Li, and Ding Zhao. A survey on safety-critical driving scenario generation—a methodological perspective. IEEE Transactions on Intelligent Transportation Systems, 24(7):6971–6988, 2023.

[3] Andrea Stocco, Brian Pulfer, and Paolo Tonella. Mind the gap! A study on the transferability of virtual versus physical-world testing of autonomous driving systems. IEEE Transactions on Software Engineering, 49(4):1928–1940, 2022.

[4] Andrea Stocco, Brian Pulfer, and Paolo Tonella. Model vs system level testing of autonomous driving systems: A replication and extension study. Empirical Software Engineering, 28(3):73, 2023.

[5] Mahshid Helali Moghadam, Markus Borg, Mehrdad Saadatmand, Seyed Jalaleddin Mousavirad, Markus Bohlin, and Björn Lisper. Machine learning testing in an ADAS case study using simulation-integrated bio-inspired search-based testing. Journal of Software: Evolution and Process, 36(5):e2591, 2024.

[6] Neelofar Neelofar and Aldeida Aleti. Identifying and explaining safety-critical scenarios for autonomous vehicles via key features. ACM Transactions on Software Engineering and Methodology, 33(4):1–32, 2024.

[7] Victor Crespo-Rodriguez, Neelofar, and Aldeida Aleti. Pafot: A position-based approach for finding optimal tests of autonomous vehicles. In Proceedings of the 5th ACM/IEEE International Conference on Automation of Software Test (AST 2024), pages 159–170, 2024.

[8] Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. CARLA: An open urban driving simulator. In Conference on Robot Learning, pages 1–16. PMLR, 2017.

[9] Ziyuan Zhong, Gail Kaiser, and Baishakhi Ray. Neural network guided evolutionary fuzzing for finding traffic violations of autonomous vehicles. IEEE Transactions on Software Engineering, 49(4):1860–1875, 2022.

[10] Chengjie Lu, Shaukat Ali, and Tao Yue. Epitester: Testing autonomous vehicles with epigenetic algorithm and attention mechanism. IEEE Transactions on Software Engineering, pages 1–19, 2024.

[11] Yuqi Huai, Sumaya Almanee, Yuntianyi Chen, Xiafa Wu, Qi Alfred Chen, and Joshua Garcia. scenorita: Generating diverse, fully mutable, test scenarios for autonomous vehicle planning. IEEE Transactions on Software Engineering, 49(10):4656–4676, 2023.

[12] Yuan Zhou, Yang Sun, Yun Tang, Yuqi Chen, Jun Sun, Christopher M Poskitt, Yang Liu, and Zijiang Yang. Specification-based autonomous driving system testing. IEEE Transactions on Software Engineering, 49(6):3391–3410, 2023.

[13] Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT Press, 2018.

[14] Chengjie Lu, Yize Shi, Huihui Zhang, Man Zhang, Tiexin Wang, Tao Yue, and Shaukat Ali. Learning configurations of operating environment of autonomous vehicles to maximize their collisions. IEEE Transactions on Software Engineering, 49(1):384–402, 2022.

[15] Fitash Ul Haq, Donghwan Shin, and Lionel C. Briand. Many-objective reinforcement learning for online testing of DNN-enabled systems. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pages 1814–1826, 2023.

[16] Shuo Feng, Haowei Sun, Xintao Yan, Haojie Zhu, Zhengxia Zou, Shengyin Shen, and Henry X Liu. Dense reinforcement learning for safety validation of autonomous vehicles. Nature, 615(7953):620–627, 2023.

[17] Chunming Liu, Xin Xu, and Dewen Hu. Multiobjective reinforcement learning: A comprehensive overview. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 45(3):385–398, 2015.

[18] Conor F Hayes, Roxana Rădulescu, Eugenio Bargiacchi, Johan Källström, Matthew Macfarlane, Mathieu Reymond, Timothy Verstraeten, Luisa M Zintgraf, Richard Dazeley, Fredrik Heintz, et al. A practical guide to multi-objective reinforcement learning and planning. Autonomous Agents and Multi-Agent Systems, 36(1):26, 2022.

[19] Jie Xu, Yunsheng Tian, Pingchuan Ma, Daniela Rus, Shinjiro Sueda, and Wojciech Matusik. Prediction-guided multi-objective reinforcement learning for continuous robot control. In International Conference on Machine Learning, pages 10607–10616. PMLR, 2020.

[20] B Ravi Kiran, Ibrahim Sobh, Victor Talpaert, Patrick Mannion, Ahmad A Al Sallab, Senthil Yogamani, and Patrick Pérez. Deep reinforcement learning for autonomous driving: A survey. IEEE Transactions on Intelligent Transportation Systems, 23(6):4909–4926, 2021.

[21] Runzhe Yang, Xingyuan Sun, and Karthik Narasimhan. A generalized algorithm for multi-objective reinforcement learning and policy adaptation. Advances in Neural Information Processing Systems, 32, 2019.

[22] Hao Shao, Letian Wang, Ruobing Chen, Hongsheng Li, and Yu Liu. Safety-enhanced autonomous driving using interpretable sensor fusion transformer. In Conference on Robot Learning, pages 726–737. PMLR, 2023.

[23] Christopher JCH Watkins and Peter Dayan. Q-learning. Machine Learning, 8:279–292, 1992.

[24] Diederik M Roijers, Peter Vamplew, Shimon Whiteson, and Richard Dazeley. A survey of multi-objective sequential decision-making. Journal of Artificial Intelligence Research, 48:67–113, 2013.

[25] Florian Felten, Lucas N Alegre, Ann Nowe, Ana Bazzan, El Ghazali Talbi, Grégoire Danoy, and Bruno C da Silva. A toolkit for reliable benchmarking and research in multi-objective reinforcement learning. Advances in Neural Information Processing Systems, 36, 2024.

[26] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.

[27] Layne T Watson and Raphael T Haftka. Modern homotopy methods in optimization. Computer Methods in Applied Mechanics and Engineering, 74(3):289–305, 1989.

[28] Tom Schaul. Prioritized experience replay. arXiv preprint arXiv:1511.05952, 2015.

[29] Edouard Leurent. A survey of state-action representations for autonomous driving. 2018.

[30] A Vaswani. Attention is all you need. Advances in Neural Information Processing Systems, 2017.

[31] S Phaniteja, Parijat Dewangan, Pooja Guhan, Abhishek Sarkar, and K Madhava Krishna. A deep reinforcement learning approach for dynamically stable inverse kinematics of humanoid robots. In 2017 IEEE International Conference on Robotics and Biomimetics (ROBIO), pages 1818–1823. IEEE, 2017.

[32] Long Chen, Xuemin Hu, Bo Tang, and Yu Cheng. Conditional DQN-based motion planning with fuzzy logic for autonomous driving. IEEE Transactions on Intelligent Transportation Systems, 23(4):2966–2977, 2020.

[33] Cumhur Erkan Tuncali, Georgios Fainekos, Danil Prokhorov, Hisahiro Ito, and James Kapinski. Requirements-driven test generation for autonomous vehicles with machine learning components. IEEE Transactions on Intelligent Vehicles, 5(2):265–280, 2019.

[34] Michiel M Minderhoud and Piet HL Bovy. Extended time-to-collision measures for road traffic safety assessment. Accident Analysis & Prevention, 33(1):89–97, 2001.

[35] Wilko Schwarting, Javier Alonso-Mora, and Daniela Rus. Planning and decision-making for autonomous vehicles. Annual Review of Control, Robotics, and Autonomous Systems, 1(1):187–210, 2018.

[36] CARLA Team, Intel Autonomous Agents Lab, Embodied AI Foundation, and AlphaDrive. CARLA autonomous driving leaderboard. https://leaderboard.carla.org, 2024.

[37] Chengjie Lu, Shaukat Ali, and Tao Yue. Epitester: Testing autonomous vehicles with epigenetic algorithm and attention mechanism. IEEE Transactions on Software Engineering, 2024.

[38] Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2623–2631, 2019.

[39] Jiahui Wu, Chengjie Lu, Aitor Arrieta, and Shaukat Ali. MOEQT. https://github.com/Simula-COMPLEX/MOEQT, 2025.

[40] Bradly C Stadie, Sergey Levine, and Pieter Abbeel. Incentivizing exploration in reinforcement learning with deep predictive models. arXiv preprint arXiv:1507.00814, 2015.

[41] Wassim G Najm, John D Smith, Mikio Yanagisawa, et al. Pre-crash scenario typology for crash avoidance research. Technical report, United States. Department of Transportation. National Highway Traffic Safety . . . , 2007.

[42] Ronald Aylmer Fisher. Statistical methods for research workers. In Breakthroughs in Statistics: Methodology and Distribution, pages 66–70. Springer, 1970.

[43] Andrea Arcuri and Lionel Briand. A practical guide for using statistical tests to assess randomized algorithms in software engineering. In Proceedings of the 33rd International Conference on Software Engineering, pages 1–10, 2011.

[44] Magdalena Szumilas. Explaining odds ratios. Journal of the Canadian Academy of Child and Adolescent Psychiatry, 19(3):227, 2010.

[45] Raja Ben Abdessalem, Shiva Nejati, Lionel C Briand, and Thomas Stifter. Testing advanced driver assistance systems using multi-objective search and neural networks. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, pages 63–74, 2016.

[46] Raja Ben Abdessalem, Shiva Nejati, Lionel C Briand, and Thomas Stifter. Testing vision-based control systems using learnable evolutionary algorithms. In Proceedings of the 40th International Conference on Software Engineering, pages 1016–1026, 2018.

[47] Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and TAMT Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182–197, 2002.

[48] Guanpeng Li, Yiran Li, Saurabh Jha, Timothy Tsai, Michael Sullivan, Siva Kumar Sastry Hari, Zbigniew Kalbarczyk, and Ravishankar Iyer. AV-Fuzzer: Finding safety violations in autonomous driving systems. In 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE), pages 25–36. IEEE, 2020.

[49] Fitash Ul Haq, Donghwan Shin, and Lionel Briand. Efficient online testing for DNN-enabled systems using surrogate-assisted and many-objective optimization. In Proceedings of the 44th International Conference on Software Engineering, pages 811–822, 2022.

[50] Yang Sun, Christopher M Poskitt, Jun Sun, Yuqi Chen, and Zijiang Yang. Lawbreaker: An approach for specifying traffic laws and fuzzing autonomous vehicles. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, pages 1–12, 2022.

[51] Baiming Chen, Xiang Chen, Qiong Wu, and Liang Li. Adversarial evaluation of autonomous vehicles in lane-change scenarios. IEEE Transactions on Intelligent Transportation Systems, 23(8):10333–10342, 2021.

[52] Mark Koren, Saud Alsaif, Ritchie Lee, and Mykel J. Kochenderfer. Adaptive stress testing for autonomous vehicles. In 2018 IEEE Intelligent Vehicles Symposium (IV), pages 1–7, 2018.

[53] Andréa Doreste, Matteo Biagiola, and Paolo Tonella. Adversarial testing with reinforcement learning: A case study on autonomous driving. In 2024 IEEE Conference on Software Testing, Verification and Validation (ICST), pages 293–304. IEEE, 2024.