arxiv: 2502.15792 · v2 · submitted 2025-02-18 · 💻 cs.SE · cs.LG · cs.RO

Reinforcement Learning for Testing Interdependent Requirements in Autonomous Vehicles: An Empirical Study

Pith reviewed 2026-05-23 02:32 UTC · model grok-4.3

classification 💻 cs.SE · cs.LG · cs.RO
keywords reinforcement learning · autonomous vehicles · scenario-based testing · multi-objective RL · single-objective RL · requirement violations · empirical study

The pith

Multi-objective RL generates more diverse violation scenarios for interdependent AV requirements than single-objective RL, though the latter finds higher-severity violations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares single-objective reinforcement learning, which folds multiple objectives into one reward, against multi-objective reinforcement learning for generating test scenarios that expose violations of interdependent requirements in autonomous vehicles. It reports that the two approaches show comparable effectiveness in many cases but differ in violation patterns: MORL produces more violating scenarios and broader scenario coverage, while SORL tends to surface higher-severity violations. These differences also vary with particular objective combinations and, less strongly, with road conditions. A reader cares because AV requirements involve explicit trade-offs, so the choice of RL method directly shapes which safety issues surface during testing.

Core claim

MORL and SORL differ mainly in how violations occur, while showing comparable effectiveness in many cases. MORL tends to generate more requirement-violation scenarios, whereas SORL produces higher-severity violations. Their relative performance also depends on specific objective combinations and, to a lesser extent, road conditions. Regarding diversity, MORL consistently covers a broader range of scenarios. Thus, MORL is preferable when scenario diversity and coverage are prioritized, whereas SORL may better expose severe violations.

What carries the argument

Empirical head-to-head comparison of single-objective RL (SORL), which merges objectives into one reward, versus multi-objective RL (MORL), which treats objectives separately, when both are used to generate critical test scenarios for interdependent AV requirements.
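
The difference between the two reward treatments can be sketched as follows. This is a minimal illustration and not the paper's implementation: the `scalarize` helper, the reward values, and the weight vectors are all hypothetical. The equal weights only mirror the "fixed equal weights" baseline mentioned in the Figure 3 excerpt.

```python
from typing import Sequence


def scalarize(rewards: Sequence[float], weights: Sequence[float]) -> float:
    """Fold a vector of per-objective rewards into one scalar (SORL-style)."""
    if len(rewards) != len(weights):
        raise ValueError("rewards and weights must have the same length")
    return sum(r * w for r, w in zip(rewards, weights))


# Hypothetical per-objective rewards for one step (illustrative values only).
step_rewards = [0.8, 0.2]

# SORL view: the agent only ever sees this single number.
sorl_reward = scalarize(step_rewards, [0.5, 0.5])

# MORL view: the agent keeps the full vector and can re-weight it later.
morl_reward = tuple(step_rewards)
reweighted = scalarize(morl_reward, [0.9, 0.1])
```

Once the rewards are merged, the individual contributions cannot be recovered from `sorl_reward`, whereas the vector form leaves the trade-off between objectives open.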

If this is right

  • MORL is preferable when scenario diversity and coverage are prioritized.
  • SORL may better expose severe violations.
  • Relative performance depends on the specific objective combinations chosen.
  • Road conditions affect the two methods to a lesser extent.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the simulator results transfer, practitioners could select MORL when the testing budget allows broad exploration and SORL when the priority is depth of severity.
  • The observed dependence on objective combinations suggests that requirement trade-offs should be mapped explicitly before choosing an RL variant.

Load-bearing premise

The high-fidelity simulator and end-to-end AV controller used in the experiments provide a faithful proxy for real-world interdependent requirement violations and their severity.

What would settle it

Repeating the comparison on a different simulator or with a different AV controller and observing that MORL no longer produces more violations or that SORL no longer yields higher severity would falsify the reported distinction.

Figures

Figures reproduced from arXiv: 2502.15792 by Aitor Arrieta, Chengjie Lu, Jiahui Wu, Shaukat Ali.

Figure 1: Overview of AV Testing with MOEQT.
… the testing environment, which contains the state of both the AV and its operating environment. The agent then samples a vector of multi-objective weights ω_t based on the number of requirements targeted for violation. Taking the concatenation of s_t and ω_t as input, the MQ-network computes the corresponding Q-values. Based on these Q-values and the weight vector ω_t, the beh…
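
The weight-conditioned step described in the Figure 1 excerpt can be sketched roughly as below. Everything here is an assumption made for illustration: the excerpt is truncated before it states how the action is chosen, so the greedy rule over weight-scalarized Q-values is a guess, the linear map in `mq_values` is a toy stand-in for the MQ-network, and the weight-sampling scheme, state values, and function names are hypothetical.

```python
import random
from typing import List, Sequence


def sample_weights(n_objectives: int, rng: random.Random) -> List[float]:
    """Sample a non-negative weight vector that sums to 1 (assumed scheme)."""
    raw = [rng.expovariate(1.0) for _ in range(n_objectives)]
    total = sum(raw)
    return [x / total for x in raw]


def mq_values(state: Sequence[float], weights: Sequence[float],
              params: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Toy stand-in for the MQ-network: one linear map per (action, objective).

    params[a][k] holds coefficients applied to the concatenated input, so the
    result is one Q-vector per action with one entry per objective.
    """
    x = list(state) + list(weights)  # concatenation of s_t and ω_t
    return [[sum(c * xi for c, xi in zip(coeffs, x)) for coeffs in per_obj]
            for per_obj in params]


def greedy_action(q: Sequence[Sequence[float]], weights: Sequence[float]) -> int:
    """Pick the action whose weight-scalarized Q-value is largest (assumed rule)."""
    scores = [sum(w * qk for w, qk in zip(weights, q_a)) for q_a in q]
    return max(range(len(scores)), key=scores.__getitem__)


rng = random.Random(0)
state = [0.3, -1.2, 0.7]          # hypothetical 3-dimensional state s_t
weights = sample_weights(2, rng)  # two targeted requirements, two weights
n_actions, n_inputs = 4, len(state) + len(weights)
params = [[[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(2)]
          for _ in range(n_actions)]
q = mq_values(state, weights, params)
action = greedy_action(q, weights)
```

Feeding ω_t into the network alongside s_t is what lets a single set of parameters serve many different trade-offs between objectives, rather than training one policy per weight setting.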
Figure 2: Driving roads for specifying driving tasks.
Figure 3: Convergence Trends of TTC and RC achieved by MOEQT and SORLW on Different Roads.
… short, MOEQT achieved an overall stable performance during training and well-balanced the two objectives, indicating that MORL with an adaptive weighting mechanism can better handle these two objectives compared to single-objective RL with fixed equal weights. Analyzing violations of requirements. Recall that our goal is to ge…
original abstract

Autonomous vehicles (AVs) make driving decisions without humans, making dependability assurance critical. Scenario-based testing is widely used to evaluate AVs under diverse conditions, with reinforcement learning (RL) generating test scenarios that identify violations of functional and safety requirements. Many requirements are interdependent and involve trade-offs, making it unclear whether single-objective RL (SORL), which combines objectives into a single reward, can reliably reveal violations or whether multi-objective RL (MORL), which explicitly considers multiple objectives, is necessary. We present an empirical evaluation comparing SORL and MORL for generating critical scenarios that simultaneously test interdependent requirements using an end-to-end AV controller and high-fidelity simulator. Results suggest that MORL and SORL differ mainly in how violations occur, while showing comparable effectiveness in many cases. MORL tends to generate more requirement-violation scenarios, whereas SORL produces higher-severity violations. Their relative performance also depends on specific objective combinations and, to a lesser extent, road conditions. Regarding diversity, MORL consistently covers a broader range of scenarios. Thus, MORL is preferable when scenario diversity and coverage are prioritized, whereas SORL may better expose severe violations. Our empirical evaluation addresses a gap by systematically comparing SORL and MORL, highlighting the importance of requirement dependencies in RL-based AV testing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents an empirical study comparing single-objective reinforcement learning (SORL) and multi-objective reinforcement learning (MORL) for generating test scenarios that expose violations of interdependent requirements in autonomous vehicles. Experiments use an end-to-end AV controller in a high-fidelity simulator; results indicate MORL produces more violation scenarios and greater diversity while SORL yields higher-severity violations, with relative performance depending on objective combinations and road conditions.

Significance. If the simulator and controller faithfully capture real-world violation dynamics and interdependencies, the work supplies concrete guidance on selecting RL methods for AV scenario-based testing and highlights the impact of requirement trade-offs. The systematic comparison of existing RL variants on a realistic controller is a positive contribution to empirical software engineering for safety-critical systems.

major comments (2)
  1. [Abstract / Experimental Setup] Abstract and Experimental Setup section: the central comparative claims (MORL generates more violations, SORL higher severity, MORL broader diversity) rest on the unvalidated assumption that the chosen high-fidelity simulator produces counts, severity metrics, and diversity measures that reflect actual AV requirement trade-offs. No cross-validation against physical tests, alternative simulators, or real sensor data is described; mismatches in dynamics (tire models, perception noise, controller latency) could invert the reported MORL/SORL differences.
  2. [Results] Results section: the abstract reports comparative results on violation count, severity, and diversity, yet the manuscript provides no details on statistical tests, effect sizes, controls for confounding factors (random seeds, hyperparameter sensitivity), or raw data availability. This undermines confidence that observed differences are robust rather than artifacts of the experimental configuration.
minor comments (2)
  1. [Methods] Clarify the precise definitions and weighting schemes used for the multi-objective reward functions and severity metrics; these are referenced but not fully formalized in the provided abstract.
  2. [Figures] Ensure all figures reporting scenario diversity include axis labels, legends, and error bars or confidence intervals consistent with the statistical analysis.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments point-by-point below and will revise the manuscript to incorporate the suggested improvements.

point-by-point responses
  1. Referee: [Abstract / Experimental Setup] Abstract and Experimental Setup section: the central comparative claims (MORL generates more violations, SORL higher severity, MORL broader diversity) rest on the unvalidated assumption that the chosen high-fidelity simulator produces counts, severity metrics, and diversity measures that reflect actual AV requirement trade-offs. No cross-validation against physical tests, alternative simulators, or real sensor data is described; mismatches in dynamics (tire models, perception noise, controller latency) could invert the reported MORL/SORL differences.

    Authors: We agree this is a valid concern regarding external validity. Our work is positioned as a simulation-based empirical study, consistent with standard practice in AV testing literature where physical validation is often infeasible due to safety and cost. In the revision we will add a dedicated 'Threats to Validity' subsection that explicitly discusses simulator fidelity limitations, potential mismatches with real-world dynamics, and the scope of our comparative claims. We will also reference existing validation studies of the simulator where available. This addresses the comment without requiring new experiments. revision: yes

  2. Referee: [Results] Results section: the abstract reports comparative results on violation count, severity, and diversity, yet the manuscript provides no details on statistical tests, effect sizes, controls for confounding factors (random seeds, hyperparameter sensitivity), or raw data availability. This undermines confidence that observed differences are robust rather than artifacts of the experimental configuration.

    Authors: We acknowledge the omission of these methodological details. The revised manuscript will expand the Results section to report: statistical tests performed (including p-values), effect sizes, the use of multiple independent runs (with different random seeds) to control for stochasticity, a brief hyperparameter sensitivity analysis, and a public repository link for raw data and replication scripts. These additions will be included in the next version. revision: yes
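
As a rough sketch of the kind of analysis promised in this response: the rebuttal does not say which tests or effect sizes will be used, so the Vargha-Delaney A12 statistic and the odds ratio below are assumptions chosen for illustration. The function names are hypothetical and the numbers are made up; they are not results from the paper.

```python
from typing import Sequence


def vargha_delaney_a12(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Probability that a value from xs exceeds one from ys (ties count half)."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    wins = sum((x > y) + 0.5 * (x == y) for x in xs for y in ys)
    return wins / (len(xs) * len(ys))


def odds_ratio(hits_a: int, total_a: int, hits_b: int, total_b: int) -> float:
    """Odds of a violation under method A divided by the odds under method B.

    Adds 0.5 to every cell (Haldane-Anscombe correction) so that zero counts
    do not cause a division by zero.
    """
    a, b = hits_a + 0.5, (total_a - hits_a) + 0.5
    c, d = hits_b + 0.5, (total_b - hits_b) + 0.5
    return (a * d) / (b * c)


# Made-up numbers for illustration only; NOT results from the paper.
morl_counts = [12, 15, 11, 14, 13]  # violations per run, one run per seed
sorl_counts = [10, 12, 9, 13, 11]

a12 = vargha_delaney_a12(morl_counts, sorl_counts)
ratio = odds_ratio(hits_a=65, total_a=100, hits_b=55, total_b=100)
```

An A12 value of 0.5 means neither method tends to produce larger values, and an odds ratio of 1 means equal odds of a violation. Reporting both alongside a significance test across seeds would help separate a real difference from run-to-run noise.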

Circularity Check

0 steps flagged

No circularity: purely empirical comparison without derivations or self-referential predictions

full rationale

The paper is an empirical study that runs SORL and MORL algorithms on a high-fidelity simulator to generate and compare test scenarios for interdependent AV requirements. No equations, fitted parameters renamed as predictions, or derivation chains appear in the abstract or described methodology. Central claims rest on observed differences in violation counts, severity, and diversity from experimental runs, which are falsifiable against the simulator outputs and do not reduce to self-definition or self-citation. Self-citations, if present, are not load-bearing for any uniqueness theorem or ansatz. This is a standard non-circular empirical evaluation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The study is purely empirical and rests on domain assumptions about the fidelity of the simulator and the representativeness of the chosen AV controller rather than new mathematical constructs.

axioms (2)
  • domain assumption: The high-fidelity simulator accurately reproduces real-world AV dynamics and the effects of requirement violations.
    Invoked implicitly as the basis for all generated scenarios and severity measurements.
  • standard math: Standard RL training procedures can be configured to optimize for requirement-violation objectives.
    Background assumption required to treat SORL and MORL as valid testing generators.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. From Research to Practice: An Interactive Rapid Review of Autonomous Driving System Testing in Industry

    cs.SE 2026-05 unverdicted novelty 5.0

    Industry practitioners identified 12 ADS testing challenges, prioritized two for end-to-end systems, and found that most of the 17 examined research studies lack direct applicability to real industrial contexts.
