pith. machine review for the scientific record.

arxiv: 2604.24562 · v1 · submitted 2026-04-27 · 💻 cs.AI · cs.CL · cs.CY

Recognition: unknown

Towards Lawful Autonomous Driving: Deriving Scenario-Aware Driving Requirements from Traffic Laws and Regulations

Bowen Jian, Hong Wang, Liqiang Wang, Rongjie Yu, Zihang Zou


Pith reviewed 2026-05-08 03:24 UTC · model grok-4.3

classification 💻 cs.AI · cs.CL · cs.CY
keywords autonomous vehicles · traffic laws · large language models · scenario taxonomy · legal compliance · driving requirements · AV navigation · scenario anchors

The pith

A traffic scenario taxonomy with node-wise anchors grounds LLMs to derive accurate mandatory and prohibitive driving requirements from traffic laws for autonomous vehicles.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Autonomous vehicles need to follow traffic laws in many different situations, yet manually writing formal rules for every case is slow and difficult to update. The paper shows that structuring scenarios into a taxonomy and using node-wise anchors to guide large language models helps the models pick the right legal provisions and turn them into clear requirements. This avoids the common problem where models pull in irrelevant rules or overlook applicable ones. When the approach works, it gives a practical way to build law-compliant behavior into AV systems at scale. The authors demonstrate this by creating a compliance layer for navigation and a real-time onboard monitor.

Core claim

The paper establishes that a pipeline grounding LLM reasoning in a traffic scenario taxonomy through node-wise anchors encoding hierarchical semantics improves law-scenario matching by 29.1 percent and raises accuracy of derived mandatory and prohibitive requirements by 36.9 percent and 38.2 percent respectively when tested on Chinese traffic laws and the OnSite dataset of 5,897 scenarios. It further shows real-world use by building a law-compliance layer for AV navigation and an onboard real-time compliance monitor for in-field testing.

What carries the argument

Traffic scenario taxonomy with node-wise anchors that encode hierarchical semantics to guide LLM reasoning over legal provisions
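One way to picture a node-wise anchor is as a string assembled from every level on the path from the taxonomy root to a scenario node, so that the text handed to the LLM carries the semantics of the node's ancestors. The sketch below is only an illustration under that assumption: the `ScenarioNode` class, the node names, and the `build_prompt` template are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class ScenarioNode:
    """One node of a hypothetical scenario taxonomy (names are illustrative)."""
    name: str
    description: str
    parent: Optional["ScenarioNode"] = field(default=None, repr=False)
    children: List["ScenarioNode"] = field(default_factory=list, repr=False)

    def add_child(self, name: str, description: str) -> "ScenarioNode":
        """Attach a child node and return it so calls can be chained."""
        child = ScenarioNode(name, description, parent=self)
        self.children.append(child)
        return child

    def path(self) -> List["ScenarioNode"]:
        """Return the nodes from the root down to this node."""
        chain: List["ScenarioNode"] = []
        node: Optional["ScenarioNode"] = self
        while node is not None:
            chain.append(node)
            node = node.parent
        chain.reverse()
        return chain

    def anchor(self) -> str:
        """Join every level on the path so the anchor carries ancestor semantics."""
        return " > ".join(f"{n.name} ({n.description})" for n in self.path())


def build_prompt(node: ScenarioNode, provision: str) -> str:
    """Assumed prompt template; the paper's real prompts are not shown on this page."""
    return (
        f"Scenario anchor: {node.anchor()}\n"
        f"Legal provision: {provision}\n"
        "Does this provision apply to the scenario? Answer yes or no, then list "
        "any mandatory and prohibitive requirements it implies."
    )


# Tiny made-up taxonomy: road -> intersection -> left turn.
root = ScenarioNode("road", "public urban road")
junction = root.add_child("intersection", "signalized junction")
left_turn = junction.add_child("left_turn", "ego vehicle turning left")
print(left_turn.anchor())
```

Building the anchor from the full path, rather than from the leaf alone, is the design choice that lets a leaf such as a left turn inherit the context of the intersection it belongs to.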

If this is right

  • A law-compliance layer can be added to existing AV navigation planners
  • An onboard real-time monitor can check compliance during actual driving
  • The approach supplies a scalable base for AV development and regulatory review
  • Mandatory and prohibitive requirements become more reliably tied to specific scenarios
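To make the monitor idea concrete, the following is a minimal sketch of how derived requirements might be checked against a vehicle state at run time. It assumes each requirement can be expressed as a predicate over a dictionary of signals; the `Requirement` class, the signal names, and the two example rules are hypothetical and do not come from the paper or from any specific statute.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]  # hypothetical snapshot of vehicle and scene signals


@dataclass(frozen=True)
class Requirement:
    kind: str                           # "mandatory" or "prohibitive"
    text: str                           # human-readable requirement
    occurring: Callable[[State], bool]  # True when the described behavior is happening


def violations(state: State, requirements: List[Requirement]) -> List[str]:
    """Return the text of every requirement the current state violates."""
    found: List[str] = []
    for req in requirements:
        happening = req.occurring(state)
        if req.kind == "mandatory" and not happening:
            found.append(req.text)   # required behavior is absent
        elif req.kind == "prohibitive" and happening:
            found.append(req.text)   # forbidden behavior is present
    return found


# Illustrative rules only; they are not quoted from any traffic law.
REQUIREMENTS = [
    Requirement("mandatory", "stop while the signal is red",
                lambda s: s["signal_red"] == 0.0 or s["speed_mps"] == 0.0),
    Requirement("prohibitive", "exceed the posted speed limit",
                lambda s: s["speed_mps"] > s["limit_mps"]),
]

state = {"speed_mps": 14.0, "limit_mps": 11.0, "signal_red": 1.0}
print(violations(state, REQUIREMENTS))
```

Treating mandatory and prohibitive requirements symmetrically (one is violated by absence, the other by presence) keeps the check to a single loop over whatever requirements apply to the current scenario.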

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same taxonomy structure could be adapted to traffic regulations in other countries to test broader use
  • Linking the derived requirements directly to real-time perception data would let AVs check compliance while driving
  • Expanding the taxonomy to include rare edge scenarios would reveal whether the accuracy gains persist

Load-bearing premise

The traffic scenario taxonomy and node-wise anchors capture the full range of real-world driving variability while keeping LLM outputs accurate and complete when applied outside the tested Chinese laws and OnSite dataset.

What would settle it

Running the same pipeline on traffic laws from a different jurisdiction or on a new dataset with substantially different scenarios and finding no gain in matching rate or requirement accuracy over plain LLM prompting would show the central claim does not hold.
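The comparison described above could be scored along the following lines. This is a sketch under assumptions: the page does not say how the paper defines a law-scenario match or whether its reported gains are absolute or relative, so the exact-set-match rule, the function names, and the toy data here are all hypothetical.

```python
from typing import Dict, Set, Tuple


def matching_rate(predicted: Dict[str, Set[str]], gold: Dict[str, Set[str]]) -> float:
    """Fraction of scenarios whose predicted provision set equals the gold set."""
    if not gold:
        raise ValueError("gold annotations must not be empty")
    hits = sum(
        1 for scenario_id, provisions in gold.items()
        if predicted.get(scenario_id, set()) == provisions
    )
    return hits / len(gold)


def gains(baseline_rate: float, grounded_rate: float) -> Tuple[float, float]:
    """Return (absolute gain in rate, relative gain over the baseline)."""
    absolute = grounded_rate - baseline_rate
    relative = absolute / baseline_rate if baseline_rate > 0 else float("inf")
    return absolute, relative


# Toy annotations for four scenarios; provision labels are placeholders.
gold = {"s1": {"A"}, "s2": {"B"}, "s3": {"A", "C"}, "s4": {"D"}}
plain = {"s1": {"A"}, "s2": {"C"}, "s3": {"A"}, "s4": {"D"}}
grounded = {"s1": {"A"}, "s2": {"B"}, "s3": {"A"}, "s4": {"D"}}

plain_rate = matching_rate(plain, gold)
grounded_rate = matching_rate(grounded, gold)
print(plain_rate, grounded_rate, gains(plain_rate, grounded_rate))
```

Reporting both the absolute and the relative gain matters because the same pair of rates gives very different headline numbers depending on which one is quoted.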

Original abstract

Driving in compliance with traffic laws and regulations is a basic requirement for human drivers, yet autonomous vehicles (AVs) can violate these requirements in diverse real-world scenarios. To encode law compliance into AV systems, conventional approaches use formal logic languages to explicitly specify behavioral constraints, but this process is labor-intensive, hard to scale, and costly to maintain. With recent advances in artificial intelligence, it is promising to leverage large language models (LLMs) to derive legal requirements from traffic laws and regulations. However, without explicitly grounding and reasoning in structured traffic scenarios, LLMs often retrieve irrelevant provisions or miss applicable ones, yielding imprecise requirements. To address this, we propose a novel pipeline that grounds LLM reasoning in a traffic scenario taxonomy through node-wise anchors that encode hierarchical semantics. On Chinese traffic laws and OnSite dataset (5,897 scenarios), our method improves law-scenario matching by 29.1% and increases the accuracy of derived mandatory and prohibitive requirements by 36.9% and 38.2%, respectively. We further demonstrate real-world applicability by constructing a law-compliance layer for AV navigation and developing an onboard, real-time compliance monitor for in-field testing, providing a solid foundation for future AV development, deployment, and regulatory oversight.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes a pipeline to derive scenario-aware driving requirements from traffic laws by grounding LLM reasoning in a hierarchical traffic scenario taxonomy using node-wise anchors. On Chinese traffic laws and the OnSite dataset of 5,897 scenarios, it reports a 29.1% improvement in law-scenario matching along with 36.9% and 38.2% gains in accuracy for mandatory and prohibitive requirements; it further constructs a law-compliance layer for AV navigation and an onboard real-time monitor demonstrated via in-field testing.

Significance. If the grounding mechanism proves robust, the approach could offer a more scalable alternative to manual formal-logic encoding of traffic rules for autonomous vehicles, with direct implications for compliance monitoring and regulatory integration. The inclusion of a deployed real-time monitor provides a concrete path from derivation to operational use.

major comments (2)
  1. [Experiments] Experiments section: the reported gains (29.1% matching, 36.9%/38.2% accuracy) are presented without specifying the baseline methods, the precise definition or measurement protocol for 'accuracy' of derived requirements, error bars, or statistical significance tests, rendering the central empirical claims difficult to verify or reproduce.
  2. [Method] Method section: the scenario taxonomy and node-wise anchors are presented as the key grounding innovation, yet no ablation studies, derivation details independent of the test laws, or cross-jurisdiction validation are provided; this makes it impossible to determine whether the observed improvements stem from the proposed structure or from dataset-specific tuning.
minor comments (1)
  1. [Abstract] The abstract would benefit from a brief sentence outlining the main pipeline stages before stating the quantitative results.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. We address each major comment point by point below, indicating where revisions will be made to improve clarity, reproducibility, and rigor.

point-by-point responses
  1. Referee: [Experiments] Experiments section: the reported gains (29.1% matching, 36.9%/38.2% accuracy) are presented without specifying the baseline methods, the precise definition or measurement protocol for 'accuracy' of derived requirements, error bars, or statistical significance tests, rendering the central empirical claims difficult to verify or reproduce.

    Authors: We agree that the experimental reporting requires additional detail for full reproducibility and verifiability. In the revised manuscript, we will specify the baseline methods (direct LLM prompting without grounding and a keyword-based retrieval baseline), define accuracy as the fraction of derived requirements that match expert-verified ground truth (where experts assess whether mandatory and prohibitive requirements correctly capture applicable legal obligations for each scenario), report standard deviations across multiple runs with varied random seeds, and include statistical significance testing (paired t-tests) for the observed improvements. revision: yes

  2. Referee: [Method] Method section: the scenario taxonomy and node-wise anchors are presented as the key grounding innovation, yet no ablation studies, derivation details independent of the test laws, or cross-jurisdiction validation are provided; this makes it impossible to determine whether the observed improvements stem from the proposed structure or from dataset-specific tuning.

    Authors: The scenario taxonomy is derived from standard hierarchical traffic engineering classifications (e.g., based on road type, maneuver, and environmental factors) and is described in Section 3.1 as independent of the specific test laws. Node-wise anchors encode level-specific semantics to ground LLM reasoning. We will add ablation studies in the revision, comparing the full pipeline against variants without the hierarchy and without node-wise anchors. We will also clarify the taxonomy derivation process. However, cross-jurisdiction validation is not feasible in the current work due to the absence of equivalent annotated datasets. revision: partial
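The reporting promised in the first response (accuracy against expert-verified ground truth, spread across seeds, and a paired test) could look roughly like the sketch below. It uses only the Python standard library and computes the paired t statistic by hand; the function names and the per-seed numbers are made up for illustration and are not results from the paper. A library routine such as `scipy.stats.ttest_rel` would also return the p-value.

```python
import math
import statistics
from typing import Sequence


def accuracy(expert_judgments: Sequence[bool]) -> float:
    """Fraction of derived requirements that experts marked as correct."""
    if not expert_judgments:
        raise ValueError("need at least one judged requirement")
    return sum(expert_judgments) / len(expert_judgments)


def paired_t_statistic(grounded: Sequence[float], baseline: Sequence[float]) -> float:
    """Paired t statistic: mean of per-run differences over their standard error."""
    if len(grounded) != len(baseline) or len(grounded) < 2:
        raise ValueError("need two equally long samples with at least two runs")
    diffs = [g - b for g, b in zip(grounded, baseline)]
    spread = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if spread == 0.0:
        raise ValueError("differences are constant; t statistic is undefined")
    return statistics.mean(diffs) / (spread / math.sqrt(len(diffs)))


# Made-up accuracies for three seeds, used only to exercise the functions.
grounded_runs = [0.80, 0.82, 0.81]
baseline_runs = [0.60, 0.61, 0.62]
print(statistics.mean(grounded_runs), statistics.stdev(grounded_runs))
print(paired_t_statistic(grounded_runs, baseline_runs))
```

Pairing the runs by seed is what makes the test paired: each difference compares the two methods under the same random conditions, so seed-to-seed noise cancels out of the comparison.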

standing simulated objections not resolved
  • Cross-jurisdiction validation, as no equivalent annotated scenario datasets from other legal systems are currently available to the authors.

Circularity Check

0 steps flagged

No circularity: empirical gains are direct comparisons of a proposed pipeline against baselines

full rationale

The paper introduces a novel grounding pipeline (scenario taxonomy + node-wise anchors) for LLM-based requirement derivation and evaluates it via direct accuracy and matching metrics on the OnSite dataset against baselines. No equations, fitted parameters renamed as predictions, self-citations that bear the central claim, or ansatzes imported from prior author work appear in the derivation. The reported 29.1%/36.9%/38.2% improvements are presented as empirical outcomes of the new method, not reductions to the inputs by construction. The derivation chain is therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the untested assumption that the chosen taxonomy plus anchors sufficiently represent real traffic variability and that LLMs produce faithful requirements once grounded; no explicit free parameters or new physical entities are introduced.

axioms (1)
  • domain assumption: LLMs can retrieve and apply relevant legal provisions accurately when provided with structured scenario anchors encoding hierarchical semantics
    Invoked to justify the pipeline's improvement over ungrounded LLM use.
invented entities (1)
  • node-wise anchors (no independent evidence)
    purpose: Encode hierarchical semantics of traffic scenarios to guide LLM reasoning toward applicable laws
    New construct introduced in the proposed pipeline to address irrelevant or missed provisions.

pith-pipeline@v0.9.0 · 5535 in / 1320 out tokens · 32344 ms · 2026-05-08T03:24:00.912899+00:00 · methodology

