NuRisk: A Visual Question Answering Dataset for Agent-Level Risk Assessment in Autonomous Driving

2025 · cs.AI · arXiv 2509.25944

2 Pith papers cite this work. Polarity classification is still indexing.


abstract

Understanding risk in autonomous driving requires not only perception and prediction, but also high-level reasoning about agent behavior and context. Current Vision Language Model (VLM)-based methods primarily ground agents in static images and provide qualitative judgments, lacking the spatio-temporal reasoning needed to capture how risks evolve over time. To address this gap, we propose NuRisk, a comprehensive Visual Question Answering (VQA) dataset comprising 2.9K scenarios and 1.1M agent-level samples, built on real-world data from nuScenes and Waymo and supplemented with safety-critical scenarios from the CommonRoad simulator. The dataset provides bird's-eye-view (BEV) sequential images with quantitative, agent-level risk annotations, enabling spatio-temporal reasoning. We benchmark well-known VLMs across different prompting techniques and find that they fail to perform explicit spatio-temporal reasoning, resulting in a peak accuracy of 33% at high latency. To address these shortcomings, our fine-tuned 7B VLM agent improves accuracy to 41% and reduces latency by 75%, demonstrating explicit spatio-temporal reasoning capabilities that proprietary models lacked. While this represents a significant step forward, the modest accuracy underscores the profound challenge of the task, establishing NuRisk as a critical benchmark for advancing spatio-temporal reasoning in autonomous driving. More information can be found at https://github.com/TUM-AVS/NuRisk.

representative citing papers

V2X-QA: A Comprehensive Reasoning Dataset and Benchmark for Multimodal Large Language Models in Autonomous Driving Across Ego, Infrastructure, and Cooperative Views

cs.RO · 2026-04-03 · conditional · novelty 8.0

V2X-QA provides a view-decoupled benchmark showing infrastructure views aid macroscopic traffic understanding while cooperative reasoning requires explicit cross-view alignment, with V2X-MoE as a routing-based baseline that improves performance.

Can VLMs Unlock Semantic Anomaly Detection? A Framework for Structured Reasoning

cs.CV · 2025-10-20 · conditional · novelty 6.0 · 2 refs

SAVANT reformulates semantic anomaly detection as layered consistency verification, raising VLM recall by 18.5% on real driving images and enabling a fine-tuned 7B open model to reach 90.8% recall and 93.8% accuracy.
