DDD17: End-To-End DAVIS Driving Dataset
Event cameras such as dynamic vision sensors (DVS) and dynamic and active-pixel vision sensors (DAVIS) can supplement other autonomous driving sensors by providing a concurrent stream of standard active pixel sensor (APS) images and DVS temporal contrast events. The APS stream is a sequence of standard grayscale global-shutter image sensor frames. The DVS events represent brightness changes occurring at a particular moment, with a jitter of about a millisecond under most lighting conditions. The DVS output has a dynamic range of >120 dB and an effective frame rate of >1 kHz at data rates comparable to 30 fps (frames/second) image sensors. To overcome some of the limitations of current image acquisition technology, we investigate in this work the use of the combined DVS and APS streams in end-to-end driving applications. The dataset DDD17 accompanying this paper is the first open dataset of annotated DAVIS driving recordings. DDD17 has over 12 h of recordings from a 346x260-pixel DAVIS sensor, covering highway and city driving in daytime, evening, and night conditions and in dry and wet weather, along with vehicle speed, GPS position, driver steering, throttle, and brake captured from the car's on-board diagnostics interface. As an example application, we performed a preliminary end-to-end learning study in which a convolutional neural network is trained to predict the instantaneous steering angle from DVS and APS visual data.
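To make the idea of feeding DVS events to a convolutional network more concrete, the following is a minimal sketch of one common way to turn an event stream into a fixed-size frame: summing event polarities per pixel over a time window. It is not taken from the paper. The event layout `(x, y, timestamp_us, polarity)`, the function name `accumulate_events`, and the 50 ms window are assumptions made for illustration. Only the 346x260 resolution comes from the abstract.

```python
from typing import Iterable, List, Tuple

# Sensor resolution stated in the abstract (width x height).
WIDTH, HEIGHT = 346, 260

# Hypothetical event layout, assumed for illustration only:
# (x, y, timestamp_us, polarity), with polarity +1 for a brightness
# increase and -1 for a brightness decrease.
Event = Tuple[int, int, int, int]


def accumulate_events(
    events: Iterable[Event], t_start_us: int, t_end_us: int
) -> List[List[int]]:
    """Sum event polarities per pixel over the window [t_start_us, t_end_us)."""
    frame = [[0] * WIDTH for _ in range(HEIGHT)]
    for x, y, t_us, polarity in events:
        if not (t_start_us <= t_us < t_end_us):
            continue  # event lies outside the time window
        if 0 <= x < WIDTH and 0 <= y < HEIGHT:
            frame[y][x] += polarity  # rows are indexed by y, columns by x
    return frame


if __name__ == "__main__":
    demo_events = [
        (10, 20, 1_000, +1),
        (10, 20, 1_500, +1),
        (11, 20, 2_000, -1),
        (10, 20, 60_000, +1),  # falls outside the 50 ms window used below
    ]
    frame = accumulate_events(demo_events, 0, 50_000)
    print(frame[20][10], frame[20][11])  # prints "2 -1"
```

A frame built this way has the same shape as an APS image, so both could in principle be passed to the same kind of network. The window length trades temporal resolution against the number of events per frame.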
Forward citations
Cited by 4 Pith papers
- NERVE: A Neuromorphic Vision and Radar Ensemble for Multi-Sensor Fusion Research
  NERVE is a new 600GB multi-sensor dataset with DVS, RGB-D, and 24/77GHz radar plus baselines showing DVS+77GHz radar fusion improves human detection to 47.5% mAP with sub-1.8m distance error.
- RE-VLM: Event-Augmented Vision-Language Model for Scene Understanding
  RE-VLM is the first dual-stream VLM combining RGB and event data with a graph-based pipeline to generate training captions and QA pairs, showing gains over RGB-only and event-only models on new datasets for challengin...
- Generative Event Pretraining with Foundation Model Alignment
  GEP transfers semantic knowledge from image foundation models to event data via alignment and generative pretraining on mixed sequences to create transferable event-based visual models.
- RE-VLM: Event-Augmented Vision-Language Model for Scene Understanding
  RE-VLM fuses RGB and event data in a dual-stream VLM with a graph-based pipeline for generating training captions and QA pairs, plus two new datasets, showing gains over RGB-only and event-only baselines especially in...