arxiv: 1610.01983 · v2 · pith:G3UGYHMRnew · submitted 2016-10-06 · 💻 cs.CV · cs.RO

Driving in the Matrix: Can Virtual Worlds Replace Human-Generated Annotations for Real World Tasks?

classification 💻 cs.CV cs.RO
keywords data, learning, algorithms, annotated, deep, real, training, annotations

Deep learning has rapidly transformed the state-of-the-art algorithms used to address a variety of problems in computer vision and robotics. These breakthroughs have relied upon massive amounts of human-annotated training data. This time-consuming annotation process has begun to impede the progress of these deep learning efforts. This paper describes a method that uses photo-realistic computer images from a simulation engine to rapidly generate annotated data for training machine learning algorithms. We demonstrate that a state-of-the-art architecture trained only on these synthetic annotations performs better than the identical architecture trained on human-annotated real-world data when tested on the KITTI data set for vehicle detection. By training machine learning algorithms on a rich virtual world, real objects in real scenes can be learned and classified using synthetic data. This approach offers the possibility of accelerating deep learning's application to sensor-based classification problems like those that appear in self-driving cars. The source code and data to train and validate the networks described in this paper are made available for researchers.



Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Real-Time Source-Free Object Detection

    cs.CV 2026-06 unverdicted novelty 6.0

    RT-SFOD adapts dual-head detectors like YOLOv10 for source-free object detection via DHF pseudo-label fusion and MARD loss, delivering 1.4-3.5% mAP gains with 1.3x higher throughput and ~2x fewer parameters than prior...

  2. MULTI: Disentangling Camera Lens, Sensor, View, and Domain for Novel Image Generation

    cs.CV 2026-05 unverdicted novelty 6.0

    MULTI uses two-stage textual inversion to disentangle camera lens, sensor, view, and domain factors for novel image generation, supporting dataset extension and ControlNet modifications on the new DF-RICO benchmark.

  3. The Power of Light: Improving Synthetic-to-Real Domain Adaptation through Physically-Based Indirect Illumination

    cs.CV 2026-06 unverdicted novelty 5.0

    Empirical study shows complex indirect lighting and background variability in synthetic data improve YOLOv12 object detection transfer to real industrial scenes over direct lighting baselines.

  4. Training-Free Metrics for Synthetic Object Detection Data: A Proxy for Detector Performance

    cs.CV 2026-06 unverdicted novelty 5.0

    CCDM metrics achieve perfect Spearman correlation of 1.0 with YOLOv8 mAP on VisDrone-DET synthetic sets, outperforming prior synthetic-image metrics.

  5. A Regularized Convolutional Neural Network for Semantic Image Segmentation

    cs.CV 2019-06 unverdicted novelty 3.0

    Integrating total variation regularization into U-Net and SegNet yields segmentation results with improved spatial regularity and noise robustness on WBC, CamVid, and SUN-RGBD datasets.