PairedGTA: Generating Driving Datasets for Controlled Photometric Shift Analysis

Alessandro Biondi; Andrea Chianese; Giorgio Buttazzo; Giulio Rossolini; Marco Cococcioni

arXiv: 2606.01192 · v1 · pith:ROQSU2WE · submitted 2026-05-31 · 💻 cs.CV


Pith reviewed 2026-06-28 17:21 UTC · model grok-4.3

classification 💻 cs.CV

Keywords: paired datasets, photometric shifts, semantic segmentation, game engine, autonomous driving, weather conditions, illumination changes, synthetic data


The pith

A GTA-based framework produces perfectly paired driving images that differ only in weather and lighting to isolate photometric effects on perception models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a generation method that uses the GTA game engine to render multiple versions of the exact same driving scene, with identical geometry, camera position, and object placements but altered illumination or weather. Real datasets rarely supply such pairs because traffic and viewpoint change between captures, so model errors cannot be cleanly traced to photometric factors alone. The new approach samples locations, places dynamic objects procedurally, and renders pixel-aligned outputs under controlled adverse conditions. This setup lets researchers measure how semantic segmentation models degrade when only lighting or weather changes, rather than when geometry or semantics also shift.

Core claim

By leveraging software APIs that communicate with the GTA game engine, the framework modifies illumination and weather conditions while preserving scene geometry, camera pose, and the identity and placement of dynamic objects. For each sampled location, it procedurally instantiates dynamic entities and renders pixel-aligned images under diverse adverse conditions. The benefit of the proposed generation framework in driving scenarios is demonstrated through a systematic analysis of semantic segmentation models, whose output degradation can be attributed more directly to photometric shifts rather than to uncontrolled semantic or geometric factors.
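The per-location procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. `render` stands in for the engine call, and `Entity`, `Scene`, `instantiate_scene`, and `generate_pairs` are hypothetical names. The key design point is that the dynamic-object layout is drawn once per location from a seeded generator and then reused for every condition, so only the photometric setting changes between renders.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple

Pose = Tuple[float, float, float]  # hypothetical (x, y, yaw) camera pose


@dataclass(frozen=True)
class Entity:
    kind: str                      # "vehicle" or "pedestrian" (illustrative classes)
    position: Tuple[float, float]  # offset from the camera in metres (assumed units)
    heading: float                 # yaw in degrees


@dataclass(frozen=True)
class Scene:
    location_id: int
    camera_pose: Pose
    entities: Tuple[Entity, ...]


def instantiate_scene(location_id: int, camera_pose: Pose,
                      n_entities: int, seed: int) -> Scene:
    """Place dynamic entities once per location; the seed makes the layout reproducible."""
    rng = random.Random(seed * 1_000_003 + location_id)
    entities = tuple(
        Entity(kind=rng.choice(["vehicle", "pedestrian"]),
               position=(rng.uniform(-10.0, 10.0), rng.uniform(5.0, 60.0)),
               heading=rng.uniform(0.0, 360.0))
        for _ in range(n_entities)
    )
    return Scene(location_id, camera_pose, entities)


def generate_pairs(locations: Iterable[Tuple[int, Pose]],
                   conditions: Iterable[str],
                   render: Callable[[Scene, str], object],
                   n_entities: int = 5, seed: int = 0) -> Dict[int, Dict[str, object]]:
    """Render every condition from one frozen Scene, so only photometry changes."""
    condition_list = list(conditions)
    dataset: Dict[int, Dict[str, object]] = {}
    for location_id, pose in locations:
        scene = instantiate_scene(location_id, pose, n_entities, seed)  # geometry fixed here
        dataset[location_id] = {c: render(scene, c) for c in condition_list}
    return dataset
```

With a stub renderer such as `lambda scene, cond: (scene, cond)`, every entry for a given location carries the identical `Scene` object, which is the pairing invariant the framework relies on.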

What carries the argument

Procedural instantiation and rendering inside the GTA engine that changes only photometric parameters while fixing all other scene elements.

If this is right

Semantic segmentation output changes can be attributed more directly to photometric shifts.
Systematic evaluation of perception models becomes possible across many adverse conditions without confounding geometric or semantic variation.
Pixel-aligned image sets allow controlled measurement of model robustness to illumination and weather alone.
The generated data supports analysis that separates photometric from layout-related sources of error in autonomous driving perception.
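Because the renders are pixel-aligned, a single reference label map per location can score every condition. Any mIoU change is then tied to the photometric setting rather than to a different layout. The following is a minimal NumPy sketch. It assumes integer label maps, predictions already in the range `[0, n_classes)`, and an ignore label of 255, which is a common convention but is not stated in the source. Degradation is taken here as clean mIoU minus adverse mIoU.

```python
import numpy as np
from typing import Dict


def per_class_iou(pred: np.ndarray, ref: np.ndarray, n_classes: int,
                  ignore_index: int = 255) -> np.ndarray:
    """IoU per class from a confusion matrix; NaN for classes absent from both maps."""
    valid = ref != ignore_index
    # Row = reference class, column = predicted class (pred assumed in [0, n_classes)).
    idx = n_classes * ref[valid].astype(np.int64) + pred[valid].astype(np.int64)
    cm = np.bincount(idx, minlength=n_classes ** 2).reshape(n_classes, n_classes)
    tp = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp  # predicted + reference - overlap
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, tp / union, np.nan)


def mean_iou(pred: np.ndarray, ref: np.ndarray, n_classes: int,
             ignore_index: int = 255) -> float:
    return float(np.nanmean(per_class_iou(pred, ref, n_classes, ignore_index)))


def degradation(preds: Dict[str, np.ndarray], ref: np.ndarray, clean: str,
                n_classes: int) -> Dict[str, float]:
    """mIoU drop of each condition relative to the clean one, scored on the same pixels."""
    base = mean_iou(preds[clean], ref, n_classes)
    return {cond: base - mean_iou(p, ref, n_classes) for cond, p in preds.items()}
```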

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same paired-generation technique could be applied to other perception tasks such as object detection or optical flow to check whether photometric sensitivity is task-dependent.
If the synthetic shifts prove close enough to real ones, the datasets could serve as a cheap way to augment scarce real paired data for robustness training.
The framework implicitly suggests that game-engine control over rendering parameters offers a route to test camera-invariant features without collecting new physical footage.

Load-bearing premise

The assumption that photometric shifts produced inside the GTA engine affect perception models in ways that represent real-world camera behavior.

What would settle it

A side-by-side test in which the same segmentation model is run on both PairedGTA pairs and real-world driving pairs captured under matching condition changes; if the degradation patterns diverge sharply, the claim that the synthetic pairs isolate representative photometric effects would be weakened.
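One way to make "diverge sharply" concrete is to compare the relative IoU drop a model suffers on the synthetic pairs with the drop it suffers on real pairs, class by class. The sketch below is hypothetical. The 0.2 threshold and the per-class `(clean IoU, adverse IoU)` inputs are illustrative assumptions, not values from the paper.

```python
from typing import Dict, Tuple

IoUPair = Tuple[float, float]  # (clean IoU, adverse IoU) for one class


def relative_drop(clean_iou: float, adverse_iou: float) -> float:
    """Fraction of the clean IoU lost under the adverse condition."""
    return (clean_iou - adverse_iou) / clean_iou


def divergent_classes(synthetic: Dict[str, IoUPair], real: Dict[str, IoUPair],
                      threshold: float = 0.2) -> Dict[str, float]:
    """Classes whose relative drop differs between the two sources by more than `threshold`."""
    gaps: Dict[str, float] = {}
    for cls in sorted(set(synthetic) & set(real)):
        if synthetic[cls][0] <= 0 or real[cls][0] <= 0:
            continue  # relative drop is undefined when the clean IoU is zero
        gap = abs(relative_drop(*synthetic[cls]) - relative_drop(*real[cls]))
        if gap > threshold:
            gaps[cls] = gap
    return gaps
```

Only classes present in both sources are compared. A class that loses half its IoU in PairedGTA but only a tenth in real data would be flagged.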

Figures

Figures reproduced from arXiv: 2606.01192 by Alessandro Biondi, Andrea Chianese, Giorgio Buttazzo, Giulio Rossolini, Marco Cococcioni.

**Figure 2.** Example of a daytime sunny image and a corresponding sunset sunny variant generated by the framework, highlighting object poses and placement constraints. …set of photometric conditions considered in the dataset. Each condition c_i specifies environmental rendering parameters, such as time of day, illumination, rain, fog, or cloud coverage. For each scene k, the framework first constructs an internal scene d…
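The condition set can be represented as the Cartesian product of illumination and weather settings. This sketch rests on several assumptions. The day, sunset, and night labels come from Figure 5, but the clock times are hypothetical. The weather strings are GTA V weather-type names chosen for illustration, not the paper's confirmed set. A 3 × 3 grid gives nine conditions, matching the nine aligned images per location mentioned in the Figure 4 excerpt, but the source does not confirm this factorization.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Condition:
    time_of_day: str
    clock: Tuple[int, int]  # (hour, minute) passed to the engine; values are hypothetical
    weather: str            # GTA V weather-type name; this selection is an assumption


TIMES: Dict[str, Tuple[int, int]] = {"day": (12, 0), "sunset": (19, 30), "night": (23, 0)}
WEATHERS: Sequence[str] = ("EXTRASUNNY", "RAIN", "FOGGY")


def condition_grid(times: Dict[str, Tuple[int, int]] = TIMES,
                   weathers: Sequence[str] = WEATHERS) -> List[Condition]:
    """Every (illumination, weather) combination, each one a distinct condition c_i."""
    return [Condition(name, clock, w) for (name, clock), w in product(times.items(), weathers)]
```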

**Figure 3.** Communication pipeline between the proposed framework (blue block), other third-party software tools, and the GTA game engine. The interaction with the game engine is mediated by VPilot [37], which provides a Python interface for client-side communication and scenario orchestration. VPilot communicates with the DeepGTAV server, exposed by the game plugin through a TCP connection on port 8000 by default. T…
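The client-server split in Figure 3 can be illustrated with a small message codec. The framing used here is a 4-byte little-endian length prefix followed by UTF-8 JSON. That framing and the payload field names in `start_request` are assumptions for illustration. They are not taken from the paper and should not be read as the documented VPilot/DeepGTAV wire format. Only the default port, 8000, comes from the caption.

```python
import json
import socket
import struct
from typing import Any, Dict, Sequence

DEFAULT_HOST = "localhost"
DEFAULT_PORT = 8000  # DeepGTAV default port, as stated in the Figure 3 caption


def encode_message(payload: Dict[str, Any]) -> bytes:
    """Serialize to UTF-8 JSON behind a 4-byte little-endian length prefix (assumed framing)."""
    body = json.dumps(payload).encode("utf-8")
    return struct.pack("<I", len(body)) + body


def decode_message(frame: bytes) -> Dict[str, Any]:
    """Inverse of encode_message; rejects frames shorter than their declared length."""
    (length,) = struct.unpack_from("<I", frame, 0)
    body = frame[4:4 + length]
    if len(body) != length:
        raise ValueError("truncated frame")
    return json.loads(body.decode("utf-8"))


def start_request(location: Sequence[float], weather: str,
                  clock: Sequence[int]) -> Dict[str, Any]:
    """Hypothetical 'start scenario' payload; field names are illustrative only."""
    return {"start": {"scenario": {"location": list(location),
                                   "weather": weather,
                                   "time": list(clock)}}}


def send_request(payload: Dict[str, Any], host: str = DEFAULT_HOST,
                 port: int = DEFAULT_PORT) -> None:
    """Open a TCP connection to the server and send one framed message."""
    with socket.create_connection((host, port), timeout=5.0) as sock:
        sock.sendall(encode_message(payload))
```

Calling `send_request` requires a server listening on the target host and port. The codec functions can be exercised without one.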

**Figure 4.** Dataset-level distribution of pseudo-labels generated with SegFormer-B5. The bottom panels show an example of a reference image and the segmentation map. Dataset. The evaluation is conducted on a dataset generated using the pipeline described in Section 3. The dataset contains more than 100 unique spatial locations. For each location k, we generate nine spatially aligned images corresponding to all combi…
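A dataset-level class distribution like the one in Figure 4 reduces to counting label pixels across all maps. The following is a minimal NumPy sketch. It assumes integer label maps, an ignore label of 255, and hypothetical class names. The source does not say whether the counts cover reference images only or every condition.

```python
import numpy as np
from typing import Dict, Iterable, Sequence


def class_distribution(label_maps: Iterable[np.ndarray], class_names: Sequence[str],
                       ignore_index: int = 255) -> Dict[str, float]:
    """Fraction of non-ignored pixels assigned to each class over all label maps."""
    n = len(class_names)
    counts = np.zeros(n, dtype=np.int64)
    for labels in label_maps:
        valid = labels[labels != ignore_index]  # boolean indexing flattens to 1-D
        counts += np.bincount(valid, minlength=n)[:n]
    total = counts.sum()
    return {name: (float(counts[i]) / total if total else 0.0)
            for i, name in enumerate(class_names)}
```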

**Figure 5.** Illustration of photometric shifts across three sunny scenarios under different illumination conditions: day, sunset, and night.

**Figure 6.** Consistency analysis on our dataset and ACDC [30], using SegFormer (left) and MaskFormer (right). Frames are ranked according to the number of high-confidence predictions in clean scenarios for each class. Higher ρ_valid indicates stronger preservation of the clean-to-adverse ranking. 4.4 Cost analysis of the generation steps. This section reports a computational cost analysis for generating paired images wi…
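The ranking consistency behind ρ_valid can be sketched as a Spearman rank correlation between per-frame high-confidence counts in the clean and adverse versions of each frame. Several details are assumptions and may differ from the paper's exact definition: the confidence threshold τ = 0.9, the reading of "valid" as frames whose clean count is non-zero, and the helper names.

```python
import numpy as np
from typing import Sequence


def high_conf_count(probs: np.ndarray, cls: int, tau: float = 0.9) -> int:
    """probs: (C, H, W) softmax scores; pixels predicted as `cls` with confidence >= tau."""
    pred = probs.argmax(axis=0)
    conf = probs.max(axis=0)
    return int(np.count_nonzero((pred == cls) & (conf >= tau)))


def average_ranks(x: np.ndarray) -> np.ndarray:
    """1-based ranks, with tied values sharing their mean rank."""
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tie = x == value
        ranks[tie] = ranks[tie].mean()
    return ranks


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of the rank vectors, i.e. Spearman's rho."""
    ra = average_ranks(np.asarray(a, dtype=float))
    rb = average_ranks(np.asarray(b, dtype=float))
    return float(np.corrcoef(ra, rb)[0, 1])


def rho_valid(clean_counts: Sequence[int], adverse_counts: Sequence[int]) -> float:
    """Rank consistency over frames where the class appears confidently in the clean image."""
    clean = np.asarray(clean_counts, dtype=float)
    adverse = np.asarray(adverse_counts, dtype=float)
    valid = clean > 0
    return spearman(clean[valid], adverse[valid])
```

A value near 1 means frames that were "easy" for a class in clean conditions stay comparatively easy under the adverse condition, even if absolute counts drop.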

**Figure 7.** Low-consistency examples for the person class from our dataset (left) and ACDC (right), with segmentation outputs produced by SegFormer-B5.

[Figure 8 panel values: pipeline 29.8 s, external overhead 4.0 s, total 33.8 s; average pipeline time by phase: Phase 1 0.0 s (0.0%), Phase 2 6.1 s (20.5%), Phase 3 23.7 s (79.5%).]

**Figure 8.** Computational cost analysis of the proposed framework. Left: execution time over 100 scenarios for the pipeline, external overhead, and total process. Right: average pipeline-time decomposition across phases. This enables the effect of photometric changes to be studied in isolation, reducing the confounding factors that typically affect real-world adverse-condition datasets. We used the generated data to e…
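The timing values read off the Figure 8 panels (Phase 2: 6.1 s, Phase 3: 23.7 s, external overhead: 4.0 s) support a quick estimate of generation cost. The sketch assumes these are per-scenario averages. The caption suggests this but does not state it outright.

```python
from typing import Dict

# Average seconds per phase, as shown in the Figure 8 panels (assumed per scenario).
PHASE_SECONDS: Dict[str, float] = {"phase_1": 0.0, "phase_2": 6.1, "phase_3": 23.7}
OVERHEAD_SECONDS = 4.0  # external overhead outside the pipeline


def per_scenario_seconds(phases: Dict[str, float] = PHASE_SECONDS,
                         overhead: float = OVERHEAD_SECONDS) -> float:
    """Pipeline time (sum of phases) plus external overhead."""
    return sum(phases.values()) + overhead


def phase_shares(phases: Dict[str, float] = PHASE_SECONDS) -> Dict[str, float]:
    """Fraction of the pipeline time spent in each phase."""
    pipeline = sum(phases.values())
    return {name: t / pipeline for name, t in phases.items()}


def estimated_hours(n_scenarios: int, phases: Dict[str, float] = PHASE_SECONDS,
                    overhead: float = OVERHEAD_SECONDS) -> float:
    """Wall-clock hours for n_scenarios if scenarios are generated sequentially."""
    return n_scenarios * per_scenario_seconds(phases, overhead) / 3600.0
```

Under that assumption, 100 scenarios take about 3,380 s, just under an hour, and Phase 3 accounts for roughly 80% of the pipeline time.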

Original abstract

Evaluating the performance of visual perception systems for autonomous driving is essential to ensure reliable operation across diverse environmental scenarios. Ideally, a balanced and fair analysis across different adverse conditions would require perfectly paired images of the same scene under different weather or illumination changes. This would allow evaluating the effect of photometric shifts independently of geometry and semantic changes. Unfortunately, real-world datasets rarely provide images of the same scene under different environmental conditions, because, normally, camera pose, traffic, and locations of dynamic objects (vehicles, pedestrians, etc.) vary over time, thus yielding only coarsely paired data. To address this challenge, this work introduces a data generation framework based on a high-fidelity game engine for extracting perfectly paired images. By leveraging software APIs that communicate with the GTA game engine, the framework modifies illumination and weather conditions while preserving scene geometry, camera pose, and the identity and placement of dynamic objects. For each sampled location, it procedurally instantiates dynamic entities and renders pixel-aligned images under diverse adverse conditions. The benefit of the proposed generation framework in driving scenarios is demonstrated through a systematic analysis of semantic segmentation models, whose output degradation can be attributed more directly to photometric shifts rather than to uncontrolled semantic or geometric factors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PairedGTA gives a workable pipeline for rendering the same GTA scene multiple times with only weather and lighting changed while locking down pose, geometry, and dynamic object placements.


The main thing here is a framework that samples locations in GTA, places dynamic objects once via procedural instantiation, then renders pixel-aligned images under varied illumination and weather using the engine APIs.

This approach is new in its explicit focus on preserving the identity and exact placement of vehicles and pedestrians across the paired renders. Earlier GTA-based driving datasets typically allow more drift in those elements between captures.

The paper does a solid job showing how the setup supports segmentation experiments where degradation can be tied more directly to photometric shifts. The description of the API-driven control and the procedural step is clear enough to follow.

The soft spot is external validity. The photometric changes are generated inside the game engine, and the work does not include any direct matching or error analysis against real camera footage under comparable conditions. Readers who need the shifts to stand in for physical reality will have to assess that themselves.

The method stays within one engine, which is fine for the stated goal but limits immediate portability.

This is for people working on robustness testing of perception models for autonomous driving who already use or can work with synthetic data. A reader who wants controlled ablations on weather effects without geometry confounds would get practical value from the pipeline.

It deserves peer review because the central mechanism is technically plausible and the motivation for paired data is sound.

Referee Report

1 major / 0 minor

Summary. The manuscript introduces PairedGTA, a framework that uses GTA game engine APIs to generate perfectly paired driving-scene images under controlled changes in illumination and weather. The method procedurally instantiates dynamic objects at sampled locations and renders multiple pixel-aligned images while holding scene geometry, camera pose, and object identities/placements fixed, enabling analysis of semantic segmentation degradation attributable to photometric shifts rather than geometric or semantic confounders.

Significance. If the generated pairs function as described, the framework supplies a controlled testbed for isolating photometric effects on perception models, addressing a practical limitation of real-world driving datasets that rarely contain exact scene matches across conditions. The procedural instantiation step is a concrete engineering contribution that could support reproducible robustness studies in autonomous driving.

major comments (1)

[Abstract] Abstract: the claim that the framework 'demonstrates' the benefit for semantic segmentation analysis is unsupported by any quantitative results, error metrics, or comparison to real data in the provided description, leaving the utility claim only partially substantiated.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the framework's utility for controlled photometric analysis. We address the single major comment below.


Referee: [Abstract] Abstract: the claim that the framework 'demonstrates' the benefit for semantic segmentation analysis is unsupported by any quantitative results, error metrics, or comparison to real data in the provided description, leaving the utility claim only partially substantiated.

Authors: We agree that the abstract, as a concise summary, does not itself contain quantitative metrics or direct comparisons. The manuscript body (Sections 4–5) presents the systematic analysis with mIoU and per-class degradation metrics across controlled photometric conditions on multiple segmentation models. To resolve the concern, we will revise the abstract wording to state that the benefit 'is demonstrated through systematic analysis' (removing any implication that metrics appear in the abstract) and will ensure the claim is fully supported by the experiments section. No comparison to real data is claimed or required, as the contribution is the controlled synthetic pairs themselves. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper describes a procedural framework for generating paired driving images via GTA engine APIs that control illumination/weather while fixing geometry, pose, and object placement. No equations, fitted parameters, predictions, or derivation chain exist in the provided text; the central mechanism is a direct engineering construction whose correctness does not reduce to self-definition or self-citation. External validity of the generated shifts is a separate question outside the pairing mechanism itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the fidelity of the game engine simulation and the assumption that procedural placement matches the statistical properties needed for driving scenarios.

axioms (1)

Domain assumption: the GTA game engine rendering produces photometric shifts whose impact on perception models is representative of real-world conditions.
Invoked in the description of how the framework enables attribution of degradation to photometric shifts.

pith-pipeline@v0.9.1-grok · 5753 in / 1120 out tokens · 30255 ms · 2026-06-28T17:21:25.945344+00:00 · methodology


Reference graph

Works this paper leans on

42 extracted references · 4 canonical work pages · 3 internal anchors

[1] A. Anoosheh, T. Sattler, R. Timofte, M. Pollefeys, and L. Van Gool. Night-to-day image translation for retrieval-based localization. arXiv preprint arXiv:1809.09767, 2018.

[2] S. Baik, S. Kim, and E. Kim. Weatherflux: Universal weather translation with diffusion models. ICLR, 2025.

[3] S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira. A theory of learning from different domains. Machine Learning, 79(1–2):151–175, 2010.

[4] M. Cao and R. Ramezani. Data generation using simulation technology to improve perception mechanism of autonomous vehicles, 2022.

[5] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. In ECCV, 2018.

[6] B. Cheng, I. Misra, A. G. Schwing, A. Kirillov, and R. Girdhar. Masked-attention mask transformer for universal image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1290–1299, 2022.

[7] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

[8] B. Hurl, K. Czarnecki, and S. Waslander. Precise synthetic image and LiDAR (PreSIL) dataset for autonomous vehicle perception. Computer Vision and Pattern Recognition, arXiv:1905.00160, 2019.

[9] G. D'Amico, F. Nesti, G. Rossolini, M. Marinoni, S. Sabina, and G. Buttazzo. SynDRA: Synthetic dataset for railway applications. In Proceedings of the Winter Conference on Applications of Computer Vision (WACV), pages 3437–3446, February 2025.

[10] aitorzip. DeepGTAV. https://github.com/aitorzip/DeepGTAV

[11] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun. CARLA: An open urban driving simulator. CoRL, 2017.

[12] A. Gaidon, Q. Wang, Y. Cabon, and E. Vig. Virtual worlds as proxy for multi-object tracking analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

[13] B. Gella, H. Zhang, R. Upadhyay, T. Chang, M. Waliman, Y. Ba, A. Wong, and A. Kadambi. WeatherProof: A paired-dataset approach to semantic segmentation in adverse weather. arXiv preprint arXiv:2312.09534, 2023.

[14] U. Gurbindo, A. Brando, J. Abella, and C. König. Object detection in adverse weather conditions for autonomous vehicles using instruct pix2pix. In 2025 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2025.

[15] H. Ha, X. Jin, J. Kim, J. Liu, Z. Wang, K. D. Nguyen, A. Blume, N. Peng, K.-W. Chang, and H. Ji. Synthia: Novel concept design with affordance composition. CVPR, 2021.

[16] D. Hendrycks and T. Dietterich. Benchmarking neural network robustness to common corruptions. In ICLR, 2019.

[17] J. Hoffman, D. Wang, F. Yu, and T. Darrell. FCNs in the wild: Pixel-level adversarial and constraint-based adaptation. arXiv:1612.02649, 2016.

[18] Y. Hong, H. Pan, W. Sun, and Y. Jia. Deep dual-resolution networks for real-time and accurate semantic segmentation of road scenes. In CVPR, 2021.

[19] Rockstar Games. Policy on posting copyrighted Rockstar Games material. http://tinyurl.com/pjfoqo5r

[20] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1125–1134, 2017.

[21] Y. Jia, L. Hoyer, S. Huang, T. Wang, L. Van Gool, K. Schindler, and A. Obukhov. DGInStyle: Domain-generalizable semantic segmentation with image diffusion models and stylized semantic control. In European Conference on Computer Vision (ECCV), 2024.

[22] B. Kiefer, D. Ott, and A. Zell. Leveraging synthetic data in object detection on unmanned aerial vehicles, 2021.

[23] M. Martinez, C. Sitawarin, K. Finch, L. Meincke, A. Yablonski, and A. Kornhauser. Beyond Grand Theft Auto V for training, testing and enhancing deep learning in self driving cars, 2017.

[24] C. Michaelis, B. Mitzkus, R. Geirhos, E. Rusak, O. Bringmann, A. S. Ecker, M. Bethge, and W. Brendel. Benchmarking robustness in object detection: Autonomous driving when winter is coming. In NeurIPS Workshop on Machine Learning for Autonomous Driving, 2019.

[25] G. Neuhold, T. Ollmann, S. Rota Bulò, and P. Kontschieder. The Mapillary Vistas dataset for semantic understanding of street scenes. In ICCV, 2017.

[26] S. R. Richter, Z. Hayder, and V. Koltun. Playing for benchmarks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017.

[27] S. R. Richter, V. Vineet, S. Roth, and V. Koltun. Playing for data: Ground truth from computer games. In Proceedings of the European Conference on Computer Vision (ECCV), pages 102–118, 2016.

[28] C. Sakaridis, D. Dai, and L. Van Gool. Semantic foggy scene understanding with synthetic data. International Journal of Computer Vision, 126(9):973–992, 2018.

[29] C. Sakaridis, D. Dai, and L. Van Gool. Guided curriculum model adaptation and uncertainty-aware evaluation for semantic nighttime image segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019.

[30] C. Sakaridis, D. Dai, and L. Van Gool. ACDC: The adverse conditions dataset with correspondences for semantic driving scene understanding. In ICCV, 2021.

[31] S. Sankaranarayanan, Y. Balaji, A. Jain, S. Nam Lim, and R. Chellappa. Learning from synthetic data: Addressing domain shift for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3752–3761, 2018.

[32] Alexander Blade. Script Hook V. http://www.dev-c.com/gtav/scripthookv/

[33] T. Sun, M. Segu, J. Postels, Y. Wang, L. Van Gool, B. Schiele, F. Tombari, and F. Yu. SHIFT: A synthetic driving dataset for continuous multi-task domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21371–21382, 2022.

[34] R. Taori et al. Measuring robustness to natural distribution shifts in image classification. In NeurIPS, 2020.

[35] A. Torralba and A. A. Efros. Unbiased look at dataset bias. In CVPR, 2011.

[36] Y.-H. Tsai, W.-C. Hung, S. Schulter, K. Sohn, M.-H. Yang, and M. Chandraker. Learning to adapt structured output space for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

[37] aitorzip. VPilot. https://github.com/aitorzip/VPilot

[38] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo. SegFormer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems, 34:12077–12090, 2021.

[39] J. Xu, E. Xie, X. Liu, W. Chen, D. Liang, and P. Luo. PIDNet: A real-time semantic segmentation network inspired from PID controller. In CVPR, 2023.

[40] F. Yu, H. Chen, X. Wang, W. Xian, Y. Chen, F. Liu, V. Madhavan, and T. Darrell. BDD100K: A diverse driving dataset for heterogeneous multitask learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

[41] O. Zendel, M. Murschitz, M. Zeilinger, D. Steininger, and C. Beleznai. Analyzing computer vision data - the good, the bad and the ugly. In CVPR Workshops, 2018.

[42] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017.
