pith. machine review for the scientific record.

arxiv: 2604.18993 · v1 · submitted 2026-04-21 · 💻 cs.CV · cs.AI · cs.MM

Recognition: unknown

AutoAWG: Adverse Weather Generation with Adaptive Multi-Controls for Automotive Videos

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 03:21 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.MM
keywords adverse weather generation · video synthesis · autonomous driving · perception robustness · nuScenes dataset · controllable video generation · FID and FVD evaluation · semantic fusion

The pith

AutoAWG creates adverse weather driving videos by fusing multiple controls while keeping safety objects intact.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents AutoAWG as a way to generate realistic videos of vehicles in rain, fog, or snow for training self-driving perception systems. The core problem it addresses is the shortage of real adverse-weather video data, which currently limits how well models handle dangerous conditions. AutoAWG fuses semantic controls adaptively to apply strong weather effects without distorting cars, pedestrians, or road signs, and it builds time sequences from single static images by anchoring to vanishing points. Masked training helps keep long clips stable. If the method works, it supplies high-quality training data that reuses existing clear-weather labels instead of requiring new real-world captures in bad weather.
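
A minimal sketch of what vanishing point-anchored temporal synthesis could look like, assuming the simplest plausible motion model: a fixed zoom toward the vanishing point as a proxy for forward ego-motion. The paper gives no pseudo-code at this granularity, so every name and constant below is illustrative, not the authors'.

    import numpy as np
    import cv2  # opencv-python

    def synthesize_sequence(image, vanish_pt, n_frames=8, zoom_per_frame=1.03):
        """Approximate forward ego-motion by zooming a static frame toward its
        vanishing point. A stand-in for vanishing point-anchored temporal
        synthesis; the real method also carries control maps per frame."""
        h, w = image.shape[:2]
        vx, vy = vanish_pt
        frames = []
        for t in range(n_frames):
            s = zoom_per_frame ** t
            # Affine zoom centred on the vanishing point: p' = s * p + (1 - s) * v
            M = np.float32([[s, 0.0, (1.0 - s) * vx],
                            [0.0, s, (1.0 - s) * vy]])
            frames.append(cv2.warpAffine(image, M, (w, h)))
        return frames

    # Toy usage: a flat gray image with the vanishing point near the horizon centre.
    dummy = np.full((256, 512, 3), 128, dtype=np.uint8)
    clip = synthesize_sequence(dummy, vanish_pt=(256, 100))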

Core claim

AutoAWG employs semantics-guided adaptive fusion of multiple controls to balance strong weather stylization with high-fidelity preservation of safety-critical targets; leverages a vanishing point-anchored temporal synthesis strategy to construct training sequences from static images; and adopts masked training to enhance long-horizon generation stability. On the nuScenes validation set, without first-frame conditioning, FID and FVD drop by 50.0 percent and 16.1 percent relative to prior methods; with first-frame conditioning, they are further reduced by 8.7 percent and 7.2 percent. The approach improves style fidelity, temporal consistency, and semantic-structural integrity for downstream perception in autonomous driving.
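
For readers parsing the percentages: "relative reduction" means (baseline - ours) / baseline, not a difference in absolute score points. The snippet below works through that arithmetic with invented numbers; the baseline values are illustrative only and do not come from the paper.

    def relative_reduction(baseline, ours):
        # (baseline - ours) / baseline, expressed as a percentage
        return 100.0 * (baseline - ours) / baseline

    # Illustration only: if a prior method scored FID = 40.0 and AutoAWG scored
    # FID = 20.0, the relative reduction would be 50.0%.
    print(relative_reduction(40.0, 20.0))  # 50.0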

What carries the argument

Semantics-guided adaptive fusion of multiple controls together with vanishing-point temporal synthesis, which applies weather style while anchoring object positions across frames from static images.
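
A minimal sketch of what segmentation-guided adaptive fusion could look like, assuming per-pixel blending of a structure-preserving control with a weather-style control; the class ids, weights, and function names are illustrative and not taken from the paper.

    import numpy as np

    SAFETY_CLASSES = {11, 12, 13}  # e.g. car, pedestrian, traffic-sign ids (illustrative)

    def adaptive_fuse(controls, seg, w_safety=0.9, w_background=0.3):
        """Fuse per-pixel control maps guided by a semantic segmentation.

        controls: dict with 'structure' (e.g. depth or lineart) and 'style'
                  (weather cue) maps, each an H x W x C float array.
        seg:      H x W integer class map.
        Inside safety-critical regions the structure control dominates, preserving
        object geometry; elsewhere the style control dominates, allowing strong
        weather effects.
        """
        safety = np.isin(seg, list(SAFETY_CLASSES)).astype(np.float32)[..., None]
        # Per-pixel weight on the structure-preserving control.
        w_structure = w_safety * safety + w_background * (1.0 - safety)
        return w_structure * controls["structure"] + (1.0 - w_structure) * controls["style"]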

If this is right

  • Generated videos can reuse existing clear-weather annotations for adverse-weather training without new labeling.
  • Lower FID and FVD scores indicate higher visual and motion quality than earlier weather synthesis methods.
  • Vanishing-point anchoring reduces dependence on fully synthetic source data for creating video sequences.
  • Masked training improves stability when generating longer video clips for realistic driving scenarios.
  • The output videos support direct use in improving object detection and segmentation under weather stress.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same adaptive fusion idea could be tested on other scene variations such as night driving or seasonal changes.
  • Generated data might lower the cost of collecting rare adverse-weather recordings for fleet testing.
  • Combining the outputs with existing simulators could allow targeted creation of edge-case scenarios on demand.

Load-bearing premise

The generated videos preserve enough detail in cars, signs, and other safety targets that models trained on them actually perform better on real adverse-weather footage.

What would settle it

Measure whether perception models trained only on AutoAWG videos achieve higher accuracy on held-out real adverse-weather test sets than models trained on existing synthetic or limited real data.
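
Stated as a protocol skeleton, the comparison would hold the detector, training schedule, and test split fixed and vary only the training data. The helpers below are placeholders for an off-the-shelf detection pipeline and hypothetical dataset splits; nothing here is an artifact released with the paper.

    def settle(train_sets, real_adverse_test, train_detector, evaluate_map):
        """Train one detector per candidate training set and compare mAP on the
        same held-out real adverse-weather test split."""
        results = {}
        for name, data in train_sets.items():
            model = train_detector(data)  # identical architecture and schedule per set
            results[name] = evaluate_map(model, real_adverse_test)
        return results

    # The claim is settled if the score for AutoAWG-generated training data exceeds
    # both the existing-synthetic and limited-real baselines by more than the
    # seed-to-seed variation of the pipeline.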

Figures

Figures reproduced from arXiv: 2604.18993 by Daiguo Zhou, Danzhen Fu, Fei Wang, Fuhao Li, Haiyang Sun, Jiagao Hu, Jiayi Xie, Wenhua Liao, Zepeng Wang.

Figure 1
Figure 1: Overview of the proposed AutoAWG for adverse weather generation. Panel labels: raw, depth, sketch, lineart; input, fog, rain, snow, night. [PITH_FULL_IMAGE:figures/full_fig_p003_1.png]
Figure 2
Figure 2: Comparison of different control maps and gener… [PITH_FULL_IMAGE:figures/full_fig_p003_2.png]
Figure 3
Figure 3: Adaptive fusion of controls guided by segmentation. [PITH_FULL_IMAGE:figures/full_fig_p003_3.png]
Figure 4
Figure 4: Multi-camera scenario: stitched frames and corre… [PITH_FULL_IMAGE:figures/full_fig_p004_4.png]
Figure 6
Figure 6: Visual comparison with image-based weather… [PITH_FULL_IMAGE:figures/full_fig_p006_6.png]
Figure 7
Figure 7: Visual comparison with Panacea [38] and Vista [7]. Our approach generates more realistic and detailed vehicles, pedestrians, and obstacles compared to Panacea and Vista.
Figure 9
Figure 9: Transferring BDD100K samples to adverse weather… [PITH_FULL_IMAGE:figures/full_fig_p006_9.png]
Figure 10
Figure 10: Long video generation results. The results show limited degradation over time, and the white van remains consistent. [PITH_FULL_IMAGE:figures/full_fig_p007_10.png]
Figure 12
Figure 12: Multi-camera results on nuScenes. The dashed… [PITH_FULL_IMAGE:figures/full_fig_p007_12.png]
Figure 13
Figure 13: 6-view long-duration weather transformation results on nuScenes. [PITH_FULL_IMAGE:figures/full_fig_p008_13.png]
read the original abstract

Perception robustness under adverse weather remains a critical challenge for autonomous driving, with the core bottleneck being the scarcity of real-world video data in adverse weather. Existing weather generation approaches struggle to balance visual quality and annotation reusability. We present AutoAWG, a controllable Adverse Weather video Generation framework for Autonomous driving. Our method employs a semantics-guided adaptive fusion of multiple controls to balance strong weather stylization with high-fidelity preservation of safety-critical targets; leverages a vanishing point-anchored temporal synthesis strategy to construct training sequences from static images, thereby reducing reliance on synthetic data; and adopts masked training to enhance long-horizon generation stability. On the nuScenes validation set, AutoAWG significantly outperforms prior state-of-the-art methods: without first-frame conditioning, FID and FVD are relatively reduced by 50.0% and 16.1%; with first-frame conditioning, they are further reduced by 8.7% and 7.2%, respectively. Extensive qualitative and quantitative results demonstrate advantages in style fidelity, temporal consistency, and semantic--structural integrity, underscoring the practical value of AutoAWG for improving downstream perception in autonomous driving. Our code is available at: https://github.com/higherhu/AutoAWG

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes AutoAWG, a controllable framework for generating adverse weather videos for autonomous driving. It employs semantics-guided adaptive fusion of multiple controls to balance strong weather stylization with high-fidelity preservation of safety-critical targets, a vanishing-point-anchored temporal synthesis strategy to build training sequences from static images, and masked training for long-horizon stability. On the nuScenes validation set, it reports relative FID reductions of 50.0% (no first-frame conditioning) and an additional 8.7% (with conditioning), plus FVD reductions of 16.1% and 7.2%, outperforming prior SOTA, with claims of advantages in style fidelity, temporal consistency, and semantic-structural integrity for downstream perception tasks. Code is released publicly.

Significance. If the adaptive fusion and temporal synthesis indeed preserve exact spatial layouts, instance boundaries, and motion trajectories of safety-critical objects (vehicles, pedestrians, signs) while enabling weather stylization, the work could help address the scarcity of real adverse-weather video data and improve robustness of perception models. Public code availability is a clear strength for reproducibility.

major comments (2)
  1. [Abstract] Abstract: The headline quantitative claims (FID and FVD reduced by 50.0% and 16.1% without first-frame conditioning; further 8.7% and 7.2% with conditioning) are presented without error bars, statistical significance tests, or ablations that isolate the contribution of the semantics-guided adaptive fusion and vanishing-point synthesis, which are load-bearing for the outperformance claim over prior SOTA.
  2. [Abstract] Abstract: The central assumption that the semantics-guided adaptive fusion preserves high-fidelity safety-critical targets for annotation reusability is not directly tested; FID/FVD operate on features insensitive to small geometric shifts or label drift, so the reported metric gains do not confirm the downstream-perception motivation.
minor comments (2)
  1. The abstract refers to 'extensive qualitative and quantitative results' without specifying the number of sequences, exact evaluation protocols, or additional metrics (e.g., instance-level consistency) used to support claims of semantic-structural integrity.
  2. Consider including a summary table of all reported FID/FVD values (with and without conditioning) alongside the baselines for easier comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on strengthening the quantitative reporting and validating core design assumptions. We respond to each major comment below and outline revisions to address the concerns.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The headline quantitative claims (FID and FVD reduced by 50.0% and 16.1% without first-frame conditioning; further 8.7% and 7.2% with conditioning) are presented without error bars, statistical significance tests, or ablations that isolate the contribution of the semantics-guided adaptive fusion and vanishing-point synthesis, which are load-bearing for the outperformance claim over prior SOTA.

    Authors: We acknowledge that error bars, significance testing, and targeted ablations would increase confidence in the reported gains. The current numbers follow the single-run protocol standard in generative video papers. In the revised manuscript we will re-evaluate the main comparisons across three random seeds, report means and standard deviations, and add paired statistical tests against the strongest baselines. We will also extend the existing ablation section with two new controlled experiments that disable the adaptive fusion module and the vanishing-point anchoring in isolation, thereby quantifying their individual contributions to the final FID/FVD scores. A minimal sketch of this multi-seed reporting protocol appears after these responses. revision: yes

  2. Referee: [Abstract] Abstract: The central assumption that the semantics-guided adaptive fusion preserves high-fidelity safety-critical targets for annotation reusability is not directly tested; FID/FVD operate on features insensitive to small geometric shifts or label drift, so the reported metric gains do not confirm the downstream-perception motivation.

    Authors: We agree that distribution-level metrics such as FID and FVD are insensitive to small geometric or label shifts and therefore do not by themselves confirm annotation reusability. The method was explicitly engineered with per-instance semantic and depth controls to keep object boundaries and trajectories intact; this is visually demonstrated across multiple figures. To provide direct evidence for the downstream claim, the revision will include a new evaluation that runs a frozen, pre-trained 3D object detector on both the original nuScenes videos and the AutoAWG-generated adverse-weather versions, reporting changes in detection mAP and instance-level IoU to quantify preservation of safety-critical annotations. revision: yes
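
A minimal sketch of the multi-seed reporting promised in response 1, assuming three seeds and one scalar FID per run; all numbers are placeholders, not results from the paper or its baselines.

    import numpy as np
    from scipy import stats

    # Hypothetical per-seed FID scores for AutoAWG and the strongest baseline,
    # evaluated with the same three seeds on the same split.
    fid_ours = np.array([18.9, 19.4, 19.1])
    fid_baseline = np.array([38.2, 38.9, 38.5])

    print(f"ours:     {fid_ours.mean():.2f} +/- {fid_ours.std(ddof=1):.2f}")
    print(f"baseline: {fid_baseline.mean():.2f} +/- {fid_baseline.std(ddof=1):.2f}")

    # Paired test across seeds (same seed, two systems). With only three seeds the
    # power is low; a non-parametric test would need more paired runs.
    t_stat, p_value = stats.ttest_rel(fid_ours, fid_baseline)
    print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}")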

Circularity Check

0 steps flagged

No circularity: empirical benchmark comparisons rest on external test data, not internal fitting or self-referential definitions.

full rationale

The paper introduces a generation framework (semantics-guided adaptive fusion, vanishing-point temporal synthesis, masked training) and reports FID/FVD improvements on the nuScenes validation set versus prior SOTA. No equations, derivations, or parameter-fitting steps are described that reduce outputs to inputs by construction. Claims depend on independent benchmark metrics rather than self-citations or renamed fits. This is the standard non-circular pattern for applied CV method papers.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review; method implicitly relies on standard generative modeling assumptions without explicit free parameters or invented entities stated.

axioms (1)
  • domain assumption Generated videos preserve semantic-structural integrity of safety-critical objects under strong weather stylization
    Central to the balance claim but not demonstrated in abstract.

pith-pipeline@v0.9.0 · 5547 in / 1227 out tokens · 42336 ms · 2026-05-10T03:21:41.832258+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

60 extracted references · 9 canonical work pages · 2 internal anchors

  1. [1]

    Holger Caesar, Varun Bankiti, Alex H Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. 2020. nuScenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 11621–11631

  2. [2]

    Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam. 2017. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587 (2017)

  3. [3]

    Rui Chen, Zehuan Wu, Yichen Liu, Yuxin Guo, Jingcheng Ni, Haifeng Xia, and Siyu Xia. 2024. UniMLVG: Unified framework for multi-view long video generation with comprehensive control capabilities for autonomous driving. arXiv preprint arXiv:2412.04842 (2024)

  4. [4]

    Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. 2016. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition. 3213–3223

  5. [5]–[6]

    Ruiyuan Gao, Kai Chen, Bo Xiao, Lanqing Hong, Zhenguo Li, and Qiang Xu. 2025. MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control.

  7. [7]

    Ruiyuan Gao, Kai Chen, Enze Xie, Lanqing Hong, Zhenguo Li, Dit-Yan Yeung, and Qiang Xu. 2023. Magicdrive: Street view generation with diverse 3d geometry control. In The Eleventh International Conference on Learning Representations, ICLR 2023

  8. [8]

    Shenyuan Gao, Jiazhi Yang, Li Chen, Kashyap Chitta, Yihang Qiu, Andreas Geiger, Jun Zhang, and Hongyang Li. 2024. Vista: A generalizable driving world model with high fidelity and versatile controllability. arXiv preprint arXiv:2405.17398 (2024)

  9. [9]

    Mariam Hassan, Sebastian Stapf, Ahmad Rahimi, Pedro Rezende, Yasaman Haghighi, David Brüggemann, Isinsu Katircioglu, Lin Zhang, Xiaoran Chen, Suman Saha, et al. 2025. Gem: A generalizable ego-vision multimodal world model for fine-grained ego-motion, object dynamics, and scene composition control. In Proceedings of the Computer Vision and Pattern Recognit...

  10. [10]

    Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in neural information processing systems 30 (2017)

  11. [11]

    Junpeng Jiang, Gangyi Hong, Lijun Zhou, Enhui Ma, Hengtong Hu, Xia Zhou, Jie Xiang, Fan Liu, Kaicheng Yu, Haiyang Sun, et al. 2024. Dive: Dit-based video generation with enhanced control. arXiv preprint arXiv:2409.01595 (2024)

  12. [12]–[13]

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 2023. 3D Gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 42, 4 (2023), Article 139

  14. [14]

    Jeong Gi Kwak, Youngsaeng Jin, Yuanming Li, Dongsik Yoon, Donghyeon Kim, and Hanseok Ko. 2021. Adverse Weather Image Translation with Asymmetric and Uncertainty-aware GAN. In 32nd British Machine Vision Conference, BMVC 2021

  15. [15]

    Gongjin Lan, Yang Peng, Qi Hao, and Chengzhong Xu. 2024. Sustechgan: Image generation for object detection in adverse conditions of autonomous driving. IEEE Transactions on Intelligent Vehicles (2024)

  16. [16]

    Ruoteng Li, Loong-Fah Cheong, and Robby T Tan. 2019. Heavy rain image restoration: Integrating physics model and conditional adversarial learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 1633–1642

  17. [17]

    Xuelong Li, Chen Li, Kai Kou, and Bin Zhao. 2022. Weather translation via weather-cue transferring. IEEE Transactions on Neural Networks and Learning Systems (2022)

  18. [18]

    Chih-Hao Lin, Zian Wang, Ruofan Liang, Yuxuan Zhang, Sanja Fidler, Shenlong Wang, and Zan Gojcic. 2025. Controllable Weather Synthesis and Removal with Video Diffusion Models. IEEE/CVF International Conference on Computer Vision (ICCV) (2025)

  19. [19]

    Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In Computer vision–ECCV 2014: 13th European conference, Zurich, Switzerland, September 6-12, 2014, proceedings, part v 13. Springer, 740–755

  20. [20]

    Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. 2023. Flow Matching for Generative Modeling. In The Eleventh International Conference on Learning Representations, ICLR 2023

  21. [21]

    Yun-Fu Liu, Da-Wei Jaw, Shih-Chia Huang, and Jenq-Neng Hwang. 2018. Desnownet: Context-aware deep network for snow removal. IEEE Transactions on Image Processing 27, 6 (2018), 3064–3073

  22. [22]

    Zhijian Liu, Haotian Tang, Alexander Amini, Xinyu Yang, Huizi Mao, Daniela L Rus, and Song Han. 2023. Bevfusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation. In 2023 IEEE international conference on robotics and automation (ICRA). IEEE, 2774–2781

  23. [23]

    Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. 2021. Nerf: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 65, 1 (2021), 99–106

  24. [24]

    Chaojun Ni, Guosheng Zhao, Xiaofeng Wang, Zheng Zhu, Wenkang Qin, Guan Huang, Chen Liu, Yuyin Chen, Yida Wang, Xueyang Zhang, et al. 2024. ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration. arXiv preprint arXiv:2411.19548 (2024)

  25. [25]

    Jingcheng Ni, Yuxin Guo, Yichen Liu, Rui Chen, Lewei Lu, and Zehuan Wu. 2025. Maskgwm: A generalizable driving world model with video mask reconstruction. In Proceedings of the Computer Vision and Pattern Recognition Conference. 22381–22391

  26. [26]

    Siqi Ni, Xueyun Cao, Tao Yue, and Xuemei Hu. 2021. Controlling the rain: From removal to rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6328–6337

  27. [27]

    Ozan Özdenizci and Robert Legenstein. 2023. Restoring vision in adverse weather conditions with patch-based denoising diffusion models. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 8 (2023), 10346–10357

  28. [28]

    Rémi Pautrat, Shaohui Liu, Petr Hruby, Marc Pollefeys, and Daniel Barath. 2023. Vanishing point estimation in uncalibrated images with prior gravity direction. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 14118–14127

  29. [29]

    Chenghao Qian, Wenjing Li, Yuhu Guo, and Gustav Markkula. 2025. WeatherEdit: Controllable Weather Editing with 4D Gaussian Field. (2025). arXiv:2505.20471 [cs.CV] https://arxiv.org/abs/2505.20471

  30. [30]

    Rui Qian, Robby T Tan, Wenhan Yang, Jiajun Su, and Jiaying Liu. 2018. Attentive generative adversarial network for raindrop removal from a single image. In Proceedings of the IEEE conference on computer vision and pattern recognition. 2482–2491

  31. [31]

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In International conference on machine learning. PMLR, 8748–8763

  32. [32]

    Christos Sakaridis, Dengxin Dai, and Luc Van Gool. 2021. ACDC: The adverse conditions dataset with correspondences for semantic driving scene understanding. In Proceedings of the IEEE/CVF international conference on computer vision. 10765–10775

  33. [33]

    Yiren Song, Cheng Liu, and Mike Zheng Shou. 2025. Omniconsistency: Learning style-agnostic consistency from paired stylization data. arXiv preprint arXiv:2505.18445 (2025)

  34. [34]

    Zhuo Su, Jiehua Zhang, Longguang Wang, Hua Zhang, Zhen Liu, Matti Pietikäinen, and Li Liu. 2023. Lightweight pixel difference networks for efficient visual representation learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 12 (2023), 14956–14974

  35. [35]

    Thomas Unterthiner, Sjoerd Van Steenkiste, Karol Kurach, Raphael Marinier, Marcin Michalski, and Sylvain Gelly. 2018. Towards accurate generative models of video: A new metric & challenges. arXiv preprint arXiv:1812.01717 (2018)

  36. [36]

    Jeya Maria Jose Valanarasu, Rajeev Yasarla, and Vishal M Patel. 2022. Transweather: Transformer-based restoration of images degraded by adverse weather conditions. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2353–2363

  37. [37]

    Lucai Wang, Hongda Qin, Xuanyu Zhou, Xiao Lu, and Fengting Zhang. 2023. R-YOLO: A robust object detector in adverse weather. IEEE Transactions on Instrumentation and Measurement 72 (2023), 1–11

  38. [38]–[39]

    Xiaofeng Wang, Zheng Zhu, Guan Huang, Xinze Chen, Jiagang Zhu, and Jiwen Lu. 2024. DriveDreamer: Towards Real-World-Driven World Models for Autonomous Driving. In European Conference on Computer Vision. Springer, 55–72

  40. [40]–[41]

    Yuqi Wang, Jiawei He, Lue Fan, Hongxin Li, Yuntao Chen, and Zhaoxiang Zhang. 2024. Driving into the future: Multiview visual forecasting and planning with world model for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 14749–14759

  42. [42]

    Yuqing Wen, Yucheng Zhao, Yingfei Liu, Fan Jia, Yanhui Wang, Chong Luo, Chi Zhang, Tiancai Wang, Xiaoyan Sun, and Xiangyu Zhang. 2024. Panacea: Panoramic and controllable video generation for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6902–6912

  43. [43]

    Wei Wu, Xi Guo, Weixuan Tang, Tingxuan Huang, Chiyu Wang, Dongyue Chen, and Chenjing Ding. 2025. Drivescape: Towards high-resolution controllable multi-view driving video generation. Proceedings of the Computer Vision and Pattern Recognition Conference (2025)

  44. [44]

    Hanguang Xiao, Shihong Liu, Kun Zuo, Haipeng Xu, Yuyang Cai, Tianqi Liu, and Zhiying Yang. 2024. Multiple adverse weather image restoration: A review. Neurocomputing (2024), 129044

  45. [45]–[46]

    Bin Xie, Yingfei Liu, Tiancai Wang, Jiale Cao, and Xiangyu Zhang. 2025. Glad: A Streaming Scene Generator for Autonomous Driving. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net. https://openreview.net/forum?id=ZFxpclrCCf

  47. [47]

    Yunzhi Yan, Haotong Lin, Chenxu Zhou, Weijie Wang, Haiyang Sun, Kun Zhan, Xianpeng Lang, Xiaowei Zhou, and Sida Peng. 2024. Street gaussians: Modeling dynamic urban scenes with gaussian splatting. In European Conference on Computer Vision. Springer, 156–173

  48. [48]

    Jiazhi Yang, Shenyuan Gao, Yihang Qiu, Li Chen, Tianyu Li, Bo Dai, Kashyap Chitta, Penghao Wu, Jia Zeng, Ping Luo, et al. 2024. Generalized predictive model for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 14662–14672

  49. [49]

    Yijun Yang, Angelica I Aviles-Rivero, Huazhu Fu, Ye Liu, Weiming Wang, and Lei Zhu. 2023. Video adverse-weather-component suppression network via weather messenger and adversarial backpropagation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 13200–13210

  50. [50]

    Yijun Yang, Hongtao Wu, Angelica I Aviles-Rivero, Yulun Zhang, Jing Qin, and Lei Zhu. 2024. Genuine knowledge from practice: Diffusion test-time adaptation for video adverse weather removal. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 25606–25616

  51. [51]

    Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, et al. 2025. CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net

  52. [52]

    Wei Yin, Jianming Zhang, Oliver Wang, Simon Niklaus, Long Mai, Simon Chen, and Chunhua Shen. 2021. Learning to recover 3d scene shape from a single image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 204–213

  53. [53]

    Fisher Yu, Haofeng Chen, Xin Wang, Wenqi Xian, Yingying Chen, Fangchen Liu, Vashisht Madhavan, and Trevor Darrell. 2020. BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

  54. [54]

    Yuxiao Zhang, Alexander Carballo, Hanting Yang, and Kazuya Takeda. 2023. Perception and sensing for autonomous vehicles under adverse weather conditions: A survey. ISPRS Journal of Photogrammetry and Remote Sensing 196 (2023), 146–177

  55. [55]

    Guosheng Zhao, Chaojun Ni, Xiaofeng Wang, Zheng Zhu, Xueyang Zhang, Yida Wang, Guan Huang, Xinze Chen, Boyuan Wang, Youyi Zhang, et al. 2024. Drivedreamer4d: World models are effective data machines for 4d driving scene representation. arXiv preprint arXiv:2410.13571 (2024)

  56. [56]

    Guosheng Zhao, Xiaofeng Wang, Zheng Zhu, Xinze Chen, Guan Huang, Xiaoyi Bao, and Xingang Wang. 2025. Drivedreamer-2: Llm-enhanced world models for diverse driving video generation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 10412–10420

  57. [57]

    Rui Zhao, Huibin Yan, and Shuoyao Wang. 2024. Revisiting Domain-Adaptive Object Detection in Adverse Weather by the Generation and Composition of High-Quality Pseudo-labels. In European Conference on Computer Vision. Springer, 270–287

  58. [58]

    Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, and Ming-Hsuan Yang. 2024. Drivinggaussian: Composite gaussian splatting for surrounding dynamic autonomous driving scenes. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 21634–21643

  59. [59]

    Yunsong Zhou, Michael Simon, Zhenghao Peng, Sicheng Mo, Hongzi Zhu, Minyi Guo, and Bolei Zhou. 2024. Simgen: Simulator-conditioned driving scene generation. Advances in Neural Information Processing Systems 37 (2024), 48838–48874

  60. [60]

    Yurui Zhu, Tianyu Wang, Xueyang Fu, Xuanyu Yang, Xin Guo, Jifeng Dai, Yu Qiao, and Xiaowei Hu. 2023. Learning weather-general and weather-specific features for image restoration under multiple adverse weather conditions. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 21747–21758