pith. machine review for the scientific record.

arxiv: 2605.11508 · v1 · submitted 2026-05-12 · 💻 cs.CV

Recognition: 2 theorem links · Lean Theorem

LiBrA-Net: Lie-Algebraic Bilateral Affine Fields for Real-Time 4K Video Dehazing

Chengchao Shen, Dianjie Lu, Guangwei Gao, Guijuan Zhang, Pengwen Dai, Wei Wang, Yongcong Wang, Zhuoran Zheng

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 02:25 UTC · model grok-4.3

classification 💻 cs.CV
keywords video dehazing · real-time 4K processing · bilateral grids · Lie algebra · affine transforms · UHD video restoration · depth-guided processing

The pith

Atmospheric dehazing of video reduces to applying per-pixel affine transforms whose parameters come from low-resolution bilateral grids fused in a Lie algebra.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors argue that removing haze from video frames can be modeled as a per-pixel affine transformation controlled by the scene's depth, which varies slowly and can therefore be captured in compact bilateral grids predicted at low resolution. This decoupling lets the network handle high-resolution inputs like 4K without a corresponding increase in computation. LiBrA-Net implements this by splitting the field into spatial-color and temporal components, combining their coefficients in gl(3), the Lie algebra of 3×3 matrices, to enforce consistency, converting them to invertible transforms with a Cayley parameterization, and adding a lightweight branch to recover fine detail from the original input. If successful, this would enable practical real-time dehazing of ultra-high-definition video on standard hardware, addressing the current limitation that existing methods cannot process continuous 4K sequences efficiently. The paper also introduces the first paired 4K video dehazing dataset, with additional annotations, to support such research.
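To make the resolution decoupling concrete, here is a minimal sketch of the bilateral-grid slicing step in PyTorch. This is not the authors' code: the 3×4 affine channel layout, the 8×16×16 grid, the luminance guide, and the use of F.grid_sample for the trilinear lookup are all assumptions of the sketch; in LiBrA-Net the grids carry gl(3) coefficients from two branches that are fused and Cayley-mapped before any transform is applied.

```python
import torch
import torch.nn.functional as F

def slice_affine(grid, frame, guide):
    """Slice a low-resolution bilateral grid into per-pixel affine transforms.

    grid:  (N, 12, D, Hg, Wg) -- 12 = 3x4 affine coefficients per cell,
           predicted once at a fixed low resolution (hypothetical layout).
    frame: (N, 3, H, W) hazy RGB frame at full output resolution.
    guide: (N, H, W) scalar guide in [0, 1] (plain luminance here; a learned
           guide map would be the more faithful choice).
    """
    n, _, h, w = frame.shape
    # Sampling lattice: (x, y) span the image plane, z is the guide value.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    xy = torch.stack([xs, ys], dim=-1).expand(n, h, w, 2)
    z = guide.unsqueeze(-1) * 2 - 1                     # [0,1] -> [-1,1]
    coords = torch.cat([xy, z], dim=-1).unsqueeze(1)    # (N, 1, H, W, 3)
    # Trilinear slice: one interpolated lookup per output pixel.
    coeff = F.grid_sample(grid, coords, align_corners=True)  # (N,12,1,H,W)
    A = coeff.squeeze(2).view(n, 3, 4, h, w)
    # Per-pixel affine: J_hat = A[:, :, :3] @ I + A[:, :, 3]
    rgb = torch.einsum("ncdhw,ndhw->nchw", A[:, :, :3], frame)
    return rgb + A[:, :, 3]

# The learned part only ever produces the tiny 8x16x16 grid; the same grid
# drives a 720p or a native 4K frame (the latter needs a few hundred MB here).
grid = torch.randn(1, 12, 8, 16, 16)
frame = torch.rand(1, 3, 2160, 3840)
out = slice_affine(grid, frame, frame.mean(dim=1))
print(out.shape)  # torch.Size([1, 3, 2160, 3840])
```

The sketch shows where the efficiency claim lives: network cost scales with the grid, not the frame, and the full-resolution work is a fixed, cheap lookup-plus-affine per pixel.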

Core claim

LiBrA-Net factorizes the spatiotemporal affine field into a spatial-color and a temporal bilateral sub-grid predicted at a fixed low resolution, fuses their coefficients in the gl(3) Lie algebra under group-theoretic regularization, maps the result to invertible GL(3) transforms via a Cayley parameterization, and restores high-frequency detail through a lightweight input-guided branch, achieving state-of-the-art performance on video dehazing benchmarks while running native 4K at 25 FPS with 6.12 million parameters.
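The algebra-then-group step can be sketched in a few lines, under two assumptions the summary above does not pin down: that fusion is a weighted sum in gl(3) (the paper says only that coefficients are fused under group-theoretic regularization), and that a small scale factor keeps the Cayley map well defined.

```python
import torch

def cayley(X, eps=1e-3):
    """Cayley map gl(3) -> GL(3): X |-> (I - X)^(-1) (I + X).

    The result is invertible whenever +1 and -1 are not eigenvalues of X;
    the eps scaling (an assumption of this sketch, not the paper's stated
    regularization) keeps eigenvalues safely inside that range.
    """
    I = torch.eye(3, dtype=X.dtype, device=X.device).expand_as(X)
    Xs = eps * X
    return torch.linalg.solve(I - Xs, I + Xs)   # (I - Xs)^(-1) (I + Xs)

def fuse(X_spatial_color, X_temporal, w=0.5):
    """Fuse sub-grid coefficients in the algebra, not the group: addition in
    gl(3) is closed and order-free, whereas composing GL(3) matrices is not.
    (The 50/50 weighted sum is this sketch's choice.)
    """
    return w * X_spatial_color + (1.0 - w) * X_temporal

# Coefficients for a batch of grid cells, shape (..., 3, 3).
Xa, Xb = torch.randn(4, 3, 3), torch.randn(4, 3, 3)
M = cayley(fuse(Xa, Xb))
print(torch.linalg.det(M))  # all nonzero: every fused transform is invertible
```

Fusing before the Cayley map is the natural reading of the claim: it guarantees the final per-pixel transform is a single invertible matrix rather than an order-dependent product of two.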

What carries the argument

The Lie-algebraic bilateral affine field, which encodes the depth-governed per-pixel affine transform in low-resolution grids for efficient prediction decoupled from output resolution.

If this is right

  • LiBrA-Net achieves a new state of the art on the UHV-4K, REVIDE, and HazeWorld video dehazing benchmarks.
  • The method processes native 4K video at 25 frames per second on a single GPU using only 6.12 million parameters.
  • A new benchmark dataset, UHV-4K, is released, providing paired hazy and clear 4K videos with depth, transmission, and optical-flow annotations for every frame.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar bilateral grid encodings could be applied to other video restoration tasks where effects depend on low-frequency scene properties like depth or illumination.
  • The use of Lie algebra fusion might provide a general way to regularize spatiotemporal consistency in learned video processing models.
  • Testing the method on real-world captured hazy videos without synthetic pairing could reveal how well the affine model generalizes beyond the benchmark assumptions.

Load-bearing premise

Atmospheric dehazing reduces to a per-pixel affine transform governed by the low-frequency depth field, which can be compactly encoded in bilateral grids whose prediction cost is decoupled from the output resolution.
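This premise is a two-line consequence of the standard atmospheric scattering model, the same equation the circularity audit below cites. Writing the usual exponential transmission $t(x) = e^{-\beta d(x)}$ along depth $d(x)$:

$$I(x) = J(x)\,t(x) + A\bigl(1 - t(x)\bigr) \quad\Longrightarrow\quad J(x) = \frac{1}{t(x)}\,I(x) - \frac{1 - t(x)}{t(x)}\,A.$$

Recovery at each pixel is therefore the affine map $J = a(x)\,I + b(x)$ with $a(x) = 1/t(x)$ and $b(x) = -(1 - t(x))A/t(x)$; since $t$ varies only as fast as depth, both coefficients are low-frequency, which is what makes a coarse bilateral grid sufficient to carry them. Treating $t$ as locally constant also gives the gradient relation $\nabla I \approx t\,\nabla J$ quoted in Figure 3.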

What would settle it

Running LiBrA-Net on a sequence of 4K frames where haze formation deviates strongly from the depth-dependent affine model, such as non-uniform lighting or dense fog that does not follow the atmospheric scattering equation, and observing whether output quality falls below prior methods or visible artifacts appear.

Figures

Figures reproduced from arXiv: 2605.11508 by Chengchao Shen, Dianjie Lu, Guangwei Gao, Guijuan Zhang, Pengwen Dai, Wei Wang, Yongcong Wang, Zhuoran Zheng.

Figure 1
Figure 1: (a) Higher resolution reveals finer scene structure that haze attenuates first. (b) UHV-4K benchmark with five aligned modalities. (c) Bilateral-grid prediction is resolution-decoupled: our throughput drops sub-linearly with pixel count from 720p to 8K, while dense baselines slow down roughly linearly. (d) Quality–speed trade-off on UHV-4K; LiBrA-Net reaches 24.28 dB at real-time 4K throughput.
Figure 2
Figure 2: Overview of LiBrA-Net. Two encoder branches predict bilateral grid coefficients at a fixed low resolution: the Chromatic Affine Field consumes the center frame, and the Temporal Affine Field consumes all T frames. Their coefficients are fused in gl(3) and mapped to per-pixel GL(3) affine transforms via the Cayley map. A lightweight HF-Refiner restores high-frequency detail.
Figure 3
Figure 3: UHV-4K composition. (a) Joint distribution of source scenes and geographic regions across the 100 videos. (b) Per-video scattering coefficient β and atmospheric light A∞, grouped by haze tier (color) and train/test split (marker shape).
Figure 4
Figure 4: Qualitative comparison. Rows: (a) UHV-4K, (b) REVIDE, (c) HazeWorld. Each pair shows two frames from the same video.
Figure 5
Figure 5: Lie-algebraic composition at the trained operating point. (a) Distribution of relative composition error ϵ on UHV-4K and REVIDE. (b) ϵ vs. ∥M∥²_F on log–log axes; dashed line shows slope ≈ 1.
Figure 6
Figure 6: Inter-frame difference heatmaps on a UHV-4K test scene. Columns: three consecutive frames and the per-pixel absolute difference |Ĵ_t − Ĵ_{t−1}| (shared magma colorbar, 0–1). Amber boxes mark two fixed regions of interest. LiBrA-Net's difference map stays uniformly dark, while single-image methods exhibit halos around moving objects.
Figure 7
Figure 7: Downstream perception on a UHV-4K test scene. (a) Object detection (YOLOv8-X, COCO-pretrained). (b) Semantic segmentation (SegFormer-B5, ADE20K-pretrained). Both models run at default settings on each method's dehazed output; any performance gap comes solely from the upstream dehazer.
Figure 8
Figure 8: No-reference image quality on real-world 4K hazy videos. Bars show each video method's score relative to the Hazy input (dashed line). LiBrA-Net is the only method that improves all three metrics.
Figure 9
Figure 9: Real-world 4K dehazing across eight consecutive frames. Top: lake-and-skyline scene; bottom: Venice street. Row labels: (a) CG-IDN, (b) DVD, (c) ViWS-Net, (d) MAP-Net, (e) Ours. Red boxes in the hazy row indicate the cropped region shown in rows (a)–(e). LiBrA-Net preserves thin structures and avoids color casts across both scenes; competing methods introduce halos, residual haze, or saturated shifts.
Figure 10
Figure 10: Anatomy of the spatial–color grid. The grid exploits all eight color bins with distinct affine profiles (a) and self-organizes cells into color-coherent clusters (b), confirming a structured chromatic representation.
Original abstract

Currently, there is a gap in the field of ultra-high-definition (UHD) video dehazing due to the lack of a benchmark for evaluation. Furthermore, existing video dehazing methods cannot run on consumer-grade GPUs when processing continuous UHD sequences of 3--5 frames at a time. In this paper, we address both issues with a new benchmark and an efficient method. Our key observation is that atmospheric dehazing reduces to a per-pixel affine transform governed by the low-frequency depth field, which can be compactly encoded in bilateral grids whose prediction cost is decoupled from the output resolution. Building on this, we propose LiBrA-Net, which factorizes the spatiotemporal affine field into a spatial--color and a temporal bilateral sub-grid predicted at a fixed low resolution, fuses their coefficients in the $\mathfrak{gl}(3)$ Lie algebra under group-theoretic regularization, maps the result to invertible GL(3) transforms via a Cayley parameterization, and restores high-frequency detail through a lightweight input-guided branch. We further release UHV-4K, the first paired 4K video dehazing benchmark with depth, transmission, and optical-flow annotations on every frame. Across UHV-4K, REVIDE, and HazeWorld, LiBrA-Net sets a new state of the art among compared video dehazing methods while running native 4K at 25 FPS on a single GPU with only 6.12 M parameters. Code and data are available at https://anonymous.4open.science/r/LiBrA-Net-42B8.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper claims that atmospheric dehazing reduces to per-pixel affine transforms governed by a low-frequency depth field, which can be compactly encoded in bilateral grids whose prediction cost is decoupled from output resolution. It proposes LiBrA-Net, which factorizes the spatiotemporal affine field into spatial-color and temporal bilateral sub-grids predicted at fixed low resolution, fuses coefficients in the gl(3) Lie algebra under group-theoretic regularization, maps to invertible GL(3) transforms via Cayley parameterization, and restores high-frequency detail via a lightweight input-guided branch. The work also releases the UHV-4K benchmark (first paired 4K video dehazing dataset with depth, transmission, and optical-flow annotations per frame) and reports state-of-the-art results on UHV-4K, REVIDE, and HazeWorld while running native 4K at 25 FPS on a single GPU with 6.12 M parameters.

Significance. If the results hold, the work is significant because it directly addresses the documented gap in UHD video dehazing benchmarks and real-time methods for consumer GPUs. The bilateral-grid factorization combined with Lie-algebraic fusion and Cayley parameterization provides a principled, resolution-independent efficiency mechanism that follows from the standard atmospheric scattering model. The public release of the UHV-4K benchmark together with code and data is a clear strength that supports reproducibility and future research.

minor comments (3)
  1. [Abstract] The code and data link points to an anonymous repository; replace it with a permanent, non-anonymous link in the camera-ready version.
  2. [Abstract] The SOTA claim would be strengthened by including one or two concrete quantitative metrics (e.g., average PSNR or SSIM gains) rather than stating the claim only qualitatively.
  3. [Method] Notation: the transition from gl(3) fusion to GL(3) via Cayley parameterization is described at a high level; a short explicit statement of the mapping (e.g., the Cayley transform formula, one standard form of which is sketched below) in the main text would improve readability for readers unfamiliar with Lie-group methods.
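For reference, one standard form of that mapping, assuming LiBrA-Net uses the textbook Cayley transform (the paper may scale or constrain it differently):

$$\operatorname{Cay}(X) = (I - X)^{-1}(I + X), \qquad X \in \mathfrak{gl}(3),$$

well defined whenever $1$ is not an eigenvalue of $X$, and invertible with $\operatorname{Cay}(X)^{-1} = (I + X)^{-1}(I - X)$ whenever $-1$ is also excluded; shrinking $X$ toward $0$ guarantees both conditions.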

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of LiBrA-Net and the UHV-4K benchmark, as well as the recommendation for minor revision. We appreciate the recognition of the significance of the Lie-algebraic factorization for resolution-independent efficiency and the public release of the first paired 4K video dehazing dataset with per-frame annotations.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper grounds its approach in the standard atmospheric scattering model I = J·t + A(1-t), which is independently known to be a per-pixel affine transform whose coefficients are governed by transmission t (hence depth). Bilateral grids are a pre-existing, resolution-decoupled technique for representing low-frequency fields; the spatial-color/temporal factorization, gl(3) fusion, Cayley parameterization to GL(3), and input-guided high-frequency branch are explicit design decisions that follow from this model without reducing any claimed prediction to a fitted input or self-referential definition. No load-bearing self-citations, uniqueness theorems imported from the authors' prior work, or ansatzes smuggled via citation appear in the derivation chain. The efficiency and benchmark claims are independent of the modeling steps and rest on external evaluation.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that dehazing equals per-pixel affine transforms driven by low-frequency depth, plus the engineering choice to factorize the field into fixed-low-resolution bilateral grids.

free parameters (1)
  • fixed low resolution of bilateral grids
    Prediction is performed at a resolution decoupled from 4K output; exact value not stated in abstract but treated as a design choice.
axioms (1)
  • domain assumption: Atmospheric dehazing reduces to a per-pixel affine transform governed by the low-frequency depth field
    Stated as the key observation that enables compact bilateral-grid encoding.

pith-pipeline@v0.9.0 · 5616 in / 1300 out tokens · 60510 ms · 2026-05-13T02:25:03.933513+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · 1 internal anchor

  1. [1]

    O-haze: a dehazing benchmark with real hazy and haze-free outdoor images

    Codruta O Ancuti, Cosmin Ancuti, Radu Timofte, and Christophe De Vleeschouwer. O-haze: a dehazing benchmark with real hazy and haze-free outdoor images. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 754–762, 2018

  2. [2]

    I-haze: A dehazing benchmark with real hazy and haze-free indoor images

    Cosmin Ancuti, Codruta O Ancuti, Radu Timofte, and Christophe De Vleeschouwer. I-haze: A dehazing benchmark with real hazy and haze-free indoor images. In International conference on advanced concepts for intelligent vision systems, pages 620–631. Springer, 2018

  3. [3]

    Non-local image dehazing

    Dana Berman, Tali Treibitz, and Shai Avidan. Non-local image dehazing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016

  4. [4]

    Real-time edge-aware image processing with the bilateral grid

    Jiawen Chen, Sylvain Paris, and Frédo Durand. Real-time edge-aware image processing with the bilateral grid. ACM Transactions on Graphics (TOG), 26(3):103–es, 2007

  5. [5]

    Bilateral guided upsampling

    Jiawen Chen, Andrew Adams, Neal Wadhwa, and Samuel W. Hasinoff. Bilateral guided upsampling. ACM Transactions on Graphics (TOG), 35:1–8, 2016

  6. [6]

    Tokenize image patches: Global context fusion for effective haze removal in large images

    Jiuchen Chen, Xinyu Yan, Qizhi Xu, and Kaiqi Li. Tokenize image patches: Global context fusion for effective haze removal in large images. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), pages 2258–2268, June 2025

  7. [7]

    Multi-scale boosted dehazing network with dense feature fusion

    Hang Dong, Jinshan Pan, Lei Xiang, Zhe Hu, Xinyi Zhang, Fei Wang, and Ming-Hsuan Yang. Multi-scale boosted dehazing network with dense feature fusion. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2157–2167, 2020

  8. [8]

    Image quality measures and their performance

    A.M. Eskicioglu and P.S. Fisher. Image quality measures and their performance. IEEE Transactions on Communications, 43(12):2959–2965, 1995. doi: 10.1109/26.477498

  9. [9]

    Depth-centric dehazing and depth-estimation from real-world hazy driving video

    Junkai Fan, Kun Wang, Zhiqiang Yan, Xiang Chen, Shangbing Gao, Jun Li, and Jian Yang. Depth-centric dehazing and depth-estimation from real-world hazy driving video, 2024. URL https://arxiv.org/abs/2412.11395

  10. [10]

    Driving-video dehazing with non-aligned regularization for safety assistance

    Junkai Fan, Jiangwei Weng, Kun Wang, Yijun Yang, Jianjun Qian, Jun Li, and Jian Yang. Driving-video dehazing with non-aligned regularization for safety assistance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 26109–26119, 2024

  11. [11]

    Deep bilateral learning for real-time image enhancement

    Michaël Gharbi, Jiawen Chen, Jonathan T Barron, Samuel W Hasinoff, and Frédo Durand. Deep bilateral learning for real-time image enhancement. ACM Transactions on Graphics (TOG), 36(4):1–12, 2017

  12. [12]

    3d packing for self-supervised monocular depth estimation

    Vitor Guizilini, Rares Ambrus, Sudeep Pillai, Allan Raventos, and Adrien Gaidon. 3d packing for self-supervised monocular depth estimation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2485–2494, 2020

  13. [13]

    Image dehazing transformer with transmission-aware 3d position embedding

    Chun-Le Guo, Qixin Yan, Saeed Anwar, Runmin Cong, Wenqi Ren, and Chongyi Li. Image dehazing transformer with transmission-aware 3d position embedding. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5812–5820, 2022

  14. [14]

    Single image haze removal using dark channel prior

    Kaiming He, Jian Sun, and Xiaoou Tang. Single image haze removal using dark channel prior. IEEE transactions on pattern analysis and machine intelligence, 33(12):2341–2353, 2010

  15. [15]

    Guided image filtering

    Kaiming He, Jian Sun, and Xiaoou Tang. Guided image filtering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(6):1397–1409, 2013. doi: 10.1109/TPAMI.2012.213

  16. [16]

    Orthogonal recurrent neural networks with scaled cayley transform

    Kyle Helfrich, Devin Willmott, and Qiang Ye. Orthogonal recurrent neural networks with scaled cayley transform. In International Conference on Machine Learning, pages 1969–1978. PMLR, 2018

  17. [17]

    Learning blind video temporal consistency

    Wei-Sheng Lai, Jia-Bin Huang, Oliver Wang, Eli Shechtman, Ersin Yumer, and Ming-Hsuan Yang. Learning blind video temporal consistency. In Proceedings of the European conference on computer vision (ECCV), pages 170–185, 2018

  18. [18]

    Blind video temporal consistency via deep video prior

    Chenyang Lei, Yazhou Xing, and Qifeng Chen. Blind video temporal consistency via deep video prior. Advances in neural information processing systems, 33:1083–1093, 2020

  19. [19]

    Cheap orthogonal constraints in neural networks: A simple parametrization of the orthogonal and unitary group

    Mario Lezcano-Casado and David Martínez-Rubio. Cheap orthogonal constraints in neural networks: A simple parametrization of the orthogonal and unitary group. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 3794–3803...

  20. [20]

    End-to-end united video dehazing and detection

    Boyi Li, Xiulian Peng, Zhangyang Wang, Jizheng Xu, and Dan Feng. End-to-end united video dehazing and detection. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018

  21. [21]

    Benchmarking single-image dehazing and beyond

    Boyi Li, Wenqi Ren, Dengpan Fu, Dacheng Tao, Dan Feng, Wenjun Zeng, and Zhangyang Wang. Benchmarking single-image dehazing and beyond. IEEE transactions on image processing, 28(1):492–505, 2018

  22. [22]

    Embedding fourier for ultra-high-definition low-light image enhancement

    Chongyi Li, Chunle Guo, Man Zhou, Zhexin Liang, Shangchen Zhou, Ruicheng Feng, and Chen Change Loy. Embedding fourier for ultra-high-definition low-light image enhancement. ArXiv, abs/2302.11831, 2023

  23. [23]

    Phase-based memory network for video dehazing

    Ye Liu, Liang Wan, Huazhu Fu, Jing Qin, and Lei Zhu. Phase-based memory network for video dehazing. In Proceedings of the 30th ACM international conference on multimedia, pages 5427–5435, 2022

  24. [24]

    Uhd-processer: Unified uhd image restoration with progressive frequency learning and degradation-aware prompts

    Yidi Liu, Dong Li, Xueyang Fu, Xin Lu, Jie Huang, and Zheng-Jun Zha. Uhd-processer: Unified uhd image restoration with progressive frequency learning and degradation-aware prompts. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 23121–23130, 2025

  25. [25]

    Uvg dataset: 50/120fps 4k sequences for video codec analysis and development

    Alexandre Mercat, Marko Viitanen, and Jarno Vanne. Uvg dataset: 50/120fps 4k sequences for video codec analysis and development. In Proceedings of the 11th ACM multimedia systems conference, pages 297–302, 2020

  26. [26]

    Vision and the atmosphere

    Srinivasa G Narasimhan and Shree K Nayar. Vision and the atmosphere. International journal of computer vision, 48(3):233–254, 2002

  27. [27]

    DINOv2: Learning robust visual features without supervision

    Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patrick L...

  28. [28]

    The 2017 DAVIS Challenge on Video Object Segmentation

    Jordi Pont-Tuset, Federico Perazzi, Sergi Caelles, Pablo Arbeláez, Alex Sorkine-Hornung, and Luc Van Gool. The 2017 DAVIS challenge on video object segmentation. arXiv preprint arXiv:1704.00675, 2017

  29. [29]

    Ffa-net: Feature fusion attention network for single image dehazing

    Xu Qin, Zhilin Wang, Yuanchao Bai, Xiaodong Xie, and Huizhu Jia. Ffa-net: Feature fusion attention network for single image dehazing. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pages 11908–11915, 2020

  30. [30]

    Deep video dehazing with semantic segmentation

    Wenqi Ren, Jingang Zhang, Xiangyu Xu, Lin Ma, Xiaochun Cao, Gaofeng Meng, and Wei Liu. Deep video dehazing with semantic segmentation. IEEE Transactions on Image Processing, 28(4):1895–1908, 2019. doi: 10.1109/TIP.2018.2876178

  31. [31]

    Vision transformers for single image dehazing

    Yuda Song, Zhuqing He, Hui Qian, and Xin Du. Vision transformers for single image dehazing. IEEE Transactions on Image Processing, 32:1927–1941, 2023. doi: 10.1109/TIP.2023.3256763

  32. [32]

    Adapool: Exponential adaptive pooling for information-retaining downsampling

    Alexandros Stergiou and Ronald Poppe. Adapool: Exponential adaptive pooling for information-retaining downsampling. IEEE Transactions on Image Processing, 32:251–266, 2022

  33. [33]

    Raft: Recurrent all-pairs field transforms for optical flow

    Zachary Teed and Jia Deng. Raft: Recurrent all-pairs field transforms for optical flow. In European conference on computer vision, pages 402–419. Springer, 2020

  34. [34]

    Bilateral filtering for gray and color images

    C. Tomasi and R. Manduchi. Bilateral filtering for gray and color images. In Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271), pages 839–846, 1998. doi: 10.1109/ICCV.1998.710815

  35. [35]

    Yolov8

    Ultralytics. Yolov8, 2023. URL https://github.com/ultralytics/ultralytics

  36. [36]

    Correlation matching transformation transformers for uhd image restoration

    Cong Wang, Jinshan Pan, Wei Wang, Gang Fu, Siyuan Liang, Mengzhu Wang, Xiao-Ming Wu, and Jun Liu. Correlation matching transformation transformers for uhd image restoration, 2024. URL https://arxiv.org/abs/2406.00629

  37. [37]

    Ultra-high-definition image restoration: New benchmarks and a dual interaction prior-driven solution

    Liyan Wang, Cong Wang, Jinshan Pan, Xiaofeng Liu, Weixiang Zhou, Xiaoran Sun, Wei Wang, and Zhixun Su. Ultra-high-definition image restoration: New benchmarks and a dual interaction prior-driven solution. IEEE Transactions on Circuits and Systems for Video Technology, 2025

  38. [38]

    Image quality assessment: from error visibility to structural similarity

    Zhou Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004

  39. [39]

    doi: 10.1109/TIP.2003.819861

  40. [40]

    Ua-detrac: A new benchmark and protocol for multi-object detection and tracking

    Longyin Wen, Dawei Du, Zhaowei Cai, Zhen Lei, Ming-Ching Chang, Honggang Qi, Jongwoo Lim, Ming-Hsuan Yang, and Siwei Lyu. Ua-detrac: A new benchmark and protocol for multi-object detection and tracking. Computer Vision and Image Understanding, 193:102907, 2020

  41. [41]

    Video dehazing via a dual-stage temporal fusion net

    Junwei Xi, Zhihua Chen, Lei Dai, and Lei Liang. Video dehazing via a dual-stage temporal fusion net. The Visual Computer, 41(11):8569–8578, 2025

  42. [42]

    Segformer: Simple and efficient design for semantic segmentation with transformers

    Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M Alvarez, and Ping Luo. Segformer: Simple and efficient design for semantic segmentation with transformers. Advances in neural information processing systems, 34:12077–12090, 2021

  43. [43]

    Synfog: A photo-realistic synthetic fog dataset based on end-to-end imaging simulation for advancing real-world defogging in autonomous driving

    Yiming Xie, Henglu Wei, Zhenyi Liu, Xiaoyu Wang, and Xiangyang Ji. Synfog: A photo-realistic synthetic fog dataset based on end-to-end imaging simulation for advancing real-world defogging in autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21763–21772, 2024

  44. [44]

    Video dehazing via a multi-range temporal alignment network with physical prior

    Jiaqi Xu, Xiaowei Hu, Lei Zhu, Qi Dou, Jifeng Dai, Yu Qiao, and Pheng-Ann Heng. Video dehazing via a multi-range temporal alignment network with physical prior, 2023. URL https://arxiv.org/abs/2303.09757

  45. [45]

    Adaint: Learning adaptive intervals for 3d lookup tables on real-time image enhancement

    Canqian Yang, Meiguang Jin, Xu Jia, Yi Xu, and Ying Chen. Adaint: Learning adaptive intervals for 3d lookup tables on real-time image enhancement. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17501–17510, 2022

  46. [46]

    Seplut: Separable image-adaptive lookup tables for real-time image enhancement

    Canqian Yang, Meiguang Jin, Yi Xu, Rui Zhang, Ying Chen, and Huaida Liu. Seplut: Separable image-adaptive lookup tables for real-time image enhancement. In European Conference on Computer Vision, pages 201–217. Springer, 2022

  47. [47]

    Depth anything v2

    Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. Depth anything v2. Advances in Neural Information Processing Systems, 37:21875–21911, 2024

  48. [48]

    Video adverse-weather-component suppression network via weather messenger and adversarial backpropagation

    Yijun Yang, Angelica I Aviles-Rivero, Huazhu Fu, Ye Liu, Weiming Wang, and Lei Zhu. Video adverse-weather-component suppression network via weather messenger and adversarial backpropagation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 13200–13210, 2023

  49. [49]

    Learning image-adaptive 3d lookup tables for high performance photo enhancement in real-time

    Huiyu Zeng, Jianrui Cai, Lida Li, Zisheng Cao, and Lei Zhang. Learning image-adaptive 3d lookup tables for high performance photo enhancement in real-time. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44:2058–2073, 2020

  50. [50]

    The unreasonable effectiveness of deep features as a perceptual metric

    Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018. doi: 10.1109/CVPR.2018.00068

  51. [51]

    Learning to restore hazy video: A new real-world dataset and a new method

    Xinyi Zhang, Hang Dong, Jinshan Pan, Chao Zhu, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, and Fei Wang. Learning to restore hazy video: A new real-world dataset and a new method. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9235–9244, 2021. doi: 10.1109/CVPR46437.2021.00912

  52. [52]

    Adaptive spatiotemporal partitioning for efficient video dehazing

    Wang Zhen, Liu Yanli, Xing Guanyu, and Wei Housheng. Adaptive spatiotemporal partitioning for efficient video dehazing. Vis. Comput., 41(14):12055–12070, August 2025. ISSN 0178-2789. doi: 10.1007/s00371-025-04144-9. URL https://doi.org/10.1007/s00371-025-04144-9

  53. [53]

    Ultra-high-definition image dehazing via multi-guided bilateral learning

    Zhuoran Zheng, Wenqi Ren, Xiaochun Cao, Xiaobin Hu, Tao Wang, Fenglong Song, and Xiuyi Jia. Ultra-high-definition image dehazing via multi-guided bilateral learning. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16180–16189, 2021. doi: 10.1109/CVPR46437.2021.01592