arxiv: 2605.21079 · v2 · pith:I2EJYVKYnew · submitted 2026-05-20 · 💻 cs.CV

VDFP: Video Deflickering with Flicker-banding Priors

Pith reviewed 2026-05-22 09:45 UTC · model grok-4.3

classification 💻 cs.CV
keywords: video deflickering, flicker banding, rolling shutter, degradation field, screen capture, video restoration, temporal consistency, DeViD dataset

The pith

VDFP removes complex banding from smartphone recordings of digital screens by synthesizing realistic flicker patterns and tracking gradual luminance changes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to fix severe periodic brightness fluctuations that appear when phones capture screens, a problem that breaks existing video restoration techniques and leaves artifacts or blurs textures. It does this by first building the DeViD collection of real screen recordings and then introducing a generation model whose degradation field step recreates the multi-banding that rolling shutters produce. A continuous prior perception module, trained with a flicker-aware error, replaces binary detection so the model can follow smooth brightness transitions across space and time. If the approach holds, common screen videos could be cleaned up while retaining sharp details and frame-to-frame smoothness, something prior methods have not achieved.

Core claim

VDFP constructs the DeViD real-world dataset and introduces Degradation Field Modeling (DFM), which uses the rolling shutter mechanism to synthesize complex multi-banding scenarios. DFM is paired with spatial-temporal continuous prior perception (CPP), optimized via a Flicker-Aware Mean Squared Error (FA-MSE). Zero-initializing an augmented input layer lets the model keep its pre-trained generative priors. The paper claims that the restored videos are free of banding, retain high-fidelity spatial detail, and stay temporally consistent, and that VDFP outperforms prior restoration methods.

What carries the argument

Degradation Field Modeling (DFM) that generates multi-banding from rolling shutter mismatches, together with spatial-temporal continuous prior perception (CPP) that learns luminance transitions through Flicker-Aware Mean Squared Error instead of binary segmentation.
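
The difference between a binary mask and a continuous prior can be made concrete with a small sketch. This page does not give the FA-MSE formula, so the weighting used below (`1 + alpha * target`) is purely an illustrative assumption, as are all function names. The sketch only shows why a hard threshold discards the luminance ramp that a continuous target keeps.

```python
def mse(pred, target):
    """Plain mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def binarize(values, threshold=0.5):
    """Binary-segmentation style target: flicker (1) or no flicker (0)."""
    return [1.0 if v >= threshold else 0.0 for v in values]


def flicker_weighted_mse(pred, target, alpha=4.0):
    """HYPOTHETICAL flicker-weighted MSE, not the paper's FA-MSE definition.

    Errors are weighted by 1 + alpha * target, so a mistake where the
    target flicker strength is high costs more than the same mistake
    in a clean region.
    """
    weights = [1.0 + alpha * t for t in target]
    weighted = sum(w * (p - t) ** 2 for w, p, t in zip(weights, pred, target))
    return weighted / sum(weights)


# A smooth luminance transition over 11 rows, from no flicker (0) to full flicker (1).
ramp = [i / 10 for i in range(11)]

# Error introduced purely by thresholding the ramp into a binary mask.
binary_gap = mse(binarize(ramp), ramp)
```

Here `binary_gap` is strictly positive: even a perfect binary detector cannot represent the gradual transition, which is the motivation the page gives for a continuous target.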

If this is right

  • Screen-captured videos can be restored without residual periodic artifacts or texture over-smoothing.
  • Spatial details stay sharp while temporal consistency across frames is preserved.
  • Pre-trained generative models retain their knowledge when an extra input layer is added and zero-initialized.
  • Complex multi-banding cases become tractable through explicit rolling-shutter synthesis rather than generic noise models.
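
The zero-initialization point can be illustrated with a toy layer. The sketch below is an assumption-laden stand-in, not the paper's architecture: a plain linear map plays the role of the pre-trained input layer (a 1x1 convolution at a single pixel reduces to the same thing), and `augment_input_layer`, `pixel`, and `prior` are hypothetical names. It shows that appending zero-valued weights for a new input channel leaves the layer's output unchanged at the start of fine-tuning.

```python
import random


def linear(weights, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def augment_input_layer(weights, num_extra_inputs):
    """Append zero-initialized columns so the layer accepts extra input channels."""
    return [list(row) + [0.0] * num_extra_inputs for row in weights]


random.seed(0)
# Stand-in for a pre-trained input layer: 3 image channels in, 4 features out.
pretrained_w = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
pretrained_b = [random.uniform(-1, 1) for _ in range(4)]

pixel = [0.2, 0.5, 0.9]  # RGB values at one location
prior = [0.7]            # extra channel, e.g. a flicker confidence value (assumed)

augmented_w = augment_input_layer(pretrained_w, num_extra_inputs=len(prior))
before = linear(pretrained_w, pretrained_b, pixel)
after = linear(augmented_w, pretrained_b, pixel + prior)
```

Because the new columns start at zero, `before` and `after` agree, so the pre-trained behavior is the starting point and the extra channel only gains influence as training updates its weights.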

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The DeViD dataset could become a standard testbed for any future work on periodic artifact removal in mobile video.
  • The same degradation modeling idea might transfer to other hardware-synchronization issues such as rolling-shutter distortion in fast motion.
  • Wider adoption could improve quality in screen-sharing recordings and digital preservation of displayed content.

Load-bearing premise

The rolling shutter degradation field can generate synthetic banding patterns that closely match the structured fluctuations seen in actual smartphone screen captures.
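
To make the premise tangible, here is a minimal sketch of how a rolling shutter can turn a flickering light source into horizontal bands. It is not the paper's DFM. It assumes the screen is dimmed by an ideal square-wave PWM signal and that each sensor row integrates light over its own exposure window, offset by a fixed line time. All names and parameter values (`pwm_hz`, `duty`, `line_time_s`, the 1080-row, 30 fps setup) are hypothetical.

```python
import math


def pwm_on_time(t, pwm_hz, duty):
    """Cumulative time the PWM source has been on during [0, t]."""
    cycles = t * pwm_hz
    whole = math.floor(cycles)
    frac = cycles - whole
    return (whole * duty + min(frac, duty)) / pwm_hz


def row_gains(num_rows, line_time_s, exposure_s, pwm_hz, duty, frame_start_s=0.0):
    """Per-row brightness gain: fraction of each row's exposure window with the source on."""
    gains = []
    for r in range(num_rows):
        t0 = frame_start_s + r * line_time_s  # rolling shutter: rows start one after another
        on = pwm_on_time(t0 + exposure_s, pwm_hz, duty) - pwm_on_time(t0, pwm_hz, duty)
        gains.append(on / exposure_s)
    return gains


def apply_banding(frame, gains):
    """Scale every pixel in a row by that row's gain (frame is a list of rows)."""
    return [[pixel * g for pixel in row] for row, g in zip(frame, gains)]


ROWS = 1080
LINE_TIME_S = 1.0 / (30 * ROWS)  # assumes readout spans one full 30 fps frame interval

# Short exposure: rows alternate between fully lit and fully dark windows.
short = row_gains(ROWS, LINE_TIME_S, exposure_s=1 / 2000, pwm_hz=240.0, duty=0.5)

# Exposure equal to one PWM period: every row sees the same average brightness.
matched = row_gains(ROWS, LINE_TIME_S, exposure_s=1 / 240, pwm_hz=240.0, duty=0.5)
```

With these assumed values, eight PWM periods fit across the frame height, so `short` contains several dark bands, while `matched` is flat at the duty cycle. Changing `frame_start_s` from frame to frame shifts the bands vertically, which is one simple way banding could drift over time in such a model.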

What would settle it

Running VDFP on a fresh collection of real phone-recorded screen videos and observing that visible banding remains or that fine spatial textures are lost or smoothed would show the method does not fully solve the problem.

Figures

Figures reproduced from arXiv: 2605.21079 by Libo Zhu, Xiaokang Yang, Yulun Zhang, Zhiyi Zhou, Zihan Zhou.

Figure 1: Overview of our datasets and comparisons. The upper-left part shows our real-world dataset …
Figure 2: Visual comparison between VDFP and other methods trained on our real-world dataset …
Figure 3: Overview of the data acquisition and alignment of DeViD (left) and examples of some …
Figure 4: Overview of our proposed model (VDFP). The first stage is the training process of the …
Figure 5: Overview of our data simulation pipeline.
Figure 6: Overview of results of our simulation pipeline. The row and column respectively mean …
Figure 7: Examples of predicted confidence maps of our real-world dataset (DeViD). Colors in the …
Figure 8: Spatio-temporal kinematic tracking and phase alignment. (a-b) Y-T spatio-temporal slices, …
Figure 9: Visual comparison between VDFP and other methods on our real-world dataset DeViD.
Figure 10: Visual comparisons of ablation study. The figures show three sets of examples, and from …
Figure 11: More examples of our real-world dataset (DeViD) in various scenes, such as sports, …
Figure 12: More visual comparison between VDFP and other methods on our real-world dataset …
Figure 13: More visual comparisons of ablation study. The figures show three sets of examples, and …

Original abstract

Capturing digital screens with smartphones frequently induces severe banding due to hardware synchronization mismatches. Existing video restoration methods struggle with these structured, periodic luminance fluctuations, often resulting in residual artifacts or over-smoothed textures. We firstly construct DeViD, a real-world dataset in various scenes to deal with the lack of available datasets. Then we propose VDFP (Video Deflickering with Flicker-banding Priors), a novel perception-guided generation framework. First, we introduce a Degradation Field Modeling Based on Rolling Shutter Mechanism (DFM) capable of synthesizing complex multi-banding scenarios. Second, we present a spatial-temporal continuous prior perception (CPP). Unlike traditional binary segmentation, this module is optimized via a Flicker-Aware Mean Squared Error (FA-MSE) to capture the luminance transitions. By zero-initializing an augmented input layer, our model preserves pre-trained generative priors as well as spatial-temporal prior perception. Extensive experiments demonstrate that VDFP significantly outperforms other methods, eliminating complex banding with high-fidelity spatial details and temporal consistency. Our dataset and code will be released at https://github.com/ZhiyiZZhou/VDFP.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes VDFP for deflickering videos with flicker-banding from digital screen captures using smartphones. It first builds the DeViD dataset for real-world scenes. The method includes Degradation Field Modeling (DFM) based on rolling shutter to synthesize multi-banding, spatial-temporal continuous prior perception (CPP) optimized with Flicker-Aware MSE for luminance transitions, and zero-initialization to keep pre-trained priors. It claims extensive experiments show significant outperformance over other methods in removing banding with high fidelity and consistency.

Significance. This work tackles a common practical issue in video processing. The new dataset and the continuous prior approach could be impactful if properly validated. Credit is given for the planned release of the dataset and code, and for attempting to reuse pre-trained models efficiently.

major comments (2)
  1. [Abstract] The central claim that VDFP 'significantly outperforms other methods' is not backed by any numerical results, ablations, or statistical analysis in the text. This makes it impossible to evaluate the strength of the contribution without the full experimental details.
  2. [DFM and CPP sections] The assumption that DFM accurately synthesizes real-world multi-banding is not validated quantitatively against DeViD statistics (e.g., no comparison of periodicity or spatial characteristics). This is load-bearing as the training of CPP relies on these synthetic priors, potentially leading to domain shift in real applications.
minor comments (2)
  1. [Overall] The manuscript would benefit from clearer definitions of terms like 'Flicker-banding Priors' and how they are incorporated.
  2. [References] Ensure all related work on video deflickering and rolling shutter modeling is cited.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We have addressed each major comment below and revised the manuscript accordingly to strengthen the presentation and validation of our contributions.

Point-by-point responses
  1. Referee: [Abstract] The central claim that VDFP 'significantly outperforms other methods' is not backed by any numerical results, ablations, or statistical analysis in the text. This makes it impossible to evaluate the strength of the contribution without the full experimental details.

    Authors: We appreciate this observation. The experimental section of the manuscript (Section 4) contains quantitative comparisons, ablation studies, and statistical analyses (including PSNR, SSIM, and user-study results) demonstrating the outperformance. To directly support the abstract claim, we have revised the abstract to include a brief summary of key quantitative improvements and explicit references to the detailed results and ablations in Section 4.
    Revision: yes

  2. Referee: [DFM and CPP sections] The assumption that DFM accurately synthesizes real-world multi-banding is not validated quantitatively against DeViD statistics (e.g., no comparison of periodicity or spatial characteristics). This is load-bearing as the training of CPP relies on these synthetic priors, potentially leading to domain shift in real applications.

    Authors: We agree that quantitative validation of the DFM synthesis against DeViD is important given its role in training CPP. In the revised manuscript, we have added a new analysis subsection that compares the periodicity (via frequency spectra) and spatial characteristics (via banding pattern distributions and statistics) of DFM-synthesized degradations to real samples from the DeViD dataset. This addition directly addresses concerns about domain shift.
    Revision: yes
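
A periodicity comparison of the kind discussed in this exchange could be sketched as follows. This is not the authors' analysis; it is one simple way to extract a dominant banding frequency from a frame, assuming horizontal bands so that the row-mean luminance profile carries the periodic signal. The function names and the synthetic test frame are hypothetical, and a naive DFT is used to keep the example dependency-free.

```python
import cmath
import math


def row_profile(frame):
    """Mean luminance of each row; horizontal bands show up as oscillations here."""
    return [sum(row) / len(row) for row in frame]


def dominant_band_frequency(profile):
    """Return the strongest non-DC frequency, in cycles per frame height (0 if flat)."""
    n = len(profile)
    mean = sum(profile) / n
    centered = [v - mean for v in profile]
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * i / n) for i, c in enumerate(centered))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k


# Synthetic frame with 6 sinusoidal bands over 120 rows and 4 columns (illustrative only).
HEIGHT, BANDS = 120, 6
synthetic_frame = [
    [0.5 + 0.3 * math.sin(2 * math.pi * BANDS * r / HEIGHT)] * 4
    for r in range(HEIGHT)
]
estimated = dominant_band_frequency(row_profile(synthetic_frame))
```

Applying the same function to synthesized and real frames and comparing the distributions of `estimated` values would give one quantitative check of the kind the referee asks for. Real captures would likely need windowing and averaging over many frames before the peaks are reliable.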

Circularity Check

0 steps flagged

No significant circularity in the method derivation

The paper constructs the DeViD real-world dataset, introduces DFM to synthesize multi-banding degradations from a rolling-shutter physical model, defines CPP as a continuous prior-perception module trained with the custom FA-MSE loss, and preserves external pre-trained generative priors via zero-initialization of an input layer. These steps form an independent modeling and training pipeline whose outputs are evaluated on held-out real captures; no equation or claim reduces by construction to a fitted parameter, self-referential definition, or load-bearing self-citation chain within the paper itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the fidelity of the rolling-shutter-based synthesis and the effectiveness of the new perception module; these are introduced without independent external validation in the abstract. Pre-trained generative priors are treated as given from prior literature.

axioms (1)
  • Domain assumption: the rolling shutter mechanism in cameras produces the banding patterns observed in screen captures.
    Invoked to justify the DFM synthesis of training data.
invented entities (1)
  • Degradation Field (no independent evidence)
    Purpose: to model and synthesize complex multi-banding scenarios from rolling shutter effects.
    New construct introduced for data generation in the absence of sufficient real paired data.
