pith. machine review for the scientific record.

arxiv: 2604.16177 · v2 · submitted 2026-04-17 · 💻 cs.CV

Recognition: unknown

Winner of CVPR2026 NTIRE Challenge on Image Shadow Removal: Semantic and Geometric Guidance for Shadow Removal via Cascaded Refinement

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 08:40 UTC · model grok-4.3

classification 💻 cs.CV
keywords: wsrd, challenge, ntire, removal, shadow, across, cvpr2026, earlier

The pith

A three-stage progressive refinement model guided by DINOv2 semantics and geometric depth/normals cues won the NTIRE 2026 image shadow removal challenge with top scores of 26.68 PSNR and 0.874 SSIM.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The method starts with a base image restoration network called OmniSR and turns shadow removal into a three-step process. Each later stage takes the output of the previous one and fixes remaining mistakes such as leftover shadow edges or color shifts. To help the network understand the scene, it adds two kinds of extra information that stay frozen during training: semantic features from the DINOv2 vision model that recognize objects and materials, and geometric signals from off-the-shelf monocular depth and surface normal estimators that describe 3D shape and orientation. A special loss term forces each stage to produce a smaller reconstruction error than the one before it, which keeps training stable. The team trains in phases, first on earlier shadow datasets, then on the new challenge data, and finally combines several checkpoints with cosine annealing to form an ensemble. On the official hidden test set this ensemble reached the highest ranking, and the same model also performed well on two other public shadow datasets.
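The data flow described above can be sketched as follows. This is a toy structural sketch, not the authors' architecture: the guidance stand-in here is just channel statistics, whereas the paper uses frozen DINOv2 features plus monocular depth and surface normals, and each real stage is an OmniSR-style restoration network.

```python
import numpy as np

def extract_guidance(rgb):
    # Stand-in for the frozen guidance encoders. In the paper this role is
    # played by DINOv2 semantics plus depth/normal estimates; here it is
    # per-pixel channel statistics so the sketch stays self-contained.
    # "Frozen" means it is computed once and never updated during training.
    return np.stack([rgb.mean(axis=-1), rgb.std(axis=-1)], axis=-1)

def refine_stage(pred, guidance, alpha):
    # Toy refiner: nudge the prediction toward a guidance-derived brightness
    # target. A real stage is a full restoration network.
    correction = guidance[..., :1] - pred.mean(axis=-1, keepdims=True)
    return pred + alpha * correction

def cascade(shadowed, alphas=(0.5, 0.5, 0.5)):
    # Three-stage progressive refinement: each stage consumes the previous
    # stage's output, with the same frozen guidance reused throughout.
    guidance = extract_guidance(shadowed)
    pred = shadowed
    outputs = []
    for alpha in alphas:
        pred = refine_stage(pred, guidance, alpha)
        outputs.append(pred)
    return outputs
```

The point is the structure, not the arithmetic: one frozen guidance computation, three chained refiners, and all intermediate outputs kept so a stage-wise loss can be applied.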

Core claim

On the official WSRD+ 2026 hidden test set, our final ensemble achieved 26.680 PSNR, 0.8740 SSIM, 0.0578 LPIPS, and 26.135 FID, ranked first overall, and won the NTIRE 2026 Image Shadow Removal Challenge.
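For scale: PSNR is a log measure of mean squared error against the shadow-free ground truth, so 26.68 dB corresponds to an average per-pixel MSE of roughly 10^(-2.668) ≈ 0.002 on a [0, 1] intensity range. A minimal reference implementation (SSIM, LPIPS, and FID involve structural and perceptual models and are omitted here):

```python
import math

def psnr(pred, target, max_val=1.0):
    # Peak signal-to-noise ratio in dB: higher means the restored image is
    # closer to the ground truth. max_val is the dynamic range (1.0 for
    # images normalized to [0, 1]).
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return 10.0 * math.log10(max_val ** 2 / mse)

# Every pixel off by 0.1 -> MSE 0.01 -> 20 dB.
print(round(psnr([0.2, 0.3, 0.4, 0.5], [0.1, 0.2, 0.3, 0.4]), 6))  # → 20.0
```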

Load-bearing premise

That the frozen DINOv2 embeddings and monocular depth/normal estimates supply sufficiently accurate, task-relevant guidance, and that the contraction-constrained objective reliably stabilizes the cascade without capping final performance or under-fitting complex shadows.

Figures

Figures reproduced from arXiv: 2604.16177 by Filip Svoboda, Jasmin Lampert, Jules Salzinger, Lorenzo Beltrame, Marco Körner, Phillipp Fanta-Jende, Radu Timofte.

Figure 1. Qualitative comparison on validation samples from …
Figure 2. Proposed three-stage OmniSR pipeline. The shadowed RGB input …
Figure 3. Representative qualitative results on the WSRD+ 2026 …
Figure 4. Stage-wise visualizations on the WSRD+ validation set.
Original abstract

We present a three-stage progressive shadow-removal pipeline for the CVPR2026 NTIRE WSRD+ challenge. Built on OmniSR, our method treats deshadowing as iterative direct refinement, where later stages correct residual artefacts left by earlier predictions. The model combines RGB appearance with frozen DINOv2 semantic guidance and geometric cues from monocular depth and surface normals, reused across all stages. To stabilise multi-stage optimisation, we introduce a contraction-constrained objective that encourages non-increasing reconstruction error across the cascade. A staged training pipeline transfers from earlier WSRD pretraining to WSRD+ supervision and final WSRD+ 2026 adaptation with cosine-annealed checkpoint ensembling. On the official WSRD+ 2026 hidden test set, our final ensemble achieved 26.680 PSNR, 0.8740 SSIM, 0.0578 LPIPS, and 26.135 FID, ranked first overall, and won the NTIRE 2026 Image Shadow Removal Challenge. The strong performance of the proposed model is further validated on the ISTD+ and UAV-SC+ datasets.
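The abstract's "cosine-annealed checkpoint ensembling" can be read as: train with a cosine learning-rate schedule, save checkpoints near the low-LR end, and combine them. The sketch below assumes SWA-style uniform parameter averaging; whether the authors average weights or predictions is not stated in the abstract.

```python
import math

def cosine_lr(step, total_steps, lr_max=1e-3, lr_min=1e-6):
    # Cosine annealing from lr_max down to lr_min; checkpoints saved near
    # the end of the schedule sit at low learning rates, where weights
    # change slowly and averaging is well behaved.
    cos = math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cos)

def average_checkpoints(checkpoints):
    # Uniform average of parameter dicts (SWA-style). The paper's exact
    # combination rule is an assumption here.
    keys = checkpoints[0].keys()
    n = len(checkpoints)
    return {k: sum(ckpt[k] for ckpt in checkpoints) / n for k in keys}
```

The lr_max/lr_min values above are placeholders, not the paper's hyperparameters.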

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The claim depends on the transferability of pre-trained vision models and standard deep-learning training assumptions rather than new theoretical derivations.

free parameters (1)
  • contraction constraint strength
    A scalar or weighting term in the multi-stage loss that enforces non-increasing error; its exact value is not stated in the abstract.
axioms (2)
  • domain assumption: Frozen DINOv2 features supply useful semantic context for shadow removal
    The method reuses DINOv2 without adaptation, presupposing that its general-purpose self-supervised representations remain informative for deshadowing.
  • domain assumption: Monocular depth and normal estimates are sufficiently accurate to guide shadow removal
    Off-the-shelf estimators are treated as reliable geometric oracles without reported error analysis or fine-tuning.
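A hedged sketch of how the contraction constraint could be written: a hinge penalty on any stage whose error exceeds its predecessor's. The hinge form, `margin`, and the `weight` scalar (the free parameter flagged in the ledger) are assumptions; the abstract only says the objective "encourages non-increasing reconstruction error across the cascade".

```python
def contraction_penalty(stage_errors, margin=0.0, weight=1.0):
    # stage_errors[k] is stage k's reconstruction error (e.g. an L1 loss).
    # Any stage whose error rises above the previous stage's pays a hinge
    # penalty; `weight` is the unstated strength of the constraint.
    penalty = 0.0
    for prev_err, curr_err in zip(stage_errors, stage_errors[1:]):
        penalty += max(0.0, curr_err - prev_err + margin)
    return weight * penalty

# Monotonically decreasing errors incur no penalty; an increase is penalized.
print(contraction_penalty([3.0, 2.0, 1.0]))  # → 0.0
print(contraction_penalty([1.0, 2.0, 1.5]))  # → 1.0
```

In training, this term would be added to the per-stage reconstruction losses, discouraging any later stage from undoing its predecessor's work.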

pith-pipeline@v0.9.0 · 5539 in / 1330 out tokens · 103105 ms · 2026-05-10T08:40:22.101083+00:00 · methodology


Reference graph

Works this paper leans on

43 extracted references · 3 canonical work pages · 2 internal anchors

  1. [1] Seyed Kazem Alavipanah, Mohammad Karimi Firozjaei, Amir Sedighi, Solmaz Fathololoumi, Saeid Zare Naghadehi, Samiraalsadat Saleh, Maryam Naghdizadegan, Zinat Gomeh, Jamal Jokar Arsanjani, Mohsen Makki, et al. The shadow effect on surface biophysical variables derived from remote sensing: a review. Land, 11(11):2025, 2022.
  2. [2] Muhammad Umair Arif, Muhammad Umar Farooq, Rana Hammad Raza, Zain Ul Abideen Lodhi, and Muhammad Abdur Rehman Hashmi. A comprehensive review of vehicle detection techniques under varying moving cast shadow conditions using computer vision and deep learning. IEEE Access, 10:104863–104886, 2022.
  3. [3] Jun Chu, Kaichen Chi, and Qi Wang. Rmmamba: Randomized mamba for remote sensing shadow removal. IEEE Transactions on Geoscience and Remote Sensing, 2025.
  4. [4] Rita Cucchiara, Costantino Grana, Massimo Piccardi, and Andrea Prati. Detecting moving objects, ghosts, and shadows in video streams. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(10):1337–1342, 2003.
  5. [5] Lan Fu, Changqing Zhou, Qing Guo, Felix Juefei-Xu, Hongkai Yu, Wei Feng, Yang Liu, and Song Wang. Auto-exposure fusion for single-image shadow removal. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10571–10580, 2021.
  6. [6] Han Gong and DP Cosker. Interactive shadow removal and ground truth for variable scene categories. In BMVC 2014 - Proceedings of the British Machine Vision Conference 2014.
  7. [7] Lanqing Guo, Siyu Huang, Ding Liu, Hao Cheng, and Bihan Wen. Shadowformer: Global context helps image shadow removal. In AAAI, 2023.
  8. [8] Lanqing Guo, Chong Wang, Wenhan Yang, Siyu Huang, Yufei Wang, Hanspeter Pfister, and Bihan Wen. Shadowdiffusion: When degradation prior meets diffusion model for shadow removal. In CVPR, pages 14049–14058, 2023.
  9. [9] Xiaowei Hu, Yitong Jiang, Chi-Wing Fu, and Pheng-Ann Heng. Mask-ShadowGAN: Learning to remove shadows from unpaired data. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2472–2481.
  10. [10] Naoto Inoue and Toshihiko Yamasaki. Learning from synthetic shadows for shadow detection and removal. IEEE Transactions on Circuits and Systems for Video Technology, 31(11):4187–4197, 2020.
  11. [11] Sadeep Jayasumana, Srikumar Ramalingam, Andreas Veit, Daniel Glasner, Ayan Chakrabarti, and Sanjiv Kumar. Rethinking FID: Towards a better evaluation metric for image generation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9307–9315. IEEE.
  12. [12] Yeying Jin, Aashish Sharma, and Robby T Tan. DC-ShadowNet: Single-image hard and soft shadow removal using unsupervised domain-classifier guided network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5027–5036, 2021.
  13. [13] Yeying Jin, Wei Ye, Wenhan Yang, Yuan Yuan, and Robby T Tan. DeS3: Adaptive attention-driven self and soft shadow removal using ViT similarity. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 2634–2642.
  14. [14] Hieu Le and Dimitris Samaras. Shadow removal via shadow image decomposition. In ICCV, pages 8578–8587, 2019.
  15. [15] Hengxing Liu, Mingjia Li, and Xiaojie Guo. Regional attention for shadow removal. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 5949–5957.
  16. [16] Jiawei Liu, Qiang Wang, Huijie Fan, Wentao Li, Liangqiong Qu, and Yandong Tang. A decoupled multi-task network for shadow removal. IEEE Transactions on Multimedia, 25:9449–9463, 2023.
  17. [17] Jiawei Liu, Qiang Wang, Huijie Fan, Jiandong Tian, and Yandong Tang. A shadow imaging bilinear model and three-branch residual network for shadow removal. IEEE Transactions on Neural Networks and Learning Systems, 35(11):15857–15871, 2023.
  18. [18] Zhihao Liu, Hui Yin, Yang Mi, Mengyang Pu, and Song Wang. Shadow removal by a lightness-guided network with training on unpaired data. IEEE Transactions on Image Processing, 30:1853–1865, 2021.
  19. [19] Zhihao Liu, Hui Yin, Xinyi Wu, Zhenyao Wu, Yang Mi, and Song Wang. From shadow generation to shadow removal. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4927–4936, 2021.
  20. [20] Shuang Luo, Huifang Li, Yiqiu Li, Chenglin Shao, Huanfeng Shen, and Liangpei Zhang. An evolutionary shadow correction network and a benchmark UAV dataset for remote sensing images. IEEE Transactions on Geoscience and Remote Sensing, 61:1–14, 2023.
  21. [21] Ziwei Luo, Fredrik K Gustafsson, Zheng Zhao, Jens Sjölund, and Thomas B Schön. Refusion: Enabling large-size realistic image restoration with latent-space diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1680–1691, 2023.
  22. [22] Kangfu Mei, Luis Figueroa, Zhe Lin, Zhihong Ding, Scott Cohen, and Vishal M. Patel. Latent feature-guided diffusion models for shadow removal. In 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 4301–4310, 2024.
  23. [23] Maxime Oquab, Timothee Darcet, Theo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaa El-Nouby, Mahmoud Assran, et al. DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023.
  24. [24] Eric T. Reehorst and Philip Schniter. Regularization by denoising: Clarifications and new interpretations. IEEE Transactions on Computational Imaging, 5(1):52–67, 2019.
  25. [25] Yaniv Romano, Michael Elad, and Peyman Milanfar. The little engine that could: Regularization by denoising (RED). SIAM Journal on Imaging Sciences, 10(4):1804–1844, 2017.
  26. [26] Ernest Ryu, Jialin Liu, Sicheng Wang, Xiaohan Chen, Zhangyang Wang, and Wotao Yin. Plug-and-play methods provably converge with properly trained denoisers. In ICML, pages 5546–5557, 2019.
  27. [27] Guilherme F Silva, Grace B Carneiro, Ricardo Doth, Leonardo A Amaral, and Dario FG de Azevedo. Near real-time shadow detection and removal in aerial motion imagery application. ISPRS Journal of Photogrammetry and Remote Sensing, 140:104–121, 2018.
  28. [28] Hong Ye Tan, Subhadip Mukherjee, and Junqi Tang. From image denoisers to regularizing imaging inverse problems: An overview. arXiv preprint arXiv:2509.03475, 2025.
  29. [29] Florin-Alexandru Vasluianu, Tim Seizinger, and Radu Timofte. NTIRE 2023 image shadow removal challenge report. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 1788–1807, 2023.
  30. [30] Florin-Alexandru Vasluianu, Tim Seizinger, and Radu Timofte. WSRD: A novel benchmark for high resolution image shadow removal. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1826–1835, 2023.
  31. [31] Florin-Alexandru Vasluianu, Tim Seizinger, Zhuyun Zhou, Zongwei Wu, Cailian Chen, Radu Timofte, Wei Dong, Han Zhou, Yuqiong Tian, Jun Chen, et al. NTIRE 2024 image shadow removal challenge report. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6547–6570, 2024.
  32. [32] Florin-Alexandru Vasluianu, Tim Seizinger, Zhuyun Zhou, Zongwei Wu, Radu Timofte, et al. NTIRE 2026 single image shadow removal challenge report. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026.
  33. [33] Singanallur V. Venkatakrishnan, Charles A. Bouman, and Brendt Wohlberg. Plug-and-play priors for model based reconstruction. In 2013 IEEE Global Conference on Signal and Information Processing, pages 945–948, 2013.
  34. [34] Jifeng Wang, Xiang Li, and Jian Yang. Stacked conditional generative adversarial networks for jointly learning shadow detection and shadow removal. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1788–1797, 2018.
  35. [35] Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing (TIP), 13(4):600–612, 2004.
  36. [36] Jie Xiao, Xueyang Fu, Yurui Zhu, Dong Li, Jie Huang, Kai Zhu, and Zheng-Jun Zha. Homoformer: Homogenized transformer for image shadow removal. In CVPR, pages 25617–25626, 2024.
  37. [37] Jiamin Xu, Zelong Li, Yuxin Zheng, Chenyu Huang, Renshu Gu, Weiwei Xu, and Gang Xu. Omnisr: Shadow removal under direct and indirect lighting. In AAAI, pages 8887–8895.
  38. [38] Jiamin Xu, Yuxin Zheng, Zelong Li, Chi Wang, Renshu Gu, Weiwei Xu, and Gang Xu. Detail-preserving latent diffusion for stable shadow removal. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7592–7602, 2025.
  39. [39] Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. Depth Anything V2. arXiv:2406.09414, 2024.
  40. [40] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018.
  41. [41] Wuming Zhang, Xi Zhao, Jean-Marie Morvan, and Liming Chen. Improving shadow suppression for illumination robust face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(3):611–624, 2018.
  42. [42] Yurui Zhu, Jie Huang, Xueyang Fu, Feng Zhao, Qibin Sun, and Zheng-Jun Zha. Bijective mapping network for shadow removal. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5627–5636.
  43. [43] Yurui Zhu, Jie Xiao, Xueyang Fu, et al. Bidirectional residual transformer for shadow removal. In ECCV, 2022.