CFSR: Geometry-Conditioned Shadow Removal via Physical Disentanglement
Pith reviewed 2026-05-10 04:52 UTC · model grok-4.3
The pith
CFSR removes shadows by conditioning restoration on 3D geometry and semantic priors to enforce physical lighting rules.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CFSR maps inputs into an HVI color space, fuses them with estimated depth priors, modulates attention affinity directly with DINO features and 3D surface normals, and injects CLIP priors for severely degraded regions. A Frequency Collaborative Reconstruction Module then separates high-frequency boundary recovery from low-frequency illumination restoration, with every stage conditioned on geometric cues.
What carries the argument
A Geometric & Semantic Dual Explicit Guided Attention mechanism that modulates the attention affinity matrix with DINO features and 3D surface normals, embedding physical lighting constraints directly in the network.
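The paper does not spell out the modulation in the reviewed text, but the rebuttal states the form A' = softmax((QK^T + g(DINO, N))/√d_k). A minimal sketch under that reading, with illustrative shapes and a hypothetical linear projection g, might look like:

```python
# Sketch of geometry/semantics-biased attention, assuming (per the rebuttal's
# stated form) that projected DINO and surface-normal features are added to the
# attention logits before the softmax. Shapes and the projection g are
# illustrative assumptions, not the paper's exact implementation.
import numpy as np

def guided_attention(Q, K, V, dino_feats, normals, Wg):
    """Attention whose affinity is biased by a learned projection of
    geometric/semantic cues: A' = softmax((QK^T + g(DINO, N)) / sqrt(d_k))."""
    d_k = Q.shape[-1]
    # g(DINO, N): project the concatenated priors, then form a token-token bias.
    priors = np.concatenate([dino_feats, normals], axis=-1)   # (n, d_p)
    P = priors @ Wg                                           # (n, d_b)
    bias = P @ P.T                                            # (n, n) affinity bias
    logits = (Q @ K.T + bias) / np.sqrt(d_k)
    logits -= logits.max(axis=-1, keepdims=True)              # numerical stability
    A = np.exp(logits)
    A /= A.sum(axis=-1, keepdims=True)
    return A @ V

# Toy usage: 4 tokens, d_k = 8, DINO features of width 3, normals of width 3.
rng = np.random.default_rng(0)
n, d = 4, 8
out = guided_attention(rng.normal(size=(n, d)), rng.normal(size=(n, d)),
                       rng.normal(size=(n, d)), rng.normal(size=(n, 3)),
                       rng.normal(size=(n, 3)), rng.normal(size=(6, 3)))
assert out.shape == (n, d)
```

The key property is that the geometric bias enters before the softmax, so it reshapes which tokens attend to which rather than rescaling the output.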
If this is right
- Shadow removal can separate high-frequency occlusion edges from low-frequency illumination without separate post-processing stages.
- Depth and normal priors can be injected early to reduce the need for heavy post-correction of lighting errors.
- Frozen foundation-model encoders supply stable semantic context for regions where local evidence is destroyed by shadows.
- Decoupled frequency reconstruction produces sharper boundaries while preserving smooth shading across the image.
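The decoupled-frequency claim can be made concrete with a simple low-pass/high-pass split. The Gaussian decomposition below is an illustrative stand-in, not FCRM's (reportedly wavelet-based) decomposition:

```python
# Minimal sketch of frequency decoupling: split an image into a low-frequency
# illumination component (Gaussian low-pass) and a high-frequency residual
# carrying edges and boundaries. A stand-in for FCRM's wavelet decomposition.
import numpy as np

def gaussian_kernel1d(sigma, radius):
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def split_frequencies(img, sigma=3.0):
    """Return (low, high) with img == low + high exactly."""
    radius = int(3 * sigma)
    k = gaussian_kernel1d(sigma, radius)
    # Separable blur: filter rows, then columns (edge padding).
    pad = np.pad(img, radius, mode="edge")
    low = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, pad)
    low = np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, low)
    return low, img - low

# Smooth illumination ramp plus a sharp "shadow" edge.
img = np.add.outer(np.linspace(0, 1, 32), np.linspace(0, 1, 32))
img[:, 16:] += 0.5
low, high = split_frequencies(img)
assert np.allclose(low + high, img)   # exact additive reconstruction
assert np.abs(high).max() > 0.1       # the sharp edge lives in the high band
```

Restoring the two bands separately, as the bullets above suggest, only works because the decomposition is exactly additive: any edit to either band recombines into a valid image.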
Where Pith is reading between the lines
- The same geometry-conditioned attention pattern could be tested on related restoration problems such as reflection removal or low-light enhancement.
- If depth estimation improves, the method's performance gap to oracle 3D inputs would indicate how much the current results depend on accurate geometry.
- Extending the frequency module to video frames might reveal whether the per-frame physical constraints remain stable across time.
Load-bearing premise
That fusing HVI-mapped observations with depth priors and modulating attention via DINO features plus 3D normals will reliably enforce physical lighting constraints and close the 2D-3D gap.
What would settle it
A test set of images with physically inconsistent shadows (for example, multiple light directions that violate the single-light-source assumption). If CFSR's output still shows mismatched illumination or new artifacts at shadow boundaries on such a set, the claim that geometry conditioning enforces physical lighting constraints would fail.
Original abstract
Traditional shadow removal networks often treat image restoration as an unconstrained mapping, lacking the physical interpretability required to balance localized texture recovery with global illumination consistency. To address this, we propose CFSR, a multi-modal prior-driven framework that reframes shadow removal as a physics-constrained restoration process. By seamlessly integrating 3D geometric cues with large-scale foundation model semantics, CFSR effectively bridges the 2D-3D domain gap. Specifically, we first map observations into a custom HVI color space to suppress shadow-induced noise and robustly fuse RGB data with estimated depth priors. At its core, our Geometric & Semantic Dual Explicit Guided Attention mechanism utilizes DINO features and 3D surface normals to directly modulate the attention affinity matrix, structurally enforcing physical lighting constraints. To recover severely degraded regions, we inject holistic priors via a frozen CLIP encoder. Finally, our Frequency Collaborative Reconstruction Module (FCRM) achieves an optimal synthesis by decoupling the decoding process. Conditioned on geometric priors, FCRM seamlessly harmonizes the reconstruction of sharp high-frequency occlusion boundaries with the restoration of low-frequency global illumination. Extensive experiments demonstrate that CFSR achieves state-of-the-art performance across multiple challenging benchmarks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes CFSR, a multi-modal shadow removal network that maps inputs to a custom HVI color space, fuses RGB with depth priors, employs Geometric & Semantic Dual Explicit Guided Attention (modulating attention via DINO features and 3D normals), injects frozen CLIP priors for degraded regions, and uses a Frequency Collaborative Reconstruction Module (FCRM) to separately handle high-frequency boundaries and low-frequency illumination. It claims this reframes shadow removal as physics-constrained restoration that bridges the 2D-3D gap and achieves state-of-the-art results on multiple benchmarks.
Significance. If the guided-attention modulation and FCRM can be shown to impose verifiable physical consistency (e.g., albedo invariance or illumination smoothness) beyond standard learned priors, the approach would offer a principled way to incorporate geometric cues into restoration tasks and improve generalization on challenging shadow cases.
major comments (3)
- [Abstract, §3.2] Abstract and §3.2 (Geometric & Semantic Dual Explicit Guided Attention): the claim that DINO features plus 3D normals 'directly modulate the attention affinity matrix, structurally enforcing physical lighting constraints' is not supported by any explicit constraint equation, regularization term, or diagnostic metric (such as shadow-edge gradient consistency or albedo variance across lit/shadow regions). Without these, the mechanism reduces to an empirical attention prior whose physical interpretation remains unverified.
- [Abstract, Experiments] Abstract and Experiments section: the assertion of 'state-of-the-art performance across multiple challenging benchmarks' is presented without any quantitative tables, baseline comparisons, error bars, or ablation results in the provided text. This prevents assessment of whether the reported gains are statistically meaningful or attributable to the proposed physical-disentanglement components rather than architecture scale.
- [§3.3] §3.3 (FCRM): the description of decoupling high-frequency occlusion boundaries from low-frequency global illumination via frequency collaboration is conceptually appealing, but no loss formulation or frequency-domain analysis is supplied to demonstrate that the module enforces illumination consistency that standard decoders lack.
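One of the diagnostics the first major comment asks for, albedo consistency across lit/shadow regions, can be sketched with chromaticity as a crude illumination-invariant albedo proxy. The metric below is hypothetical and only illustrates what such a check could measure:

```python
# Sketch of a diagnostic the review requests: albedo consistency across former
# lit/shadow regions. Uses per-pixel chromaticity (channel ratios) as a crude
# illumination-invariant albedo proxy; the metric name and form are hypothetical.
import numpy as np

def albedo_gap(restored, shadow_mask, eps=1e-6):
    """Mean chromaticity difference between restored shadow and lit regions.
    restored: (H, W, 3) float image; shadow_mask: (H, W) bool. Lower is better."""
    chroma = restored / (restored.sum(axis=-1, keepdims=True) + eps)
    in_shadow = chroma[shadow_mask].mean(axis=0)
    in_lit = chroma[~shadow_mask].mean(axis=0)
    return float(np.abs(in_shadow - in_lit).sum())

# A perfectly restored flat-albedo surface should score ~0, regardless of
# where the shadow used to be.
img = np.full((8, 8, 3), 0.5)
mask = np.zeros((8, 8), dtype=bool)
mask[:, :4] = True
assert albedo_gap(img, mask) < 1e-4
```

Reporting a number like this alongside PSNR/SSIM is what would move the "physical lighting constraints" claim from motivation to measurement.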
minor comments (2)
- [§3.1] Notation for the HVI color space and the exact form of the attention-affinity modulation should be defined with equations rather than prose descriptions.
- [§4] The manuscript would benefit from a clear statement of the overall training objective (including any auxiliary losses for depth or normal estimation) to allow reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We have addressed each major comment below with clarifications and commitments to revisions that strengthen the physical grounding and empirical support of the method without overstating current claims.
Point-by-point responses
Referee: [Abstract, §3.2] Abstract and §3.2 (Geometric & Semantic Dual Explicit Guided Attention): the claim that DINO features plus 3D normals 'directly modulate the attention affinity matrix, structurally enforcing physical lighting constraints' is not supported by any explicit constraint equation, regularization term, or diagnostic metric (such as shadow-edge gradient consistency or albedo variance across lit/shadow regions). Without these, the mechanism reduces to an empirical attention prior whose physical interpretation remains unverified.
Authors: We agree that the current wording overstates the enforcement as 'structural' without supporting formalism. The modulation occurs by adding projected DINO and normal features to the attention logits before the softmax, which empirically biases attention toward geometrically consistent regions, but this remains a learned prior. In revision we will replace the claim with precise equations for the guided attention (A' = softmax((QK^T + g(DINO, N))/√d_k)), add a dedicated diagnostic subsection with metrics (albedo variance across lit/shadow pairs and shadow-edge gradient consistency), and report these results on the benchmarks. revision: yes
Referee: [Abstract, Experiments] Abstract and Experiments section: the assertion of 'state-of-the-art performance across multiple challenging benchmarks' is presented without any quantitative tables, baseline comparisons, error bars, or ablation results in the provided text. This prevents assessment of whether the reported gains are statistically meaningful or attributable to the proposed physical-disentanglement components rather than architecture scale.
Authors: The full manuscript contains an Experiments section with tables reporting PSNR/SSIM on ISTD, SRD, and ISTD+ benchmarks against recent baselines, plus ablations isolating each component and standard-deviation error bars. We acknowledge that the abstract does not reference these numbers and that the review copy may have obscured the tables. We will revise the abstract to cite the key quantitative margins and ensure all tables explicitly include error bars, statistical significance, and component-wise ablations demonstrating that gains derive from the geometric and frequency modules rather than scale alone. revision: partial
Referee: [§3.3] §3.3 (FCRM): the description of decoupling high-frequency occlusion boundaries from low-frequency global illumination via frequency collaboration is conceptually appealing, but no loss formulation or frequency-domain analysis is supplied to demonstrate that the module enforces illumination consistency that standard decoders lack.
Authors: We accept that the current description lacks the requested formalism. The FCRM applies separate high-frequency (edge-preserving) and low-frequency (illumination) branches with geometry-conditioned fusion; the loss includes an L1 term on high-frequency residuals and a total-variation smoothness term on the low-frequency illumination map. In revision we will insert the explicit loss equations, describe the frequency decomposition (wavelet-based), and add FFT spectrum plots plus quantitative illumination-consistency metrics comparing FCRM against a standard decoder baseline. revision: yes
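The loss the authors commit to (L1 on high-frequency residuals plus total-variation smoothness on the low-frequency illumination map) can be sketched directly; the weighting and the decomposition feeding it are assumptions here:

```python
# Sketch of the loss the rebuttal describes: an L1 term on the high-frequency
# residual plus a total-variation smoothness term on the low-frequency
# illumination map. The weight lam_tv and the inputs are assumptions.
import numpy as np

def l1_loss(pred_high, gt_high):
    """Mean absolute error on the high-frequency (edge) band."""
    return float(np.abs(pred_high - gt_high).mean())

def tv_loss(illum):
    """Anisotropic total variation: penalizes gradients in the illumination map."""
    dh = np.abs(np.diff(illum, axis=0)).mean()
    dw = np.abs(np.diff(illum, axis=1)).mean()
    return float(dh + dw)

def fcrm_loss(pred_high, gt_high, pred_illum, lam_tv=0.1):
    return l1_loss(pred_high, gt_high) + lam_tv * tv_loss(pred_illum)

# Sanity checks: a constant illumination map incurs zero TV penalty, and a
# perfect high-frequency prediction incurs zero L1 penalty.
flat = np.ones((16, 16))
assert tv_loss(flat) == 0.0
assert fcrm_loss(np.zeros((16, 16)), np.zeros((16, 16)), flat) == 0.0
```

The TV term is what encodes "illumination varies smoothly": it is exactly the kind of explicit constraint equation the referee says the manuscript currently lacks.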
Circularity Check
No circularity: empirical architecture with external benchmarks
Rationale
The paper proposes a neural architecture (HVI mapping, depth fusion, DINO+normals guided attention, FCRM) for shadow removal and reports SOTA on benchmarks. No derivation chain, equations, fitted parameters relabeled as predictions, or self-citations appear in the provided text. All performance claims rest on external evaluation rather than reducing to the method's own inputs by construction. The physical-constraint framing is a design motivation, not a self-referential proof.