ATT-CR: Adaptive Triangular Transformer for Cloud Removal
Pith reviewed 2026-06-28 02:31 UTC · model grok-4.3
The pith
ATT-CR approximates self-attention with triangular matrices to remove clouds more efficiently from remote sensing images.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ATT-CR consists of two components: Triangular Attention (TAN), which approximates softmax self-attention with lower and upper triangular matrices to reach O(N) complexity, and the Feature Selected Gating Module (FSGM), which adaptively selects clean features so that cloudy pixels do not interfere. Together they are claimed to reconstruct ground objects better.
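The core claim does not say how triangular matrices yield O(N) cost. One standard route, assumed here for illustration and not confirmed as TAN's actual formulation, is kernelized (linear) attention: with a positive feature map φ, the lower-triangular part of φ(Q)φ(K)ᵀ applied to V reduces to prefix sums over tokens and the upper-triangular part to suffix sums, each costing O(N·d²) instead of O(N²·d). The ELU + 1 feature map, the helper names, and the rule of adding both triangles while subtracting the shared diagonal are all assumptions.

```python
import numpy as np

def phi(x):
    # Positive feature map ELU(x) + 1 (assumed choice, common in linear attention).
    return np.where(x > 0, x + 1.0, np.exp(x))

def triangular_linear_attention(q, k, v, eps=1e-6):
    """Hypothetical O(N) attention assembled from lower- and upper-triangular parts.

    q, k: (N, d) queries and keys; v: (N, d_v) values.
    """
    fq, fk = phi(q), phi(k)                          # (N, d)
    kv = fk[:, :, None] * v[:, None, :]              # per-token outer products, (N, d, d_v)
    # Lower triangle (j <= i): prefix sums over the token axis.
    kv_low = np.cumsum(kv, axis=0)
    k_low = np.cumsum(fk, axis=0)
    # Upper triangle (j >= i): suffix sums over the token axis.
    kv_up = np.cumsum(kv[::-1], axis=0)[::-1]
    k_up = np.cumsum(fk[::-1], axis=0)[::-1]
    # The diagonal (j == i) appears in both halves, so subtract it once.
    num = np.einsum("nd,nde->ne", fq, kv_low + kv_up - kv)
    den = np.einsum("nd,nd->n", fq, k_low + k_up - fk)
    return num / (den[:, None] + eps)

def dense_reference(q, k, v, eps=1e-6):
    # Explicit O(N^2) version: split the score matrix into triangles and recombine.
    s = phi(q) @ phi(k).T                            # (N, N) non-negative scores
    full = np.tril(s) + np.triu(s) - np.diag(np.diag(s))
    return (full @ v) / (full.sum(axis=1, keepdims=True) + eps)
```

Because the two triangles together rebuild the whole score matrix, this particular sketch reproduces ordinary linear attention exactly; any TAN-specific treatment of the two triangles (for example, separate weighting) is not described in the text above and is not modelled here.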
What carries the argument
Triangular Attention (TAN) combined with the Feature Selected Gating Module (FSGM): TAN approximates full attention with triangular matrices to reduce cost, and FSGM filters out invalid cloudy information.
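The text describes FSGM only as filtering out invalid cloudy information. A common way to express that kind of adaptive selection is an element-wise sigmoid gate, as in gated convolutions. The sketch below assumes that design: the gate is predicted from the block input and multiplies the attention output. `gated_selection`, `w_gate`, and `b_gate` are hypothetical names, not the paper's.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_selection(x, attn_out, w_gate, b_gate):
    """Hypothetical FSGM-style gate (assumed design, not the paper's definition).

    x:        (N, C) input features used to predict the gate
    attn_out: (N, C) output of the attention branch
    w_gate:   (C, C) gate projection (hypothetical learned weights)
    b_gate:   (C,)   gate bias (hypothetical)
    Each channel of each token is scaled by a gate in (0, 1); values near 0
    suppress features the gate treats as invalid (e.g. cloud-dominated).
    """
    gate = sigmoid(x @ w_gate + b_gate)
    return gate * attn_out, gate
```

A gate near 0 suppresses a feature and a gate near 1 passes it through; in a trained model the gate weights would have to learn which features are cloud-dominated.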
Load-bearing premise
Approximating full self-attention with lower and upper triangular matrices still models the necessary long-range dependencies to accurately reconstruct ground objects hidden by clouds.
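One cheap probe of this premise is a perturbation test: change the value vector of the last token and check whether the output at the first token moves. A purely lower-triangular (causal) mix fails it, because token 0 attends only to itself; combining both triangles restores the dependency. `triangular_mix` and its `mode` argument are hypothetical and operate on an arbitrary non-negative score matrix, not on ATT-CR's learned scores.

```python
import numpy as np

def triangular_mix(scores, v, mode):
    """Row-normalize a non-negative (N, N) score matrix restricted to one or both triangles."""
    if mode == "lower":
        m = np.tril(scores)
    elif mode == "upper":
        m = np.triu(scores)
    elif mode == "both":
        m = np.tril(scores) + np.triu(scores, k=1)  # full matrix, diagonal counted once
    else:
        raise ValueError(f"unknown mode: {mode}")
    return (m @ v) / m.sum(axis=1, keepdims=True)

def first_token_sensitivity(scores, v, mode, delta=1.0):
    # How much does output row 0 change when the last token's value is perturbed?
    v2 = v.copy()
    v2[-1] += delta
    diff = triangular_mix(scores, v2, mode)[0] - triangular_mix(scores, v, mode)[0]
    return float(np.abs(diff).max())
```

The probe tests reachability only. Whether a trained model puts enough weight on distant clean pixels to reconstruct occluded ground objects is the empirical question the referee raises.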
What would settle it
A test case in which triangular attention produces visibly worse reconstructions than full attention on images with intricate cloud patterns would falsify the claim that the approximation is effective.
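Before comparing images, the approximation can be checked numerically: run exact softmax attention and the approximation on the same inputs and report the relative Frobenius-norm error of their outputs. The sketch uses ELU + 1 linear attention as a stand-in for TAN (an assumption; TAN's formula is not given here), and `relative_error` is a hypothetical helper.

```python
import numpy as np

def softmax_attention(q, k, v):
    # Exact attention: O(N^2) score matrix with a numerically stable row softmax.
    s = q @ k.T / np.sqrt(q.shape[1])
    s = s - s.max(axis=1, keepdims=True)
    w = np.exp(s)
    return (w / w.sum(axis=1, keepdims=True)) @ v

def linear_attention(q, k, v):
    # Stand-in approximation (assumed): ELU + 1 feature map, linear in the token count.
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    fq, fk = phi(q), phi(k)
    return (fq @ (fk.T @ v)) / (fq @ fk.sum(axis=0))[:, None]

def relative_error(q, k, v, approx=linear_attention):
    # ||approx - exact||_F / ||exact||_F on identical inputs.
    exact = softmax_attention(q, k, v)
    return float(np.linalg.norm(approx(q, k, v) - exact) / np.linalg.norm(exact))
```

On random inputs this number says little about cloudy scenes; to be informative, `q`, `k`, and `v` should come from a trained layer applied to images with intricate cloud patterns.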
Original abstract
Cloud removal aims to accurately reconstruct the ground objects obscured by clouds in remote sensing images. Existing Transformer-based methods utilizing self-attention have shown impressive results by effectively modeling long-range dependencies in cloudy images. However, they suffer from the following issues: 1) the high computational complexity of self-attention limits scalability; 2) treating both cloudy and clean pixels as valid within the attention computation brings disturbances in subsequent layers, leading to suboptimal performance. To address these challenges, we propose the Adaptive Triangular Transformer for Cloud Removal (ATT-CR), a model that effectively reduces computational costs and mitigates interference from cloudy pixels. Specifically, it consists of two core components: Triangular Attention (TAN) and Feature Selected Gating Module (FSGM). TAN employs lower and upper triangular matrices to approximate Softmax attention with O(N) computational complexity, significantly reducing the computational costs. The FSGM, on the other hand, integrates with TAN to adaptively distinguish between cloudy and clean features, which minimizes the introduction of invalid information into subsequent layers. Extensive experiments on cloud removal benchmarks demonstrate that ATT-CR delivers superior performance compared to existing methods.
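To see why quadratic cost limits scalability on remote-sensing imagery, count the multiply-accumulates of the two core matrix products. The counts are the usual leading-order terms (2·N²·d for softmax attention, 2·N·d² for linear attention, ignoring the softmax and normalization); the 256 × 256 token grid and d = 64 are illustrative assumptions, not the paper's settings.

```python
def softmax_attention_macs(n_tokens: int, dim: int) -> int:
    # Q K^T (N x N x d) plus (softmax weights) V (N x N x d).
    return 2 * n_tokens * n_tokens * dim

def linear_attention_macs(n_tokens: int, dim: int) -> int:
    # K^T V (d x N x d) plus Q (K^T V) (N x d x d).
    return 2 * n_tokens * dim * dim

n = 256 * 256   # one token per position of a 256 x 256 feature map (assumed)
d = 64          # channel dimension (assumed)
ratio = softmax_attention_macs(n, d) / linear_attention_macs(n, d)
print(f"tokens={n}, ratio={ratio:.0f}")  # prints "tokens=65536, ratio=1024"
```

The ratio is simply N/d, so the advantage of an O(N) scheme grows with image size.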
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes ATT-CR, an Adaptive Triangular Transformer for Cloud Removal in remote sensing images. It introduces Triangular Attention (TAN) that approximates standard softmax self-attention via lower and upper triangular matrices to achieve O(N) complexity, paired with a Feature Selected Gating Module (FSGM) that adaptively distinguishes cloudy from clean features to reduce interference in subsequent layers. The central claim is that these components together yield superior performance over existing Transformer-based methods on cloud removal benchmarks while addressing scalability and disturbance issues.
Significance. If the triangular approximation is shown to retain sufficient long-range dependencies for accurate ground-object reconstruction and the performance gains are quantitatively verified, the work would offer a practical efficiency improvement for attention-based restoration models in remote sensing, where processing large images under cloud cover is common.
major comments (2)
- [TAN description] The claim that lower/upper triangular matrices approximate softmax self-attention while preserving the long-range dependencies required to reconstruct obscured ground objects from distant clean pixels lacks any derivation, error-bound analysis, or attention-map evidence. Triangular masking restricts interactions to directional/partial token sets rather than dense pairwise relations; without showing that the approximation error remains small in the cloud-removal regime, the O(N) efficiency cannot be assumed to support the reconstruction performance.
- [Experimental results] The abstract asserts superior benchmark performance, yet no metrics (PSNR, SSIM, etc.), baselines, error bars, dataset details, or statistical tests are referenced. This leaves the central superiority claim without visible quantitative support; the results section must supply these to make the claim load-bearing.
minor comments (1)
- [Abstract] The abstract lists two issues with prior methods but does not explicitly state the datasets comprising the 'cloud removal benchmarks,' which would aid immediate context.
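For context on the quantities the referee requests, PSNR for images scaled to [0, MAX] is 10·log10(MAX²/MSE), and an error bar over repeated runs is commonly a mean with a sample standard deviation. The helpers below are hypothetical illustrations; SSIM is omitted because its windowed definition is longer, and scikit-image provides it as `skimage.metrics.structural_similarity`.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    # Peak signal-to-noise ratio in dB; images assumed scaled to [0, max_val].
    mse = np.mean((np.asarray(pred, float) - np.asarray(target, float)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))

def mean_and_std(scores):
    # Error bar over repeated runs (e.g. different seeds): mean and sample std (ddof=1).
    s = np.asarray(scores, float)
    return float(s.mean()), float(s.std(ddof=1))
```

For example, a uniform error of 0.1 on [0, 1] images gives MSE = 0.01 and a PSNR of 20 dB.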
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major comment point by point below, indicating the revisions we will incorporate.
- Referee: The claim that lower/upper triangular matrices approximate softmax self-attention while preserving the long-range dependencies required to reconstruct obscured ground objects from distant clean pixels lacks any derivation, error-bound analysis, or attention-map evidence. Triangular masking restricts interactions to directional/partial token sets rather than dense pairwise relations; without showing that the approximation error remains small in the cloud-removal regime, the O(N) efficiency cannot be assumed to support the reconstruction performance.
Authors: We agree that the current description of TAN would be strengthened by explicit supporting analysis. In the revised manuscript we will add a mathematical derivation of the lower/upper triangular approximation to softmax attention, an error-bound analysis tailored to the cloud-removal setting, and attention-map visualizations that compare TAN with standard self-attention to show preservation of the long-range dependencies needed for ground-object reconstruction. Revision: yes.
- Referee: The abstract asserts superior benchmark performance, yet no metrics (PSNR, SSIM, etc.), baselines, error bars, dataset details, or statistical tests are referenced. This leaves the central superiority claim without visible quantitative support; the results section must supply these to make the claim load-bearing.
Authors: The results section already contains the requested quantitative details (PSNR, SSIM, baselines, datasets). To make the abstract claim self-contained and to ensure all supporting evidence is immediately visible, we will revise the abstract to reference the key metrics and will confirm that error bars and statistical information are explicitly reported in the results tables and text. Revision: yes.
Circularity Check
No significant circularity detected
The paper proposes ATT-CR via explicit architectural choices: TAN approximates softmax attention using lower/upper triangular matrices (stated as an O(N) design decision) and FSGM adaptively gates features. No equations or claims reduce by construction to fitted inputs, self-definitions, or prior self-citations; performance is reported as empirical benchmark results rather than derived predictions. The derivation chain consists of independent design steps without load-bearing self-referential reductions.