Pith · machine review for the scientific record

arxiv: 2602.12652 · v2 · submitted 2026-02-13 · 💻 cs.CV

Recognition: no theorem link

CBEN -- A Multimodal Machine Learning Dataset for Cloud Robust Remote Sensing Image Understanding

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 22:56 UTC · model grok-4.3

classification 💻 cs.CV
keywords cloud robust remote sensing · multimodal dataset · optical radar fusion · cloudy satellite imagery · BigEarthNet · machine learning remote sensing · cloud occlusion

The pith

Multimodal remote sensing models drop 23-33 percentage points of average precision on cloudy images but recover much of the lost performance when trained with cloudy optical-radar pairs

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CloudyBigEarthNet, a dataset of paired optical and radar satellite images that include cloud cover. It shows that current state-of-the-art methods trained on clear-sky data lose substantial accuracy when clouds appear in the optical images. Retraining those methods with the cloudy examples produces clear gains on cloudy test cases. This matters because many practical uses, such as disaster response, need analysis even when skies are overcast and cannot rely on cloud removal steps that introduce artifacts.

Core claim

By assembling CloudyBigEarthNet (CBEN), a dataset of paired optical and radar images containing cloud occlusions, the authors establish that state-of-the-art multimodal methods suffer 23-33 percentage point drops in average precision on cloudy images, yet adapting training to include cloudy optical data yields relative improvements of 17.2-28.7 percentage points on cloudy test cases.
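
The claim quotes absolute drops and a "relative improvement" both in percentage points; the referee's first minor comment below flags the ambiguity. As a purely illustrative sketch, with hypothetical AP values rather than figures from the paper, the two readings differ as follows:

    # Hypothetical AP values for illustration only; not numbers from the paper.
    ap_clear_trained = 0.50   # clear-sky-trained model evaluated on the cloudy test split
    ap_cloudy_trained = 0.62  # same method after cloudy-aware training, same test split

    absolute_gain_pp = (ap_cloudy_trained - ap_clear_trained) * 100         # 12.0 points
    relative_gain_pct = (ap_cloudy_trained / ap_clear_trained - 1.0) * 100  # 24.0 %

    print(f"absolute gain: {absolute_gain_pp:.1f} percentage points of AP")
    print(f"relative gain: {relative_gain_pct:.1f}% of the degraded clear-sky baseline")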

What carries the argument

The CBEN dataset of paired cloudy optical and radar images that enables training and evaluation of cloud-robust remote sensing models.

If this is right

  • State-of-the-art multimodal methods experience 23-33 percentage point drops in average precision when tested on cloudy images.
  • Including cloudy optical data in training produces 17.2-28.7 percentage point relative gains on cloudy test cases.
  • Cloud-robust methods become feasible without depending on cloud removal preprocessing.
  • The approach supports time-sensitive applications such as natural disaster monitoring that cannot wait for clear skies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same training strategy could be tested on additional weather-affected modalities or sensor combinations.
  • CBEN could serve as a benchmark to compare cloud removal techniques against direct robust modeling.
  • Operational pipelines might integrate radar data routinely rather than only when clouds are detected.

Load-bearing premise

The cloud occlusions and image pairings in CBEN represent the cloudy conditions models will meet in real operations across sensors, regions, and seasons.

What would settle it

Evaluating the adapted models on cloudy images from a different sensor or geographic region and checking whether the performance gains over clear-sky training persist or disappear.

Figures

Figures reproduced from arXiv: 2602.12652 by Koichi Kise, Marco Stricker, Masakazu Iwamura.

Figure 1: Top row shows optical (RGB channels) of a subarea of the prefecture …
Figure 2: Example location of SSL4EO-S12. The columns from left to right …
Figure 3: Visualization of several samples of BigEarthNet with corresponding …
Figure 4: Illustration of how BigEarthNet downloaded images. Boxes with …
Figure 5: Visualization of different tiles and patches of our dataset CBEN with …
Figure 6: Visual representation of the self-supervised task masked autoencoder …
Figure 7: Visualized experimental architecture. (a) Trained experiment without …
original abstract

Clouds are a common phenomenon that distorts optical satellite imagery, which poses a challenge for remote sensing. However, in the literature cloudless analysis is often performed where cloudy images are excluded from machine learning datasets and methods. Such an approach cannot be applied to time sensitive applications, e.g., during natural disasters. A possible solution is to apply cloud removal as a preprocessing step to ensure that cloudfree solutions are not failing under such conditions. But cloud removal methods are still actively researched and suffer from drawbacks, such as generated visual artifacts. Therefore, it is desirable to develop cloud robust methods that are less affected by cloudy weather. Cloud robust methods can be achieved by combining optical data with radar, a modality unaffected by clouds. While many datasets for machine learning combine optical and radar data, most researchers exclude cloudy images. We identify this exclusion from machine learning training and evaluation as a limitation that reduces applicability to cloudy scenarios. To investigate this, we assembled a dataset, named CloudyBigEarthNet (CBEN), of paired optical and radar images with cloud occlusion for training and evaluation. Using average precision (AP) as the evaluation metric, we show that state-of-the-art methods trained on combined clear-sky optical and radar imagery suffer performance drops of 23-33 percentage points when evaluated on cloudy images. We then adapt these methods to cloudy optical data during training, achieving relative improvement of 17.2-28.7 percentage points on cloudy test cases compared with the original approaches. Code and dataset are publicly available at: https://github.com/mstricker13/CBEN

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated author's rebuttal, circularity audit, and axiom ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces the CloudyBigEarthNet (CBEN) dataset of paired Sentinel-1 radar and Sentinel-2 optical images with cloud occlusions assembled from BigEarthNet. It reports that state-of-the-art multimodal methods trained on clear-sky optical+radar data suffer average-precision drops of 23-33 percentage points when evaluated on cloudy images, and that adapting the same methods by including cloudy optical data during training produces relative improvements of 17.2-28.7 percentage points on cloudy test cases.

Significance. If the empirical results hold after fuller documentation, the work supplies a publicly released benchmark dataset and reproducible code that directly quantifies the performance penalty of ignoring clouds in multimodal remote-sensing pipelines. This is practically relevant for time-critical applications such as disaster monitoring where cloud-free acquisitions cannot be guaranteed.

major comments (3)
  1. [Abstract / Dataset Construction] Abstract and dataset-construction section: the cloud-occlusion criterion, temporal pairing rule, and cloud-fraction threshold used to select CBEN patches are not specified, so it is impossible to judge whether the measured 23-33 pp drops and 17.2-28.7 pp gains are artifacts of the particular occlusion distribution chosen or representative of operational conditions (an illustrative sketch of such a selection rule follows the minor comments).
  2. [Experimental Results] Experimental results: no error bars, standard deviations across runs, or statistical significance tests accompany the reported AP figures, leaving open the possibility that the observed differences fall within run-to-run variance.
  3. [Experimental Setup] Baseline and adaptation details: the precise state-of-the-art architectures, training hyperparameters, and exact procedure for incorporating cloudy optical samples are not enumerated, preventing independent verification of the adaptation gains.
minor comments (2)
  1. [Abstract] The abstract states 'relative improvement' without clarifying whether the percentage points are computed relative to the degraded clear-trained performance or as absolute gains; a short clarifying sentence would remove ambiguity.
  2. [Related Work] Consider adding a short table or paragraph comparing CBEN cloud statistics (mean cloud fraction, seasonal coverage) to other public multimodal datasets to help readers gauge novelty.
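
To make the first major comment concrete, here is the kind of selection rule whose parameters a revision would need to document. This is a minimal sketch under assumed values: the cloud-fraction threshold, the pairing window, the field names, and the patches records are all hypothetical, not CBEN's documented construction procedure.

    from datetime import timedelta

    # Hypothetical selection rule; the paper does not state these values.
    CLOUD_FRACTION_MIN = 0.10            # assumed minimum Sentinel-2 cloud fraction for a "cloudy" patch
    MAX_PAIRING_GAP = timedelta(days=3)  # assumed Sentinel-1/2 acquisition-time window

    def select_cloudy_pairs(patches):
        """Keep optical/radar pairs whose optical view is sufficiently cloudy and
        whose acquisitions fall within the pairing window (illustrative only)."""
        selected = []
        for p in patches:
            cloudy_enough = p["s2_cloud_fraction"] >= CLOUD_FRACTION_MIN
            close_in_time = abs(p["s2_time"] - p["s1_time"]) <= MAX_PAIRING_GAP
            if cloudy_enough and close_in_time:
                selected.append(p)
        return selected

Whatever the actual rule is, stating its numerical values is what would let readers judge whether the measured drops and gains generalize beyond the chosen occlusion distribution.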

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important aspects for improving the clarity, reproducibility, and statistical rigor of our CBEN dataset paper. We address each major comment below and commit to revisions that strengthen the manuscript without altering the core empirical findings.

point-by-point responses
  1. Referee: [Abstract / Dataset Construction] Abstract and dataset-construction section: the cloud-occlusion criterion, temporal pairing rule, and cloud-fraction threshold used to select CBEN patches are not specified, so it is impossible to judge whether the measured 23-33 pp drops and 17.2-28.7 pp gains are artifacts of the particular occlusion distribution chosen or representative of operational conditions.

    Authors: We agree that explicit specification of these selection criteria is necessary for reproducibility and to demonstrate that the reported performance drops and gains are representative rather than artifacts. The manuscript describes the overall assembly process from BigEarthNet using paired Sentinel-1/Sentinel-2 acquisitions and cloud masks, but the precise numerical thresholds and pairing rules are not stated with sufficient detail in the abstract or main text. In the revision we will add a dedicated paragraph in Section 3 (Dataset Construction) and update the abstract to specify the exact cloud-occlusion criterion, temporal pairing window, and cloud-fraction threshold employed. revision: yes

  2. Referee: [Experimental Results] Experimental results: no error bars, standard deviations across runs, or statistical significance tests accompany the reported AP figures, leaving open the possibility that the observed differences fall within run-to-run variance.

    Authors: We acknowledge that the absence of error bars and statistical tests limits the strength of the claims. The current results are based on single-run evaluations using the standard AP metric on the CBEN splits. In the revised manuscript we will rerun all experiments across multiple random seeds, report mean AP values together with standard deviations, and include paired statistical significance tests (e.g., t-tests) between the clear-sky and cloudy-trained models to confirm that the 23-33 pp drops and 17.2-28.7 pp gains exceed run-to-run variance (a minimal sketch of this analysis appears after the point-by-point responses). revision: yes

  3. Referee: [Experimental Setup] Baseline and adaptation details: the precise state-of-the-art architectures, training hyperparameters, and exact procedure for incorporating cloudy optical samples are not enumerated, preventing independent verification of the adaptation gains.

    Authors: We agree that fuller enumeration is required for independent verification. The manuscript refers to “state-of-the-art multimodal methods” and notes that code is publicly released, but does not list the exact model variants, hyperparameter values, or the precise mixing strategy for cloudy samples. In the revision we will expand the Experimental Setup section to name the specific architectures, list all training hyperparameters, and describe the exact procedure used to incorporate cloudy optical data (including dataset splits and training protocol). We will also add direct pointers to the corresponding code files. revision: yes
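
A minimal sketch of the seed-wise analysis committed to in response 2, assuming per-seed AP values on the cloudy test split are available; the numbers below are placeholders, not results from the paper.

    import numpy as np
    from scipy.stats import ttest_rel

    # Placeholder per-seed AP values on the cloudy test split (illustrative only).
    ap_clear_trained = np.array([0.48, 0.50, 0.49, 0.51, 0.47])
    ap_cloudy_trained = np.array([0.61, 0.63, 0.62, 0.60, 0.64])

    # Mean AP ± standard deviation across seeds, as promised in the revision.
    print(f"clear-sky training:    {ap_clear_trained.mean():.3f} ± {ap_clear_trained.std(ddof=1):.3f}")
    print(f"cloudy-aware training: {ap_cloudy_trained.mean():.3f} ± {ap_cloudy_trained.std(ddof=1):.3f}")

    # Paired t-test across seeds between the two training regimes.
    t_stat, p_value = ttest_rel(ap_cloudy_trained, ap_clear_trained)
    print(f"paired t-test: t = {t_stat:.2f}, p = {p_value:.4g}")

A mean gain well clear of the seed-to-seed standard deviation, together with a small p-value, would support the claim that the 17.2-28.7 pp improvements exceed run-to-run variance.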

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper introduces a new dataset (CBEN) assembled from existing Sentinel-1/2 patches and reports direct empirical results from training and evaluating models on clear vs. cloudy splits. No mathematical derivations, fitted parameters renamed as predictions, self-referential equations, or load-bearing self-citations appear in the reported claims. All performance deltas (23-33 pp drops, 17.2-28.7 pp gains) are obtained from standard train/test splits on held-out data within the dataset itself; the work is self-contained rather than dependent on external benchmarks, and its reported outputs are not derived from its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

No free parameters or invented entities are introduced; the contribution rests on standard remote sensing assumptions about image pairing and cloud occlusion representation.

axioms (1)
  • domain assumption Paired optical and radar images from the source dataset can be spatially aligned even when clouds occlude the optical view.
    Required for constructing the paired cloudy dataset from existing sources.

pith-pipeline@v0.9.0 · 5591 in / 1260 out tokens · 63010 ms · 2026-05-15T22:56:33.035056+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

88 extracted references · 88 canonical work pages · 5 internal anchors
