arxiv: 2606.17242 · v1 · pith:B2RY75FXnew · submitted 2026-06-15 · 💻 cs.CV

Landsat-Sentinel-2 Algal Bloom Mapping Using Vision Transformers: Model Description, Implementation, and Examples

Pith reviewed 2026-06-27 03:50 UTC · model grok-4.3

classification 💻 cs.CV
keywords algal bloom mapping · vision transformers · Landsat · Sentinel-2 · deep learning · remote sensing · coastal monitoring · floating algae

The pith

Vision transformers can map coastal algal blooms at 30 m resolution from Landsat and Sentinel-2 imagery, outperforming spectral indices under clouds and glint.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows the first application of vision transformer models to detect floating algal blooms in coastal waters using harmonized 30 m multispectral images from Landsat-8/9 and Sentinel-2. A global dataset of bloom patches was assembled across diverse hotspots, and four transformer architectures were tested against a convolutional baseline under varying water types, atmospheres, and surface conditions. All models detected blooms with omission and commission errors of 8-65 percent; in time-series tests the Swin Transformer avoided the widespread false positives produced by traditional spectral-index methods when clouds and sun glint were present. Comparisons with MODIS products illustrated the advantage of the finer resolution for fragmented bloom structures. The work positions deep learning as a practical route to consistent medium-resolution bloom monitoring where bio-optical methods are limited by spectral coverage.

Core claim

This study demonstrates the first successful implementation of vision transformer-based coastal algal bloom mapping using 30-m Landsat-Sentinel-2 images. A globally distributed bloom patch dataset was generated and used to train and compare four transformer architectures against a convolutional baseline. All deep learning models detected floating bloom areas with omission and commission errors of 8-65 percent. In time-series observations under cloud and glint stress the Swin Transformer outperformed spectral-index approaches by avoiding false positives, while higher spatial resolution revealed fragmented blooms not captured by MODIS products.

What carries the argument

Vision transformer architectures (including Swin Transformer) trained on a globally distributed labeled bloom patch dataset to classify pixels in harmonized 30 m Landsat-Sentinel-2 multispectral imagery as bloom or non-bloom.

If this is right

  • Higher spatial resolution enables detection of small and irregularly shaped bloom patches that coarser sensors miss.
  • The Swin Transformer maintains performance when clouds and sun glint are present, reducing false positives that affect spectral-index methods.
  • Deep learning offers a data-driven alternative that works without harmonized bio-optical reflectance products.
  • Consistent global monitoring of floating algal blooms becomes feasible at 2-3 day revisit intervals.
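The spectral-index methods that the list above contrasts with the Swin Transformer can be illustrated with a minimal sketch. It assumes the two indices named in the paper's channel-configuration table, NDVI and FAI, in their commonly used forms. The band-centre wavelengths (approximate Landsat-8 OLI values), the FAI threshold, and the three reflectance triplets are hypothetical values chosen only for illustration, not taken from the paper.

```python
# Approximate band-centre wavelengths in nm (assumed, Landsat-8 OLI-like).
RED_NM, NIR_NM, SWIR_NM = 655.0, 865.0, 1609.0


def ndvi(red, nir):
    """Normalized difference vegetation index for one pixel."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom


def fai(red, nir, swir):
    """Floating algae index: NIR reflectance above a red-to-SWIR linear baseline."""
    baseline = red + (swir - red) * (NIR_NM - RED_NM) / (SWIR_NM - RED_NM)
    return nir - baseline


def index_mask(pixels, fai_threshold=0.02):
    """Flag a pixel as bloom when FAI exceeds a fixed (hypothetical) threshold."""
    return [fai(red, nir, swir) > fai_threshold for red, nir, swir in pixels]


# Hypothetical (red, NIR, SWIR) reflectances, for illustration only.
pixels = [
    (0.03, 0.02, 0.01),  # dark open water
    (0.04, 0.15, 0.03),  # floating algae: strong NIR peak
    (0.30, 0.32, 0.28),  # bright, nearly flat spectrum (cloud- or glint-like)
]

for (red, nir, swir), flagged in zip(pixels, index_mask(pixels)):
    print(f"NDVI={ndvi(red, nir):.3f}  FAI={fai(red, nir, swir):.4f}  bloom={flagged}")
```

With these made-up inputs, the bright, spectrally flat pixel passes the threshold alongside the algae pixel, which is the kind of commission error a fixed threshold can produce under cloud or glint.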

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be extended to near-real-time operational alerts for coastal water-quality management.
  • Similar transformer pipelines might apply to other dynamic coastal features such as sediment plumes or oil slicks.
  • Fusion with additional sensors could further increase temporal density without sacrificing the 30 m spatial detail.

Load-bearing premise

The manually created global bloom patch dataset must be labeled accurately enough to serve as ground truth across diverse optical water types, atmospheric conditions, and surface states.

What would settle it

Independent field sampling or higher-resolution airborne imagery that reveals substantial labeling errors in the bloom patch dataset would invalidate the reported detection accuracies and the claim of outperformance over spectral indices.

Original abstract

Coastal algal bloom monitoring requires frequent, spatially detailed, and globally consistent observations, provided by Landsat-8/9 and Sentinel-2 A/B/C. Together, these missions offer over a decade of medium-resolution multispectral imagery with near-global coverage every 2-3 days, enabling the detection of fragmented bloom structures not resolvable by coarse ocean-color sensors. However, their use in aquatic environments remains challenging due to limited spectral coverage and a lack of harmonized reflectance products. As an alternative to traditional bio-optical methods, deep learning-based image classification offers a data-driven approach that can overcome many of these limitations. This study presents the first successful implementation of vision transformer-based coastal algal bloom mapping using 30-m Landsat-Sentinel-2 images. A globally distributed bloom patch dataset was generated across bloom-prone coastal hotspots worldwide. Four transformer-based architectures were compared against a standard convolutional baseline for fine-scale bloom detection, and assessed under different optical water types and atmospheric and surface conditions. All deep learning models showed strong capabilities in detecting floating bloom areas, with omission and commission errors of 8-65%. Under cloud and glint stress in a time series, the Swin Transformer outperformed traditional spectral-index approaches, which produced widespread false positives, effectively avoiding cloud- and glint-affected pixels. Comparisons with MODIS-derived products further highlighted the benefits of higher spatial resolution in detecting fragmented and irregularly affected blooms. Our findings support deep learning as a reliable tool for medium-resolution, consistent monitoring of floating algal blooms in dynamic coastal environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript claims to present the first successful implementation of vision transformer-based models for coastal algal bloom mapping at 30-m resolution using harmonized Landsat-8/9 and Sentinel-2 imagery. It describes generation of a globally distributed bloom patch dataset across coastal hotspots, compares four transformer architectures to a convolutional baseline under varying optical water types and atmospheric conditions, reports omission and commission errors of 8-65% for all deep learning models, and states that the Swin Transformer outperforms traditional spectral-index methods under cloud and glint stress in time series while also showing advantages over MODIS products due to finer spatial resolution.

Significance. If the empirical results are supported by validated ground truth, the work would establish a data-driven alternative to bio-optical methods for detecting fragmented floating blooms in dynamic coastal zones, leveraging the combined temporal and spatial coverage of Landsat and Sentinel-2. The stress-test comparisons and resolution benefits could inform operational monitoring protocols where spectral indices produce false positives.

major comments (2)
  1. [Abstract] The central performance claims (omission and commission errors of 8-65%) and the assertion that Swin Transformer outperforms spectral indices under cloud/glint conditions rest on an undescribed 'globally distributed bloom patch dataset.' No information is supplied on dataset size, labeling protocol, inter-annotator agreement, train-test split strategy, or validation against in-situ or higher-resolution references, rendering it impossible to determine whether the reported errors and outperformance reflect model capability or labeling artifacts.
  2. [Abstract] The claim that 'all deep learning models showed strong capabilities' with errors of 8-65% is presented without statistical significance testing, confidence intervals, or breakdown by optical water type, atmospheric condition, or surface state; this absence prevents assessment of whether the wide error range supports the reliability conclusion or indicates high variability that undermines the cross-condition claims.
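The inter-annotator agreement requested in the first comment is often summarized with Cohen's kappa. The sketch below is one possible way to compute it for two annotators' binary bloom/non-bloom labels; the paper does not say which agreement metric, if any, was used, and the label vectors are hypothetical.

```python
def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of binary labels (0 or 1)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of pixels where both annotators agree.
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    # Chance agreement from each annotator's marginal bloom rate.
    rate_a = sum(labels_a) / n
    rate_b = sum(labels_b) / n
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical class
    return (observed - expected) / (1 - expected)


# Hypothetical per-pixel labels from two annotators (1 = bloom, 0 = non-bloom).
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0]
annotator_2 = [1, 0, 0, 0, 1, 0, 1, 1]
print(f"kappa = {cohen_kappa(annotator_1, annotator_2):.2f}")
```

Here the annotators agree on 6 of 8 pixels, while chance agreement is 0.5, so kappa is 0.5.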

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We agree that additional methodological details and statistical analyses are required to support the performance claims and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract] The central performance claims (omission and commission errors of 8-65%) and the assertion that Swin Transformer outperforms spectral indices under cloud/glint conditions rest on an undescribed 'globally distributed bloom patch dataset.' No information is supplied on dataset size, labeling protocol, inter-annotator agreement, train-test split strategy, or validation against in-situ or higher-resolution references, rendering it impossible to determine whether the reported errors and outperformance reflect model capability or labeling artifacts.

    Authors: We acknowledge that the abstract (and current methods description) lacks sufficient detail on the bloom patch dataset. In the revised manuscript we will add a dedicated subsection describing dataset size, labeling protocol (including any inter-annotator agreement metrics), train-test split strategy, and validation against in-situ or higher-resolution references. This will allow readers to evaluate whether the reported errors reflect model performance or labeling artifacts.

    revision: yes

  2. Referee: [Abstract] The claim that 'all deep learning models showed strong capabilities' with errors of 8-65% is presented without statistical significance testing, confidence intervals, or breakdown by optical water type, atmospheric condition, or surface state; this absence prevents assessment of whether the wide error range supports the reliability conclusion or indicates high variability that undermines the cross-condition claims.

    Authors: We agree that the performance claims require statistical support and stratification. In the revised manuscript we will add statistical significance testing (e.g., paired t-tests or Wilcoxon tests with p-values), confidence intervals on omission/commission errors, and breakdowns of results by optical water type, atmospheric condition, and surface state. These additions will clarify whether the 8-65% range reflects acceptable variability or undermines the cross-condition assertions.

    revision: yes
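The confidence intervals promised in the second response could be obtained in several ways. As a minimal sketch, the percentile bootstrap below resamples per-scene omission errors with replacement; the per-scene values, the number of resamples, and the choice of bootstrap over a parametric interval are all assumptions made here for illustration and are not taken from the paper.

```python
import random


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Resample scenes with replacement and record the resampled mean.
        sample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower_idx = int((alpha / 2) * n_resamples)
    upper_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lower_idx], means[upper_idx]


# Hypothetical per-scene omission errors (fractions), for illustration only.
scene_omission = [0.08, 0.12, 0.15, 0.22, 0.31, 0.40, 0.18, 0.27]
mean_omission = sum(scene_omission) / len(scene_omission)
low, high = bootstrap_ci(scene_omission)
print(f"mean omission = {mean_omission:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

Running the same function separately per optical water type or atmospheric condition would give the stratified intervals the referee asks for.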

Circularity Check

0 steps flagged

No circularity: empirical ML evaluation on a labeled dataset

The paper reports an empirical comparison of vision transformer architectures against baselines for bloom detection on a dataset generated by the authors. No equations, parameter fits presented as predictions, self-definitional loops, or load-bearing self-citations appear in the provided text. Performance claims rest on evaluation against the labeled bloom patches and on comparison with external MODIS products, so they do not reduce to the paper's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; full methods, dataset construction details, and any hyperparameter choices are unavailable. Consequently the ledger lists only the minimal assumptions visible in the provided text.

pith-pipeline@v0.9.1-grok · 5807 in / 1271 out tokens · 46083 ms · 2026-06-27T03:50:37.544371+00:00 · methodology

    The input image, resized to 224×224, is initially partitioned into non -overlapping 4×4 patches, each projected to a 96-dimensional token

    Architecture of the Swin Transformer model. The input image, resized to 224×224, is initially partitioned into non -overlapping 4×4 patches, each projected to a 96-dimensional token. The encoder comprises four hierarchical stages ([2, 2, 6, 2] ST blocks, green boxes), each integrating layer normalization, W -MSA, and MLP layers to produce four resolution ...
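The token arithmetic implied by this caption can be worked through in a few lines. The input size, patch size, embedding dimension, and block depths come from the caption; the halving of the token grid and doubling of the channel width between stages follow the original Swin Transformer design and are an assumption here, since the caption is truncated before it states the stage resolutions.

```python
def swin_stage_shapes(image_size=224, patch_size=4, embed_dim=96, depths=(2, 2, 6, 2)):
    """Token grid and channel width per stage for a Swin-style encoder."""
    side = image_size // patch_size  # tokens along one image side after patch embedding
    dim = embed_dim
    shapes = []
    for stage, depth in enumerate(depths, start=1):
        if stage > 1:
            # Assumed patch merging: 2x2 neighbouring tokens fused, channels doubled.
            side //= 2
            dim *= 2
        shapes.append(
            {"stage": stage, "blocks": depth, "grid": (side, side),
             "tokens": side * side, "dim": dim}
        )
    return shapes


for shape in swin_stage_shapes():
    print(shape)
```

Under these assumptions the first stage operates on a 56×56 grid of 3136 tokens with 96 channels, and the last on a 7×7 grid with 768 channels.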

Evaluation metrics for algal bloom segmentation using five deep learning models: ResUNet, Vanilla ViT, Swin Transformer, SegFormer, and MAE (Prithvi). Metrics include recall, precision, F1-score (Dice coefficient), commission error (false positives), and omission error (false negatives). A buffer of 1 pixel was included in the ground mask, providing a c...

Evaluation metrics for algal bloom segmentation using Swin Transformer trained on three input channel configurations. Metrics include recall, precision, F1-score (Dice coefficient), commission error (false positives), and omission error (false negatives).

Metrics          VNIR + NDVI + FAI   VNIR + NDVI   VNIR
Recall           0.87                0.87          0.89
Precision        0.42                0.41          0.37
F1-Score (D...
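The metrics named in these captions can be computed from a pair of binary masks as sketched below. The sketch assumes the usual remote-sensing definitions, with omission error as the complement of recall and commission error as the complement of precision; the captions do not spell out the formulas, and the two masks are hypothetical.

```python
def bloom_metrics(truth, pred):
    """Recall, precision, F1 (Dice), omission and commission error for binary masks."""
    tp = sum(1 for t, p in zip(truth, pred) if t and p)      # bloom correctly detected
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)  # non-bloom flagged as bloom
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)  # bloom missed
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "omission": fn / (tp + fn) if tp + fn else 0.0,    # assumed: 1 - recall
        "commission": fp / (tp + fp) if tp + fp else 0.0,  # assumed: 1 - precision
    }


# Hypothetical flattened masks (1 = bloom, 0 = non-bloom).
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
for name, value in bloom_metrics(truth, pred).items():
    print(f"{name}: {value:.3f}")
```

Under these assumed definitions, a recall of 0.87 corresponds to an omission error of 0.13 and a precision of 0.42 to a commission error of 0.58, both inside the 8-65% range quoted in the abstract.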