OceanMAE: A Foundation Model for Ocean Remote Sensing
Pith reviewed 2026-05-10 17:55 UTC · model grok-4.3
The pith
Integrating physically meaningful ocean descriptors into masked autoencoder pre-training improves downstream marine segmentation quality.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
OceanMAE extends standard MAE pre-training by jointly encoding multispectral Sentinel-2 observations and physically meaningful ocean descriptors on the Hydro dataset, producing latent representations that transfer to a modified UNet framework and deliver stronger marine pollutant and debris segmentation on MADOS and MARIDA together with competitive bathymetry results on MagicBathyNet.
What carries the argument
The auxiliary ocean descriptors added to the masked autoencoder pre-training objective, which guide the model toward ocean-aware latent representations from unlabeled multispectral imagery.
If this is right
- OceanMAE produces its largest accuracy gains on marine debris and pollutant segmentation tasks.
- Bathymetry estimation results are competitive, with gains that vary by regression setup.
- An ablation against a standard MAE on MARIDA indicates that the ocean descriptors themselves improve downstream segmentation quality.
- The resulting representations support transfer to both segmentation and regression heads via a shared UNet-style decoder.
Where Pith is reading between the lines
- The same descriptor-injection pattern could be tested on other remote-sensing domains that possess domain-specific physical variables.
- If the descriptors remain effective across different sensor resolutions, the method offers a route to build more general ocean foundation models without task-specific labels.
- Public release of code and weights allows direct replication and extension on additional ocean datasets.
Load-bearing premise
The selected ocean descriptors are physically meaningful and sufficiently independent of the downstream task labels that their use in pre-training genuinely aids generalization rather than introducing dataset-specific leakage.
What would settle it
Retrain the same architecture on the Hydro dataset without the auxiliary ocean descriptors. If that ablated model matches or exceeds the full OceanMAE model on MARIDA test-set segmentation metrics, the core claim fails.
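The comparison described above can be sketched as follows. This is a minimal illustration, assuming mean intersection-over-union (mIoU) as the segmentation metric; the source does not name the metric, and the function name `mean_iou` and all label values are hypothetical toy data, not MARIDA results.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean IoU over classes that appear in the labels or the predictions."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union > 0:  # skip classes absent from both sequences
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Hypothetical flattened per-pixel class labels (toy values only)
labels = [0, 0, 1, 1, 2, 2, 2, 0]
pred_full = [0, 0, 1, 1, 2, 2, 0, 0]     # stands in for the model with descriptors
pred_ablated = [0, 1, 1, 0, 2, 2, 0, 0]  # stands in for the model without descriptors

miou_full = mean_iou(labels, pred_full, 3)
miou_ablated = mean_iou(labels, pred_ablated, 3)
gain = miou_full - miou_ablated
print(f"full={miou_full:.3f} ablated={miou_ablated:.3f} gain={gain:.3f}")
```

A gain at or below zero on the real test set would be the outcome that undermines the claim; in practice the difference should also be compared against run-to-run variance.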
Abstract
Accurate ocean mapping is essential for applications such as bathymetry estimation, seabed characterization, marine litter detection, and ecosystem monitoring. However, ocean remote sensing (RS) remains constrained by limited labeled data and by the reduced transferability of models pre-trained mainly on land-dominated Earth observation imagery. In this paper, we propose OceanMAE, an ocean-specific masked autoencoder that extends standard MAE pre-training by integrating multispectral Sentinel-2 observations with physically meaningful ocean descriptors during self-supervised learning. By incorporating these auxiliary ocean features, OceanMAE is designed to learn more informative and ocean-aware latent representations from large-scale unlabeled data. To transfer these representations to downstream applications, we further employ a modified UNet-based framework for marine segmentation and bathymetry estimation. Pre-trained on the Hydro dataset, OceanMAE is evaluated on MADOS and MARIDA for marine pollutant and debris segmentation, and on MagicBathyNet for bathymetry regression. The experiments show that OceanMAE yields the strongest gains on marine segmentation, while bathymetry benefits are competitive and task-dependent. In addition, an ablation against a standard MAE on MARIDA indicates that incorporating auxiliary ocean descriptors during pre-training improves downstream segmentation quality. These findings highlight the value of physically informed and domain-aligned self-supervised pre-training for ocean RS. Code and weights are publicly available at https://git.tu-berlin.de/joanna.stamer/SSLORS2.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces OceanMAE, a masked autoencoder pre-trained on the Hydro dataset that augments standard MAE with auxiliary ocean descriptors (e.g., bathymetry, chlorophyll level, Secchi depth) integrated with Sentinel-2 multispectral observations. The model is transferred via a modified UNet to downstream tasks: marine pollutant/debris segmentation on MADOS and MARIDA, and bathymetry regression on MagicBathyNet. The central empirical claim is that the auxiliary descriptors yield stronger segmentation results than a standard MAE baseline, as shown by an ablation on MARIDA.
Significance. If the performance gains are shown to arise from genuinely ocean-aware representations rather than leakage, the work would provide a useful domain-adapted foundation model for ocean remote sensing, where labeled data are scarce. Public release of code and weights supports reproducibility and is a clear strength.
major comments (2)
- [Ablation study] Ablation study (abstract and experiments section): The reported improvement of OceanMAE over standard MAE on MARIDA does not include any quantitative check (correlation coefficients, mutual information, or per-descriptor ablation) that the auxiliary descriptors are statistically independent of the marine debris/pollutant segmentation labels. Without this, the performance gap could be explained by implicit weak supervision during pre-training rather than improved generalization.
- [Methods] Methods section: The description of how auxiliary ocean descriptors are encoded, normalized, and fused into the MAE encoder/decoder (including changes to input dimensionality, positional embeddings, or the reconstruction loss) is insufficiently detailed to allow replication or to assess whether the integration is parameter-free or introduces new hyperparameters that could affect the claimed gains.
minor comments (2)
- [Abstract] Abstract: No quantitative metrics, dataset sizes, or error bars are provided despite the claim of 'strongest gains'; adding these would strengthen the summary.
- [Experiments] Evaluation protocol: Clarify the exact fine-tuning procedure, number of epochs, learning-rate schedule, and whether the same data augmentations are used for both OceanMAE and the standard MAE baseline.
Simulated Author's Rebuttal
We thank the referee for their constructive and insightful comments. We address each major point below and will revise the manuscript accordingly to improve clarity and strengthen the empirical claims.
- Referee: [Ablation study] Ablation study (abstract and experiments section): The reported improvement of OceanMAE over standard MAE on MARIDA does not include any quantitative check (correlation coefficients, mutual information, or per-descriptor ablation) that the auxiliary descriptors are statistically independent of the marine debris/pollutant segmentation labels. Without this, the performance gap could be explained by implicit weak supervision during pre-training rather than improved generalization.
  Authors: We agree this is a valid concern: without explicit independence checks, the observed gains could partly reflect correlations between the auxiliary descriptors and the downstream labels rather than purely improved generalization. Although pre-training remains fully self-supervised (no segmentation labels are used), the descriptors are derived from the same Sentinel-2 observations and could carry implicit information. In the revised manuscript we will add a dedicated analysis subsection that reports Pearson correlations and mutual information between each auxiliary descriptor and the MARIDA labels, together with per-descriptor ablation results. These additions will allow readers to assess the degree of any leakage and better attribute the performance improvements.
  revision: yes
- Referee: [Methods] Methods section: The description of how auxiliary ocean descriptors are encoded, normalized, and fused into the MAE encoder/decoder (including changes to input dimensionality, positional embeddings, or the reconstruction loss) is insufficiently detailed to allow replication or to assess whether the integration is parameter-free or introduces new hyperparameters that could affect the claimed gains.
  Authors: We acknowledge that the current methods description is too high-level for full reproducibility. In the revised version we will expand the OceanMAE architecture subsection to specify: (i) the exact normalization applied to each descriptor (z-score using Hydro dataset statistics), (ii) the encoding mechanism (concatenation as additional input channels with an adjusted linear patch embedding layer), (iii) any consequent changes to positional embeddings, and (iv) confirmation that the reconstruction loss remains the standard masked MSE with no auxiliary terms. We will also state explicitly that the only new design choice is the selection of the four descriptors; no additional hyperparameters are introduced beyond the original MAE configuration.
  revision: yes
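The independence check promised in the first response (Pearson correlation and mutual information between a descriptor and the segmentation labels) could look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' analysis: the helper names `pearson`, `quantize`, and `mutual_information`, the equal-width binning, and the toy values are all hypothetical.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def quantize(values, num_bins):
    """Map continuous values to equal-width bin indices (an assumed binning)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0  # avoid zero width for constant input
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


def mutual_information(x, y):
    """Mutual information in nats between two discrete sequences."""
    n = len(y)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


# Hypothetical per-pixel descriptor values and binary debris labels (toy data)
descriptor = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

r = pearson(descriptor, labels)
mi = mutual_information(quantize(descriptor, 2), labels)
print(f"pearson={r:.3f} mutual_information={mi:.3f} nats")
```

Values near zero for both statistics would suggest little direct leakage from that descriptor; large values would support the referee's weak-supervision concern. The estimate depends on the number of bins, so a real analysis would need to report that choice.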
Circularity Check
No circularity: purely empirical ablation with no derivation chain
The paper presents no mathematical derivation, first-principles result, or predictive claim that reduces to its own inputs by construction. All load-bearing evidence consists of empirical ablations (OceanMAE vs. standard MAE on MARIDA) and downstream evaluations on public datasets (MADOS, MARIDA, MagicBathyNet). Pre-training incorporates auxiliary ocean descriptors by design, but the performance gap is measured externally rather than being tautological. No self-citation load-bearing steps, uniqueness theorems, or fitted parameters renamed as predictions appear. The work is self-contained as an empirical study with public code.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Masked reconstruction on multispectral imagery plus auxiliary descriptors produces ocean-aware latent features that transfer to segmentation and regression.
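A minimal sketch of the masked-reconstruction objective this assumption refers to. It assumes, following the simulated rebuttal above rather than any confirmed paper detail, that descriptors are z-scored with dataset statistics and that the loss is the standard MAE masked MSE, averaged over masked patches only. The function names, the toy patches, and the mean and standard deviation values are hypothetical.

```python
def zscore(values, mean, std):
    """Standardize descriptor values with precomputed dataset statistics."""
    return [(v - mean) / std for v in values]


def masked_mse(target, prediction, mask):
    """Mean squared error averaged over masked patches only.

    target, prediction: lists of patches, each patch a list of floats.
    mask: one flag per patch, 1 = masked (to be reconstructed), 0 = visible.
    """
    total, count = 0.0, 0
    for t_patch, p_patch, m in zip(target, prediction, mask):
        if m == 1:
            # Per-patch MSE, accumulated and later averaged across masked patches
            total += sum((t - p) ** 2 for t, p in zip(t_patch, p_patch)) / len(t_patch)
            count += 1
    return total / count if count else 0.0


# Hypothetical descriptor values and statistics (not taken from Hydro)
chlorophyll = zscore([2.0, 4.0], mean=3.0, std=1.0)

# Toy example: three patches of two values each; the first patch is visible,
# so its large reconstruction error does not contribute to the loss
target = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
prediction = [[9.0, 9.0], [2.0, 4.0], [5.0, 8.0]]
mask = [0, 1, 1]
loss = masked_mse(target, prediction, mask)
print(chlorophyll, loss)
```

Restricting the loss to masked patches is what forces the encoder to infer hidden content from visible context; whether descriptor channels are themselves reconstructed is not stated in the source.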
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: OceanMAE adapts the MAE framework to ocean imagery by augmenting representation learning with external oceanic variables... bathymetry, chlorophyll level, and Secchi depth... linearly projected... concatenated with the E_CLS token
- IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: an ablation against a standard MAE on MARIDA indicates that incorporating auxiliary ocean descriptors during pre-training improves downstream segmentation quality
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.