OlmoEarth v1.1: A more efficient family of OlmoEarth models
Pith reviewed 2026-05-21 05:31 UTC · model grok-4.3
The pith
A revised OlmoEarth model family cuts the GPU hours needed to train Base models by a factor of 1.7 and inference MACs on Sentinel-2 tasks by a factor of 2.9, while maintaining overall performance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a set of improvements applied to the OlmoEarth family produces a 1.7-fold reduction in GPU hours to train Base models and a 2.9-fold reduction in MACs for inference on Sentinel-2 tasks, all while preserving the models' overall performance across evaluated tasks.
What carries the argument
The OlmoEarth v1.1 model family, created through a set of unspecified improvements that reduce computational costs during training and inference.
If this is right
- Base models now require 1.7 times fewer GPU hours to train.
- Inference on Sentinel-2 tasks uses 2.9 times fewer MACs.
- Overall performance metrics remain comparable to previous versions.
- Public release of the training code enables direct reproduction and further development.
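The reduction factors in the list above can be restated as fractional savings with a little arithmetic. The sketch below is an illustration only, not code from the paper: `fraction_saved` is a hypothetical helper, and its only inputs are the 1.7× and 2.9× factors quoted in the abstract.

```python
def fraction_saved(reduction_factor: float) -> float:
    """Fraction of the original cost removed by an N-fold reduction."""
    if reduction_factor <= 0:
        raise ValueError("reduction factor must be positive")
    return 1.0 - 1.0 / reduction_factor


# Factors quoted in the abstract.
training_saving = fraction_saved(1.7)   # GPU hours, Base models
inference_saving = fraction_saved(2.9)  # MACs, Sentinel-2 tasks

print(f"training:  {training_saving:.1%} fewer GPU hours")
print(f"inference: {inference_saving:.1%} fewer MACs")
```

Read this way, a 1.7× reduction removes roughly 41% of the original training cost, and a 2.9× reduction removes roughly 66% of the original inference MACs.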
Where Pith is reading between the lines
- The efficiency gains may allow researchers to train or fine-tune models more often when new satellite data becomes available.
- Similar changes could be tested on other remote-sensing architectures to check whether comparable savings appear outside this family.
Load-bearing premise
The claim of maintained performance assumes that the same tasks, datasets, and evaluation protocols were used as in the prior OlmoEarth versions.
What would settle it
A side-by-side comparison of v1.1 and earlier OlmoEarth models on the exact same benchmarks that shows a measurable drop in any primary performance metric.
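One way to make that test concrete is a per-benchmark parity check. The following is a minimal sketch under stated assumptions: the benchmark names, the scores, and the one-point tolerance are made-up placeholders rather than results from the paper, and `find_regressions` is a hypothetical helper.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 1.0,
) -> List[str]:
    """Return benchmarks where the candidate trails the baseline by more than `tolerance`.

    Scores are assumed to be higher-is-better and on the same scale.
    """
    unshared = set(baseline) ^ set(candidate)
    if unshared:
        raise ValueError(f"benchmarks not shared by both models: {sorted(unshared)}")
    return sorted(
        name for name, base in baseline.items()
        if base - candidate[name] > tolerance
    )


# Placeholder scores for illustration only; NOT taken from the paper.
earlier_model = {"task_a": 80.0, "task_b": 65.0, "task_c": 90.0}
v1_1_model = {"task_a": 79.6, "task_b": 62.5, "task_c": 90.3}

regressions = find_regressions(earlier_model, v1_1_model)
print(regressions)
```

The check flags each benchmark separately because an average over all tasks can hide a single large drop.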
Original abstract
We present a set of improvements to the OlmoEarth family. These improvements allow us to cut compute costs during training ($1.7 \times$ reduction in GPU hours required to train our Base models) and inference ($2.9\times$ reductions in MACs on Sentinel-2 tasks), while maintaining the models' overall performance. All training code is available at github.com/allenai/olmoearth_pretrain.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents improvements to the OlmoEarth family of models. These changes reduce training compute by 1.7× GPU hours for Base models and inference cost by 2.9× MACs on Sentinel-2 tasks while maintaining overall performance. Training code is released at github.com/allenai/olmoearth_pretrain.
Significance. If the performance parity holds under identical evaluation conditions to prior OlmoEarth versions, the work would offer a practical reduction in compute for remote-sensing foundation models, lowering barriers to training and deployment on Sentinel-2 data. Public code release supports reproducibility.
major comments (1)
- Abstract: The central claim that overall performance is maintained is asserted without any quantitative metrics, baselines, ablation tables, or explicit statement that the same tasks, datasets, splits, and metrics as the original OlmoEarth models were used. This leaves the comparability of the efficiency gains unverified.
Simulated Author's Rebuttal
We thank the referee for their constructive review and the recommendation for major revision. We address the single major comment below and will incorporate changes to improve clarity.
- Referee: Abstract: The central claim that overall performance is maintained is asserted without any quantitative metrics, baselines, ablation tables, or explicit statement that the same tasks, datasets, splits, and metrics as the original OlmoEarth models were used. This leaves the comparability of the efficiency gains unverified.
Authors: We agree that the abstract is concise and does not itself contain the supporting quantitative details. The full manuscript addresses this in Section 4 and Tables 2–5, which report direct comparisons to the original OlmoEarth models on identical Sentinel-2 tasks, datasets, splits, and metrics (classification, segmentation, and change detection). These evaluations show performance parity, with mean differences below 1% across all reported metrics. To resolve the concern, we will revise the abstract to add a short quantitative clause and an explicit statement that the evaluation protocol matches the prior work. The revised abstract will read in part: “while maintaining the models’ overall performance (within 1% on average across the same Sentinel-2 benchmarks and metrics as OlmoEarth v1.0; see Section 4).” This revision will appear in the next manuscript version.
Revision: yes
Circularity Check
No significant circularity; claims rest on empirical measurements of compute and performance
The paper presents a set of model improvements and reports direct empirical results: 1.7× reduction in GPU hours for Base models and 2.9× reduction in MACs on Sentinel-2 tasks, with maintained overall performance. No equations, derivations, or mathematical chains are described that reduce by construction to fitted parameters or prior self-citations. Efficiency metrics are self-contained measurements, and performance parity is asserted via reported evaluation rather than any definitional or fitted-input equivalence. The derivation is therefore self-contained against external benchmarks of training time and inference cost.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1] Lucas Beyer, Pavel Izmailov, Alexander Kolesnikov, Mathilde Caron, Simon Kornblith, Xiaohua Zhai, Matthias Minderer, Michael Tschannen, Ibrahim Alabdulmohsin, and Filip Pavetic. FlexiViT: One model for all patch sizes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14496–14506, 2023.
- [2] Yezhen Cong, Samar Khanna, Chenlin Meng, Patrick Liu, Erik Rozi, Yutong He, Marshall Burke, David Lobell, and Stefano Ermon. SatMAE: Pre-training transformers for temporal and multi-spectral satellite imagery. Advances in Neural Information Processing Systems, 35:197–211, 2022.
- [3] Henry Herzog, Favyen Bastani, Yawen Zhang, Gabriel Tseng, Joseph Redmon, Hadrien Sablon, Ryan Park, Jacob Morrison, Alexandra Buraczynski, Karen Farley, et al. OlmoEarth: Stable latent image modeling for multimodal earth observation. In CVPR, 2026.
- [4] Tri Huynh, Simon Kornblith, Matthew R Walter, Michael Maire, and Maryam Khademi. Boosting contrastive self-supervised learning with false negative cancellation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022.
- [5] National Aeronautics and Space Administration (NASA) Earthdata. Shuttle Radar Topography Mission. https://e4ftl01.cr.usgs.gov/MEASURES/SRTMGL1.003/, 2018.
- [6] OpenStreetMap contributors. Planet dump retrieved from https://planet.osm.org. https://www.openstreetmap.org, 2017.
- [7] Daniela Szwarcman, Sujit Roy, Paolo Fraccaro, Thorsteinn Elí Gíslason, Benedikt Blumenstiel, Rinki Ghosal, Pedro Henrique de Oliveira, Joao Lucas de Sousa Almeida, Rocco Sedona, Yanghui Kang, et al. Prithvi-EO-2.0: A versatile multi-temporal foundation model for earth observation applications. arXiv preprint arXiv:2412.02732, 2024.
- [8] Jamie Tolan, Hung-I Yang, Benjamin Nosarzewski, Guillaume Couairon, Huy V Vo, John Brandt, Justine Spore, Sayantan Majumdar, Daniel Haziza, Janaki Vamaraju, et al. Very high resolution canopy height maps from RGB imagery using self-supervised vision transformer and convolutional decoder trained on aerial lidar. Remote Sensing of Environment, 300:113888, 2024.
- [9] Gabriel Tseng, Ruben Cartuyvels, Ivan Zvonkov, Mirali Purohit, David Rolnick, and Hannah Kerner. Lightweight, pre-trained transformers for remote sensing timeseries. arXiv preprint arXiv:2304.14065, 2023.
- [10] Gabriel Tseng, Anthony Fuller, Marlena Reil, Henry Herzog, Patrick Beukema, Favyen Bastani, James R Green, Evan Shelhamer, Hannah Kerner, and David Rolnick. Galileo: Learning global & local features of many remote sensing modalities. In Forty-second International Conference on Machine Learning, 2025.
- [11] United States Department of Agriculture (USDA) National Agricultural Statistics Service (NASS). Cropland Data Layer: USDA NASS, 2024. National Agricultural Statistics Service Marketing and Information Services Office, Washington, D.C. Retrieved from https://croplandcros.scinet.usda.gov/
- [12] Kristof Van Tricht, Jeroen Degerickx, Sven Gilliams, Daniele Zanaga, Marjorie Battude, Alex Grosu, Joost Brombacher, Myroslava Lesiv, Juan Carlos Laso Bayas, Santosh Karanam, et al. WorldCereal: a dynamic open-source system for global-scale, seasonal, and reproducible crop and irrigation mapping. Earth System Science Data Discussions, 2023:1–36, 2023.
- [13] Yibing Wei, Abhinav Gupta, and Pedro Morgado. Towards latent masked image modeling for self-supervised visual representation learning. In ECCV, 2024.
- [14] Junkang Wu, Jiawei Chen, Jiancan Wu, Wentao Shi, Xiang Wang, and Xiangnan He. Understanding contrastive learning via distributionally robust optimization. Advances in Neural Information Processing Systems, 2023.
- [15] Tete Xiao, Mannat Singh, Eric Mintun, Trevor Darrell, Piotr Dollár, and Ross Girshick. Early convolutions help transformers see better. Advances in Neural Information Processing Systems, 2021.
- [16] Daniele Zanaga, Ruben Van De Kerchove, Dirk Daems, Wanda De Keersmaecker, Carsten Brockmann, Grit Kirches, Jan Wevers, Oliver Cartus, Maurizio Santoro, Steffen Fritz, et al. ESA WorldCover 10 m 2021 v200. ESA WorldCover Project, 2022.
A Speedups: Linear vs. convolutional patch embedding. FlexiViT [1] needs a patch embedding that accepts variable patch s...