pith. machine review for the scientific record.

arxiv: 2604.02719 · v1 · submitted 2026-04-03 · 💻 cs.CV · cs.AI · cs.LG

Recognition: 2 theorem links


MOMO: Mars Orbital Model Foundation Model for Mars Orbital Applications

Bhanu Tokas, Bimal Gajera, Brian Bue, Hannah Kerner, Irish Mehta, Jacob Adler, Mirali Purohit, Scott Dickenshied, Serina Diniega, Steven Lu, Umaa Rebbapragada

Pith reviewed 2026-05-13 20:44 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG
keywords Mars · foundation model · model merging · remote sensing · task arithmetic · segmentation · multi-sensor · HiRISE

The pith

Merging checkpoints from three Mars sensors aligned by equal validation loss creates a multi-sensor foundation model that outperforms standard pretraining baselines on downstream tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MOMO, a foundation model for Mars orbital applications that combines data from HiRISE, CTX, and THEMIS sensors with resolutions ranging from 0.25 to 100 meters per pixel. It trains models independently on each sensor's data and merges them using a novel Equal Validation Loss strategy to select compatible checkpoints, followed by task arithmetic fusion. This merged model is evaluated on nine tasks from Mars-Bench, showing better overall performance than ImageNet pre-trained models, earth observation foundation models, sensor-specific pre-training, and fully-supervised approaches. The gains are particularly notable in segmentation tasks. This demonstrates that careful checkpoint selection in model merging can effectively build foundation models for multi-resolution remote sensing data.

Core claim

MOMO is constructed by independently pretraining models on large corpora from each of the three Martian sensors and then fusing them via task arithmetic at checkpoints chosen to have equal validation-loss values, yielding a single model with superior generalization on Mars remote sensing benchmarks compared to non-merged alternatives.

What carries the argument

The Equal Validation Loss (EVL) strategy for aligning independently trained sensor models at similar convergence points before applying task arithmetic to merge their representations.
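Concretely, the two steps can be sketched in a few lines of Python. This is not the authors' code: the checkpoint-record layout and the equal-loss target rule are illustrative assumptions; only the task-arithmetic update and the paper's scaling coefficient of 0.3 come from the source.

```python
# Sketch of EVL selection + task-arithmetic fusion (illustrative, not the
# authors' implementation). Weights are plain floats for clarity; in practice
# they would be model state_dict tensors.

def select_evl_checkpoints(histories):
    """histories: {sensor: [(step, val_loss, weights), ...]}.
    One plausible reading of EVL: pick, per sensor, the checkpoint whose
    validation loss is closest to a loss level every sensor can reach."""
    # The highest per-sensor minimum is the best loss all sensors can match.
    target = max(min(loss for _, loss, _ in h) for h in histories.values())
    return {
        sensor: min(h, key=lambda rec: abs(rec[1] - target))[2]
        for sensor, h in histories.items()
    }

def task_arithmetic_merge(base, experts, alpha=0.3):
    """Task arithmetic (Ilharco et al. [30]): merged = base + alpha * sum of
    task vectors (expert - base). The paper reports alpha = 0.3."""
    return {
        k: w0 + alpha * sum(e[k] - w0 for e in experts)
        for k, w0 in base.items()
    }
```

The paper additionally enforces a loss-matching tolerance of ϵ = 10⁻⁴ and studies sensitivity to the scaling coefficient (its Sections C.3–C.4); those knobs are omitted from this sketch.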

If this is right

  • Consistent performance improvements on segmentation tasks across the Mars-Bench suite.
  • Stable fusion of multi-resolution data without requiring simultaneous training on all sensors.
  • Outperformance over ImageNet, earth observation, and sensor-specific baselines in overall metrics.
  • Better generalization when merging models trained on data from 0.25 m/pixel to 100 m/pixel resolutions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The EVL approach may extend to fusing models from other planetary or Earth remote sensing datasets with varying resolutions.
  • Task arithmetic could be combined with other merging techniques for even broader multi-sensor integration.
  • Releasing the model weights and code allows direct testing on new Mars tasks not in the original benchmark.

Load-bearing premise

That matching validation loss values across separately trained models from different sensors produces representations compatible enough for stable and beneficial fusion via task arithmetic.

What would settle it

Training and merging the sensor models at checkpoints with deliberately mismatched validation losses and observing whether the performance on Mars-Bench tasks drops below or matches the EVL-aligned version.

Figures

Figures reproduced from arXiv: 2604.02719 by Bhanu Tokas, Bimal Gajera, Brian Bue, Hannah Kerner, Irish Mehta, Jacob Adler, Mirali Purohit, Scott Dickenshied, Serina Diniega, Steven Lu, Umaa Rebbapragada.

Figure 1
Figure 1. MOMO can be effectively applied across a wide range of resolutions and a broad spectrum of Martian remote sensing tasks. By leveraging diverse sensors, our approach enables a single model to generalize across different orbital applications, including large-scale crater or landslide mapping and precise boulder localization. … view at source ↗
Figure 2
Figure 2. Illustrative samples of poor- and high-quality images from the HiRISE, CTX, and THEMIS sensors. The top row shows … view at source ↗
Figure 3
Figure 3. Loss landscape visualization across different checkpoint selection strategies on DoMars16k and Landmark datasets. The red … view at source ↗
Figure 4
Figure 4. Example of a HiRISE map-projected image used in our study. The dark border around the image represents no-data regions that were filtered out during preprocessing to ensure high-quality crop selection. HiRISE is mounted on the Mars Reconnaissance Orbiter (MRO) satellite and has been collecting data since 2006. HiRISE captures visible spectrum images at very high resolution, i.e., ∼0.25 meters/pixel. HiRI… view at source ↗
Figure 5
Figure 5. HiRISE pre-training data distribution. view at source ↗
Figure 6
Figure 6. CTX pre-training data distribution. view at source ↗
Figure 7
Figure 7. THEMIS pre-training data distribution. … view at source ↗
Figure 8
Figure 8. AtmosDust. DoMars16k: this is a multi-class classification dataset designed for geomorphologic feature recognition on Mars using imagery from the CTX sensor. It consists of 15 classes (… view at source ↗
Figure 9
Figure 9. DoMars16k. Landmark: this dataset is a multi-class classification corpus derived from orbital HiRISE imagery. Each image is assigned to one of eight geomorphological feature classes: Bright Dune, Crater, Dark Dune, Impact Ejecta, Slope Streak, Spider, Swiss Cheese, and Other (… view at source ↗
Figure 10
Figure 10. Landmark. Frost: this is a binary classification dataset designed to detect the presence or absence of surface frost in Mars satellite imagery. The dataset consists of HiRISE images labeled as either “Frost” or “Non Frost” (… view at source ↗
Figure 11
Figure 11. Frost. mb-change cls hirise: this dataset is designed for binary classification of surface changes using temporal image pairs; specifically, one image taken before and another after some time period, from the same Martian location. The task involves identifying whether meaningful surface change has occurred and classifying between “Change” and “No change”. Unlike standard single-image classification, this t… view at source ↗
Figure 12
Figure 12. mb-change cls hirise. For the mb-change cls hirise dataset, we conducted experiments using MOMO and all baseline models, excluding EO-FMs and DINOv3. All models achieved 100% accuracy and F1-score, indicating that the task is already saturated. Therefore, we did not include these results in the main paper and did not perform further experiments on EO-FMs for this dataset. A.2.2. Segmentation. Boulder: this i… view at source ↗
Figure 13
Figure 13. Boulder. … view at source ↗
Figure 14
Figure 14. ConeQuest. MMLS: this is a binary segmentation dataset designed to identify landslides on the Martian surface, with a focus on the Valles Marineris region from the CTX sensor. All annotations were manually created by expert geologists, ensuring high-quality, scientifically accurate labels. Each image sample includes multi-modal satellite data comprising 7 channels: RGB (3), Digital Elevation Model (DEM), th… view at source ↗
Figure 15
Figure 15. MMLS. Crater Binary & Crater Multi: these two datasets focus on crater segmentation using THEMIS imagery. In particular, mb-crater binary seg is a binary segmentation dataset that distinguishes crater vs. non-crater regions, while mb-crater multi seg is a multi-class segmentation dataset with four crater types: Other, Layered, Buried, and Secondary (… view at source ↗
Figure 16
Figure 16. Crater Segmentation Datasets. …of ϵ = 10⁻⁴. We analyze the effect of different values of the tolerance parameter in Section C.4. During model merging, we apply a scaling coefficient of 0.3, following the recommendation of Ilharco et al. [30]. We further analyze the sensitivity of our approach to different scaling coefficients in Section C.3. For the Data Merge experiments, we apply the same hyperparameter… view at source ↗
Figure 17
Figure 17. Reconstruction results using ViT-Base models pre-trained with only MSE loss. The figure compares the Original image against … view at source ↗
Figure 18
Figure 18. Reconstruction results using models pre-trained with the proposed combined loss function (pixel-based + perceptual). This … view at source ↗
Figure 19
Figure 19. Performance as a function of the scaling coefficient. … view at source ↗
Figure 20
Figure 20. Example of global map generation using MOMO on the out-of-distribution region of the ConeQuest dataset. The center panel shows the original large-scale HiRISE tile, and the right panel shows the stitched prediction map after inference. The left and top panels display representative 512×512 data samples and their corresponding segmentation outputs. This experiment demonstrates MOMO’s capability to genera… view at source ↗
read the original abstract

We introduce MOMO, the first multi-sensor foundation model for Mars remote sensing. MOMO uses model merge to integrate representations learned independently from three key Martian sensors (HiRISE, CTX, and THEMIS), spanning resolutions from 0.25 m/pixel to 100 m/pixel. Central to our method is our novel Equal Validation Loss (EVL) strategy, which aligns checkpoints across sensors based on validation loss similarity before fusion via task arithmetic. This ensures models are merged at compatible convergence stages, leading to improved stability and generalization. We train MOMO on a large-scale, high-quality corpus of $\sim 12$ million samples curated from Mars orbital data and evaluate it on 9 downstream tasks from Mars-Bench. MOMO achieves better overall performance compared to ImageNet pre-trained, earth observation foundation model, sensor-specific pre-training, and fully-supervised baselines. Particularly on segmentation tasks, MOMO shows consistent and significant performance improvement. Our results demonstrate that model merging through an optimal checkpoint selection strategy provides an effective approach for building foundation models for multi-resolution data. The model weights, pretraining code, pretraining data, and evaluation code are available at: https://github.com/kerner-lab/MOMO.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces MOMO, the first multi-sensor foundation model for Mars remote sensing. Separate models are pretrained on HiRISE, CTX, and THEMIS imagery (resolutions 0.25–100 m/pix) using a ~12M-sample corpus; checkpoints are aligned via the proposed Equal Validation Loss (EVL) heuristic and fused by task arithmetic. The merged model is evaluated on 9 downstream tasks from Mars-Bench and is reported to outperform ImageNet-pretrained, Earth-observation, sensor-specific, and fully-supervised baselines, with the largest gains on segmentation.

Significance. If the empirical gains prove robust and the EVL alignment mechanism is shown to be more than a generic checkpoint-selection heuristic, the work would offer a practical route to multi-resolution foundation models for planetary remote sensing without requiring joint multi-sensor training. Public release of weights, pretraining code, data, and evaluation scripts strengthens reproducibility.

major comments (2)
  1. [§3.2] EVL checkpoint selection: Matching raw validation-loss scalars across sensors is asserted to place models at 'compatible convergence stages,' yet HiRISE (0.25 m/pix), CTX, and THEMIS (up to 100 m/pix) differ in input statistics, label distributions, and loss landscapes. No normalization of losses, CKA/representation-similarity analysis, or ablation against alternative selection heuristics is provided to show that the observed segmentation gains are attributable to the claimed mechanism rather than to any reasonable checkpoint choice.
  2. [§4] Experimental results: The abstract and summary claim 'better overall performance' and 'consistent and significant' gains on segmentation, but no numerical deltas, standard deviations, error bars, or statistical tests are referenced. Full tables must report exact metrics for all baselines (including how they were implemented and tuned) so that the central empirical claim can be verified.
minor comments (2)
  1. [§3.1] Notation for the task-arithmetic merge coefficients and the precise EVL loss-matching tolerance should be defined explicitly in §3.1.
  2. [Figures] Figure captions for the Mars-Bench task visualizations should state the exact resolution and sensor of each input example.
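The representation-similarity analysis requested in the first major comment is standard to run. A minimal linear-CKA sketch over feature matrices extracted from two checkpoints might look as follows; the probe-feature setup is an assumption for illustration, not something the paper describes.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between feature matrices X (n x d1) and Y (n x d2),
    where each row holds features for the same probe input passed
    through two different checkpoints."""
    X = X - X.mean(axis=0)   # center each feature dimension
    Y = Y - Y.mean(axis=0)
    # Frobenius-norm form of the HSIC numerator and its normalizers
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den         # 1.0 = identical up to rotation/scale
```

Values near 1 between EVL-selected checkpoints would support the claim that they land in compatible representation spaces; values near 0 would undercut it.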

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below and have revised the manuscript to incorporate additional analysis and reporting details where needed.

read point-by-point responses
  1. Referee: [§3.2] EVL checkpoint selection: Matching raw validation-loss scalars across sensors is asserted to place models at 'compatible convergence stages,' yet HiRISE (0.25 m/pix), CTX, and THEMIS (up to 100 m/pix) differ in input statistics, label distributions, and loss landscapes. No normalization of losses, CKA/representation-similarity analysis, or ablation against alternative selection heuristics is provided to show that the observed segmentation gains are attributable to the claimed mechanism rather than to any reasonable checkpoint choice.

    Authors: We agree that the EVL heuristic, as originally presented, relies on direct comparison of raw validation-loss values without cross-sensor normalization or similarity metrics, and that the manuscript lacks explicit ablations against other checkpoint-selection strategies. In the revised version we add a dedicated subsection in §3.2 that (i) normalizes each sensor’s validation loss by its value at the first checkpoint, (ii) reports CKA similarity between the selected checkpoints across sensors, and (iii) includes an ablation table comparing EVL against fixed-epoch selection and loss-threshold selection. The new results show that EVL-selected merges consistently outperform the alternatives on the segmentation tasks, providing empirical support for the mechanism beyond a generic checkpoint choice. revision: yes

  2. Referee: [§4] Experimental results: The abstract and summary claim 'better overall performance' and 'consistent and significant' gains on segmentation, but no numerical deltas, standard deviations, error bars, or statistical tests are referenced. Full tables must report exact metrics for all baselines (including how they were implemented and tuned) so that the central empirical claim can be verified.

    Authors: We acknowledge that the original abstract and §4 summary statements were qualitative and that the main tables did not include standard deviations or statistical tests. The revised manuscript updates the abstract to report explicit average deltas (e.g., +3.2 mIoU on segmentation), augments all tables in §4 with mean ± std across three random seeds, adds error bars to the corresponding figures, and includes a new paragraph detailing baseline implementation details and hyper-parameter search ranges. Paired t-tests with p-values are now reported for the key segmentation comparisons to substantiate the claim of significant gains. revision: yes
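The paired test described in the response is a one-liner with SciPy. The per-seed scores below are invented placeholders to show the shape of the comparison, not numbers from the paper.

```python
from scipy import stats

# Hypothetical per-seed mIoU on one segmentation task (three seeds, matched
# between models); real values would come from the revised tables in §4.
momo_miou     = [61.2, 60.8, 61.5]
baseline_miou = [58.1, 57.9, 58.4]

# Paired t-test across seeds: pairing removes seed-level variance,
# which matters with only three runs per model.
t_stat, p_value = stats.ttest_rel(momo_miou, baseline_miou)
```

With only three seeds the pairing is doing most of the work; reporting the raw per-seed deltas alongside the p-value would be more convincing than the test alone.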

Circularity Check

0 steps flagged

No significant circularity; empirical method with no self-referential derivations

full rationale

The MOMO paper introduces an empirical pipeline: independent sensor-specific pretraining followed by Equal Validation Loss (EVL) checkpoint alignment and task-arithmetic fusion. No equations, fitted parameters, or uniqueness theorems are presented that reduce by construction to the authors' own inputs or prior self-citations. Performance claims on Mars-Bench segmentation and other tasks rest on direct training/evaluation comparisons against baselines, not on any quantity defined in terms of the EVL scalars themselves. The EVL heuristic is a proposed selection rule whose validity is tested experimentally rather than assumed tautologically. No load-bearing self-citation chains or ansatz smuggling appear in the described method.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are stated. The EVL strategy is a procedural choice rather than a fitted constant or new physical entity.

pith-pipeline@v0.9.0 · 5558 in / 1188 out tokens · 35608 ms · 2026-05-13T20:44:48.766333+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

80 extracted references · 80 canonical work pages · 3 internal anchors

  1. [1]

    Git re-basin: Merging models modulo permutation symmetries

    Samuel Ainsworth, Jonathan Hayase, and Siddhartha Srinivasa. Git re-basin: Merging models modulo permutation symmetries. In The Eleventh International Conference on Learning Representations, 2023.

  2. [2]

    Git re-basin: Merging models modulo permutation symmetries. arXiv preprint arXiv:2209.04836, 2022

    Samuel K Ainsworth, Jonathan Hayase, and Siddhartha Srinivasa. Git re-basin: Merging models modulo permutation symmetries. arXiv preprint arXiv:2209.04836, 2022.

  3. [3]

    A General Language Assistant as a Laboratory for Alignment

    Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, Tom Henighan, Andy Jones, Nicholas Joseph, Ben Mann, Nova DasSarma, et al. A general language assistant as a laboratory for alignment. arXiv preprint arXiv:2112.00861, 2021.

  4. [4]

    AnySat: An earth observation model for any resolutions, scales, and modalities. arXiv preprint arXiv:2412.14123, 2024

    Guillaume Astruc, Nicolas Gonthier, Clement Mallet, and Loic Landrieu. AnySat: An earth observation model for any resolutions, scales, and modalities. arXiv preprint arXiv:2412.14123, 2024.

  5. [5]

    Satlaspretrain: A large-scale dataset for remote sensing image understanding

    Favyen Bastani, Piper Wolters, Ritwik Gupta, Joe Ferdinando, and Aniruddha Kembhavi. Satlaspretrain: A large-scale dataset for remote sensing image understanding. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 16772–16782, 2023.

  6. [6]

    An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36:105–139, 1999

    Eric Bauer and Ron Kohavi. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36:105–139, 1999.

  7. [7]

    Calibration and performance of the mars reconnaissance orbiter context camera (ctx). International Journal of Mars Science and Exploration, 8:1–14, 2013

    JF Bell III, MC Malin, MA Caplinger, J Fahle, MJ Wolff, BA Cantor, PB James, T Ghaemi, LV Posiolova, MA Ravine, et al. Calibration and performance of the mars reconnaissance orbiter context camera (ctx). International Journal of Mars Science and Exploration, 8:1–14, 2013.

  8. [8]

    SpectralEarth: Training hyperspectral foundation models at scale

    Nassim Ait Ali Braham, Conrad M Albrecht, Julien Mairal, Jocelyn Chanussot, Yi Wang, and Xiao Xiang Zhu. SpectralEarth: Training hyperspectral foundation models at scale. arXiv preprint arXiv:2408.08447, 2024.

  9. [9]

    Deploying geospatial foundation models in the real world: Lessons from worldcereal. arXiv preprint arXiv:2508.00858, 2025

    Christina Butsko, Kristof Van Tricht, Gabriel Tseng, Giorgia Milli, David Rolnick, Ruben Cartuyvels, Inbal Becker Reshef, Zoltan Szantoi, and Hannah Kerner. Deploying geospatial foundation models in the real world: Lessons from worldcereal. arXiv preprint arXiv:2508.00858, 2025.

  10. [10]

    The Bruce Murray Laboratory for Planetary Visualization. http://murray-lab.caltech.edu/CTX/

    California Institute of Technology - Division of Geological and Planetary Sciences. The Bruce Murray Laboratory for Planetary Visualization. http://murray-lab.caltech.edu/CTX/.

  11. [11]

    Mars odyssey thermal emission imaging system infrared reduced data record

    PR Christensen, NS Gorelick, GL Mehall, and KC Murray. Mars odyssey thermal emission imaging system infrared reduced data record. Technical report, ODY-M-THM-5-IRRDR-V1.0. [Dataset]. NASA Planetary Data System. https://pds . . . , 2001.

  12. [12]

    The thermal emission imaging system (themis) for the mars 2001 odyssey mission. Space Science Reviews, 110(1):85–130, 2004

    Philip R Christensen, Bruce M Jakosky, Hugh H Kieffer, Michael C Malin, Harry Y McSween Jr, Kenneth Nealson, Greg L Mehall, Steven H Silverman, Steven Ferry, Michael Caplinger, et al. The thermal emission imaging system (themis) for the mars 2001 odyssey mission. Space Science Reviews, 110(1):85–130, 2004.

  13. [13]

    P. R. Christensen, E. Engle, S. Anwar, S. Dickenshied, D. Noss, N. Gorelick, and M. Weiss-Malik. Jmars – a planetary gis. http://adsabs.harvard.edu/abs/2009AGUFMIN22A..06C, 2009. NASA/JPL-Caltech/Arizona State University.

  14. [14]

    SatMAE: Pre-training transformers for temporal and multi-spectral satellite imagery. Advances in Neural Information Processing Systems, 35:197–211, 2022

    Yezhen Cong, Samar Khanna, Chenlin Meng, Patrick Liu, Erik Rozi, Yutong He, Marshall Burke, David Lobell, and Stefano Ermon. SatMAE: Pre-training transformers for temporal and multi-spectral satellite imagery. Advances in Neural Information Processing Systems, 35:197–211, 2022.

  15. [15]

    A global, blended ctx mosaic of mars with vectorized seam mapping: A new mosaicking pipeline using principles of non-destructive image editing

    JL Dickson, LA Kerber, CI Fassett, and BL Ehlmann. A global, blended ctx mosaic of mars with vectorized seam mapping: A new mosaicking pipeline using principles of non-destructive image editing. In Lunar and Planetary Science Conference, pages 1–2. Lunar and Planetary Institute, The Woodlands, TX, USA, 2018.

  16. [16]

    Release of the global ctx mosaic of mars: An experiment in information-preserving image data processing

    JL Dickson, BL Ehlmann, LH Kerber, and CI Fassett. Release of the global ctx mosaic of mars: An experiment in information-preserving image data processing. In 54th Lunar and Planetary Science Conference, pages 1–2, 2023.

  17. [17]

    Sharpness-aware minimization for efficiently improving generalization. arXiv preprint arXiv:2010.01412

    Pierre Foret, Ariel Kleiner, Hossein Mobahi, and Behnam Neyshabur. Sharpness-aware minimization for efficiently improving generalization. arXiv preprint arXiv:2010.01412.

  18. [18]

    A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997

    Yoav Freund and Robert E Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.

  19. [19]

    CROMA: Remote sensing representations with contrastive radar-optical masked autoencoders. Advances in Neural Information Processing Systems, 36, 2024

    Anthony Fuller, Koreen Millard, and James Green. CROMA: Remote sensing representations with contrastive radar-optical masked autoencoders. Advances in Neural Information Processing Systems, 36, 2024.

  20. [20]

    Task singular vectors: Reducing task interference in model merging

    Antonio Andrea Gargiulo, Donato Crisostomi, Maria Sofia Bucarelli, Simone Scardapane, Fabrizio Silvestri, and Emanuele Rodola. Task singular vectors: Reducing task interference in model merging. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 18695–18705, 2025.

  21. [21]

    Loss surfaces, mode connectivity, and fast ensembling of dnns. Advances in Neural Information Processing Systems, 31, 2018

    Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry P Vetrov, and Andrew G Wilson. Loss surfaces, mode connectivity, and fast ensembling of dnns. Advances in Neural Information Processing Systems, 31, 2018.

  22. [22]

    Model patching: Closing the subgroup performance gap with data augmentation. arXiv preprint arXiv:2008.06775, 2020

    Karan Goel, Albert Gu, Yixuan Li, and Christopher Ré. Model patching: Closing the subgroup performance gap with data augmentation. arXiv preprint arXiv:2008.06775, 2020.

  23. [23]

    The HEALPix Primer

    Krzysztof M Gorski, Benjamin D Wandelt, Frode K Hansen, Eric Hivon, and Anthony J Banday. The HEALPix primer. arXiv preprint astro-ph/9905275, 1999.

  24. [24]

    Stochastic weight averaging revisited. Applied Sciences, 13(5):2935, 2023

    Hao Guo, Jiyong Jin, and Bin Liu. Stochastic weight averaging revisited. Applied Sciences, 13(5):2935, 2023.

  25. [25]

    Skysense: A multi-modal remote sensing foundation model towards universal interpretation for earth observation imagery

    Xin Guo, Jiangwei Lao, Bo Dang, Yingying Zhang, Lei Yu, Lixiang Ru, Liheng Zhong, Ziyuan Huang, Kang Wu, Dingxiang Hu, Huimei He, Jian Wang, Jingdong Chen, Ming Yang, Yongjun Zhang, and Yansheng Li. Skysense: A multi-modal remote sensing foundation model towards universal interpretation for earth observation imagery. In Proceedings of the IEEE/CVF Confere...

  26. [26]

    Bridging remote sensors with multisensor geospatial foundation models

    Boran Han, Shuai Zhang, Xingjian Shi, and Markus Reichstein. Bridging remote sensors with multisensor geospatial foundation models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 27852–27862, 2024.

  27. [27]

    Masked autoencoders are scalable vision learners

    Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16000–16009, 2022.

  28. [28]

    SpectralGPT: Spectral remote sensing foundation model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

    Danfeng Hong, Bing Zhang, Xuyang Li, Yuxuan Li, Chenyu Li, Jing Yao, Naoto Yokoya, Hao Li, Pedram Ghamisi, Xiuping Jia, et al. SpectralGPT: Spectral remote sensing foundation model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.

  29. [29]

    A survey on remote sensing foundation models: From vision to multimodality. arXiv preprint arXiv:2503.22081, 2025

    Ziyue Huang, Hongxi Yan, Qiqi Zhan, Shuai Yang, Mingming Zhang, Chenkai Zhang, YiMing Lei, Zeming Liu, Qingjie Liu, and Yunhong Wang. A survey on remote sensing foundation models: From vision to multimodality. arXiv preprint arXiv:2503.22081, 2025.

  30. [30]

    Editing Models with Task Arithmetic

    Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, and Ali Farhadi. Editing models with task arithmetic. arXiv preprint arXiv:2212.04089, 2022.

  31. [31]

    Patching open-vocabulary models by interpolating weights. Advances in Neural Information Processing Systems, 35:29262–29277, 2022

    Gabriel Ilharco, Mitchell Wortsman, Samir Yitzhak Gadre, Shuran Song, Hannaneh Hajishirzi, Simon Kornblith, Ali Farhadi, and Ludwig Schmidt. Patching open-vocabulary models by interpolating weights. Advances in Neural Information Processing Systems, 35:29262–29277, 2022.

  32. [32]

    Averaging weights leads to wider optima and better generalization. arXiv preprint arXiv:1803.05407, 2018

    Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov, and Andrew Gordon Wilson. Averaging weights leads to wider optima and better generalization. arXiv preprint arXiv:1803.05407, 2018.

  33. [33]

    Douglas M. Jennewein, Johnathan Lee, Chris Kurtz, William Dizon, Ian Shaeffer, Alan Chapman, Alejandro Chiquete, Josh Burks, Amber Carlson, Natalie Mason, Arhat Kobawala, Thirugnanam Jagadeesan, Praful Bhargav Basani, Torey Battelle, Rebecca Belshe, Deb McCaffrey, Marisa Brazil, Chaitanya Inumella, Kirby Kuznia, Jade Buzinski, Dhruvil Deepakbhai Shah, S...

  34. [34]

    A robust end-to-end deep learning framework for detecting martian landforms with arbitrary orientations. Knowledge-Based Systems, 234:107562, 2021

    Shancheng Jiang, Fan Wu, Kai-Leung Yung, Yingqiao Yang, WH Ip, Ming Gao, and James Abbott Foster. A robust end-to-end deep learning framework for detecting martian landforms with arbitrary orientations. Knowledge-Based Systems, 234:107562, 2021.

  35. [35]

    Stop wasting my time! saving days of imagenet and bert training with latest weight averaging. arXiv preprint arXiv:2209.14981, 2022

    Jean Kaddour. Stop wasting my time! saving days of imagenet and bert training with latest weight averaging. arXiv preprint arXiv:2209.14981, 2022.

  36. [36]

    Hannah Rae Kerner, Kiri L Wagstaff, Brian D Bue, Patrick C Gray, James F Bell, and Heni Ben Amor. Toward generalized change detection on planetary surfaces with convolutional autoencoders and transfer learning. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 12(10):3900–3918, 2019.

  37. [37]

    SatCLIP: Global, general- purpose location embeddings with satellite imagery.arXiv preprint arXiv:2311.17179, 2023

    Konstantin Klemmer, Esther Rolf, Caleb Robinson, Lester Mackey, and Marc Rußwurm. SatCLIP: Global, general- purpose location embeddings with satellite imagery.arXiv preprint arXiv:2311.17179, 2023. 2

  38. [38]

    Simple and scalable predictive uncertainty estima- tion using deep ensembles.Advances in neural information processing systems, 30, 2017

    Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estima- tion using deep ensembles.Advances in neural information processing systems, 30, 2017. 2

  39. [39]

    Margaret Li, Suchin Gururangan, Tim Dettmers, Mike Lewis, Tim Althoff, Noah A Smith, and Luke Zettlemoyer. Branch-train-merge: Embarrassingly parallel training of expert language models. arXiv preprint arXiv:2208.03306, 2022.

  40. [40]

    Tao Li, Zhehao Huang, Yingwen Wu, Zhengbao He, Qinghua Tao, Xiaolin Huang, and Chih-Jen Lin. Trainable weight averaging: A general approach for subspace training. arXiv preprint arXiv:2205.13104, 2022.

  41. [41]

    Ce Liu, William T Freeman, Richard Szeliski, and Sing Bing Kang. Noise estimation from a single image. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), pages 901–908. IEEE, 2006.

  42. [42]

    Siqi Lu, Junlin Guo, James R Zimmer-Dauphinee, Jordan M Nieusma, Xiao Wang, Steven A Wernke, Yuankai Huo, et al. Vision foundation models in remote sensing: A survey. IEEE Geoscience and Remote Sensing Magazine, 2025.

  43. [43]

    Cheng Ma, Yongming Rao, Yean Cheng, Ce Chen, Jiwen Lu, and Jie Zhou. Structure-preserving super resolution with gradient guidance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7769–7778, 2020.

  44. [44]

    Michael C Malin, James F Bell III, Bruce A Cantor, Michael A Caplinger, Wendy M Calvin, R Todd Clancy, Kenneth S Edgett, Lawrence Edwards, Robert M Haberle, Philip B James, et al. Context camera investigation on board the Mars Reconnaissance Orbiter. Journal of Geophysical Research: Planets, 112(E5), 2007.

  45. [45]

    Oscar Mañas, Alexandre Lacoste, Xavier Giró-i-Nieto, David Vazquez, and Pau Rodriguez. Seasonal contrast: Unsupervised pre-training from uncurated remote sensing data. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9414–9423, 2021.

  46. [46]

    Alfred S McEwen, Eric M Eliason, James W Bergstrom, Nathan T Bridges, Candice J Hansen, W Alan Delamere, John A Grant, Virginia C Gulick, Kenneth E Herkenhoff, Laszlo Keszthelyi, et al. Mars Reconnaissance Orbiter's High Resolution Imaging Science Experiment (HiRISE). Journal of Geophysical Research: Planets, 112(E5), 2007.

  47. [47]

    Alfred S McEwen, Shane Byrne, C Hansen, Ingrid J Daubar, Sarah Sutton, Colin M Dundas, Nicole Bardabelias, Nicole Baugh, J Bergstrom, R Beyer, et al. The High Resolution Imaging Science Experiment (HiRISE) in the MRO extended science phases (2009–2023). Icarus, 419:115795, 2024.

  48. [48]

    Hector Mendoza, Aaron Klein, Matthias Feurer, Jost Tobias Springenberg, and Frank Hutter. Towards automatically-tuned neural networks. In Workshop on Automatic Machine Learning, pages 58–65. PMLR, 2016.

  49. [49]

    Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, and Christopher D Manning. Fast model editing at scale. arXiv preprint arXiv:2110.11309, 2021.

  50. [50]

    Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D Manning, and Chelsea Finn. Memory-based model editing at scale. In International Conference on Machine Learning, pages 15817–15831. PMLR, 2022.

  51. [51]

    Shikhar Murty, Christopher D Manning, Scott Lundberg, and Marco Tulio Ribeiro. Fixing model bugs with natural language patches. arXiv preprint arXiv:2211.03318, 2022.

  52. [52]

    Jim Nilsson and Tomas Akenine-Möller. Understanding SSIM. arXiv preprint arXiv:2006.13846, 2020.

  53. [53]

    Mubashir Noman, Muzammal Naseer, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, and Fahad Shahbaz Khan. Rethinking transformers pre-training for multi-spectral satellite imagery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 27811–27819, 2024.

  54. [54]

    Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744, 2022.

  55. [55]

    Yaniv Ovadia, Emily Fertig, Jie Ren, Zachary Nado, David Sculley, Sebastian Nowozin, Joshua Dillon, Balaji Lakshminarayanan, and Jasper Snoek. Can you trust your model's uncertainty? Evaluating predictive uncertainty under dataset shift. Advances in Neural Information Processing Systems, 32, 2019.

  56. [56]

    Elena Plekhanova, Damien Robert, Johannes Dollinger, Emilia Arens, Philipp Brun, Jan Dirk Wegner, and Niklaus E. Zimmermann. SSL4Eco: A global seasonal dataset for geospatial foundation models in ecology. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 2428–2439, 2025.

  57. [57]

    Mirali Purohit, Jacob Adler, and Hannah Kerner. ConeQuest: A benchmark for cone segmentation on Mars. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 6026–6035, 2024.

  58. [58]

    MV Purohit, S Lu, S Diniega, UD Rebbapragada, and HR Kerner. Investigating the benefits of foundation models for Mars science. LPI Contributions, 3007:3535, 2024.

  59. [59]

    Mirali Purohit, Bimal Gajera, Vatsal Malaviya, Irish Mehta, Kunal Sunil Kasodekar, Jacob Adler, Steven Lu, Umaa Rebbapragada, and Hannah Kerner. Mars-Bench: A benchmark for evaluating foundation models for Mars science tasks. In The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2025.

  60. [60]

    Mirali Purohit, Gedeon Muhawenayo, Esther Rolf, and Hannah Kerner. How does the spatial distribution of pre-training data affect geospatial foundation models? In Workshop on Preparing Good Data for Generative AI: Challenges and Approaches, 2025.

  61. [61]

    Xingyu Qu. Rethinking model re-basin and linear mode connectivity. 2024.

  62. [62]

    Colorado J Reed, Ritwik Gupta, Shufan Li, Sarah Brockman, Christopher Funk, Brian Clipp, Kurt Keutzer, Salvatore Candido, Matt Uyttendaele, and Trevor Darrell. Scale-MAE: A scale-aware masked autoencoder for multiscale geospatial representation learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4088–4099, 2023.

  63. [63]

    Esther Rolf, Konstantin Klemmer, Caleb Robinson, and Hannah Kerner. Position: Mission critical–satellite data is a distinct modality in machine learning. In Forty-first International Conference on Machine Learning, 2024.

  64. [64]

    David Rolnick, Alan Aspuru-Guzik, Sara Beery, Bistra Dilkina, Priya L Donti, Marzyeh Ghassemi, Hannah Kerner, Claire Monteleoni, Esther Rolf, Milind Tambe, et al. Application-driven innovation in machine learning. arXiv preprint arXiv:2403.17381, 2024.

  65. [65]

    Shibani Santurkar, Dimitris Tsipras, Mahalaxmi Elango, David Bau, Antonio Torralba, and Aleksander Madry. Editing a classifier by rewriting its prediction rules. Advances in Neural Information Processing Systems, 34:23359–23373, 2021.

  66. [66]

    George Stoica, Daniel Bolya, Jakob Bjorner, Pratik Ramesh, Taylor Hearn, and Judy Hoffman. ZipIt! Merging models from different tasks without training. arXiv preprint arXiv:2305.03053, 2023.

  67. [67]

    Yi-Lin Sung, Varun Nair, and Colin A Raffel. Training neural networks with fixed sparse masks. Advances in Neural Information Processing Systems, 34:24193–24205, 2021.

  68. [68]

    Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015.

  69. [69]

    Kenneth L Tanaka, James A Skinner, James M Dohm, Rossman P Irwin, Eric J Kolb, Corey M Fortezzo, Thomas Platz, Gregory G Michael, and Trent M Hare. Geologic map of Mars. Astrogeology Research Program (USGS), 2014.

  70. [70]

    Alexander Theus, Alessandro Cabodi, Sotiris Anagnostidis, Antonio Orvieto, Sidak Pal Singh, and Valentina Boeva. Generalized linear mode connectivity for transformers. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.

  71. [71]

    Gabriel Tseng, Ruben Cartuyvels, Ivan Zvonkov, Mirali Purohit, David Rolnick, and Hannah Kerner. Lightweight, pre-trained transformers for remote sensing timeseries. arXiv preprint arXiv:2304.14065, 2023.

  72. [72]

    Gabriel Tseng, Anthony Fuller, Marlena Reil, Henry Herzog, Patrick Beukema, Favyen Bastani, James R Green, Evan Shelhamer, Hannah Kerner, and David Rolnick. Galileo: Learning global and local features in pretrained remote sensing models. arXiv preprint arXiv:2502.09356, 2025.

  73. [73]

    Vicente Vivanco Cepeda, Gaurav Kumar Nayak, and Mubarak Shah. GeoCLIP: CLIP-inspired alignment between locations and images for effective worldwide geo-localization. Advances in Neural Information Processing Systems, 36:8690–8701, 2023.

  74. [74]

    Kiri Wagstaff, You Lu, Alice Stanboli, Kevin Grimes, Thamme Gowda, and Jordan Padams. Deep Mars: CNN classification of Mars imagery for the PDS imaging atlas. In Proceedings of the AAAI Conference on Artificial Intelligence, 2018.

  75. [75]

    Kiri Wagstaff, Steven Lu, Emily Dunkel, Kevin Grimes, Brandon Zhao, Jesse Cai, Shoshanna B Cole, Gary Doran, Raymond Francis, Jake Lee, et al. Mars image content classification: Three years of NASA deployment and recent advances. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 15204–15213, 2021.

  76. [76]

    Florian Wenzel, Jasper Snoek, Dustin Tran, and Rodolphe Jenatton. Hyperparameter ensembles for robustness and uncertainty quantification. Advances in Neural Information Processing Systems, 33:6514–6527, 2020.

  77. [77]

    Mitchell Wortsman, Gabriel Ilharco, Samir Ya Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, et al. Model soups: Averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. In International Conference on Machine Learning, pages 23965–...

  78. [78]

    Binchi Zhang, Zaiyi Zheng, Zhengzhang Chen, and Jundong Li. Beyond the permutation symmetry of transformers: The role of rotation for model fusion. In Forty-second International Conference on Machine Learning, 2025.

  79. [79]

    Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018.

A. Data Overview

A.1. Pre-training Data Details

Figure 4. Example of a HiRISE map-projected image u...

    This system, developed using machine learning classification techniques by Wagstaff et al. [75], enables researchers to efficiently identify images relevant to their investigations.

    Per-class scores:

              Bright dune  Crater  Dark dune  Impact ejecta  Other  Slope streak  Spider  Swiss cheese  Macro avg
    PDS       0.86         0.79    0.87       0.30           0.96   0.67          0.04    0.94          0.68
    MOMO      0.90         0.75    0.91       0.40           0.96   0.78          0.0...
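Assuming the "Macro Avg" column is the unweighted mean of the eight per-class scores (a common convention; the class names and PDS values below are taken directly from the table, while the dictionary layout is purely illustrative), the PDS row's macro average can be reproduced with a short sketch:

```python
# Per-class scores for the PDS baseline, copied from the table above.
pds_scores = {
    "bright dune": 0.86, "crater": 0.79, "dark dune": 0.87,
    "impact ejecta": 0.30, "other": 0.96, "slope streak": 0.67,
    "spider": 0.04, "swiss cheese": 0.94,
}

# Macro average: unweighted mean over classes (assumed definition).
macro_avg = sum(pds_scores.values()) / len(pds_scores)
print(f"{macro_avg:.2f}")  # matches the table's 0.68 within rounding
```

Note that the macro average weights every class equally, so the very low "spider" score (0.04) pulls it down regardless of how rare that class is in the dataset.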