pith. machine review for the scientific record. sign in

arxiv: 2604.21127 · v1 · submitted 2026-04-22 · 💻 cs.CV

Recognition: unknown

HyperFM: An Efficient Hyperspectral Foundation Model with Spectral Grouping

Authors on Pith no claims yet

Pith reviewed 2026-05-09 23:52 UTC · model grok-4.3

classification 💻 cs.CV
keywords hyperspectral imagingfoundation modelsspectral attentioncloud property retrievalPACE missionparameter-efficient modelsremote sensingatmospheric retrieval
0
0 comments X

The pith

HyperFM uses spectral grouping with intra- and inter-group attention plus hybrid parameter decomposition to build an efficient foundation model that improves cloud property retrieval from PACE hyperspectral data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces HyperFM to process the high-volume, finely banded spectral observations from NASA's PACE mission, which captures ocean color, aerosols, and clouds but creates data too large and complex for standard models. Existing foundation models trained on RGB images or limited hyperspectral sets fail to handle continuous spectral signatures and often require heavy parameters or cloud-free training data. HyperFM groups spectral bands, applies attention within and across groups, and uses hybrid parameter decomposition to model spectral-spatial relationships more efficiently while lowering computational cost. This design yields measurable gains on four downstream atmospheric cloud property retrieval tasks compared with prior hyperspectral foundation models and task-specific methods. The authors also release the HyperFM250K dataset covering both clear and cloudy PACE scenes to support broader work.

Core claim

HyperFM is a parameter-efficient hyperspectral foundation model that leverages intra-group and inter-group spectral attention along with hybrid parameter decomposition to capture complex spectral-spatial relationships in PACE observations. It delivers consistent performance improvements over existing hyperspectral foundation models and task-specific state-of-the-art methods across four benchmark downstream atmospheric cloud property retrieval tasks while supporting both clear and cloudy scenes.

What carries the argument

Intra-group and inter-group spectral attention with hybrid parameter decomposition, which partitions the spectrum into groups to model local and global dependencies while keeping parameter count low.

If this is right

  • Consistent gains on cloud microphysics and related atmospheric retrievals from full-spectrum PACE data.
  • Lower parameter count and faster inference than prior hyperspectral foundation models, enabling operational use.
  • Handling of both clear-sky and cloudy scenes within a single model.
  • Release of the HyperFM250K dataset for training or fine-tuning additional models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The grouping strategy might extend to other instruments whose band counts differ from PACE, provided the intra- and inter-group logic is re-tuned.
  • Reduced compute demand could support on-board or near-real-time processing of satellite streams for air-quality alerts.
  • If the efficiency holds, similar decomposition patterns may apply to other high-dimensional remote-sensing modalities such as multi-temporal stacks.

Load-bearing premise

The combination of spectral grouping, intra- and inter-group attention, and hybrid decomposition will capture the needed relationships in hyperspectral data without overfitting to PACE or requiring large labeled sets.

What would settle it

Evaluating HyperFM on hyperspectral observations from a different satellite sensor or on a retrieval task outside the four cloud-property benchmarks and finding no improvement over current baselines would show the claimed gains do not hold.

Figures

Figures reproduced from arXiv: 2604.21127 by Sanjay Purushotham, Zahid Hassan Tushar.

Figure 1
Figure 1. Figure 1: Left Column: PACE Level 1B radiance observations at 659.6nm (top) and 2130.6nm (bottom); Middle Column: PACE Level 2 products COT (top) and CER (bottom); Right Column: PACE Level 2 products CWP (top) and CTH (bottom) [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 4
Figure 4. Figure 4: Hypoformer block, which replaces standard QKV atten [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗
Figure 3
Figure 3. Figure 3: Group Embed module with local group attention (LGA) [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: Lightweight Decoder for downstream evaluation. [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Comparison of Hyperspectral FMs on four pixel-wise regression tasks: cloud optical thickness(COT), cloud effective radius [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Geographical scatter plot of HyperFM250k. The map depicts the global coverage of hyperspectral images within our dataset, demonstrating its extensive geographical scope [PITH_FULL_IMAGE:figures/full_fig_p013_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Comparison of Hyperspectral FMs on four pixel-wise regression tasks: cloud optical thickness(COT), cloud effective radius [PITH_FULL_IMAGE:figures/full_fig_p014_8.png] view at source ↗
read the original abstract

The NASA PACE mission provides unprecedented hyperspectral observations of ocean color, aerosols, and clouds, offering new insights into how these components interact and influence Earth's climate and air quality. Its Ocean Color Instrument measures light across hundreds of finely spaced wavelength bands, enabling detailed characterization of features such as phytoplankton composition, aerosol properties, and cloud microphysics. However, hyperspectral data of this scale is large, complex, and difficult to label, requiring specialized processing and analysis techniques. Existing foundation models, which have transformed computer vision and natural language processing, are generally trained on standard RGB imagery and therefore struggle to interpret the continuous spectral signatures captured by PACE. While recent advances have introduced hyperspectral foundation models, they are typically trained on cloud-free observations and often remain limited to single-sensor datasets due to spectral inconsistencies across instruments. Moreover, existing models tend to be parameter-heavy and computationally expensive, limiting scalability and adoption in operational settings. To address these challenges, we introduce HyperFM, a parameter-efficient hyperspectral foundation model that leverages intra-group and inter-group spectral attention along with hybrid parameter decomposition to better capture spectral spatial relationships while reducing computational cost. HyperFM demonstrates consistent performance improvements over existing hyperspectral foundation models and task-specific state-of-the-art methods across four benchmark downstream atmospheric cloud property retrieval tasks. To support further research, we additionally release HyperFM250K, a large-scale hyperspectral dataset from the PACE mission that includes both clear and cloudy scenes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces HyperFM, a parameter-efficient hyperspectral foundation model that uses intra-group and inter-group spectral attention combined with hybrid parameter decomposition to capture spectral-spatial relationships in large-scale PACE hyperspectral observations. It claims consistent performance improvements over prior hyperspectral foundation models and task-specific SOTA methods on four downstream atmospheric cloud property retrieval benchmarks, while also releasing the HyperFM250K dataset containing both clear and cloudy scenes from the PACE mission.

Significance. If the reported gains are shown to stem from the proposed architectural mechanisms rather than dataset differences, HyperFM could advance scalable hyperspectral modeling for Earth observation applications such as cloud microphysics retrieval from the PACE Ocean Color Instrument. The dataset release would further support community research on cloudy hyperspectral scenes.

major comments (3)
  1. [Abstract] Abstract: The central claim of 'consistent performance improvements' over existing hyperspectral foundation models and task-specific SOTA methods supplies no numerical metrics, error bars, baseline details, or experimental protocol for the four cloud property tasks, making it impossible to evaluate whether the data support the claim.
  2. [Experiments] Experiments section: The manuscript provides no evidence of controlled comparisons in which prior hyperspectral models are retrained or adapted on the new HyperFM250K dataset (which includes cloudy scenes absent from prior cloud-free training data); without such isolation or component ablations on intra-group/inter-group attention and hybrid decomposition, gains cannot be attributed to the architecture rather than distribution shift.
  3. [Model Architecture] Model description: The hybrid parameter decomposition and spectral grouping mechanisms are described at a high level without equations quantifying parameter reduction or computational cost relative to baselines, which is load-bearing for the efficiency claims that underpin the model's positioning as scalable for operational use.
minor comments (2)
  1. [Abstract] The abstract would be strengthened by briefly noting key quantitative results or at least the specific cloud property tasks (e.g., optical depth, effective radius) to allow readers to gauge the scope of the improvements.
  2. [Model Architecture] Notation for intra-group and inter-group attention should be defined more explicitly with reference to standard transformer attention formulations to improve clarity for readers unfamiliar with hyperspectral adaptations.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which has helped us strengthen the manuscript. We address each major comment point by point below, indicating revisions made to the next version.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim of 'consistent performance improvements' over existing hyperspectral foundation models and task-specific SOTA methods supplies no numerical metrics, error bars, baseline details, or experimental protocol for the four cloud property tasks, making it impossible to evaluate whether the data support the claim.

    Authors: We agree that the abstract would benefit from explicit quantitative support. In the revised manuscript, we have updated the abstract to report key metrics, including average relative improvements (with standard deviations) across the four cloud property retrieval tasks, the specific baselines used, and a brief reference to the evaluation protocol and dataset splits detailed in Section 4. Full tables with error bars remain in the experiments section. revision: yes

  2. Referee: [Experiments] Experiments section: The manuscript provides no evidence of controlled comparisons in which prior hyperspectral models are retrained or adapted on the new HyperFM250K dataset (which includes cloudy scenes absent from prior cloud-free training data); without such isolation or component ablations on intra-group/inter-group attention and hybrid decomposition, gains cannot be attributed to the architecture rather than distribution shift.

    Authors: This concern is valid and we have addressed it directly. The revised experiments section now includes (i) results for prior hyperspectral foundation models fine-tuned on HyperFM250K to control for dataset effects, and (ii) targeted ablations that isolate the contributions of intra-group attention, inter-group attention, and the hybrid parameter decomposition. These additions demonstrate that the architectural components yield measurable gains even after accounting for the inclusion of cloudy scenes. revision: yes

  3. Referee: [Model Architecture] Model description: The hybrid parameter decomposition and spectral grouping mechanisms are described at a high level without equations quantifying parameter reduction or computational cost relative to baselines, which is load-bearing for the efficiency claims that underpin the model's positioning as scalable for operational use.

    Authors: We acknowledge that the original description was insufficiently quantitative. The revised model section now provides the explicit mathematical formulations for spectral grouping (intra- and inter-group attention) and the hybrid decomposition (combining low-rank and grouped factors). We have also added a dedicated efficiency table reporting parameter counts, FLOPs, and inference latency relative to the main baselines, directly supporting the scalability claims. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical performance claims with no derivations or self-referential reductions

full rationale

The paper introduces HyperFM as an architectural innovation (intra-group/inter-group spectral attention plus hybrid decomposition) and reports empirical gains on four downstream cloud property tasks using the new HyperFM250K dataset. No equations, derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or structure. The central claim is a falsifiable empirical statement comparing model performance, not a mathematical reduction to its own inputs. The skeptic concern about dataset confounding is a validity issue, not circularity. This is a standard self-contained empirical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no equations or implementation details, so no free parameters, axioms, or invented entities can be identified with certainty. The architectural components (spectral grouping, intra/inter-group attention, hybrid decomposition) are presented as novel but their grounding is not specified.

pith-pipeline@v0.9.0 · 5559 in / 1203 out tokens · 38234 ms · 2026-05-09T23:52:38.599964+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Foundation AI Models for Aerosol Optical Depth Estimation from PACE Satellite Data

    cs.CV 2026-05 unverdicted novelty 7.0

    ViTCG, a channel-grouped Vision Transformer, retrieves AOD from PACE hyperspectral data with 62% lower MSE than prior foundation models while producing spatially coherent fields.

Reference graph

Works this paper leans on

61 extracted references · 5 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1]

    Self- supervised material and texture representation learning for remote sensing tasks

    Peri Akiva, Matthew Purri, and Matthew Leotta. Self- supervised material and texture representation learning for remote sensing tasks. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 8203–8215, 2022. 2

  2. [2]

    Foundation models defining a new era in vision: a survey and outlook

    Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, and Fahad Shahbaz Khan. Foundation models defining a new era in vision: a survey and outlook. IEEE Transactions on Pattern Analysis and Machine Intelli- gence, 2025. 2

  3. [3]

    Spec- tralearth: Training hyperspectral foundation models at scale

    Nassim Ait Ali Braham, Conrad M Albrecht, Julien Mairal, Jocelyn Chanussot, Yi Wang, and Xiao Xiang Zhu. Spec- tralearth: Training hyperspectral foundation models at scale. IEEE Journal of Selected Topics in Applied Earth Observa- tions and Remote Sensing, 2025. 1, 2, 3, 5, 6, 8, 4

  4. [4]

    Pace ocean color in- strument (oci) version 3.1 data products overview.https: / / pace

    NASA Goddard Space Flight Center. Pace ocean color in- strument (oci) version 3.1 data products overview.https: / / pace . oceansciences . org / access _ pace _ data.htm, 2024. Plankton, Aerosol, Cloud, ocean Ecosys- tem (PACE) Mission. 2

  5. [5]

    Functional map of the world

    Gordon Christie, Neil Fendley, James Wilson, and Ryan Mukherjee. Functional map of the world. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6172–6180, 2018. 1

  6. [6]

    Satmae: Pre-training transformers for tem- poral and multi-spectral satellite imagery.Advances in Neu- ral Information Processing Systems, 35:197–211, 2022

    Yezhen Cong, Samar Khanna, Chenlin Meng, Patrick Liu, Erik Rozi, Yutong He, Marshall Burke, David Lobell, and Stefano Ermon. Satmae: Pre-training transformers for tem- poral and multi-spectral satellite imagery.Advances in Neu- ral Information Processing Systems, 35:197–211, 2022. 1, 2

  7. [7]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    Alexey Dosovitskiy. An image is worth 16x16 words: Transformers for image recognition at scale.arXiv preprint arXiv:2010.11929, 2020. 1, 2, 4, 5

  8. [8]

    Hyspecnet- 11k: A large-scale hyperspectral dataset for benchmarking learning-based hyperspectral image compression methods

    Martin Hermann Paul Fuchs and Beg ¨um Demir. Hyspecnet- 11k: A large-scale hyperspectral dataset for benchmarking learning-based hyperspectral image compression methods. InIGARSS 2023-2023 IEEE International Geoscience and Remote Sensing Symposium, pages 1779–1782. IEEE, 2023. 5

  9. [9]

    Skysense: A multi-modal remote sens- ing foundation model towards universal interpretation for earth observation imagery

    Xin Guo, Jiangwei Lao, Bo Dang, Yingying Zhang, Lei Yu, Lixiang Ru, Liheng Zhong, Ziyuan Huang, Kang Wu, Dingxiang Hu, et al. Skysense: A multi-modal remote sens- ing foundation model towards universal interpretation for earth observation imagery. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 27672–27683, 2024. 2

  10. [10]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InProceed- ings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016. 2

  11. [11]

    Foundation model for advancing healthcare: Challenges, opportunities and future directions.IEEE Reviews in Biomedical Engi- neering, 2024

    Yuting He, Fuxiang Huang, Xinrui Jiang, Yuxiang Nie, Minghao Wang, Jiguang Wang, and Hao Chen. Foundation model for advancing healthcare: Challenges, opportunities and future directions.IEEE Reviews in Biomedical Engi- neering, 2024. 1

  12. [12]

    Patrick Helber, Benjamin Bischke, Andreas Dengel, and Damian Borth. Eurosat: A novel dataset and deep learning benchmark for land use and land cover classification.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 12(7):2217–2226, 2019. 1

  13. [13]

    Tensorized embedding layers

    Oleksii Hrinchuk, Valentin Khrulkov, Leyla Mirvakhabova, Elena Orlova, and Ivan Oseledets. Tensorized embedding layers. InFindings of the association for computational lin- guistics: EMNLP 2020, pages 4847–4860, 2020. 2

  14. [14]

    He Huang, Quan Wang, Chao Liu, and Chen Zhou. Optimal estimation of cloud properties from thermal infrared obser- vations with a combination of deep learning and radiative transfer simulation.Atmospheric Measurement Techniques, 17(24):7129–7141, 2024. 1

  15. [15]

    IPCC Official Website, 2024

    Intergovernmental Panel on Climate Change (IPCC). IPCC Official Website, 2024. Accessed: 2024-12-23. 1, 6, 2

  16. [16]

    Evaluation of a forward operator to assim- ilate cloud water path into wrf-dart.Monthly weather review, 141(7):2272–2289, 2013

    Thomas A Jones, David J Stensrud, Patrick Minnis, and Ra- bindra Palikonda. Evaluation of a forward operator to assim- ilate cloud water path into wrf-dart.Monthly weather review, 141(7):2272–2289, 2013. 6, 2

  17. [17]

    Science plan of the environmental map- ping and analysis program (enmap)

    Hermann Kaufmann, S F ¨orster, Hendrik Wulf, K Segl, Luis Guanter, M Bochow, U Heiden, A M ¨uller, W Heldens, T Schneiderhan, et al. Science plan of the environmental map- ping and analysis program (enmap). 2012. 2, 1

  18. [18]

    Convection di- agnosis and nowcasting for oceanic aviation applications

    Cathy Kessinger, Michael Donovan, Richard Bankert, Earle Williams, Jeffrey Hawkins, Huaqing Cai, Nancy Rehak, Daniel Megenhardt, and Matthias Steiner. Convection di- agnosis and nowcasting for oceanic aviation applications. In Remote Sensing Applications for Aviation Weather Hazard Detection and Decision Support, pages 77–88. SPIE, 2008. 6, 2

  19. [19]

    Transfer-learning-based approach to retrieve the cloud prop- erties using diverse remote sensing datasets.IEEE Transac- tions on Geoscience and Remote Sensing, 2023

    Jingwei Li, Feng Zhang, Wenwen Li, Xuan Tong, BaoXi- ang Pan, Jun Li, Han Lin, Husi Letu, and Frahan Mustafa. Transfer-learning-based approach to retrieve the cloud prop- erties using diverse remote sensing datasets.IEEE Transac- tions on Geoscience and Remote Sensing, 2023. 1, 2, 6, 8, 4

  20. [20]

    Hyperfree: A channel-adaptive and tuning-free foundation model for hyperspectral remote sens- ing imagery

    Jingtao Li, Yingyi Liu, Xinyu Wang, Yunning Peng, Chen Sun, Shaoyu Wang, Zhendong Sun, Tian Ke, Xiao Jiang, Tangwei Lu, et al. Hyperfree: A channel-adaptive and tuning-free foundation model for hyperspectral remote sens- ing imagery. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 23048–23058, 2025. 2, 6, 8, 3, 4

  21. [21]

    Hypoformer: Hybrid decomposition transformer for edge-friendly neural machine translation

    Sunzhu Li, Peng Zhang, Guobing Gan, Xiuqing Lv, Benyou Wang, Junqiu Wei, and Xin Jiang. Hypoformer: Hybrid decomposition transformer for edge-friendly neural machine translation. InProceedings of the 2022 conference on empir- ical methods in natural language processing, pages 7056– 7068, 2022. 2, 4, 5, 7

  22. [22]

    Wenwen Li, Feng Zhang, Bin Guo, Haoyang Fu, and Husi Letu. Physics-driven machine learning algorithm facilitates multilayer cloud property retrievals from geostationary pas- sive imager measurements.IEEE Transactions on Geo- science and Remote Sensing, 62:1–18, 2024. 1

  23. [23]

    S2mae: A spatial-spectral pretraining foundation model for spectral remote sensing data

    Xuyang Li, Danfeng Hong, and Jocelyn Chanussot. S2mae: A spatial-spectral pretraining foundation model for spectral remote sensing data. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 24088–24097, 2024. 2

  24. [24]

    Goddard Space Flight Center, 2002

    Rebecca Lindsey and David Herring.MODIS: Moderate Resolution Imaging Spectroradiometer: NASA’s Earth Ob- serving System. Goddard Space Flight Center, 2002. 2

  25. [25]

    Re- moteclip: A vision language foundation model for remote sensing.IEEE Transactions on Geoscience and Remote Sensing, 62:1–16, 2024

    Fan Liu, Delong Chen, Zhangqingyun Guan, Xiaocong Zhou, Jiale Zhu, Qiaolin Ye, Liyong Fu, and Jun Zhou. Re- moteclip: A vision language foundation model for remote sensing.IEEE Transactions on Geoscience and Remote Sensing, 62:1–16, 2024. 1

  26. [26]

    Enabling lightweight fine- tuning for pre-trained language model compression based on matrix product operators.arXiv preprint arXiv:2106.02205,

    Peiyu Liu, Ze-Feng Gao, Wayne Xin Zhao, Zhi-Yuan Xie, Zhong-Yi Lu, and Ji-Rong Wen. Enabling lightweight fine- tuning for pre-trained language model compression based on matrix product operators.arXiv preprint arXiv:2106.02205,

  27. [27]

    Swin transformer: Hierarchical vision transformer using shifted windows

    Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision, pages 10012–10022, 2021. 2

  28. [28]

    Determination of the optical thickness and effective particle radius of clouds from reflected solar radiation measurements

    Teruyuki Nakajima and Michael D King. Determination of the optical thickness and effective particle radius of clouds from reflected solar radiation measurements. part i: Theory. Journal of Atmospheric Sciences, 47(15):1878–1893, 1990. 1, 2

  29. [29]

    PACE Sci- ence Data Reprocessing Version 3.x Notes.https : / / oceancolor

    NASA Goddard Space Flight Center. PACE Sci- ence Data Reprocessing Version 3.x Notes.https : / / oceancolor . gsfc . nasa . gov / files / data / reprocessing/V3/PACE_Reprocessing_V3.x_ notes.pdf, 2025. Accessed: 2025-11-20. 1

  30. [30]

    Segmentation-based multi-pixel cloud optical thickness retrieval using a convolutional neural net- work.Atmospheric Measurement Techniques Discussions, pages 1–34, 2022

    Vikas Nataraja, Sebastian Schmidt, Hong Chen, Takanobu Yamaguchi, Jan Kazil, Graham Feingold, Kevin Wolf, and Hironobu Iwabuchi. Segmentation-based multi-pixel cloud optical thickness retrieval using a convolutional neural net- work.Atmospheric Measurement Techniques Discussions, pages 1–34, 2022. 1, 2, 6, 8, 4

  31. [31]

    Towards the copernicus hy- perspectral imaging mission for the environment (chime)

    Jens Nieke and Michael Rast. Towards the copernicus hy- perspectral imaging mission for the environment (chime). In Igarss 2018-2018 ieee international geoscience and remote sensing symposium, pages 157–159. IEEE, 2018. 2, 1

  32. [32]

    Compressing pre- trained language models by matrix decomposition

    Matan Ben Noach and Yoav Goldberg. Compressing pre- trained language models by matrix decomposition. InPro- ceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Pro- cessing, pages 884–889, 2020. 2

  33. [33]

    Rethinking transformers pre-training for multi- spectral satellite imagery

    Mubashir Noman, Muzammal Naseer, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, and Fahad Shah- baz Khan. Rethinking transformers pre-training for multi- spectral satellite imagery. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 27811–27819, 2024. 1, 2

  34. [34]

    Rintaro Okamura, Hironobu Iwabuchi, and K Sebastian Schmidt. Feasibility study of multi-pixel retrieval of opti- cal thickness and droplet effective radius of inhomogeneous clouds using deep learning.Atmospheric Measurement Tech- niques, 10(12):4747–4759, 2017. 1, 2, 6

  35. [35]

    The prisma hyperspectral mission: Science activi- ties and opportunities for agriculture and land monitoring

    Stefano Pignatti, Angelo Palombo, Simone Pascucci, Filom- ena Romano, Federico Santini, Tiziana Simoniello, Amato Umberto, Cuomo Vincenzo, Nicola Acito, Marco Diani, et al. The prisma hyperspectral mission: Science activi- ties and opportunities for agriculture and land monitoring. In2013 IEEE international geoscience and remote sensing symposium-IGARSS, ...

  36. [36]

    Modis atmosphere l2 cloud product (06 l2), nasa modis adaptive processing system, goddard space flight center.URL http://dx

    S Platnick, S Ackerman, M King, K Meyer, WP Men- zel, RE Holz, BA Baum, and P Yang. Modis atmosphere l2 cloud product (06 l2), nasa modis adaptive processing system, goddard space flight center.URL http://dx. doi. org/10.5067/MODIS/MOD06 L, 2, 2015. 3, 1

  37. [37]

    S Platnick, KG Meyer, P Hubanks, R Holz, SA Ackerman, and AK Heidinger. Viirs atmosphere l3 cloud properties product.Version-1.1, NASA Level-1 and Atmosphere Archive & Distribution System (LAADS) Distributed Active Archive Center (DAAC), Goddard Space Flight Center, 2019. 3, 1

  38. [38]

    Cloud retrievals from satellite data using optimal estimation: evaluation and application to atsr.Atmospheric Measurement Techniques, 5(8):1889–1910, 2012

    CA Poulsen, R Siddans, GE Thomas, AM Sayer, RG Grainger, E Campmany, SM Dean, C Arnold, and PD Watts. Cloud retrievals from satellite data using optimal estimation: evaluation and application to atsr.Atmospheric Measurement Techniques, 5(8):1889–1910, 2012. 2

  39. [39]

    Learning transferable visual models from natural language supervi- sion

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervi- sion. InInternational conference on machine learning, pages 8748–8763. PmLR, 2021. 1

  40. [40]

    Zero-shot text-to-image generation

    Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea V oss, Alec Radford, Mark Chen, and Ilya Sutskever. Zero-shot text-to-image generation. InInternational confer- ence on machine learning, pages 8821–8831. Pmlr, 2021. 1

  41. [41]

    Scale-mae: A scale-aware masked autoencoder for multiscale geospatial representation learning

    Colorado J Reed, Ritwik Gupta, Shufan Li, Sarah Brock- man, Christopher Funk, Brian Clipp, Kurt Keutzer, Salvatore Candido, Matt Uyttendaele, and Trevor Darrell. Scale-mae: A scale-aware masked autoencoder for multiscale geospatial representation learning. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 4088– 4099, 2023. 2, 6

  42. [42]

    Masked vision transformers for hyperspectral image classi- fication

    Linus Scheibenreif, Michael Mommert, and Damian Borth. Masked vision transformers for hyperspectral image classi- fication. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2166–2176,

  43. [43]

    Self-supervised learning of remote sensing scene representations using con- trastive multiview coding

    Vladan Stojnic and Vladimir Risojevic. Self-supervised learning of remote sensing scene representations using con- trastive multiview coding. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1182–1191, 2021. 2

  44. [44]

    Bigearthnet: A large-scale benchmark archive for remote sensing image understanding

    Gencer Sumbul, Marcela Charfuelan, Beg ¨um Demir, and V olker Markl. Bigearthnet: A large-scale benchmark archive for remote sensing image understanding. InIGARSS 2019- 2019 IEEE international geoscience and remote sensing symposium, pages 5901–5904. IEEE, 2019. 1

  45. [45]

    Rank and run-time aware compression of nlp applications.arXiv preprint arXiv:2010.03193, 2020

    Urmish Thakker, Jesse Beu, Dibakar Gope, Ganesh Dasika, and Matthew Mattina. Rank and run-time aware compression of nlp applications.arXiv preprint arXiv:2010.03193, 2020. 2

  46. [46]

    Maxvit: Multi-axis vision transformer

    Zhengzhong Tu, Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, Alan Bovik, and Yinxiao Li. Maxvit: Multi-axis vision transformer. InEuropean conference on computer vision, pages 459–479. Springer, 2022. 4, 5

  47. [47]

    Cloudunet: Adapt- ing unet for retrieving cloud properties

    Zahid Hassan Tushar, Adeleke Ademakinwa, Jianwu Wang, Zhibo Zhang, and Sanjay Purushotham. Cloudunet: Adapt- ing unet for retrieving cloud properties. InIGARSS 2024 IEEE International Geoscience and Remote Sensing Sympo- sium, pages 7163–7167. IEEE, 2024. 1, 2, 6, 8, 4

  48. [48]

    Joint retrieval of cloud properties using attention-based deep learning models

    Zahid Hassan Tushar, Adeleke Ademakinwa, Jianwu Wang, Zhibo Zhang, and Sanjay Purushotham. Joint retrieval of cloud properties using attention-based deep learning models. InIGARSS 2025-2025 IEEE International Geoscience and Remote Sensing Symposium, pages 4616–4621. IEEE, 2025. 1, 2, 6, 7, 8, 4

  49. [49]

    Hypersigma: Hyperspectral intelligence comprehen- sion foundation model.PAMI, 2025

    Di Wang, Meiqi Hu, Yao Jin, Yuchun Miao, Jiaqi Yang, Yichu Xu, Xiaolei Qin, Jiaqi Ma, Lingyu Sun, Chenxing Li, et al. Hypersigma: Hyperspectral intelligence comprehen- sion foundation model.PAMI, 2025. 1, 2, 5, 6, 8, 3, 4

  50. [50]

    Retrieval of cloud properties from thermal infrared radiometry using convolu- tional neural network.Remote Sensing of Environment, 278: 113079, 2022

    Quan Wang, Chen Zhou, Xiaoyong Zhuge, Chao Liu, Fuzhong Weng, and Minghuai Wang. Retrieval of cloud properties from thermal infrared radiometry using convolu- tional neural network.Remote Sensing of Environment, 278: 113079, 2022. 6

  51. [51]

    Yue Wang, Ming Wen, Hailiang Zhang, Jinyu Sun, Qiong Yang, Zhimin Zhang, and Hongmei Lu. Hsimae: A uni- fied masked autoencoder with large-scale pre-training for hy- perspectral image classification.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing,

  52. [52]

    Calipso mission: spaceborne lidar for observation of aerosols and clouds

    David M Winker, Jacques R Pelon, and M Patrick Mc- Cormick. Calipso mission: spaceborne lidar for observation of aerosols and clouds. InLidar remote sensing for industry and environment monitoring III, pages 1–11. SPIE, 2003. 2

  53. [53]

    Foundation models for remote sensing and earth observation: A survey

    Aoran Xiao, Weihao Xuan, Junjue Wang, Jiaxing Huang, Dacheng Tao, Shijian Lu, and Naoto Yokoya. Foundation models for remote sensing and earth observation: A survey. IEEE Geoscience and Remote Sensing Magazine, 2025. 2

  54. [54]

    A large-scale evaluation of speech foundation models.IEEE/ACM Trans- actions on Audio, Speech, and Language Processing, 32: 2884–2899, 2024

    Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, et al. A large-scale evaluation of speech foundation models.IEEE/ACM Trans- actions on Audio, Speech, and Language Processing, 32: 2884–2899, 2024. 1

  55. [55]

    Low-rank few-shot adaptation of vision-language models

    Maxime Zanella and Ismail Ben Ayed. Low-rank few-shot adaptation of vision-language models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1593–1603, 2024. 2

  56. [56]

    Opensarurban: A sentinel-1 sar image dataset for urban interpretation.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13:187–203, 2020

    Juanping Zhao, Zenghui Zhang, Wei Yao, Mihai Datcu, Huilin Xiong, and Wenxian Yu. Opensarurban: A sentinel-1 sar image dataset for urban interpretation.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13:187–203, 2020. 1

  57. [57]

    Influences of cloud microphysics on the components of solar irradiance in the wrf-solar model

    Xin Zhou, Yangang Liu, Yunpeng Shan, Satoshi Endo, Yu Xie, and Manajit Sengupta. Influences of cloud microphysics on the components of solar irradiance in the wrf-solar model. Atmosphere, 15(1):39, 2023. 6, 2

  58. [58]

    Mixture-of-experts with expert choice routing.Ad- vances in Neural Information Processing Systems, 35:7103– 7114, 2022

    Yanqi Zhou, Tao Lei, Hanxiao Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M Dai, Quoc V Le, James Laudon, et al. Mixture-of-experts with expert choice routing.Ad- vances in Neural Information Processing Systems, 35:7103– 7114, 2022. 5

  59. [59]

    Ar- gus: A compact and versatile foundation model for vision

    Weiming Zhuang, Chen Chen, Zhizhong Li, Sina Sajad- manesh, Jingtao Li, Jiabo Huang, Vikash Sehwag, Vivek Sharma, Hirotaka Shinozaki, Felan Carlo Garcia, et al. Ar- gus: A compact and versatile foundation model for vision. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 4418–4429, 2025. 2 HyperFM: An Efficient Hyperspectral...

  60. [60]

    Notable examples include En- MAP [17], PRISMA [35], and the forthcoming CHIME mission [31]

    Our HyperFM250k Dataset Hyperspectral imaging from space offers detailed spectral information about the Earth’s surface and atmosphere, and recent missions have significantly increased the volume and quality of available data. Notable examples include En- MAP [17], PRISMA [35], and the forthcoming CHIME mission [31]. These systems are optimized for land-f...

  61. [61]

    6 here which were excluded due to space limitation

    Additional Results We present additional results from Sec. 6 here which were excluded due to space limitation. We compared with an- other recent hyperspectral foundation model called Hyper- Free [20] by loading their ViT-base weights and adding the convolutional decoder as shown in Fig. 5. Note that we re- move theneckfrom HyperFree ViT-b encoder for fair...