pith. machine review for the scientific record.

arxiv: 2604.07092 · v2 · submitted 2026-04-08 · 💻 cs.CV

Recognition: no theorem link

Location Is All You Need: Continuous Spatiotemporal Neural Representations of Earth Observation Data


Pith reviewed 2026-05-10 18:31 UTC · model grok-4.3

classification 💻 cs.CV
keywords neural fields · Earth observation · satellite imagery · coordinate-based models · semantic segmentation · spatiotemporal representation · fine-tuning without data

The pith

A neural field pretrained on Earth observation data reconstructs satellite imagery from coordinates and adapts to tasks like segmentation using only labels, with no further access to the raw pixels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a neural representation that encodes multi-temporal satellite imagery for a specific region as a continuous function of space and time coordinates. Once this function is pretrained on the available imagery, it can be fine-tuned for downstream tasks such as semantic segmentation or regression by supplying only the task-specific labels. The approach eliminates the need for users to download, store, or preprocess the original large satellite datasets after the initial pretraining step. Performance on the adapted tasks is shown to remain competitive with models trained from scratch on the raw data or with established geospatial foundation models.

Core claim

LIANet models multi-temporal spaceborne Earth observation data for a given region of interest as a continuous spatiotemporal neural field. Given only spatial and temporal coordinates, LIANet reconstructs the corresponding satellite imagery. Once pretrained, this neural representation can be adapted to various EO downstream tasks, such as semantic segmentation or pixel-wise regression, without requiring access to the original satellite data. It serves as a user-friendly alternative to Geospatial Foundation Models by eliminating the overhead of data access and preprocessing for end-users and enabling fine-tuning solely based on labels.

What carries the argument

LIANet, a coordinate-based neural network that accepts spatial and temporal coordinates as input and outputs the corresponding pixel values to reconstruct satellite imagery as a continuous spatiotemporal field over a region.
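The coordinate-to-pixel mapping can be sketched in a few lines. This is an illustrative reconstruction of the idea described in the paper's Figure 2 (multi-resolution feature grids queried by interpolation, combined with a temporal encoding), not the authors' architecture: the grid sizes, the dense arrays standing in for hash tables, the linear output layer, and the scalar treatment of time are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative multi-resolution feature grids: dense stand-ins for the
# paper's hash tables; grid sizes and feature dimension are assumptions.
GRID_SIZES, FEAT_DIM, BANDS = [4, 8, 16], 2, 4
grids = [rng.normal(size=(g, g, FEAT_DIM)) for g in GRID_SIZES]
W_out = rng.normal(size=(len(GRID_SIZES) * FEAT_DIM + 1, BANDS)) * 0.1

def interp_features(x, y):
    """Bilinearly interpolate the grid entries surrounding (x, y) in
    [0, 1]^2 at every resolution and concatenate the results."""
    feats = []
    for grid in grids:
        g = grid.shape[0] - 1
        fx, fy = x * g, y * g
        x0, y0 = int(fx), int(fy)
        x1, y1 = min(x0 + 1, g), min(y0 + 1, g)
        wx, wy = fx - x0, fy - y0
        feats.append((1 - wx) * (1 - wy) * grid[x0, y0]
                     + wx * (1 - wy) * grid[x1, y0]
                     + (1 - wx) * wy * grid[x0, y1]
                     + wx * wy * grid[x1, y1])
    return np.concatenate(feats)

def field(x, y, t):
    """(x, y, t) -> per-band pixel values; t enters as a plain scalar
    feature here, a simplification of the temporal encoding vector."""
    z = np.concatenate([interp_features(x, y), [t]])
    return z @ W_out

print(field(0.3, 0.7, 0.5).shape)  # (4,)
```

Pretraining would fit the grid entries and output weights to reconstruct the region's imagery; after that, the weights alone stand in for the data.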

If this is right

  • Users can adapt the pretrained field to new EO tasks without downloading or handling the original satellite data volumes.
  • Fine-tuning achieves competitive accuracy on segmentation and regression relative to training from scratch or using larger foundation models.
  • The same pretrained field works across regions of different sizes after a single pretraining pass on the available imagery.
  • End-users need only task labels for adaptation once the coordinate-based representation has been learned.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • This setup could support applications where raw imagery cannot be shared due to bandwidth limits or data policies, since only the compact neural weights are needed after pretraining.
  • The same coordinate-driven approach might extend to other spatiotemporal domains such as weather fields or urban sensor networks.
  • Location-specific pretrained fields could become a practical way to distribute EO-derived knowledge without distributing the underlying imagery.

Load-bearing premise

A neural field pretrained on a region's satellite imagery retains enough information in its weights to support effective fine-tuning on new tasks when given only labels and no further access to the original image pixels.

What would settle it

A controlled test in which LIANet is pretrained on a region and then fine-tuned for semantic segmentation using only labels, yet produces accuracy substantially below both a model trained directly on the raw satellite images and a standard geospatial foundation model.

Figures

Figures reproduced from arXiv: 2604.07092 by Conrad M Albrecht, Isabelle Wittmann, Jonathan Prexl, Michael Schmitt, Mojgan Madadikhaljan.

Figure 1. Comparison of model performance as a function of …
Figure 2. Visualization of the LIANet workflow. Pretraining: the specified (x, y) coordinates (model input) are used to query learned representations from a hash table by interpolating table entries for the neighboring pixels (blue dots) around the given input position (red dots) for all resolution grids Gi (in this illustration Ngrids = 3), and adding them to the corresponding temporal encoding vector. …
Figure 3. Left plot shows the reconstruction performance of …
Figure 4. Performance across five downstream tasks as a function of …
Figure 5. PSNR as a function of bpp for various flavors of LIANet. The results demonstrate that reconstruction quality reaches acceptable (PSNR > 30, signal approx. 30 times stronger than mean error) to excellent (PSNR > 40) levels with minor inaccuracies, the latter often imperceptible to the human eye. Advances in neural compression will guide future design updates of LIANet to improve comp…
Figure 6. Two examples of reconstruction quality of …
Figure 7. Three example visualizations of LIANet-Base and LIANet-Large reconstruction and prediction performances on downstream applications. The landcover labels have seasonal predictions, whereas the other tasks have a single label for all timestamps. The improvement in the prediction of LIANet-Large can clearly be seen in the tasks concerning the segmentation and density estimation of building footprints …
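Figure 5's quality thresholds follow directly from the PSNR definition, PSNR = 20·log10(MAX) − 10·log10(MSE): at 30 dB the RMS error is a factor of 10^(30/20) ≈ 31.6 below the peak signal, which matches the caption's "approx. 30 times" gloss (peak, not mean, signal strictly speaking). A minimal numeric check:

```python
import numpy as np

def psnr(reference, reconstruction, max_val=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((reference - reconstruction) ** 2)
    return 20 * np.log10(max_val) - 10 * np.log10(mse)

# A constant error of max_val / 10**(30/20) ~ max_val / 31.6 yields
# exactly 30 dB, i.e. peak signal roughly 30x the RMS error.
ref = np.zeros(1000)
rec = ref + 1.0 / 10 ** (30 / 20)
print(round(psnr(ref, rec), 1))  # 30.0
```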
original abstract

In this work, we present LIANet (Location Is All You Need Network), a coordinate-based neural representation that models multi-temporal spaceborne Earth observation (EO) data for a given region of interest as a continuous spatiotemporal neural field. Given only spatial and temporal coordinates, LIANet reconstructs the corresponding satellite imagery. Once pretrained, this neural representation can be adapted to various EO downstream tasks, such as semantic segmentation or pixel-wise regression, importantly, without requiring access to the original satellite data. LIANet intends to serve as a user-friendly alternative to Geospatial Foundation Models (GFMs) by eliminating the overhead of data access and preprocessing for end-users and enabling fine-tuning solely based on labels. We demonstrate the pretraining of LIANet across target areas of varying sizes and show that fine-tuning it for downstream tasks achieves competitive performance compared to training from scratch or using established GFMs. The source code and datasets are publicly available at https://github.com/mojganmadadi/LIANet/tree/v1.0.1.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper introduces LIANet, a coordinate-based neural representation that models multi-temporal Earth observation (EO) data for a given region as a continuous spatiotemporal neural field. Given only spatial and temporal coordinates, the pretrained model reconstructs the corresponding satellite imagery. It can then be adapted to downstream tasks such as semantic segmentation or pixel-wise regression using only labels, without re-accessing the original satellite pixels. The authors demonstrate pretraining on regions of varying sizes and report competitive performance versus training from scratch or established geospatial foundation models (GFMs), with public code and datasets released.

Significance. If the empirical results hold, the work provides a practical, low-overhead alternative to large GFMs by eliminating data access and preprocessing burdens for end-users. The application of implicit neural representations to multi-temporal EO data is a novel direction, and the public code release supports reproducibility and further investigation. The approach could lower barriers for fine-tuning on label-only scenarios in remote sensing.

major comments (2)
  1. §4 (Experimental results): The manuscript claims competitive performance on downstream tasks, but the evaluation lacks reported error bars, number of random seeds, or statistical tests across runs. This weakens the ability to assess robustness of the fine-tuning results versus baselines.
  2. §3.3 (Fine-tuning procedure): The description of how the pretrained coordinate-based field is adapted for tasks like segmentation without any input imagery pixels is high-level; it is unclear whether the network uses frozen layers, specific feature extraction from coordinates, or additional heads, which is load-bearing for the central 'labels-only' claim.
minor comments (3)
  1. Abstract: The statement of 'competitive performance' would benefit from a parenthetical reference to the specific table or metric values so readers can immediately gauge the claim.
  2. Figure 2 or equivalent (pretraining visualization): Axis labels and color scales on the reconstructed imagery panels should be clarified for direct comparison to ground-truth satellite bands.
  3. Related work: The discussion of prior implicit neural representations for images (e.g., NeRF variants) could explicitly contrast the spatiotemporal extension and multi-temporal handling unique to LIANet.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the encouraging summary and recommendation for minor revision. The comments highlight important aspects of robustness and clarity that we will address. Below we respond point by point to the major comments.

point-by-point responses
  1. Referee: §4 (Experimental results): The manuscript claims competitive performance on downstream tasks, but the evaluation lacks reported error bars, number of random seeds, or statistical tests across runs. This weakens the ability to assess robustness of the fine-tuning results versus baselines.

    Authors: We agree that the current presentation of results would benefit from greater statistical rigor. In the revised manuscript we will report mean performance and standard deviation over at least five independent random seeds for all downstream-task tables, and we will include paired statistical significance tests (e.g., Wilcoxon signed-rank) against the strongest baselines. These additions will be placed in §4 and the corresponding supplementary tables. revision: yes

  2. Referee: §3.3 (Fine-tuning procedure): The description of how the pretrained coordinate-based field is adapted for tasks like segmentation without any input imagery pixels is high-level; it is unclear whether the network uses frozen layers, specific feature extraction from coordinates, or additional heads, which is load-bearing for the central 'labels-only' claim.

    Authors: We acknowledge that §3.3 is currently concise. The fine-tuning procedure freezes all weights of the pretrained spatiotemporal field and extracts intermediate coordinate-based features; a small task-specific head (MLP for regression or decoder for segmentation) is then trained on these features using only the provided labels. No satellite pixels are accessed. In the revision we will expand §3.3 with an explicit architectural diagram, a pseudocode listing of the fine-tuning loop, and a statement confirming that the original imagery is never reloaded. revision: yes
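The procedure the rebuttal describes (freeze the pretrained field, extract intermediate features from coordinates, train only a small task head on labels) can be sketched under assumptions. The random-projection feature extractor and the least-squares linear head below are illustrative stand-ins, not the paper's implementation; the point is that the training loop touches coordinates and labels only, never satellite pixels.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the frozen pretrained field: a fixed projection producing
# intermediate features from (x, y, t). "Frozen" here simply means these
# weights are never updated during adaptation.
W_frozen = rng.normal(size=(3, 32))

def frozen_features(coords):
    return np.tanh(coords @ W_frozen)

# Labels-only adaptation: the inputs are coordinates plus task labels.
# A linear head fit by least squares on one-hot targets stands in for
# the task-specific MLP/decoder head described in the rebuttal.
N, K = 200, 4                          # labeled pixels, classes
coords = rng.uniform(size=(N, 3))      # normalized (x, y, t)
labels = rng.integers(0, K, size=N)    # e.g. segmentation classes
onehot = np.eye(K)[labels]

feats = frozen_features(coords)        # raw imagery never accessed
head, *_ = np.linalg.lstsq(feats, onehot, rcond=None)
pred = (feats @ head).argmax(axis=1)
print(pred.shape)  # (200,)
```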

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper introduces LIANet as a coordinate-based MLP that is pretrained to map (x, y, t) inputs to multi-spectral pixel values on a given region, then fine-tuned on task-specific labels. All performance claims rest on explicit supervised training runs whose inputs are the original imagery plus labels; no equation, uniqueness theorem, or self-citation is invoked to force the reported accuracy by construction. The central assertion—that the pretrained field can be adapted without re-accessing raw pixels—is an empirical statement verified by the released code and datasets rather than a definitional identity. Consequently the derivation chain contains no self-definitional, fitted-input-renamed-as-prediction, or load-bearing self-citation steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the empirical success of a standard MLP trained to map coordinates to image values, with no additional free parameters, axioms, or invented entities beyond ordinary neural network assumptions.

axioms (1)
  • domain assumption A sufficiently large MLP can approximate the underlying continuous spatiotemporal field of satellite observations
    Invoked implicitly by the choice of coordinate-based neural representation for image reconstruction.

pith-pipeline@v0.9.0 · 5493 in / 1073 out tokens · 55999 ms · 2026-05-10T18:31:42.863159+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

69 extracted references · 14 canonical work pages · 1 internal anchor

  1. [1]

    RingMoE: Mixture-of- modality-experts multi-modal foundation models for uni- versal remote sensing image interpretation.arXiv preprint arXiv:2504.03166, 2025

    Hanbo Bi, Yingchao Feng, Boyuan Tong, Mengyu Wang, Haichen Yu, Yongqiang Mao, Hao Chang, Wenhui Diao, Peijin Wang, Yue Yu, et al. RingMoE: Mixture-of- modality-experts multi-modal foundation models for uni- versal remote sensing image interpretation.arXiv preprint arXiv:2504.03166, 2025. 2

  2. [2]

    Albrecht, Julien Mairal, Jocelyn Chanussot, Yi Wang, and Xiao Xiang Zhu

    Nassim Ait Ali Braham, Conrad M. Albrecht, Julien Mairal, Jocelyn Chanussot, Yi Wang, and Xiao Xiang Zhu. Spec- tralEarth: Training hyperspectral foundation models at scale. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., 18: 16780–16797, 2025. 2

  3. [3]

    Brown, Steven P

    Christopher F. Brown, Steven P. Brumby, Brookie Guzder- Williams, Tanya Birch, Samantha Brooks Hyde, Joseph Mazzariello, Wanda Czerwinski, Valerie J. Pasquarella, Robert Haertel, Simon Ilyushchenko, et al. Dynamic World, near real-time global 10 m land use land cover mapping.Sci- entific Data, 9(1):251, 2022. 5, 1

  4. [4]

    Alphaearth foundations: An embedding field model for accurate and efficient global mapping from sparse label data.arXiv preprint arXiv:2507.22291, 2025

    Christopher F. Brown, Michal R. Kazmierski, Valerie J. Pasquarella, William J. Rucklidge, Masha Samsikova, Chen- hui Zhang, Evan Shelhamer, Estefania Lahera, Olivia Wiles, Simon Ilyushchenko, et al. AlphaEarth founda- tions: An embedding field model for accurate and effi- cient global mapping from sparse label data.arXiv preprint arXiv:2507.22291, 2025. 1, 2, 3

  5. [5]

    Cantrell, Jeff Clauson, and Cody Anderson

    Simon J. Cantrell, Jeff Clauson, and Cody Anderson. Earth observation remote sensing tools—assessing systems, trends, and characteristics. Technical report, US Geological Survey, 2024. 1

  6. [6]

    Neural compression for multispectral satellite images

    Woojin Cho, Steve Andreas Immanuel, Junhyuk Heo, and Darongsae Kwon. Neural compression for multispectral satellite images. InNeurIPS workshop, 2024. 3

  7. [7]

    SatMAE: Pre-training transformers for tem- poral and multi-spectral satellite imagery.NeurIPS, 35:197– 211, 2022

    Yezhen Cong, Samar Khanna, Chenlin Meng, Patrick Liu, Erik Rozi, Yutong He, Marshall Burke, David Lobell, and Stefano Ermon. SatMAE: Pre-training transformers for tem- poral and multi-spectral satellite imagery.NeurIPS, 35:197– 211, 2022. 1, 2

  8. [8]

    segmentation_models.pytorch, 2019–

    Qubvel contributors. segmentation_models.pytorch, 2019–. Accessed 1 August 2025. 4

  9. [9]

    Sentinel-2: ESA’s optical high-resolution mission for GMES operational services.Remote Sensing of Environ- ment, 120:25–36, 2012

    Matthias Drusch, Umberto Del Bello, Sébastien Carlier, Olivier Colin, Veronica Fernandez, Ferran Gascon, Bianca Hoersch, Claudia Isola, Paolo Laberinti, Philippe Martimort, et al. Sentinel-2: ESA’s optical high-resolution mission for GMES operational services.Remote Sensing of Environ- ment, 120:25–36, 2012. 5

  10. [10]

    Iris Dumeur, Silvia Valero, and Jordi Inglada. Self- supervised spatio-temporal representation learning of satel- lite image time series.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 17:4350– 4367, 2024. 2

  11. [11]

    Coin: Compression with implicit neural representations.arXiv preprint arXiv:2103.03123, 2021

    Emilien Dupont, Adam Goli ´nski, Milad Alizadeh, Yee Whye Teh, and Arnaud Doucet. COIN: Compression with implicit neural representations.arXiv preprint arXiv:2103.03123,

  12. [12]

    Essakine, Y

    Amer Essakine, Yanqi Cheng, Chun-Wun Cheng, Lipei Zhang, Zhongying Deng, Lei Zhu, Carola-Bibiane Schön- lieb, and Angelica I Aviles-Rivero. Where do we stand with implicit neural representations? A technical and per- formance survey.arXiv preprint arXiv:2411.03688, 2024. 1

  13. [13]

    Dominant leaf type 2018–present (raster 10m), europe, yearly, 2024

    European Environment Agency (EEA) and Copernicus Land Monitoring Service (CLMS). Dominant leaf type 2018–present (raster 10m), europe, yearly, 2024. Dataset. Accessed 12 October 2025. 5, 1

  14. [14]

    TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis

    Zhengpeng Feng, Clement Atzberger, Sadiq Jaffer, Jovana Knezevic, Silja Sormunen, Robin Young, Madeline C. Li- saius, Markus Immitzer, Toby Jackson, James Ball, David A. Coomes, Anil Madhavapeddy, Andrew Blake, and Srini- vasan Keshav. Tessera: Temporal embeddings of surface spectra for earth representation and analysis.arXiv preprint arXiv:2506.20380, 2025. 3

  15. [15]

    PhilEO bench: Evaluating geo-spatial foundation models

    Casper Fibaek, Luke Camilleri, Andreas Luyts, Nikolaos Dionelis, and Bertrand Le Saux. PhilEO bench: Evaluating geo-spatial foundation models. InIEEE Int. Geosci. Remote Sens. Symp., pages 2739–2744. IEEE, 2024. 5

  16. [16]

    CROMA: Remote sensing representations with contrastive radar- optical masked autoencoders.NeurIPS, 36:5506–5538,

    Anthony Fuller, Koreen Millard, and James Green. CROMA: Remote sensing representations with contrastive radar- optical masked autoencoders.NeurIPS, 36:5506–5538,

  17. [17]

    Ter- ratorch: The geospatial foundation models toolkit.arXiv preprint arXiv:2503.20563, 2025

    Carlos Gomes, Benedikt Blumenstiel, Joao Lucas de Sousa Almeida, Pedro Henrique de Oliveira, Paolo Fraccaro, Francesc Marti Escofet, Daniela Szwarcman, Naomi Si- mumba, Romeo Kienzler, and Bianca Zadrozny. TerraTorch: The geospatial foundation models toolkit.arXiv preprint arXiv:2503.20563, 2025. 6

  18. [18]

    Belenguer-Plomer, Kennedy Adriko, Paolo Fraccaro, Romeo Kienzler, Rania Briq, Sab- rina Benassou, Michele Lazzarini, and Conrad M

    Carlos Gomes, Isabelle Wittmann, Damien Robert, Johannes Jakubik, Tim Reichelt, Stefano Maurogiovanni, Rikard Vinge, Jonas Hurst, Erik Scheurer, Rocco Sedona, Thomas Brunschwiler, Stefan Kesselheim, Matej Bati ˇc, Philip Stier, Jan Dirk Wegner, Gabriele Cavallaro, Edzer Pebesma, Michael Marszalek, Miguel A. Belenguer-Plomer, Kennedy Adriko, Paolo Fraccaro...

  19. [19]

    Skysense: A multi-modal remote sens- ing foundation model towards universal interpretation for earth observation imagery

    Xin Guo, Jiangwei Lao, Bo Dang, Yingying Zhang, Lei Yu, Lixiang Ru, Liheng Zhong, Ziyuan Huang, Kang Wu, Dingxiang Hu, et al. Skysense: A multi-modal remote sens- ing foundation model towards universal interpretation for earth observation imagery. InCVPR, pages 27672–27683,

  20. [20]

    SpectralGPT: Spectral remote sensing foun- dation model.arXiv preprint arXiv:2311.07113, 2023

    Danfeng Hong, Bing Zhang, Xuyang Li, Yuxuan Li, Chenyu Li, Jing Yao, Naoto Yokoya, Hao Li, Pedram Ghamisi, Xi- uping Jia, et al. SpectralGPT: Spectral remote sensing foun- dation model.arXiv preprint arXiv:2311.07113, 2023. 2

  21. [21]

    Jakubik, S

    Johannes Jakubik, Sujit Roy, CE Phillips, Paolo Fraccaro, Denys Godwin, Bianca Zadrozny, Daniela Szwarcman, Car- los Gomes, Gabby Nyirjesy, Blair Edwards, et al. Foundation models for generalist geospatial artificial intelligence.arXiv preprint arXiv:2310.18660, 2023. 1, 2

  22. [22]

    Terramind: Large-scale generative multimodality for earth observation.arXiv preprint arXiv:2504.11171, 2025

    Johannes Jakubik, Felix Yang, Benedikt Blumenstiel, Erik Scheurer, Rocco Sedona, Stefano Maurogiovanni, Jente Bosmans, Nikolaos Dionelis, Valerio Marsocci, Niklas Kopp, et al. TerraMind: Large-scale generative multimodal- ity for earth observation.arXiv preprint arXiv:2504.11171,

  23. [23]

    A comprehensive review of U-Net and its variants: Ad- vances and applications in medical image segmentation.IET Image Processing, 19(1):e70019, 2025

    Wang Jiangtao, Nur Intan Raihana Ruhaiyem, and Fu Pan- pan. A comprehensive review of U-Net and its variants: Ad- vances and applications in medical image segmentation.IET Image Processing, 19(1):e70019, 2025. 7

  24. [24]

    Wi- ley Online Library, 2022

    Argyro Kavvada, Douglas Cripe, and Lawrence Friedl.Earth observation applications and global policy frameworks. Wi- ley Online Library, 2022. 1

  25. [25]

    Diffusionsat: A generative foundation model for satellite imagery.arXiv preprint arXiv:2312.03606, 2023

    Samar Khanna, Patrick Liu, Linqi Zhou, Chenlin Meng, Robin Rombach, Marshall Burke, David Lobell, and Stefano Ermon. DiffusionSat: A generative foundation model for satellite imagery.arXiv preprint arXiv:2312.03606, 2023. 2, 3

  26. [26]

    Detecting marine pollutants and sea surface features with deep learning in sentinel-2 im- agery.ISPRS Journal of Photogrammetry and Remote Sens- ing, 210:39–54, 2024

    Katerina Kikaki, Ioannis Kakogeorgiou, Ibrahim Hoteit, and Konstantinos Karantzalos. Detecting marine pollutants and sea surface features with deep learning in sentinel-2 im- agery.ISPRS Journal of Photogrammetry and Remote Sens- ing, 210:39–54, 2024. 3

  27. [27]

    SatCLIP: Global, general- purpose location embeddings with satellite imagery

    Konstantin Klemmer, Esther Rolf, Caleb Robinson, Lester Mackey, and Marc Rußwurm. SatCLIP: Global, general- purpose location embeddings with satellite imagery. In AAAI, pages 4347–4355, 2025. 1, 2, 3

  28. [28]

    Geo- bench: Toward foundation models for earth monitoring

    Alexandre Lacoste, Nils Lehmann, Pau Rodriguez, Evan Sherwin, Hannah Kerner, Björn Lütjens, Jeremy Irvin, David Dao, Hamed Alemohammad, Alexandre Drouin, et al. Geo- bench: Toward foundation models for earth monitoring. Advances in Neural Information Processing Systems, 36: 51080–51093, 2023. 3

  29. [29]

    A high-resolution canopy height model of the earth

    Nico Lang, Walter Jetz, Konrad Schindler, and Jan Dirk Wegner. A high-resolution canopy height model of the earth. Nature Ecology & Evolution, 7(11):1778–1789, 2023. 1

  30. [30]

    Remote sensing image compression method based on implicit neu- ral representation

    Xin Li, Baile Sun, Jixiu Liao, and Xiaofei Zhao. Remote sensing image compression method based on implicit neu- ral representation. InProceedings of the International Con- ference on Computing and Pattern Recognition, pages 432– 439, 2023. 3

  31. [31]

    Xiang Li, Congcong Wen, Yuan Hu, and Nan Zhou. RS- CLIP: Zero shot remote sensing scene classification via con- trastive vision-language supervision.International Jour- nal of Applied Earth Observation and Geoinformation, 124: 103497, 2023. 2

  32. [32]

    Text2Earth: Unlocking text-driven remote sensing image generation with a global-scale dataset and a foundation model.IEEE Geosci

    Chenyang Liu, Keyan Chen, Rui Zhao, Zhengxia Zou, and Zhenwei Shi. Text2Earth: Unlocking text-driven remote sensing image generation with a global-scale dataset and a foundation model.IEEE Geosci. Remote Sens. Mag., 13(3): 238–259, 2025. 3

  33. [33]

    Re- moteCLIP: A vision language foundation model for remote sensing.IEEE Trans

    Fan Liu, Delong Chen, Zhangqingyun Guan, Xiaocong Zhou, Jiale Zhu, Qiaolin Ye, Liyong Fu, and Jun Zhou. Re- moteCLIP: A vision language foundation model for remote sensing.IEEE Trans. Geosci. Remote Sens., 62:10504785,

  34. [34]

    Diffusion models meet remote sensing: Principles, methods, and perspectives.IEEE Trans

    Yidan Liu, Jun Yue, Shaobo Xia, Pedram Ghamisi, Weiy- ing Xie, and Leyuan Fang. Diffusion models meet remote sensing: Principles, methods, and perspectives.IEEE Trans. Geosci. Remote Sens., page 10684806, 2024. 3

  35. [35]

    Pangaea: A global and inclusive benchmark for geospatial foundation models.arXiv preprint arXiv:2412.04204, 2024

    Valerio Marsocci, Yuru Jia, Georges Le Bellier, David Kerekes, Liang Zeng, Sebastian Hafner, Sebastian Gerard, Eric Brune, Ritu Yadav, Ali Shibli, et al. PANGAEA: A global and inclusive benchmark for geospatial foundation models.arXiv preprint arXiv:2412.04204, 2024. 2, 7, 1, 3

  36. [36]

    High resolution canopy height maps (chm), 2024

    Meta and World Resources Institute (WRI). High resolution canopy height maps (chm), 2024. Source imagery for CHM © 2016 Maxar. Accessed 13 June 2025. 5, 1

  37. [37]

    Microsoft building footprints, 2022

    Microsoft. Microsoft building footprints, 2022. Accessed: 2022-11-01. 5, 2

  38. [38]

    Instant neural graphics primitives with a multires- olution hash encoding.TOG, 41(4), 2022

    Thomas Müller, Alex Evans, Christoph Schied, and Alexan- der Keller. Instant neural graphics primitives with a multires- olution hash encoding.TOG, 41(4), 2022. 1, 3, 4

  39. [39]

    HLS Foundation Burnscars Dataset, 2023

    Christopher Phillips, Sujit Roy, Kumar Ankur, and Rahul Ra- machandran. HLS Foundation Burnscars Dataset, 2023. 5, 3

  40. [40]

    Multi-modal multi- objective contrastive learning for Sentinel-1/2 imagery

    Jonathan Prexl and Michael Schmitt. Multi-modal multi- objective contrastive learning for Sentinel-1/2 imagery. In Proceedings of CVPR Workshops, pages 2135–2143, 2023. 2

  41. [41]

    The potential of Sentinel-2 data for global building footprint mapping with high temporal resolution

    Jonathan Prexl and Michael Schmitt. The potential of Sentinel-2 data for global building footprint mapping with high temporal resolution. InJoint Urban Remote Sensing Event, page 10144166. IEEE, 2023. 5

  42. [42]

    SenPa-MAE: Sensor parameter aware masked autoencoder for multi-satellite self- supervised pretraining

    Jonathan Prexl and Michael Schmitt. SenPa-MAE: Sensor parameter aware masked autoencoder for multi-satellite self- supervised pretraining. InProceedings of GCPR, pages 317– 331, 2024. 2

  43. [43]

    Jonathan Prexl, Anton Baumann, and Michael Schmitt. A comparison of uncertainty estimation methods for building footprint change detection from Sentinel-2 imagery.ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 10:339–346, 2024. 5

  44. [44]

    SAR- Former – an acquisition parameter aware vision transformer for synthetic aperture radar data

    Jonathan Prexl, Michael Recla, and Michael Schmitt. SAR- Former – an acquisition parameter aware vision transformer for synthetic aperture radar data. InProceedings of CVPR Workshops, pages 2225–2234, 2025. 2

  45. [45]

    Hyperspectral im- age compression using sampling and implicit neural repre- sentations.IEEE Trans

    Shima Rezasoltani and Faisal Z Qureshi. Hyperspectral im- age compression using sampling and implicit neural repre- sentations.IEEE Trans. Geosci. Remote Sens., 63:10804213,

  46. [46]

    Panoptic seg- mentation of satellite image time series with convolutional temporal attention networks.ICCV, 2021

    Vivien Sainte Fare Garnot and Loic Landrieu. Panoptic seg- mentation of satellite image time series with convolutional temporal attention networks.ICCV, 2021. 5, 3

  47. [47]

    GeoSynth: Contextually-aware high- resolution satellite image synthesis

    Srikumar Sastry, Subash Khanal, Aayush Dhakal, and Nathan Jacobs. GeoSynth: Contextually-aware high- resolution satellite image synthesis. InCVPR, pages 460– 470, 2024. 3

  48. [48]

    Prithvi wxc: Foundation model for weather and climate,

    Johannes Schmude, Sujit Roy, Will Trojak, Johannes Jaku- bik, Daniel Salles Civitarese, Shraddha Singh, Julian Kuehn- ert, Kumar Ankur, Aman Gupta, Christopher E Phillips, et al. Prithvi wxc: Foundation model for weather and climate. arXiv preprint arXiv:2409.13598, 2024. 2

  49. [49]

    RSDiff: Remote sensing image generation from text using diffusion model

    Ahmad Sebaq and Mohamed ElHelw. RSDiff: Remote sensing image generation from text using diffusion model. Neural Computing and Applications, 36(36):23103–23111,

  50. [50]

    Implicit neural representa- tions with periodic activation functions.NeurIPS, 33:7462– 7473, 2020

    Vincent Sitzmann, Julien Martel, Alexander Bergman, David Lindell, and Gordon Wetzstein. Implicit neural representa- tions with periodic activation functions.NeurIPS, 33:7462– 7473, 2020. 3

  51. [51]

    Implicit neural representations for image compression

    Yannick Strümpler, Janis Postels, Ren Yang, Luc Van Gool, and Federico Tombari. Implicit neural representations for image compression. InECCV, pages 74–91. Springer, 2022. 3

  52. [52]

    a rXiv preprint arXiv:2412.02732 (2024)

    Daniela Szwarcman, Sujit Roy, Paolo Fraccaro, Þorsteinn Éli Gíslason, Benedikt Blumenstiel, Rinki Ghosal, Pedro Hen- rique de Oliveira, Joao Lucas de Sousa Almeida, Rocco Sedona, Yanghui Kang, et al. Prithvi-eo-2.0: A versatile multi-temporal foundation model for earth observation ap- plications.arXiv preprint arXiv:2412.02732, 2024. 2, 3, 6

  53. [53]

    CRS-Diff: Control- lable remote sensing image generation with diffusion model

    Datao Tang, Xiangyong Cao, Xingsong Hou, Zhongyuan Jiang, Junmin Liu, and Deyu Meng. CRS-Diff: Control- lable remote sensing image generation with diffusion model. IEEE Trans. Geosci. Remote Sens., 62:10663449, 2024. 3

  54. [54]

    Dynamicearthnet: Daily multi-spectral satellite dataset for semantic change segmentation

    Aysim Toker, Lukas Kondmann, Mark Weber, Marvin Eisenberger, Andrés Camero, Jingliang Hu, Ariadna Pregel Hoderlein, Ça ˘glar ¸ Senaras, Timothy Davis, Daniel Cre- mers, Giovanni Marchisio, Xiao Xiang Zhu, and Laura Leal-Taixé. Dynamicearthnet: Daily multi-spectral satellite dataset for semantic change segmentation. InProceedings of the IEEE/CVF Conference...

  55. [55]

    Enabling country-scale land cover mapping with meter-resolution satellite imagery.ISPRS Journal of Photogrammetry and Re- mote Sensing, 196:178–196, 2023

    Xin-Yi Tong, Gui-Song Xia, and Xiao Xiang Zhu. Enabling country-scale land cover mapping with meter-resolution satellite imagery.ISPRS Journal of Photogrammetry and Re- mote Sensing, 196:178–196, 2023. 3

    [56] Mehmet Ozgur Turkoglu, Stefano D'Aronco, Gregor Perich, Frank Liebisch, Constantin Streit, Konrad Schindler, and Jan Dirk Wegner. Crop mapping from image time series: Deep learning with multi-scale label hierarchies. Remote Sensing of Environment, 264:112603, 2021.

    [57] Tobias Uelwer, Jan Robine, Stefan Sylvius Wagner, Marc Höftmann, Eric Upschulte, Sebastian Konietzny, Maike Behrendt, and Stefan Harmeling. A survey on self-supervised methods for visual representation learning. Machine Learning, 114(4):111, 2025.

    [58] Adam Van Etten, Daniel Hogan, Jesus Martinez Manso, Jacob Shermeyer, Nicholas Weir, and Ryan Lewis. The multi-temporal urban development SpaceNet dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6398–6407, 2021.

    [59] Diego Velazquez, Pau Rodriguez, Sergio Alonso, Josep M Gonfaus, Jordi Gonzalez, Gerardo Richarte, Javier Marin, Yoshua Bengio, and Alexandre Lacoste. EarthView: A large scale remote sensing dataset for self-supervision. In Proceedings of the Winter Conference on Applications of Computer Vision, pages 1228–1237, 2025.

    [60] Vicente Vivanco Cepeda, Gaurav Kumar Nayak, and Mubarak Shah. GeoCLIP: CLIP-inspired alignment between locations and images for effective worldwide geo-localization. Advances in Neural Information Processing Systems, 36:8690–8701, 2023.

    [61] Yi Wang, Conrad M Albrecht, Nassim Ait Ali Braham, Lichao Mou, and Xiao Xiang Zhu. Self-supervised learning in remote sensing: A review. IEEE Geoscience and Remote Sensing Magazine, 10(4):213–247, 2022.

    [62] R Wilkinson, MM Mleczko, RJW Brewin, KJ Gaston, M Mueller, JD Shutler, X Yan, and K Anderson. Environmental impacts of earth observation data in the constellation and cloud computing era. Science of the Total Environment, 909:168584, 2024.

    [63] Zhitong Xiong, Yi Wang, Fahong Zhang, Adam J. Stewart, Joëlle Hanna, Damian Borth, Ioannis Papoutsis, Bertrand Le Saux, Gustau Camps-Valls, and Xiao Xiang Zhu. Neural plasticity-inspired foundation model for observing the Earth crossing modalities. CoRR, abs/2403.15356, 2024.

    [64] Zhiping Yu, Chenyang Liu, Liqin Liu, Zhenwei Shi, and Zhengxia Zou. MetaEarth: A generative foundation model for global-scale remote sensing image generation. IEEE TPAMI, 47(3):10768939, 2024.

    [65] Lili Zhang, Tianpeng Pan, Jiahui Liu, and Lin Han. Compressing hyperspectral images into multilayer perceptrons using fast-time hyperspectral neural radiance fields. IEEE Geosci. Remote Sensing Lett., 21:10433191, 2024.

Location Is All You Need: Continuous Spatiotemporal Neural Representations of Earth Observation Data
Supplementary Material

In this su...

Pretraining Setup

To pretrain the proposed LIANet on different-sized areas, we use L1 loss, the AdamW optimizer with a base learning rate of 5×10⁻⁴, and a cosine learning rate scheduler with 5 warm-up epochs. Every epoch is backpropagated with 1,024,000 randomly selected points, trained with a batch size of 64. The number of training epochs for A0 is 225,...
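The learning-rate schedule described above can be sketched as a small pure-Python function; this is a minimal illustration assuming a linear warm-up followed by cosine decay to zero, and `total_epochs=225` mirrors the A0 run (other areas use different epoch counts). LIANet's exact scheduler implementation is not given in this excerpt.

```python
import math

def lr_at_epoch(epoch, base_lr=5e-4, warmup_epochs=5, total_epochs=225):
    """Cosine learning-rate schedule with linear warm-up, as described in
    the pretraining setup (base LR 5e-4, 5 warm-up epochs).
    `total_epochs=225` corresponds to the A0 area; treat it as a placeholder."""
    if epoch < warmup_epochs:
        # Linear ramp up to base_lr over the warm-up epochs.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from base_lr down to 0 over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

In a typical PyTorch setup this would pair `torch.optim.AdamW` with a warm-up-plus-cosine scheduler, sampling the 1,024,000 coordinate points per epoch in batches of 64.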

Fine-Tuning on A+ and A++

We evaluate on five downstream tasks, including regression, binary, and multi-class segmentation, to assess the utility of the learned representations. Three visual sample patches from two seasons, together with their reconstruction results of LIANet-Base and LIANet-Large, as well as their predicted labels for all tasks, are illust...

Fine-Tuning on Standard Benchmark Datasets

Selecting appropriate benchmarks for evaluating LIANet requires compatibility between the benchmark input modality and the pretraining modality of LIANet, the availability of georeferencing, and extensive spatial coverage. Since our framework models multispectral Sentinel-2 data within geographically contiguous regions, benchmarks must provide compatible imagery and sufficient spatial density.

Neural Compression Analysis

Implicit neural representations are related to the field of neural compression [11, 18]. Although LIANet is not meant to serve as an EO image compressor, we study compression performance in terms of reconstruction error as complementary information. Given coordinates (x, y, t), LIANet is able to regenerate images I over an area A. Co...
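The query interface implied above — reconstructing image values from (x, y, t) coordinates — can be sketched with a toy coordinate-based network. This is a hypothetical illustration only: the weights are random, the sinusoidal encoding, hidden width, and `num_bands=12` (Sentinel-2-like output) are assumptions, and none of it reproduces LIANet's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def fourier_features(coords, num_freqs=8):
    """Sinusoidal encoding of (x, y, t) coordinates, a common choice for
    coordinate-based networks (LIANet's exact encoding is not shown here)."""
    freqs = 2.0 ** np.arange(num_freqs)           # (F,)
    angles = coords[..., None] * freqs            # (N, 3, F)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(coords.shape[0], -1)       # (N, 3 * 2F)

def query_field(coords, hidden=64, num_bands=12):
    """Toy stand-in for the query path: map normalized (x, y, t) coordinates
    to per-pixel spectral values. Weights are random, so the outputs are
    meaningless; only the coordinate-in, pixels-out interface is illustrated."""
    feats = fourier_features(coords)
    w1 = rng.normal(size=(feats.shape[1], hidden))
    w2 = rng.normal(size=(hidden, num_bands))
    return np.maximum(feats @ w1, 0.0) @ w2       # ReLU MLP, (N, num_bands)

# Reconstruct a 4x4 patch at one timestamp by querying a coordinate grid.
xs, ys = np.meshgrid(np.linspace(0, 1, 4), np.linspace(0, 1, 4))
t = np.full(16, 0.5)
coords = np.stack([xs.ravel(), ys.ravel(), t], axis=1)  # (16, 3)
patch = query_field(coords).reshape(4, 4, 12)
```

Under this view, "compression" amounts to storing only the network weights instead of the imagery: reconstruction error at the training coordinates measures how faithfully the field stands in for the original pixels.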