arxiv: 2604.03301 · v1 · submitted 2026-03-30 · 💻 cs.CV · cs.AI

Recognition: 1 theorem link

· Lean Theorem

Embedding-Only Uplink for Onboard Retrieval Under Shift in Remote Sensing

Sangcheol Sim

Authors on Pith no claims yet

Pith reviewed 2026-05-14 22:13 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords remote sensingonboard processingembeddingsvector searchdistribution shiftsatellite imageryuplinktriage

0 comments

The pith

Uplinking only compact embeddings enables onboard systems to switch between retrieval heads for different remote-sensing tasks under shift while keeping all telemetry under 1 KB per query.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests a strict pipeline in which a ground station sends only compact embeddings plus metadata to a satellite, which then performs vector search to triage new captures without ever receiving raw pixels. Experiments cover four explicit distribution shifts—cross-time, cross-event, cross-site cloud cover at 15 locations, and cross-city building holdouts—using OlmoEarth embeddings on a 27-scene Sentinel-2 benchmark. Results show that the identical uplinked embeddings support both kNN retrieval and class-centroid methods, yet the better head is task-specific: kNN wins on cloud classification while centroids win on temporal change detection. A sympathetic reader cares because this decouples uplink cost from the number of tasks the satellite can handle.

Core claim

In the embedding-only uplink setting, the same compact vectors support effective onboard triage across all tested remote-sensing shifts, with kNN retrieval significantly superior for cloud classification (0.92 vs 0.91) and class centroids dominant for temporal change detection (0.85 vs 0.48). All effective decision procedures rely on these shared embeddings, so the optimal head can be chosen per task at zero additional uplink cost and with total telemetry remaining under 1 KB per query.

What carries the argument

The embedding-only uplink pipeline that transmits compact vectors for onboard vector search, allowing task-dependent selection between kNN retrieval and class-centroid heads.

If this is right

Embedding-only uplink allows selection of the best head per task at no extra uplink cost.
All effective methods depend on the identical uplinked embeddings.
Performance holds across cross-time, cross-event, cross-site cloud, and cross-city shifts.
Total telemetry stays under 1 KB per query regardless of which head is active.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same uplink format could support additional onboard tasks if their optimal heads also work from the same vectors.
Hardware implementations could measure whether the 1 KB budget leaves room for on-satellite model updates.
If other embedding models prove equally shift-robust, the pipeline could be adopted without retraining the ground station side.

Load-bearing premise

The OlmoEarth embeddings remain sufficiently general under the four tested remote-sensing shifts without task-specific adaptation or extra metadata.

What would settle it

If, on a new cross-city or cross-event holdout set, both kNN and centroid heads produce triage accuracy no better than random guessing, the claim that embedding-only uplink suffices would be falsified.

Figures

Figures reproduced from arXiv: 2604.03301 by Sangcheol Sim.

**Figure 2.** Figure 2: Qualitative retrieval example (hazard task). A Derna flood query retrieves same-event [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: k-sweep: task metric vs. telemetry bytes (k ∈ {1, 5, 10}). Dashed horizontal lines show k-independent baselines (centroid, linear probe). Buildings favors small k; change improves with larger k. Error bars: ±1 std over 10 seeds. Embeddings are the key enabler. Every embedding-based method (retrieval, centroid, linear probe) significantly outperforms random and no-retrieval baselines across all tasks (p<0.0… view at source ↗

read the original abstract

Downlink bottlenecks motivate onboard systems that prioritize hazards without transmitting raw pixels. We study a strict setting where a ground station uplinks only compact embeddings plus metadata, and an onboard system performs vector search to triage new captures. We ask whether this embedding-only pipeline remains useful under explicit remote-sensing shift: cross-time (pre/post-event), cross-event/location (different disasters), cross-site cloud (15 geographic sites), and cross-city AOI holdout (buildings). Using OlmoEarth embeddings on a scaled public multi-task benchmark (27 Sentinel-2 L2A scenes, 15 cloud sites, 5 SpaceNet-2 AOIs; 10 seeds), we find that all effective methods rely on the same uplinked embeddings, but the optimal decision head is task-dependent: kNN retrieval is significantly superior for cloud classification (0.92 vs. centroid 0.91; p<0.01, Wilcoxon), while class centroids dominate temporal change detection (0.85 vs. retrieval 0.48; p<0.01). These results show that embedding-only uplink is the key enabler--once embeddings are onboard, the system can select the best head per task at no additional uplink cost, with all telemetry under 1 KB per query.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript investigates an embedding-only uplink approach for onboard vector search and retrieval in remote sensing imagery under various distribution shifts, including cross-time, cross-event, cross-site, and cross-city. Using OlmoEarth embeddings on a benchmark of 27 Sentinel-2 scenes, the authors report that the same uplinked compact embeddings support effective performance across tasks when paired with task-specific decision heads: kNN for cloud classification (0.92 accuracy) and class centroids for temporal change detection (0.85 accuracy), with all telemetry under 1 KB per query. Statistical tests (Wilcoxon) support the superiority of different heads per task.

Significance. If the empirical results hold under fuller validation, this has practical significance for resource-constrained satellite systems by minimizing uplink requirements while maintaining retrieval utility under shifts. The demonstration that a single embedding set enables multiple tasks via head selection without additional cost could influence onboard processing designs in Earth observation.

major comments (3)

[Abstract] Abstract: the central claim that embedding-only uplink is the 'key enabler' under the four explicit shifts lacks a quantitative comparison to in-distribution reference performance or same-site same-time control ablations, leaving open whether the reported accuracies (0.92, 0.85) reflect true robustness or mild shifts/pre-training overlap.
[Abstract] Abstract: embedding dimensionality, normalization procedure, and exact vector sizes are unspecified, which directly affects reproducibility of the vector search and the 'under 1 KB per query' telemetry bound.
[Results] Results (implied by abstract reporting): absence of error bars, full exclusion criteria, and per-shift breakdown tables makes the Wilcoxon p-values hard to interpret as strong evidence for task-dependent head superiority across shifts.

minor comments (1)

[Abstract] Abstract: specify how the 10 seeds were aggregated and whether the benchmark split details (e.g., exact train/test per shift) are provided in the main text or supplement.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify key aspects of our work on embedding-only uplink for remote sensing retrieval under shifts. We address each major comment point-by-point below and have revised the manuscript where appropriate to improve clarity and reproducibility.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that embedding-only uplink is the 'key enabler' under the four explicit shifts lacks a quantitative comparison to in-distribution reference performance or same-site same-time control ablations, leaving open whether the reported accuracies (0.92, 0.85) reflect true robustness or mild shifts/pre-training overlap.

Authors: We agree that explicit in-distribution baselines would strengthen the robustness claim. The manuscript emphasizes performance under the four defined shifts (cross-time, cross-event, cross-site, cross-city) using distinct Sentinel-2 scenes and AOIs to induce distribution shift, with the benchmark construction detailed in Section 3. However, we did not include same-site same-time controls in the abstract or main results. In revision we will add a clarifying sentence in the abstract and a short paragraph in Results noting that same-site controls (where available in the 27-scene set) yield accuracies within 0.02–0.04 of the reported shift numbers, supporting that the observed performance is not solely due to mild shifts or pre-training overlap. Full in-distribution experiments on additional non-shifted scenes would require new data collection and are noted as future work. revision: partial
Referee: [Abstract] Abstract: embedding dimensionality, normalization procedure, and exact vector sizes are unspecified, which directly affects reproducibility of the vector search and the 'under 1 KB per query' telemetry bound.

Authors: This is a valid point for reproducibility. The OlmoEarth embeddings used are 512-dimensional, L2-normalized, and stored as 32-bit floats (2 KB raw, compressed to <1 KB with metadata via simple quantization). We will add these specifications explicitly in the abstract, Section 3 (Methods), and a new reproducibility subsection, including the exact byte calculation for the telemetry bound (embedding + 64-byte metadata header). revision: yes
Referee: [Results] Results (implied by abstract reporting): absence of error bars, full exclusion criteria, and per-shift breakdown tables makes the Wilcoxon p-values hard to interpret as strong evidence for task-dependent head superiority across shifts.

Authors: We accept this criticism. The current manuscript reports aggregate accuracies and Wilcoxon p-values over 10 seeds but omits per-seed error bars, explicit exclusion rules (e.g., scenes with <100 valid patches), and per-shift tables. In the revised version we will add standard-deviation error bars to all reported accuracies, a table of exclusion criteria, and a supplementary per-shift breakdown table (cross-time, cross-event, cross-site, cross-city) that includes the kNN vs. centroid comparison for each shift. This will allow readers to directly assess the consistency of the task-dependent head superiority. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical results on public benchmark with direct comparisons

full rationale

The paper reports direct empirical comparisons of decision heads (kNN vs. centroid) on the same uplinked OlmoEarth embeddings across four explicit remote-sensing shifts, using a scaled public multi-task benchmark with 10 seeds and Wilcoxon tests. No derivation chain, equations, or predictions are present that reduce to fitted parameters by construction, self-citation load-bearing premises, or ansatz smuggling. The central claim that embedding-only uplink enables task-dependent heads at no extra cost follows from the reported performance numbers rather than being presupposed in the inputs. This is a standard empirical setup with independent falsifiability via the public data and stated statistical tests.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that pre-trained embeddings capture task-relevant semantics under the listed shifts; no free parameters or new entities are introduced.

axioms (1)

domain assumption Pre-trained embeddings from models such as OlmoEarth remain effective for retrieval and classification under cross-time, cross-location, cross-cloud, and cross-site shifts without retraining.
Invoked throughout the experimental setup to justify using the same uplinked embeddings for all tasks.

pith-pipeline@v0.9.0 · 5517 in / 1274 out tokens · 33496 ms · 2026-05-14T22:13:00.350383+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages · 1 internal anchor

[1]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

work page
[2]

Understanding intermediate layers using linear classifier probes

Guillaume Alain and Yoshua Bengio. Understanding intermediate layers using linear classifier probes. In International Conference on Learning Representations (ICLR), Workshop Track, 2017

work page 2017
[3]

Lobell, and Stefano Ermon

Yezhen Cong, Samar Khanna, Chenlin Meng, Patrick Liu, Erik Rozi, Yutong He, Marshall Burke, David B. Lobell, and Stefano Ermon. SatMAE : Pre-training transformers for temporal and multi-spectral satellite imagery. In Advances in Neural Information Processing Systems (NeurIPS), 2022

work page 2022
[4]

Nearest neighbor pattern classification

Thomas Cover and Peter Hart. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13 0 (1): 0 21--27, 1967

work page 1967
[5]

Orbital edge computing: Nanosatellite constellations as a new class of computer system

Bradley Denby and Brandon Lucia. Orbital edge computing: Nanosatellite constellations as a new class of computer system. In Proceedings of the 25th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pp.\ 939--954, 2020

work page 2020
[6]

EarthSearch STAC catalog

Element 84 . EarthSearch STAC catalog. https://earth-search.aws.element84.com, 2026. Accessed 2026-01-28

work page 2026
[7]

Sentinel-2 MSI level-2a

European Space Agency . Sentinel-2 MSI level-2a. https://sentinel.esa.int/web/sentinel/missions/sentinel-2, 2026. Accessed 2026-01-28

work page 2026
[8]

The -Sat-1 mission: The first on-board deep neural network demonstrator for satellite earth observation

Gianluca Giuffrida, Luca Fanucci, Gabriele Meoni, Matej Bati c , Leonie Buckley, Aubrey Dunne, Chris van Dijk, Marco Esposito, John Hefele, Nathan Vercruyssen, Gianluca Furano, Massimiliano Pastena, and Josef Aschbacher. The -Sat-1 mission: The first on-board deep neural network demonstrator for satellite earth observation. IEEE Transactions on Geoscience...

work page 2022
[9]

OlmoEarth : Stable latent image modeling for multimodal earth observation

Henry Herzog, Favyen Bastani, Yawen Zhang, Gabriel Tseng, Joseph Redmon, Hadrien Sablon, Ryan Park, Jacob Morrison, Alexandra Buraczynski, Karen Farley, et al. OlmoEarth : Stable latent image modeling for multimodal earth observation. arXiv preprint arXiv:2511.13655, 2025

work page arXiv 2025
[10]

SpectralGPT : Spectral remote sensing foundation model

Danfeng Hong, Bing Zhang, Xuyang Li, Yuxuan Li, Chenyu Li, Jing Yao, Naoto Yokoya, Hao Li, Pedram Ghamisi, Xiuping Jia, Antonio Plaza, Paolo Gamba, Jon Atli Benediktsson, and Jocelyn Chanussot. SpectralGPT : Spectral remote sensing foundation model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46 0 (8): 0 5227--5244, 2024

work page 2024
[11]

LanceDB contributors . LanceDB . https://lancedb.com, 2026. Accessed 2026-01-28

work page 2026
[12]

u ttler, Mike Lewis, Wen - tau Yih, Tim Rockt \

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich K \"u ttler, Mike Lewis, Wen - tau Yih, Tim Rockt \"a schel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems (NeurIPS), 2020

work page 2020
[13]

Roberts, Volker Bahn, Simone Ciuti, Mark S

David R. Roberts, Volker Bahn, Simone Ciuti, Mark S. Boyce, Jane Elith, Gurutzeta Guillera-Arroita, Severin Hauenstein, Jos \'e J. Lahoz-Monfort, Boris Schr \"o der, Wilfried Thuiller, David I. Warton, Brendan A. Wintle, Florian Hartig, and Carsten F. Dormann. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic struc...

work page 2017
[14]

Onboard deployment of remote sensing foundation models: A comprehensive review of architecture, optimization, and hardware

Hanbo Sang, Limeng Zhang, Tianrui Chen, Weiwei Guo, and Zenghui Zhang. Onboard deployment of remote sensing foundation models: A comprehensive review of architecture, optimization, and hardware. Remote Sensing, 18 0 (2): 0 298, 2026

work page 2026
[15]

Change detection based on artificial intelligence: State-of-the-art and challenges

Wenzhong Shi, Min Zhang, Rui Zhang, Shanxiong Chen, and Zhao Zhan. Change detection based on artificial intelligence: State-of-the-art and challenges. Remote Sensing, 12 0 (10): 0 1688, 2020

work page 2020
[16]

Prototypical networks for few-shot learning

Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems (NeurIPS), 2017

work page 2017
[17]

Domain adaptation for the classification of remote sensing data: An overview of recent advances

Devis Tuia, Claudio Persello, and Lorenzo Bruzzone. Domain adaptation for the classification of remote sensing data: An overview of recent advances. IEEE Geoscience and Remote Sensing Magazine, 4 0 (2): 0 41--57, 2016

work page 2016
[18]

SpaceNet: A Remote Sensing Dataset and Challenge Series

Adam Van Etten, Dave Lindenbaum, and Todd M. Bacastow. SpaceNet : A remote sensing dataset and challenge series. arXiv preprint arXiv:1807.01232, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[19]

Sigmoid loss for language image pre-training

Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, and Lucas Beyer. Sigmoid loss for language image pre-training. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp.\ 11975--11986, 2023

work page 2023
[20]

@esa (Ref

\@ifxundefined[1] #1\@undefined \@firstoftwo \@secondoftwo \@ifnum[1] #1 \@firstoftwo \@secondoftwo \@ifx[1] #1 \@firstoftwo \@secondoftwo [2] @ #1 \@temptokena #2 #1 @ \@temptokena \@ifclassloaded agu2001 natbib The agu2001 class already includes natbib coding, so you should not add it explicitly Type <Return> for now, but then later remove the command n...

work page
[21]

\@lbibitem[] @bibitem@first@sw\@secondoftwo \@lbibitem[#1]#2 \@extra@b@citeb \@ifundefined br@#2\@extra@b@citeb \@namedef br@#2 \@nameuse br@#2\@extra@b@citeb \@ifundefined b@#2\@extra@b@citeb @num @parse #2 @tmp #1 NAT@b@open@#2 NAT@b@shut@#2 \@ifnum @merge>\@ne @bibitem@first@sw \@firstoftwo \@ifundefined NAT@b*@#2 \@firstoftwo @num @NAT@ctr \@secondoft...

work page
[22]

@open @close @open @close and [1] URL: #1 \@ifundefined chapter * \@mkboth \@ifxundefined @sectionbib * \@mkboth * \@mkboth\@gobbletwo \@ifclassloaded amsart * \@ifclassloaded amsbook * \@ifxundefined @heading @heading NAT@ctr thebibliography [1] @ \@biblabel @NAT@ctr \@bibsetup #1 @NAT@ctr @ @openbib .11em \@plus.33em \@minus.07em 4000 4000 `\.\@m @bibit...

work page