pith. machine review for the scientific record.

arxiv: 2604.11562 · v1 · submitted 2026-04-13 · 💻 cs.CV

Recognition: unknown

The Impact of Federated Learning on Distributed Remote Sensing Archives


Pith reviewed 2026-05-10 16:05 UTC · model grok-4.3

classification 💻 cs.CV
keywords: federated learning · remote sensing · non-IID data · image classification · FedProx · FedAvg · CNN · multi-label

The pith

FedProx outperforms FedAvg for deeper models under label skew on remote sensing data while LeNet offers the best accuracy-communication trade-off.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Remote sensing archives from missions like Sentinel hold petabytes of imagery that cannot be centralized due to volume, sovereignty, and geographic spread. Federated learning trains models by exchanging updates only, yet non-identical label distributions across regions hinder standard algorithms like FedAvg. The study applies FedAvg, FedProx, and bulk synchronous parallel to the UC Merced multi-label dataset under controlled label skew using LeNet, AlexNet, and ResNet-34. It shows FedProx improves convergence for deeper networks, BSP nears centralized accuracy at high communication cost, and LeNet delivers the strongest balance at the scales tested.

Core claim

Under controlled non-IID label-skew conditions on the UC Merced multi-label dataset, FedProx outperforms FedAvg for deeper convolutional architectures, bulk synchronous parallel approaches centralized accuracy levels at the expense of high sequential communication, and LeNet provides the best accuracy-communication trade-off among the evaluated models.
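To make the accuracy-communication trade-off concrete, the following is a rough sketch of how bytes exchanged might be estimated for a FedAvg-style protocol. It assumes every selected client downloads and uploads the full float32 model once per round. The function name, the `max(1, int(C * K))` selection rule, and the parameter counts (textbook orders of magnitude for the standard variants of each network) are illustrative assumptions, not figures taken from the paper.

```python
# Hypothetical cost model: not the paper's accounting, just a back-of-envelope sketch.

def fedavg_comm_bytes(num_params, num_clients, fraction, rounds, bytes_per_param=4):
    """Estimate total bytes exchanged by a FedAvg-style run.

    Assumes each selected client receives the global model and sends back
    an update of the same size once per round (hence the factor 2).
    """
    selected = max(1, int(fraction * num_clients))  # clients sampled per round
    per_round = 2 * selected * num_params * bytes_per_param
    return per_round * rounds


# Approximate parameter counts for the standard variants (assumption:
# the paper's adapted models may differ in size).
APPROX_PARAMS = {
    "LeNet": 60_000,
    "AlexNet": 61_000_000,
    "ResNet-34": 21_800_000,
}

if __name__ == "__main__":
    for name, n_params in APPROX_PARAMS.items():
        total_kb = fedavg_comm_bytes(n_params, num_clients=8, fraction=0.75, rounds=100) / 1000
        print(f"{name}: about {total_kb:,.0f} kB over 100 rounds")
```

Under these assumptions the cost scales linearly with model size, so a network with roughly a thousand times fewer parameters exchanges roughly a thousand times fewer bytes for the same number of rounds.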

What carries the argument

Joint comparison of federated algorithms (FedAvg, FedProx, BSP) against CNN depths (LeNet, AlexNet, ResNet-34) under varying client counts, client fractions, batch sizes, and label-skew partitions.
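As a reminder of what separates the two federated algorithms being compared, here is a minimal sketch of the FedAvg aggregation step and the FedProx proximal correction, written over plain Python lists. The function names and the flat-vector representation of model weights are assumptions for illustration; the paper's own implementation is not shown on this page.

```python
from typing import List, Sequence

Vector = List[float]


def fedavg_aggregate(client_weights: Sequence[Vector], client_sizes: Sequence[int]) -> Vector:
    """Average client parameter vectors, weighted by local sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def fedprox_gradient(local_grad: Vector, local_w: Vector, global_w: Vector, mu: float) -> Vector:
    """Gradient of F_k(w) + (mu / 2) * ||w - w_global||^2 evaluated at local_w.

    The extra mu * (w - w_global) term pulls local updates back toward the
    current global model; mu = 0 recovers the plain local gradient.
    """
    return [g + mu * (w - wg) for g, w, wg in zip(local_grad, local_w, global_w)]


def local_sgd_step(local_w: Vector, local_grad: Vector, global_w: Vector, lr: float, mu: float) -> Vector:
    """One client-side SGD step using the proximal gradient above."""
    grad = fedprox_gradient(local_grad, local_w, global_w, mu)
    return [w - lr * g for w, g in zip(local_w, grad)]
```

With `mu = 0` the local step is ordinary SGD, so the same client code could serve both algorithms and only the proximal coefficient would differ between runs.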

Load-bearing premise

The artificial non-IID label skew created on the UC Merced dataset mirrors the geographic and label variations in real distributed remote sensing archives.
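For readers unfamiliar with how an artificial label skew can be constructed, one possible scheme is sketched below. It is a simplification and an assumption throughout: each sample is given a single primary label (the UC Merced setting is multi-label), a fraction `skew` of each label's samples is pinned to a "home" client, and the remainder is spread round-robin. The paper's actual partitioning procedure may differ.

```python
import random
from collections import defaultdict
from typing import Dict, List


def label_skew_partition(labels: List[int], num_clients: int, skew: float, seed: int = 0) -> Dict[int, List[int]]:
    """Split sample indices across clients with a controllable label skew.

    skew = 0.0 spreads every label evenly; skew = 1.0 sends all samples of a
    label to that label's home client (hypothetical scheme for illustration).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_label[lab].append(idx)

    clients: Dict[int, List[int]] = {c: [] for c in range(num_clients)}
    for lab, idxs in sorted(by_label.items()):
        rng.shuffle(idxs)
        home = lab % num_clients                # client that receives the skewed share
        n_skewed = int(round(skew * len(idxs)))
        clients[home].extend(idxs[:n_skewed])
        # Spread the remaining samples round-robin over all clients.
        for j, idx in enumerate(idxs[n_skewed:]):
            clients[j % num_clients].append(idx)
    return clients
```

A partition built this way only varies label proportions; it does not reproduce sensor, seasonal, or spectral differences between regions, which is the gap the premise above points to.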

What would settle it

Partitioning actual Sentinel-1 or Sentinel-2 imagery by geographic region, applying the same federated protocols, and checking whether FedProx still outperforms FedAvg for deeper models would confirm or refute the results.

Figures

Figures reproduced from arXiv: 2604.11562 by Anand Umashankar, Karam Tomotaki-Dawoud, Nicolai Schneider.

Figure 2: Basic steps of Federated Averaging
    2) FedProx: FedProx [5] is an extension to FedAvg that has modifications to tackle non-identical distributions in data and also accounts for system heterogeneity. FedProx provides more reliable convergence when compared to FedAvg. On average, a 22% accuracy improvement is shown across highly heterogeneous settings. Their work is mainly based on adding a "proximal" term to…
Figure 1: Basic steps of Federated Learning
    1) Federated Averaging: The basic Federated Learning model presents one major issue, which is the enormous communication between the clients and the host, and the high computation. One of the most common Federated Learning algorithms, which tries to tackle these issues, introduced in [4], is FedAvg. Let K be the set of clients, for each training round FedAvg only sends the…
Figure 3: Residual block used by ResNet architecture
Figure 4: Some UC Merced Land Use Dataset examples, showing both the…
Figure 5: Example of the different augmentation methods on the same image,…
Figure 6: UC Merced Land Use Dataset multilabel distribution: the total number…
Figure 7: UC Merced Land Use Dataset multilabel cosine similarity matrix;…
Figure 8: Comparing centralized vs BSP and Federated Learning. For BSP, FedAvg, and FedProx 8 clients…
Figure 9: FedAvg vs FedProx for different Deep Learning Models. 8 clients…
Figure 10: Effect of varying Cfraction ∈ {0.5, 0.75, 1}. 8 clients and 40% skewness on the less common labels were used.
    2) Number of clients: We vary the number of clients between 10, 25, and 50, with the client fraction set to 0.5 for all the runs. This effectively means approximately half of the data is used for training on each round. Since the client numbers are large, we have to use the less common labels to …
Figure 11: Effect of varying number of clients ∈ {10, 25, 50}. A cfraction of 0.5 and 40% skewness on the less common labels were used here; however, since the number of clients is more than the number of unique labels, the distribution over the clients ends up more IID.
    and for larger batch sizes (64, 128, 256), the model fails to converge to any meaningful results. While it is quite evident that batch size 1 perfor…
Figure 13: Effect of varying data skewness ∈ {0, 20, 40}% on common labels on LeNet. 4 clients with a cfraction of 0.75 were used.
    the skewness of the data does not impact the learning of the Deep Learning models using FedAvg. These results might vary when using a larger dataset. D. Communication Cost Comparison: In this section, we present the communication costs associated with the different federated algorithms a…
Figure 14: (a) shows the effect of varying data skewness ∈ {40, 60, 80}% on less common labels on LeNet. 8 clients with a cfraction of 0.75 were used. (b) shows the difference between the max F1 scores achieved by BSP and FedAvg.
    The problem with BSP, however, is that the model has to be trained sequentially on each client, and this means the runtime on BSP is again high compared to FedAvg, where the training can ha…
Figure 15: Both graphs show the training communication costs in kilobytes…
Original abstract

Remote sensing archives are inherently distributed: Earth observation missions such as Sentinel-1, Sentinel-2, and Sentinel-3 have collectively accumulated more than 5 petabytes of imagery, stored and processed across many geographically dispersed platforms. Training machine learning models on such data in a centralized fashion is impractical due to data volume, sovereignty constraints, and geographic distribution. Federated learning (FL) addresses this by keeping data local and exchanging only model updates. A central challenge for remote sensing is the non-IID nature of Earth observation data: label distributions vary strongly by geographic region, degrading the convergence of standard FL algorithms. In this paper, we conduct a systematic empirical study of three FL strategies -- FedAvg, FedProx, and bulk synchronous parallel (BSP) -- applied to multi-label remote sensing image classification under controlled non-IID label-skew conditions. We evaluate three convolutional neural network (CNN) architectures of increasing depth (LeNet, AlexNet, and ResNet-34) and analyze the joint effect of algorithm choice, model capacity, client fraction, client count, batch size, and communication cost. Experiments on the UC Merced multi-label dataset show that FedProx outperforms FedAvg for deeper architectures under data heterogeneity, that BSP approaches centralized accuracy at the cost of high sequential communication, and that LeNet provides the best accuracy-communication trade-off for the dataset scale considered.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper conducts a systematic empirical study of three federated learning strategies (FedAvg, FedProx, and bulk synchronous parallel) for multi-label remote sensing image classification. It evaluates three CNN architectures of increasing depth (LeNet, AlexNet, ResNet-34) on the UC Merced dataset under controlled non-IID label-skew partitions, while sweeping hyperparameters including client fraction, client count, batch size, and communication cost. The reported findings are that FedProx outperforms FedAvg for deeper architectures under heterogeneity, BSP approaches centralized accuracy at the expense of high sequential communication, and LeNet yields the best accuracy-communication trade-off for the dataset scale considered.

Significance. If the results hold under more representative conditions, the work supplies actionable guidance on algorithm and architecture selection for federated training on distributed Earth-observation data, quantifying concrete trade-offs between accuracy and communication overhead in non-IID regimes. The comprehensive hyperparameter analysis and direct comparison against centralized baselines are strengths of the empirical design.

major comments (1)
  1. [Abstract] Abstract: The motivating use case is explicitly the non-IID structure of Sentinel-1/2/3 archives arising from geographic region, acquisition time, spectral characteristics, and land-cover co-occurrence at continental scale. All quantitative claims, however, derive from label-skew partitions of the UC Merced dataset (2100 high-resolution aerial photographs from a narrow set of U.S. locations). Because UC Merced is neither satellite imagery nor geographically distributed at the scale of operational EO archives, the observed performance ordering (FedProx > FedAvg for deeper nets; LeNet best trade-off) may not transfer; a direct test on geographically partitioned Sentinel tiles would be required to support the title and abstract claims.
minor comments (2)
  1. The abstract and experimental summary omit explicit statements on the number of independent runs, random seeds, and whether error bars or statistical tests accompany the reported accuracy rankings; these details are needed to assess robustness of the relative ordering between FedProx and FedAvg.
  2. A short paragraph in the conclusions or discussion section explicitly acknowledging that UC Merced serves only as a controlled proxy and that real Sentinel heterogeneity may alter the conclusions would improve clarity without altering the empirical contribution.
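The robustness check requested in the first minor comment could be reported along the following lines: aggregate the metric over several seeds and compare means against the spread. This is only a sketch; the helper names are hypothetical, the one-standard-deviation overlap rule is a crude stand-in for a proper statistical test, and any scores passed in would have to come from actual repeated runs.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def summarize(scores: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of per-seed scores."""
    return mean(scores), stdev(scores)


def bands_overlap(scores_a: Sequence[float], scores_b: Sequence[float]) -> bool:
    """True if the mean +/- 1 std bands of the two methods overlap.

    A heuristic only: overlapping bands suggest the ranking between the
    two methods may not be robust to the choice of random seed.
    """
    mean_a, sd_a = summarize(scores_a)
    mean_b, sd_b = summarize(scores_b)
    return abs(mean_a - mean_b) <= sd_a + sd_b
```

Reporting each configuration as mean and standard deviation over at least three seeds would let readers judge whether the FedProx versus FedAvg ordering exceeds run-to-run noise.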

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for the careful reading and for identifying the important gap between the motivating application and the experimental setting. We agree that the claims require clarification and will revise the abstract, introduction, and discussion accordingly while preserving the core empirical contributions on the UC Merced benchmark.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The motivating use case is explicitly the non-IID structure of Sentinel-1/2/3 archives arising from geographic region, acquisition time, spectral characteristics, and land-cover co-occurrence at continental scale. All quantitative claims, however, derive from label-skew partitions of the UC Merced dataset (2100 high-resolution aerial photographs from a narrow set of U.S. locations). Because UC Merced is neither satellite imagery nor geographically distributed at the scale of operational EO archives, the observed performance ordering (FedProx > FedAvg for deeper nets; LeNet best trade-off) may not transfer; a direct test on geographically partitioned Sentinel tiles would be required to support the title and abstract claims.

    Authors: We acknowledge that UC Merced is a controlled, small-scale aerial benchmark rather than a direct proxy for continental-scale Sentinel archives. The label-skew partitioning was chosen to isolate the impact of non-IID label distributions in a reproducible way, following common practice in federated learning studies on remote-sensing classification. We agree that the specific performance ordering may not generalize to geographically partitioned satellite tiles with additional spectral and temporal heterogeneity. In the revised manuscript we will (i) rephrase the abstract and title to state that the study examines federated learning on a standard multi-label remote-sensing benchmark under controlled label skew, motivated by the challenges of distributed Earth-observation archives, and (ii) add an explicit limitations paragraph that discusses the dataset choice and calls for future work on real Sentinel partitions. We cannot, however, conduct new experiments on full Sentinel tiles within the scope of this revision. revision: yes

standing simulated objections not resolved
  • Conducting additional experiments on geographically partitioned Sentinel-1/2/3 tiles at operational scale, due to data-access, storage, and computational constraints.

Circularity Check

0 steps flagged

No derivation chain; purely empirical comparisons with no fitted predictions or self-referential claims

full rationale

The manuscript contains no equations, first-principles derivations, or claims that a quantity is predicted from a model. All reported results are direct experimental measurements of accuracy, communication cost, and convergence on controlled label-skew partitions of the UC Merced dataset. No parameters are fitted and then re-used as predictions, no uniqueness theorems are invoked, and no self-citations form load-bearing premises. The work is therefore self-contained as an empirical benchmark study; the skeptic concern about dataset representativeness is a question of external validity, not circularity within the paper's own logic.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical comparison study; no free parameters, axioms, or invented entities are introduced beyond standard federated learning and CNN components already established in the literature.

pith-pipeline@v0.9.0 · 5550 in / 1136 out tokens · 71702 ms · 2026-05-10T16:05:46.012156+00:00 · methodology


Reference graph

Works this paper leans on

22 extracted references · 5 canonical work pages

  1. [1]

    Vitor C. F. Gomes, Gilberto R. Queiroz, and Karine R. Ferreira. An overview of platforms for big earth observation data management and analysis. Remote Sensing, 12(8), 2020

  2. [2]

    Federated learning: Collaborative machine learning without centralized training data

    Brendan McMahan and Daniel Ramage. Federated learning: Collaborative machine learning without centralized training data. https://ai.googleblog.com/2017/04/federated-learning-collaborative.html, 2017

  3. [3]

    Kevin Hsieh, Amar Phanishayee, Onur Mutlu, and Phillip B. Gibbons. The non-iid data quagmire of decentralized machine learning. CoRR, abs/1910.00189, 2019

  4. [4]

    Communication-efficient learning of deep networks from decentralized data

    H. Brendan McMahan, Eider Moore, Daniel Ramage, and Blaise Agüera y Arcas. Federated learning of deep networks using model averaging. CoRR, abs/1602.05629, 2016

  5. [5]

    Anit Kumar Sahu, Tian Li, Maziar Sanjabi, Manzil Zaheer, Ameet Talwalkar, and Virginia Smith. On the convergence of federated optimization in heterogeneous networks. CoRR, abs/1812.06127, 2018

  6. [6]

    Leslie G. Valiant. A bridging model for parallel computation. Commun. ACM, 33(8):103–111, August 1990

  7. [7]

    Enhanced deep residual networks for single image super-resolution

    Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. Enhanced deep residual networks for single image super-resolution. CoRR, abs/1707.02921, 2017

  8. [8]

    Imagenet classification with deep convolutional neural networks

    Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012

  9. [9]

    Gradient-based learning applied to document recognition

    Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. In Proceedings of the IEEE, volume 86, pages 2278–2324, 1998

  10. [10]

    Yi Yang and Shawn D. Newsam. Bag-of-visual-words and spatial extensions for land-use classification. In Divyakant Agrawal, Pusheng Zhang, Amr El Abbadi, and Mohamed F. Mokbel, editors, GIS, pages 270–279. ACM, 2010

  11. [11]

    Kevin Hsieh, Aaron Harlap, Nandita Vijaykumar, Dimitris Konomis, Gregory R. Ganger, Phillip B. Gibbons, and Onur Mutlu. Gaia: Geo-distributed machine learning approaching LAN speeds. In Proceedings of the 14th USENIX Conference on Networked Systems Design and Implementation, NSDI'17, pages 629–647, USA, 2017. USENIX Association

  12. [12]

    Yujun Lin, Song Han, Huizi Mao, Yu Wang, and William J. Dally. Deep gradient compression: Reducing the communication bandwidth for distributed training, 2020

  13. [13]

    Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, and Ananda Theertha Suresh. Scaffold: Stochastic controlled averaging for federated learning, 2020

  14. [14]

    FedBoost: A communication-efficient algorithm for federated learning

    Jenny Hamer, Mehryar Mohri, and Ananda Theertha Suresh. FedBoost: A communication-efficient algorithm for federated learning. In Hal Daumé III and Aarti Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 3973–3983. PMLR, 13–18 Jul 2020

  15. [15]

    Fetchsgd: Communication-efficient federated learning with sketching

    Daniel Rothchild, Ashwinee Panda, Enayat Ullah, Nikita Ivkin, Ion Stoica, Vladimir Braverman, Joseph Gonzalez, and Raman Arora. Fetchsgd: Communication-efficient federated learning with sketching, 2020

  16. [16]

    B. Chaudhuri, B. Demir, S. Chaudhuri, and L. Bruzzone. Multilabel remote sensing image retrieval using a semisupervised graph-theoretic method. IEEE Transactions on Geoscience and Remote Sensing, 56(2):1144–1158, 2018

  17. [17]

    Dan Hendrycks and Thomas G. Dietterich. Benchmarking neural network robustness to common corruptions and surface variations, 2019

  18. [18]

    Multi-labelled classification using maximum entropy method

    Shenghuo Zhu, Xiang Ji, Wei Xu, and Yihong Gong. Multi-labelled classification using maximum entropy method. pages 274–281, 08 2005

  19. [19]

    Data mining and knowledge discovery handbook

    Oded Z Maimon. Data mining and knowledge discovery handbook, 2005

  20. [20]

    Torch7: A Matlab-like Environment for Machine Learning

    Ronan Collobert, Koray Kavukcuoglu, and Clément Farabet. Torch7: A Matlab-like Environment for Machine Learning. Technical report

  21. [21]

    Openmined/pysyft

    OpenMined. Openmined/pysyft

  22. [22]

    Bigearthnet: A large-scale benchmark archive for remote sensing image understanding

    Gencer Sumbul, Marcela Charfuelan, Begüm Demir, and Volker Markl. Bigearthnet: A large-scale benchmark archive for remote sensing image understanding. CoRR, abs/1902.06148, 2019.

TABLE I. Experimental setup: parameters
DL Model | FL Algorithm | Epochs | Clients | Batch Size | C-Fraction | Skewness | Client Epochs | Small Skew
LeNet | Centralized | 100 | NA | 4 | NA | NA | NA | NA
ResNe...