pith. sign in

arxiv: 1906.10718 · v1 · pith:XXNAZLBMnew · submitted 2019-06-25 · 💻 cs.DC · cs.LG

Active Learning Solution on Distributed Edge Computing

Pith reviewed 2026-05-25 15:46 UTC · model grok-4.3

classification 💻 cs.DC cs.LG
keywords active learningfederated learningedge computingfog computingdistributed machine learningimage classification
0
0 comments X

The pith

Active learning on edge devices plus federated learning on fog nodes reduces the samples and communication needed to train image classifiers in distributed setups.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes splitting data handling in fog platforms so that edge devices run active learning to pick informative samples while the fog node runs federated learning to combine models. This division is presented as a way to cut the volume of data required for training and the amount of data moved between devices and nodes. The approach is evaluated on an image classification task under both massively distributed and non-massively distributed conditions.

Core claim

By decomposing data aggregation and processing between edge devices and fog nodes, active learning at the edges selects fewer samples and federated learning at the fog node aggregates models without centralizing raw data, thereby lowering both training sample count and communication cost for image classification in the two distribution regimes.

What carries the argument

Intelligent division of active learning (edge) and federated learning (fog) that performs sample selection locally and model aggregation centrally.

If this is right

  • Fewer raw data samples need to be stored or transmitted from edge devices.
  • Communication volume between edges and fog decreases because only model updates or selected samples move.
  • Local processing at edges supports privacy by limiting data sharing.
  • Separate solutions are offered for massively distributed versus non-massively distributed device populations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method may extend to other supervised tasks if active learning query strategies remain effective on edge hardware.
  • Energy use on battery-powered edges could drop if fewer samples are processed locally.
  • Deployment would still require verifying that the fog node can handle the federated aggregation load without becoming a bottleneck.

Load-bearing premise

The split of tasks between edges and fog nodes can be arranged so that accuracy stays acceptable and no new overheads erase the claimed reductions in samples and communication.

What would settle it

A direct comparison on the same image classification task showing that the active-plus-federated method requires at least as many samples or as much communication as a baseline centralized or non-active approach while matching accuracy.

Figures

Figures reproduced from arXiv: 1906.10718 by Jia Qian, Lars Kai Hansen, Sayantan Sengupta.

Figure 1
Figure 1. Figure 1: Pool-based Active Learning Framework. maximizing the likelihood. Uncertainty-based methods aim to use uncertain information to enhance the model during the training process. It plays the role of the exploitation while acts as the exploration part. We will introduce three different ways to estimate the uncertainty. – Maximal Entropy: H[y|x, Dtrain] is the predictive en￾tropy expectation as defined in [9]. H… view at source ↗
Figure 2
Figure 2. Figure 2: Overview of the scheme. non-massive case, a small number of distributed devices, let’s say four edge devices and one centralized node. Initially, we trained LeNet model by 20 images at the centralized node (Fog Node), and then dispatch the model to the edge devices. On the devices side, we further trained the model by additional data points that are generated locally. They are acquired by entropy, bald or … view at source ↗
Figure 4
Figure 4. Figure 4: Learning curve: Well-Trained Model. B. Experiment II: AL acquisition number In this series of experiment, we study how does the acqui￾sition number influence the performance. Recall that during every data acquisition, we include 10 additional images for further training [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗
Figure 7
Figure 7. Figure 7: Active Learning Vs Random Sample (20 Acquisitions). [PITH_FULL_IMAGE:figures/full_fig_p006_7.png] view at source ↗
Figure 5
Figure 5. Figure 5: Learning Curve of Edge Devices for 10, 20, 30 and 40 acquisitions [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Active Learning Vs Random Sample (10 Acquisitions). [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗
Figure 8
Figure 8. Figure 8: learning curves: 20 devices, trained by 60 images. [PITH_FULL_IMAGE:figures/full_fig_p007_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Accuracy from the centralized fog node where we have 20 devices [PITH_FULL_IMAGE:figures/full_fig_p007_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Accuracy from the centralized fog node where we have 20 devices [PITH_FULL_IMAGE:figures/full_fig_p007_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Architecture of massively distributed setting. Diagram A indicates [PITH_FULL_IMAGE:figures/full_fig_p008_11.png] view at source ↗
read the original abstract

Industry 4.0 becomes possible through the convergence between Operational and Information Technologies. All the requirements to realize the convergence is integrated on the Fog Platform. Fog Platform is introduced between the cloud server and edge devices when the unprecedented generation of data causes the burden of the cloud server, leading the ineligible latency. In this new paradigm, we divide the computation tasks and push it down to edge devices. Furthermore, local computing (at edge side) may improve privacy and trust. To address these problems, we present a new method, in which we decompose the data aggregation and processing, by dividing them between edge devices and fog nodes intelligently. We apply active learning on edge devices; and federated learning on the fog node which significantly reduces the data samples to train the model as well as the communication cost. To show the effectiveness of the proposed method, we implemented and evaluated its performance for an image classification task. In addition, we consider two settings: massively distributed and non-massively distributed and offer the corresponding solutions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript proposes a hybrid active learning and federated learning approach for distributed edge computing in Industry 4.0 settings. Active learning is performed on edge devices to select informative samples, while federated learning aggregates models at fog nodes. This is claimed to reduce the number of training samples and communication costs. The method is evaluated on an image classification task under both massively distributed and non-massively distributed settings.

Significance. If the reported experimental reductions in samples and communication hold with maintained accuracy, the work could provide a practical technique for lowering overhead in fog-edge deployments while preserving privacy. The explicit handling of two distribution regimes is a useful contribution, and the presence of concrete accuracy and communication metrics in the experimental section strengthens the central claim.

minor comments (2)
  1. The abstract asserts significant reductions in data samples and communication cost but provides no quantitative results, baselines, or error bars; adding a sentence with key metrics would better support the claim.
  2. The description of how the two settings (massively vs. non-massively distributed) are implemented could be expanded with more detail on data partitioning and model update frequency to aid reproducibility.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The acknowledgment of the practical value for fog-edge deployments and the explicit treatment of the two distribution regimes is appreciated. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity; empirical evaluation stands alone

full rationale

The manuscript describes an empirical architecture that applies active learning at edge devices and federated learning at the fog node, then reports concrete accuracy and communication metrics on an image-classification task under massively and non-massively distributed regimes. No equations, parameter-fitting steps, uniqueness theorems, or self-citations appear in the provided text that would allow any claimed result to reduce to its own inputs by construction. The central claims are therefore supported by external experimental outcomes rather than definitional or self-referential loops.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; it contains no mathematical derivations, fitted constants, or postulated entities, so the ledger is empty.

pith-pipeline@v0.9.0 · 5699 in / 1217 out tokens · 32151 ms · 2026-05-25T15:46:44.613417+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · 5 internal anchors

  1. [1]

    ”Fog computing and its role in the internet of things.” Proceedings of the first edition of the MCC workshop on Mobile cloud computing

    Bonomi, Flavio, et al. ”Fog computing and its role in the internet of things.” Proceedings of the first edition of the MCC workshop on Mobile cloud computing. ACM, 2012

  2. [2]

    ”Fog computing: A platform for internet of things and analytics.” Big data and internet of things: A roadmap for smart environments

    Bonomi, Flavio, et al. ”Fog computing: A platform for internet of things and analytics.” Big data and internet of things: A roadmap for smart environments. Springer, Cham, 2014. 169-186

  3. [3]

    ”Fog computing: Helping the Internet of Things realize its potential.” Computer 49.8 (2016): 112- 116

    Dastjerdi, Amir Vahid, and Rajkumar Buyya. ”Fog computing: Helping the Internet of Things realize its potential.” Computer 49.8 (2016): 112- 116

  4. [4]

    ”Active learning literature survey

    Settles, Burr. ”Active learning literature survey. 2010.” Computer Sci- ences Technical Report 1648 (2014)

  5. [5]

    ”Dropout as a Bayesian approxi- mation: Representing model uncertainty in deep learning.” international conference on machine learning

    Gal, Yarin, and Zoubin Ghahramani. ”Dropout as a Bayesian approxi- mation: Representing model uncertainty in deep learning.” international conference on machine learning. 2016

  6. [6]

    Federated Learning: Strategies for Improving Communication Efficiency

    Konen, Jakub, et al. ”Federated learning: Strategies for improving communication efficiency.” arXiv preprint arXiv:1610.05492 (2016)

  7. [7]

    ”Gaussian processes in machine learning.” Advanced lectures on machine learning

    Rasmussen, Carl Edward. ”Gaussian processes in machine learning.” Advanced lectures on machine learning. Springer, Berlin, Heidelberg,

  8. [8]

    LeCun, Yann, Corinna Cortes, and C. J. Burges. ”MNIST handwritten digit database.” AT &T Labs [Online]. Available: http://yann. lecun. com/exdb/mnist 2 (2010)

  9. [9]

    ”A mathematical theory of communication.” Bell system technical journal 27.3 (1948): 379-423

    Shannon, Claude Elwood. ”A mathematical theory of communication.” Bell system technical journal 27.3 (1948): 379-423

  10. [10]

    ”Support vector machine active learning with applications to text classification.” Journal of machine learning research 2.Nov (2001): 45-66

    Tong, Simon, and Daphne Koller. ”Support vector machine active learning with applications to text classification.” Journal of machine learning research 2.Nov (2001): 45-66

  11. [11]

    Bayesian Active Learning for Classification and Preference Learning

    Houlsby, Neil, et al. ”Bayesian active learning for classification and preference learning.” arXiv preprint arXiv:1112.5745 (2011)

  12. [12]

    Elementary applied statistics: for students in behav- ioral science

    Freeman, Linton C. Elementary applied statistics: for students in behav- ioral science. John Wiley and Sons, 1965

  13. [13]

    Deep Bayesian Active Learning with Image Data

    Gal, Yarin, Riashat Islam, and Zoubin Ghahramani. ”Deep bayesian active learning with image data.” arXiv preprint arXiv:1703.02910 (2017)

  14. [14]

    and Chang, E., 2001, October

    Tong, S. and Chang, E., 2001, October. Support vector machine active learning for image retrieval. In Proceedings of the ninth ACM interna- tional conference on Multimedia (pp. 107-118). ACM

  15. [15]

    and Chilamkurti, N., 2018

    Diro, A.A. and Chilamkurti, N., 2018. Distributed attack detection scheme using deep learning approach for Internet of Things. Future Generation Computer Systems, 82, pp.761-768

  16. [16]

    Thompson, Cynthia A., Mary Elaine Califf, and Raymond J. Mooney. ”Active learning for natural language parsing and information extrac- tion.” ICML. 1999

  17. [17]

    ”Balancing exploration and exploitation: A new algorithm for active machine learning.” Data Mining, Fifth IEEE International Conference on

    Osugi, Thomas, Deng Kim, and Stephen Scott. ”Balancing exploration and exploitation: A new algorithm for active machine learning.” Data Mining, Fifth IEEE International Conference on. IEEE, 2005

  18. [18]

    and Blundell, C., 2017

    Lakshminarayanan, B., Pritzel, A. and Blundell, C., 2017. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems (pp. 6402-6413)

  19. [19]

    and Van Roy, B., 2016

    Osband, I., Blundell, C., Pritzel, A. and Van Roy, B., 2016. Deep exploration via bootstrapped DQN. In Advances in neural information processing systems (pp. 4026-4034)

  20. [20]

    and Ghahramani, Z., 2016, June

    Gal, Y . and Ghahramani, Z., 2016, June. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In international conference on machine learning (pp. 1050-1059)

  21. [21]

    and Bengio, Y ., 1999

    LeCun, Y ., Haffner, P., Bottou, L. and Bengio, Y ., 1999. Object recog- nition with gradient-based learning. In Shape, contour and grouping in computer vision (pp. 319-345). Springer, Berlin, Heidelberg

  22. [22]

    Differential privacy: A survey of results

    Dwork, C., 2008, April. Differential privacy: A survey of results. In International Conference on Theory and Applications of Models of Computation (pp. 1-19). Springer, Berlin, Heidelberg

  23. [23]

    Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference

    Tang, B., Chen, Z., Hefferman, G., Wei, T., He, H. and Yang, Q., 2015, October. A hierarchical distributed fog computing architecture for big data analysis in smart cities. In Proceedings of the ASE BigData and SocialInformatics 2015 (p. 28). ACM.reprint arXiv:1506.02158

  24. [24]

    Federated Optimization: Distributed Machine Learning for On-Device Intelligence

    Konecn, J., McMahan, H.B., Ramage, D. and Richtrik, P., 2016. Federated optimization: Distributed machine learning for on-device intelligence. arXiv preprint arXiv:1610.02527

  25. [25]

    and Si, L., 2012, August

    Hong, D. and Si, L., 2012, August. Mixture model with multiple centralized retrieval algorithms for result merging in federated search. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval (pp. 821-830). ACM

  26. [26]

    and McAuliffe, J.D., 2017

    Blei, D.M., Kucukelbir, A. and McAuliffe, J.D., 2017. Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518), pp.859-877

  27. [27]

    and Lerer, A., 2017

    Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L. and Lerer, A., 2017. Automatic differentiation in pytorch

  28. [28]

    and Haffner, P., 1998

    LeCun, Y ., Bottou, L., Bengio, Y . and Haffner, P., 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), pp.2278-2324

  29. [29]

    and Salakhutdi- nov, R., 2014

    Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. and Salakhutdi- nov, R., 2014. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), pp.1929- 1958