Recognition: 2 theorem links · Lean Theorem
Fed-BAC: Federated Bandit-Guided Additive Clustering in Hierarchical Federated Learning
Pith reviewed 2026-05-13 07:08 UTC · model grok-4.3
The pith
A two-level bandit framework with additive cluster personalization lets hierarchical federated learning jointly optimize server assignments and client selection, delivering large accuracy gains under non-IID data with only 80 percent client participation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Fed-BAC integrates additive cluster personalization with a two-level bandit framework: contextual bandits at the cloud learn server-to-cluster assignments while Thompson Sampling at each edge server identifies high-contributing clients. The additive decomposition enables sharing of knowledge between groups through a globally aggregated network, while cluster-specific networks capture distribution variations. On three classification benchmarks under moderate (alpha=0.5) and severe (alpha=0.1) Dirichlet non-IID partitioning, this yields distributed accuracy gains of up to +35.5pp over HierFAVG and +8.4pp over IFCA, requires only 80 percent client participation, converges 1.5 to 4.8 times faster depending on dataset and accuracy target, and improves cross-server fairness; these gains are further validated at 5x deployment scale on CIFAR-10.
What carries the argument
Two-level bandit framework (contextual bandits at cloud for server-to-cluster assignment plus Thompson Sampling at edge servers for client selection) combined with additive decomposition of global and cluster-specific networks.
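To make the additive decomposition concrete, here is a minimal sketch, assuming a PyTorch-style model in which the global network and a cluster-specific head produce logits that are summed, matching the form ŷ = h(x; Θ_global) + f(x; Θ_k). The layer sizes and head architecture are illustrative, not taken from the paper.

```python
# Minimal sketch of additive cluster personalization: prediction is the sum
# of a shared global network and a cluster-specific head. Architecture
# details here are placeholder assumptions.
import torch
import torch.nn as nn

class AdditiveClusterModel(nn.Module):
    """y_hat = h(x; theta_global) + f(x; theta_k) for a client in cluster k."""

    def __init__(self, in_dim: int, num_classes: int, num_clusters: int):
        super().__init__()
        # Globally aggregated network h: shared across all clusters.
        self.global_net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, num_classes)
        )
        # One cluster-specific network f per cluster, capturing
        # distribution variations on top of the shared component.
        self.cluster_heads = nn.ModuleList(
            nn.Linear(in_dim, num_classes) for _ in range(num_clusters)
        )

    def forward(self, x: torch.Tensor, cluster_id: int) -> torch.Tensor:
        return self.global_net(x) + self.cluster_heads[cluster_id](x)
```

Under this structure, only the global_net parameters would be aggregated at the cloud, while each cluster head would be aggregated only among the clients assigned to that cluster.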
Load-bearing premise
The two-level bandit framework combined with additive decomposition can reliably identify high-value clusters and clients without introducing selection bias under the tested Dirichlet partitions.
What would settle it
Running the method on a new dataset whose heterogeneity structure differs markedly from the tested Dirichlet partitions and observing that accuracy gains and convergence speedups both disappear would falsify the central claim.
Original abstract
Hierarchical federated learning (HFL) leverages edge servers for partial aggregation in edge computing. Yet existing FL methods lack mechanisms for jointly optimizing cluster assignment and client selection under data heterogeneity. This paper proposes Fed-BAC, which integrates additive cluster personalization with a two-level bandit framework: contextual bandits at the cloud learn server-to-cluster assignments, while Thompson Sampling at each edge server identifies high-contributing clients. The additive decomposition enables the sharing of knowledge between groups through a globally aggregated network, while cluster-specific networks capture distribution variations. Across three classification benchmarks (CIFAR-10, SVHN, Fashion-MNIST) under moderate ($\alpha = 0.5$) and severe ($\alpha = 0.1$) Dirichlet non-IID partitioning, Fed-BAC achieves distributed accuracy gains of up to +35.5pp over HierFAVG and +8.4pp over IFCA, while requiring only 80% client participation, converging 1.5 to 4.8$\times$ faster depending on dataset and accuracy target, and improving cross-server fairness. These gains are further validated at 5$\times$ deployment scale on CIFAR-10. The advantage of Fed-BAC increases with heterogeneity severity, confirming that additive cluster personalization becomes increasingly valuable as data distributions diverge.
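For readers unfamiliar with the partitioning scheme named in the abstract, the following is a minimal sketch of label-based Dirichlet non-IID splitting in the style of Hsu et al. [34]; the function name and seed handling are our own, and smaller alpha (e.g. 0.1) produces more severe skew.

```python
# Sketch of label-based Dirichlet non-IID partitioning: for each class, a
# Dir(alpha) draw decides what fraction of that class each client receives.
import numpy as np

def dirichlet_partition(labels: np.ndarray, num_clients: int,
                        alpha: float, seed: int = 0) -> list[np.ndarray]:
    """Split sample indices across clients; smaller alpha => more skew."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # Per-client proportions for this class, drawn from Dir(alpha).
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part)
    return [np.array(ix) for ix in client_indices]
```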
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Fed-BAC for hierarchical federated learning, combining additive cluster personalization (global network plus cluster-specific heads) with a two-level bandit mechanism: contextual bandits at the cloud server for assigning edge servers to clusters, and Thompson Sampling at each edge server for selecting high-contributing clients. Empirical results on CIFAR-10, SVHN, and Fashion-MNIST under Dirichlet non-IID partitions (α=0.5 and α=0.1) claim accuracy gains up to +35.5pp over HierFAVG and +8.4pp over IFCA at 80% client participation, 1.5–4.8× faster convergence, improved fairness, and robustness at 5× scale.
Significance. If the empirical gains and fairness improvements are reproducible with proper statistical controls, the work offers a practical mechanism for client and cluster selection in HFL that reduces communication while handling severe heterogeneity. The additive decomposition provides a clean way to share global knowledge without full personalization overhead, which could influence future edge-computing FL deployments.
major comments (3)
- [§3.2] §3.2 (Thompson Sampling at edge servers): the reward signal (local loss/accuracy after partial training) is not shown to be unbiased with respect to client data difficulty or label imbalance; under α=0.1 Dirichlet partitions this risks preferential selection of easier clients, potentially biasing the global network while cluster heads overfit to remaining hard clients. No regret bound or posterior convergence analysis is provided to support stationarity (a minimal sketch of this selection dynamic appears after the minor comments).
- [§5.1] §5.1 and Table 2: reported accuracy gains (+35.5pp, +8.4pp) and convergence speedups lack error bars, standard deviations across runs, or statistical significance tests; without these the headline claims cannot be assessed for reliability, especially given the scale-up experiment on CIFAR-10.
- [§4.3] §4.3 (additive decomposition): the claim that the global network plus cluster heads reliably separates shared and distribution-specific knowledge is not accompanied by any formal decomposition bound or ablation isolating the contribution of the bandit-driven selection versus the additive structure itself.
minor comments (2)
- [Figure 1] Figure 1: the diagram of the two-level bandit flow would benefit from explicit arrows showing how the contextual bandit output feeds into edge-server Thompson Sampling.
- [§3.1] Notation: the symbols for cluster assignment probabilities and client selection probabilities are introduced without a consolidated table, making cross-references in §3.1–3.3 harder to follow.
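As flagged in the first major comment, bandit selection on raw reward signals can starve hard clients. Below is a minimal Gaussian Thompson Sampling loop for client selection; the priors and the loss-reduction reward are illustrative assumptions, not the paper's specification.

```python
# Minimal Gaussian Thompson Sampling for client selection. If "easy" clients
# systematically yield larger loss reductions, their posterior means grow and
# they are sampled more often -- exactly the bias the referee asks about.
import numpy as np

rng = np.random.default_rng(0)
num_clients, budget = 20, 8          # select 8 of 20 clients per round
mu = np.zeros(num_clients)           # posterior means of per-client reward
counts = np.ones(num_clients)        # pseudo-counts (unit-variance prior)

def select_clients() -> np.ndarray:
    # Sample one plausible reward per client, pick the top `budget`.
    samples = rng.normal(mu, 1.0 / np.sqrt(counts))
    return np.argsort(samples)[-budget:]

def update(client: int, reward: float) -> None:
    # Conjugate Gaussian update: running mean of observed loss reductions.
    counts[client] += 1.0
    mu[client] += (reward - mu[client]) / counts[client]
```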
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on Fed-BAC. We address each major comment point by point below, indicating the revisions we will incorporate to improve the manuscript.
Point-by-point responses
-
Referee: [§3.2] §3.2 (Thompson Sampling at edge servers): the reward signal (local loss/accuracy after partial training) is not shown to be unbiased with respect to client data difficulty or label imbalance; under α=0.1 Dirichlet partitions this risks preferential selection of easier clients, potentially biasing the global network while cluster heads overfit to remaining hard clients. No regret bound or posterior convergence analysis is provided to support stationarity.
Authors: We acknowledge the concern that the reward signal (local loss reduction after partial training) may not be fully unbiased under severe heterogeneity. The signal is chosen to reflect marginal contribution to model improvement rather than absolute accuracy, which helps prioritize clients providing useful gradients. Empirical results on α=0.1 partitions still show gains in both accuracy and fairness, suggesting the mechanism is robust in practice. No regret bound is derived because the primary focus is empirical validation in HFL; we will add a dedicated limitations paragraph discussing potential selection bias and stationarity assumptions in the revised version. revision: partial
-
Referee: [§5.1] §5.1 and Table 2: reported accuracy gains (+35.5pp, +8.4pp) and convergence speedups lack error bars, standard deviations across runs, or statistical significance tests; without these the headline claims cannot be assessed for reliability, especially given the scale-up experiment on CIFAR-10.
Authors: We agree that statistical rigor is necessary to substantiate the reported gains. In the revised manuscript we will rerun all experiments with 5 independent random seeds, report mean ± standard deviation for accuracy and convergence metrics, add error bars to figures and Table 2, and include paired t-test p-values to confirm statistical significance of the improvements over baselines (a sketch of this protocol appears after the responses). revision: yes
-
Referee: [§4.3] §4.3 (additive decomposition): the claim that the global network plus cluster heads reliably separates shared and distribution-specific knowledge is not accompanied by any formal decomposition bound or ablation isolating the contribution of the bandit-driven selection versus the additive structure itself.
Authors: The additive structure is motivated by the goal of sharing a common representation while allowing cluster-specific adaptation. While we do not provide a formal decomposition bound, we will add an ablation study in the revision that compares (i) full Fed-BAC, (ii) additive heads without bandit selection, and (iii) bandit selection without additive heads, to quantify the individual contributions of each component. revision: partial
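The second response commits to repeated seeds, error bars, and paired t-tests. A minimal sketch of that protocol using SciPy follows; the seed count and accuracies are placeholders, not the paper's results.

```python
# Sketch of the promised statistical protocol: mean +/- std over independent
# seeds, plus a paired t-test against a baseline run with the same seeds.
import numpy as np
from scipy import stats

def compare(method_acc: np.ndarray, baseline_acc: np.ndarray) -> None:
    """Per-seed final accuracies, paired by position (same seed per index)."""
    diff = method_acc - baseline_acc
    t, p = stats.ttest_rel(method_acc, baseline_acc)
    print(f"method   {method_acc.mean():.2f} ± {method_acc.std(ddof=1):.2f}")
    print(f"baseline {baseline_acc.mean():.2f} ± {baseline_acc.std(ddof=1):.2f}")
    print(f"mean gain {diff.mean():+.2f}pp, paired t = {t:.2f}, p = {p:.4f}")

# Hypothetical usage with 5 seeds (numbers are placeholders):
compare(np.array([71.2, 70.8, 71.5, 70.9, 71.1]),
        np.array([65.0, 64.2, 65.7, 64.8, 65.3]))
```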
Circularity Check
No circularity: empirical bandit integration with independent experimental validation
Full rationale
The paper presents Fed-BAC as an algorithmic method that combines contextual bandits at the cloud level with Thompson Sampling at edge servers for client/cluster selection, plus additive decomposition for personalization. All performance claims (+35.5pp accuracy, faster convergence, fairness at 80% participation) are supported solely by empirical results on CIFAR-10, SVHN, and Fashion-MNIST under Dirichlet non-IID partitions. No equations, first-principles derivations, or predictions are shown that reduce by construction to fitted inputs or self-citations. The method is described as a practical integration of existing bandit techniques with HFL, without load-bearing self-citation chains or self-definitional steps. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · status: unclear · matched text: additive cluster personalization ... ŷ = h(x; Θ_global) + f(x; Θ_k)
Reference graph
Works this paper leans on
- [1] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, "Communication-efficient learning of deep networks from decentralized data," in Artificial Intelligence and Statistics, pp. 1273–1282, PMLR, 2017.
- [2] P. Kairouz, H. B. McMahan, B. Avent, et al., "Advances and open problems in federated learning," Foundations and Trends in Machine Learning, vol. 14, no. 1–2, pp. 1–210, 2021.
- [3] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith, "Federated optimization in heterogeneous networks," in Proceedings of Machine Learning and Systems, vol. 2, pp. 429–450, 2020.
- [4] L. Liu, J. Zhang, S. Song, and K. B. Letaief, "Client-edge-cloud hierarchical federated learning," in IEEE International Conference on Communications (ICC), pp. 1–6, IEEE, 2020.
- [5] M. S. H. Abad, E. Ozfatura, D. Gündüz, and O. Ercetin, "Hierarchical federated learning across heterogeneous cellular networks," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8866–8870, IEEE, 2020.
- [6] A. Wu, Y. Sun, Y. Zhu, J. Gu, J. Yang, and F. Wei, "Topology-aware federated learning in edge computing: A comprehensive survey," ACM Computing Surveys, vol. 56, no. 12, pp. 1–40, 2024.
- [7] J. Wang, Q. Liu, H. Liang, G. Joshi, and H. V. Poor, "Demystifying why local aggregation helps: Convergence analysis of hierarchical SGD," Advances in Neural Information Processing Systems, vol. 35, pp. 13251–13265, 2022.
- [8] C. Wu, F. Wu, L. Lyu, Y. Huang, and X. Xie, "Communication-efficient federated learning via knowledge distillation," Nature Communications, vol. 13, no. 1, p. 2032, 2022.
- [9] Z. Zhu, J. Hong, and J. Zhou, "Data-free knowledge distillation for heterogeneous federated learning," in International Conference on Machine Learning, pp. 12878–12889, PMLR, 2021.
- [10] D. Song, A. Zhou, W. Sun, B. Li, and W. Li, "Fast heterogeneous federated learning with hybrid client selection," in Uncertainty in Artificial Intelligence, pp. 2006–2015, PMLR, 2023.
- [11] Y. Zhao, M. Li, L. Lai, N. Suda, D. Civin, and V. Chandra, "Federated learning with non-IID data," in NeurIPS Workshop on Federated Learning, 2018.
- [12] K. Hsieh, A. Phanishayee, O. Mutlu, and P. Gibbons, "The non-IID data quagmire of decentralized machine learning," in International Conference on Machine Learning, pp. 4387–4398, PMLR, 2020.
- [13] A. Ghosh, J. Chung, D. Yin, and K. Ramchandran, "An efficient framework for clustered federated learning," in Advances in Neural Information Processing Systems, vol. 33, pp. 19586–19597, 2020.
- [14] F. Sattler, K.-R. Müller, and W. Samek, "Clustered federated learning: Model-agnostic distributed multitask optimization under privacy constraints," IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 8, pp. 3710–3722, 2021.
- [15] Y. J. Cho, J. Wang, and G. Joshi, "Towards understanding biased client selection in federated learning," in International Conference on Artificial Intelligence and Statistics, pp. 10351–10375, PMLR, 2022.
- [16] F. Lai, X. Zhu, H. V. Madhyastha, and M. Chowdhury, "Oort: Efficient federated learning via guided participant selection," in USENIX Symposium on Operating Systems Design and Implementation (OSDI), pp. 19–35, 2021.
- [17] W. Xia, T. Q. Quek, K. Guo, W. Wen, H. H. Yang, and H. Zhu, "Multi-armed bandit-based client scheduling for federated learning," IEEE Transactions on Wireless Communications, vol. 19, no. 11, pp. 7108–7123, 2020.
- [18] T. Li, S. Hu, A. Beirami, and V. Smith, "Ditto: Fair and robust federated learning through personalization," in International Conference on Machine Learning, pp. 6357–6368, PMLR, 2021.
- [19] Y. Deng, M. M. Kamani, and M. Mahdavi, "Adaptive personalized federated learning," arXiv preprint arXiv:2003.13461, 2020.
- [20] A. Z. Tan, H. Yu, L. Cui, and Q. Yang, "Towards personalized federated learning," IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 12, pp. 9587–9603, 2022.
- [21] J. Ma, T. Zhou, G. Long, J. Jiang, and C. Zhang, "Structured federated learning through clustered additive modeling," in Advances in Neural Information Processing Systems, vol. 36, 2023.
- [22] L. Li, W. Chu, J. Langford, and R. E. Schapire, "A contextual-bandit approach to personalized news article recommendation," in Proceedings of the 19th International Conference on World Wide Web, pp. 661–670, 2010.
- [23] D. J. Russo, B. Van Roy, A. Kazerouni, I. Osband, and Z. Wen, "A tutorial on Thompson sampling," Foundations and Trends in Machine Learning, vol. 11, no. 1, pp. 1–96, 2018.
- [24] Y. J. Cho, S. Gupta, G. Joshi, and O. Yağan, "Bandit-based communication-efficient client selection strategies for federated learning," in 54th Asilomar Conference on Signals, Systems, and Computers, pp. 1066–1069, IEEE, 2020.
- [25] Z. Qu, R. Duan, L. Chen, J. Xu, Z. Lu, and Y. Liu, "Context-aware online client selection for hierarchical federated learning," IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 12, pp. 4353–4367, 2022.
- [26] S. Trindade and N. da Fonseca, "Client selection in hierarchical federated learning," IEEE Internet of Things Journal, 2024.
- [27] L. Song, J. Li, H. Jiang, S. Wei, and Y. Guo, "Chpfl: Clustered adaptive hierarchical federated learning for edge-level personalization," High-Confidence Computing, vol. 5, no. 3, p. 100279, 2025.
- [28] S. Lee, O. Tavallaie, S. Chen, K. Thilakarathna, S. Seneviratne, A. N. Toosi, and A. Y. Zomaya, "Personalizing federated learning for hierarchical edge networks with non-IID data," arXiv preprint arXiv:2504.08872, 2025.
- [29] C. Briggs, Z. Fan, and P. Andras, "Federated learning with hierarchical clustering of local updates to improve training on non-IID data," in IEEE International Joint Conference on Neural Networks (IJCNN), pp. 1–9, IEEE, 2020.
- [30] A. Licciardi, D. Leo, E. Fanì, B. Caputo, and M. Ciccone, "Interaction-aware Gaussian weighting for clustered federated learning," in International Conference on Machine Learning (ICML), 2025.
- [31] S. Ahmad et al., "Clustered federated learning with hierarchical knowledge distillation," arXiv preprint arXiv:2512.10443, 2025.
- [32] L. Collins, H. Hassani, A. Mokhtari, and S. Shakkottai, "Exploiting shared representations for personalized federated learning," in International Conference on Machine Learning, pp. 2089–2099, PMLR, 2021.
- [33] T. Lu et al., "FedCSAD: Federated learning with contextual client selection and confidence-weighted multi-teacher knowledge distillation in power equipment inspection," in Algorithms and Architectures for Parallel Processing (ICA3PP), vol. 16382 of Lecture Notes in Computer Science, Springer, 2026.
- [34] T.-M. H. Hsu, H. Qi, and M. Brown, "Measuring the effects of non-identical data distribution for federated visual classification," in NeurIPS Workshop on Federated Learning, 2019.
- [35] S. Agrawal and N. Goyal, "Analysis of Thompson sampling for the multi-armed bandit problem," in Conference on Learning Theory, pp. 39.1–39.26, JMLR, 2012.
- [36] A. Krizhevsky and G. Hinton, "Learning multiple layers of features from tiny images," tech. rep., University of Toronto, 2009.
- [37] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng, "Reading digits in natural images with unsupervised feature learning," in NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
- [38] H. Xiao, K. Rasul, and R. Vollgraf, "Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms," arXiv preprint arXiv:1708.07747, 2017.