arxiv: 1708.07747 · v2 · submitted 2017-08-25 · 💻 cs.LG · cs.CV · stat.ML

Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

Han Xiao, Kashif Rasul, Roland Vollgraf

Pith reviewed 2026-05-11 05:17 UTC · model grok-4.3

classification 💻 cs.LG · cs.CV · stat.ML

keywords Fashion-MNIST · image classification dataset · MNIST replacement · machine learning benchmarks · fashion product images · grayscale 28x28 images · drop-in dataset

The pith

Fashion-MNIST supplies a drop-in replacement for MNIST using 28x28 fashion images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper releases Fashion-MNIST, a collection of 70,000 grayscale images showing fashion products from ten categories. It keeps exactly the same 28-by-28 size, single-channel format, and 60,000/10,000 train-test split as the original MNIST handwritten-digit set. The change in subject matter is meant to raise the difficulty of the classification task while preserving every detail of the evaluation protocol. Researchers can therefore swap the dataset into existing code and obtain more informative benchmark numbers without any other changes.

Core claim

Fashion-MNIST is a new dataset of 28x28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category. The training set contains 60,000 images and the test set contains 10,000 images. The dataset is constructed to function as a direct drop-in replacement for the original MNIST dataset, matching its image size, data format, and training-testing split structure exactly.

What carries the argument

The exact structural match to MNIST (image dimensions, grayscale format, and split sizes) applied to a new subject domain of clothing items.

If this is right

Existing benchmark code and leaderboards can be reused unchanged while testing on more varied visual content.
Performance gaps between models will more accurately reflect generalization beyond simple digit shapes.
New algorithms can be compared directly against prior work without needing to re-implement MNIST baselines.
The dataset remains freely downloadable and usable under the same conditions as the original MNIST.
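
The reuse claim in the list above can be sketched as an evaluation routine that depends only on the shared layout (flat pixel vectors and integer class labels), not on what the images depict. The nearest-centroid classifier is chosen here only for brevity, and the names `fit_centroids`, `predict`, and `accuracy` are illustrative assumptions; none of this comes from the paper.

```python
from collections import defaultdict

def fit_centroids(images, labels):
    """Average the training vectors of each class into one centroid per label."""
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(images, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        acc = sums[lab]
        for i, v in enumerate(vec):
            acc[i] += v
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def predict(centroids, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))

def accuracy(centroids, images, labels):
    """Fraction of examples whose predicted label equals the true label."""
    hits = sum(predict(centroids, v) == l for v, l in zip(images, labels))
    return hits / len(labels)

# Tiny synthetic stand-in for a dataset (4-pixel "images", 2 classes).
# With real data, the same three functions would receive 784-value vectors
# and labels 0-9 from either MNIST or Fashion-MNIST.
train_x = [[0, 0, 0, 0], [1, 0, 0, 0], [9, 9, 9, 9], [8, 9, 9, 9]]
train_y = [0, 0, 1, 1]
test_x = [[0, 1, 0, 0], [9, 8, 9, 9]]
test_y = [0, 1]

model = fit_centroids(train_x, train_y)
score = accuracy(model, test_x, test_y)
```

Because nothing in the routine refers to digits or clothing, switching benchmarks under these assumptions amounts to changing which files feed `train_x` and `test_x`.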

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Widespread adoption would discourage overfitting to the specific visual statistics of handwritten digits.
The format match could inspire similar replacements for other long-standing but overly simple benchmarks.
Developers of feature-extraction methods would need to handle intra-class variation in texture and shape that digits lack.

Load-bearing premise

That the fashion images will prove meaningfully harder for models yet still accessible enough that the community will switch to this dataset instead of continuing to use MNIST.

What would settle it

A broad survey of recent papers that shows most new algorithms still report results only on MNIST and not on Fashion-MNIST.

Original abstract

We present Fashion-MNIST, a new dataset comprising of 28x28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category. The training set has 60,000 images and the test set has 10,000 images. Fashion-MNIST is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms, as it shares the same image size, data format and the structure of training and testing splits. The dataset is freely available at https://github.com/zalandoresearch/fashion-mnist

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Fashion-MNIST is a clean data release that copies MNIST's format with clothing images to create a modestly harder benchmark.

The key takeaway is that Fashion-MNIST is a new dataset designed as a straightforward replacement for the classic MNIST, using fashion product images instead of handwritten digits while keeping exactly the same size, format, and split structure.

What stands out is how cleanly they executed the release. The 70,000 images are 28 by 28 grayscale, divided into 60,000 training and 10,000 test examples across 10 categories such as t-shirts, trousers, and sneakers. They made the files available in the same binary format as MNIST, which means no code changes are needed to use it. The paper describes the collection process from Zalando's catalog and includes some basic performance numbers from standard classifiers to show it's more challenging than the original.

This approach works well for lowering the barrier to better benchmarks. MNIST has been too easy for years, with models routinely reaching 99% accuracy, so having something similar but with real-world objects helps measure progress more meaningfully without jumping to huge datasets.

The limitations are clear, though. The paper doesn't introduce any new techniques or deep analysis; it's primarily a data description. The claim of novelty rests entirely on the image content, and there's little discussion of potential issues like label noise or class balance beyond the basics. Whether this becomes a standard depends on community uptake rather than anything proven in the work itself.

Readers who run frequent small experiments on image classification will get the most out of it, especially those looking for something between MNIST and CIFAR in difficulty. It makes sense to send this to peer review because solid dataset papers support a lot of downstream work, and this one is well-documented with public access. I would recommend accepting it for review rather than rejecting outright.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces Fashion-MNIST, a dataset of 70,000 28x28 grayscale images of fashion products across 10 categories (7,000 images per category), with a 60,000-image training set and 10,000-image test set. It is explicitly positioned as a direct drop-in replacement for the original MNIST dataset, matching it in image size, data format (IDX binary), and train/test split structure. The dataset is released publicly via the cited GitHub repository.
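
To make the "IDX binary" format concrete, here is a minimal parsing sketch. It assumes the layout documented for the original MNIST files: two zero bytes, a one-byte type code (0x08 for unsigned bytes), a one-byte dimension count, then one big-endian 32-bit size per axis, followed by the raw values. The function name `parse_idx` and the in-memory demo buffers are hypothetical; real files are typically distributed gzip-compressed and would need decompressing first.

```python
import struct

def parse_idx(buf):
    """Parse an uncompressed IDX buffer holding unsigned bytes.

    Returns (dims, payload): dims is a tuple of axis sizes read from the
    header, payload is the raw value bytes in row-major order.
    """
    zero, dtype, ndim = struct.unpack(">HBB", buf[:4])
    if zero != 0 or dtype != 0x08:
        raise ValueError("not an unsigned-byte IDX buffer")
    header_len = 4 + 4 * ndim
    dims = struct.unpack(">" + "I" * ndim, buf[4:header_len])
    payload = buf[header_len:]
    expected = 1
    for d in dims:
        expected *= d
    if len(payload) != expected:
        raise ValueError("payload length does not match header")
    return dims, payload

# Synthetic buffers built in memory for illustration only:
# two 2x2 "images" and their two labels.
demo_images = struct.pack(">HBBIII", 0, 0x08, 3, 2, 2, 2) + bytes(range(8))
demo_labels = struct.pack(">HBBI", 0, 0x08, 1, 2) + bytes([3, 7])

img_dims, img_payload = parse_idx(demo_images)
lab_dims, lab_payload = parse_idx(demo_labels)
```

Under these assumptions, a training image file would report dimensions `(60000, 28, 28)` and its label file `(60000,)`, which is the structural match the summary describes.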

Significance. If adopted, the dataset offers a more challenging yet compatible benchmark for image classification algorithms, addressing MNIST's simplicity while preserving reproducibility and ease of use in existing pipelines. The public release, identical format, and clear specification of splits constitute a concrete contribution that enables immediate community use and more realistic model evaluations.

minor comments (2)

[Abstract] Abstract: the phrasing 'comprising of' is nonstandard; 'consisting of' or 'comprising' would be clearer.
[Dataset description] The manuscript would benefit from a short table or paragraph in the main text explicitly comparing the exact file formats and split sizes to MNIST to strengthen the drop-in claim.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript and for recommending acceptance. The review accurately captures the intent and contribution of Fashion-MNIST as a direct drop-in replacement for MNIST.

Circularity Check

0 steps flagged

No significant circularity

The paper is a dataset release note with no derivations, equations, predictions, fitted parameters, or theoretical claims. The central statement that Fashion-MNIST matches MNIST in size, format, and split structure is a direct description of the released data files themselves (publicly provided in the cited GitHub repository in identical IDX format). No load-bearing step reduces to a self-citation, ansatz, or input-by-construction; the format equivalence is verifiable externally from the dataset release without any internal loop.
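
One way to carry out such an external check is plain arithmetic on the file header. The sketch below assumes the IDX layout used by MNIST (a 4-byte magic number, one 4-byte size per axis, one byte per value); the helper name `expected_idx_size` is hypothetical. It computes the uncompressed size a file should have if it really holds the stated number of 28x28 images or labels.

```python
def expected_idx_size(dims):
    """Byte size of an uncompressed unsigned-byte IDX file with these axis sizes."""
    header = 4 + 4 * len(dims)  # magic number plus one 32-bit size per axis
    payload = 1
    for d in dims:
        payload *= d  # one byte per stored value
    return header + payload

# Sizes implied by the 60,000/10,000 split and 28x28 single-channel images.
train_images = expected_idx_size((60000, 28, 28))
train_labels = expected_idx_size((60000,))
test_images = expected_idx_size((10000, 28, 28))
test_labels = expected_idx_size((10000,))
```

Comparing these numbers with the decompressed file sizes from the release gives a quick structural check that does not depend on any statement in the paper.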

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a dataset introduction paper with no mathematical derivations, models, or theoretical claims, so the ledger contains no free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5399 in / 1089 out tokens · 52756 ms · 2026-05-11T05:17:28.596879+00:00 · methodology

Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Gradient-Free Continual Learning in Spiking Neural Networks via Inter-Spike Interval Regularization
cs.NE 2026-04 unverdicted novelty 8.0
ISI-CV derives a synaptic importance score from the regularity of neuron firing intervals to enable continual learning without gradients or forgetting on SNNs.

QLAM: A Quantum Long-Attention Memory Approach to Long-Sequence Token Modeling
cs.LG 2026-05 unverdicted novelty 7.0
QLAM extends state-space models with quantum superposition in the hidden state for linear-time long-sequence modeling and reports consistent gains over RNN and transformer baselines on sequential image tasks.

From Compression to Accountability: Harmless Copyright Protection for Dataset Distillation
cs.CR 2026-05 unverdicted novelty 7.0
SubPopMark protects distilled datasets by injecting verifiable subpopulation biases that create distinguishable model behaviors for copyright tracing without using backdoors.

Fixed-Point Neural Optimal Transport without Implicit Differentiation
math.OC 2026-05 unverdicted novelty 7.0
A single-network fixed-point formulation for neural optimal transport eliminates adversarial min-max optimization and implicit differentiation while enforcing dual feasibility exactly.

Classification-Head Bias in Class-Level Machine Unlearning: Diagnosis, Mitigation, and Evaluation
cs.LG 2026-05 conditional novelty 7.0
Class-level unlearning shortcuts via bias suppression in the classification head; new bias-aware training mechanisms and bias-specific metrics are introduced to diagnose and reduce this dependence.

Pre-training Enables Extraordinary All-optical Image Denoising
physics.optics 2026-05 unverdicted novelty 7.0
Pre-training diffractive optical networks on millions of simple images followed by fine-tuning enables all-optical denoising that raises PSNR from below 8 dB to above 18 dB across diverse datasets including MNIST, Che...

TRACE: Transport Alignment Conformal Prediction via Diffusion and Flow Matching Models
stat.ML 2026-05 unverdicted novelty 7.0
TRACE creates valid conformal prediction sets for complex generative models by scoring outputs via averaged denoising or velocity errors along stochastic transport paths instead of likelihoods.

Test-Time Compositional Generalization in Diffusion Models via Concept Discovery
cs.LG 2026-05 unverdicted novelty 7.0
Diffusion models can extract reusable density-mode concepts from their time-indexed scores to enable compositional generation at test time on held-out benchmarks from ColorMNIST and CelebA.

The Interplay of Data Structure and Imbalance in the Learning Dynamics of Diffusion Models
stat.ML 2026-05 unverdicted novelty 7.0
Higher-variance classes are learned first in diffusion models; strong class imbalance reverses the order and imposes distinct delayed learning times on minority classes.

Non-Myopic Active Feature Acquisition via Pathwise Policy Gradients
cs.LG 2026-05 unverdicted novelty 7.0
NM-PPG optimizes non-myopic acquisition policies for costly features by enabling pathwise gradients via continuous relaxation and straight-through rollouts in POMDPs, outperforming SOTA baselines.

Spectral Graph Sparsification Preserves Representation Geometry in Graph Neural Networks
cs.LG 2026-05 unverdicted novelty 7.0
Spectral sparsification preserves GNN embedding geometry up to O(ε) perturbations in filters, representations, Gram matrices, and training trajectories.

Quantum Interval Bound Propagation for Certified Training of Quantum Neural Networks
quant-ph 2026-05 unverdicted novelty 7.0
QIBP adapts interval bound propagation to quantum neural networks for certified adversarial robustness via interval and affine arithmetic implementations.

Heterogeneous-Horizon Exact-Weight Local SGD
math.OC 2026-04 unverdicted novelty 7.0
HEW-Local SGD provides exact-weight adaptive aggregation for heterogeneous local SGD with one-step guarantees and explicit convergence results under unequal local horizons.

Diverse Dictionary Learning
cs.LG 2026-04 unverdicted novelty 7.0
Diverse dictionary learning identifies intersections, complements, and dependency structures of latent variables from data X = g(Z) up to indeterminacies, and full identifiability when structural diversity is sufficient.

The Multi-Block DC Function Class: Theory, Algorithms, and Applications
math.OC 2026-04 unverdicted novelty 7.0
The Multi-Block DC class admits polynomial-size DC decompositions for problems that require exponential size under standard DC programming and supplies explicit constructive formulations for deep ReLU networks togethe...

Feature-level analysis and adversarial transfer in rotationally equivariant quantum machine learning
quant-ph 2026-04 unverdicted novelty 7.0
Rotationally equivariant quantum models can rely on vulnerable invariant statistics such as ring-averaged intensities, leaving them susceptible to classical transfer attacks, but suppressing the associated symmetry se...

The Linear Centroids Hypothesis: Features as Directions Learned by Local Experts
cs.LG 2026-04 unverdicted novelty 7.0
The Linear Centroids Hypothesis reframes network features as directions in centroid spaces of local affine experts, unifying interpretability methods and yielding sparser, more faithful dictionaries, circuits, and sal...

Tensor-based Multi-layer Decoupling
eess.SY 2026-04 unverdicted novelty 7.0
A new tensor framework for multi-layer decoupling of multivariate functions is proposed via ParaTuck decompositions and bilevel optimization.

Toward Exact Convergence in Byzantine-Robust Decentralized Learning: A Statistical Identification Approach
stat.ME 2026-04 unverdicted novelty 7.0
DRSGD-ByMI identifies Byzantine machines via sample-splitting score statistics with FDR control, then prunes them to recover sufficient connectivity and achieve order-optimal convergence rates identical to standard de...

XFED: Non-Collusive Model Poisoning Attack Against Byzantine-Robust Federated Classifiers
cs.CR 2026-04 unverdicted novelty 7.0
XFED is the first aggregation-agnostic non-collusive model poisoning attack that bypasses eight state-of-the-art defenses on six benchmark datasets without attacker coordination.

Instance-Adaptive Parametrization for Amortized Variational Inference
cs.LG 2026-04 unverdicted novelty 7.0
IA-VAE augments amortized variational inference with hypernetwork-generated instance-adaptive modulations, strictly containing the standard variational family and improving held-out ELBO on synthetic and image data.

Drifting Fields are not Conservative
cs.LG 2026-04 conditional novelty 7.0
Drift fields in single-pass generative models are not conservative except for Gaussian kernels; a sharp kernel normalization makes them conservative for any radial kernel while noting that non-conservative fields offe...

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
stat.ML 2018-02 unverdicted novelty 7.0
UMAP is a novel, scalable manifold learning algorithm for dimension reduction that competes with t-SNE while preserving more global structure and having no embedding dimension restrictions.

Quantitative Linear Logic for Neuro-Symbolic Learning and Verification
cs.LO 2026-05 unverdicted novelty 6.0
Quantitative Linear Logic interprets logical connectives via natural ML operations on logits to embed constraints in neural training while satisfying most linear logic laws and correlating performance with independent...

Exact Fixed-Point Constraints in Neural-ODEs with Provable Universality
cond-mat.dis-nn 2026-05 unverdicted novelty 6.0
A technique plants exact fixed points in Neural-ODE velocity fields with a rigorous proof that universality is preserved under local constraints.

SEMASIA: A Large-Scale Dataset of Semantically Structured Latent Representations
cs.LG 2026-05 unverdicted novelty 6.0
SEMASIA supplies a large-scale, metadata-rich collection of latent representations from diverse vision models to enable systematic study of semantic geometry and cross-model alignment.

HARMONY: Bridging the Personalization-Generalization Gap by Mitigating Representation Skew in Heterogeneous Split Federated Learning
cs.LG 2026-05 unverdicted novelty 6.0
HARMONY mitigates representation skew in heterogeneous hybrid split federated learning via meta-learning to simulate diverse extractors and server-side contrastive learning to align features, delivering up to 43% accu...

Competing nonlinearities, criticality, and order-to-chaos transition in deep networks
cond-mat.dis-nn 2026-05 unverdicted novelty 6.0
A statistical mixture of Tanh and Swish activations with critical mixing fraction p_c induces a continuous phase transition to scale-invariant signal propagation in deep networks while preserving smoothness.

DR-SNE: Density-Regularized Stochastic Neighbor Embedding
cs.LG 2026-05 unverdicted novelty 6.0
DR-SNE augments the SNE objective with a density regularization term from normalized log-density estimates to preserve relative densities while retaining neighborhood structure.

Retrieval with Multiple Query Vectors through Anomalous Pattern Detection
cs.LG 2026-05 unverdicted novelty 6.0
A retrieval approach identifies anomalous dimensions in a set of query vectors and retrieves database vectors that are anomalous across those dimensions, with performance improving as query set size grows to around 8.

Model Merging: Foundations and Algorithms
cs.LG 2026-05 unverdicted novelty 6.0
New cycle-consistent optimization, task vector theory, singular vector decompositions, adaptive routing, and efficient evolutionary search provide foundations for merging neural network weights across tasks.

Class Angular Distortion Index for Dimensionality Reduction
cs.LG 2026-05 unverdicted novelty 6.0
CADI quantifies the preservation of relative cluster angles in low-dimensional projections using internal angles from point triples.

Possibilistic Predictive Uncertainty for Deep Learning
cs.LG 2026-05 unverdicted novelty 6.0
DAPPr introduces a possibilistic framework that projects parameter posteriors to predictions via supremum and approximates them with Dirichlet possibility functions to yield efficient, closed-form epistemic uncertaint...

Efficient Mutation Testing of Quantum Machine Learning Models
quant-ph 2026-04 unverdicted novelty 6.0
New mutation operators and directed mutant generation produce more diverse faulty quantum neural network circuits than prior techniques, as shown in experiments.

Defending Quantum Classifiers against Adversarial Perturbations through Quantum Autoencoders
quant-ph 2026-04 unverdicted novelty 6.0
A quantum autoencoder purifies adversarial perturbations for quantum classifiers and supplies a confidence score for unrecoverable inputs, claiming up to 68% accuracy gains over prior defenses without adversarial training.

Controlled Steering-Based State Preparation for Adversarial-Robust Quantum Machine Learning
quant-ph 2026-04 unverdicted novelty 6.0
A passive steering method for quantum state preparation improves adversarial accuracy in QML models by up to 40% across tested cases.

Diverse Image Priors for Black-box Data-free Knowledge Distillation
cs.LG 2026-04 unverdicted novelty 6.0
DIP-KD achieves state-of-the-art results in black-box data-free knowledge distillation across 12 benchmarks by synthesizing diverse image priors, applying contrastive learning, and using a primer student for soft-prob...

Complexity of Linear Regions in Self-supervised Deep ReLU Networks
cs.LG 2026-04 unverdicted novelty 6.0
Self-supervised ReLU networks form substantially fewer linear regions than supervised models for comparable accuracy, with contrastive methods rapidly expanding regions and self-distillation consolidating them, enabli...

High-dimensional Semi-supervised Classification via the Fermat Distance
stat.ML 2026-04 unverdicted novelty 6.0
Fermat distance enables minimax-optimal weighted k-NN classifiers for high-dimensional semi-supervised learning with exponentially decaying estimation error from unlabeled data.

Toward Polymorphic Backdoor against Semantic Communication via Intensity-Based Poisoning
cs.CR 2026-04 unverdicted novelty 6.0
SemBugger achieves polymorphic backdoors in semantic communication via graded-intensity trigger poisoning and hierarchical loss, plus a noise-based defense with a theoretical efficacy bound.

LTBs-KAN: Linear-Time B-splines Kolmogorov-Arnold Networks
cs.LG 2026-04 unverdicted novelty 6.0
LTBs-KAN delivers linear-time B-spline evaluation in KANs plus parameter reduction via product-of-sums factorization, with competitive results on MNIST, Fashion-MNIST, and CIFAR-10.

Component-Based Out-of-Distribution Detection
cs.CV 2026-04 unverdicted novelty 6.0
CoOD decomposes inputs into components and applies Component Shift Score plus Compositional Consistency Score to improve detection of both standard and compositional out-of-distribution data.

QuanForge: A Mutation Testing Framework for Quantum Neural Networks
cs.SE 2026-04 unverdicted novelty 6.0
QuanForge introduces statistical mutation killing and nine post-training mutation operators for QNNs to distinguish test suites and localize vulnerable circuit regions.

Deep sprite-based image models: An analysis
cs.CV 2026-04 unverdicted novelty 6.0
A deep sprite-based image decomposition method matches SOTA unsupervised class-aware segmentation on CLEVR, scales linearly with objects, explicitly identifies categories, and fully models images interpretably.

Optimizing Stochastic Gradient Push under Broadcast Communications
cs.LG 2026-04 unverdicted novelty 6.0
An efficient mixing-matrix design algorithm for SGP that uses graph-theoretic parameters to reduce convergence time in broadcast DFL while providing performance guarantees.

Task Alignment: A simple and effective proxy for model merging in computer vision
cs.CV 2026-04 unverdicted novelty 6.0
Task alignment serves as an efficient proxy for hyperparameter selection in model merging, accelerating the process by orders of magnitude while preserving performance in vision models with heterogeneous decoders.

The Linear Centroids Hypothesis: Features as Directions Learned by Local Experts
cs.LG 2026-04 unverdicted novelty 6.0
Features in deep networks correspond to linear directions of centroids summarizing local functional behavior, enabling sparser and more effective feature dictionaries via sparse autoencoders applied to centroids rathe...

Extraction of linearized models from pre-trained networks via knowledge distillation
cs.LG 2026-04 unverdicted novelty 6.0
Koopman theory plus knowledge distillation yields linearized models from pre-trained nets that outperform standard least-squares Koopman approximations on MNIST and Fashion-MNIST in accuracy and stability.

Label Leakage Attacks in Machine Unlearning: A Parameter and Inversion-Based Approach
cs.CR 2026-04 unverdicted novelty 6.0
Parameter-difference and model-inversion attacks can identify forgotten classes after machine unlearning on standard image datasets.

Drifting Fields are not Conservative
cs.LG 2026-04 unverdicted novelty 6.0
Drift fields are not conservative except for Gaussian kernels; sharp normalization makes them conservative for any radial kernel by equating them to score differences of kernel density estimates.

Shot-Based Quantum Encoding: A Data-Loading Paradigm for Quantum Neural Networks
quant-ph 2026-04 unverdicted novelty 6.0
SBQE encodes data via learnable shot distributions over initial states to form mixed quantum representations, achieving 89.1% accuracy on Semeion and 80.95% on Fashion MNIST without encoding gates.

A Spectral Framework for Multi-Scale Nonlinear Dimensionality Reduction
cs.LG 2026-04 unverdicted novelty 6.0
A spectral framework for nonlinear DR uses spectral bases plus cross-entropy optimization to create multi-scale embeddings that preserve both global manifold geometry and local neighborhoods while supporting graph-fre...

Deep Image Clustering Based on Curriculum Learning and Density Information
cs.CV 2026-03 unverdicted novelty 6.0
IDCL adds density-based curriculum learning and density-core guidance to deep image clustering, claiming superior robustness, faster convergence, and flexibility on benchmark datasets.

LightSplit: Practical Privacy-Preserving Split Learning via Orthogonal Projections
cs.LG 2026-05 unverdicted novelty 5.0
LightSplit uses non-invertible orthogonal projections as an information bottleneck in split learning to reduce transmitted dimensionality by 32x while retaining more than 95% accuracy and limiting reconstruction risk.

Fed-BAC: Federated Bandit-Guided Additive Clustering in Hierarchical Federated Learning
cs.LG 2026-05 unverdicted novelty 5.0
Fed-BAC uses contextual bandits and Thompson Sampling with additive clustering to deliver up to 35.5 percentage point accuracy gains and 1.5-4.8x faster convergence in hierarchical federated learning on non-IID data.

FedSurrogate: Backdoor Defense in Federated Learning via Layer Criticality and Surrogate Replacement
cs.CR 2026-05 unverdicted novelty 5.0
FedSurrogate defends federated learning against backdoors by clustering on security-critical layers and substituting malicious updates with benign surrogates, reporting false-positive rates below 10% and attack succes...

Risk-Consistent Multiclass Learning from Random Label-Subset Membership Queries
cs.LG 2026-05 unverdicted novelty 5.0
The paper introduces risk-consistent multiclass learning from random label-subset queries by deriving an unbiased risk estimator under ERM, plus non-negative and absolute-value corrections, with generalization bounds ...

Resource Utilization of Differentiable Logic Gate Networks Deployed on FPGAs
cs.AR 2026-05 unverdicted novelty 5.0
Narrowing the final layer of an LGN cuts FPGA resource usage by 28% and permits deeper or wider networks under timing limits because that layer controls the size of summing logic.

FedPLT: Scalable, Resource-Efficient, and Heterogeneity-Aware Federated Learning via Partial Layer Training
cs.DC 2026-05 unverdicted novelty 5.0
FedPLT assigns client-specific model layers for training and matches or beats full-model federated learning accuracy with 71-82 percent fewer trainable parameters per client.

Dendritic Neural Networks with Equilibrium Propagation
cs.LG 2026-05 unverdicted novelty 5.0
Dendritic EP matches standard EP on simple tasks but significantly outperforms it on KMNIST and FMNIST, and in deeper models, approaching the performance of backpropagation-trained dendritic networks.

Reference graph

Works this paper leans on

6 extracted references · 6 canonical work pages · cited by 73 Pith papers

[1] D. Ciregan, U. Meier, and J. Schmidhuber. Multi-column deep neural networks for image classification. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 3642--3649. IEEE, 2012.

[2] G. Cohen, S. Afshar, J. Tapson, and A. van Schaik. EMNIST: an extension of MNIST to handwritten letters. arXiv preprint arXiv:1702.05373, 2017.

[3] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 248--255. IEEE, 2009.

[4] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. 2009.

[5] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278--2324, 1998.

[6] L. Wan, M. Zeiler, S. Zhang, Y. L. Cun, and R. Fergus. Regularization of neural networks using dropconnect. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 1058--1066, 2013.