arXiv: 1708.07747 · v2 · submitted 2017-08-25 · 💻 cs.LG · cs.CV · stat.ML

Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

Pith reviewed 2026-05-11 05:17 UTC · model grok-4.3

classification 💻 cs.LG · cs.CV · stat.ML
keywords Fashion-MNIST · image classification dataset · MNIST replacement · machine learning benchmarks · fashion product images · grayscale 28x28 images · drop-in dataset

The pith

Fashion-MNIST supplies a drop-in replacement for MNIST using 28x28 fashion images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper releases Fashion-MNIST, a collection of 70,000 grayscale images showing fashion products from ten categories. It keeps exactly the same 28-by-28 size, single-channel format, and 60,000/10,000 train-test split as the original MNIST handwritten-digit set. The change in subject matter is meant to raise the difficulty of the classification task while preserving every detail of the evaluation protocol. Researchers can therefore swap the dataset into existing code and obtain more informative benchmark numbers without any other changes.

Core claim

Fashion-MNIST is a new dataset of 28x28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category. The training set contains 60,000 images and the test set contains 10,000 images. The dataset is constructed to function as a direct drop-in replacement for the original MNIST dataset, matching its image size, data format, and training-testing split structure exactly.
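The figures in the claim can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted from the abstract; the one-byte-per-pixel storage size is an assumption (8-bit grayscale, uncompressed), not something the abstract states.

```python
# Figures quoted from the abstract.
IMAGE_SIDE = 28
NUM_CLASSES = 10
IMAGES_PER_CLASS = 7_000
TRAIN_SIZE = 60_000
TEST_SIZE = 10_000

total_images = NUM_CLASSES * IMAGES_PER_CLASS
pixels_per_image = IMAGE_SIDE * IMAGE_SIDE

# Assumption: one unsigned byte per pixel, no compression.
raw_train_bytes = TRAIN_SIZE * pixels_per_image

# The per-category count and the split sizes must describe the same total.
assert total_images == TRAIN_SIZE + TEST_SIZE

print(total_images, pixels_per_image, raw_train_bytes)
```

A flattened image therefore has 784 features, the same input width that MNIST code already expects.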

What carries the argument

The exact structural match to MNIST (image dimensions, grayscale format, and split sizes) applied to a new subject domain of clothing items.
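In practice, "drop-in" could mean that only the data directory changes. The following is a minimal sketch of that idea: the file names are those of the original MNIST distribution, and it is assumed here that the Fashion-MNIST files are stored under the same names. The directories `data/mnist` and `data/fashion` and the helper `split_paths` are hypothetical.

```python
from pathlib import Path

# File names of the original MNIST distribution. Assumption: the
# Fashion-MNIST files are saved locally under the same names.
SPLIT_FILES = {
    "train": ("train-images-idx3-ubyte.gz", "train-labels-idx1-ubyte.gz"),
    "test": ("t10k-images-idx3-ubyte.gz", "t10k-labels-idx1-ubyte.gz"),
}


def split_paths(root, split):
    """Return (images_path, labels_path) for one split under `root`."""
    images_name, labels_name = SPLIT_FILES[split]
    root = Path(root)
    return root / images_name, root / labels_name


# Hypothetical local directories; only the root differs between datasets.
mnist_train = split_paths("data/mnist", "train")
fashion_train = split_paths("data/fashion", "train")

print(mnist_train[0].name == fashion_train[0].name)
```

Everything downstream of `split_paths` (decoding, batching, evaluation) would stay untouched under this design, which is the property the paper's argument rests on.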

If this is right

  • Existing benchmark code and leaderboards can be reused unchanged while testing on more varied visual content.
  • Performance gaps between models will more accurately reflect generalization beyond simple digit shapes.
  • New algorithms can be compared directly against prior work without needing to re-implement MNIST baselines.
  • The dataset is freely available for download from the project's GitHub repository.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Widespread adoption would discourage overfitting to the specific visual statistics of handwritten digits.
  • The format match could inspire similar replacements for other long-standing but overly simple benchmarks.
  • Developers of feature-extraction methods would need to handle intra-class variation in texture and shape that digits lack.

Load-bearing premise

That the fashion images will prove meaningfully harder for models yet still accessible enough that the community will switch to this dataset instead of continuing to use MNIST.

What would settle it

A broad survey of recent papers that shows most new algorithms still report results only on MNIST and not on Fashion-MNIST.

Original abstract

We present Fashion-MNIST, a new dataset comprising of 28x28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category. The training set has 60,000 images and the test set has 10,000 images. Fashion-MNIST is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms, as it shares the same image size, data format and the structure of training and testing splits. The dataset is freely available at https://github.com/zalandoresearch/fashion-mnist

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces Fashion-MNIST, a dataset of 70,000 28x28 grayscale images of fashion products across 10 categories (7,000 images per category), with a 60,000-image training set and 10,000-image test set. It is explicitly positioned as a direct drop-in replacement for the original MNIST dataset, matching it in image size, data format (IDX binary), and train/test split structure. The dataset is released publicly via the cited GitHub repository.
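For readers unfamiliar with the IDX layout mentioned in the summary, here is a minimal parsing sketch. It follows the format as documented for MNIST: a 4-byte big-endian magic number (two zero bytes, a type code, a dimension count) followed by one 4-byte size per dimension and then the raw data. The function name `parse_idx` is illustrative, and the buffer is a synthetic stand-in rather than real dataset content.

```python
import struct

IDX_UBYTE = 0x08  # IDX type code for unsigned bytes


def parse_idx(raw: bytes):
    """Parse an uncompressed unsigned-byte IDX buffer into (shape, payload)."""
    zero, dtype_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or dtype_code != IDX_UBYTE:
        raise ValueError("not an unsigned-byte IDX buffer")
    # One big-endian 32-bit size per dimension follows the magic number.
    shape = struct.unpack(f">{ndim}I", raw[4:4 + 4 * ndim])
    payload = raw[4 + 4 * ndim:]
    expected = 1
    for dim in shape:
        expected *= dim
    if len(payload) != expected:
        raise ValueError("payload length does not match header")
    return shape, payload


# Synthetic stand-in: two blank 28x28 images, not real dataset content.
header = struct.pack(">HBB3I", 0, IDX_UBYTE, 3, 2, 28, 28)
shape, payload = parse_idx(header + bytes(2 * 28 * 28))
print(shape, len(payload))
```

Real files are typically distributed gzip-compressed, so they would need decompressing before being passed to a function like this.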

Significance. If adopted, the dataset offers a more challenging yet compatible benchmark for image classification algorithms, addressing MNIST's simplicity while preserving reproducibility and ease of use in existing pipelines. The public release, identical format, and clear specification of splits constitute a concrete contribution that enables immediate community use and more realistic model evaluations.

minor comments (2)
  1. [Abstract] The phrasing 'comprising of' is nonstandard; 'consisting of' or 'comprising' would be clearer.
  2. [Dataset description] The manuscript would benefit from a short table or paragraph in the main text explicitly comparing the exact file formats and split sizes to MNIST to strengthen the drop-in claim.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript and for recommending acceptance. The review accurately captures the intent and contribution of Fashion-MNIST as a direct drop-in replacement for MNIST.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper is a dataset release note with no derivations, equations, predictions, fitted parameters, or theoretical claims. The central statement that Fashion-MNIST matches MNIST in size, format, and split structure is a direct description of the released data files themselves (publicly provided in the cited GitHub repository in identical IDX format). No load-bearing step reduces to a self-citation, ansatz, or input-by-construction; the format equivalence is verifiable externally from the dataset release without any internal loop.
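The external check described in the rationale could look roughly like the sketch below, which compares only the IDX headers of two gzip-compressed files. The helper names (`idx_header`, `same_layout`, `fake_images`) are hypothetical, and the buffers are synthetic stand-ins; a real check would read the released files from disk.

```python
import gzip
import struct


def idx_header(gz_bytes: bytes):
    """Return (magic, shape) from a gzip-compressed IDX buffer."""
    raw = gzip.decompress(gz_bytes)
    magic = struct.unpack(">I", raw[:4])[0]
    ndim = magic & 0xFF  # low byte of the magic number is the dimension count
    shape = struct.unpack(f">{ndim}I", raw[4:4 + 4 * ndim])
    return magic, shape


def same_layout(gz_a: bytes, gz_b: bytes) -> bool:
    """True when both files share type code, rank, and every dimension size."""
    return idx_header(gz_a) == idx_header(gz_b)


def fake_images(count: int, fill: int) -> bytes:
    """Build a synthetic compressed image file (illustration only)."""
    header = struct.pack(">I3I", 0x00000803, count, 28, 28)
    return gzip.compress(header + bytes([fill]) * (count * 28 * 28))


digits_like = fake_images(3, 0)
fashion_like = fake_images(3, 255)
print(same_layout(digits_like, fashion_like))
```

The pixel contents differ between the two buffers while the headers agree, which mirrors the claim being audited: same container, different subject matter.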

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a dataset introduction paper with no mathematical derivations, models, or theoretical claims, so the ledger contains no free parameters, axioms, or invented entities.

Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Gradient-Free Continual Learning in Spiking Neural Networks via Inter-Spike Interval Regularization

    cs.NE 2026-04 unverdicted novelty 8.0

    ISI-CV derives a synaptic importance score from the regularity of neuron firing intervals to enable continual learning without gradients or forgetting on SNNs.

  2. Pointwise Generalization in Deep Neural Networks

    cs.LG 2026-05 unverdicted novelty 7.0

    Proposes pointwise Riemannian Dimension from feature eigenvalues to derive tighter, representation-aware generalization bounds for deep networks in the nonlinear regime.

  3. BESplit: Bias-Compensated Split Federated Learning with Evidential Aggregation

    cs.LG 2026-05 unverdicted novelty 7.0

    BESplit mitigates non-IID bias in split federated learning via evidential aggregation, bias-compensated client pairing, and dual-teacher distillation, outperforming prior methods on five benchmarks.

  4. PCDM: A Diffusion-Based Data Poisoning Attack Against Federated Learning Systems

    cs.CR 2026-05 unverdicted novelty 7.0

    PCDM uses a poisoning-oriented conditional diffusion model with an adjustable vector and jumping strategy to create stealthier and more effective poisoned data than GAN-based attacks against federated learning.

  5. Byzantine-Resilient Federated Learning via QUBO-Based Client Selection on Quantum Annealers

    cs.LG 2026-05 unverdicted novelty 7.0

    QUBO formulation on quantum annealers for joint client selection in federated learning, combined with a MultiSignal routing ensemble, yields higher Byzantine attack detection accuracy than MultiKrum on challenging att...

  6. Quantitative Linear Logic for Neuro-Symbolic Learning and Verification

    cs.LO 2026-05 unverdicted novelty 7.0

    QLL is a novel logic for neuro-symbolic learning that uses ML-native operations (sum, log-sum-exp) on logits to embed constraints, satisfying most linear logic properties and showing stronger correlation between empir...

  7. QLAM: A Quantum Long-Attention Memory Approach to Long-Sequence Token Modeling

    cs.LG 2026-05 unverdicted novelty 7.0

    QLAM extends state-space models with quantum superposition in the hidden state for linear-time long-sequence modeling and reports consistent gains over RNN and transformer baselines on sequential image tasks.

  8. FeatCal: Feature Calibration for Post-Merging Models

    cs.LG 2026-05 conditional novelty 7.0

    FeatCal reduces feature drift in merged models via layer-wise closed-form calibration on a small dataset, outperforming prior post-merging methods on CLIP and GLUE benchmarks with high sample efficiency.

  9. From Compression to Accountability: Harmless Copyright Protection for Dataset Distillation

    cs.CR 2026-05 unverdicted novelty 7.0

    SubPopMark protects distilled datasets by injecting verifiable subpopulation biases that create distinguishable model behaviors for copyright tracing without using backdoors.

  10. From Compression to Accountability: Harmless Copyright Protection for Dataset Distillation

    cs.CR 2026-05 unverdicted novelty 7.0

    SubPopMark embeds verifiable subpopulation biases into distilled datasets via CVM and USTM optimization stages, allowing provenance inference through comparison of model output signatures against a reference behavior bank.

  11. Fixed-Point Neural Optimal Transport without Implicit Differentiation

    math.OC 2026-05 unverdicted novelty 7.0

    A single-network fixed-point formulation for neural optimal transport eliminates adversarial min-max optimization and implicit differentiation while enforcing dual feasibility exactly.

  12. Classification-Head Bias in Class-Level Machine Unlearning: Diagnosis, Mitigation, and Evaluation

    cs.LG 2026-05 conditional novelty 7.0

    Class-level unlearning shortcuts via bias suppression in the classification head; new bias-aware training mechanisms and bias-specific metrics are introduced to diagnose and reduce this dependence.

  13. Pre-training Enables Extraordinary All-optical Image Denoising

    physics.optics 2026-05 unverdicted novelty 7.0

    Pre-training diffractive optical networks on millions of simple images followed by fine-tuning enables all-optical denoising that raises PSNR from below 8 dB to above 18 dB across diverse datasets including MNIST, Che...

  14. TRACE: Transport Alignment Conformal Prediction via Diffusion and Flow Matching Models

    stat.ML 2026-05 unverdicted novelty 7.0

    TRACE creates valid conformal prediction sets for complex generative models by scoring outputs via averaged denoising or velocity errors along stochastic transport paths instead of likelihoods.

  15. Test-Time Compositional Generalization in Diffusion Models via Concept Discovery

    cs.LG 2026-05 unverdicted novelty 7.0

    Diffusion models can extract reusable density-mode concepts from their time-indexed scores to enable compositional generation at test time on held-out benchmarks from ColorMNIST and CelebA.

  16. The Interplay of Data Structure and Imbalance in the Learning Dynamics of Diffusion Models

    stat.ML 2026-05 unverdicted novelty 7.0

    Higher-variance classes are learned first in diffusion models; strong class imbalance reverses the order and imposes distinct delayed learning times on minority classes.

  17. Non-Myopic Active Feature Acquisition via Pathwise Policy Gradients

    cs.LG 2026-05 unverdicted novelty 7.0

    NM-PPG optimizes non-myopic acquisition policies for costly features by enabling pathwise gradients via continuous relaxation and straight-through rollouts in POMDPs, outperforming SOTA baselines.

  18. Spectral Graph Sparsification Preserves Representation Geometry in Graph Neural Networks

    cs.LG 2026-05 unverdicted novelty 7.0

    Spectral sparsification preserves GNN embedding geometry up to O(ε) perturbations in filters, representations, Gram matrices, and training trajectories.

  19. Quantum Interval Bound Propagation for Certified Training of Quantum Neural Networks

    quant-ph 2026-05 unverdicted novelty 7.0

    QIBP adapts interval bound propagation to quantum neural networks for certified adversarial robustness via interval and affine arithmetic implementations.

  20. Heterogeneous-Horizon Exact-Weight Local SGD

    math.OC 2026-04 unverdicted novelty 7.0

    HEW-Local SGD provides exact-weight adaptive aggregation for heterogeneous local SGD with one-step guarantees and explicit convergence results under unequal local horizons.

  21. Diverse Dictionary Learning

    cs.LG 2026-04 unverdicted novelty 7.0

    Diverse dictionary learning identifies intersections, complements, and dependency structures of latent variables from data X = g(Z) up to indeterminacies, and full identifiability when structural diversity is sufficient.

  22. The Multi-Block DC Function Class: Theory, Algorithms, and Applications

    math.OC 2026-04 unverdicted novelty 7.0

    The Multi-Block DC class admits polynomial-size DC decompositions for problems that require exponential size under standard DC programming and supplies explicit constructive formulations for deep ReLU networks togethe...

  23. Feature-level analysis and adversarial transfer in rotationally equivariant quantum machine learning

    quant-ph 2026-04 unverdicted novelty 7.0

    Rotationally equivariant quantum models can rely on vulnerable invariant statistics such as ring-averaged intensities, leaving them susceptible to classical transfer attacks, but suppressing the associated symmetry se...

  24. The Linear Centroids Hypothesis: Features as Directions Learned by Local Experts

    cs.LG 2026-04 unverdicted novelty 7.0

    The Linear Centroids Hypothesis reframes network features as directions in centroid spaces of local affine experts, unifying interpretability methods and yielding sparser, more faithful dictionaries, circuits, and sal...

  25. Tensor-based Multi-layer Decoupling

    eess.SY 2026-04 unverdicted novelty 7.0

    A new tensor framework for multi-layer decoupling of multivariate functions is proposed via ParaTuck decompositions and bilevel optimization.

  26. Toward Exact Convergence in Byzantine-Robust Decentralized Learning: A Statistical Identification Approach

    stat.ME 2026-04 unverdicted novelty 7.0

    DRSGD-ByMI identifies Byzantine machines via sample-splitting score statistics with FDR control, then prunes them to recover sufficient connectivity and achieve order-optimal convergence rates identical to standard de...

  27. XFED: Non-Collusive Model Poisoning Attack Against Byzantine-Robust Federated Classifiers

    cs.CR 2026-04 unverdicted novelty 7.0

    XFED is the first aggregation-agnostic non-collusive model poisoning attack that bypasses eight state-of-the-art defenses on six benchmark datasets without attacker coordination.

  28. Instance-Adaptive Parametrization for Amortized Variational Inference

    cs.LG 2026-04 unverdicted novelty 7.0

    IA-VAE augments amortized variational inference with hypernetwork-generated instance-adaptive modulations, strictly containing the standard variational family and improving held-out ELBO on synthetic and image data.

  29. Drifting Fields are not Conservative

    cs.LG 2026-04 conditional novelty 7.0

    Drift fields in single-pass generative models are not conservative except for Gaussian kernels; a sharp kernel normalization makes them conservative for any radial kernel while noting that non-conservative fields offe...

  30. Selectivity and Shape in the Design of Forward-Forward Goodness Functions

    cs.LG 2026-03 unverdicted novelty 7.0

    Shape- and peak-sensitive goodness functions for Forward-Forward deliver up to 72pp gains over sum-of-squares, reaching 98.2% on MNIST and 89% on Fashion-MNIST.

  31. How Out-of-Equilibrium Phase Transitions can Seed Pattern Formation in Trained Diffusion Models

    cs.LG 2026-03 unverdicted novelty 7.0

    Pattern formation in trained diffusion models emerges from out-of-equilibrium phase transitions driven by instabilities in low-frequency denoising modes linked to data symmetries and architectural constraints.

  32. Programmable superconducting neuron with intrinsic in-memory computation and dual-timescale plasticity for ultra-efficient neuromorphic computing

    cs.ET 2026-03 unverdicted novelty 7.0

    A programmable superconducting LIF neuron with intrinsic static memory and dual-timescale plasticity achieves 45 GHz operation and femtojoule energy per spike.

  33. FlashSinkhorn: IO-Aware Entropic Optimal Transport on GPU

    cs.LG 2026-02 conditional novelty 7.0

    FlashSinkhorn delivers up to 32x forward and 161x end-to-end speedups for entropic OT on A100 GPUs via IO-aware Triton kernels that fuse log-domain updates and streaming transport application.

  34. Re-Key-Free, Risky-Free: Adaptable Model Usage Control

    cs.CR 2025-11 unverdicted novelty 7.0

    AdaLoc keeps a model locked to authorized users by confining all post-deployment updates to a chosen subset of weights, preserving both task performance for authorized use and near-random accuracy for unauthorized use...

  35. Tensor Computation of Euler Characteristic Functions and Transforms

    cs.CG 2025-11 unverdicted novelty 7.0

    A GPU-optimized tensor method computes WECT and ECF for arbitrary-dimensional simplicial and cubical complexes with reported speedups over prior approaches and ships as the pyECT Python package.

  36. RACE Attention: A Strictly Linear-Time Attention Layer for Training on Outrageously Large Contexts

    cs.LG 2025-10 unverdicted novelty 7.0

    RACE Attention is a strictly linear-time attention mechanism that approximates softmax attention outputs using Gaussian projections and soft LSH to enable training on contexts up to 12 million tokens.

  37. The Tsetlin Machine Goes Deep: Logical Learning and Reasoning With Graphs

    cs.LG 2025-07 unverdicted novelty 7.0

    GraphTM uses message passing on graphs to build nested deep clauses, achieving 3.86% higher accuracy than convolutional TM on CIFAR-10 and competitive results on action tracking, recommendations, and genome sequences.

  38. Conformal-DP: A Density-Aware Mechanism for Differential Privacy over Riemannian Manifolds via Conformal Transformation

    cs.CR 2025-04 unverdicted novelty 7.0

    Conformal-DP applies conformal transformations to create a density-aware DP mechanism on Riemannian manifolds, proving ε-DP and deriving a closed-form geodesic error bound dependent only on density ratio and independe...

  39. Privacy Leakage via Output Label Space and Differentially Private Continual Learning

    cs.LG 2024-11 unverdicted novelty 7.0

    Identifies output label space as a privacy side-channel in DP continual learning, formalizes DP for CL, and demonstrates two mitigation methods yielding higher accuracy than prior work.

  40. CRONOS: Enhancing Deep Learning with Scalable GPU Accelerated Convex Neural Networks

    cs.LG 2024-11 unverdicted novelty 7.0

    CRONOS introduces scalable convex optimization for two-layer neural networks reaching ImageNet scale, with CRONOS-AM extending to arbitrary multi-layer architectures while matching tuned deep learning performance.

  41. DaiMoN: A Decentralized Artificial Intelligence Model Network

    cs.LG 2019-07 unverdicted novelty 7.0

    DaiMoN introduces a decentralized ledger-based network for collaborative ML model improvement with label-hidden proof-of-improvement enabled by a novel learnable Distance Embedding for Labels (DEL) function.

  42. k-GANs: Ensemble of Generative Models with Semi-Discrete Optimal Transport

    stat.ML 2019-07 unverdicted novelty 7.0

    k-GANs trains an ensemble of GANs by mapping point masses to Voronoi tiles of the data distribution using semi-discrete optimal transport and iteratively optimizing both generators and point masses, outperforming base...

  43. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

    stat.ML 2018-02 unverdicted novelty 7.0

    UMAP is a novel, scalable manifold learning algorithm for dimension reduction that competes with t-SNE while preserving more global structure and having no embedding dimension restrictions.

  44. AutoMCU: Feasibility-First MCU Neural Network Customization via LLM-based Multi-Agent Systems

    cs.LG 2026-05 unverdicted novelty 6.0

    AutoMCU uses feasibility-first LLM multi-agent coordination to automate MCU-constrained neural network design, delivering competitive accuracy on CIFAR-10/100 in 1-2 hours versus hundreds of GPU hours for prior HW-NAS...

  45. Closed-form predictive coding via hierarchical Gaussian filters

    cs.LG 2026-05 unverdicted novelty 6.0

    Predictive coding is recast as deep hierarchical Gaussian filters to restore precision-weighted message passing, yielding closed-form inference and online precision learning that matches backpropagation speed on Fashi...

  46. Unlocking the Potential of Continual Model Merging: An ODE Perspective

    cs.LG 2026-05 unverdicted novelty 6.0

    ODE-M traces low-loss connecting paths via time-dependent velocity fields and barrier constraints to improve controllability and reduce forgetting in continual model merging.

  47. Unlocking the Potential of Continual Model Merging: An ODE Perspective

    cs.LG 2026-05 unverdicted novelty 6.0

    Introduces ODE-M, an ODE-based merging method for continual model merging that follows low-loss connecting paths to mitigate catastrophic forgetting.

  48. TIDE: Asymmetric Neural Circuits for Stabilized Temporal Inhibitory-Excitatory Dynamics

    cs.LG 2026-05 unverdicted novelty 6.0

    TIDE is a neuro-inspired architecture using stabilized asymmetric E-I networks with lateral inhibition and 80:20 balance that trains in under half the time of CTM while gaining +1.65% top-1 accuracy on perturbed ImageNet.

  49. A Two-Phase Adaptive Balanced Penalty Method for Controllable Pareto Front Learning under Split Feasibility Conditions

    cs.LG 2026-05 unverdicted novelty 6.0

    Introduces ABP algorithm for constrained CPFL with convergence proofs and EFHV metric, demonstrating superior feasibility in experiments.

  50. Geometric Prototype Learning in Quantum Hilbert Space with Matrix Product States

    quant-ph 2026-05 unverdicted novelty 6.0

    A quantum prototype learning scheme encodes class representatives as generative matrix product states and performs classification and clustering via geometric measures in Hilbert space, outperforming classical prototy...

  51. E-PMQ: Expert-Guided Post-Merge Quantization with Merged-Weight Anchoring

    cs.CL 2026-05 unverdicted novelty 6.0

    E-PMQ improves 4-bit quantization accuracy on merged models by 8-42 points across CLIP and GLUE tasks through expert-guided calibration and merged-weight anchoring.

  52. Interaction-Aware Influence Functions for Group Attribution

    cs.LG 2026-05 conditional novelty 6.0

    Extends influence functions with a second-order pairwise interaction term that improves group attribution accuracy over simple summation on multiple model-dataset pairs and instruction-tuning selection tasks.

  53. Response-Conditioned Parallel-to-Sequential Orchestration for Multi-Agent Systems

    cs.CL 2026-05 unverdicted novelty 6.0

    Nexa learns a response-conditioned policy that starts with parallel agent execution and adds at most one round of sequential message passing via a predicted sparse DAG, strictly subsuming pure parallel mode.

  54. On the Fragility of Data Attribution When Learning Is Distributed

    cs.LG 2026-05 unverdicted novelty 6.0

    A single adversary in distributed training inflates its attribution value via latent optimization on synthetic batches without degrading accuracy or triggering basic defenses.

  55. From Weight Perturbation to Feature Attribution for Explaining Fully Connected Neural Networks

    cs.LG 2026-05 unverdicted novelty 6.0

    XWP and XWP_c are novel attribution methods for FCNNs that estimate feature importance by perturbing attached weights to avoid added bias and out-of-distribution issues in occlusion approaches.

  56. Quantitative Linear Logic for Neuro-Symbolic Learning and Verification

    cs.LO 2026-05 unverdicted novelty 6.0

    Quantitative Linear Logic interprets logical connectives via natural ML operations on logits to embed constraints in neural training while satisfying most linear logic laws and correlating performance with independent...

  57. Bayesian Model Merging

    cs.LG 2026-05 unverdicted novelty 6.0

    Bayesian Model Merging introduces a bi-level optimization framework that merges task-specific models via closed-form Bayesian regression with an anchor prior and global hyperparameter search, outperforming baselines a...

  58. Adaptive Multi-Scale Goodness Aggregation for Forward-Forward Learning

    cs.LG 2026-05 unverdicted novelty 6.0

    AMSGA extends Forward-Forward learning via multi-scale goodness aggregation, curriculum-guided hard negative mining, and adaptive thresholds, reporting up to 1.5% accuracy gains on MNIST and Fashion-MNIST.

  59. Exact Fixed-Point Constraints in Neural-ODEs with Provable Universality

    cond-mat.dis-nn 2026-05 unverdicted novelty 6.0

    A technique plants exact fixed points in Neural-ODE velocity fields with a rigorous proof that universality is preserved under local constraints.

  60. SEMASIA: A Large-Scale Dataset of Semantically Structured Latent Representations

    cs.LG 2026-05 unverdicted novelty 6.0

    SEMASIA supplies a large-scale, metadata-rich collection of latent representations from diverse vision models to enable systematic study of semantic geometry and cross-model alignment.

Reference graph

Works this paper leans on

6 extracted references · 6 canonical work pages · cited by 157 Pith papers

  1. [1]

    Multi-column deep neural networks for image classification

    D. Ciregan, U. Meier, and J. Schmidhuber. Multi-column deep neural networks for image classification. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 3642–3649. IEEE, 2012

  2. [2]

    EMNIST: an extension of MNIST to handwritten letters

    G. Cohen, S. Afshar, J. Tapson, and A. van Schaik. EMNIST: an extension of MNIST to handwritten letters. arXiv preprint arXiv:1702.05373, 2017

  3. [3]

    J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 248–255. IEEE, 2009

  4. [4]

    Learning multiple layers of features from tiny images

    A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. 2009

  5. [5]

    Gradient-based learning applied to document recognition

    Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998

  6. [6]

    L. Wan, M. Zeiler, S. Zhang, Y. LeCun, and R. Fergus. Regularization of neural networks using DropConnect. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 1058–1066, 2013