pith. machine review for the scientific record.

arxiv: 1702.06257 · v1 · submitted 2017-02-21 · 💻 cs.CV

Recognition: unknown

The Power of Sparsity in Convolutional Neural Networks

Authors on Pith: no claims yet
classification: 💻 cs.CV
keywords: convolutional, networks, accuracy, network, approach, leads, many, memory
read the original abstract

Deep convolutional networks are well-known for their high computational and memory demands. Given limited resources, how does one design a network that balances its size, training time, and prediction accuracy? A surprisingly effective approach to trade accuracy for size and speed is to simply reduce the number of channels in each convolutional layer by a fixed fraction and retrain the network. In many cases this leads to significantly smaller networks with only minimal changes to accuracy. In this paper, we take a step further by empirically examining a strategy for deactivating connections between filters in convolutional layers in a way that allows us to harvest savings both in run-time and memory for many network architectures. More specifically, we generalize 2D convolution to use a channel-wise sparse connection structure and show that this leads to significantly better results than the baseline approach for large networks including VGG and Inception V3.
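The channel-wise sparse connection structure the abstract describes can be illustrated with a short PyTorch sketch: each output channel of a convolution is connected to only a fraction of the input channels, with deactivated filter-to-filter connections masked out. This is a minimal illustration, not the paper's implementation; the random mask, the `density` parameter, and the class name are assumptions, whereas the paper evaluates specific sparse connection structures.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelSparseConv2d(nn.Module):
    """2D convolution with a channel-wise sparse connectivity mask.

    Sketch of the idea from the abstract: each output channel connects
    to only a subset of input channels. The random mask is illustrative,
    not the connection pattern studied in the paper.
    """
    def __init__(self, in_ch, out_ch, kernel_size, density=0.5, padding=0):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)
        # Binary (out_ch, in_ch) connectivity mask, broadcast over the
        # spatial kernel dimensions; frozen, not learned.
        mask = (torch.rand(out_ch, in_ch) < density).float()
        self.register_buffer("mask", mask[:, :, None, None])

    def forward(self, x):
        # Deactivated filter-to-filter connections contribute nothing,
        # so their weights (and their gradients) are effectively zero.
        return F.conv2d(x, self.conv.weight * self.mask, self.conv.bias,
                        stride=self.conv.stride, padding=self.conv.padding)

layer = ChannelSparseConv2d(64, 128, 3, density=0.25, padding=1)
y = layer(torch.randn(1, 64, 32, 32))  # -> (1, 128, 32, 32)
```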

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices

    cs.LG 2026-05 unverdicted novelty 6.0

    DECO sparse MoE matches dense Transformer performance at 20% expert activation with a 3x hardware inference speedup.

  2. DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices

    cs.LG 2026-05 conditional novelty 6.0

    DECO matches dense model performance at 20% expert activation via ReLU-based routing with learnable scaling and the NormSiLU activation, plus a 3x real-hardware speedup.
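The DECO summary above names "ReLU-based routing with learnable scaling" but gives no details, so the following is one generic reading of that phrase rather than DECO's actual method; the router class, its normalization, and the learnable per-expert scale are all assumptions (NormSiLU is not defined here and is omitted). Experts whose gate value is zero after the ReLU are skipped entirely, so the fraction of active experts is data dependent rather than a fixed top-k.

```python
import torch
import torch.nn as nn

class ReLURouter(nn.Module):
    """Hypothetical ReLU-based MoE router with learnable scaling.

    Illustrative sketch only: tokens are routed by a ReLU-gated linear
    projection, so experts with zero gate are inactive.
    """
    def __init__(self, d_model, n_experts):
        super().__init__()
        self.proj = nn.Linear(d_model, n_experts)
        self.scale = nn.Parameter(torch.ones(n_experts))  # learnable scaling

    def forward(self, x):
        gates = torch.relu(self.proj(x)) * self.scale  # zeros deactivate experts
        # Normalize the surviving gates; clamp avoids division by zero
        # when every expert is inactive for a token.
        return gates / gates.sum(-1, keepdim=True).clamp_min(1e-6)

router = ReLURouter(d_model=256, n_experts=16)
g = router(torch.randn(4, 256))  # (4, 16); many entries are exactly zero
```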