pith. machine review for the scientific record.

arxiv: 1506.02626 · v3 · submitted 2015-06-08 · 💻 cs.NE · cs.CV · cs.LG

Recognition: unknown

Learning both Weights and Connections for Efficient Neural Networks

Authors on Pith: no claims yet
classification 💻 cs.NE · cs.CV · cs.LG
keywords connections · method · million · networks · accuracy · neural · architecture · important
0 comments
read the original abstract

Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems. Also, conventional networks fix the architecture before training starts; as a result, training cannot improve the architecture. To address these limitations, we describe a method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy by learning only the important connections. Our method prunes redundant connections using a three-step method. First, we train the network to learn which connections are important. Next, we prune the unimportant connections. Finally, we retrain the network to fine tune the weights of the remaining connections. On the ImageNet dataset, our method reduced the number of parameters of AlexNet by a factor of 9x, from 61 million to 6.7 million, without incurring accuracy loss. Similar experiments with VGG-16 found that the number of parameters can be reduced by 13x, from 138 million to 10.3 million, again with no loss of accuracy.
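
The three-step method in the abstract (train, prune, retrain) is concrete enough to sketch. Below is a minimal PyTorch illustration, assuming magnitude thresholding as the importance criterion; the model, data, threshold, and epoch counts are illustrative placeholders, not the paper's settings.

```python
# Minimal sketch of the train -> prune -> retrain loop from the abstract.
# The threshold, architecture, and training setup are illustrative assumptions.
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, threshold: float) -> dict:
    """Step 2: zero connections whose |weight| falls below `threshold`
    and return boolean masks marking the surviving connections."""
    masks = {}
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name.endswith("weight"):
                mask = param.abs() >= threshold
                param.mul_(mask)  # prune the unimportant connections
                masks[name] = mask
    return masks

def apply_masks(model: nn.Module, masks: dict) -> None:
    """Re-zero pruned weights so retraining only fine-tunes the survivors."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])

def train(model, loader, optimizer, loss_fn, epochs, masks=None):
    """Steps 1 and 3: ordinary training; with `masks`, retraining under pruning."""
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
            if masks is not None:  # keep pruned connections at zero
                apply_masks(model, masks)

# Toy end-to-end run of the three steps on random data.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
data = [(torch.randn(32, 20), torch.randint(0, 2, (32,))) for _ in range(10)]
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

train(model, data, opt, loss_fn, epochs=2)               # 1. learn which connections matter
masks = magnitude_prune(model, threshold=0.05)           # 2. prune the unimportant ones
train(model, data, opt, loss_fn, epochs=2, masks=masks)  # 3. retrain the remaining weights
```

Reapplying the masks after each optimizer step is what makes the retraining fine-tune only the surviving connections; gradients still flow to pruned weights, but they are zeroed before the next forward pass.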

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. MedCore: Boundary-Preserving Medical Core Pruning for MedSAM

    cs.CV 2026-05 unverdicted novelty 7.0

    MedCore achieves 60% parameter and 58.4% FLOP reduction on MedSAM with Dice 0.9549 and preserved boundary metrics via dual-intervention pruning and a new boundary leverage principle.

  2. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

    cs.LG 2022-05 accept novelty 7.0

    FlashAttention reduces GPU high-bandwidth memory accesses in self-attention via tiling, delivering exact attention with lower IO complexity, 2-3x wall-clock speedups on models like GPT-2, and the ability to train on s...

  3. AgentSlimming: Towards Efficient and Cost-Aware Multi-Agent Systems

    cs.LG 2026-05 unverdicted novelty 6.0

    AgentSlimming compresses graph-structured multi-agent systems by estimating agent importance and removing or replacing low-value agents, cutting token costs by up to 78.9% with negligible performance loss.

  4. OFA-Diffusion Compression: Compressing Diffusion Model in One-Shot Manner

    cs.CV 2026-04 conditional novelty 6.0

    OFA-Diffusion Compression trains diffusion models once to yield multiple size-specific compressed subnetworks via restricted candidate spaces, importance-based channel allocation, and reweighting.

  5. RecGPT-Mobile: On-Device Large Language Models for User Intent Understanding in Taobao Feed Recommendation

    cs.IR 2026-05 unverdicted novelty 4.0

    RecGPT-Mobile runs a compact LLM on phones to understand evolving user intent from behaviors and improve mobile e-commerce recommendations.

  6. Sparse-on-Dense: Area and Energy-Efficient Computing of Sparse Neural Networks on Dense Matrix Multiplication Accelerators

    cs.AR 2026-04 unverdicted novelty 4.0

With the Sparse-on-Dense approach, sparse neural networks achieve better area and energy efficiency on dense matrix multiplication accelerators than on dedicated sparse accelerators.