Learning both Weights and Connections for Efficient Neural Networks
Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems. Also, conventional networks fix the architecture before training starts; as a result, training cannot improve the architecture. To address these limitations, we describe a method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy by learning only the important connections. Our method prunes redundant connections using a three-step method. First, we train the network to learn which connections are important. Next, we prune the unimportant connections. Finally, we retrain the network to fine-tune the weights of the remaining connections. On the ImageNet dataset, our method reduced the number of parameters of AlexNet by 9x, from 61 million to 6.7 million, without incurring accuracy loss. Similar experiments with VGG-16 found that the number of parameters can be reduced by 13x, from 138 million to 10.3 million, again with no loss of accuracy.
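The train-prune-retrain loop described in the abstract can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it uses simple magnitude-based thresholding on a single weight matrix, and the function names (`magnitude_prune`, `masked_update`) are made up for this example. In the retraining step the pruning mask is reapplied after every update so that removed connections stay at zero.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude connections.

    sparsity is the fraction of weights to remove; returns the pruned
    weights and a boolean mask of surviving connections.
    """
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) > threshold
    return weights * mask, mask

def masked_update(weights, grad, mask, lr=0.01):
    """One retraining (fine-tuning) step on the surviving connections.

    Applying the mask after the gradient step keeps pruned weights at
    zero, so the sparse structure learned in the prune step is preserved.
    """
    return (weights - lr * grad) * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))          # stand-in for a trained layer
pruned, mask = magnitude_prune(w, sparsity=0.9)
print(f"connections kept: {mask.mean():.0%}")
```

In the paper the same idea is applied layer by layer to full networks (AlexNet, VGG-16), with retraining on the original task rather than a single masked gradient step.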
Forward citations
Cited by 6 Pith papers
- MedCore: Boundary-Preserving Medical Core Pruning for MedSAM
  MedCore achieves 60% parameter and 58.4% FLOP reduction on MedSAM with Dice 0.9549 and preserved boundary metrics via dual-intervention pruning and a new boundary leverage principle.
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
  FlashAttention reduces GPU high-bandwidth memory accesses in self-attention via tiling, delivering exact attention with lower IO complexity, 2-3x wall-clock speedups on models like GPT-2, and the ability to train on s...
- AgentSlimming: Towards Efficient and Cost-Aware Multi-Agent Systems
  AgentSlimming compresses graph-structured multi-agent systems by estimating agent importance and removing or replacing low-value agents, cutting token costs by up to 78.9% with negligible performance loss.
- OFA-Diffusion Compression: Compressing Diffusion Model in One-Shot Manner
  OFA-Diffusion Compression trains diffusion models once to yield multiple size-specific compressed subnetworks via restricted candidate spaces, importance-based channel allocation, and reweighting.
- RecGPT-Mobile: On-Device Large Language Models for User Intent Understanding in Taobao Feed Recommendation
  RecGPT-Mobile runs a compact LLM on phones to understand evolving user intent from behaviors and improve mobile e-commerce recommendations.
- Sparse-on-Dense: Area and Energy-Efficient Computing of Sparse Neural Networks on Dense Matrix Multiplication Accelerators
  Sparse neural networks achieve better area and energy efficiency when executed on dense matrix multiplication accelerators using a Sparse-on-Dense approach than on dedicated sparse accelerators.