Pruning Filters for Efficient ConvNets

Asim Kadav; Hanan Samet; Hans Peter Graf; Hao Li; Igor Durdanovic

arxiv: 1608.08710 · v3 · pith:CDWBTEI2new · submitted 2016-08-31 · 💻 cs.CV · cs.LG

Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, Hans Peter Graf

keywords: pruning, costs, accuracy, cnns, computation, filters, layers, weights

Abstract

The success of CNNs in various applications is accompanied by a significant increase in the computation and parameter storage costs. Recent efforts toward reducing these overheads involve pruning and compressing the weights of various layers without hurting original accuracy. However, magnitude-based pruning of weights reduces a significant number of parameters from the fully connected layers and may not adequately reduce the computation costs in the convolutional layers due to irregular sparsity in the pruned networks. We present an acceleration method for CNNs, where we prune filters from CNNs that are identified as having a small effect on the output accuracy. By removing whole filters in the network together with their connecting feature maps, the computation costs are reduced significantly. In contrast to pruning weights, this approach does not result in sparse connectivity patterns. Hence, it does not need the support of sparse convolution libraries and can work with existing efficient BLAS libraries for dense matrix multiplications. We show that even simple filter pruning techniques can reduce inference costs for VGG-16 by up to 34% and ResNet-110 by up to 38% on CIFAR10 while regaining close to the original accuracy by retraining the networks.
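
As a rough illustration of the idea in the abstract, the following is a minimal sketch of whole-filter pruning in plain Python. The abstract does not say how low-impact filters are identified, so the sketch assumes one simple criterion: rank each filter by the sum of its absolute weights and drop the smallest. The helper names (`constant_filter`, `filter_l1_norms`, `prune_filters`, `conv_macs`) and the toy layer sizes are hypothetical, and the retraining step mentioned in the abstract is not shown.

```python
from typing import List, Tuple

# Conv weights as nested lists: layer[out_filter][in_channel][kh][kw]
Layer = List[List[List[List[float]]]]


def constant_filter(value: float, n_in: int, k: int = 3) -> List[List[List[float]]]:
    """Toy filter whose weights all equal `value` (illustration only)."""
    return [[[value] * k for _ in range(k)] for _ in range(n_in)]


def filter_l1_norms(layer: Layer) -> List[float]:
    """Sum of absolute weights per output filter (assumed ranking criterion)."""
    return [sum(abs(w) for kernel in f for row in kernel for w in row)
            for f in layer]


def prune_filters(layer: Layer, next_layer: Layer,
                  n_prune: int) -> Tuple[Layer, Layer, List[int]]:
    """Remove the n_prune lowest-ranked filters from `layer`, plus the
    input channels of `next_layer` that consumed their feature maps.
    Both results stay dense; no sparse mask is involved."""
    norms = filter_l1_norms(layer)
    order = sorted(range(len(layer)), key=lambda i: norms[i])
    pruned = set(order[:n_prune])
    keep = [i for i in range(len(layer)) if i not in pruned]
    new_layer = [layer[i] for i in keep]
    new_next = [[f[i] for i in keep] for f in next_layer]
    return new_layer, new_next, keep


def conv_macs(layer: Layer, out_h: int, out_w: int) -> int:
    """Multiply-accumulates for a dense convolution with this weight shape."""
    n_out, n_in = len(layer), len(layer[0])
    kh, kw = len(layer[0][0]), len(layer[0][0][0])
    return n_out * n_in * kh * kw * out_h * out_w


# Toy example: 4 filters over 2 input channels, feeding a layer of 3 filters.
layer = [constant_filter(s, n_in=2) for s in (0.1, 1.0, 0.05, 0.5)]
next_layer = [constant_filter(0.2, n_in=4) for _ in range(3)]

new_layer, new_next, keep = prune_filters(layer, next_layer, n_prune=2)

# Assume 8x8 output feature maps for both layers (arbitrary toy size).
before = conv_macs(layer, 8, 8) + conv_macs(next_layer, 8, 8)
after = conv_macs(new_layer, 8, 8) + conv_macs(new_next, 8, 8)
saving = 1 - after / before

print("kept filters:", keep)
print("MAC reduction:", saving)
```

Because a removed filter also removes one input channel from the following layer, the cost drops in both layers, and the remaining weights are still ordinary dense tensors. This is the property the abstract relies on when it says the approach works with existing BLAS libraries for dense matrix multiplication.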

Forward citations

Cited by 15 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Discovering Data Encoding Strategies for Quantum-Classical Neural Networks Using Monte Carlo Tree Search
quant-ph 2026-05 conditional novelty 7.0

MCTS discovers superior data encoding circuits for QCCNNs that outperform standard encodings on medical datasets, with effective rank of feature maps serving as a performance predictor.
Re-Key-Free, Risky-Free: Adaptable Model Usage Control
cs.CR 2025-11 unverdicted novelty 7.0

AdaLoc keeps a model locked to authorized users by confining all post-deployment updates to a chosen subset of weights, preserving both task performance for authorized use and near-random accuracy for unauthorized use...
NetTailor: Tuning the Architecture, Not Just the Weights
cs.CV 2019-06 unverdicted novelty 7.0

NetTailor adapts CNN architecture for new tasks by assembling pre-trained universal blocks with task-specific layers, trained via activation mimicry and complexity penalties to match accuracy while reducing size for s...
SparseForge: Efficient Semi-Structured LLM Sparsification via Annealing of Hessian-Guided Soft-Mask
cs.LG 2026-05 unverdicted novelty 6.0

SparseForge achieves 57.27% zero-shot accuracy on LLaMA-2-7B at 2:4 sparsity using only 5B retraining tokens, beating the dense baseline and nearly matching a 40B-token SOTA method.
Neural Network Pruning via QUBO Optimization
cs.CV 2026-04 unverdicted novelty 6.0

A hybrid QUBO pruning framework using Taylor/Fisher metrics and activation similarity outperforms greedy Taylor and L1-QUBO baselines on the SIDD denoising dataset, with further gains from Tensor-Train refinement.
Memory- and Communication-Aware Model Compression for Distributed Deep Learning Inference on IoT
stat.ML 2019-07 unverdicted novelty 6.0

NoNN partitions a teacher model into disjoint compressed students via network science for distributed IoT inference, matching teacher accuracy with far lower per-device memory and communication.
COP: Customized Deep Model Compression via Regularized Correlation-Based Filter-Level Pruning
cs.CV 2019-06 unverdicted novelty 6.0

COP prunes CNN filters using correlation-based importance with global normalization and dual regularization on parameter quantity and FLOPs to enable customized compression.
TAPIOCA: Why Task-Aware Pruning Improves OOD model Capability
cs.LG 2026-05 unverdicted novelty 5.0

Task-aware pruning improves OOD performance by removing layers that distort task-adapted representation profiles, realigning OOD inputs with the geometry observed on ID data.
Rethinking Layer Relevance in Large Language Models Beyond Cosine Similarity
cs.LG 2026-05 unverdicted novelty 5.0

Cosine similarity poorly predicts performance degradation from layer removal in LLMs, making direct accuracy-drop ablation a more reliable relevance metric.
Engineering Resource-constrained Software Systems with DNN Components: a Concept-based Pruning Approach
cs.SE 2026-04 unverdicted novelty 5.0

A concept-based pruning method for DNNs guided by interpretable concepts and system requirements produces smaller, computationally efficient models that maintain effectiveness on image classification tasks.
Modality-Aware Zero-Shot Pruning and Sparse Attention for Efficient Multimodal Edge Inference
cs.LG 2026-04 unverdicted novelty 5.0

SentryFuse delivers modality-aware zero-shot pruning and sparse attention that improves accuracy by 12.7% on average and up to 18% under sensor dropout while cutting memory 28.2% and latency up to 1.63x across multimo...
Vanishing Contributions: A Unified Framework for Smooth and Iterative Model Compression
cs.LG 2025-10 unverdicted novelty 5.0

VCON is a unified framework for smooth iterative DNN compression that uses parallel execution and an affine combination to progressively replace the original model with its compressed form during fine-tuning.
Efficient compression of neural networks and datasets
cs.LG 2025-05 unverdicted novelty 5.0

Refined probabilistic and smooth l0 pruning techniques approximate minimum description length for neural networks, achieving high compression with minimal accuracy loss and empirically verifying better sample efficien...
Neuron ranking -- an informed way to condense convolutional neural networks architecture
cs.LG 2019-07 unverdicted novelty 5.0

Shapley value and variational importance switch methods produce consistent rankings of filter importance in CNNs, enabling compression and interpretability.
A Targeted Acceleration and Compression Framework for Low bit Neural Networks
cs.CV 2019-07 unverdicted novelty 4.0

TAC framework separates optimization of convolutional and fully connected layers in 1-bit DNNs to improve accuracy while maintaining efficiency.