TPV measures first-order sensitivity of model outputs to parameter perturbations, unifies robustness analysis under one lens, proves train-to-test convergence in overparameterized limits, and enables label-free pruning and model selection applications.
hub
Pruning Convolutional Neural Networks for Resource Efficient Inference
11 Pith papers cite this work. Polarity classification is still indexing.
abstract
We propose a new formulation for pruning convolutional kernels in neural networks to enable efficient inference. We interleave greedy criteria-based pruning with fine-tuning by backpropagation - a computationally efficient procedure that maintains good generalization in the pruned network. We propose a new criterion based on Taylor expansion that approximates the change in the cost function induced by pruning network parameters. We focus on transfer learning, where large pretrained networks are adapted to specialized tasks. The proposed criterion demonstrates superior performance compared to other criteria, e.g. the norm of kernel weights or feature map activation, for pruning large CNNs after adaptation to fine-grained classification tasks (Birds-200 and Flowers-102) relaying only on the first order gradient information. We also show that pruning can lead to more than 10x theoretical (5x practical) reduction in adapted 3D-convolutional filters with a small drop in accuracy in a recurrent gesture classifier. Finally, we show results for the large-scale ImageNet dataset to emphasize the flexibility of our approach.
hub tools
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
A hybrid sparse Byzantine attack using network pruning insights and slow accumulation bypasses eight state-of-the-art defenses in federated learning simulations.
NetTailor adapts CNN architecture for new tasks by assembling pre-trained universal blocks with task-specific layers, trained via activation mimicry and complexity penalties to match accuracy while reducing size for simpler tasks.
MNAFT identifies language-agnostic and language-specific neurons via activation analysis and selectively fine-tunes only relevant ones in MLLMs to close the modality gap and outperform full fine-tuning and other methods on image translation benchmarks.
A CNN modulator jointly trained with a neural receiver spreads information across local time-frequency neighborhoods in OFDM, breaking QAM rotational symmetry to support sparse or zero pilots under high Doppler.
H2O evicts non-heavy-hitter tokens from the KV cache using a dynamic submodular policy, retaining recent and frequent-co-occurrence tokens to reduce memory while preserving accuracy.
COP prunes CNN filters using correlation-based importance with global normalization and dual regularization on parameter quantity and FLOPs to enable customized compression.
ReST-KV formulates KV eviction as layer-wise output reconstruction optimization with spatial-temporal smoothing, outperforming baselines by 2.58% on LongBench and 15.2% on RULER while cutting decoding latency by 10.61x at 128k context.
LoRA-Over injects auxiliary parameters into low-rank adapters during training and decomposes them back into standard LoRA at inference, with static or dynamic scheduling to allocate extra capacity where needed, yielding better generalization than vanilla LoRA on GLUE, MT-Bench, GSM8K and HumanEval.
Pruning small-magnitude weights from pre-trained LLMs causes monotonic irreversible performance degradation on difficult downstream tasks, supporting the Junk DNA Hypothesis that these weights hold essential knowledge.
SentryFuse delivers modality-aware zero-shot pruning and sparse attention that improves accuracy by 12.7% on average and up to 18% under sensor dropout while cutting memory 28.2% and latency up to 1.63x across multimodal edge models.
citing papers explorer
-
TPV: Parameter Perturbations Through the Lens of Test Prediction Variance
TPV measures first-order sensitivity of model outputs to parameter perturbations, unifies robustness analysis under one lens, proves train-to-test convergence in overparameterized limits, and enables label-free pruning and model selection applications.
-
Aggressive or Imperceptible, or Both: Network Pruning Assisted Hybrid Byzantines in Federated Learning
A hybrid sparse Byzantine attack using network pruning insights and slow accumulation bypasses eight state-of-the-art defenses in federated learning simulations.
-
NetTailor: Tuning the Architecture, Not Just the Weights
NetTailor adapts CNN architecture for new tasks by assembling pre-trained universal blocks with task-specific layers, trained via activation mimicry and complexity penalties to match accuracy while reducing size for simpler tasks.
-
MNAFT: modality neuron-aware fine-tuning of multimodal large language models for image translation
MNAFT identifies language-agnostic and language-specific neurons via activation analysis and selectively fine-tunes only relevant ones in MLLMs to close the modality gap and outperform full fine-tuning and other methods on image translation benchmarks.
-
Deep-OFDM: Neural Modulation for High Mobility
A CNN modulator jointly trained with a neural receiver spreads information across local time-frequency neighborhoods in OFDM, breaking QAM rotational symmetry to support sparse or zero pilots under high Doppler.
-
H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
H2O evicts non-heavy-hitter tokens from the KV cache using a dynamic submodular policy, retaining recent and frequent-co-occurrence tokens to reduce memory while preserving accuracy.
-
COP: Customized Deep Model Compression via Regularized Correlation-Based Filter-Level Pruning
COP prunes CNN filters using correlation-based importance with global normalization and dual regularization on parameter quantity and FLOPs to enable customized compression.
-
ReST-KV: Robust KV Cache Eviction with Layer-wise Output Reconstruction and Spatial-Temporal Smoothing
ReST-KV formulates KV eviction as layer-wise output reconstruction optimization with spatial-temporal smoothing, outperforming baselines by 2.58% on LongBench and 15.2% on RULER while cutting decoding latency by 10.61x at 128k context.
-
Strategic Over-Parameterization for Generalizable Low-Rank Adaptation
LoRA-Over injects auxiliary parameters into low-rank adapters during training and decomposes them back into standard LoRA at inference, with static or dynamic scheduling to allocate extra capacity where needed, yielding better generalization than vanilla LoRA on GLUE, MT-Bench, GSM8K and HumanEval.
-
Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs
Pruning small-magnitude weights from pre-trained LLMs causes monotonic irreversible performance degradation on difficult downstream tasks, supporting the Junk DNA Hypothesis that these weights hold essential knowledge.
-
Modality-Aware Zero-Shot Pruning and Sparse Attention for Efficient Multimodal Edge Inference
SentryFuse delivers modality-aware zero-shot pruning and sparse attention that improves accuracy by 12.7% on average and up to 18% under sensor dropout while cutting memory 28.2% and latency up to 1.63x across multimodal edge models.