COP: Customized Deep Model Compression via Regularized Correlation-Based Filter-Level Pruning

Cong Fu; Deng Cai; Jishun Guo; Wenxiao Wang; Xiaofei He

arxiv: 1906.10337 · v1 · pith:BGLIJ45Xnew · submitted 2019-06-25 · 💻 cs.CV · cs.LG

Pith reviewed 2026-05-25 17:03 UTC · model grok-4.3

keywords: model compression, filter pruning, convolutional neural networks, correlation, regularization, CNN pruning

The pith

COP prunes CNN filters by correlation after global normalization and adds regularization to let users customize for fewer parameters or lower computation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing filter pruning methods score importance independently inside each layer, so they cannot compare filters across layers and force users to set pruning ratios manually per layer. They also ignore that removing the same number of parameters from different positions changes computational cost by different amounts. COP computes importance from pairwise filter correlations, normalizes the scores globally so filters become comparable across the whole network, and augments the score with separate regularization terms for total parameter count and total FLOPs. This produces a single ranking that automatically decides how many filters to remove from each layer while letting the user steer the result toward either a smaller model or a faster one. The paper reports that the resulting compressed networks retain higher accuracy than prior methods on standard image-classification benchmarks.
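
As a rough illustration of the correlation idea described above, the sketch below scores each filter in one layer by its mean absolute Pearson correlation with the other filters, and treats highly correlated filters as less important. The `pearson` and `redundancy_scores` helpers, the `1 - redundancy` mapping, and the toy weights are all assumptions made for this example; the paper's exact importance formula is not reproduced here.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length weight vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return num / den if den else 0.0


def redundancy_scores(filters):
    """Mean absolute correlation of each filter with the others in its layer."""
    k = len(filters)
    scores = []
    for i in range(k):
        others = [abs(pearson(filters[i], filters[j])) for j in range(k) if j != i]
        scores.append(sum(others) / len(others))
    return scores


# Toy layer: each inner list stands for one flattened filter (hypothetical values).
layer = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.1],    # almost a scaled copy of the first filter
    [1.0, -1.0, 1.0, -1.0],  # unrelated pattern
]

redundancy = redundancy_scores(layer)
# Assumed mapping for illustration: more correlated means less important.
importance = [1.0 - r for r in redundancy]
```

In this toy layer the third filter ends up with the highest importance, because the first two largely duplicate each other.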

Core claim

Filter importance can be defined as a correlation score computed after global normalization across all layers and then regularized by both parameter quantity and computational cost, so that pruning decisions become cross-layer, redundancy-aware, and directly controllable for size versus speed without manual per-layer ratios.

What carries the argument

Regularized correlation-based importance score obtained after global normalization of filter responses

If this is right

Pruning ratios no longer require manual specification for each layer because global normalization produces a single comparable ranking.
Users can choose compression that favors smaller parameter count or lower FLOPs by adjusting the relative strength of the two regularization terms.
Redundancy is removed by considering relationships among filters rather than scoring each filter in isolation.
The same importance definition can be applied at different overall compression targets without re-tuning layer-wise schedules.
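
The first consequence above can be sketched as follows: once every filter carries a score on a common scale, one global sort decides how many filters each layer loses. The function name `global_prune_plan`, the layer names, and the scores are hypothetical, and the scores are assumed to be already normalized.

```python
def global_prune_plan(scores_by_layer, prune_fraction):
    """Return, per layer, the indices of filters that fall in the globally lowest fraction."""
    ranked = sorted(
        (score, layer, idx)
        for layer, scores in scores_by_layer.items()
        for idx, score in enumerate(scores)
    )
    n_prune = int(len(ranked) * prune_fraction)
    plan = {layer: [] for layer in scores_by_layer}
    for _, layer, idx in ranked[:n_prune]:
        plan[layer].append(idx)
    return plan


# Hypothetical, already-normalized importance scores for three layers.
scores = {
    "conv1": [0.9, 0.8, 0.7, 0.6],
    "conv2": [0.5, 0.1, 0.2, 0.95],
    "conv3": [0.3, 0.85],
}

plan = global_prune_plan(scores, prune_fraction=0.4)
```

No per-layer ratio is supplied: here `conv1` keeps all of its filters while `conv2` loses three of four. A practical implementation would presumably also guard against emptying a layer entirely, which this sketch does not do.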

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The global-normalization step could be tested on architectures whose layers have very different filter counts to check whether the cross-layer ranking remains stable.
The dual regularization could be extended to include an additional term for memory bandwidth if deployment constraints change.

Load-bearing premise

That filters showing high correlation with others after global normalization are genuinely redundant and can be removed while preserving accuracy, and that the two regularization terms correctly balance parameter reduction against FLOPs reduction.

What would settle it

Prune a ResNet or VGG model on ImageNet to a fixed compression ratio using COP with the parameter-regularization term dominant, then compare top-1 accuracy against the same ratio obtained by a method that uses only local importance scores; a clear accuracy drop relative to the baseline would falsify the claim.

Figures

Figures reproduced from arXiv: 1906.10337 by Cong Fu, Deng Cai, Jishun Guo, Wenxiao Wang, Xiaofei He.

**Figure 2.** The figure is an illustration of the pruning process on a …

**Figure 4.** The figure shows the output feature maps of “conv1 …

Original abstract

Neural network compression empowers the effective yet unwieldy deep convolutional neural networks (CNN) to be deployed in resource-constrained scenarios. Most state-of-the-art approaches prune the model in filter-level according to the "importance" of filters. Despite their success, we notice they suffer from at least two of the following problems: 1) The redundancy among filters is not considered because the importance is evaluated independently. 2) Cross-layer filter comparison is unachievable since the importance is defined locally within each layer. Consequently, we must manually specify layer-wise pruning ratios. 3) They are prone to generate sub-optimal solutions because they neglect the inequality between reducing parameters and reducing computational cost. Reducing the same number of parameters in different positions in the network may reduce different computational cost. To address the above problems, we develop a novel algorithm named as COP (correlation-based pruning), which can detect the redundant filters efficiently. We enable the cross-layer filter comparison through global normalization. We add parameter-quantity and computational-cost regularization terms to the importance, which enables the users to customize the compression according to their preference (smaller or faster). Extensive experiments have shown COP outperforms the others significantly. The code is released at https://github.com/ZJULearning/COP.
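
The abstract's third point, that removing the same number of parameters at different positions saves different amounts of computation, follows from standard convolution arithmetic. The sketch below counts weights and multiply-accumulate operations for a single filter; the helper name `conv_filter_cost` and the layer shapes are illustrative assumptions, and the count ignores biases and the knock-on effect on the next layer's input channels.

```python
def conv_filter_cost(in_channels, kernel_size, out_h, out_w):
    """Weights and multiply-accumulates contributed by one convolutional filter."""
    params = in_channels * kernel_size * kernel_size
    macs = params * out_h * out_w  # the filter is applied at every output position
    return params, macs


# Same filter shape, different positions in a hypothetical network.
early = conv_filter_cost(in_channels=64, kernel_size=3, out_h=32, out_w=32)
late = conv_filter_cost(in_channels=64, kernel_size=3, out_h=8, out_w=8)

print(early)  # (576, 589824)
print(late)   # (576, 36864)
```

Both filters hold 576 weights, but removing the early one saves 16 times as many multiply-accumulates because its output feature map is 16 times larger.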

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

COP's global normalization step for cross-layer filter comparison is the part that does not obviously hold up.

The main issue with this paper is that its global normalization of correlation scores is unlikely to produce fair cross-layer comparisons when layers have different filter statistics. COP scores filters by correlation to catch redundancy, normalizes those scores globally so they can be compared across layers, and adds regularization terms for parameter count and FLOPs so users can choose smaller or faster models. This directly targets the three problems listed in the abstract: independent importance scores, local definitions that force manual ratios, and ignoring the inequality between parameter reduction and compute reduction. The correlation approach and the dual regularization are the actual new pieces.

The normalization is the weak point. Early layers and deep layers in typical CNNs like VGG or ResNet have systematically different weight scales and activation patterns. Applying one global normalization can give genuinely redundant filters in one layer low scores relative to those in another layer purely because of distribution shift. Since the final importance combines the normalized correlation with the two regularization terms, that mismatch carries through to the customization. The abstract offers no math or test showing that the normalization is robust to these differences. The experiments are said to beat other methods, but without any numbers or setup details it is impossible to tell how much of the gain comes from the new components.

This work is for researchers focused on practical CNN compression who need tunable trade-offs. Someone in that area could get something from the regularization idea. I would put it through peer review to verify the experiments and to check whether the normalization actually works as claimed.
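
To make the scale concern concrete, the sketch below compares a global ranking on raw scores with one taken after rescaling each layer. Dividing by the per-layer maximum is an assumed stand-in chosen for illustration, not necessarily the normalization COP uses, and the layer names and score values are hypothetical.

```python
def max_normalize(scores):
    """Rescale one layer's scores so that its largest score becomes 1.0."""
    peak = max(scores)
    return [s / peak for s in scores] if peak > 0 else list(scores)


def lowest(scores_by_layer):
    """Return (layer, index) of the globally lowest-scoring filter."""
    candidates = (
        (layer, idx)
        for layer, scores in scores_by_layer.items()
        for idx in range(len(scores))
    )
    return min(candidates, key=lambda item: scores_by_layer[item[0]][item[1]])


# Hypothetical raw scores: the wide layer's values are inflated by its scale.
raw = {
    "narrow_layer": [0.8, 1.6, 2.0],
    "wide_layer": [12.0, 30.0, 45.0, 60.0],
}
normalized = {name: max_normalize(scores) for name, scores in raw.items()}

first_pruned_raw = lowest(raw)                # ('narrow_layer', 0)
first_pruned_normalized = lowest(normalized)  # ('wide_layer', 0)
```

On raw scores every filter in the narrow layer ranks below every filter in the wide layer, so a global threshold would strip the narrow layer first. After rescaling, the order changes, which is why the choice of normalization matters for the cross-layer claim.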

Referee Report

3 major / 0 minor

Summary. The manuscript proposes COP, a correlation-based filter pruning algorithm for CNN compression. It identifies three limitations in prior work (independent importance scoring that ignores redundancy, lack of cross-layer comparability requiring manual per-layer ratios, and neglect of the unequal impact of parameter vs. computational-cost reduction) and claims to solve them via correlation-based redundancy detection, global normalization of importance scores, and the addition of parameter-quantity and computational-cost regularization terms that allow users to customize the size/speed trade-off. Experiments are said to show significant outperformance, and code is released.

Significance. If the central claims hold, the work would provide a practical method for global, customizable filter pruning that directly addresses redundancy and the param/FLOP asymmetry. The explicit release of code at the cited GitHub repository is a clear strength for reproducibility.

major comments (3)

[Abstract] The claim that global normalization enables reliable cross-layer filter comparison is load-bearing for the entire cross-layer contribution, yet no derivation, invariance proof, or ablation is referenced showing that the normalized correlation scores remain comparable when filter statistics differ systematically by depth or channel count (common in VGG/ResNet).
[Abstract] The two regularization terms are added to the importance score to enable customization, but the manuscript supplies no analysis of whether their weights constitute new free hyperparameters that must be tuned per model or per preference, which directly affects the claim that the method avoids new layer-wise tuning needs.
[Abstract] The statement that 'extensive experiments have shown COP outperforms the others significantly' is presented without naming datasets, baselines, controls, or metrics, preventing assessment of whether post-hoc tuning or missing ablations undermine the outperformance claim.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment point by point below, indicating planned changes to the manuscript where appropriate.

Referee: [Abstract] The claim that global normalization enables reliable cross-layer filter comparison is load-bearing for the entire cross-layer contribution, yet no derivation, invariance proof, or ablation is referenced showing that the normalized correlation scores remain comparable when filter statistics differ systematically by depth or channel count (common in VGG/ResNet).

Authors: We acknowledge that the abstract does not reference supporting analysis for cross-layer comparability of the normalized scores. The manuscript describes the global normalization in the method section, but to strengthen the claim we will add an ablation study and brief invariance discussion in the revised version, and update the abstract to reference this material. Revision: yes.

Referee: [Abstract] The two regularization terms are added to the importance score to enable customization, but the manuscript supplies no analysis of whether their weights constitute new free hyperparameters that must be tuned per model or per preference, which directly affects the claim that the method avoids new layer-wise tuning needs.

Authors: The regularization weights are global hyperparameters controlling the overall parameter/FLOP trade-off rather than per-layer ratios. This still eliminates the need for manual layer-wise pruning ratios. We will add sensitivity analysis for these weights in the revision and clarify the distinction in the abstract. Revision: partial.

Referee: [Abstract] The statement that 'extensive experiments have shown COP outperforms the others significantly' is presented without naming datasets, baselines, controls, or metrics, preventing assessment of whether post-hoc tuning or missing ablations undermine the outperformance claim.

Authors: The full manuscript provides the experimental details (datasets, baselines, metrics) in Sections 4-5. We will revise the abstract to briefly name the key datasets and metrics for improved clarity. Revision: yes.

Circularity Check

0 steps flagged

No circularity: COP defines a new importance score constructively without reduction to inputs or self-citation chains

The paper presents COP as a novel procedure that computes filter importance from correlation (with global normalization for cross-layer use) plus explicit regularization terms for parameter count and FLOPs. No equation or step reduces a claimed 'prediction' to a fitted quantity defined from the same data by construction. No load-bearing self-citation, uniqueness theorem, or ansatz smuggling is described. The central claim is an algorithmic definition, not a derivation that collapses to its inputs. This is the common honest case of a self-contained proposal.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

Abstract-only review; the method introduces two regularization coefficients whose values are not specified and must be chosen or tuned by the user.

free parameters (1)

regularization weights for parameter quantity and computational cost
Two scalar coefficients that control the relative strength of the two regularization terms; their values determine the customization behavior.
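
A minimal sketch of how two such coefficients could steer the ranking is given below. The additive form `base - alpha * param_cost - beta * flop_cost`, the names `alpha` and `beta`, and the two example filters are assumptions made for illustration; the paper's actual regularization terms may be defined differently.

```python
def regularized_importance(base, param_cost, flop_cost, alpha, beta):
    """Hypothetical additive form; all three inputs are assumed pre-scaled to [0, 1]."""
    return base - alpha * param_cost - beta * flop_cost


# Two filters with equal base importance but opposite cost profiles (made-up values).
filters = {
    "early_layer_filter": {"base": 0.50, "param_cost": 0.1, "flop_cost": 0.9},
    "late_layer_filter": {"base": 0.50, "param_cost": 0.9, "flop_cost": 0.1},
}


def first_to_prune(alpha, beta):
    """Name of the filter with the lowest regularized importance."""
    return min(
        filters,
        key=lambda name: regularized_importance(**filters[name], alpha=alpha, beta=beta),
    )


prefer_smaller = first_to_prune(alpha=0.5, beta=0.0)  # 'late_layer_filter'
prefer_faster = first_to_prune(alpha=0.0, beta=0.5)   # 'early_layer_filter'
```

With only the parameter term active, the parameter-heavy late filter goes first; with only the FLOPs term active, the compute-heavy early filter goes first. Both coefficients are global, so no per-layer ratio is introduced.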

pith-pipeline@v0.9.0 · 5765 in / 1023 out tokens · 28288 ms · 2026-05-25T17:03:48.401662+00:00 · methodology

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 6 internal anchors

[1]

Predicting parameters in deep learning

[Denil et al., 2013] Misha Denil, Babak Shakibi, Laurent Dinh, Nando De Freitas, et al. Predicting parameters in deep learning. In Advances in Neural Information Processing Systems, pages 2148–2156, 2013.

work page 2013
[2]

Learning to prune deep neural networks via layer-wise optimal brain surgeon

[Dong et al., 2017] Xin Dong, Shangyu Chen, and Sinno Pan. Learning to prune deep neural networks via layer-wise optimal brain surgeon. In Advances in Neural Information Processing Systems, pages 4857–4867, 2017.

work page 2017
[3]

Dynamic network surgery for efficient dnns

[Guo et al., 2016] Yiwen Guo, Anbang Yao, and Yurong Chen. Dynamic network surgery for efficient dnns. In Advances in Neural Information Processing Systems, pages 1379–1387, 2016.

work page 2016
[4]

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

[Han et al., 2015] Song Han, Huizi Mao, and William J Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149, 2015.

work page internal anchor Pith review Pith/arXiv arXiv 2015
[5]

Identity mappings in deep residual networks

[He et al., 2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision, pages 630–645. Springer, 2016.

work page 2016
[6]

Channel pruning for accelerating very deep neural networks

[He et al., 2017] Yihui He, Xiangyu Zhang, and Jian Sun. Channel pruning for accelerating very deep neural networks. In International Conference on Computer Vision, volume 2, 2017.

work page 2017
[7]

Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration

[He et al., 2018b] Yang He, Ping Liu, Ziwei Wang, and Yi Yang. Pruning filter via geometric median for deep convolutional neural networks acceleration. arXiv preprint arXiv:1811.00250, 2018.

work page internal anchor Pith review Pith/arXiv arXiv
[8]

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

[Howard et al., 2017] Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.

work page internal anchor Pith review Pith/arXiv arXiv 2017
[9]

Efficient dnn neuron pruning by minimizing layer-wise nonlinear reconstruction error

[Jiang et al., 2018] Chunhui Jiang, Guiying Li, Chao Qian, and Ke Tang. Efficient dnn neuron pruning by minimizing layer-wise nonlinear reconstruction error. In International Joint Conference on Artificial Intelligence, volume 2018, pages 2–2, 2018.

work page 2018
[10]

Learning multiple layers of features from tiny images

[Krizhevsky and Hinton, 2009] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.

work page 2009
[11]

Pruning Filters for Efficient ConvNets

[Li et al., 2016] Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, and Hans Peter Graf. Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710, 2016.

work page internal anchor Pith review Pith/arXiv arXiv 2016
[12]

Learning efficient convolutional networks through network slimming

[Liu et al., 2017] Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, and Changshui Zhang. Learning efficient convolutional networks through network slimming. In IEEE International Conference on Computer Vision, pages 2755–2763. IEEE, 2017.

work page 2017
[13]

Thinet: A filter level pruning method for deep neural network compression

[Luo et al., 2017] Jian-Hao Luo, Jianxin Wu, and Weiyao Lin. Thinet: A filter level pruning method for deep neural network compression. In IEEE International Conference on Computer Vision, pages 5068–5076. IEEE, 2017.

work page 2017
[14]

Pruning Convolutional Neural Networks for Resource Efficient Inference

[Molchanov et al., 2016] Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, and Jan Kautz. Pruning convolutional neural networks for resource efficient inference. arXiv preprint arXiv:1611.06440, 2016.

work page internal anchor Pith review Pith/arXiv arXiv 2016
[15]

Imagenet large scale visual recognition challenge

[Russakovsky et al., 2015] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.

work page 2015
[16]

Very Deep Convolutional Networks for Large-Scale Image Recognition

[Simonyan and Zisserman, 2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

work page internal anchor Pith review Pith/arXiv arXiv 2014
[17]

Network compression using correlation analysis of layer responses

[Suau et al., 2018] Xavier Suau, Luca Zappella, and Nicholas Apostoloff. Network compression using correlation analysis of layer responses. arXiv preprint arXiv:1807.10585, 2018.

work page arXiv 2018
[18]

Nisp: Pruning networks using neuron importance score propagation

[Yu et al., 2018] Ruichi Yu, Ang Li, Chun-Fu Chen, Jui-Hsin Lai, Vlad I Morariu, Xintong Han, Mingfei Gao, Ching-Yung Lin, and Larry S Davis. Nisp: Pruning networks using neuron importance score propagation. In IEEE Conference on Computer Vision and Pattern Recognition, pages 9194–9203, 2018.

work page 2018
