pith. machine review for the scientific record.

arxiv: 1904.11486 · v2 · submitted 2019-04-25 · 💻 cs.CV · cs.LG

Recognition: unknown

Making Convolutional Networks Shift-Invariant Again

classification 💻 cs.CV cs.LG
keywords networks convolutional deep downsampling input max-pooling modern observe

Modern convolutional networks are not shift-invariant, as small input shifts or translations can cause drastic changes in the output. Commonly used downsampling methods, such as max-pooling, strided-convolution, and average-pooling, ignore the sampling theorem. The well-known signal processing fix is anti-aliasing by low-pass filtering before downsampling. However, simply inserting this module into deep networks degrades performance; as a result, it is seldom used today. We show that when integrated correctly, it is compatible with existing architectural components, such as max-pooling and strided-convolution. We observe increased accuracy in ImageNet classification, across several commonly used architectures, such as ResNet, DenseNet, and MobileNet, indicating effective regularization. Furthermore, we observe better generalization, in terms of stability and robustness to input corruptions. Our results demonstrate that this classical signal processing technique has been undeservingly overlooked in modern deep networks. Code and anti-aliased versions of popular networks are available at https://richzhang.github.io/antialiased-cnns/.
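
The idea in the abstract, low-pass filtering before downsampling, can be illustrated with a minimal 1D sketch. This is not the authors' implementation (their code is at the URL above). The function names, the circular padding, the toy signal, and the 3-tap binomial kernel are all assumptions chosen for illustration. The sketch splits stride-2 max-pooling into a dense (stride-1) max followed by subsampling, and then optionally inserts a blur between the two steps.

```python
import numpy as np


def dense_max(x):
    """Max over a window of 2, evaluated at every position (stride 1).

    Circular padding via np.roll keeps the toy example free of border effects.
    """
    return np.maximum(x, np.roll(x, -1))


def max_pool_stride2(x):
    """Plain max-pooling: dense max followed directly by subsampling by 2."""
    return dense_max(x)[::2]


def blur_pool_stride2(x, kernel=(1.0, 2.0, 1.0)):
    """Anti-aliased variant: dense max, low-pass blur, then subsample by 2.

    The 3-tap binomial kernel is an assumption made for this sketch.
    """
    k = np.asarray(kernel, dtype=float)
    k = k / k.sum()  # normalize so the blur preserves the mean
    d = dense_max(x)
    # Circular 3-tap convolution: neighbors at i-1, i, i+1.
    blurred = k[0] * np.roll(d, 1) + k[1] * d + k[2] * np.roll(d, -1)
    return blurred[::2]


signal = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)
shifted = np.roll(signal, 1)  # same signal, translated by one sample

# Largest change in the pooled output caused by the one-sample shift.
gap_max = np.abs(max_pool_stride2(signal) - max_pool_stride2(shifted)).max()
gap_blur = np.abs(blur_pool_stride2(signal) - blur_pool_stride2(shifted)).max()
print(gap_max, gap_blur)
```

On this toy signal, plain max-pooling yields [0, 1, 0, 1] for the original input and [1, 1, 1, 1] for the shifted one, while the blurred variant yields [0.5, 1, 0.5, 1] and [0.75, 0.75, 0.75, 0.75]. The one-sample shift still changes the output, but the largest deviation drops from 1.0 to 0.25. The blur reduces the sensitivity to shifts without eliminating it.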

This paper has not been read by Pith yet.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Parameter-Efficient Architectural Modifications for Translation-Invariant CNNs

    cs.CV 2026-04 unverdicted novelty 5.0

    Strategic insertion of Global Average Pooling layers in VGG-16 reduces trainable parameters by 98%, maintains 66.4% ImageNet Top-1 accuracy, doubles translation robustness, and yields superior Spearman correlations in...

  2. GeomPrompt: Geometric Prompt Learning for RGB-D Semantic Segmentation Under Missing and Degraded Depth

    cs.CV 2026-04 unverdicted novelty 5.0

    GeomPrompt learns a task-driven geometric prompt from RGB alone to substitute for missing or degraded depth in frozen RGB-D semantic segmentation models, yielding up to +6.1 mIoU gains on SUN RGB-D while being faster ...