Slimmable Neural Networks

Jiahui Yu , Linjie Yang , Ning Xu , Jianchao Yang , Thomas Huang

classification 💻 cs.CV cs.AI

keywords networks, slimmable, different, models, network, neural, better, detection

We present a simple and general method to train a single neural network executable at different widths (number of channels in a layer), permitting instant and adaptive accuracy-efficiency trade-offs at runtime. Instead of training individual networks with different width configurations, we train a shared network with switchable batch normalization. At runtime, the network can adjust its width on the fly according to on-device benchmarks and resource constraints, rather than downloading and offloading different models. Our trained networks, named slimmable neural networks, achieve ImageNet classification accuracy similar to (and in many cases better than) that of individually trained MobileNet v1, MobileNet v2, ShuffleNet and ResNet-50 models at the corresponding widths. We also demonstrate better performance of slimmable models compared with individual ones across a wide range of applications including COCO bounding-box object detection, instance segmentation and person keypoint detection without tuning hyper-parameters. Lastly, we visualize and discuss the learned features of slimmable networks. Code and models are available at: https://github.com/JiahuiYu/slimmable_networks
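
The idea of one shared set of weights with switchable batch normalization can be illustrated with a small sketch. The code below is not the authors' implementation (see the repository linked above for that); it is a minimal pure-Python illustration in which the class `SlimmableLinear`, the helper `pick_width`, the width multipliers, and the multiply-add cost model are all hypothetical names and assumptions introduced here. For brevity it slims only the output channels of a single fully connected layer and uses fixed placeholder weights and statistics.

```python
import math


class SlimmableLinear:
    """Hypothetical layer whose active width can be switched at runtime."""

    def __init__(self, in_features, out_features,
                 width_mults=(0.25, 0.5, 0.75, 1.0)):
        self.in_features = in_features
        self.out_features = out_features
        self.width_mults = tuple(width_mults)
        # One shared weight matrix; narrower widths reuse its leading rows.
        # Placeholder values stand in for trained weights.
        self.weight = [[0.01 * (i + j + 1) for j in range(in_features)]
                       for i in range(out_features)]
        # Switchable batch normalization: a separate set of statistics and
        # affine parameters for every supported width (placeholder values).
        self.bn = {}
        for mult in self.width_mults:
            n = self.active_channels(mult)
            self.bn[mult] = {"mean": [0.0] * n, "var": [1.0] * n,
                             "gamma": [1.0] * n, "beta": [0.0] * n}
        self.width_mult = max(self.width_mults)

    def active_channels(self, mult):
        """Number of output channels used at a given width multiplier."""
        return max(1, int(self.out_features * mult))

    def set_width(self, mult):
        """Switch the layer to one of the supported widths."""
        if mult not in self.bn:
            raise ValueError(f"unsupported width multiplier: {mult}")
        self.width_mult = mult

    def forward(self, x, eps=1e-5):
        """Apply the active slice of weights, then that width's batch norm."""
        n = self.active_channels(self.width_mult)
        bn = self.bn[self.width_mult]
        out = []
        for c in range(n):
            z = sum(w * v for w, v in zip(self.weight[c], x))
            z_hat = (z - bn["mean"][c]) / math.sqrt(bn["var"][c] + eps)
            out.append(bn["gamma"][c] * z_hat + bn["beta"][c])
        return out


def pick_width(layer, budget):
    """Pick the largest width whose multiply-add count fits the budget.

    Falls back to the narrowest width when nothing fits. The cost model
    (active channels times input features) is an assumption of this sketch.
    """
    feasible = [m for m in layer.width_mults
                if layer.active_channels(m) * layer.in_features <= budget]
    return max(feasible) if feasible else min(layer.width_mults)


layer = SlimmableLinear(in_features=4, out_features=8)
layer.set_width(pick_width(layer, budget=16))
half_width_output = layer.forward([1.0, 2.0, 3.0, 4.0])
```

With a budget of 16 multiply-adds, the sketch selects the 0.5 width and produces four output channels from the same weights that serve the full-width layer. Keeping separate normalization statistics per width is the point of the switchable batch normalization described in the abstract.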

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

MatryoshkaLoRA: Learning Accurate Hierarchical Low-Rank Representations for LLM Fine-Tuning
cs.CL 2026-05 unverdicted novelty 7.0

MatryoshkaLoRA inserts a crafted diagonal matrix P into LoRA to learn accurate nested low-rank adapters that support dynamic rank selection with minimal performance drop.

Elastic Attention Cores for Scalable Vision Transformers
cs.CV 2026-05 unverdicted novelty 6.0

VECA learns effective visual representations using core-periphery attention where patches interact exclusively via a resolution-invariant set of learned core embeddings, achieving linear O(N) complexity while maintain...

Objective-Specific Privileged Bases via Full-Prefix Matryoshka Learning
cs.LG 2026-05 unverdicted novelty 6.0

Full-prefix Matryoshka Representation Learning recovers ordered principal directions in the linear case and yields consistent per-dimension task-aligned structure.

CADENCE: Context-Adaptive Depth Estimation for Navigation and Computational Efficiency
cs.RO 2026-04 unverdicted novelty 4.0

CADENCE dynamically adjusts a slimmable depth estimation network's computational load according to context, cutting energy expenditure by 75% and boosting navigation accuracy by 7.43% versus static baselines.