DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients

Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, Yuheng Zou

Classification: cs.NE, cs.LG

Keywords: bitwidth, DoReFa-Net, gradients, activations, convolutional, neural, training, weights

We propose DoReFa-Net, a method to train convolutional neural networks that have low bitwidth weights and activations using low bitwidth parameter gradients. In particular, during the backward pass, parameter gradients are stochastically quantized to low bitwidth numbers before being propagated to convolutional layers. As convolutions during the forward and backward passes can now operate on low bitwidth weights and activations/gradients, respectively, DoReFa-Net can use bit convolution kernels to accelerate both training and inference. Moreover, as bit convolutions can be efficiently implemented on CPU, FPGA, ASIC and GPU, DoReFa-Net opens the way to accelerating the training of low bitwidth neural networks on such hardware. Our experiments on the SVHN and ImageNet datasets show that DoReFa-Net can achieve prediction accuracy comparable to its 32-bit counterparts. For example, a DoReFa-Net derived from AlexNet that has 1-bit weights and 2-bit activations can be trained from scratch using 6-bit gradients to reach 46.1% top-1 accuracy on the ImageNet validation set. The DoReFa-Net AlexNet model is released publicly.
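The abstract names the three quantized quantities (weights, activations, gradients) and the bit convolution kernels, but does not give the formulas. The sketch below is a minimal NumPy illustration that assumes the quantizer definitions of the full paper, which are not reproduced on this page: a k-bit quantizer `quantize_k` on [0, 1], tanh-normalized weights (sign times mean absolute value for 1 bit), activations clipped to [0, 1], and gradients quantized with uniform noise so that rounding is unbiased. It also shows the bit-plane identity that lets a bit convolution compute an integer dot product with only AND and popcount. All function names are hypothetical; the maximum is taken over the whole tensor and a single weight scale is used, both simplifying assumptions, and only the forward value of each quantizer is shown (the straight-through gradient is omitted).

```python
import numpy as np


def quantize_k(r, k):
    """Round r in [0, 1] to the nearest of 2**k evenly spaced levels in [0, 1]."""
    n = 2 ** k - 1
    return np.round(r * n) / n


def quantize_weights(w, k):
    """k-bit weights in [-1, 1] (assumed DoReFa-style weight quantizer)."""
    if k == 1:
        # 1-bit case: sign, scaled by one mean absolute value for the whole tensor
        # (a simplifying assumption; np.sign(0) == 0 is left as is).
        return np.sign(w) * np.mean(np.abs(w))
    t = np.tanh(w)
    # Map tanh(w) into [0, 1], quantize, then map back to [-1, 1].
    return 2 * quantize_k(t / (2 * np.max(np.abs(t))) + 0.5, k) - 1


def quantize_activations(x, k):
    """k-bit activations: clip to [0, 1], then quantize."""
    return quantize_k(np.clip(x, 0.0, 1.0), k)


def quantize_gradients(g, k, rng):
    """Stochastic k-bit gradient quantization (unbiased in expectation)."""
    scale = 2 * np.max(np.abs(g))
    if scale == 0:
        return np.zeros_like(g)
    n = 2 ** k - 1
    # Uniform noise of half a quantization step turns rounding into
    # stochastic rounding, so E[result] == g.
    noise = rng.uniform(-0.5, 0.5, size=g.shape) / n
    # Clipping does not change the rounded value; it only guards the edges.
    r = np.clip(g / scale + 0.5 + noise, 0.0, 1.0)
    return scale * (quantize_k(r, k) - 0.5)


def pack_bits(bits):
    """Pack a sequence of 0/1 values into one Python int (element j -> bit j)."""
    word = 0
    for j, b in enumerate(bits):
        word |= int(b) << j
    return word


def bit_dot(x, y, m_bits, k_bits):
    """Dot product of unsigned m-bit and k-bit integer vectors using AND + popcount.

    x . y = sum_i sum_j 2**(i + j) * popcount(bitplane_i(x) AND bitplane_j(y))
    """
    total = 0
    for i in range(m_bits):
        xi = pack_bits([(v >> i) & 1 for v in x])
        for j in range(k_bits):
            yj = pack_bits([(v >> j) & 1 for v in y])
            total += (1 << (i + j)) * bin(xi & yj).count("1")
    return total
```

For example, `bit_dot([3, 1, 2], [1, 2, 3], 2, 2)` evaluates 3·1 + 1·2 + 2·3 from four AND/popcount passes over 2-bit planes. The cost grows with the product of the two bitwidths, which is one reason low bitwidths for both operands matter for speed.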

Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Training single-electron and single-photon stochastic physical neural networks
quant-ph 2026-04 unverdicted novelty 7.0

Single-electron and single-photon stochastic physical neural networks achieve over 97% MNIST test accuracy when trained with empirical outputs in the backward pass using few trials per layer.
Mixed Precision Training
cs.AI 2017-10 accept novelty 7.0

Mixed precision training uses FP16 for most computations, FP32 master weights for accumulation, and loss scaling to enable accurate training of large DNNs with halved memory usage.
SURGE: Surrogate Gradient Adaptation in Binary Neural Networks
cs.LG 2026-05 unverdicted novelty 6.0

SURGE proposes a dual-path gradient compensator and adaptive scaler to learn better surrogate gradients for binary neural network training, outperforming prior methods on classification, detection, and language tasks.
DiBA: Diagonal and Binary Matrix Approximation for Neural Network Weight Compression
cs.LG 2026-05 unverdicted novelty 6.0

DiBA factors weight matrices into diagonal-binary-diagonal-binary-diagonal form to cut matrix-vector multiplies from mn to m+k+n operations and improves accuracy on DistilBERT and audio transformer tasks after replacement.
Multibit neural inference in a N-ary crossbar architecture
cs.AR 2026-04 unverdicted novelty 5.0

Simulation of 4-state MTJ crossbars achieves 94.48% MNIST accuracy for neural inference, close to the 97.56% software baseline, with analysis showing quantization as the primary source of error and an optimal number of states per cell.
Design and Implementation of BNN-Based Object Detection on FPGA
cs.AR 2026-05 unverdicted novelty 4.0

A BNN-based YOLOv3-tiny-like object detector with 1-bit weights and 8-bit activations is implemented in Verilog on FPGA, achieving 39.6% mAP50 on VOC and 0.999964 correlation with the ONNX model in RTL simulation.
Carbon-Taxed Transformers: A Green Compression Pipeline for Overgrown Language Models
cs.SE 2026-04 unverdicted novelty 4.0

CTT is a compression pipeline for LLMs that achieves up to 49x memory reduction, 10x faster inference, 81% lower CO2 emissions, and retains 68-98% accuracy on code clone detection, summarization, and generation tasks.
Prune-Quantize-Distill: An Ordered Pipeline for Efficient Neural Network Compression
cs.LG 2026-04 unverdicted novelty 4.0

The prune-quantize-distill ordering produces a better accuracy-size-latency frontier on CIFAR-10/100 than any single technique or other orderings, with INT8 QAT providing the main runtime gain.
Design and Implementation of BNN-Based Object Detection on FPGA
cs.AR 2026-05 unverdicted novelty 3.0

A BNN-based YOLOv3-tiny object detector is implemented on FPGA, achieving 39.6% mAP50 on the VOC dataset with 0.098 GFLOPs and a near-exact match to the ONNX model in RTL simulation.