DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients

2016 · arXiv 1606.06160

8 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

Training single-electron and single-photon stochastic physical neural networks

quant-ph · 2026-04-12 · unverdicted · novelty 7.0

Single-electron and single-photon stochastic physical neural networks achieve over 97% MNIST test accuracy when trained with empirical outputs in the backward pass using few trials per layer.

Mixed Precision Training

cs.AI · 2017-10-10 · accept · novelty 7.0

Mixed precision training uses FP16 for most computations, FP32 master weights for accumulation, and loss scaling to enable accurate training of large DNNs with halved memory usage.

SURGE: Surrogate Gradient Adaptation in Binary Neural Networks

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

SURGE proposes a dual-path gradient compensator and adaptive scaler to learn better surrogate gradients for binary neural network training, outperforming prior methods on classification, detection, and language tasks.

DiBA: Diagonal and Binary Matrix Approximation for Neural Network Weight Compression

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

DiBA factors weight matrices into diagonal-binary-diagonal-binary-diagonal form to cut matrix-vector multiplies from mn to m+k+n operations and improves accuracy on DistilBERT and audio transformer tasks after replacement.

Multibit neural inference in a N-ary crossbar architecture

cs.AR · 2026-04-28 · unverdicted · novelty 5.0

Simulation of 4-state MTJ crossbars achieves 94.48% MNIST accuracy for neural inference, close to the 97.56% software baseline; the analysis identifies quantization as the primary source of error and finds an optimal number of states per cell.

Design and Implementation of BNN-Based Object Detection on FPGA

cs.AR · 2026-05-05 · unverdicted · novelty 4.0 · 2 refs

A BNN-based YOLOv3-tiny-like object detector with 1-bit weights and 8-bit activations is implemented in Verilog on FPGA, achieving 39.6% mAP50 on VOC and 0.999964 correlation with the ONNX model in RTL simulation.

Carbon-Taxed Transformers: A Green Compression Pipeline for Overgrown Language Models

cs.SE · 2026-04-28 · unverdicted · novelty 4.0

CTT is a compression pipeline for LLMs that achieves up to 49x memory reduction, 10x faster inference, 81% lower CO2 emissions, and retains 68-98% accuracy on code clone detection, summarization, and generation tasks.

Prune-Quantize-Distill: An Ordered Pipeline for Efficient Neural Network Compression

cs.LG · 2026-04-05 · unverdicted · novelty 4.0

The prune-quantize-distill ordering produces a better accuracy-size-latency frontier on CIFAR-10/100 than any single technique or other orderings, with INT8 QAT providing the main runtime gain.

citing papers explorer

Showing 8 of 8 citing papers.

Training single-electron and single-photon stochastic physical neural networks · quant-ph · 2026-04-12 · unverdicted · none · ref 39
Mixed Precision Training · cs.AI · 2017-10-10 · accept · none · ref 34
SURGE: Surrogate Gradient Adaptation in Binary Neural Networks · cs.LG · 2026-05-09 · unverdicted · none · ref 20
DiBA: Diagonal and Binary Matrix Approximation for Neural Network Weight Compression · cs.LG · 2026-05-07 · unverdicted · none · ref 16
Multibit neural inference in a N-ary crossbar architecture · cs.AR · 2026-04-28 · unverdicted · none · ref 18
Design and Implementation of BNN-Based Object Detection on FPGA · cs.AR · 2026-05-05 · unverdicted · none · ref 9 · 2 links
Carbon-Taxed Transformers: A Green Compression Pipeline for Overgrown Language Models · cs.SE · 2026-04-28 · unverdicted · none · ref 73
Prune-Quantize-Distill: An Ordered Pipeline for Efficient Neural Network Compression · cs.LG · 2026-04-05 · unverdicted · none · ref 14
