PACT: Parameterized Clipping Activation for Quantized Neural Networks

Jungwook Choi, Kailash Gopalakrishnan, Pierce I-Jen Chuang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Zhuo Wang

classification cs.CV cs.AI

keywords activations, accuracy, activation, quantization, clipping, networks, pact, precision

Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost. To address this cost, a number of quantization schemes have been proposed, but most of these techniques focus on quantizing weights, which are relatively small in size compared to activations. This paper proposes a novel quantization scheme for activations during training that enables neural networks to work well with ultra-low-precision weights and activations without any significant accuracy degradation. This technique, PArameterized Clipping acTivation (PACT), uses an activation clipping parameter $\alpha$ that is optimized during training to find the right quantization scale. PACT allows quantizing activations to arbitrary bit precisions, while achieving much better accuracy relative to published state-of-the-art quantization schemes. We show, for the first time, that both weights and activations can be quantized to 4 bits of precision while still achieving accuracy comparable to full-precision networks across a range of popular models and datasets. We also show that exploiting these reduced-precision computational units in hardware can enable a super-linear improvement in inference performance due to a significant reduction in the area of accelerator compute engines coupled with the ability to retain the quantized model and activation data in on-chip memories.
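
The abstract describes the clipping parameter $\alpha$ only in words. As a rough illustration, the sketch below assumes the activation is clipped to the range [0, alpha] and then rounded to one of 2^k uniformly spaced levels, with a straight-through estimate used for the gradient with respect to alpha. The function names `pact_quantize` and `pact_grad_alpha` and the example values are hypothetical and do not come from the abstract; this is a scalar sketch under those assumptions, not the authors' implementation.

```python
def pact_quantize(x, alpha, bits):
    """Clip x to [0, alpha], then round to one of 2**bits uniform levels.

    Assumed form: the abstract does not spell out the formula.
    """
    if alpha <= 0 or bits < 1:
        raise ValueError("alpha must be positive and bits must be >= 1")
    levels = 2 ** bits - 1              # number of steps between 0 and alpha
    clipped = min(max(x, 0.0), alpha)   # clipping with upper bound alpha
    step = alpha / levels               # quantization scale set by alpha
    return round(clipped / step) * step


def pact_grad_alpha(x, alpha):
    """Straight-through estimate of d(output)/d(alpha).

    Assumption: the gradient is 1 where the input is clipped at alpha
    and 0 elsewhere, so only saturated activations move alpha.
    """
    return 1.0 if x >= alpha else 0.0


if __name__ == "__main__":
    alpha, bits = 6.0, 2  # hypothetical values; step = 6 / 3 = 2
    for x in (-1.0, 0.9, 2.5, 7.0):
        print(x, pact_quantize(x, alpha, bits), pact_grad_alpha(x, alpha))
```

In this sketch a larger alpha widens the representable range but coarsens the step size, which is the trade-off that optimizing alpha during training is meant to resolve.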

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

OSAQ: Outlier Self-Absorption for Accurate Low-bit LLM Quantization
cs.LG 2026-05 unverdicted novelty 6.0

OSAQ uses the low-rank structure of the Hessian to construct a closed-form additive weight transformation that suppresses outliers without changing task loss, enabling better low-bit LLM quantization.
OSAQ: Outlier Self-Absorption for Accurate Low-bit LLM Quantization
cs.LG 2026-05 unverdicted novelty 6.0

OSAQ suppresses weight outliers in LLMs via a closed-form additive transformation from the Hessian's stable null space, improving 2-bit quantization perplexity by over 40% versus vanilla GPTQ with no inference overhead.
STRIDe: Cross-Coupled STT-MRAM Enabling Robust In-Memory-Computing for Deep Neural Network Accelerators
cs.ET 2026-04 unverdicted novelty 6.0

STRIDe cross-coupled STT-MRAM improves sense margin up to 3.86x and read disturb margin up to 27.6% for XNOR and AND IMC, achieving near-software DNN inference accuracy on CIFAR10 despite process variations.
End-to-end Automated Deep Neural Network Optimization for PPG-based Blood Pressure Estimation on Wearables
cs.LG 2026-04 unverdicted novelty 5.0

An end-to-end hardware-aware optimization pipeline produces DNNs for PPG-based blood pressure estimation with up to 7.99% lower error and 83x fewer parameters that fit on ultra-low-power SoCs like GAP8.
Deployment-Aligned Low-Precision Neural Architecture Search for Spaceborne Edge AI
cs.CV 2026-04 unverdicted novelty 4.0

Deployment-aligned low-precision NAS recovers about two-thirds of the accuracy drop from post-training quantization, achieving 0.826 mIoU on-device for a 95k-parameter model on Intel Movidius Myriad X without added co...
Network Edge Inference for Large Language Models: Principles, Techniques, and Opportunities
cs.DC 2026-04 unverdicted novelty 3.0

A survey synthesizing challenges, system architectures, model optimizations, deployment methods, and resource management techniques for large language model inference at the network edge.