pith. machine review for the scientific record.

arxiv: 1805.06085 · v2 · submitted 2018-05-16 · 💻 cs.CV · cs.AI

Recognition: unknown

PACT: Parameterized Clipping Activation for Quantized Neural Networks

Jungwook Choi, Kailash Gopalakrishnan, Pierce I-Jen Chuang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Zhuo Wang

classification 💻 cs.CV cs.AI
keywords activations, accuracy, activation, quantization, clipping, networks, pact, precision

Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost. To address this cost, a number of quantization schemes have been proposed, but most of these techniques focus on quantizing weights, which are smaller in size than activations. This paper proposes a novel quantization scheme for activations during training that enables neural networks to work well with ultra-low-precision weights and activations without any significant accuracy degradation. This technique, PArameterized Clipping acTivation (PACT), uses an activation clipping parameter $\alpha$ that is optimized during training to find the right quantization scale. PACT allows quantizing activations to arbitrary bit precisions, while achieving much better accuracy relative to published state-of-the-art quantization schemes. We show, for the first time, that both weights and activations can be quantized to 4 bits of precision while still achieving accuracy comparable to full-precision networks across a range of popular models and datasets. We also show that exploiting these reduced-precision computational units in hardware can enable a super-linear improvement in inference performance, due to a significant reduction in the area of accelerator compute engines coupled with the ability to retain the quantized model and activation data in on-chip memories.
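
The abstract does not spell out the clipping function, so the following is only a minimal sketch of the idea it describes. It assumes the activation is clipped to the range [0, α] and then rounded to one of 2^k uniform levels, so that α sets the quantization scale. It also assumes a straight-through estimator, under which the gradient with respect to α is 1 where the input is clipped and 0 elsewhere. The function names `pact_quantize` and `pact_grad_alpha` are illustrative and do not come from this page.

```python
def pact_quantize(x, alpha, bits):
    """Clip x to [0, alpha], then round to a uniform grid of 2**bits levels.

    Assumption: this mirrors the clipping-then-quantizing idea in the
    abstract; it is not taken verbatim from the page.
    """
    steps = 2 ** bits - 1                 # number of intervals between levels
    clipped = min(max(x, 0.0), alpha)     # bounded activation, upper bound alpha
    scale = alpha / steps                 # quantization step determined by alpha
    return round(clipped / scale) * scale


def pact_grad_alpha(x, alpha):
    """Assumed straight-through gradient of the quantized output w.r.t. alpha.

    Only inputs at or above the clipping level contribute to the update of
    alpha; inputs below it contribute nothing.
    """
    return 1.0 if x >= alpha else 0.0


if __name__ == "__main__":
    # 2-bit example with alpha = 6.0: the representable levels are 0, 2, 4, 6.
    for value in (-1.0, 2.9, 3.2, 7.5):
        print(value, pact_quantize(value, 6.0, 2), pact_grad_alpha(value, 6.0))
```

In this sketch a larger α widens the representable range but coarsens the step size, which is the trade-off that training α is meant to resolve.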

This paper has not been read by Pith yet.


Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. OSAQ: Outlier Self-Absorption for Accurate Low-bit LLM Quantization

    cs.LG 2026-05 unverdicted novelty 6.0

    OSAQ uses the low-rank structure of the Hessian to construct a closed-form additive weight transformation that suppresses outliers without changing task loss, enabling better low-bit LLM quantization.

  2. OSAQ: Outlier Self-Absorption for Accurate Low-bit LLM Quantization

    cs.LG 2026-05 unverdicted novelty 6.0

    OSAQ suppresses weight outliers in LLMs via a closed-form additive transformation from the Hessian's stable null space, improving 2-bit quantization perplexity by over 40% versus vanilla GPTQ with no inference overhead.

  3. STRIDe: Cross-Coupled STT-MRAM Enabling Robust In-Memory-Computing for Deep Neural Network Accelerators

    cs.ET 2026-04 unverdicted novelty 6.0

    STRIDe cross-coupled STT-MRAM improves sense margin up to 3.86x and read disturb margin up to 27.6% for XNOR and AND IMC, achieving near-software DNN inference accuracy on CIFAR10 despite process variations.

  4. End-to-end Automated Deep Neural Network Optimization for PPG-based Blood Pressure Estimation on Wearables

    cs.LG 2026-04 unverdicted novelty 5.0

    An end-to-end hardware-aware optimization pipeline produces DNNs for PPG-based blood pressure estimation with up to 7.99% lower error and 83x fewer parameters that fit on ultra-low-power SoCs like GAP8.

  5. Deployment-Aligned Low-Precision Neural Architecture Search for Spaceborne Edge AI

    cs.CV 2026-04 unverdicted novelty 4.0

    Deployment-aligned low-precision NAS recovers about two-thirds of the accuracy drop from post-training quantization, achieving 0.826 mIoU on-device for a 95k-parameter model on Intel Movidius Myriad X without added co...

  6. Network Edge Inference for Large Language Models: Principles, Techniques, and Opportunities

    cs.DC 2026-04 unverdicted novelty 3.0

    A survey synthesizing challenges, system architectures, model optimizations, deployment methods, and resource management techniques for large language model inference at the network edge.