Integer quantization for deep learning inference: Principles and empirical evaluation

9 Pith papers cite this work. Polarity classification is still indexing.

Citing papers by year: 2026 (7), 2022 (2)

representative citing papers

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

cs.LG · 2022-08-15 · conditional · novelty 7.0

LLM.int8() performs 8-bit inference for transformers up to 175B parameters with no accuracy loss by combining vector-wise quantization for most features with 16-bit mixed-precision handling of systematic outlier dimensions.

FP8 Formats for Deep Learning

cs.LG · 2022-09-12 · unverdicted · novelty 6.0

FP8 formats E4M3 and E5M2 match 16-bit training accuracy on CNNs, RNNs, and Transformers up to 175B parameters without hyperparameter changes.

citing papers explorer

Showing 4 of 4 citing papers after filters.

  • Transformers Provably Learn to Internalize Chain-of-Thought · cs.LG · 2026-05-27 · unverdicted · none · ref 51

    L-layer transformers under Log-ICoT curriculum provably learn k-parity with poly(n) samples and log k stages, matching explicit CoT efficiency without inference overhead.

  • LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale · cs.LG · 2022-08-15 · conditional · none · ref 170

    LLM.int8() performs 8-bit inference for transformers up to 175B parameters with no accuracy loss by combining vector-wise quantization for most features with 16-bit mixed-precision handling of systematic outlier dimensions.

  • QuIDE: Mastering the Quantized Intelligence Trade-off via Active Optimization · cs.LG · 2026-05-05 · unverdicted · none · ref 6

    QuIDE defines the Intelligence Index I = (C × P) / log₂(T+1) as a unified score for the compression-accuracy-latency trade-off in quantized neural networks, with experiments showing task-dependent optimal bit widths.

  • FP8 Formats for Deep Learning · cs.LG · 2022-09-12 · unverdicted · none · ref 23

    FP8 formats E4M3 and E5M2 match 16-bit training accuracy on CNNs, RNNs, and Transformers up to 175B parameters without hyperparameter changes.