Understanding and overcoming the challenges of efficient transformer quantization

Bondarenko, Y · 2021 · arXiv 2109.12948

4 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Massive Activations in Large Language Models

cs.CL · 2024-02-27 · unverdicted · novelty 7.0

Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

cs.LG · 2022-08-15 · conditional · novelty 7.0

LLM.int8() performs 8-bit inference for transformers up to 175B parameters with no accuracy loss by combining vector-wise quantization for most features with 16-bit mixed-precision handling of systematic outlier dimensions.
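The decomposition summarized above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the bitsandbytes implementation. The following are assumptions: the function names `absmax_quantize` and `int8_matmul_with_outliers`, the outlier threshold of 6.0, and the use of Python floats in place of fp16.

The sketch works in three steps:
- Feature dimensions of X that contain any value above the threshold are multiplied in full precision.
- The remaining dimensions use symmetric int8 quantization, with one scale per row of X and one per column of W (the vector-wise part).
- The two partial products are summed.

```python
# Hypothetical sketch of LLM.int8()-style mixed-precision decomposition.
# Assumptions: outlier threshold 6.0; symmetric absmax int8 quantization with
# row-wise scales for X and column-wise scales for W; Python floats stand in for fp16.

def absmax_quantize(values):
    """Symmetric int8 quantization of one vector; returns (int codes, scale)."""
    amax = max((abs(v) for v in values), default=0.0)
    scale = amax / 127.0 if amax > 0 else 1.0
    return [round(v / scale) for v in values], scale

def int8_matmul_with_outliers(X, W, threshold=6.0):
    """Approximate X @ W (X: n x k, W: k x m) with int8 for regular feature
    dimensions and full precision for outlier feature dimensions."""
    n, k, m = len(X), len(W), len(W[0])
    # A feature dimension j is an outlier if any |X[i][j]| exceeds the threshold.
    outliers = [j for j in range(k) if any(abs(X[i][j]) > threshold for i in range(n))]
    outlier_set = set(outliers)
    regular = [j for j in range(k) if j not in outlier_set]

    # Vector-wise quantization over the regular dimensions only:
    # one scale per row of X, one scale per column of W.
    xq = [absmax_quantize([X[i][j] for j in regular]) for i in range(n)]
    wq = [absmax_quantize([W[j][c] for j in regular]) for c in range(m)]

    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        xi, sx = xq[i]
        for c in range(m):
            wc, sw = wq[c]
            # Integer dot product, dequantized by the product of the two scales.
            acc = sum(a * b for a, b in zip(xi, wc))
            out[i][c] = acc * sx * sw
            # Outlier dimensions bypass quantization entirely.
            out[i][c] += sum(X[i][j] * W[j][c] for j in outliers)
    return out

X = [[1.0, 20.0, -0.5], [0.25, -8.0, 2.0]]
W = [[0.5, -1.0], [0.1, 0.2], [1.5, 0.3]]
print(int8_matmul_with_outliers(X, W))
```

With `threshold=6.0`, the second input feature (values 20.0 and -8.0) is routed through the full-precision path, so its contribution is exact. Only the small regular features pick up int8 rounding error.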

MUXQ: Mixed-to-Uniform Precision MatriX Quantization via Low-Rank Outlier Decomposition

cs.LG · 2026-04-06 · unverdicted · novelty 5.0

MUXQ uses low-rank outlier decomposition to redistribute activation outliers, allowing mixed-to-uniform INT8 quantization of LLMs with lower perplexity than naive methods on GPT-2 models.

$\text{Log}_\text{b}$Quant: Quantizing Language Models in Logarithmic Space

cs.CL · 2026-07-01 · unverdicted · novelty 4.0

Log_b Quant is an adjustable-base logarithmic quantization technique that outperforms tensor-wise asymmetric linear quantization at 4-bit precision on language model benchmarks while providing memory savings.
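To make the comparison concrete, the sketch below contrasts the tensor-wise asymmetric linear quantization baseline with a simple sign-magnitude quantizer on a base-b logarithmic grid. It is a hypothetical illustration only, not the Log_b Quant algorithm. The following are assumptions: the function names, the flush-to-zero rule for magnitudes below the smallest level, and the use of the tensor's absolute maximum as the top grid point. Both functions return dequantized values so their errors can be compared directly.

```python
import math

def asymmetric_linear_quantize(xs, bits=4):
    """Tensor-wise asymmetric uniform quantization; returns dequantized values."""
    lo, hi = min(xs), max(xs)
    levels = 2 ** bits - 1                      # integer codes 0..levels
    scale = (hi - lo) / levels if hi > lo else 1.0
    zero_point = round(-lo / scale)             # code that represents 0.0
    codes = [min(levels, max(0, round(x / scale) + zero_point)) for x in xs]
    return [(c - zero_point) * scale for c in codes]

def log_quantize(xs, bits=4, base=2.0):
    """Hypothetical sign-magnitude quantizer on a base-`base` logarithmic grid.

    One bit stores the sign; the remaining codes select either exact zero or a
    magnitude amax * base**(-k) for k = 0 .. n_mag - 1.
    """
    amax = max(abs(x) for x in xs)
    if amax == 0:
        return [0.0] * len(xs)
    n_mag = 2 ** (bits - 1) - 1                 # one magnitude code is reserved for zero
    out = []
    for x in xs:
        if x == 0:
            out.append(0.0)
            continue
        # Nearest exponent in log space; k >= 0 because |x| <= amax.
        k = round(-math.log(abs(x) / amax, base))
        if k > n_mag - 1:
            out.append(0.0)                     # below the smallest level: flush to zero
        else:
            out.append(math.copysign(amax * base ** (-k), x))
    return out

xs = [8.0, -2.0, 0.5, 0.125]
print(asymmetric_linear_quantize(xs))
print(log_quantize(xs))
```

On `[8.0, -2.0, 0.5, 0.125]`, the 4-bit linear grid has a step of 10/15 ≈ 0.67 and rounds 0.125 to 0.0. The base-2 logarithmic grid represents all four values exactly, because each is a power of two times the maximum.

This illustrates one common motivation for logarithmic grids: they are fine near zero and coarse near the maximum, which suits heavy-tailed distributions.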

citing papers explorer


Massive Activations in Large Language Models cs.CL · 2024-02-27 · unverdicted · none · ref 107
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale cs.LG · 2022-08-15 · conditional · none · ref 120
MUXQ: Mixed-to-Uniform Precision MatriX Quantization via Low-Rank Outlier Decomposition cs.LG · 2026-04-06 · unverdicted · none · ref 8
$\text{Log}_\text{b}$Quant: Quantizing Language Models in Logarithmic Space cs.CL · 2026-07-01 · unverdicted · none · ref 6
