LLM.int8(): 8-bit matrix multiplication for transformers at scale

Tim Dettmers, Mike Lewis, Younes Belkada, Luke Zettlemoyer · 2022

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Graph-Guided Adaptive Channel Elimination for KV Cache Compression

eess.SP · 2026-04-18 · unverdicted · novelty 6.0

GRACE reframes KV cache channel pruning as graph optimization to find a near-optimal subset, achieving 60% compression with negligible degradation and outperforming prior methods.

Quantizing Whisper-small: How design choices affect ASR performance

eess.AS · 2025-11-11 · unverdicted · novelty 4.0

Dynamic int8 quantization via Quanto on Whisper-small reduces size by 57% and improves WER on LibriSpeech test sets compared to the unquantized baseline.

citing papers explorer

Showing 2 of 2 citing papers.

Graph-Guided Adaptive Channel Elimination for KV Cache Compression · eess.SP · 2026-04-18 · unverdicted · none · ref 26
Quantizing Whisper-small: How design choices affect ASR performance · eess.AS · 2025-11-11 · unverdicted · none · ref 13
