Allowing each quantization group to select among multiple 4-bit grids improves accuracy over single-grid FP4 for both post-training and pre-training of LLMs.
Lee, Kathryn Le, Junxian Guo, Giovanni Traverso, Anantha P
2 Pith papers cite this work.
Citation summary: fields: cs.LG (2); years: 2026 (2); roles: background (1); polarities: background (1).
Citing papers
- Grid Games: The Power of Multiple Grids for Quantizing Large Language Models
  Allowing each quantization group to select among multiple 4-bit grids improves accuracy over single-grid FP4 for both post-training and pre-training of LLMs.
- A Hardware-Aware, Per-Layer Methodology for Post-Training Quantization of Large Language Models
  SOP post-training quantization for LLMs reports lower weight reconstruction error than per-layer FP8 at 1.5 bpw lower cost using per-layer codebook search and hardware-aware formats.