Sherry: Hardware-efficient 1.25-bit ternary quantization via fine-grained sparsification
2 Pith papers cite this work.
Citing papers by year: 2026 (2)
Representative citing papers
- DuQuant++: Fine-grained Rotation Enhances Microscaling FP4 Quantization
  DuQuant++ adapts outlier-aware fine-grained rotation to MXFP4 by matching block size to the 32-element microscaling group, enabling a single rotation that smooths distributions and achieves SOTA performance on LLaMA-3 with lower cost.
- Hy-MT2: A Family of Fast, Efficient and Powerful Multilingual Translation Models in the Wild