Quartet II: Accurate LLM pre-training in NVFP4 by improved unbiased gradient estimation
4 papers cite this work (cs.LG, 2026); representative citing papers are listed below.
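The title's "unbiased gradient estimation" refers to keeping low-precision gradient estimates unbiased in expectation. As background only, not the paper's method: a minimal sketch of the standard stochastic-rounding identity E[Q(x)] = x on a uniform grid, where the `step` grid is an illustrative stand-in for NVFP4.

```python
import numpy as np

def stochastic_round(x, step, rng):
    """Round x to multiples of `step` so that E[round(x)] == x exactly."""
    lo = np.floor(x / step)
    p = x / step - lo                            # probability of rounding up
    return (lo + (rng.random(x.shape) < p)) * step

rng = np.random.default_rng(0)
g = np.array([0.03, -0.41, 0.27])                # toy "gradient"
est = np.mean([stochastic_round(g, 0.25, rng) for _ in range(100_000)], axis=0)
print(est)  # close to g: the quantized estimate is unbiased in expectation
```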
Citing papers explorer
- Grid Games: The Power of Multiple Grids for Quantizing Large Language Models
  Allowing each quantization group to select among multiple 4-bit grids improves accuracy over single-grid FP4 for both post-training and pre-training of LLMs (first sketch below).
- Quantizing With Randomized Hadamard Transforms: Efficient Heuristic Now Proven
  Two randomized Hadamard transforms suffice to make coordinate marginals O(d^{-1/2})-close to Gaussian for most quantization methods, with three needed for vector quantization to match uniform random rotations asymptotically (second sketch below).
- Normalized Architectures are Natively 4-Bit
  nGPT's hypersphere constraint makes dot-product signal accumulate constructively under 4-bit quantization while noise averages out, enabling native low-precision training (third sketch below).
- HiFloat4 Format for Language Model Pre-training on Ascend NPUs
  HiFloat4, a 4-bit floating-point format, combined with stabilization techniques trains dense and MoE language models on Ascend NPUs to within 1% relative error of full-precision baselines (fourth sketch below).
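First sketch, on the multi-grid idea: choose, per quantization group, whichever 4-bit grid minimizes squared error. The grids and group size below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

# Illustrative 4-bit grids (assumed, not the paper's exact ones):
pos = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])   # FP4 E2M1-style magnitudes
FP4  = np.concatenate([-pos[:0:-1], pos])                   # 15 distinct levels
INT4 = np.arange(-7.0, 8.0)                                 # symmetric integer grid

def quantize_to_grid(x, grid):
    """Scale the group onto the grid's range, then snap to the nearest grid point."""
    scale = max(np.max(np.abs(x)) / np.max(np.abs(grid)), 1e-12)
    return grid[np.abs(x[:, None] / scale - grid[None, :]).argmin(axis=1)] * scale

def multi_grid_quantize(x, grids=(FP4, INT4), group_size=32):
    """Per group, keep whichever grid gives the lowest squared error."""
    out = np.empty_like(x)
    for i in range(0, len(x), group_size):
        group = x[i:i + group_size]
        cands = [quantize_to_grid(group, g) for g in grids]
        out[i:i + group_size] = min(cands, key=lambda c: np.sum((c - group) ** 2))
    return out

x = np.random.default_rng(0).standard_normal(4096)
print(np.mean((multi_grid_quantize(x) - x) ** 2))  # compare against a single fixed grid
```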
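Second sketch, on the Hadamard result: one randomized Hadamard transform is random sign flips followed by a normalized Hadamard transform; composing two of them is the construction the stated O(d^{-1/2}) bound concerns. The exponential test input is an arbitrary non-Gaussian example.

```python
import numpy as np

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform; len(x) must be a power of 2."""
    x = x.copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x

def rht(x, rng):
    """One randomized Hadamard transform: random signs, then H / sqrt(d)."""
    return fwht(x * rng.choice([-1.0, 1.0], size=len(x))) / np.sqrt(len(x))

rng = np.random.default_rng(0)
x = rng.standard_exponential(1024)   # skewed, clearly non-Gaussian input
y = rht(rht(x, rng), rng)            # two RHTs: coordinate marginals ~ Gaussian
```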
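Third sketch, on the normalization claim: a small numeric illustration (my construction, not the paper's experiment) of why dot products between unit-norm vectors tolerate 4-bit coordinate quantization: per-coordinate errors are roughly independent and average out as dimension grows, so the relative error shrinks with d.

```python
import numpy as np

def quantize_4bit(x):
    """Symmetric 4-bit uniform quantizer (15 levels) scaled to the vector's max; illustrative."""
    scale = np.max(np.abs(x)) / 7.0
    return np.clip(np.round(x / scale), -7, 7) * scale

rng = np.random.default_rng(0)
for d in (64, 1024, 16384):
    u = rng.standard_normal(d); u /= np.linalg.norm(u)   # point on the hypersphere
    g = rng.standard_normal(d)
    v = u + 0.3 * g / np.linalg.norm(g)                  # correlated neighbor
    v /= np.linalg.norm(v)
    exact = u @ v
    noisy = quantize_4bit(u) @ quantize_4bit(v)
    print(f"d={d:6d}  exact={exact:+.4f}  |error|={abs(noisy - exact):.5f}")
```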
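Fourth sketch, on the HiFloat4 item: the exact HiFloat4 encoding is specified in that paper; as a generic stand-in, here is a plain FP4 (E2M1) encode/decode round trip with a per-tensor scale. The value table below is the common E2M1 grid, not HiFloat4's.

```python
import numpy as np

# Common FP4 E2M1 magnitudes (assumed stand-in; HiFloat4's table may differ).
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
GRID = np.concatenate([-E2M1[:0:-1], E2M1])          # 15 distinct values

def encode_fp4(x):
    """Per-tensor scale, then nearest-value lookup into the 4-bit grid."""
    scale = max(np.max(np.abs(x)) / GRID[-1], 1e-12)
    codes = np.abs(x[:, None] / scale - GRID[None, :]).argmin(axis=1)
    return codes.astype(np.uint8), scale

def decode_fp4(codes, scale):
    return GRID[codes] * scale

x = np.random.default_rng(0).standard_normal(8)
codes, s = encode_fp4(x)
print(x, decode_fp4(codes, s), sep="\n")             # low-precision round trip
```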