Quartet II: Accurate LLM pre-training in NVFP4 by improved unbiased gradient estimation
4 papers cite this work (cs.LG, 2026); representative citing papers are listed below.
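The title's "unbiased gradient estimation" refers to keeping low-precision gradient estimates unbiased in expectation. As background only, not the paper's method: a minimal sketch of the standard stochastic-rounding identity E[Q(x)] = x on a uniform grid, where the `step` grid is an illustrative stand-in for NVFP4.

```python
import numpy as np

def stochastic_round(x, step, rng):
    """Round x to multiples of `step` so that E[round(x)] == x exactly."""
    lo = np.floor(x / step)
    p = x / step - lo                            # probability of rounding up
    return (lo + (rng.random(x.shape) < p)) * step

rng = np.random.default_rng(0)
g = np.array([0.03, -0.41, 0.27])                # toy "gradient"
est = np.mean([stochastic_round(g, 0.25, rng) for _ in range(100_000)], axis=0)
print(est)  # close to g: the quantized estimate is unbiased in expectation
```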
Citing papers explorer
- Grid Games: The Power of Multiple Grids for Quantizing Large Language Models
  Allowing each quantization group to select among multiple 4-bit grids improves accuracy over single-grid FP4 for both post-training and pre-training of LLMs (first sketch below).
- Quantizing With Randomized Hadamard Transforms: Efficient Heuristic Now Proven
  Two randomized Hadamard transforms suffice to make coordinate marginals O(d^{-1/2})-close to Gaussian for most quantization methods, with three needed for vector quantization to match uniform random rotations asymptotically (second sketch below).
- Normalized Architectures are Natively 4-Bit
  nGPT's hypersphere constraint makes dot-product signal accumulate constructively under 4-bit quantization while noise averages out, enabling native low-precision training (third sketch below).
- HiFloat4 Format for Language Model Pre-training on Ascend NPUs
  HiFloat4, a 4-bit floating-point format, combined with stabilization techniques trains dense and MoE language models on Ascend NPUs to within 1% relative error of full-precision baselines (fourth sketch below).
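First sketch, on the multi-grid idea: choose, per quantization group, whichever 4-bit grid minimizes squared error. The grids and group size below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

# Illustrative 4-bit grids (assumed, not the paper's exact ones):
pos = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])   # FP4 E2M1-style magnitudes
FP4  = np.concatenate([-pos[:0:-1], pos])                   # 15 distinct levels
INT4 = np.arange(-7.0, 8.0)                                 # symmetric integer grid

def quantize_to_grid(x, grid):
    """Scale the group onto the grid's range, then snap to the nearest grid point."""
    scale = max(np.max(np.abs(x)) / np.max(np.abs(grid)), 1e-12)
    return grid[np.abs(x[:, None] / scale - grid[None, :]).argmin(axis=1)] * scale

def multi_grid_quantize(x, grids=(FP4, INT4), group_size=32):
    """Per group, keep whichever grid gives the lowest squared error."""
    out = np.empty_like(x)
    for i in range(0, len(x), group_size):
        group = x[i:i + group_size]
        cands = [quantize_to_grid(group, g) for g in grids]
        out[i:i + group_size] = min(cands, key=lambda c: np.sum((c - group) ** 2))
    return out

x = np.random.default_rng(0).standard_normal(4096)
print(np.mean((multi_grid_quantize(x) - x) ** 2))  # compare against a single fixed grid
```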
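Second sketch, on the Hadamard result: one randomized Hadamard transform is random sign flips followed by a normalized Hadamard transform; composing two of them is the construction the stated O(d^{-1/2}) bound concerns. The exponential test input is an arbitrary non-Gaussian example.

```python
import numpy as np

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform; len(x) must be a power of 2."""
    x = x.copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x

def rht(x, rng):
    """One randomized Hadamard transform: random signs, then H / sqrt(d)."""
    return fwht(x * rng.choice([-1.0, 1.0], size=len(x))) / np.sqrt(len(x))

rng = np.random.default_rng(0)
x = rng.standard_exponential(1024)   # skewed, clearly non-Gaussian input
y = rht(rht(x, rng), rng)            # two RHTs: coordinate marginals ~ Gaussian
```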
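Third sketch, on the normalization claim: a small numeric illustration (my construction, not the paper's experiment) of why dot products between unit-norm vectors tolerate 4-bit coordinate quantization: per-coordinate errors are roughly independent and average out as dimension grows, so the relative error shrinks with d.

```python
import numpy as np

def quantize_4bit(x):
    """Symmetric 4-bit uniform quantizer (15 levels) scaled to the vector's max; illustrative."""
    scale = np.max(np.abs(x)) / 7.0
    return np.clip(np.round(x / scale), -7, 7) * scale

rng = np.random.default_rng(0)
for d in (64, 1024, 16384):
    u = rng.standard_normal(d); u /= np.linalg.norm(u)   # point on the hypersphere
    g = rng.standard_normal(d)
    v = u + 0.3 * g / np.linalg.norm(g)                  # correlated neighbor
    v /= np.linalg.norm(v)
    exact = u @ v
    noisy = quantize_4bit(u) @ quantize_4bit(v)
    print(f"d={d:6d}  exact={exact:+.4f}  |error|={abs(noisy - exact):.5f}")
```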
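Fourth sketch, on the HiFloat4 item: the exact HiFloat4 encoding is specified in that paper; as a generic stand-in, here is a plain FP4 (E2M1) encode/decode round trip with a per-tensor scale. The value table below is the common E2M1 grid, not HiFloat4's.

```python
import numpy as np

# Common FP4 E2M1 magnitudes (assumed stand-in; HiFloat4's table may differ).
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
GRID = np.concatenate([-E2M1[:0:-1], E2M1])          # 15 distinct values

def encode_fp4(x):
    """Per-tensor scale, then nearest-value lookup into the 4-bit grid."""
    scale = max(np.max(np.abs(x)) / GRID[-1], 1e-12)
    codes = np.abs(x[:, None] / scale - GRID[None, :]).argmin(axis=1)
    return codes.astype(np.uint8), scale

def decode_fp4(codes, scale):
    return GRID[codes] * scale

x = np.random.default_rng(0).standard_normal(8)
codes, s = encode_fp4(x)
print(x, decode_fp4(codes, s), sep="\n")             # low-precision round trip
```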