pith. machine review for the scientific record.

Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling

7 Pith papers cite this work. Polarity classification is still indexing.

abstract

As large language models have grown in size, interest has increased in low-precision numerical formats such as NVFP4 as a way to improve speed and reduce memory usage. However, quantizing models to NVFP4 remains challenging, as the reduced precision generally degrades model performance. In this work, we address this issue with Four Over Six (4/6), a modification to the block-scaled NVFP4 quantization algorithm that reduces quantization error. Unlike integer formats, floating-point formats have non-uniform step sizes, which produce larger quantization error on larger values. 4/6 takes advantage of this by adaptively scaling some blocks to smaller FP4 values, making the distribution of representable values more uniform and reducing quantization error for near-maximal values. We show that 4/6 can be implemented efficiently on modern hardware accelerators, yielding gains during both pre-training and inference with minimal computational overhead. In pre-training experiments with the Nemotron 3 Nano 30B-A3B model architecture, we find that 4/6 brings training loss closer to BF16 than current state-of-the-art NVFP4 training recipes. Our code is available at https://github.com/mit-han-lab/fouroversix.
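The core idea in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the standard FP4 (E2M1) magnitude grid {0, 0.5, 1, 1.5, 2, 3, 4, 6}, and models "adaptively scaling some blocks to smaller FP4 values" as a per-block choice between mapping the block maximum to 6 (the usual scaling) or to 4, keeping whichever gives lower squared error. The function names and the error criterion are illustrative assumptions.

```python
import numpy as np

# Representable FP4 (E2M1) magnitudes; note the large gap between 4 and 6,
# which is why mapping the block max to 4 can yield a more uniform grid.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(x, scale):
    """Round x/scale to the nearest FP4 magnitude (sign handled separately),
    then dequantize back to real values."""
    signs = np.sign(x)
    mags = np.abs(x) / scale
    idx = np.argmin(np.abs(mags[:, None] - FP4_GRID[None, :]), axis=1)
    return signs * FP4_GRID[idx] * scale

def quantize_block_4over6(block):
    """Hypothetical 4/6-style block quantizer: try scaling the block max to
    6 (standard) and to 4, keep whichever scale gives lower squared error."""
    amax = np.abs(block).max()
    if amax == 0.0:
        return block.copy()
    best_err, best_deq = None, None
    for target in (6.0, 4.0):
        scale = amax / target
        deq = quantize_fp4(block, scale)
        err = np.sum((deq - block) ** 2)
        if best_err is None or err < best_err:
            best_err, best_deq = err, deq
    return best_deq
```

By construction this per-block selection never does worse than always scaling to 6, which mirrors the abstract's claim of reduced quantization error for near-maximal values; the real method additionally has to be efficient inside hardware block-scaled GEMM kernels.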


representative citing papers

Finer is Better (with the Right Scaling)

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Finer block sizes strictly improve theoretical MSE in microscaling for LLMs when scaling is adjusted to handle heavy-tailed distributions and FP4 binning, allowing standard formats to match custom wider-exponent ones.

Normalized Architectures are Natively 4-Bit

cs.LG · 2026-05-07 · conditional · novelty 6.0

nGPT's hypersphere constraint makes dot-product signal accumulate constructively under 4-bit quantization while noise averages out, enabling native low-precision training.

QuantClaw: Precision Where It Matters for OpenClaw

cs.AI · 2026-04-24 · unverdicted · novelty 6.0

QuantClaw dynamically routes precision in agent workflows to cut cost by up to 21.4% and latency by 15.7% while keeping or improving task performance.

DuQuant++: Fine-grained Rotation Enhances Microscaling FP4 Quantization

cs.CV · 2026-04-20 · unverdicted · novelty 4.0

DuQuant++ adapts outlier-aware fine-grained rotation to MXFP4 by matching block size to the 32-element microscaling group, enabling a single rotation that smooths distributions and achieves SOTA performance on LLaMA-3 with lower cost.

citing papers explorer

Showing 5 of 5 citing papers after filters.

  • SOAR: Scale Optimization for Accurate Reconstruction in NVFP4 Quantization cs.LG · 2026-05-12 · unverdicted · none · ref 10

    SOAR improves NVFP4 post-training quantization accuracy for LLMs by analytically solving joint scale optimization and searching decoupled scales.

  • Finer is Better (with the Right Scaling) cs.LG · 2026-05-08 · unverdicted · none · ref 2

    Finer block sizes strictly improve theoretical MSE in microscaling for LLMs when scaling is adjusted to handle heavy-tailed distributions and FP4 binning, allowing standard formats to match custom wider-exponent ones.

  • QuantClaw: Precision Where It Matters for OpenClaw cs.AI · 2026-04-24 · unverdicted · none · ref 37

    QuantClaw dynamically routes precision in agent workflows to cut cost by up to 21.4% and latency by 15.7% while keeping or improving task performance.

  • DuQuant++: Fine-grained Rotation Enhances Microscaling FP4 Quantization cs.CV · 2026-04-20 · unverdicted · none · ref 3

    DuQuant++ adapts outlier-aware fine-grained rotation to MXFP4 by matching block size to the 32-element microscaling group, enabling a single rotation that smooths distributions and achieves SOTA performance on LLaMA-3 with lower cost.

  • HiFloat4 Format for Language Model Pre-training on Ascend NPUs cs.LG · 2026-04-09 · unverdicted · none · ref 6

    HiFloat4 FP4 with stabilization techniques trains dense and MoE language models on Ascend NPUs at relative error within 1% of full-precision baselines.