FP4 All the Way: Fully Quantized Training of LLMs
Allowing each quantization group to select among multiple 4-bit grids improves accuracy over single-grid FP4 for both post-training and pre-training of LLMs.
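To make the headline idea concrete, here is a minimal NumPy sketch of per-group grid selection, assuming an E2M1-style FP4 grid and a uniform INT4 grid as the candidates and MSE as the selection criterion; the paper's actual grids and criterion may differ.

```python
import numpy as np

def make_grids():
    # E2M1 FP4 magnitudes: 0, 0.5, 1, 1.5, 2, 3, 4, 6 (signed below).
    fp4 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
    fp4 = np.concatenate([-fp4[:0:-1], fp4])   # 15 signed levels
    int4 = np.arange(-7.0, 8.0)                # uniform 4-bit grid
    return [fp4, int4]                          # illustrative candidates

def quantize_group(x, grids):
    """Quantize one group against each candidate grid; keep the best."""
    best, best_err = None, np.inf
    for g in grids:
        s = np.abs(x).max() / np.abs(g).max()   # per-group absmax scale
        s = s if s > 0 else 1.0
        idx = np.abs(x[:, None] / s - g[None, :]).argmin(axis=1)
        xq = g[idx] * s                         # dequantized values
        err = np.square(x - xq).mean()
        if err < best_err:
            best, best_err = xq, err
    return best

x = np.random.randn(32)                         # one quantization group
xq = quantize_group(x, make_grids())
print("group MSE:", np.square(x - xq).mean())
```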
7 papers cite this work.
[Chart: citing papers by year, through 2026]
7 representative citing papers
ScaleSearch optimizes block floating-point scales via fine-grained search, cutting quantization error by 27% for NVFP4 and improving PTQ accuracy by up to 15 points on MATH500 for Qwen3-8B and attention PPL by 0.77 on Llama 3.1 70B (a scale-search sketch follows this list).
LoKA enables practical FP8 use in numerically sensitive large recommendation models via profiling, model adaptations, and runtime kernel orchestration.
Weight-gradient quantization is the main driver of instability in full-pipeline FP4 LLM training; it is mitigated by deterministic Hadamard rotations rather than added stochasticity (see the rotation sketch after this list).
LOCALUT delivers a 1.82× geometric-mean speedup for quantized DNN inference on real UPMEM DRAM-PIM devices by using operation-packed LUTs with canonicalization, reordering, and slice streaming.
AdaHOP applies pattern-aware Hadamard transforms and selective outlier extraction to enable from-scratch MXFP4 training of LLMs at BF16 quality, with up to 3.6× memory compression and 1.46× speedup.
HiFloat4, an FP4 format paired with stabilization techniques, trains dense and MoE language models on Ascend NPUs with relative error within 1% of full-precision baselines.
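For the ScaleSearch entry above, a minimal sketch of fine-grained block-scale search, assuming a uniform 4-bit grid, shrink-factor candidates in [0.3, 1.0], and MSE as the objective; none of these specifics are taken from the paper.

```python
import numpy as np

def best_block_scale(x, n_candidates=64):
    """Search shrunken scales for one block instead of using plain absmax."""
    s_max = max(np.abs(x).max() / 7.0, 1e-12)      # baseline absmax scale
    best_s, best_err = s_max, np.inf
    for f in np.linspace(0.3, 1.0, n_candidates):  # candidate shrink factors
        s = f * s_max
        xq = np.round(x / s).clip(-7, 7) * s       # quantize / dequantize
        err = np.square(x - xq).mean()
        if err < best_err:
            best_s, best_err = s, err
    return best_s, best_err

block = np.random.randn(32)
s, err = best_block_scale(block)
print(f"chosen scale {s:.4f}, MSE {err:.6f}")
```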
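For the weight-gradient entry above, a minimal sketch of a deterministic Hadamard rotation applied before quantization; the toy q4 quantizer, block shape, and injected outlier are illustrative assumptions, not the paper's setup. A related transform idea appears in the AdaHOP entry.

```python
import numpy as np
from scipy.linalg import hadamard

def rotate_quantize(g_block, quantize):
    """Quantize a gradient block in a Hadamard-rotated basis.

    The orthonormal rotation spreads per-channel outliers across the
    block, so a low-precision grid wastes less range on rare spikes.
    """
    n = g_block.shape[-1]                               # power of two
    H = hadamard(n).astype(g_block.dtype) / np.sqrt(n)  # orthonormal
    q = quantize(g_block @ H)                           # quantize rotated
    return q @ H.T                                      # rotate back

def q4(x):
    # Toy 4-bit uniform quantizer with a single absmax scale.
    s = np.abs(x).max() / 7.0
    return np.round(x / s).clip(-7, 7) * s

g = np.random.randn(8, 64)
g[:, 3] *= 50                                           # channel outlier
print("plain quantization MSE:  ", np.square(g - q4(g)).mean())
print("rotated quantization MSE:", np.square(g - rotate_quantize(g, q4)).mean())
```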