Training LLMs with MXFP4

Albert Tseng, Tao Yu, Youngsuk Park · 2025 · arXiv 2502.20586

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Grid Games: The Power of Multiple Grids for Quantizing Large Language Models

cs.LG · 2026-05-12 · accept · novelty 8.0

Allowing each quantization group to select among multiple 4-bit grids improves accuracy over single-grid FP4 for both post-training and pre-training of LLMs.

Search Your Block Floating Point Scales!

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

ScaleSearch optimizes block floating point scales via fine-grained search to cut quantization error by 27% for NVFP4, improving PTQ by up to 15 points on MATH500 for Qwen3-8B and attention PPL by 0.77 on Llama 3.1 70B.

VFA: Relieving Vector Operations in Flash Attention with Global Maximum Pre-computation

cs.LG · 2026-04-14 · unverdicted · novelty 5.0

VFA optimizes Flash Attention by pre-computing global max approximations from key blocks and reordering traversal to reduce vector bottlenecks while preserving exact computation.

citing papers explorer

Showing 3 of 3 citing papers.

Grid Games: The Power of Multiple Grids for Quantizing Large Language Models · cs.LG · 2026-05-12 · accept · none · ref 38
Search Your Block Floating Point Scales! · cs.LG · 2026-05-12 · unverdicted · none · ref 158
VFA: Relieving Vector Operations in Flash Attention with Global Maximum Pre-computation · cs.LG · 2026-04-14 · unverdicted · none · ref 24
