Recipes for Pre-training LLMs with MXFP8, August 2025
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
roles: background 1
citation-polarity summary
polarities: unclear 1
citing papers explorer
- Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling
  Four Over Six adaptively scales blocks in NVFP4 quantization to smaller FP4 values, making representable value distributions more uniform and reducing quantization error especially for near-maximal values.
- StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models
  StoSignSGD resolves SignSGD divergence on non-smooth objectives via structural stochasticity, matching optimal convex rates and improving non-convex bounds while delivering 1.44-2.14x speedups in FP8 LLM pretraining.
- OSP-Next: Efficient High-Quality Video Generation with Sparse Sequence Parallelism, HiF8 Quantization, and Reinforcement Learning
  OSP-Next reports 83.73% VBench score and up to 2.27x speedup via hybrid sparse attention, SSP parallelism, HiF8 quantization, and Mix-GRPO on diffusion transformers.