ReSET mitigates accuracy degradation in NVFP4-quantized reasoning models via step-aware entropy-based temperature scaling and provides a small-M CUDA kernel for up to 2.5x kernel speedup and 2x end-to-end speedup.
AMXFP4: Taming activation outliers with asymmetric microscaling floating-point for 4-bit LLM inference
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
ReSET: Accurate Latency-Critical NVFP4 Reasoning via Step-Aware Temperature Scaling
ReSET mitigates accuracy degradation in NVFP4-quantized reasoning models via step-aware entropy-based temperature scaling and provides a small-M CUDA kernel for up to 2.5x kernel speedup and 2x end-to-end speedup.