2 Pith papers cite this work.
2026: 2 verdicts, both unverdicted; 2 representative citing papers.
Citing papers
-
Fitting Is Not Enough: Smoothness in Extremely Quantized LLMs
Extremely quantized LLMs degrade in smoothness, sparsifying the decoding tree and hurting generation quality; a smoothness-preserving principle delivers gains beyond numerical fitting.
-
Recovering Hidden Reward in Diffusion-Based Policies
EnergyFlow shows that denoising score matching on diffusion policies recovers the gradient of the expert's soft Q-function under maximum-entropy optimality, enabling non-adversarial reward extraction and improved policy generalization.
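The second entry's claim can be sketched in standard maximum-entropy RL notation (the symbols $\alpha$, $Q_{\mathrm{soft}}$, and $V_{\mathrm{soft}}$ below are the usual ones from that literature, assumed here rather than taken from the entry itself):

```latex
\pi^{*}(a \mid s) = \exp\!\Big(\tfrac{1}{\alpha}\big(Q_{\mathrm{soft}}(s,a) - V_{\mathrm{soft}}(s)\big)\Big)
\;\;\Longrightarrow\;\;
\nabla_{a} \log \pi^{*}(a \mid s) = \tfrac{1}{\alpha}\,\nabla_{a} Q_{\mathrm{soft}}(s,a)
```

Because $V_{\mathrm{soft}}(s)$ does not depend on the action, it vanishes under $\nabla_{a}$; denoising score matching fits a network to $\nabla_{a} \log \pi$, so on expert data the learned score matches the soft Q-gradient up to the temperature factor, which is why the reward can be extracted without an adversarial objective.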
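The "extremely quantized" regime in the first entry refers to very low-bit weight compression. A minimal sketch of round-to-nearest uniform quantization (symmetric per-tensor scaling here is an illustrative assumption, not the paper's scheme) shows how few distinct values survive at 2 bits:

```python
def quantize(weights, bits):
    """Round-to-nearest uniform quantization of floats to `bits`-bit
    signed levels, then dequantize back to floats (fake quantization)."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 1 for 2-bit, 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    return [round(w / scale) * scale for w in weights]

weights = [0.31, -0.12, 0.87, -0.55]
# At 2 bits, only the levels {-1, 0, +1} exist, so every weight
# collapses onto one of three representable values.
print(quantize(weights, 2))
```

With so few levels, many distinct weights collapse to the same value, which is the kind of coarsening the entry ties to lost smoothness in the decoding tree.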