Allowing each quantization group to select among multiple 4-bit grids improves accuracy over single-grid FP4 for both post-training and pre-training of LLMs.
Winogrande: An adversarial winograd schema challenge at scale.Communications of the ACM, 64(9):99–106
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.
PARSE trains a prompt-aware linear router on dense-model outputs to select dynamic SVD ranks, improving accuracy up to 10% at 0.6 compression ratio on LLaMA-7B while delivering 2.5x prefill and 2.4x decode speedups.
HTAF is a sigmoid-tanh composite that approximates the Heaviside function to allow stable gradient training of binary activation networks, yielding ICBMs with stable discretization and competitive performance on image tasks.
LoPT achieves competitive task performance in LLM post-training by limiting task gradients to the upper model half and training the lower half with local feature reconstruction.
citing papers explorer
-
Grid Games: The Power of Multiple Grids for Quantizing Large Language Models
Allowing each quantization group to select among multiple 4-bit grids improves accuracy over single-grid FP4 for both post-training and pre-training of LLMs.
-
Large Language Diffusion Models
LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.
-
Different Prompts, Different Ranks: Prompt-aware Dynamic Rank Selection for SVD-based LLM Compression
PARSE trains a prompt-aware linear router on dense-model outputs to select dynamic SVD ranks, improving accuracy up to 10% at 0.6 compression ratio on LLaMA-7B while delivering 2.5x prefill and 2.4x decode speedups.
-
A Composite Activation Function for Learning Stable Binary Representations
HTAF is a sigmoid-tanh composite that approximates the Heaviside function to allow stable gradient training of binary activation networks, yielding ICBMs with stable discretization and competitive performance on image tasks.
-
Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training
LoPT achieves competitive task performance in LLM post-training by limiting task gradients to the upper model half and training the lower half with local feature reconstruction.