Measuring massive multitask language understanding. Proceedings of the International Conference on Learning Representations (ICLR).
4 papers cite this work. Citation classification is still indexing.
2026: 4 representative citing papers
Citing papers explorer
- Grid Games: The Power of Multiple Grids for Quantizing Large Language Models
  Allowing each quantization group to select among multiple 4-bit grids improves accuracy over single-grid FP4 for both post-training and pre-training of LLMs.
- From 0-Order Selection to 2-Order Judgment: Combinatorial Hardening Exposes Compositional Failures in Frontier LLMs
  LogiHard hardens reasoning benchmarks by transforming 0-order selection into 2-order judgment, causing 31-56% accuracy drops across 12 frontier LLMs and a 47% drop on zero-shot MMLU, revealing a combinatorial reasoning gap rather than a knowledge deficit.
- EMO: Pretraining Mixture of Experts for Emergent Modularity
  EMO pretrains MoEs using document boundaries to induce semantic expert specialization, enabling deployment of modular expert subsets with minimal accuracy loss, unlike standard MoEs.
- Active Learning for Communication Structure Optimization in LLM-Based Multi-Agent Systems
  An information-theoretic active learning method based on ensemble Kalman inversion selects informative tasks for optimizing communication structures in LLM multi-agent systems more reliably than random sampling under limited training budgets.
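The multi-grid idea in the Grid Games summary can be sketched in a few lines: quantize each weight group against several candidate 4-bit grids and keep whichever reconstructs the group best. The candidate grids, group size, and MSE criterion below are illustrative assumptions, not the paper's exact recipe.

```python
# Illustrative sketch of per-group grid selection for 4-bit quantization.
# Grids, group size, and the MSE criterion are assumptions for demonstration.

_FP4_POS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]   # E2M1-style magnitudes
FP4_GRID = sorted({s * v for v in _FP4_POS for s in (1.0, -1.0)})
INT4_GRID = [float(i) for i in range(-8, 8)]           # uniform 4-bit grid

def quantize_group(weights, grid):
    """Scale the grid to the group's max magnitude, round each weight to the
    nearest grid point, and return (dequantized values, reconstruction MSE)."""
    scale = max(abs(w) for w in weights) / max(abs(g) for g in grid)
    scale = scale or 1.0  # avoid division by zero for an all-zero group
    deq = [scale * min(grid, key=lambda g: abs(w / scale - g)) for w in weights]
    mse = sum((w - d) ** 2 for w, d in zip(weights, deq)) / len(weights)
    return deq, mse

def multi_grid_quantize(weights, grids=(FP4_GRID, INT4_GRID), group_size=4):
    """For each group, keep the grid with the lowest reconstruction MSE."""
    out = []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        deq, _ = min((quantize_group(group, g) for g in grids), key=lambda r: r[1])
        out.extend(deq)
    return out
```

By construction, each group's error can never exceed that of any single grid in the candidate set, which is the summary's core claim.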
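The LogiHard summary does not spell out the transformation; one plausible reading of "0-order selection to 2-order judgment" is expanding each multiple-choice item into pairwise truth judgments, which grows the item count combinatorially. The sketch below encodes that reading as an assumption, not LogiHard's published procedure.

```python
from itertools import combinations

def harden(question, options, answer):
    """Hypothetical hardening step: expand one select-the-answer item into
    C(n, 2) pairwise-judgment items. For each pair of options, the model
    must judge the truth of both claims; only two correct judgments score."""
    items = []
    for a, b in combinations(options, 2):
        items.append({
            "question": question,
            "claims": [(a, a == answer), (b, b == answer)],
        })
    return items
```

For a 4-option item this yields 6 judgment pairs, so guessing strategies that work on plain selection no longer transfer.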
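The "modular subset deployment" in the EMO summary can be illustrated with a toy top-1 MoE forward pass whose router is restricted to a deployed expert subset. The scoring function and expert map here are stand-ins: EMO's experts and router are learned during pretraining.

```python
def moe_forward(x, experts, score_fn, deployed=None):
    """Toy top-1 MoE forward pass. When `deployed` is given, routing is
    restricted to that expert subset, so unused experts never need to be
    loaded -- the modular-deployment idea in the EMO summary."""
    ids = list(deployed) if deployed is not None else list(experts)
    best = max(ids, key=lambda e: score_fn(e, x))  # top-1 routing
    return experts[best](x)
```

With semantically specialized experts, restricting `deployed` to the domains a deployment actually needs is what makes the subset cheap to serve.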
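The active-learning summary can be grounded with a generic ensemble-disagreement selector: score each candidate task by how much an ensemble's predictions vary on it, a common proxy for expected information gain. The paper uses ensemble Kalman inversion; this sketch captures only the select-by-disagreement step, not the EKI update itself.

```python
import statistics

def select_tasks(ensemble, candidates, predict, k=2):
    """Rank candidate tasks by the variance of the ensemble's predictions
    (a proxy for expected information gain) and return the top k, rather
    than sampling tasks uniformly at random."""
    scored = [(statistics.pvariance([predict(m, t) for m in ensemble]), t)
              for t in candidates]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [t for _, t in scored[:k]]
```

Under a limited budget, spending it on the tasks the ensemble disagrees about most is what makes this more reliable than random sampling.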