SPEAR places input-dependent error compensators at CKA-selected layers and fuses them into low-bit GEMMs to recover 56-75% of the W4-to-FP16 perplexity gap with <1% memory overhead and near-baseline latency.
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
VSRAQ is a MoE-specific quantization objective that combines value and structure alignment to preserve expert-selection behavior and reduce quality loss without inference overhead.
citing papers explorer
- SPEAR: A System for Post-Quantization Error-Adaptive Recovery Enabling Efficient Low-Bit LLM Serving
  SPEAR places input-dependent error compensators at CKA-selected layers and fuses them into low-bit GEMMs to recover 56-75% of the W4-to-FP16 perplexity gap with <1% memory overhead and near-baseline latency.
- Value-and-Structure Alignment for Routing-Consistent Quantization of Mixture-of-Experts Models
  VSRAQ is a MoE-specific quantization objective that combines value and structure alignment to preserve expert-selection behavior and reduce quality loss without inference overhead.