RQ-MoE combines two-level MoE with dual-stream quantization to create input-dependent codebooks that recover prior methods as special cases and deliver 6-14x faster decoding with on-par reconstruction and retrieval performance.
Field: cs.LG. Year: 2026.
RQ-MoE: Residual Quantization via Mixture of Experts for Efficient Input-Dependent Vector Compression
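To make the idea of residual quantization with an input-dependent codebook more concrete, the following is a minimal NumPy sketch, not the paper's method. It shows plain greedy residual quantization, with a hard nearest-key router standing in for the mixture-of-experts gating. The names `route`, `router_keys`, and `expert_codebooks`, the random codebooks, and all sizes are assumptions made for illustration. The two-level MoE and the dual-stream quantization mentioned in the summary are not reproduced here, since their details are not given on this page.

```python
import numpy as np

def rq_encode(x, codebooks):
    """Greedy residual quantization: at each level, pick the codeword
    nearest to the current residual and subtract it."""
    residual = np.asarray(x, dtype=float).copy()
    codes = []
    for cb in codebooks:  # each cb has shape (K, d)
        idx = int(np.argmin(((cb - residual) ** 2).sum(axis=1)))
        codes.append(idx)
        residual = residual - cb[idx]
    return codes

def rq_decode(codes, codebooks):
    """Reconstruct by summing the selected codeword from every level."""
    return sum(cb[i] for cb, i in zip(codebooks, codes))

def route(x, router_keys):
    """Hypothetical hard router (an assumption, not from the source):
    choose the expert whose key vector is nearest to x."""
    return int(np.argmin(((router_keys - x) ** 2).sum(axis=1)))

# Illustrative sizes and random, untrained codebooks (assumptions).
rng = np.random.default_rng(0)
d, K, levels, n_experts = 8, 16, 3, 4
router_keys = rng.normal(size=(n_experts, d))
expert_codebooks = [
    [rng.normal(size=(K, d)) / (2 ** level) for level in range(levels)]
    for _ in range(n_experts)
]

x = rng.normal(size=d)
expert = route(x, router_keys)                   # input-dependent choice
codes = rq_encode(x, expert_codebooks[expert])   # one index per level
x_hat = rq_decode(codes, expert_codebooks[expert])
```

In this sketch the compressed representation is the expert index plus one code per level, and decoding is a handful of table lookups and additions. Trained codebooks and a learned, possibly soft, gating function would replace the random arrays in any real system.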