Lexico: Extreme kv cache compression via sparse coding over universal dictionaries

Lexico: Extreme KV Cache Compression via Sparse Coding over Universal Dictionaries , author= · 2024 · arXiv 2412.08890

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

representative citing papers

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.

TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate

cs.LG · 2025-04-28 · unverdicted · novelty 6.0

TurboQuant achieves near-optimal vector quantization distortion for both MSE and inner products via random rotation and per-coordinate scalar quantization, with a formal proof that it matches lower bounds within a factor of approximately 2.7.

ZONOS2 Technical Report

cs.SD · 2026-06-23 · unverdicted · novelty 4.0

ZONOS2 8B is a scaled MoE TTS model with 900M active parameters trained on 6M hours of data that reports competitive SOTA results on naturalness, speaker similarity, WER, and a new ZTTS1-Eval benchmark while releasing weights and code.

citing papers explorer

Showing 3 of 3 citing papers after filters.

ZAYA1-8B Technical Report cs.AI · 2026-05-06 · unverdicted · none · ref 142
ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate cs.LG · 2025-04-28 · unverdicted · none · ref 36
TurboQuant achieves near-optimal vector quantization distortion for both MSE and inner products via random rotation and per-coordinate scalar quantization, with a formal proof that it matches lower bounds within a factor of approximately 2.7.
ZONOS2 Technical Report cs.SD · 2026-06-23 · unverdicted · none · ref 167
ZONOS2 8B is a scaled MoE TTS model with 900M active parameters trained on 6M hours of data that reports competitive SOTA results on naturalness, speaker similarity, WER, and a new ZTTS1-Eval benchmark while releasing weights and code.

Lexico: Extreme kv cache compression via sparse coding over universal dictionaries

fields

years

verdicts

representative citing papers

citing papers explorer