Efficient streaming language models with attention sinks

Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, Mike Lewis · 2024

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

browse 5 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

TriAxialKV: Toward Extreme Low-Precision KV-Cache Quantization for Agentic Inference Tasks

cs.LG · 2026-05-16 · unverdicted · novelty 7.0

TriAxialKV introduces triaxial mixed-precision KV-cache quantization that matches BF16 accuracy at 4.5x cache size and 30% higher throughput for a Qwen3-VL agent on OSWorld.

Zero-Ablation Overstates Register Content Dependence in DINO Vision Transformers

cs.CV · 2026-04-15 · accept · novelty 7.0

Zero-ablation overstates register content dependence in DINO ViTs because mean, noise, and cross-image shuffle replacements preserve performance while zeroing does not.

Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

cs.CV · 2025-06-09 · unverdicted · novelty 6.0

Self Forcing trains autoregressive video diffusion models by performing autoregressive rollout with KV caching during training to close the exposure bias gap, using a holistic video-level loss and few-step diffusion for efficiency.

HorizonStream: Long-Horizon Attention for Streaming 3D Reconstruction

cs.CV · 2026-05-22 · unverdicted · novelty 5.0

HorizonStream is a long-horizon Transformer that factorizes geometric evidence influence into channel-wise linear attention for long-range temporal propagation and local spatiotemporal attention for short-range matching, claiming stable generalization from 48-frame training to over 10,000-frame test

GHOST: Geometry-Hierarchical Online Streaming Token Eviction for Efficient 3D Reconstruction

cs.CV · 2026-05-15

citing papers explorer

Showing 5 of 5 citing papers.

TriAxialKV: Toward Extreme Low-Precision KV-Cache Quantization for Agentic Inference Tasks cs.LG · 2026-05-16 · unverdicted · none · ref 40
TriAxialKV introduces triaxial mixed-precision KV-cache quantization that matches BF16 accuracy at 4.5x cache size and 30% higher throughput for a Qwen3-VL agent on OSWorld.
Zero-Ablation Overstates Register Content Dependence in DINO Vision Transformers cs.CV · 2026-04-15 · accept · none · ref 13
Zero-ablation overstates register content dependence in DINO ViTs because mean, noise, and cross-image shuffle replacements preserve performance while zeroing does not.
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion cs.CV · 2025-06-09 · unverdicted · none · ref 92
Self Forcing trains autoregressive video diffusion models by performing autoregressive rollout with KV caching during training to close the exposure bias gap, using a holistic video-level loss and few-step diffusion for efficiency.
HorizonStream: Long-Horizon Attention for Streaming 3D Reconstruction cs.CV · 2026-05-22 · unverdicted · none · ref 51
HorizonStream is a long-horizon Transformer that factorizes geometric evidence influence into channel-wise linear attention for long-range temporal propagation and local spatiotemporal attention for short-range matching, claiming stable generalization from 48-frame training to over 10,000-frame test
GHOST: Geometry-Hierarchical Online Streaming Token Eviction for Efficient 3D Reconstruction cs.CV · 2026-05-15 · unreviewed · ref 26

Efficient streaming language models with attention sinks

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer