Masked autoencoders are scalable vision learners
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- OmniDrop: Layer-wise Token Pruning for Omni-modal LLMs via Query-Guidance
  OmniDrop is a training-free, layer-wise token pruning framework for omni-modal LLMs. It uses query guidance and temporal diversity to reduce prefill latency by up to 40% and memory by 14.7% while improving benchmark scores by up to 3.58 points.
- Why Latent Actions Fail, and How to Prevent It
  Extending linear LAMs to model exogenous state shows that standard reconstruction encodes future exogenous information in latent actions, whereas endogenous-focused spaces and auxiliary objectives such as action supervision enforce consistency across noise.