LayerBoost selectively replaces or removes attention in non-critical transformer layers to cut inference latency up to 68% while recovering quality via brief distillation.
The Twelfth International Conference on Learning Representations
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
LaplacianFormer uses a Laplacian kernel with an injective feature map and efficient approximations to achieve linear attention that preserves mid-range interactions better than Gaussian-based linear attention in vision transformers.
citing papers explorer
- LayerBoost: Layer-Aware Attention Reduction for Efficient LLMs
LayerBoost selectively replaces or removes attention in non-critical transformer layers to cut inference latency up to 68% while recovering quality via brief distillation.
- LaplacianFormer: Rethinking Linear Attention with Laplacian Kernel
LaplacianFormer uses a Laplacian kernel with an injective feature map and efficient approximations to achieve linear attention that preserves mid-range interactions better than Gaussian-based linear attention in vision transformers.