arXiv preprint arXiv:2202.07800

Liang, Y · 2022 · arXiv 2202.07800

16 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

CoReDiT: Spatial Coherence-Guided Token Pruning and Reconstruction for Efficient Diffusion Transformers

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

CoReDiT reduces self-attention FLOPs in DiTs by up to 55% via linear-time spatial coherence pruning and neighbor-based reconstruction, delivering 1.33x-1.72x speedups with maintained quality.

Why Training-Free Token Reduction Collapses: The Inherent Instability of Pairwise Scoring Signals

cs.AI · 2026-04-17 · unverdicted · novelty 7.0

Pairwise scoring signals in Vision Transformer token reduction are inherently unstable: they have high perturbation counts and degrade in deep layers, which causes collapse. Unary signals with triage enable CATIS to retain 96.9% accuracy at a 63% FLOPs reduction on ViT-Large ImageNet-1K.

When Attention Collapses: Stage-Aware Visual Token Pruning from Structure to Semantics

cs.CV · 2026-06-02 · unverdicted · novelty 6.0

STS is a two-stage pruning framework that decouples structural diversity via repulsion sampling from semantic filtering via cross-attention to reduce redundancy in visual tokens for VLMs.

See Less, Specify More: Visual Evidence Budgets for Generalizable VLAs

cs.RO · 2026-06-01 · unverdicted · novelty 6.0

S2 improves generalization in vision-language-action models by using goal-preserving refined language guidance and explicit visual evidence budgets, raising mean subtask success from 54.2% to 79.0% on eight real-robot tasks compared to pi0.5.

SToRe3D: Sparse Token Relevance in ViTs for Efficient Multi-View 3D Object Detection

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

SToRe3D delivers up to 3x faster inference for multi-view 3D object detection in ViTs by selecting relevant 2D tokens and 3D queries via mutual relevance heads with only marginal accuracy loss.

Provable Sparse Inversion and Token Relabel Enhanced One-shot Federated Learning with ViTs

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

FedMITR uses sparse model inversion and token relabeling to improve one-shot federated learning with ViTs under non-IID conditions, delivering a tighter generalization bound via algorithmic stability analysis and better empirical performance.

VideoRouter: Query-Adaptive Dual Routing for Efficient Long-Video Understanding

cs.CV · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

VideoRouter uses dual semantic and image routers for query-adaptive token compression in long-video models, delivering up to 67.9% token reduction while outperforming the InternVL baseline on VideoMME, MLVU, and LongVideoBench.

MaMe & MaRe: Matrix-Based Token Merging and Restoration for Efficient Visual Perception and Synthesis

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

MaMe is a differentiable matrix-only token merging method that doubles ViT-B throughput with a 2% accuracy drop on pre-trained models and enables faster, higher-quality image synthesis when paired with MaRe.

Accelerating Vision Transformers with Adaptive Patch Sizes

cs.CV · 2025-10-20 · conditional · novelty 6.0

APT adaptively varies patch sizes within a single image to reduce ViT token count, delivering 40-50% throughput gains on large models with no downstream performance loss.

One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory

cs.CV · 2025-05-29 · unverdicted · novelty 6.0

TrajViT tokenizes videos via panoptic sub-object trajectories, achieving 10x token reduction and outperforming ViT3D by 6% on retrieval and 5.2% on VideoQA tasks with faster training and inference.

MVPruner: Dynamic Token Pruning for Accelerating Multi-view Vision-Language Models in Autonomous Driving

cs.CV · 2026-06-26 · unverdicted · novelty 5.0 · 2 refs

MVPruner is a two-stage adaptive token pruning technique for multi-view VLMs that achieves 87.3% FLOPs reduction and 4.97x prefilling speedup while retaining 98.5% accuracy on DriveLM.

VisionPulse: Dynamic Visual Sparsity for Efficient Multimodal Reasoning

cs.CV · 2026-05-29 · unverdicted · novelty 5.0

VisionPulse is a step-wise visual token pruning method for LMMs that retains 5% of tokens per step, shortens reasoning traces by 11.2%, and maintains accuracy.

ASAP: Attention Sink Anchored Pruning

cs.LG · 2026-05-21 · unverdicted · novelty 5.0

ASAP prunes tokens in ViTs by anchoring on attention sinks modeled as lazy random walks, using cumulative transition matrices and radial diffusion clustering to compress redundancy while preserving accuracy.

Temporal Aware Pruning for Efficient Diffusion-based Video Generation

cs.CV · 2026-05-18 · unverdicted · novelty 5.0 · 2 refs

TAPE applies temporal-aware token pruning with smoothing, reselection, and timestep scheduling to speed up video diffusion models while preserving visual fidelity and coherence.

Revisiting Token Compression for Accelerating ViT-based Sparse Multi-View 3D Object Detectors

cs.CV · 2026-04-16 · conditional · novelty 5.0

SEPatch3D makes ViT-based 3D object detectors up to 57% faster than StreamPETR via dynamic patch sizing and cross-granularity enhancement while keeping comparable accuracy on nuScenes and Argoverse 2.

CATP: Confidence-Aware Token Pruning for Camouflaged Object Detection

cs.CV · 2026-04-18 · unverdicted · novelty 4.0

CATP prunes low-confidence tokens in COD Transformers and uses dual-path compensation to cut computation while preserving segmentation accuracy on boundary regions.
