Token Merging (ToMe) doubles the throughput of large Vision Transformers on images, video, and audio by merging similar tokens with a fast matching algorithm, incurring only 0.2-0.4% accuracy loss.
For all results, im/s indicates throughput and “speed” indicates improvement over the baseline
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2022 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Token Merging: Your ViT But Faster
Token Merging (ToMe) doubles the throughput of large Vision Transformers on images, video, and audio by merging similar tokens with a fast matching algorithm, incurring only 0.2-0.4% accuracy loss.