Pyramid vision transformer: A versatile backbone for dense prediction without convolutions

Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao · 2021

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

YOLOv12: Attention-Centric Real-Time Object Detectors

cs.CV · 2025-02-18 · unverdicted · novelty 6.0

YOLOv12 is a new attention-based real-time object detector that reports higher accuracy than YOLOv10, YOLOv11, and RT-DETR variants at comparable or better speed and efficiency.

MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices

cs.CV · 2023-12-28 · unverdicted · novelty 5.0

MobileVLM achieves on-par performance with much larger vision-language models on standard benchmarks while delivering state-of-the-art inference speeds of 21.5 tokens per second on Snapdragon 888 CPU and 65.3 on Jetson Orin GPU.

citing papers explorer

Showing 2 of 2 citing papers.

YOLOv12: Attention-Centric Real-Time Object Detectors cs.CV · 2025-02-18 · unverdicted · none · ref 62
YOLOv12 is a new attention-based real-time object detector that reports higher accuracy than YOLOv10, YOLOv11, and RT-DETR variants at comparable or better speed and efficiency.
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices cs.CV · 2023-12-28 · unverdicted · none · ref 122
MobileVLM achieves on-par performance with much larger vision-language models on standard benchmarks while delivering state-of-the-art inference speeds of 21.5 tokens per second on Snapdragon 888 CPU and 65.3 on Jetson Orin GPU.

Pyramid vision transformer: A versatile backbone for dense prediction without convolutions

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer