A Twins-SVT vision transformer backbone with multiscale CNN decoder and Category Focus Module auxiliary task reduces MAE by 33-64% on VisDrone and iSAID multi-class counting benchmarks versus prior density estimators.
A convolutional neural-network-based pedestrian counting model for various crowded scenes. Computer-Aided Civil and Infrastructure Engineering, 34(10):897–914
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CV (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Getting the Numbers Right: Modelling Multi-Class Object Counting in Dense and Varied Scenes