GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild
3 Pith papers cite this work.
Citing papers by year: 2026 (3).
Citing papers
- 4DVLT: Dynamic Scene Understanding with Worldline-Centered Vision-Language Tracking
  The paper defines the 4DVLT task for worldline-centered 4D scene understanding, releases Instruct-4D with 129.4K QA pairs, and presents 4DTrack, which achieves 62.68 TGA_Top1 and outperforms adapted baselines by 19.62 points.
- Beyond Detection: A Structure-Aware Framework for Scene Text Tracking
  SymTrack is presented as the first systematic detection-free framework for scene text tracking; the authors construct benchmarks from video text spotting datasets and report AUC gains of up to 11.97% over prior trackers.
- A Theory-grounded Hybrid Neural Network Integrating Complementary Estimation Mechanisms for Stable Visual Object Tracking
  A hybrid ANN-CANN network for visual object tracking that operationalizes bias-variance complementarity and outperforms baselines on nine benchmarks.