The paper defines the 4DVLT task for worldline-centered 4D scene understanding, releases Instruct-4D with 129.4K QA pairs, and presents 4DTrack achieving 62.68 TGA_Top1, outperforming adapted baselines by 19.62 points.
Henriques, Andrea Vedaldi, and Philip H
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2representative citing papers
Hybrid ANN-CANN network for visual object tracking that operationalizes bias-variance complementarity to outperform baselines on nine benchmarks.
citing papers explorer
-
4DVLT: Dynamic Scene Understanding with Worldline-Centered Vision-Language Tracking
The paper defines the 4DVLT task for worldline-centered 4D scene understanding, releases Instruct-4D with 129.4K QA pairs, and presents 4DTrack achieving 62.68 TGA_Top1, outperforming adapted baselines by 19.62 points.
-
A Theory-grounded Hybrid Neural Network Integrating Complementary Estimation Mechanisms for Stable Visual Object TrackingA
Hybrid ANN-CANN network for visual object tracking that operationalizes bias-variance complementarity to outperform baselines on nine benchmarks.