Towards unified token learning for vision-language tracking

Yaozong Zheng, Bineng Zhong, Qihua Liang, Guorong Li, Rongrong Ji, Xianxian Li · 2024 · arXiv 2023.330193

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

4DVLT: Dynamic Scene Understanding with Worldline-Centered Vision-Language Tracking

cs.CV · 2026-06-21 · conditional · novelty 7.0

The paper defines the 4DVLT task for worldline-centered 4D scene understanding, releases Instruct-4D with 129.4K QA pairs, and presents 4DTrack achieving 62.68 TGA_Top1, outperforming adapted baselines by 19.62 points.

A General Differentiable Ray-Wave Framework for Hybrid Refractive-Diffractive System Modeling and Optimization

physics.optics · 2026-05-14 · unverdicted · novelty 6.0

A plug-and-play differentiable model bridging ray and wave optics for hybrid systems that enables end-to-end optimization of planar and conformal diffractive elements.

citing papers explorer


4DVLT: Dynamic Scene Understanding with Worldline-Centered Vision-Language Tracking
cs.CV · 2026-06-21 · conditional · none · ref 56
A General Differentiable Ray-Wave Framework for Hybrid Refractive-Diffractive System Modeling and Optimization
physics.optics · 2026-05-14 · unverdicted · none · ref 98
