Models frames and words as cooperative game players to value uncertain vision-language correspondences for proposal-free moment localization, reporting superior results on Charades-STA and ActivityNet Captions.
IEEE Transactions on Neural Networks and Learning Systems
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.CV (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
- Rethinking Weakly-supervised Video Temporal Grounding From a Game Perspective