Self-chained image-language model for video localization and question answering

Yu, S · 2023 · arXiv 2305.06988

2 Pith papers cite this work. Polarity classification is still in progress.

representative citing papers

Gemini: A Family of Highly Capable Multimodal Models

cs.CL · 2023-12-19 · conditional · novelty 6.0

Gemini Ultra reaches human-expert performance on MMLU for the first time and sets new state-of-the-art results on 30 of 32 benchmarks, including all 20 multimodal ones tested.

Rethinking Video-Language Model from the Language Input Perspective

cs.CV · 2026-05-27 · unverdicted · novelty 5.0

Introduces a plug-and-play framework that generates varied texts and uses attribute reasoning plus video-guided loss to improve state-of-the-art Video-Language Models.

citing papers explorer

Showing 2 of 2 citing papers.

Gemini: A Family of Highly Capable Multimodal Models
cs.CL · 2023-12-19 · conditional · none · ref 128
Rethinking Video-Language Model from the Language Input Perspective
cs.CV · 2026-05-27 · unverdicted · none · ref 72
