Self-chained image-language model for video localization and question answering
2 Pith papers cite this work.
citing papers explorer
- Gemini: A Family of Highly Capable Multimodal Models
  Gemini Ultra reaches human-expert performance on MMLU for the first time and sets new state-of-the-art results on 30 of 32 benchmarks, including all 20 multimodal ones tested.
- Rethinking Video-Language Model from the Language Input Perspective
  Introduces a plug-and-play framework that generates varied texts and uses attribute reasoning plus video-guided loss to improve state-of-the-art Video-Language Models.