The synergy between data and multi-modal large language models: A survey from co-development perspective

Zhen Qin, Daoyuan Chen, Wenhao Zhang, Liuyi Yao, Yilun Huang, Bolin Ding, Yaliang Li, Shuiguang Deng · 2024 · arXiv 2407.08583

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

HumanVBench: Probing Human-Centric Video Understanding in MLLMs with Automatically Synthesized Benchmarks

cs.CV · 2024-12-23 · unverdicted · novelty 7.0 · ref 45

HumanVBench provides a 16-task benchmark for human-centric video understanding in MLLMs, created through automated annotation and distractor synthesis pipelines, and shows that top models lag human performance on emotion perception and cross-modal alignment.