hub Mixed citations

Monkey: Image resolution and text label are important things for large multi-modal models

Mixed citation behavior. Most common role is background (62%).

15 Pith papers citing it
Background: 62% of classified citations (5 of 8)

citation-role summary

background 5 · baseline 1 · dataset 1 · method 1

fields

cs.CV 15

representative citing papers

Are We on the Right Way for Evaluating Large Vision-Language Models?

cs.CV · 2024-03-29 · conditional · novelty 6.0

Current LVLM benchmarks overestimate capabilities because many questions can be answered without images due to design flaws or data leakage; MMStar is a human-curated set of 1,500 vision-indispensable samples across 6 capabilities and 18 axes with new metrics for leakage and true multi-modal gain.

MMBench: Is Your Multi-modal Model an All-around Player?

cs.CV · 2023-07-12 · accept · novelty 6.0

MMBench is a new bilingual benchmark that uses curated questions, CircularEval, and LLM-assisted answer conversion to provide objective, fine-grained evaluation of vision-language models.

A Survey on Multimodal Large Language Models

cs.CV · 2023-06-23 · accept · novelty 3.0

This survey organizes the architectures, training strategies, data, evaluation methods, extensions, and challenges of Multimodal Large Language Models.
