pith. machine review for the scientific record.

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

6 Pith papers cite this work. Polarity classification is still indexing.

fields

cs.CV 6

roles

dataset 3

polarities

use dataset 3

representative citing papers

CGC: Compositional Grounded Contrast for Fine-Grained Multi-Image Understanding

cs.CV · 2026-04-24 · unverdicted · novelty 7.0

CGC improves fine-grained multi-image understanding in MLLMs by constructing contrastive training instances from existing single-image annotations and adding a rule-based spatial reward. It achieves state-of-the-art results on MIG-Bench and VLM2-Bench, with transfer gains on other multimodal tasks.
