pith. machine review for the scientific record.

mPLUG-Owl3: Towards long image-sequence understanding in multi-modal large language models

9 Pith papers cite this work. Polarity classification is still indexing.

fields

cs.CV 9

years

2026 8 · 2025 1

verdicts

UNVERDICTED 9

representative citing papers

CGC: Compositional Grounded Contrast for Fine-Grained Multi-Image Understanding

cs.CV · 2026-04-24 · unverdicted · novelty 7.0

CGC improves fine-grained multi-image understanding in MLLMs by constructing contrastive training instances from existing single-image annotations and adding a rule-based spatial reward, achieving SOTA on MIG-Bench and VLM2-Bench with transfer gains to other multimodal tasks.
