pith. machine review for the scientific record.

hub Baseline reference

Mantis: Interleaved multi-image instruction tuning

Baseline reference. 80% of citing Pith papers use this work as a benchmark or comparison.

11 Pith papers citing it
Baseline 80% of classified citations

citation-role summary

dataset 3 · background 1 · baseline 1

fields

cs.CV 9 · cs.CL 2

representative citing papers

CGC: Compositional Grounded Contrast for Fine-Grained Multi-Image Understanding

cs.CV · 2026-04-24 · unverdicted · novelty 7.0

CGC improves fine-grained multi-image understanding in MLLMs by constructing contrastive training instances from existing single-image annotations and adding a rule-based spatial reward, achieving SOTA on MIG-Bench and VLM2-Bench with transfer gains to other multimodal tasks.

Improving Video Generation with Human Feedback

cs.CV · 2025-01-23 · unverdicted · novelty 6.0

A human preference dataset and VideoReward model enable Flow-DPO and Flow-NRG to produce smoother, better-aligned videos from text prompts in flow-based generators.

LLaVA-OneVision: Easy Visual Task Transfer

cs.CV · 2024-08-06 · unverdicted · novelty 5.0

LLaVA-OneVision is the first single open LMM to simultaneously achieve strong performance in single-image, multi-image, and video scenarios with cross-scenario transfer capabilities.
