Evaluating object hallucination in large vision-language models

Yifan Li, Yifan Du, Kun Zhou, Jinpeng Wang, Xin Zhao, Ji-Rong Wen · 2023

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

UHR-Micro: Diagnosing and Mitigating the Resolution Illusion in Earth Observation VLMs

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

VLMs show a resolution illusion on UHR Earth observation imagery where higher resolution does not improve micro-target perception; UHR-Micro benchmark and MAP-Agent address this via evidence-centered active inspection.

Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

Vision-OPD transfers an MLLM's privileged regional perception to its full-image policy through on-policy token-level self-distillation, yielding competitive results on fine-grained visual benchmarks.

Unlocking Dense Metric Depth Estimation in VLMs

cs.CV · 2026-05-15 · unverdicted · novelty 6.0 · 2 refs

DepthVLM converts a standard VLM into a dense metric depth predictor by attaching a lightweight head and training under unified vision-text supervision, outperforming prior VLMs and some pure vision models on a new indoor-outdoor benchmark.

Anisotropic Modality Align

cs.MM · 2026-05-08 · unverdicted · novelty 6.0

Modality representations share dominant semantic geometry but have an anisotropic residual gap; AnisoAlign corrects source representations boundedly using target geometry for unpaired alignment.

citing papers explorer

Showing 4 of 4 citing papers.

UHR-Micro: Diagnosing and Mitigating the Resolution Illusion in Earth Observation VLMs cs.CV · 2026-05-12 · unverdicted · none · ref 14
VLMs show a resolution illusion on UHR Earth observation imagery where higher resolution does not improve micro-target perception; UHR-Micro benchmark and MAP-Agent address this via evidence-centered active inspection.
Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation cs.CV · 2026-05-18 · unverdicted · none · ref 18
Vision-OPD transfers an MLLM's privileged regional perception to its full-image policy through on-policy token-level self-distillation, yielding competitive results on fine-grained visual benchmarks.
Unlocking Dense Metric Depth Estimation in VLMs cs.CV · 2026-05-15 · unverdicted · none · ref 30 · 2 links
DepthVLM converts a standard VLM into a dense metric depth predictor by attaching a lightweight head and training under unified vision-text supervision, outperforming prior VLMs and some pure vision models on a new indoor-outdoor benchmark.
Anisotropic Modality Align cs.MM · 2026-05-08 · unverdicted · none · ref 8
Modality representations share dominant semantic geometry but have an anisotropic residual gap; AnisoAlign corrects source representations boundedly using target geometry for unpaired alignment.

Evaluating object hallucination in large vision-language models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer