hub

AMBER: An LLM-free multi-dimensional benchmark for MLLMs hallucination evaluation

Amber: An llm-free multi-dimensional benchmark for mllms hallucination evaluation , author= · 2023 · arXiv 2311.07397

16 Pith papers cite this work. Polarity classification is still indexing.

16 Pith papers citing it

read on arXiv browse 16 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

AVID: A Benchmark for Omni-Modal Audio-Visual Inconsistency Understanding via Agent-Driven Construction

cs.MM · 2026-04-15 · unverdicted · novelty 8.0

AVID is the first large-scale benchmark for audio-visual inconsistency detection, grounding, classification, and reasoning in long videos, constructed via agent-driven methods and showing that state-of-the-art models struggle while a fine-tuned baseline improves performance.

OxyEcomBench: Benchmarking Multimodal Foundation Models across E-Commerce Ecosystems

cs.DB · 2026-05-13 · conditional · novelty 7.0

OxyEcomBench is a unified multimodal benchmark covering 6 capability areas and 29 tasks with authentic e-commerce data to measure how well foundation models handle real platform, merchant, and customer challenges.

DO-Bench: An Attributable Benchmark for Diagnosing Object Hallucination in Vision-Language Models

cs.CV · 2026-04-18 · unverdicted · novelty 7.0

DO-Bench is a controlled benchmark that attributes VLM object hallucination errors to textual prior pressure, perceptual limits, or their interaction via two diagnostic dimensions and metrics.

Dual-Pathway Circuits of Object Hallucination in Vision-Language Models

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

Vision-language models contain identifiable grounding and hallucination pathways; suppressing the latter reduces object hallucinations by up to 76% while preserving accuracy.

Learning to See What You Need: Gaze Attention for Multimodal Large Language Models

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

Gaze Attention groups visual embeddings into selectable regions and dynamically restricts attention to task-relevant ones, matching dense baselines with up to 90% fewer visual KV entries via added context tokens.

Vocabulary Hijacking in LVLMs: Unveiling Critical Attention Heads by Excluding Inert Tokens to Mitigate Hallucination

cs.MM · 2026-05-11 · unverdicted · novelty 6.0

LVLMs show vocabulary hijacking by inert tokens that decode to hijacking anchors; HABI locates them, NHAR finds resilient heads, and HAVAE boosts those heads to cut hallucinations.

Through the Lens of Character: Resolving Modality-Role Interference in Multimodal Role-Playing Agent

cs.CV · 2026-05-10 · unverdicted · novelty 6.0

CAVI framework uses character-guided token pruning, orthogonal feature modulation, and modality-adaptive role steering to resolve modality-role interference in multimodal RPAs.

R-CoV: Region-Aware Chain-of-Verification for Alleviating Object Hallucinations in LVLMs

cs.CV · 2026-04-22 · conditional · novelty 6.0

R-CoV is a six-step region-aware chain-of-verification technique that elicits coordinate and description outputs from LVLMs themselves to detect and reduce object hallucinations without external models or retraining.

Mitigating Multimodal Hallucination via Phase-wise Self-reward

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

PSRD mitigates visual hallucinations in LVLMs via phase-wise self-reward decoding, cutting rates by 50% on LLaVA-1.5-7B and outperforming prior methods on five benchmarks.

SIF: Semantically In-Distribution Fingerprints for Large Vision-Language Models

cs.CV · 2026-04-18 · unverdicted · novelty 6.0

SIF creates semantically in-distribution fingerprints for LVLMs by distilling text watermarks into visual inputs and optimizing for robustness against detection and modification.

Uncertainty-Aware Exploratory Direct Preference Optimization for Multimodal Large Language Models

cs.LG · 2026-05-06 · unverdicted · novelty 5.0

UE-DPO quantifies epistemic uncertainty from grounding failures to direct more learning pressure on hard visual tokens in preferred samples while easing penalties on dispreferred ones.

Mitigating Hallucinations in Large Vision-Language Models without Performance Degradation

cs.CV · 2026-04-22 · unverdicted · novelty 5.0

MPD reduces hallucinations in LVLMs by 23.4% while retaining 97.4% of general capability through semantic disentanglement and selective parameter updates.

Mitigating Entangled Steering in Large Vision-Language Models for Hallucination Reduction

cs.CV · 2026-04-09 · unverdicted · novelty 5.0

MESA reduces hallucinations in LVLMs via controlled selective latent intervention that preserves the original token distribution.

Steering the Verifiability of Multimodal AI Hallucinations

cs.AI · 2026-04-08 · unverdicted · novelty 5.0

Researchers create a human-labeled dataset of obvious and elusive multimodal hallucinations and use learned activation-space probes to control their verifiability in MLLMs.

Hallucination of Multimodal Large Language Models: A Survey

cs.CV · 2024-04-29 · accept · novelty 5.0

The survey organizes causes of hallucinations in MLLMs, reviews evaluation benchmarks and metrics, and outlines mitigation approaches plus open questions.

A Survey on Hallucination in Large Vision-Language Models

cs.CV · 2024-02-01 · unverdicted · novelty 3.0

This survey reviews the definition, symptoms, evaluation benchmarks, root causes, and mitigation methods for hallucinations in large vision-language models.

citing papers explorer

Showing 16 of 16 citing papers.

AVID: A Benchmark for Omni-Modal Audio-Visual Inconsistency Understanding via Agent-Driven Construction cs.MM · 2026-04-15 · unverdicted · none · ref 33
AVID is the first large-scale benchmark for audio-visual inconsistency detection, grounding, classification, and reasoning in long videos, constructed via agent-driven methods and showing that state-of-the-art models struggle while a fine-tuned baseline improves performance.
OxyEcomBench: Benchmarking Multimodal Foundation Models across E-Commerce Ecosystems cs.DB · 2026-05-13 · conditional · none · ref 31
OxyEcomBench is a unified multimodal benchmark covering 6 capability areas and 29 tasks with authentic e-commerce data to measure how well foundation models handle real platform, merchant, and customer challenges.
DO-Bench: An Attributable Benchmark for Diagnosing Object Hallucination in Vision-Language Models cs.CV · 2026-04-18 · unverdicted · none · ref 34
DO-Bench is a controlled benchmark that attributes VLM object hallucination errors to textual prior pressure, perceptual limits, or their interaction via two diagnostic dimensions and metrics.
Dual-Pathway Circuits of Object Hallucination in Vision-Language Models cs.CV · 2026-05-13 · unverdicted · none · ref 31
Vision-language models contain identifiable grounding and hallucination pathways; suppressing the latter reduces object hallucinations by up to 76% while preserving accuracy.
Learning to See What You Need: Gaze Attention for Multimodal Large Language Models cs.CV · 2026-05-13 · unverdicted · none · ref 107
Gaze Attention groups visual embeddings into selectable regions and dynamically restricts attention to task-relevant ones, matching dense baselines with up to 90% fewer visual KV entries via added context tokens.
Vocabulary Hijacking in LVLMs: Unveiling Critical Attention Heads by Excluding Inert Tokens to Mitigate Hallucination cs.MM · 2026-05-11 · unverdicted · none · ref 122
LVLMs show vocabulary hijacking by inert tokens that decode to hijacking anchors; HABI locates them, NHAR finds resilient heads, and HAVAE boosts those heads to cut hallucinations.
Through the Lens of Character: Resolving Modality-Role Interference in Multimodal Role-Playing Agent cs.CV · 2026-05-10 · unverdicted · none · ref 27
CAVI framework uses character-guided token pruning, orthogonal feature modulation, and modality-adaptive role steering to resolve modality-role interference in multimodal RPAs.
R-CoV: Region-Aware Chain-of-Verification for Alleviating Object Hallucinations in LVLMs cs.CV · 2026-04-22 · conditional · none · ref 46
R-CoV is a six-step region-aware chain-of-verification technique that elicits coordinate and description outputs from LVLMs themselves to detect and reduce object hallucinations without external models or retraining.
Mitigating Multimodal Hallucination via Phase-wise Self-reward cs.CV · 2026-04-20 · unverdicted · none · ref 47
PSRD mitigates visual hallucinations in LVLMs via phase-wise self-reward decoding, cutting rates by 50% on LLaVA-1.5-7B and outperforming prior methods on five benchmarks.
SIF: Semantically In-Distribution Fingerprints for Large Vision-Language Models cs.CV · 2026-04-18 · unverdicted · none · ref 39
SIF creates semantically in-distribution fingerprints for LVLMs by distilling text watermarks into visual inputs and optimizing for robustness against detection and modification.
Uncertainty-Aware Exploratory Direct Preference Optimization for Multimodal Large Language Models cs.LG · 2026-05-06 · unverdicted · none · ref 33
UE-DPO quantifies epistemic uncertainty from grounding failures to direct more learning pressure on hard visual tokens in preferred samples while easing penalties on dispreferred ones.
Mitigating Hallucinations in Large Vision-Language Models without Performance Degradation cs.CV · 2026-04-22 · unverdicted · none · ref 169
MPD reduces hallucinations in LVLMs by 23.4% while retaining 97.4% of general capability through semantic disentanglement and selective parameter updates.
Mitigating Entangled Steering in Large Vision-Language Models for Hallucination Reduction cs.CV · 2026-04-09 · unverdicted · none · ref 44
MESA reduces hallucinations in LVLMs via controlled selective latent intervention that preserves the original token distribution.
Steering the Verifiability of Multimodal AI Hallucinations cs.AI · 2026-04-08 · unverdicted · none · ref 35
Researchers create a human-labeled dataset of obvious and elusive multimodal hallucinations and use learned activation-space probes to control their verifiability in MLLMs.
Hallucination of Multimodal Large Language Models: A Survey cs.CV · 2024-04-29 · accept · none · ref 164
The survey organizes causes of hallucinations in MLLMs, reviews evaluation benchmarks and metrics, and outlines mitigation approaches plus open questions.
A Survey on Hallucination in Large Vision-Language Models cs.CV · 2024-02-01 · unverdicted · none · ref 44
This survey reviews the definition, symptoms, evaluation benchmarks, root causes, and mitigation methods for hallucinations in large vision-language models.

AMBER: An LLM-free multi-dimensional benchmark for MLLMs hallucination evaluation

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer