4 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
citing papers explorer
- Don't Let the Video Speak: Audio-Contrastive Preference Optimization for Audio-Visual Language Models
  Audio-Contrastive Preference Optimization (ACPO) mitigates audio hallucination in AVLMs via output-contrastive and input-contrastive objectives that enforce faithful audio grounding.
- Temporal Contrastive Decoding: A Training-Free Method for Large Audio-Language Models
  Temporal Contrastive Decoding mitigates temporal smoothing bias in unified large audio-language models by contrasting logits from original and blurred audio inputs during decoding, yielding consistent gains on MMAU and AIR-Bench.
- STEAR: Layer-Aware Spatiotemporal Evidence Intervention for Hallucination Mitigation in Video Large Language Models
  STEAR reduces spatial and temporal hallucinations in Video-LLMs via layer-aware evidence intervention from middle decoder layers in a single-encode pass.
- Distorted or Fabricated? A Survey on Hallucination in Video LLMs
  The survey organizes hallucinations in Vid-LLMs into dynamic distortion and content fabrication, reviews evaluation benchmarks and mitigation methods, and traces root causes to weak temporal modeling and visual grounding.
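
The Temporal Contrastive Decoding summary above mentions contrasting logits from original and blurred audio inputs during decoding. As a rough illustration of that general idea only, the sketch below assumes the commonly used contrastive form `(1 + alpha) * original - alpha * blurred`. The summary does not state the paper's actual formula, so the weighting, the `alpha` parameter, and the function names `contrastive_logits` and `pick_token` are hypothetical.

```python
def contrastive_logits(orig_logits, blurred_logits, alpha=1.0):
    """Combine per-token logits from the original and blurred inputs.

    Assumed form (not taken from the paper):
        (1 + alpha) * orig - alpha * blurred
    Tokens whose score depends on fine temporal detail (high with the
    original audio, lower with the blurred audio) are boosted.
    """
    if len(orig_logits) != len(blurred_logits):
        raise ValueError("logit vectors must have the same length")
    return [(1 + alpha) * o - alpha * b
            for o, b in zip(orig_logits, blurred_logits)]


def pick_token(orig_logits, blurred_logits, alpha=1.0):
    """Greedy choice: index of the highest contrastive score."""
    scores = contrastive_logits(orig_logits, blurred_logits, alpha)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy vocabulary of three tokens (made-up numbers for illustration).
orig = [2.0, 1.9, 0.1]      # logits given the original audio
blurred = [2.0, 0.5, 0.1]   # logits given the temporally blurred audio

plain_choice = pick_token(orig, blurred, alpha=0.0)        # ignores the blurred pass
contrastive_choice = pick_token(orig, blurred, alpha=1.0)  # contrasts the two passes
```

With these toy numbers, token 0 scores the same with or without blurring, so the contrast leaves it unchanged, while token 1 loses support once the audio is blurred and is therefore favored by the contrastive score.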