DMLR performs dynamic visual-textual interleaving in latent space using confidence-guided latent policy gradient optimization and a dynamic visual injection strategy, yielding improved multimodal reasoning on benchmarks.
Reducing hallucinations in large vision-language models via latent space steering
2 Pith papers cite this work.
Representative citing papers
Action Hallucination in Generative Vision-Language-Action Models
Generative VLAs hallucinate physically invalid actions due to topological, precision, and horizon mismatches between model architectures and feasible robot behavior.