AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding. arXiv preprint arXiv:2502.01341.
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (2). Years: 2026 (2). Verdicts: UNVERDICTED (2). Representative citing papers: 2.
Citing papers:
- Reflection Anchors for Propagation-Aware Visual Retention in Long-Chain Multimodal Reasoning: RAPO uses an information-theoretic lower bound on visual gain to select high-entropy reflection anchors and optimizes a chain-masked KL surrogate, delivering gains over baselines on reasoning benchmarks across LVLM backbones.
- When Language Overwrites Vision: Over-Alignment and Geometric Debiasing in Vision-Language Models: Decoder-based VLMs hallucinate due to geometric over-alignment of visual embeddings with the text manifold in a universal dataset-agnostic subspace, mitigated by projecting out the linguistic bias.
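The debiasing step summarized in the second paper, projecting the linguistic-bias component out of visual embeddings, amounts to an orthogonal subspace projection. Below is a minimal sketch of that generic operation; the function name `debias`, the array shapes, and the assumption that the bias subspace is supplied as a small set of direction vectors are illustrative choices, not details taken from the paper.

```python
import numpy as np

def debias(V, B):
    """Remove from each embedding its component inside a bias subspace.

    V: (n, d) array of visual embeddings.
    B: (k, d) array of estimated linguistic-bias directions (k << d).
    Returns: (n, d) embeddings orthogonal to span(B).
    """
    # Orthonormal basis Q (d, k) for the subspace spanned by the bias directions.
    Q, _ = np.linalg.qr(B.T)
    # Subtract the projection of each embedding onto that subspace.
    return V - (V @ Q) @ Q.T
```

Because the result is orthogonal to every bias direction, applying `debias` a second time is a no-op, which makes the operation safe to compose with other post-hoc embedding transforms.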