PAPO improves reasoning performance in diffusion LLMs by converting sparse terminal rewards into dense step-wise credit and replaying real high-uncertainty trajectories, reporting gains up to 42.2% on Countdown.
See Different, Think Better: Visual Variations Mitigating Hallucinations in LVLMs
2 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
AVEX-Prune is an RL-based audio-visual token pruning method using modality exchange that maintains near-full performance at 40% token retention on VILA 1.5-8B and VideoLLaMA 2.
citing papers explorer
- Audio-Visual Exchange-Aware Token Pruning for Efficient Audio-Visual Captioning