DocVQA: A Dataset for VQA on Document Images

Minesh Mathew, Dimosthenis Karatzas, CV Jawahar · 2021

2 Pith papers cite this work. Polarity classification has not been indexed yet.

representative citing papers

ViSurf: Visual Supervised-and-Reinforcement Fine-Tuning for Large Vision-and-Language Models

cs.CV · 2025-10-12 · unverdicted · novelty 6.0

ViSurf unifies SFT and RLVR for LVLMs in one training stage by injecting ground-truth labels into rollouts and applying novel reward controls, outperforming standalone and two-stage baselines on diverse benchmarks.

Efficient Long-Context Modeling in Diffusion Language Models via Block Approximate Sparse Attention

cs.CV · 2026-05-19 · unverdicted · novelty 5.0

BA-Att introduces pre-downsampled block selection with norm-sorting and diagonal covariance correction to approximate sparse attention, yielding up to 6.95x speedup at 50% sparsity across language, multimodal, and video models.

citing papers explorer

Showing 2 of 2 citing papers.

ViSurf: Visual Supervised-and-Reinforcement Fine-Tuning for Large Vision-and-Language Models

cs.CV · 2025-10-12 · unverdicted · none · ref 27

Efficient Long-Context Modeling in Diffusion Language Models via Block Approximate Sparse Attention

cs.CV · 2026-05-19 · unverdicted · none · ref 32
