MMDocBench: Benchmarking large vision-language models for fine-grained visual document understanding. arXiv preprint arXiv:2410.21311, 2024

Fengbin Zhu, Ziyang Liu, Xiang Yao Ng, Haohui Wu, Wenjie Wang, Fuli Feng, Chao Wang, Huanbo Luan, Tat Seng Chua · 2024 · arXiv 2410.21311

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

PIXELRAG: Web Screenshots Beat Text for Retrieval-Augmented Generation

cs.IR · 2026-06-01 · unverdicted · novelty 7.0

PixelRAG shows that operating RAG entirely over web screenshots outperforms text-based retrieval on NQ, SimpleQA, MMSearch, LiveVQA, and MoNaCo, with up to 18.1% accuracy gains and 3x token savings via image compression.

citing papers explorer

Showing 1 of 1 citing paper.

PIXELRAG: Web Screenshots Beat Text for Retrieval-Augmented Generation

cs.IR · 2026-06-01 · unverdicted · none · ref 46

