Proceedings of the 30th ACM international conference on multimedia , pages=

Layoutlmv3: Pre-training for document ai with unified text, image masking , author=

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

representative citing papers

Beyond Bag-of-Patches: Learning Global Layout via Textual Supervision for Late-Interaction Visual Document Retrieval

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

A text-supervised global layout embedding augments local patch representations in late-interaction VDR, yielding +2.4 nDCG@5 and +2.3 MAP@5 gains over ColPali/ColQwen baselines on ViDoRe-v2.

DocAtlas: Multilingual Document Understanding Across 80+ Languages

cs.CL · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

DocAtlas introduces model-free rendering pipelines to create DocTag-annotated datasets across 82 languages and shows DPO adaptation improves multilingual performance without base-language degradation.

CC-OCR V2: Benchmarking Large Multimodal Models for Literacy in Real-world Document Processing

cs.CL · 2026-05-05 · unverdicted · novelty 6.0

CC-OCR V2 reveals that state-of-the-art large multimodal models substantially underperform on challenging real-world document processing tasks.

citing papers explorer

Showing 3 of 3 citing papers.

Beyond Bag-of-Patches: Learning Global Layout via Textual Supervision for Late-Interaction Visual Document Retrieval cs.CV · 2026-05-08 · unverdicted · none · ref 20
A text-supervised global layout embedding augments local patch representations in late-interaction VDR, yielding +2.4 nDCG@5 and +2.3 MAP@5 gains over ColPali/ColQwen baselines on ViDoRe-v2.
DocAtlas: Multilingual Document Understanding Across 80+ Languages cs.CL · 2026-05-12 · unverdicted · none · ref 72 · 2 links
DocAtlas introduces model-free rendering pipelines to create DocTag-annotated datasets across 82 languages and shows DPO adaptation improves multilingual performance without base-language degradation.
CC-OCR V2: Benchmarking Large Multimodal Models for Literacy in Real-world Document Processing cs.CL · 2026-05-05 · unverdicted · none · ref 40
CC-OCR V2 reveals that state-of-the-art large multimodal models substantially underperform on challenging real-world document processing tasks.

Proceedings of the 30th ACM international conference on multimedia , pages=

fields

years

verdicts

representative citing papers

citing papers explorer