MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm
9 Pith papers cite this work.
citing papers explorer
- How Far Is Document Parsing from Solved? PureDocBench: A Source-Traceable Benchmark across Clean, Degraded, and Real-World Settings
  PureDocBench shows document parsing is far from solved: top models score only ~74/100, small specialist models compete with large VLMs, and rankings reverse under real-world degradation.
- GlotOCR Bench: OCR Models Still Struggle Beyond a Handful of Unicode Scripts
  GlotOCR Bench shows that OCR models perform well on fewer than 10 scripts and fail to generalize beyond about 30; results track pretraining coverage, and models hallucinate text in known scripts when shown unfamiliar ones.
- The Character Error Vector: Decomposable errors for page-level OCR evaluation
  The Character Error Vector is a decomposable bag-of-characters evaluator for page-level OCR that remains defined under parsing errors and bridges parsing metrics with local CER.
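As a rough illustration of the bag-of-characters idea, the sketch below compares character multisets between a reference page and an OCR hypothesis. The function names and the length normalization are illustrative assumptions, not the paper's exact definition; the point it demonstrates is that an order-free measure stays defined even when reading order or segmentation is parsed wrongly.

```python
from collections import Counter

def char_error_vector(reference: str, hypothesis: str) -> Counter:
    """Per-character signed count differences (hypothesis minus reference).

    Only character multisets are compared, so the result is well-defined
    even when the page's reading order or block structure is wrong.
    """
    diff = Counter(hypothesis)
    diff.subtract(Counter(reference))
    return diff

def aggregate_error(reference: str, hypothesis: str) -> float:
    """One illustrative scalar: total absolute count mismatch,
    normalized by reference length (an assumption, not the paper's metric)."""
    diff = char_error_vector(reference, hypothesis)
    return sum(abs(v) for v in diff.values()) / max(len(reference), 1)

# Reordered but character-identical text scores 0.0 here,
# whereas an ordered CER would penalize it heavily.
print(aggregate_error("column one column two", "column two column one"))  # 0.0
```

A substitution still registers: replacing one character in a 4-character reference yields 2/4 = 0.5 under this normalization (one character missing, one extra).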
- MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale
  A fixed 1.2B model trained via diversity-aware sampling, cross-model verification, annotation refinement, and progressive stages achieves new state-of-the-art document parsing accuracy of 95.69 on OmniDocBench v1.6.
- DocAtlas: Multilingual Document Understanding Across 80+ Languages
  DocAtlas creates multilingual document datasets across 82 languages and shows DPO with rendered ground truth improves model accuracy by 1.7-1.9% without degrading base-language performance, unlike supervised fine-tuning.
- InstructTable: Improving Table Structure Recognition Through Instructions
  InstructTable combines instruction-guided pre-training on structural patterns with visual fine-tuning and a template-free synthetic data generator (TME) to reach state-of-the-art table structure recognition on public benchmarks and a new complex-table test set.
- Parser-Oriented Structural Refinement for a Stable Layout Interface in Document Parsing
  A parser-oriented refinement stage performs set-level reasoning on detector hypotheses to jointly decide instance retention, refine boxes, and set parser input order, cutting reading order errors to 0.024 on OmniDocBench.
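For context on a figure like 0.024, reading-order quality is commonly scored as a normalized edit distance between the predicted and ground-truth sequences of layout blocks. The sketch below assumes that common formulation; whether OmniDocBench uses exactly this definition is an assumption, not something the summary states.

```python
def reading_order_error(pred: list[str], gold: list[str]) -> float:
    """Normalized Levenshtein distance between predicted and ground-truth
    block orders (one common reading-order metric; assumed here, not
    taken from the paper). 0.0 = perfect order, 1.0 = fully wrong."""
    m, n = len(pred), len(gold)
    dp = list(range(n + 1))  # single-row DP over the gold sequence
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,            # delete pred[i-1]
                        dp[j - 1] + 1,        # insert gold[j-1]
                        prev + (pred[i - 1] != gold[j - 1]))  # substitute
            prev = cur
    return dp[n] / max(m, n, 1)

# Two adjacent blocks swapped out of four -> 2 substitutions / 4 blocks.
print(reading_order_error(["a", "b", "c", "d"], ["a", "c", "b", "d"]))  # 0.5
```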
- DeepSeek-OCR: Contexts Optical Compression
  DeepSeek-OCR compresses text contexts up to 20x via 2D optical mapping while achieving 97% OCR accuracy at compression ratios below 10x and about 60% at 20x, outperforming prior OCR tools while using fewer vision tokens.
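The compression ratios quoted above are just text-token-to-vision-token ratios; a toy calculation (with made-up token counts, used only to illustrate how the 10x and 20x regimes in the summary arise):

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """How many text tokens each vision token stands in for when a page
    of text is rendered and encoded optically."""
    return text_tokens / vision_tokens

# Hypothetical page: 2000 text tokens rendered into 100 vision tokens -> 20x.
print(compression_ratio(2000, 100))  # 20.0
# Per the summary: ~97% OCR accuracy holds below 10x, falling to ~60% at 20x.
```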
- DocSeeker: Structured Visual Reasoning with Evidence Grounding for Long Document Understanding
  DocSeeker improves long-document understanding in MLLMs via a two-stage training process that combines supervised fine-tuning from distilled data with evidence-aware group relative policy optimization and memory-efficient resolution allocation.