PureDocBench shows document parsing is far from solved, with top models at ~74/100, small specialists competing with large VLMs, and ranking reversals under real degradation.
OCR-Reasoning benchmark: Unveiling the true capabilities of MLLMs in complex text-rich image reasoning
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3representative citing papers
Chronicles-OCR is the first benchmark with 2,800 images across the complete evolutionary trajectory of Chinese characters, defining four tasks to evaluate VLLMs' cross-temporal visual perception.
Nemotron 3 Nano Omni is an efficient open multimodal model supporting audio, text, images, and video with reported accuracy gains and leading results on document understanding and long audio-video tasks.
citing papers explorer
-
How Far Is Document Parsing from Solved? PureDocBench: A Source-TraceableBenchmark across Clean, Degraded, and Real-World Settings
PureDocBench shows document parsing is far from solved, with top models at ~74/100, small specialists competing with large VLMs, and ranking reversals under real degradation.
-
Chronicles-OCR: A Cross-Temporal Perception Benchmark for the Evolutionary Trajectory of Chinese Characters
Chronicles-OCR is the first benchmark with 2,800 images across the complete evolutionary trajectory of Chinese characters, defining four tasks to evaluate VLLMs' cross-temporal visual perception.
-
Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence
Nemotron 3 Nano Omni is an efficient open multimodal model supporting audio, text, images, and video with reported accuracy gains and leading results on document understanding and long audio-video tasks.