Unimernet: A universal network for real-world mathematical expression recognition

Bin Wang, Zhuangcheng Gu, Chao Xu, Bo Zhang, Botian Shi, Conghui He · 2024 · arXiv 2404.15254

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

read on arXiv browse 7 citing papers

citation-role summary

background 1 dataset 1

citation-polarity summary

background 1 use dataset 1

representative citing papers

MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

A fixed 1.2B model trained via diversity-aware sampling, cross-model verification, annotation refinement, and progressive stages achieves new state-of-the-art document parsing accuracy of 95.69 on OmniDocBench v1.6.

Q-Mask: Query-driven Causal Masks for Text Anchoring in OCR-Oriented Vision-Language Models

cs.CV · 2026-03-31 · unverdicted · novelty 7.0

Q-Mask uses query-conditioned causal masks to separate text location from recognition in OCR VLMs, backed by a new benchmark and 26M-pair training dataset.

Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing

cs.CV · 2026-03-25 · conditional · novelty 6.0

PaddleOCR-VL uses a Valid Region Focus Module to select key visual tokens and a 0.9B model for guided recognition, delivering SOTA document parsing with far fewer tokens and parameters.

Towards Real-World Document Parsing via Realistic Scene Synthesis and Document-Aware Training

cs.CV · 2026-03-25 · unverdicted · novelty 6.0

A realistic scene synthesis strategy and document-aware training recipe enable a 1B-parameter MLLM to achieve superior accuracy and robustness in end-to-end parsing of real-world captured documents.

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

cs.CV · 2025-09-26 · unverdicted · novelty 6.0

MinerU2.5 uses a two-stage decoupled vision-language architecture to achieve state-of-the-art document parsing accuracy with lower computational overhead than existing general and domain-specific models.

MinerU: An Open-Source Solution for Precise Document Content Extraction

cs.CV · 2024-09-27 · conditional · novelty 4.0

MinerU delivers an open-source pipeline for high-precision document content extraction by integrating specialized models with tuned preprocessing and postprocessing rules.

Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction

cs.MM · 2024-10-28 · unverdicted · novelty 3.0

Survey proposing a taxonomy for document parsing into pipeline-based systems and VLM-driven unified models, reviewing components, metrics, benchmarks, and challenges.

citing papers explorer

Showing 7 of 7 citing papers.

MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale cs.CV · 2026-04-06 · unverdicted · none · ref 34
A fixed 1.2B model trained via diversity-aware sampling, cross-model verification, annotation refinement, and progressive stages achieves new state-of-the-art document parsing accuracy of 95.69 on OmniDocBench v1.6.
Q-Mask: Query-driven Causal Masks for Text Anchoring in OCR-Oriented Vision-Language Models cs.CV · 2026-03-31 · unverdicted · none · ref 34
Q-Mask uses query-conditioned causal masks to separate text location from recognition in OCR VLMs, backed by a new benchmark and 26M-pair training dataset.
Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing cs.CV · 2026-03-25 · conditional · none · ref 50
PaddleOCR-VL uses a Valid Region Focus Module to select key visual tokens and a 0.9B model for guided recognition, delivering SOTA document parsing with far fewer tokens and parameters.
Towards Real-World Document Parsing via Realistic Scene Synthesis and Document-Aware Training cs.CV · 2026-03-25 · unverdicted · none · ref 44
A realistic scene synthesis strategy and document-aware training recipe enable a 1B-parameter MLLM to achieve superior accuracy and robustness in end-to-end parsing of real-world captured documents.
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing cs.CV · 2025-09-26 · unverdicted · none · ref 45
MinerU2.5 uses a two-stage decoupled vision-language architecture to achieve state-of-the-art document parsing accuracy with lower computational overhead than existing general and domain-specific models.
MinerU: An Open-Source Solution for Precise Document Content Extraction cs.CV · 2024-09-27 · conditional · none · ref 32
MinerU delivers an open-source pipeline for high-precision document content extraction by integrating specialized models with tuned preprocessing and postprocessing rules.
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction cs.MM · 2024-10-28 · unverdicted · none · ref 238
Survey proposing a taxonomy for document parsing into pipeline-based systems and VLM-driven unified models, reviewing components, metrics, benchmarks, and challenges.

Unimernet: A universal network for real-world mathematical expression recognition

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer