
Nougat: Neural Optical Understanding for Academic Documents

Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic · 2023 · cs.LG · arXiv 2308.13418

Mixed citation behavior. The most common role is background (56% of classified citations).

34 Pith papers cite it.


abstract

Scientific knowledge is predominantly stored in books and scientific journals, often in the form of PDFs. However, the PDF format leads to a loss of semantic information, particularly for mathematical expressions. We propose Nougat (Neural Optical Understanding for Academic Documents), a Visual Transformer model that performs an Optical Character Recognition (OCR) task for processing scientific documents into a markup language, and demonstrate the effectiveness of our model on a new dataset of scientific documents. The proposed approach offers a promising solution to enhance the accessibility of scientific knowledge in the digital age, by bridging the gap between human-readable documents and machine-readable text. We release the models and code to accelerate future work on scientific text recognition.


citation-role summary

background: 6 · method: 2 · baseline: 1

citation-polarity summary

background: 5 · use method: 2 · baseline: 1 · unclear: 1

representative citing papers

MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

cs.AI · 2026-04-20 · accept · novelty 8.0

MathNet delivers the largest multilingual Olympiad math dataset and benchmarks where models like Gemini-3.1-Pro reach 78% on solving but embedding models struggle on equivalent problem retrieval, with retrieval augmentation yielding up to 12% gains.

A document is worth a structured record: Principled inductive bias design for document recognition

cs.CV · 2025-07-11 · unverdicted · novelty 8.0

Introduces a method to design structure-specific relational inductive biases for a base transformer architecture, enabling end-to-end transcription of documents with intrinsic structures, demonstrated on sheet music, shape drawings, and mechanical engineering drawings.

Structured Layout Priors for Robust Out-of-Distribution Visual Document Understanding

cs.CV · 2026-05-19 · conditional · novelty 7.0

Injecting pre-computed layout priors from RT-DETR into VLM prompts raises markdown F1 from 0.37 to 0.92 on a 10k-page OOD benchmark and cuts infinite-loop failures across domains.

PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

PaperFit uses rendered page images in a closed loop to diagnose and repair typesetting defects in LaTeX documents, outperforming baselines on a new benchmark of 200 papers.

ShredBench: Evaluating the Semantic Reasoning Capabilities of Multimodal LLMs in Document Reconstruction

cs.CV · 2026-04-26 · unverdicted · novelty 7.0

ShredBench shows state-of-the-art MLLMs perform well on intact documents but suffer sharp drops in restoration accuracy as fragmentation increases to 8-16 pieces, indicating insufficient cross-modal semantic reasoning for VRDU.

MasterSet: A Large-Scale Benchmark for Must-Cite Citation Recommendation in the AI/ML Literature

cs.IR · 2026-04-20 · unverdicted · novelty 7.0

MasterSet is a new large-scale benchmark for must-cite citation recommendation in AI/ML, using LLM-annotated tiers on 150k papers and Recall@K evaluation.

The Shrinking Lifespan of LLMs in Science

cs.DL · 2026-04-08 · unverdicted · novelty 7.0

LLM adoption in science follows a compressing inverted-U trajectory where release year predicts time-to-peak and lifespan better than model attributes.

MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

A fixed 1.2B model trained via diversity-aware sampling, cross-model verification, annotation refinement, and progressive stages achieves new state-of-the-art document parsing accuracy of 95.69 on OmniDocBench v1.6.

CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding

cs.CL · 2026-02-02 · unverdicted · novelty 7.0

Multimodal LLMs process code as images to achieve up to 8x token compression, with visual cues like syntax highlighting aiding tasks and clone detection remaining resilient or even improving under compression.

Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR

cs.CV · 2025-04-15 · conditional · novelty 7.0

Consensus Entropy measures inter-VLM output agreement to verify OCR reliability and enable self-improving ensembles, yielding 42.1% F1 gains over single-model judging.

Semantic-Guided Reading Order Reconstruction in Historical Armenian Newspapers with LLMs

cs.CV · 2026-07-01 · unverdicted · novelty 6.0

A hybrid semantic-LLM method for reading order reconstruction in historical Armenian newspapers outperforms baselines on a new 66-page dataset; the authors also release a specialized Tesseract OCR model.

Invoice Haystack: Benchmarking Document Retrieval and Visual Question Answering Under Strong Visual Homogeneity

cs.CV · 2026-06-24 · unverdicted · novelty 6.0

Presents the Invoice Haystack benchmark for homogeneous document retrieval and a VL-RAG hybrid framework that achieves 60% Recall@1 and gains of up to 13.5 points over prior methods.

Any2Poster: Any-Source Poster Generation Across Modalities and Domains

cs.CV · 2026-06-01 · unverdicted · novelty 6.0

Any2Poster Bench tests poster generation from 8 modalities and 5 domains using quizzes and VLM judgments; Any2Poster Agent reaches 87% accuracy and beats prior paper-only methods.

Ryze: Evidence-Enriched Data Synthesis from Biomedical Papers

cs.AI · 2026-05-30 · unverdicted · novelty 6.0

Ryze automates evidence-enriched QA synthesis from biomedical papers to produce BioVLM-8B, which reaches 48.0% weighted accuracy on LAB-Bench (+12.6pp over base, +3.8pp over GPT-5.2) at under $200 cost.

MPDocBench-Parse: Benchmarking Practical Multi-page Document Parsing

cs.AI · 2026-05-21 · unverdicted · novelty 6.0 · 2 refs

MPDocBench-Parse provides 433 annotated multi-page documents and an evaluation protocol covering text/table/formula extraction, merging, figure extraction, reading order, and heading hierarchy for realistic document parsing.

UniVL: Unified Vision-Language Embedding for Spatially Grounded Contextual Image Generation

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

UniVL unifies vision and language into one mask-rendered input processed by an OCR backbone to condition diffusion models for spatially grounded image generation without a standalone text encoder.

DocAtlas: Multilingual Document Understanding Across 80+ Languages

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

DocAtlas introduces model-free rendering pipelines to create DocTag-annotated datasets across 82 languages and shows DPO adaptation improves multilingual performance without base-language degradation.

Scientific Graphics Program Synthesis via Dual Self-Consistency Reinforcement Learning

cs.CV · 2026-04-07 · unverdicted · novelty 6.0

SciTikZer-8B uses a new dataset, benchmark, and dual self-consistency RL to generate TikZ code for scientific graphics, outperforming much larger models like Gemini-2.5-Pro.

AdaQE-CG: Adaptive Query Expansion for Web-Scale Generative AI Model and Data Card Generation

cs.AI · 2026-03-16 · unverdicted · novelty 6.0

AdaQE-CG uses context-aware adaptive query expansion and inter-card knowledge transfer from a MetaGAI Pool to generate higher-quality model and data cards than prior methods, validated on the new expert-annotated MetaGAI-Bench.

SciPostGen: Bridging the Gap between Scientific Papers and Poster Layouts

cs.CV · 2025-11-27 · conditional · novelty 6.0

SciPostGen supplies a paired dataset linking paper structure to poster layouts and shows that retrieval of matching layouts improves generation while respecting user constraints.

DeepSeek-OCR: Contexts Optical Compression

cs.CV · 2025-10-21 · unverdicted · novelty 6.0

DeepSeek-OCR compresses text contexts up to 20x via 2D optical mapping while achieving 97% OCR accuracy below 10x and 60% at 20x, outperforming prior OCR tools with fewer vision tokens.

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

cs.CV · 2025-09-26 · unverdicted · novelty 6.0

MinerU2.5 uses a two-stage decoupled vision-language architecture to achieve state-of-the-art document parsing accuracy with lower computational overhead than existing general and domain-specific models.

GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

cs.CV · 2025-07-01 · unverdicted · novelty 6.0

GLM-4.5V reaches state-of-the-art results on 42 multimodal benchmarks among open-source models of similar size by applying reinforcement learning with curriculum sampling to a strong vision foundation model.

An AI-ready, Polarized Electron-Positron Collision Dataset

hep-ex · 2026-05-29 · unverdicted · novelty 5.0

Release of an AI-ready dataset containing approximately 660,000 reconstructed polarized e+e- collision events at 91.2 GeV from the SLD experiment, translated from legacy formats with accompanying digitized documentation.

citing papers explorer

Showing 2 of 2 citing papers after filters.

CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding

cs.CL · 2026-02-02 · unverdicted · none · ref 14 · internal anchor

Multimodal LLMs process code as images to achieve up to 8x token compression, with visual cues like syntax highlighting aiding tasks and clone detection remaining resilient or even improving under compression.

GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

cs.CV · 2025-07-01 · unverdicted · none · ref 5 · internal anchor

GLM-4.5V reaches state-of-the-art results on 42 multimodal benchmarks among open-source models of similar size by applying reinforcement learning with curriculum sampling to a strong vision foundation model.
