MathNet delivers the largest multilingual Olympiad math dataset and accompanying benchmarks: models such as Gemini-3.1-Pro reach 78% on problem solving, yet embedding models struggle to retrieve equivalent problems, and retrieval augmentation yields gains of up to 12%.
arXiv preprint arXiv:2308.13418
13 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background (1)
citation-polarity summary: background (1)
citing papers explorer
- MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval
MathNet delivers the largest multilingual Olympiad math dataset and accompanying benchmarks: models such as Gemini-3.1-Pro reach 78% on problem solving, yet embedding models struggle to retrieve equivalent problems, and retrieval augmentation yields gains of up to 12%.
- PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents
PaperFit uses rendered page images in a closed loop to diagnose and repair typesetting defects in LaTeX documents, outperforming baselines on a new benchmark of 200 papers.
- ShredBench: Evaluating the Semantic Reasoning Capabilities of Multimodal LLMs in Document Reconstruction
ShredBench shows that state-of-the-art MLLMs perform well on intact documents but suffer sharp drops in restoration accuracy as fragmentation increases to 8-16 pieces, indicating insufficient cross-modal semantic reasoning for visually rich document understanding (VRDU).
- MasterSet: A Large-Scale Benchmark for Must-Cite Citation Recommendation in the AI/ML Literature
MasterSet is a new large-scale benchmark for must-cite citation recommendation in AI/ML, using LLM-annotated tiers on 150k papers and Recall@K evaluation.
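Recall@K, the evaluation metric MasterSet reports, measures what fraction of the must-cite papers a recommender surfaces in its top-K results. A minimal sketch (function and paper IDs are illustrative, not from the benchmark):

```python
def recall_at_k(ranked_candidates, relevant, k):
    """Fraction of the relevant (must-cite) items that appear
    in the top-k of the ranked candidate list."""
    if not relevant:
        return 0.0
    top_k = set(ranked_candidates[:k])
    return len(top_k & set(relevant)) / len(relevant)

# Toy example: 3 must-cite papers, 2 of them retrieved in the top 5.
ranked = ["p9", "p2", "p7", "p1", "p4", "p3"]
must_cite = ["p1", "p2", "p3"]
print(recall_at_k(ranked, must_cite, k=5))  # 2/3 ≈ 0.667
```

Averaging this quantity over all query papers gives the benchmark-level Recall@K score.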
- The Shrinking Lifespan of LLMs in Science
LLM adoption in science follows an inverted-U trajectory that is compressing over time, with a model's release year predicting its time-to-peak and lifespan better than its technical attributes do.
- MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale
A fixed-size 1.2B-parameter model, trained via diversity-aware sampling, cross-model verification, annotation refinement, and progressive training stages, achieves a new state-of-the-art document parsing score of 95.69 on OmniDocBench v1.6.
- DocAtlas: Multilingual Document Understanding Across 80+ Languages
DocAtlas creates multilingual document datasets across 82 languages and shows DPO with rendered ground truth improves model accuracy by 1.7-1.9% without degrading base-language performance, unlike supervised fine-tuning.
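The DPO objective mentioned above, in its standard form (this is the textbook Rafailov et al. loss, not DocAtlas's exact implementation), rewards a policy for widening its log-probability margin between a preferred output y_w and a dispreferred y_l relative to a frozen reference model:

```python
import math

def dpo_loss(logp_w_policy, logp_l_policy, logp_w_ref, logp_l_ref, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * (policy margin - reference margin)),
    where each margin is log p(preferred) - log p(dispreferred)."""
    margin = (logp_w_policy - logp_l_policy) - (logp_w_ref - logp_l_ref)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# The loss falls as the policy's preference margin grows past the reference's.
print(dpo_loss(-5.0, -9.0, -6.0, -7.0))  # margin = 3.0, loss ≈ 0.554
```

In the DocAtlas setting, the preference pairs would contrast outputs judged against rendered ground truth, but the pair-construction details are the paper's, not shown here.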
- Scientific Graphics Program Synthesis via Dual Self-Consistency Reinforcement Learning
SciTikZer-8B uses a new dataset, benchmark, and dual self-consistency RL to generate TikZ code for scientific graphics, outperforming much larger models like Gemini-2.5-Pro.
- DeepSeek-OCR: Contexts Optical Compression
DeepSeek-OCR compresses text contexts by up to 20x via 2D optical mapping, achieving about 97% OCR accuracy at compression ratios below 10x and about 60% at 20x, and outperforms prior OCR tools while using fewer vision tokens.
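The compression ratio in that summary is simply original text tokens per vision token used to encode the rendered page; a trivial sketch (the helper name and token counts are illustrative, the accuracy figures are the paper's headline numbers):

```python
def optical_compression_ratio(n_text_tokens: int, n_vision_tokens: int) -> float:
    """Ratio of original text tokens to the vision tokens encoding their rendering."""
    return n_text_tokens / n_vision_tokens

# e.g. 2,000 text tokens rendered to an image encoded in 100 vision tokens:
print(optical_compression_ratio(2000, 100))  # 20.0
# Reported tradeoff: ~97% OCR accuracy below 10x, ~60% at 20x.
```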
- GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
GLM-4.5V reaches state-of-the-art results on 42 multimodal benchmarks among open-source models of similar size by applying reinforcement learning with curriculum sampling to a strong vision foundation model.
- RESCORE: LLM-Driven Simulation Recovery in Control Systems Research Papers
RESCORE recovers task-coherent simulations from 40.7% of 500 CDC papers via a three-component LLM agent pipeline and claims a 10x speedup over manual human replication.
- DeepSeek-VL: Towards Real-World Vision-Language Understanding
DeepSeek-VL develops open-source 1.3B and 7B vision-language models that achieve competitive or state-of-the-art results on real-world visual-language benchmarks through diverse data curation, a hybrid vision encoder, and pretraining that preserves language capabilities.
- Fine-tuning DeepSeek-OCR-2 for Molecular Structure Recognition
MolSeek-OCR reaches exact SMILES matching accuracy comparable to leading image-to-sequence OCSR models after two-stage fine-tuning on PubChem renderings and USPTO-MOL patent images, but remains below image-to-graph state-of-the-art.
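Exact SMILES matching, the metric above, reduces to string equality over prediction/reference pairs, but only once both sides are canonicalized, since chemically equivalent molecules admit many valid SMILES spellings. Canonicalization itself needs a cheminformatics library such as RDKit; this sketch assumes pre-canonicalized inputs, and the example strings are illustrative:

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predicted SMILES strings identical to the reference.
    Assumes both sides were canonicalized beforehand."""
    assert len(predictions) == len(references)
    if not references:
        return 0.0
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)

preds = ["CCO", "c1ccccc1", "CC(=O)O"]
refs  = ["CCO", "c1ccccc1", "CC(=O)OC"]
print(exact_match_accuracy(preds, refs))  # 2/3 ≈ 0.667
```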