In: Findings of the 61st Annual Meeting of the Association for Computational Linguistics (2023), https://arxiv.org/abs/2212.10505

Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun · 2023 · arXiv 2212.10505

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

PlotChain: Deterministic Checkpointed Evaluation of Multimodal LLMs on Engineering Plot Reading

cs.AI · 2026-01-29 · conditional · novelty 7.0

PlotChain benchmark reports top MLLMs reaching ~80% field-level accuracy on engineering plot reading under human-like tolerances, but with persistent failures on frequency-domain tasks like bandpass and FFT spectra.

Making Multimodal LLMs Reliable Chart Data Extractors: A Benchmark and Training Framework

cs.HC · 2026-06-29 · unverdicted · novelty 6.0

Introduces a benchmark for MLLM-based chart data extraction from unlabeled images and a human-centered training framework that reaches SOTA numerical accuracy with a 7B model.

Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing

cs.CV · 2026-03-25 · conditional · novelty 6.0

PaddleOCR-VL uses a Valid Region Focus Module to select key visual tokens and a 0.9B model for guided recognition, delivering SOTA document parsing with far fewer tokens and parameters.

OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models

cs.CV · 2023-05-13 · accept · novelty 6.0

OCRBench provides the largest evaluation suite yet for OCR capabilities in large multimodal models, revealing gaps in multilingual, handwritten, and mathematical text handling.

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

cs.CV · 2024-09-03 · unverdicted · novelty 5.0

GOT is a unified end-to-end model that treats all man-made optical signals as characters and handles multiple OCR tasks including formatted output and interactive region recognition via prompts.

PaLI-X: On Scaling up a Multilingual Vision and Language Model

cs.CV · 2023-05-29 · unverdicted · novelty 4.0

Scaling a multilingual vision-language model in size and training breadth yields new state-of-the-art results on over 25 benchmarks plus emerging abilities in counting and multilingual detection.

citing papers explorer

Showing 6 of 6 citing papers.

PlotChain: Deterministic Checkpointed Evaluation of Multimodal LLMs on Engineering Plot Reading

cs.AI · 2026-01-29 · conditional · none · ref 17

Making Multimodal LLMs Reliable Chart Data Extractors: A Benchmark and Training Framework

cs.HC · 2026-06-29 · unverdicted · none · ref 36

Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing

cs.CV · 2026-03-25 · conditional · none · ref 25

OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models

cs.CV · 2023-05-13 · accept · none · ref 120

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

cs.CV · 2024-09-03 · unverdicted · none · ref 22

PaLI-X: On Scaling up a Multilingual Vision and Language Model

cs.CV · 2023-05-29 · unverdicted · none · ref 8
