pith. sign in

arxiv: 2606.29808 · v1 · pith:M7K6O6G2new · submitted 2026-06-29 · 💻 cs.HC · cs.AI

Making Multimodal LLMs Reliable Chart Data Extractors: A Benchmark and Training Framework

Pith reviewed 2026-06-30 05:27 UTC · model grok-4.3

classification 💻 cs.HC cs.AI
keywords multimodal LLMschart data extractionbenchmarkprogressive trainingnumerical accuracymixed-initiative systems
0
0 comments X

The pith

A training framework based on progressive human-like learning enables 7B multimodal models to extract precise chart data at state-of-the-art levels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates a benchmark using diverse real-world charts lacking data labels to assess multimodal large language models' ability to extract data tables. While these models reconstruct table structures reliably, they fall short on recovering exact numerical values. The authors introduce a training framework that follows a progressive process akin to human chart reading to address this shortfall. This method boosts numerical accuracy, letting a 7B model achieve top performance. User studies confirm the model aids mixed-initiative extraction workflows.

Core claim

Chart data extraction from images should mimic the progressive learning process humans use when reading charts. Implementing this in a training framework for multimodal LLMs substantially improves the recovery of precise numerical values from unlabeled charts, resulting in state-of-the-art performance from a 7B-parameter model on a new benchmark of real-world charts.

What carries the argument

The progressive training framework that models chart reading as a step-by-step human-like process to improve value recovery.

Load-bearing premise

That modeling chart reading as a progressive human-like learning process will reliably close the gap in precise value recovery on the authors' benchmark of unlabeled real-world charts.

What would settle it

Running the progressive training on the benchmark and finding no significant gain in numerical accuracy over baseline fine-tuning would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.29808 by Dazhen Deng, Kuilin Peng, Liqi Cheng, Peizhi Ying, Yingcai Wu, Yuan Tian, Yuchen He.

Figure 1
Figure 1. Figure 1: Chart data extraction aims to recover the underlying data table from a chart image. Although existing interactive [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Prompt creation in ExChart-Bench. We provide the [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: The benchmark dataset construction pipeline. We collected both real-world and synthetic chart sources, removed data [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Distribution of benchmark samples and evaluation results of MLLMs across different chart types. (A) Distribution [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Trends in Adaptive MAPE relative to the position [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Common failure cases of MLLMs on our benchmark: (A) Sudden changes in value on line charts. (B) Pie slices exceeding [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Success examples, where each chart achieves less than 1% average Adaptive MAPE across Gemini 2.5 Flash, GLM-4.5V, and GPT-4.1. between marks to infer values, leading to error accumulation in sequential outputs. By explicitly training models to reason about co￾ordinate geometry, we aim to reduce this dependency and improve overall robustness. In this task, for each type of coordinate system, the model is gi… view at source ↗
Figure 8
Figure 8. Figure 8: The proposed training framework includes a [PITH_FULL_IMAGE:figures/full_fig_p012_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Examples of training samples for the Coordinate [PITH_FULL_IMAGE:figures/full_fig_p012_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Using the prototype system to validate and correct extracted chart data for a basic line chart. Users first calibrate the [PITH_FULL_IMAGE:figures/full_fig_p015_10.png] view at source ↗
read the original abstract

Chart data extraction, which reverse-engineers data tables from chart images, is essential for reproducibility, analysis, retrieval, and redesign. Existing interactive tools are reliable but tedious, and mixed-initiative systems, while more efficient, lack generalizability. Recent multimodal large language models (MLLMs) offer a unified interface for chart interpretation, yet their ability to extract accurate data tables, especially without visible labels, remains unclear. We build a benchmark featuring diverse real-world charts without data labels to evaluate this capability. Results show that, while current MLLMs reliably reconstruct table structures, they struggle with precise value recovery. To address this, we revisit chart data extraction from a human-centered perspective and argue that extraction should follow a progressive learning process similar to how people read charts. Our training framework substantially improves numerical accuracy, achieving state-of-the-art performance with a 7B-parameter model. A user study further shows that our model effectively supports mixed-initiative workflows for reliable chart data extraction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper introduces a benchmark of diverse unlabeled real-world charts to evaluate MLLMs on chart data extraction. It reports that current MLLMs reliably reconstruct table structures but struggle with precise numerical value recovery. The authors propose a human-centered progressive training framework modeled on how people read charts; this framework is claimed to substantially improve numerical accuracy and reach state-of-the-art performance using a 7B-parameter model. A user study is presented showing the model supports mixed-initiative workflows.

Significance. If the quantitative claims hold on a properly constructed benchmark, the work would demonstrate a practical route to reliable chart data extraction with relatively small models, which could benefit reproducibility, data analysis, and visualization tools. The human-centered progressive-learning framing is a constructive angle that merits empirical testing.

major comments (1)
  1. [Abstract] Abstract: the central claim of 'state-of-the-art performance' and 'substantially improves numerical accuracy' is asserted without any reported metrics, baselines, dataset sizes, error bars, or exclusion criteria, rendering the magnitude and reliability of the improvement unverifiable from the provided text.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their review and for highlighting the need for greater verifiability in the abstract. We address the comment below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim of 'state-of-the-art performance' and 'substantially improves numerical accuracy' is asserted without any reported metrics, baselines, dataset sizes, error bars, or exclusion criteria, rendering the magnitude and reliability of the improvement unverifiable from the provided text.

    Authors: We agree that the abstract, as currently written, summarizes the claims at a high level without the supporting quantitative details the referee requests. The full manuscript reports these elements in the experimental sections (benchmark construction, baseline comparisons, numerical accuracy results with standard deviations, and dataset statistics). To make the central claims immediately verifiable from the abstract itself, we will revise it to include the key metrics, baseline references, and dataset scale. This is a straightforward and warranted change. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The provided abstract and text describe a benchmark of unlabeled real-world charts plus a human-centered progressive training framework for MLLMs. No equations, fitted parameters, self-citations, or uniqueness theorems appear. The central claim of improved numerical accuracy is presented as an empirical outcome of the framework rather than a derivation that reduces to its own inputs by construction. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract describes no free parameters, background axioms, or new postulated entities.

pith-pipeline@v0.9.1-grok · 5718 in / 953 out tokens · 25281 ms · 2026-06-30T05:27:20.707663+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

85 extracted references · 60 canonical work pages · 5 internal anchors

  1. [1]

    Mubashara Akhtar, Nikesh Subedi, Vivek Gupta, Sahar Tahmasebi, Oana Co- carascu, and Elena Simperl. 2024. ChartCheck: Explainable Fact-Checking over Real-World Chart Images. InFindings of the Association for Computational Lin- guistics: ACL 2024. Association for Computational Linguistics, Bangkok, Thailand, 13921–13937. doi:10.18653/v1/2024.findings-acl.828

  2. [2]

    Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, et al. 2025. Qwen2.5-VL Technical Report. arXiv:2502.13923 [cs.CV]

  3. [3]

    Kortum, and James T

    Aaron Bangor, Philip T. Kortum, and James T. Miller. 2008. An Empirical Evalu- ation of the System Usability Scale.International Journal of Human–Computer Interaction24, 6 (2008), 574–594. doi:10.1080/10447310802205776

  4. [4]

    Ekaba Bisong. 2019. Matplotlib and Seaborn. InBuilding Machine Learning and Deep Learning Models on Google Cloud Platform: A Comprehensive Guide for Beginners. Apress, Berkeley, CA, 151–165. doi:10.1007/978-1-4842-4470-8_12

  5. [5]

    John Brooke. 2013. SUS: a retrospective.J. Usability Studies8, 2 (Feb. 2013), 29–40

  6. [6]

    Chengliang Chai, Guoliang Li, Ju Fan, and Yuyu Luo. 2021. CrowdChart: Crowd- sourced Data Extraction From Visualization Charts.IEEE Transactions on Knowl- edge and Data Engineering33, 11 (2021), 3537–3549. doi:10.1109/TKDE.2020. 2972543

  7. [7]

    Jinyue Chen, Lingyu Kong, Haoran Wei, Chenglong Liu, Zheng Ge, Liang Zhao, Jianjian Sun, Chunrui Han, and Xiangyu Zhang. 2024. OneChart: Purify the Chart Structural Extraction via One Auxiliary Token. InProceedings of the 32nd ACM International Conference on Multimedia(Melbourne VIC, Australia)(MM ’24). Association for Computing Machinery, New York, NY, U...

  8. [8]

    Nan Chen, Yuge Zhang, Jiahang Xu, Kan Ren, and Yuqing Yang. 2025. VisEval: A Benchmark for Data Visualization in the Era of Large Language Models.IEEE Transactions on Visualization and Computer Graphics31, 1 (2025), 1301–1311. doi:10.1109/TVCG.2024.3456320

  9. [9]

    Shiqi Chen, Tongyao Zhu, Ruochen Zhou, Jinghan Zhang, Siyang Gao, Juan Car- los Niebles, Mor Geva, Junxian He, Jiajun Wu, and Manling Li. 2025. Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas. arXiv:2503.01773 [cs.CL]

  10. [10]

    Zixin Chen, Sicheng Song, KaShun Shum, Yanna Lin, Rui Sheng, Weiqi Wang, and Huamin Qu. 2025. Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Suzhou, China, ...

  11. [11]

    Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, et al. 2024. InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 24185–24198

  12. [12]

    Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al . 2025. Gemini 2.5: Pushing the Frontier with Advanced Reason- ing, Multimodality, Long Context, and Next Generation Agentic Capabilities. arXiv:2507.06261 [cs.CL]

  13. [13]

    Wenjing Dai, Meng Wang, Zhibin Niu, and Jiawan Zhang. 2018. Chart decoder: Generating textual and numeric information from chart images automatically. Journal of Visual Languages & Computing48 (2018), 101–109

  14. [14]

    Amit Kumar Das, Mohammad Tarun, and Klaus Mueller. 2025. Charts-of-Thought: Enhancing LLM Visualization Literacy through Structured Data Extraction.IEEE Transactions on Visualization and Computer Graphics(2025), 1–11. doi:10.1109/ TVCG.2025.3634813

  15. [15]

    Kenny Davila, Srirangaraj Setlur, David Doermann, Bhargava Urala Kota, and Venu Govindaraju. 2020. Chart mining: A survey of methods for automated chart analysis.IEEE transactions on pattern analysis and machine intelligence43, 11 (2020), 3799–3819

  16. [16]

    Dazhen Deng, Yihong Wu, Xinhuan Shu, Jiang Wu, Siwei Fu, Weiwei Cui, and Yingcai Wu. 2023. VisImages: A Fine-Grained Expert-Annotated Visualization Dataset.IEEE Transactions on Visualization and Computer Graphics29, 7 (2023), 3298–3311. doi:10.1109/TVCG.2022.3155440

  17. [17]

    Yucheng Han, Chi Zhang, Xin Chen, Xu Yang, Zhibin Wang, Gang Yu, Bin Fu, and Hanwang Zhang. 2023. ChartLlama: A Multimodal LLM for Chart Understanding and Generation. arXiv:2311.16483 [cs.CV]

  18. [18]

    Wenyi Hong, Wenmeng Yu, Xiaotao Gu, Guo Wang, Guobing Gan, Haomiao Tang, Jiale Cheng, Ji Qi, Junhui Ji, Lihang Pan, et al. 2025. GLM-4.5V and GLM-4.1V- Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning. arXiv:2507.01006 [cs.CV]

  19. [19]

    Enamul Hoque, Parsa Kavehzadeh, and Ahmed Masry. 2022. Chart Question Answering: State of the Art and Future Directions.Computer Graphics Forum41, 3 (2022), 555–572. doi:10.1111/cgf.14573

  20. [20]

    Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. InProceedings of the SIGCHI Conference on Human Factors in Computing Systems(Pittsburgh, Pennsylvania, USA)(CHI ’99). Association for Computing Machinery, New York, NY, USA, 159–166. doi:10.1145/302979.303030

  21. [21]

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. 2022. Lora: Low-rank adaptation of large language models.ICLR1, 2 (2022), 3

  22. [22]

    Bakker, Stephen Li, Tim Kraska, and César Hidalgo

    Kevin Hu, Michiel A. Bakker, Stephen Li, Tim Kraska, and César Hidalgo. 2019. VizML: A Machine Learning Approach to Visualization Recommendation. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland Uk)(CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–12. doi:10.1145/3290605.3300358

  23. [23]

    Gaikwad, Madelon Hulsebos, Michiel A

    Kevin Hu, Snehalkumar ’Neil’ S. Gaikwad, Madelon Hulsebos, Michiel A. Bakker, Emanuel Zgraggen, César Hidalgo, Tim Kraska, Guoliang Li, Arvind Satya- narayan, and Çağatay Demiralp. 2019. VizNet: Towards A Large-Scale Visu- alization Learning and Benchmarking Repository. InProceedings of the 2019 CHI Conference on Human Factors in Computing Systems(Glasgow...

  24. [24]

    Kung-Hsiang Huang, Hou Pong Chan, May Fung, Haoyi Qiu, Mingyang Zhou, Shafiq Joty, Shih-Fu Chang, and Heng Ji. 2025. From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models. IEEE Transactions on Knowledge and Data Engineering37, 5 (2025), 2550–2568. doi:10.1109/TKDE.2024.3513320

  25. [25]

    Kung-Hsiang Huang, Mingyang Zhou, Hou Pong Chan, Yi Fung, Zhenhailong Wang, Lingyu Zhang, Shih-Fu Chang, and Heng Ji. 2024. Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning. InFindings of the Association for Computational Linguistics: ACL 2024. Association for Compu- tational Linguistics, Bangkok, Thailand, 730–749. ...

  26. [26]

    Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. 2024. GPT-4o System Card. arXiv:2410.21276 [cs.CL]

  27. [27]

    OpenAI Inc. 2025. Introducing GPT-4.1 in the API. https://openai.com/index/gpt- 4-1/

  28. [28]

    Mohaiminul Islam and Shangzhu Jin. 2019. An Overview of Data Visualization. In2019 International Conference on Information Science and Communications Technologies (ICISCT). 1–7. doi:10.1109/ICISCT47635.2019.9012031

  29. [29]

    Mohammed Saidul Islam, Raian Rahman, Ahmed Masry, Md Tahmid Rahman Laskar, Mir Tafseer Nayeem, and Enamul Hoque. 2024. Are Large Vision Language Models up to the Challenge of Chart Comprehension and Reasoning. InFindings of the Association for Computational Linguistics: EMNLP 2024. Association for Computational Linguistics, Miami, Florida, USA, 3334–3368....

  30. [30]

    Hyeon Jeon, Hyunwook Lee, Yun-Hsin Kuo, Taehyun Yang, Daniel Archambault, Sungahn Ko, Takanori Fujiwara, Kwan-Liu Ma, and Jinwook Seo. 2025. Unveil- ing High-dimensional Backstage: A Survey for Reliable Visual Analytics with Dimensionality Reduction. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI ’25). Association for ...

  31. [31]

    Daekyoung Jung, Wonjae Kim, Hyunjoo Song, Jeong-in Hwang, Bongshin Lee, Bohyoung Kim, and Jinwook Seo. 2017. ChartSense: Interactive Data Extraction from Chart Images. InProceedings of the 2017 CHI Conference on Human Factors in Computing Systems(Denver, Colorado, USA)(CHI ’17). Association for Computing Machinery, New York, NY, USA, 6706–6717. doi:10.114...

  32. [32]

    Kushal Kafle, Brian Price, Scott Cohen, and Christopher Kanan. 2018. DVQA: Understanding Data Visualizations via Question Answering. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

  33. [33]

    Samira Ebrahimi Kahou, Vincent Michalski, Adam Atkinson, Akos Kadar, Adam Trischler, and Yoshua Bengio. 2018. FigureQA: An Annotated Figure Dataset for Visual Reasoning. arXiv:1710.07300 [cs.CV]

  34. [34]

    UW Interactive Data Lab. 2025. Vega-Lite Example Gallery. https://vega.github. io/vega-lite/examples/

  35. [35]

    Yijie Lian, Jianing Hao, Wei Zeng, and Qiong Luo. 2025. A survey of visual insight mining: Connecting data and insights via visualization.Visual Informatics9, 4 (2025), 100271. doi:10.1016/j.visinf.2025.100271

  36. [36]

    Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, and Yasemin Altun. 2023. DePlot: One-shot visual language reasoning by plot-to-table transla- tion. arXiv:2212.10505 [cs.CL]

  37. [37]

    Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Man- dar Joshi, Yasemin Altun, Nigel Collier, and Julian Eisenschlos. 2023. MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Deren- dering. InProceedings of the 61st Annual Meeting of the Association for Com- putational Linguistics (Volume 1: Long Papers)....

  38. [38]

    Haotian Liu, Chunyuan Li, Yuheng Li, and Yong Jae Lee. 2024. Improved Baselines with Visual Instruction Tuning. InProceedings of the IEEE/CVF Conference on CHI ’26, April 13–17, 2026, Barcelona, Spain He et al. Computer Vision and Pattern Recognition (CVPR). 26296–26306

  39. [39]

    Haotian Liu, Chunyuan Li, Yuheng Li, Bo Li, Yuanhan Zhang, Sheng Shen, and Yong Jae Lee. 2024. LLaVA-NeXT: Improved reasoning, OCR, and world knowl- edge. https://llava-vl.github.io/blog/2024-01-30-llava-next/

  40. [40]

    Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. 2023. Visual in- struction tuning.Advances in neural information processing systems36 (2023), 34892–34916

  41. [41]

    Sheng Liu, Haotian Ye, Lei Xing, and James Zou. 2024. Reducing Hallucinations in Vision-Language Models via Latent Space Steering. arXiv:2410.15778 [cs.CV]

  42. [42]

    Yuan Liu, Haodong Duan, Yuanhan Zhang, Bo Li, Songyang Zhang, Wangbo Zhao, Yike Yuan, Jiaqi Wang, Conghui He, Ziwei Liu, Kai Chen, and Dahua Lin

  43. [43]

    InComputer Vision – ECCV 2024

    MMBench: Is Your Multi-modal Model an All-Around Player?. InComputer Vision – ECCV 2024. Springer Nature Switzerland, Cham, 216–233

  44. [44]

    Automeris LLC. 2024. WebPlotDigitizer. https://automeris.io

  45. [45]

    Junyu Luo, Zekun Li, Jinpeng Wang, and Chin-Yew Lin. 2021. ChartOCR: Data Extraction From Charts Images via a Deep Hybrid Framework. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (W ACV). 1917–1925

  46. [46]

    Yuyu Luo, Jiawei Tang, and Guoliang Li. 2021. nvBench: A Large-Scale Syn- thesized Dataset for Cross-Domain Natural Language to Visualization Task. arXiv:2112.12926 [cs.HC]

  47. [47]

    Ahmed Masry, Mohammed Saidul Islam, Mahir Ahmed, Aayush Bajaj, Firoz Kabir, Aaryaman Kartha, Md Tahmid Rahman Laskar, Mizanur Rahman, Shadikur Rahman, Mehrad Shahmohammadi, et al. 2025. ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering. arXiv:2504.05506 [cs.CL]

  48. [48]

    Ahmed Masry, Parsa Kavehzadeh, Xuan Long Do, Enamul Hoque, and Shafiq Joty. 2023. UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning. arXiv:2305.14761 [cs.CL]

  49. [49]

    Ahmed Masry, Do Xuan Long, Jia Qing Tan, Shafiq Joty, and Enamul Hoque. 2022. ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning. InFindings of the Association for Computational Linguistics: ACL 2022. Association for Computational Linguistics, Dublin, Ireland, 2263–2279. doi:10.18653/v1/2022.findings-acl.177

  50. [50]

    Ahmed Masry, Mehrad Shahmohammadi, Md Rizwan Parvez, Enamul Hoque, and Shafiq Joty. 2024. ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning. InFindings of the Association for Computational Linguistics: ACL 2024. Association for Computational Linguistics, Bangkok, Thailand, 10387–10409. doi:10.18653/v1/2024.findings-acl.619

  51. [51]

    Damien Masson, Sylvain Malacria, Daniel Vogel, Edward Lank, and Géry Casiez

  52. [52]

    InProceedings of the 2023 CHI Conference on Hu- man Factors in Computing Systems(Hamburg, Germany)(CHI ’23)

    ChartDetective: Easy and Accurate Interactive Data Extraction from Complex Vector Charts. InProceedings of the 2023 CHI Conference on Hu- man Factors in Computing Systems(Hamburg, Germany)(CHI ’23). Associa- tion for Computing Machinery, New York, NY, USA, Article 147, 17 pages. doi:10.1145/3544548.3581113

  53. [53]

    Nacenta, and Sebastien Vandenheste

    Gonzalo Gabriel Méndez, Miguel A. Nacenta, and Sebastien Vandenheste. 2016. iVoLVER: Interactive Visual Language for Visualization Extraction and Recon- struction. InProceedings of the 2016 CHI Conference on Human Factors in Com- puting Systems(San Jose, California, USA)(CHI ’16). Association for Computing Machinery, New York, NY, USA, 4073–4085. doi:10.1...

  54. [54]

    Fanqing Meng, Wenqi Shao, Quanfeng Lu, Peng Gao, Kaipeng Zhang, Yu Qiao, and Ping Luo. 2024. ChartAssisstant: A Universal Chart Multimodal Lan- guage Model via Chart-to-Table Pre-training and Multitask Instruction Tuning. arXiv:2401.02384 [cs.CV]

  55. [55]

    Khapra, and Pratyush Kumar

    Nitesh Methani, Pritha Ganguly, Mitesh M. Khapra, and Pratyush Kumar. 2020. PlotQA: Reasoning over Scientific Plots. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (W ACV)

  56. [56]

    Kushin Mukherjee, Donghao Ren, Dominik Moritz, and Yannick Assogba. 2025. EncQA: Benchmarking Vision-Language Models on Visual Encodings for Charts. IEEE Transactions on Visualization and Computer Graphics(2025), 1–11. doi:10. 1109/TVCG.2025.3634249

  57. [57]

    Donald A. Norman. 1994. How might people interact with agents.Commun. ACM37, 7 (July 1994), 68–71. doi:10.1145/176789.176796

  58. [58]

    Jason Obeid and Enamul Hoque. 2020. Chart-to-Text: Generating Natu- ral Language Descriptions for Charts by Adapting the Transformer Model. arXiv:2010.09142 [cs.CL]

  59. [59]

    PlotDigitizer. 2025. PlotDigitizer. https://plotdigitizer.com

  60. [60]

    Tahmid Rahman Laskar, Md

    Raian Rahman, Rizvi Hasan, Abdullah Al Farhad, Md. Tahmid Rahman Laskar, Md. Hamjajul Ashmafee, and Abu Raihan Mostofa Kamal. 2023. ChartSumm: A Comprehensive Benchmark for Automatic Chart Summarization of Long and Short Summaries.Proceedings of the Canadian Conference on Artificial Intelligence (June 2023). doi:10.21428/594757db.0b1f96f6

  61. [61]

    Arvind Satyanarayan, Dominik Moritz, Kanit Wongsuphasawat, and Jeffrey Heer

  62. [62]

    doi:10.1109/TVCG

    Vega-Lite: A Grammar of Interactive Graphics.IEEE Transactions on Visualization and Computer Graphics23, 1 (2017), 341–350. doi:10.1109/TVCG. 2016.2599030

  63. [63]

    Manolis Savva, Nicholas Kong, Arti Chhajta, Li Fei-Fei, Maneesh Agrawala, and Jeffrey Heer. 2011. ReVision: automated classification, analysis and redesign of chart images. InProceedings of the 24th Annual ACM Symposium on User Interface Software and Technology(Santa Barbara, California, USA)(UIST ’11). Association for Computing Machinery, New York, NY, U...

  64. [64]

    Danqing Shi, Yao Wang, Yunpeng Bai, Andreas Bulling, and Antti Oulasvirta. 2025. Chartist: Task-driven Eye Movement Control for Chart Reading. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI ’25). Association for Computing Machinery, New York, NY, USA, Article 1167, 14 pages. doi:10.1145/3706598.3713128

  65. [65]

    Benny Tang, Angie Boggust, and Arvind Satyanarayan. 2023. VisText: A Bench- mark for Semantically Rich Chart Captioning. InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Pa- pers). Association for Computational Linguistics, Toronto, Canada, 7268–7298. doi:10.18653/v1/2023.acl-long.401

  66. [66]

    Yuan Tian, Weiwei Cui, Dazhen Deng, Xinjing Yi, Yurun Yang, Haidong Zhang, and Yingcai Wu. 2025. ChartGPT: Leveraging LLMs to Generate Charts From Abstract Natural Language.IEEE Transactions on Visualization and Computer Graphics31, 3 (2025), 1731–1745. doi:10.1109/TVCG.2024.3368621

  67. [67]

    Yuan Tian, Dazhen Deng, Sen Yang, Huawei Zheng, Bowen Shi, Kai Xiong, Xinjing Yi, and Yingcai Wu. 2025. NoteFlow: Recommending Charts as Sight Glasses for Tracing Data Flow in Computational Notebooks. arXiv:2502.02326 [cs.HC]

  68. [68]

    Yuan Tian, Chuhan Zhang, Xiaotong Wang, Sitong Pan, Weiwei Cui, Haidong Zhang, Dazhen Deng, and Yingcai Wu. 2025. ReSpark: Leveraging Previous Data Reports as References to Generate New Reports with LLMs. InProceedings of the 38th Annual ACM Symposium on User Interface Software and Technology (UIST ’25). Association for Computing Machinery, New York, NY, ...

  69. [69]

    B. Tummers. 2006. DataThief III. https://datathief.org

  70. [70]

    Zirui Wang, Mengzhou Xia, Luxi He, Howard Chen, Yitao Liu, Richard Zhu, Kaiqu Liang, Xindi Wu, Haotian Liu, Sadhika Malladi, et al. 2024. Charxiv: Charting gaps in realistic chart understanding in multimodal llms.Advances in Neural Information Processing Systems37 (2024), 113569–113697

  71. [71]

    Jiayang Wu, Wensheng Gan, Zefeng Chen, Shicheng Wan, and Philip S. Yu

  72. [72]

    In2023 IEEE International Conference on Big Data (BigData)

    Multimodal Large Language Models: A Survey. In2023 IEEE International Conference on Big Data (BigData). 2247–2256. doi:10.1109/BigData59044.2023. 10386743

  73. [73]

    Yifan Wu, Lutao Yan, Leixian Shen, Yunhai Wang, Nan Tang, and Yuyu Luo. 2024. ChartInsights: Evaluating Multimodal Large Language Models for Low-Level Chart Question Answering. arXiv:2405.07001 [cs.CL]

  74. [74]

    Renqiu Xia, Haoyang Peng, Hancheng Ye, Mingsheng Li, Xiangchao Yan, Peng Ye, Botian Shi, Yu Qiao, Junchi Yan, and Bo Zhang. 2024. StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding. arXiv:2309.11268 [cs.CV]

  75. [75]

    Zhengzhuo Xu, Sinan Du, Yiyan Qi, Chengjin Xu, Chun Yuan, and Jian Guo

  76. [76]

    arXiv:2312.15915 [cs.CV]

    ChartBench: A Benchmark for Complex Visual Reasoning in Charts. arXiv:2312.15915 [cs.CV]

  77. [77]

    Zhengzhuo Xu, Bowen Qu, Yiyan Qi, Sinan Du, Chengjin Xu, Chun Yuan, and Jian Guo. 2025. ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding. arXiv:2409.03277 [cs.AI]

  78. [78]

    Cheng Yang, Chufan Shi, Yaxin Liu, Bo Shui, Junjie Wang, Mohan Jing, Linran Xu, Xinyu Zhu, Siheng Li, Yuxiang Zhang, et al . 2025. ChartMimic: Evaluat- ing LMM’s Cross-Modal Reasoning Capability via Chart-to-Code Generation. arXiv:2406.09961 [cs.SE]

  79. [79]

    Yilin Ye, Jianing Hao, Yihan Hou, Zhan Wang, Shishi Xiao, Yuyu Luo, and Wei Zeng. 2024. Generative AI for visualization: State of the art and future directions. Visual Informatics8, 2 (2024), 43–66. doi:10.1016/j.visinf.2024.04.003

  80. [80]

    Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, and Enhong Chen. 2024. A survey on multimodal large language models.National Science Review11, 12 (11 2024), nwae403. doi:10.1093/nsr/nwae403

Showing first 80 references.