pith. machine review for the scientific record.

arxiv: 2604.03406 · v1 · submitted 2026-04-03 · 💻 cs.GR


SASAV: Self-Directed Agent for Scientific Analysis and Visualization

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 17:48 UTC · model grok-4.3

classification 💻 cs.GR
keywords: scientific data, visualization, analysis, agent, SASAV, exploration, feedback

The pith

SASAV introduces the first fully autonomous multi-agent system for scientific data analysis and visualization that operates without external prompting or human-in-the-loop feedback.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Recent multimodal large language models can understand data and reason visually. Earlier agents for scientific visualization still needed domain experts to supply dataset-specific knowledge, or relied on iterative human feedback to guide the process, which made them slow and hard to scale to large scientific data. SASAV is built as a self-directed multi-agent system with three main parts: automated data profiling to inspect the data's structure, context-aware knowledge retrieval to pull in relevant domain information on its own, and reasoning-driven exploration of visualization parameters. These agents work together to produce visualizations and support interactive tasks with no human input at any stage. The paper positions this as a foundational step toward AI systems that can accelerate discovery by handling large-scale scientific data autonomously.
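To make that loop concrete, here is a minimal, hypothetical sketch of the three-component pattern in Python. Every name in it (the dataclass, the stub profiler, the judge prompt) is an illustration, not the authors' implementation.

    # Hypothetical sketch of a SASAV-style self-directed loop: profile the data,
    # retrieve context from an (M)LLM, then explore visualization parameters with
    # the model as judge. All names here are illustrative, not from the paper.
    import random
    from dataclasses import dataclass

    @dataclass
    class VisParams:
        opacity_peak: float   # scalar value where opacity is highest
        viewpoint: int        # index into a set of sampled camera positions
        score: float = 0.0

    def profile(volume: list[float]) -> dict:
        """Automated data profiling: summarize structure without human hints."""
        return {"min": min(volume), "max": max(volume), "n": len(volume)}

    def retrieve_context(prof: dict, llm) -> str:
        """Context-aware knowledge retrieval, delegated to the model."""
        return llm(f"A scalar field spans [{prof['min']}, {prof['max']}]. "
                   "What semantic regions might it contain?")

    def explore(prof: dict, context: str, llm, budget: int = 8) -> VisParams:
        """Reasoning-driven exploration: propose, render, judge, keep the best."""
        best = VisParams(0.0, 0, score=float("-inf"))
        for _ in range(budget):
            cand = VisParams(random.uniform(prof["min"], prof["max"]),
                             random.randrange(24))
            # A real system would render cand and show the image to an MLLM
            # judge; here the judge is a stand-in scoring function.
            cand.score = float(llm(f"Score 0-1: peak={cand.opacity_peak:.2f} "
                                   f"view={cand.viewpoint} given: {context}"))
            if cand.score > best.score:
                best = cand
        return best

    if __name__ == "__main__":
        fake_llm = lambda prompt: "0.5" if "Score" in prompt else "tissue, bone, air"
        vol = [random.random() for _ in range(1000)]
        prof = profile(vol)
        print(explore(prof, retrieve_context(prof, fake_llm), fake_llm))

The point of the sketch is the control flow: no step waits on a person, and every decision is either computed from the data or delegated to the model.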

Core claim

SASAV is the first fully autonomous AI agent to perform scientific data analysis and generate insightful visualizations without any external prompting or HITL feedback.

Load-bearing premise

That current frontier multimodal LLMs are reliable enough at automated data profiling, context-aware knowledge retrieval, and reasoning-driven visualization parameter exploration to run in scientific contexts without human oversight.

Figures

Figures reproduced from arXiv: 2604.03406 by David Lenz, Hongfeng Yu, Jianxin Sun, Tom Peterka.

Figure 1. SASAV workflow to automatically generate visualization results for AbdomenAtlas 1.0 Mini datasets: (a) Direct volume rendering. · view at source ↗
Figure 2. Evolution of AI for Science as reactive tools, relying on human-in-the-loop (HITL) input to provide prior knowledge of the target dataset and heuristics to direct data exploration. SciVis code generation LLM assistants, such as ChatVis [32], rely on users to provide a complete and specific description of visualization parameters, including the viewpoints, transfer functions (TFs), lighting, and visualization… · view at source ↗
Figure 3. Architecture of SASAV. N is the number of RSVs selected for initial rendering, M is the number of isovalues selected for isosurface rendering, and K is the number of viewpoints sampled on the view surfaces. · view at source ↗
Figure 5. Forager agentic workflow to retrieve knowledge about the regions. · view at source ↗
Figure 6. Detailed workflow of Semantic Analyzer (SA) and Transfer Function Designer (TFD) on simulated scientific data (Flame dataset). SA conducts… · view at source ↗
Figure 7. Example workflow of color and opacity mapping suggestion on empirical scientific data (AbdomenAtlas 1.0 Mini dataset). The SA output shown… · view at source ↗
Figure 8. The view selection process recommends the anchor views, the most informative views, and avoids views that are redundant or occluded. · view at source ↗
Figure 9. User interface of SASAV. · view at source ↗
Figure 10. Final visualization image generated by SASAV and its suggested visualization parameters of TFs, anchor viewpoints, and exploratory… · view at source ↗
Figure 11. Time consumption of SASAV for each step across all 5 datasets. · view at source ↗
Figure 12. Token usage of SASAV for each step across all 5 datasets. Input… · view at source ↗
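Figure 3's N, M, and K set the exploration budget. A small worked example of the implied arithmetic, under the assumed reading that each rendering branch (volume rendering over N RSVs, isosurface rendering over M isovalues) is paired with every sampled viewpoint; the concrete values are illustrative, not reported settings:

    # Back-of-envelope reading of Figure 3's N, M, K. The values and the
    # exact combination rule are assumptions, not the paper's numbers.
    N = 4   # representative scalar values (RSVs) for initial volume rendering
    M = 3   # isovalues for isosurface rendering
    K = 16  # viewpoints sampled on each view surface

    # If each rendering branch gets K viewpoints, the agent must render and
    # judge (N + M) * K candidate images per dataset:
    candidates = [(mode, param, view)
                  for mode, count in (("dvr", N), ("iso", M))
                  for param in range(count)
                  for view in range(K)]
    print(len(candidates))  # (4 + 3) * 16 = 112 renderings to score

Even modest settings multiply quickly, which is why the time and token costs in Figures 11 and 12 are worth reporting per step.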
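Figures 6 and 7 show the Semantic Analyzer handing color and opacity suggestions to the Transfer Function Designer. A minimal sketch of what that hand-off could look like, assuming the suggestions arrive as (scalar value, opacity) control points; the schema is a guess, not the paper's:

    # Minimal sketch: turn suggested control points into a piecewise-linear
    # opacity transfer function. The control-point schema is assumed.
    def make_opacity_tf(points: list[tuple[float, float]]):
        """points: (scalar_value, opacity) pairs; returns opacity(v)."""
        pts = sorted(points)

        def tf(v: float) -> float:
            if v <= pts[0][0]:
                return pts[0][1]
            for (x0, a0), (x1, a1) in zip(pts, pts[1:]):
                if v <= x1:
                    t = (v - x0) / (x1 - x0)  # linear interpolation weight
                    return a0 + t * (a1 - a0)
            return pts[-1][1]

        return tf

    # E.g., air transparent, soft tissue faint, bone opaque (CT-like values):
    tf = make_opacity_tf([(-1000, 0.0), (40, 0.05), (400, 0.9)])
    print(tf(200))  # ~0.43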
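Figure 8's goal of recommending informative, non-redundant anchor views is a coverage problem. One standard realization (my choice of method, not necessarily the paper's) samples candidate cameras on a sphere and greedily keeps views far from those already chosen:

    # Hypothetical view-selection sketch: Fibonacci-sphere candidates plus a
    # greedy farthest-point pick to avoid redundant viewpoints. The paper's
    # actual scoring (informativeness, occlusion) is not reproduced here.
    import math

    def fibonacci_sphere(k: int) -> list[tuple[float, float, float]]:
        """k near-uniform directions on the unit sphere."""
        golden = math.pi * (3.0 - math.sqrt(5.0))
        out = []
        for i in range(k):
            y = 1.0 - 2.0 * (i + 0.5) / k
            r = math.sqrt(1.0 - y * y)
            theta = golden * i
            out.append((r * math.cos(theta), y, r * math.sin(theta)))
        return out

    def pick_anchor_views(candidates, n: int):
        """Greedy farthest-point selection: each new view maximizes its
        minimum angular distance to the views already kept."""
        chosen = [candidates[0]]
        while len(chosen) < n:
            def min_dist(v):
                return min(1.0 - sum(a * b for a, b in zip(v, c)) for c in chosen)
            chosen.append(max(candidates, key=min_dist))
        return chosen

    views = pick_anchor_views(fibonacci_sphere(64), 4)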
read the original abstract

With recent advances in frontier multimodal large language models (MLLMs) for data understanding and visual reasoning, the role of LLMs has evolved from passive LLM-as-an-interface to proactive LLM-as-a-judge, enabling deeper integration into the scientific data analysis and visualization pipelines. However, existing scientific visualization agents still rely on domain experts to provide prior knowledge for specific datasets or visualization-oriented objective functions to guide the workflow through iterative feedback. This reactive, data-dependent, human-in-the-loop (HITL) paradigm is time-consuming and does not scale effectively to large-scale scientific data. In this work, we propose a Self-Directed Agent for Scientific Analysis and Visualization (SASAV), the first fully autonomous AI agent to perform scientific data analysis and generate insightful visualizations without any external prompting or HITL feedback. SASAV is a multi-agent system that automatically orchestrates data exploration workflows through our proposed components, including automated data profiling, context-aware knowledge retrieval, and reasoning-driven visualization parameter exploration, while supporting downstream interactive visualization tasks. This work establishes a foundational building block for the future AI for Science to accelerate scientific discovery and innovation at scale.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes SASAV, a multi-agent system built on frontier multimodal LLMs that claims to be the first fully autonomous agent for scientific data analysis and visualization. It operates without external prompting or human-in-the-loop feedback by orchestrating automated data profiling, context-aware knowledge retrieval, and reasoning-driven visualization parameter exploration, while also supporting downstream interactive tasks.

Significance. If the architecture can be shown to function reliably end-to-end on real scientific datasets, the work would provide a foundational step toward scalable, human-free AI pipelines in visualization and data-driven science, reducing dependence on domain-expert guidance.

major comments (2)
  1. [Abstract] The claim that SASAV is 'the first fully autonomous AI agent to perform scientific data analysis and generate insightful visualizations without any external prompting or HITL feedback' rests entirely on an untested architectural description; no experiments, benchmarks, success rates, error analysis, or case studies on scientific data are supplied to substantiate autonomy or the 'first' designation.
  2. [Proposed Method] Multi-agent orchestration: the components for automated data profiling, context-aware retrieval, and reasoning-driven parameter exploration are presented at a conceptual level only, with no algorithms, prompt templates, decision procedures, or integration details that would allow assessment of whether the system can actually close the loop without human intervention or error.
minor comments (1)
  1. [Abstract] The phrase 'supporting downstream interactive visualization tasks' is mentioned but never expanded; a brief description of the interface or hand-off mechanism would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below with clarifications on the current scope and specific plans for revision to strengthen the empirical grounding and implementation details.

read point-by-point responses
  1. Referee: [Abstract] The claim that SASAV is 'the first fully autonomous AI agent to perform scientific data analysis and generate insightful visualizations without any external prompting or HITL feedback' rests entirely on an untested architectural description; no experiments, benchmarks, success rates, error analysis, or case studies on scientific data are supplied to substantiate autonomy or the 'first' designation.

    Authors: We agree that the abstract claim requires empirical support to be fully substantiated. The current manuscript focuses on introducing the novel architecture as a foundational proposal, with the 'first' designation referring to the absence of any prior system that achieves end-to-end autonomy without external prompting or HITL across data profiling, retrieval, and visualization parameter exploration. In the revised version, we will add preliminary case studies on real scientific datasets (e.g., from astrophysics and materials science), including quantitative success rates, error analysis, and comparisons to HITL baselines to better substantiate the autonomy and novelty claims. revision: yes

  2. Referee: [Proposed Method] Multi-agent orchestration: the components for automated data profiling, context-aware retrieval, and reasoning-driven parameter exploration are presented at a conceptual level only, with no algorithms, prompt templates, decision procedures, or integration details that would allow assessment of whether the system can actually close the loop without human intervention or error.

    Authors: The method section currently emphasizes the high-level multi-agent orchestration to convey the overall workflow and its departure from prior HITL approaches. We acknowledge that additional low-level details are necessary for assessing closed-loop feasibility. In the revision, we will expand this section with pseudocode for each component, example prompt templates used by the agents, explicit decision procedures for reasoning-driven parameter exploration, and a detailed integration diagram showing error-handling mechanisms that enable fully autonomous operation without human intervention. revision: yes
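For concreteness, the 'error-handling mechanisms' promised above could take a generic validate-retry-fallback shape like the sketch below. This illustrates the pattern only; it is not the authors' forthcoming pseudocode, and the JSON validator is a stand-in check.

    # Generic closed-loop pattern for autonomous operation: validate each
    # agent step, feed the failure back as context, and fall back to a
    # default rather than escalate to a human. Illustrative only.
    import json

    def run_step(agent, task, validate, max_retries=3):
        feedback = ""
        for _ in range(max_retries):
            result = agent(task + feedback)
            ok, why = validate(result)
            if ok:
                return result
            # Close the loop: the validator's complaint becomes new context.
            feedback = f"\nPrevious attempt failed: {why}. Fix and retry."
        return None  # caller substitutes a safe default, never a human

    def valid_json(out):
        try:
            json.loads(out)
            return True, ""
        except ValueError:
            return False, "output was not valid JSON"

    # e.g. a step that must emit machine-readable visualization parameters:
    params = run_step(agent=lambda p: '{"opacity_peak": 0.4}',
                      task="Propose a transfer function as JSON.",
                      validate=valid_json)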

Circularity Check

0 steps flagged

No significant circularity in SASAV architectural proposal

full rationale

The manuscript proposes a multi-agent orchestration of existing multimodal LLM capabilities for data profiling, knowledge retrieval, and visualization parameter exploration. No equations, fitted parameters, or predictive models appear in the provided text. The autonomy claim is framed as a new workflow composition rather than a derivation that reduces to prior inputs by definition or self-citation. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from the authors' prior work are invoked to justify core components. The system description is self-contained rather than validated against external benchmarks, and it does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that frontier MLLMs can reliably handle scientific reasoning tasks autonomously; no free parameters are introduced because this is an architectural proposal rather than a fitted model.

axioms (1)
  • Domain assumption: frontier multimodal LLMs can perform reliable automated data profiling, context-aware knowledge retrieval, and reasoning-driven visualization without external prompting or human feedback.
    Invoked throughout the abstract as the enabling capability for the fully autonomous workflow.
invented entities (1)
  • SASAV multi-agent system (no independent evidence)
    purpose: to automatically orchestrate data exploration workflows including profiling, knowledge retrieval, and visualization parameter exploration
    New system architecture proposed in the paper with no independent evidence of functionality provided.

pith-pipeline@v0.9.0 · 5499 in / 1347 out tokens · 133275 ms · 2026-05-13T17:48:43.723141+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages · 1 internal anchor

  1. [1] J. Ahrens, B. Geveci, and C. Law. ParaView: An end-user tool for large data visualization. The Visualization Handbook, 717(8), 2005.

  2. [2] K. Ai, H. Miao, Z. Li, C. Wang, and S. Liu. An evaluation-centric paradigm for scientific visualization agents. arXiv preprint arXiv:2509.15160, 2025.

  3. [3] K. Ai, K. Tang, and C. Wang. NLI4VolVis: Natural language interaction for volume visualization via LLM multi-agents and editable 3D Gaussian splatting. IEEE Transactions on Visualization and Computer Graphics, pp. 1–11, 2025. doi: 10.1109/TVCG.2025.3633888

  4. [4] A. Arnab, M. Dehghani, G. Heigold, C. Sun, M. Lučić, and C. Schmid. ViViT: A video vision transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6836–6846, October 2021.

  5. [5] M. Berger and S. Liu. The visualization judge: Can multimodal foundation models guide visualization design through visual perception? In 2024 IEEE Evaluation and Beyond - Methodological Approaches for Visualization (BELIV), pp. 60–70, 2024. doi: 10.1109/BELIV64461.2024.00012

  6. [6] U. Bordoloi and H.-W. Shen. View selection for volume rendering. In VIS 05. IEEE Visualization, 2005, pp. 487–494, 2005. doi: 10.1109/VISUAL.2005.1532833

  7. [7] D. Chen, R. Chen, S. Zhang, Y. Wang, Y. Liu, H. Zhou, Q. Zhang, Y. Wan, P. Zhou, and L. Sun. MLLM-as-a-judge: Assessing multimodal LLM-as-a-judge with vision-language benchmark. In Proceedings of the 41st International Conference on Machine Learning, ICML '24, article no. 254, 34 pages. JMLR.org, 2024.

  8. [8] N. Chen, Y. Zhang, J. Xu, K. Ren, and Y. Yang. VisEval: A benchmark for data visualization in the era of large language models. IEEE Transactions on Visualization and Computer Graphics, 31(1):1301–1311, 2025. doi: 10.1109/TVCG.2024.3456320

  9. [9] F. Chollet, M. Knoop, G. Kamradt, B. Landers, and H. Pinkard. ARC-AGI-2: A new challenge for frontier AI reasoning systems. arXiv preprint arXiv:2505.11831, 2025.

  10. [10] C. D. Correa and K.-L. Ma. Visibility-driven transfer functions. In 2009 IEEE Pacific Visualization Symposium, pp. 177–184, 2009. doi: 10.1109/PACIFICVIS.2009.4906854

  11. [11] W. Cui. Visual analytics: A comprehensive overview. IEEE Access, 7:81555–81573, 2019. doi: 10.1109/ACCESS.2019.2923736

  12. [12] V. Dhanoa, A. Wolter, G. M. León, H.-J. Schulz, and N. Elmqvist. Agentic visualization: Extracting agent-based design patterns from visualization systems. IEEE Computer Graphics and Applications, 45(6):89–100, 2025. doi: 10.1109/MCG.2025.3607741

  13. [13] V. Dibia. LIDA: A tool for automatic generation of grammar-agnostic visualizations and infographics using large language models. In D. Bollegala, R. Huang, and A. Ritter, eds., Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pp. 113–126. Association for Computational Linguistics…

  14. [14] D. Engel, L. Sick, and T. Ropinski. Leveraging self-supervised vision transformers for segmentation-based transfer function design. IEEE Transactions on Visualization and Computer Graphics, 31(8):4357–4368, 2025. doi: 10.1109/TVCG.2024.3401755

  15. [15] S. Goel, R. J. Lee, and K. Ramchandran. SAGE: A realistic benchmark for semantic understanding. arXiv preprint arXiv:2509.21310, 2025.

  16. [16] J. Gu, X. Jiang, Z. Shi, H. Tan, X. Zhai, C. Xu, W. Li, Y. Shen, S. Ma, H. Liu, S. Wang, K. Zhang, Z. Lin, B. Zhang, L. Ni, W. Gao, Y. Wang, and J. Guo. A survey on LLM-as-a-judge. The Innovation, p. 101253, 2026. doi: 10.1016/j.xinn.2025.101253

  17. [17] Y. Guo, D. Shi, M. Guo, Y. Wu, N. Cao, and Q. Chen. Talk2Data: A natural language interface for exploratory visual analysis via question decomposition. ACM Trans. Interact. Intell. Syst., 14(2), article no. 8, 24 pages, Apr. 2024. doi: 10.1145/3643894

  18. [18] X. Hou, Y. Zhao, S. Wang, and H. Wang. Model context protocol (MCP): Landscape, security threats, and future research directions. ACM Trans. Softw. Eng. Methodol., Feb. 2026. Just Accepted. doi: 10.1145/3796519

  19. [19] W. Humphrey, A. Dalke, and K. Schulten. VMD: Visual molecular dynamics. Journal of Molecular Graphics, 14(1):33–38, 1996. doi: 10.1016/0263-7855(96)00018-5

  20. [20] S. Jeong, J. Li, C. R. Johnson, S. Liu, and M. Berger. Text-based transfer function design for semantic volume rendering. In 2024 IEEE Visualization and Visual Analytics (VIS), pp. 196–200, 2024. doi: 10.1109/VIS55277.2024.00047

  21. [21] LangChain. LangChain: The agent engineering platform. https://github.com/langchain-ai/langchain, 2023. Accessed: 2026-03-…

  22. [22] S. Liu, H. Miao, and P.-T. Bremer. ParaView-MCP: An autonomous visualization agent with direct tool use. In 2025 IEEE Visualization and Visual Analytics (VIS), pp. 61–65, 2025. doi: 10.1109/VIS60296.2025.00018

  23. [23] S. Liu, H. Miao, Z. Li, M. Olson, V. Pascucci, and P.-T. Bremer. AVA: Towards autonomous visualization agents through visual perception-driven decision-making. Computer Graphics Forum, 43(3):e15093, 2024. doi: 10.1111/cgf.15093

  24. [24] P. Ljung, J. Krüger, E. Gröller, M. Hadwiger, C. D. Hansen, and A. Ynnerman. State of the art in transfer functions for direct volume rendering. Computer Graphics Forum, 35(3):669–691, 2016. doi: 10.1111/cgf.12934

  25. [25] P. Lu, H. Bansal, T. Xia, J. Liu, C. Li, H. Hajishirzi, H. Cheng, K.-W. Chang, M. Galley, and J. Gao. MathVista: Evaluating mathematical reasoning of foundation models in visual contexts. In International Conference on Learning Representations (ICLR), 2024.

  26. [26] T. Luo, C. Huang, L. Shen, B. Li, S. Shen, W. Zeng, N. Tang, and Y. Luo. nvBench 2.0: A benchmark for natural language to visualization under ambiguity. arXiv preprint arXiv:2503.12880, 1(3):4, 2025.

  27. [27] H. T. M. Luong and V. T. Nguyen. NL2VIS transformed: From linguistic abstraction to visual specification in the generative AI era. SN Comput. Sci., 7(1), 19 pages, Dec. 2025. doi: 10.1007/s42979-025-04636-4

  28. [28] K.-L. Ma, J. Painter, C. Hansen, and M. Krogh. Parallel volume rendering using binary-swap compositing. IEEE Computer Graphics and Applications, 14(4):59–68, 1994. doi: 10.1109/38.291532

  29. [29] P. Maddigan and T. Susnjak. Chat2VIS: Generating data visualizations via natural language using ChatGPT, Codex and GPT-3 large language models. IEEE Access, 11:45181–45193, 2023. doi: 10.1109/ACCESS.2023.3274199

  30. [30] G. Ouyang, J. Chen, Z. Nie, Y. Gui, Y. Wan, H. Zhang, and D. Chen. nvAgent: Automated data visualization from natural language via collaborative agent workflow. In W. Che, J. Nabende, E. Shutova, and M. T. Pilehvar, eds., Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 19534–19567. Association for Computational Linguistics…

  31. [31] J. G. Pauloski, Y. Babuji, R. Chard, M. Sakarvadia, K. Chard, and I. Foster. Empowering scientific workflows with federated agents. arXiv preprint arXiv:2505.05428, 2025.

  32. [32] T. Peterka, T. Mallick, O. Yildiz, D. Lenz, C. Quammen, and B. Geveci. ChatVis: Large language model agent for generating scientific visualizations. In 2025 IEEE 15th Symposium on Large Data Analysis and Visualization (LDAV), pp. 22–32, 2025. doi: 10.1109/LDAV68558.2025.00007

  33. [33] R. Ranjan. One word is not enough: Simple prompts improve word embeddings. arXiv preprint arXiv:2512.06744, 2025.

  34. [34] W. Serna-Serna, A. M. Álvarez-Meza, and Á. Orozco-Gutiérrez. Fast semi-supervised t-SNE for transfer function enhancement in direct volume rendering-based medical image visualization. Mathematics, 12(12), 2024. doi: 10.3390/math12121885

  35. [35] Z. Shao, Y. Shan, Y. He, Y. Yao, J. Wang, X. Zhang, Y. Zhang, and S. Chen. Do language model agents align with humans in rating visualizations? An empirical study. IEEE Computer Graphics and Applications, 45(6):14–28, 2025. doi: 10.1109/MCG.2025.3586461

  36. [36] L. Shen, H. Li, Y. Wang, and H. Qu. From data to story: Towards automatic animated data video creation with LLM-based multi-agent systems. In 2024 IEEE VIS Workshop on Data Storytelling in an Era of Generative AI (GEN4DS), pp. 20–27, 2024. doi: 10.1109/GEN4DS63889.2024.00008

  37. [37] Z. Shuai, B. Li, S. Yan, Y. Luo, and W. Yang. DeepVIS: Bridging natural language and data visualization through step-wise reasoning. IEEE Transactions on Visualization and Computer Graphics, 32(1):868–878, 2026. doi: 10.1109/TVCG.2025.3634645

  38. [38] S. Song, J. Chen, C. Li, and C. Wang. GVQA: Learning to answer questions about graphs with visualizations via knowledge base. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI '23, article no. 464, 16 pages. Association for Computing Machinery, New York, NY, USA, 2023. doi: 10.1145/3544548.3581067

  39. [39] J. Sun, D. Lenz, H. Yu, and T. Peterka. Scalable volume visualization for big scientific data modeled by functional approximation. In 2023 IEEE International Conference on Big Data (BigData), pp. 905–914, 2023. doi: 10.1109/BigData59044.2023.10386434

  40. [40] Y. Tang, J. Bi, S. Xu, L. Song, S. Liang, T. Wang, D. Zhang, J. An, J. Lin, R. Zhu, A. Vosoughi, C. Huang, Z. Zhang, P. Liu, M. Feng, F. Zheng, J. Zhang, P. Luo, J. Luo, and C. Xu. Video understanding with large language models: A survey. IEEE Transactions on Circuits and Systems for Video Technology, 36(2):1355–1376, 2026. doi: 10.1109/TCSVT.2025.3566695

  41. [41] Y. Tian, W. Cui, D. Deng, X. Yi, Y. Yang, H. Zhang, and Y. Wu. ChartGPT: Leveraging LLMs to generate charts from abstract natural language. IEEE Transactions on Visualization and Computer Graphics, 31(3):1731–1745, 2025. doi: 10.1109/TVCG.2024.3368621

  42. [42] N. Tylosky, A. Knutas, and A. Wolff. Design practices in visualization driven data exploration for non-expert audiences. Comput. Sci. Rev., 56(C), 16 pages, May 2025. doi: 10.1016/j.cosrev.2025.100731

  43. [43] D. Van Der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark, and H. J. C. Berendsen. GROMACS: Fast, flexible, and free. Journal of Computational Chemistry, 26(16):1701–1718, 2005. doi: 10.1002/jcc.20291

  44. [44] C. Wang, J. Thompson, and B. Lee. Data Formulator: AI-powered concept-driven visualization authoring. IEEE Transactions on Visualization and Computer Graphics, 30(1):1128–1138, 2024. doi: 10.1109/TVCG.2023.3326585

  45. [45] Y. Wang, B. Pan, K. Wang, H. Liu, J. Mao, Y. Liu, M. Zhu, B. Zhang, W. Chen, X. Huang, and W. Chen. IntuiTF: MLLM-guided transfer function optimization for direct volume rendering. arXiv preprint arXiv:2506.18407, 2025. doi: 10.48550/arXiv.2506.18407

  46. [46] J. Wei, Y. Yang, X. Zhang, Y. Chen, X. Zhuang, Z. Gao, D. Zhou, G. Wang, Z. Gao, J. Cao, et al. From AI for Science to agentic science: A survey on autonomous scientific discovery. arXiv preprint arXiv:2508.14111, 2025.

  47. [47] L. Weng, X. Wang, J. Lu, Y. Feng, Y. Liu, H. Feng, D. Huang, and W. Chen. InsightLens: Augmenting LLM-powered data analysis with interactive insight management and navigation. IEEE Transactions on Visualization and Computer Graphics, 31(6):3719–3732, 2025. doi: 10.1109/TVCG.2025.3567131

  48. [48] T. Wu, G. Yang, Z. Li, K. Zhang, Z. Liu, L. Guibas, D. Lin, and G. Wetzstein. GPT-4V(ision) is a human-aligned evaluator for text-to-3D generation. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 22227–22238, 2024. doi: 10.1109/CVPR52733.2024.02098

  49. [49] Y. Ye, J. Hao, Y. Hou, Z. Wang, S. Xiao, Y. Luo, and W. Zeng. Generative AI for visualization: State of the art and future directions. Visual Informatics, 8(2):43–66, 2024. doi: 10.1016/j.visinf.2024.04.003

  50. [50] X. Yue, T. Zheng, Y. Ni, Y. Wang, K. Zhang, S. Tong, Y. Sun, B. Yu, G. Zhang, H. Sun, Y. Su, W. Chen, and G. Neubig. MMMU-Pro: A more robust multi-discipline multimodal understanding benchmark. In W. Che, J. Nabende, E. Shutova, and M. T. Pilehvar, eds., Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: …

  51. [51] Y. Zhao, X. Shu, L. Fan, L. Gao, Y. Zhang, and S. Chen. ProactiveVA: Proactive visual analytics with LLM-based UI agent. IEEE Transactions on Visualization and Computer Graphics, 32(1):451–461, 2026. doi: 10.1109/TVCG.2025.3642628

  52. [52] Y. Zhao, Y. Zhang, Y. Zhang, X. Zhao, J. Wang, Z. Shao, C. Turkay, and S. Chen. LEVA: Using large language models to enhance visual analytics. IEEE Transactions on Visualization and Computer Graphics, 31(3):1830–1847, 2025. doi: 10.1109/TVCG.2024.3368060