pith. machine review for the scientific record.

arxiv: 2604.10008 · v1 · submitted 2026-04-11 · 💻 cs.HC

Recognition: unknown

Raiven: LLM-Based Visualization Authoring via Domain-Specific Language Mediation

Alexandra Irger, David Pugmire, Ella Hugie, Hanspeter Pfister, Kenneth Moreland, Minghao Guo, Simon Warchol, Wojciech Matusik

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 16:44 UTC · model grok-4.3

classification 💻 cs.HC
keywords visualization authoring · large language models · domain-specific languages · scientific visualization · information visualization · deterministic compilation · data faithfulness · conversational interfaces

The pith

Raiven uses a domain-specific language to let LLMs create correct and verifiable visualizations for scientific and information data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Raiven as a conversational system that lets large language models author visualizations through natural language requests. Rather than generating code directly, the model produces a specification in RaivenDSL, a language that unifies scientific and information visualization for 2D, 3D, and tabular data. A deterministic compiler then translates the specification into executable D3 or VTK.js code. This design keeps the LLM working only on dataset metadata, so outputs are verifiable before running and data fabrication cannot occur. Benchmarks and expert studies show the method compiles reliably, runs faster and cheaper than direct LLM approaches, and reduces debugging effort for users.
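The metadata-only contract described above can be sketched in a few lines: the model's prompt would contain column names and declared types, never cell values. This is a hypothetical illustration; `dataset_metadata` and its output format are invented here, not Raiven's actual interface.

```python
import csv
import io

def dataset_metadata(csv_text, type_hints=None):
    """Summarize a tabular dataset for an LLM prompt: column names and
    declared types only, never the cell values themselves."""
    header = next(csv.reader(io.StringIO(csv_text)))
    hints = type_hints or {}
    return [{"name": col, "type": hints.get(col, "unknown")} for col in header]

sample = "intensity,x,y\n0.91,12,7\n0.13,3,4\n"
meta = dataset_metadata(sample, {"intensity": "float", "x": "int", "y": "int"})
# The prompt sees only this summary, so the model cannot fabricate data values.
```

Because the model never receives rows, any data it "invented" would have no path into the rendered output: the compiler reads the file itself.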

Core claim

Raiven mediates visualization authoring through RaivenDSL, a formally defined domain-specific language. The LLM produces compact specifications under schema-guided constraints, and a deterministic compiler translates them to D3 or VTK.js code. Because the LLM operates only on metadata, specifications are verifiable before execution and data fabrication is impossible by construction. In a 100-task benchmark Raiven achieves 100% compilation success, runs up to six times faster and cheaper than state-of-the-art LLMs, and improves interaction quality, correctness, and data faithfulness; an expert user study confirms reduced debugging effort.

What carries the argument

RaivenDSL, the domain-specific language that unifies scientific and information visualization and serves as the constrained interface between the LLM and the deterministic compiler.
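As a rough illustration of what a schema-constrained intermediate representation buys, here is a hypothetical spec in the spirit of RaivenDSL (the field names are invented, not the paper's actual grammar) together with the kind of minimal structural check a compiler front end could run before any code executes:

```python
# Required fields and their types for a single-view spec (invented schema).
REQUIRED = {"view": str, "mark": str, "data": str, "encode": dict}

def validate(spec):
    """Structural validation: every required field present with the right
    type. Runs before compilation, so malformed specs never reach code."""
    return [f"missing or mistyped field: {k}"
            for k, t in REQUIRED.items()
            if not isinstance(spec.get(k), t)]

spec = {
    "view": "histogram_view",
    "mark": "bar",
    "data": "head_sample.csv",        # referenced by path, never inlined
    "encode": {"x": "intensity"},
}
assert validate(spec) == []
assert validate({"view": "v"}) != []  # incomplete specs are caught up front
```

The point of the mediation is that the LLM's entire output surface is this small, checkable object rather than arbitrary code.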

If this is right

  • The same authoring interface works for both scientific and information visualization tasks.
  • All generated visualizations can be inspected and verified before any code executes.
  • Data fabrication is prevented by design because the LLM never sees or generates raw data values.
  • Expert users spend significantly less time debugging than with direct LLM code generation.
  • The approach delivers up to sixfold gains in speed and cost while maintaining or improving output quality.
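The inspection-before-execution bullet rests on the compiler being a pure function of the specification. A toy sketch of that property, with an invented `compile_spec` helper and D3-flavored output that stand in for Raiven's actual backend:

```python
import json

def compile_spec(spec):
    """Toy deterministic 'compiler': a pure function from spec to code
    string. Identical specs always produce byte-identical output, unlike
    direct LLM code generation."""
    return (
        "// generated from spec (deterministic)\n"
        f"const data = d3.csv({json.dumps(spec['data'])});\n"
        f"render({json.dumps(spec['mark'])}, "
        f"{json.dumps(spec['encode'], sort_keys=True)});\n"
    )

spec = {"data": "head_sample.csv", "mark": "bar", "encode": {"x": "intensity"}}
assert compile_spec(spec) == compile_spec(dict(spec))  # reproducible by construction
```

Determinism is what makes the pre-execution review meaningful: what you inspect is exactly what will run.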

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same mediation pattern could be applied to other LLM-driven tasks that currently suffer from non-deterministic or unverifiable outputs.
  • Limiting models to metadata-only interaction may become a practical safeguard for any visualization or analysis system where data integrity matters.
  • Extending the DSL to additional visualization types or backends would test how far the unification claim generalizes.

Load-bearing premise

Schema-guided constraints are sufficient for the LLM to reliably produce correct, complete RaivenDSL specifications that fully cover the requirements of both scientific and information visualizations without needing manual correction.
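One way to probe this premise is a pre-execution check that goes beyond schema shape: every encoded variable must name a real column from the dataset metadata. The helper and field names below are invented for illustration; the point is that a dangling reference fails before any code runs, which a purely structural schema check would not guarantee.

```python
def check_encodings(spec, metadata):
    """Pre-execution verification: every encoded variable must name a
    column present in the dataset metadata, so dangling references are
    rejected before compilation."""
    known = {m["name"] for m in metadata}
    return [f"unknown column: {v}"
            for v in spec["encode"].values() if v not in known]

meta = [{"name": "intensity", "type": "float"}, {"name": "x", "type": "int"}]
good = {"encode": {"x": "intensity"}}
bad = {"encode": {"x": "densityy"}}   # typo a shape-only schema would accept

assert check_encodings(good, meta) == []
assert check_encodings(bad, meta) != []
```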

What would settle it

A benchmark task where the generated RaivenDSL specification compiles and runs but produces a visualization that misrepresents the underlying data or fails to satisfy the original natural-language request.
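A concrete shape for such a counterexample, sketched with invented field names: a specification that clears both the structural gate and the metadata gate yet binds the wrong (but real) column, which only a semantic comparison against the request would catch.

```python
def structurally_valid(spec):
    # Stand-in for a schema check: right fields, right types.
    return isinstance(spec.get("mark"), str) and isinstance(spec.get("encode"), dict)

request = "histogram of intensity"
columns = {"intensity", "elevation"}

# Binds a real column, so it compiles and runs -- just not the one requested.
spec = {"mark": "bar", "encode": {"x": "elevation"}}

assert structurally_valid(spec)            # passes the schema gate
assert spec["encode"]["x"] in columns      # passes the metadata gate
assert spec["encode"]["x"] not in request  # yet misses the user's intent
```

Finding such cases in the benchmark would show where verifiable-by-construction ends and semantic evaluation has to take over.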

Figures

Figures reproduced from arXiv: 2604.10008 by Alexandra Irger, David Pugmire, Ella Hugie, Hanspeter Pfister, Kenneth Moreland, Minghao Guo, Simon Warchol, Wojciech Matusik.

Figure 1
Figure 1. Raiven Overview. The user describes a visualization task in natural language; the LLM translates the request into a schema-constrained RaivenDSL specification, which the compiler deterministically transforms into an interactive visualization and control panel. view at source ↗
Figure 2
Figure 2. Raiven pipeline. Left: in the interpretation phase, schema-mediated generation translates natural language into RaivenDSL through a sequence of validated stages. Right: in the execution phase, the compiler parses, validates, resolves backend-specific defaults, and generates executable code and interactive controls. The user can interact with the system at three points: natural language input (Describe), di… view at source ↗
Figure 3
Figure 3. Output of the running example: a volume rendering and synchro… view at source ↗
Figure 4
Figure 4. Visualization types from published systems, recreated in RaivenDSL. (a) Mosaic… view at source ↗
Figure 5
Figure 5. VMPC scoring for benchmark prompt #73: “Render the CT isosurface in pink from head.vti, then summarize head_sample.csv using a histogram of intensity colored pink.” (a) Raiven output (VMPC = 1.00). (b) Gemini output (VMPC = 0.53): marks and encodings are missing. Fractional scores (e.g. V = 0.67) reflect disagreement among graders. The table shows per-view component scores averaged across three human grad… view at source ↗
Figure 6
Figure 6. Fraction of participants at each Likert level. Rows (a)–(e) in… view at source ↗
Figure 7
Figure 7. Comparing Raiven with the Baseline: (a) required less mental… view at source ↗
Figure 8
Figure 8. Case 73 output from Raiven: a pink CT isosurface alongside a… view at source ↗
Figure 9
Figure 9. Case 73 output from ChatGPT: a pink CT isosurface alongside… view at source ↗
Figure 12
Figure 12. Case s1 output from ChatGPT: a blank screen with only a… view at source ↗
Figure 13
Figure 13. Printed task sheet given to participants for Task 1 (InfoVis). The sheet specifies the target dashboard layout, required views, data files,… view at source ↗
Figure 14
Figure 14. Printed task sheet given to participants for Task 2 (SciVis). The sheet specifies the target dashboard layout, required views, data files,… view at source ↗
Figure 15
Figure 15. Printed task sheet given to participants for Task 3 (Combined). The sheet specifies the target dashboard layout, required views, data files,… view at source ↗
read the original abstract

Visualization is central to scientific discovery, yet authoring tools remain split between information and scientific visualization, and expertise in one rarely transfers to the other. Large Language Model (LLM) based systems promise to bridge this gap through natural language, but current approaches generate code non-deterministically, with no guarantee of correctness and no protection against silent data fabrication. We present Raiven, a conversational system that mediates visualization authoring through a formally defined domain-specific language. RaivenDSL unifies scientific and information visualization in a single representation spanning 2D, 3D, and tabular data. The LLM produces a compact RaivenDSL specification under schema-guided constraints, and a deterministic compiler translates it to executable D3 or VTK.js code. Because the LLM operates only on dataset metadata, outputs are deterministic, specifications are verifiable before execution, and data fabrication is impossible by construction. In a 100-task benchmark, Raiven achieves 100% compilation, is up to six times faster and six times cheaper than state-of-the-art LLMs, while improving interaction quality, correctness, and data faithfulness. An expert user study shows that Raiven significantly reduces debugging effort and makes it easier to produce correct visualizations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents Raiven, a conversational visualization authoring system that mediates LLM generation through a formally defined domain-specific language (RaivenDSL). RaivenDSL unifies scientific and information visualization across 2D, 3D, and tabular data. The LLM produces compact, schema-constrained RaivenDSL specifications from natural language; a deterministic compiler then translates them to executable D3 or VTK.js code. This design ensures outputs are deterministic, verifiable before execution, and free of data fabrication by construction. The authors report 100% compilation success on a 100-task benchmark, up to 6x improvements in speed and cost over state-of-the-art LLMs, plus gains in interaction quality, correctness, and data faithfulness; an expert user study indicates reduced debugging effort and easier production of correct visualizations.

Significance. If the quantitative and user-study results hold under transparent evaluation, the work offers a principled way to make LLM-based visualization authoring reliable and cross-domain. The DSL mediation plus deterministic compilation directly addresses non-determinism and silent errors that plague direct code generation. The formal unification of sci/info viz primitives and the emphasis on pre-execution verifiability are genuine strengths that could influence future hybrid LLM+DSL systems.

major comments (2)
  1. [Abstract / Evaluation] The 100-task benchmark (Abstract) reports 100% compilation success and claims of improved correctness and data faithfulness, yet provides no breakdown of semantic error rates, task coverage for complex 3D interactions or custom data transformations, or how 'correctness' was scored beyond syntactic compilation. Compilation verifies only syntactic/type validity via the deterministic compiler; without semantic analysis or coverage metrics against real visualization requirements, the claim that schema constraints suffice for complete, correct specifications remains unsubstantiated.
  2. [System Description / Evaluation] The central assumption that schema-guided constraints enable the LLM to produce RaivenDSL specifications that fully cover user intent for both scientific and information visualizations (including edge cases) is load-bearing for the reliability claims. The manuscript should include an explicit analysis of DSL primitive coverage, omission rates, or cases requiring manual correction; the current evidence rests on aggregate benchmark numbers without these diagnostics.
minor comments (2)
  1. [Abstract] The abstract states 'up to six times faster and six times cheaper' without naming the exact baseline LLMs, prompt conditions, or hardware setup used for the comparison.
  2. [User Study] The expert user study is summarized only at a high level; details on participant background, specific tasks, quantitative measures of debugging effort, and statistical tests would strengthen the presentation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address the two major comments point by point below, acknowledging the need for greater transparency in our evaluation and outlining specific revisions to the manuscript.

read point-by-point responses
  1. Referee: [Abstract / Evaluation] The 100-task benchmark (Abstract) reports 100% compilation success and claims of improved correctness and data faithfulness, yet provides no breakdown of semantic error rates, task coverage for complex 3D interactions or custom data transformations, or how 'correctness' was scored beyond syntactic compilation. Compilation verifies only syntactic/type validity via the deterministic compiler; without semantic analysis or coverage metrics against real visualization requirements, the claim that schema constraints suffice for complete, correct specifications remains unsubstantiated.

    Authors: We agree that the current presentation of the benchmark results would benefit from additional granularity. The 100% compilation rate specifically validates the effectiveness of schema-guided constraints in producing syntactically and type-valid RaivenDSL specifications, which by design eliminates data fabrication and enables pre-execution verification. Claims regarding improved correctness, interaction quality, and data faithfulness are supported by the expert user study, in which participants directly compared outputs on semantic fidelity to user intent and input data. To strengthen the evidence, we will revise the evaluation section to include: (i) a categorized breakdown of the 100 tasks (e.g., by 2D/3D, scientific vs. information viz, and presence of custom transformations), (ii) semantic error rates derived from expert review of generated visualizations, and (iii) explicit criteria used to score correctness in the user study. These additions will better substantiate coverage of user intent. revision: yes

  2. Referee: [System Description / Evaluation] The central assumption that schema-guided constraints enable the LLM to produce RaivenDSL specifications that fully cover user intent for both scientific and information visualizations (including edge cases) is load-bearing for the reliability claims. The manuscript should include an explicit analysis of DSL primitive coverage, omission rates, or cases requiring manual correction; the current evidence rests on aggregate benchmark numbers without these diagnostics.

    Authors: We concur that an explicit diagnostic analysis of DSL coverage would make the reliability claims more robust. RaivenDSL was designed as a unifying formal representation, and the benchmark tasks were selected to span common and complex cases across domains. In the revised manuscript, we will add a dedicated subsection that reports: primitive coverage statistics across the 100 tasks, any observed omission rates for edge cases, and details on conversational interactions where the LLM produced incomplete specifications requiring clarification or correction. This will provide the requested diagnostic evidence rather than relying solely on aggregate metrics. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on external benchmarks and a user study

full rationale

The paper's core claims (100% compilation success, speed/cost improvements, and reduced debugging effort) are supported by a 100-task benchmark and an expert user study rather than any self-referential definitions or fitted parameters. The RaivenDSL schema and deterministic compiler are architectural choices that enable verifiability by design, but the correctness and completeness assertions are tested externally against real tasks and not derived tautologically from the inputs. No load-bearing steps reduce to self-citation chains, ansatzes smuggled in via prior work, or renaming of known results. The chain of claims is grounded in the reported external evaluations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

Based on the abstract alone, the central claim depends on the existence of a unifying formal DSL and on the effectiveness of schema constraints on LLM output. No free parameters are mentioned. The DSL itself is an invented construct whose coverage of all visualization cases is assumed rather than derived.

axioms (1)
  • domain assumption RaivenDSL is a formally defined domain-specific language capable of representing visualizations across 2D, 3D, and tabular data in a unified way.
    The paper treats the DSL as sufficient to mediate between LLM generation and deterministic compilation for both scientific and information visualization needs.
invented entities (2)
  • RaivenDSL no independent evidence
    purpose: Provide a compact, schema-constrained, verifiable intermediate representation for visualizations that LLMs can generate reliably.
    New DSL introduced to enforce structure and prevent data fabrication.
  • Raiven system no independent evidence
    purpose: Conversational authoring pipeline combining LLM, DSL, and compiler.
    Overall architecture presented as the solution.

pith-pipeline@v0.9.0 · 5533 in / 1468 out tokens · 35889 ms · 2026-05-10T16:44:07.963410+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

72 extracted references · 39 canonical work pages
