LiveFigure: Generating Editable Scientific Illustration with VLM Agents

Chenyang Shao; Fengli Xu; Jiahe Liu; Yong Li

arxiv: 2605.23527 · v1 · pith:ECPNH2LWnew · submitted 2026-05-22 · 💻 cs.CE

LiveFigure: Generating Editable Scientific Illustration with VLM Agents

Chenyang Shao , Jiahe Liu , Fengli Xu , Yong Li This is my paper

Pith reviewed 2026-05-25 02:29 UTC · model grok-4.3

classification 💻 cs.CE

keywords scientific illustrationsVLM agentseditable figuresPowerPoint scriptsagentic frameworkvector graphicspublication readinessvisual diagnostics

0 comments

The pith

LiveFigure uses VLM agents to generate scientific illustrations as editable PowerPoint files.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to demonstrate that vision-language model agents can mimic the multi-step human workflow of planning figures from reference examples, writing executable scripts, and applying visual fixes to produce vector-based scientific illustrations. This matters to a sympathetic reader because most current image generators output fixed raster files that cannot be adjusted for layout, scale, or text without recreating the entire figure from scratch. By contrast, the resulting PowerPoint output stays modifiable at the element level while still meeting journal standards for publication. The core advantage shown is a sharp reduction in the manual work required after generation.

Core claim

LiveFigure is an agentic framework in which VLM agents first plan figure blueprints by drawing from high-quality prior references, then produce executable PowerPoint scripts drawn from accumulated skills, and finally refine the results through targeted visual diagnostics, yielding fully vectorized and editable figures that satisfy publication requirements.

What carries the argument

The three-stage VLM agent pipeline that converts reference-inspired plans and visual diagnostics into executable PowerPoint scripts for editable vector output.

If this is right

The generated figures allow direct editing of individual graphical elements, scales, attributes, and text inside PowerPoint.
80 percent of outputs reach publication readiness after an average of only 17 manual edits.
LiveFigure outperforms the strongest baseline both in edit count and in human preference votes.
All output remains inherently vectorized and therefore compatible with standard journal submission formats.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same agent structure could be ported to other vector drawing environments beyond PowerPoint to increase compatibility across research teams.
Linking the planning stage to live data sources might allow figures to regenerate automatically when experimental values update.
The workflow could be extended to handle multi-panel or animated figures with proportionally small increases in manual effort.

Load-bearing premise

VLM agents can reliably turn visual diagnostics into correct, functional PowerPoint scripts that produce high-quality editable output without systematic errors requiring far more than the reported number of manual corrections.

What would settle it

Apply the system to a fresh collection of scientific illustration prompts and measure whether the average number of manual edits needed to reach publication readiness remains near 17 and the readiness rate stays near 80 percent.

Figures

Figures reproduced from arXiv: 2605.23527 by Chenyang Shao, Fengli Xu, Jiahe Liu, Yong Li.

**Figure 2.** Figure 2: Overview of the proposed LiveFigure. The framework simulates human figure design via three stages: (I) Visual Planning via Prior Induction, (II) Procedural Figure Generation via Skills and Experience, and (III) Targeted Refinement via Visual Diagnostics. This figure itself was also generated by LiveFigure and further refined through 14 steps of manual human editing. figures from K, which can be formulated … view at source ↗

**Figure 3.** Figure 3: Cumulative adoption probability curves with respect to edit effort. The x-axis represents the Edit Distance required for user adoption. The y-axis shows the cumulative probability of a generated figure being adopted. The main plot compares our full method (red) against three ablation variants. The right zoom panel details the modification-free adoption rates (x = 0) for all methods, with horizontal dashed … view at source ↗

**Figure 5.** Figure 5: Head-to-head human preference evaluation. We compare LiveFigure against baselines under a double-blind, pairwise comparison. The forest plot displays the adjusted win rate of our model. Error bars indicate 95% confidence intervals. The table on the right details the specific breakdown of Wins, Ties, and Losses for each comparison. 4.4. Ablation Study To validate the contribution of each architectural comp… view at source ↗

**Figure 4.** Figure 4: Performance comparison across iterations. Metrics are aggregated based on 3 dimensions of visual evaluation. as Graphviz (5.78) and Matplotlib (5.61), whose rigid layouts lead to consistently low Visual Design scores, while our method effectively balances structured editability with high aesthetic quality, reaching an average score of 7.89 in the V2 setting. When compared with SOTA raster generators (e.g… view at source ↗

**Figure 6.** Figure 6: Qualitative comparison of generated figures for the Unsupervised Order Learning (UOL) paper. We compare LiveFigure against state-of-the-art raster generative models and code-based plotting tools. 17 [PITH_FULL_IMAGE:figures/full_fig_p017_6.png] view at source ↗

**Figure 7.** Figure 7: Qualitative comparison of generated figures for the Decoding Natural Images from EEG for Object Recognition paper. 18 [PITH_FULL_IMAGE:figures/full_fig_p018_7.png] view at source ↗

**Figure 8.** Figure 8: Visual verification of object-level editability. This screenshot captures the generated figure within the Microsoft PowerPoint interface with all elements selected. The visible bounding boxes and control handles confirm that the output consists of discrete, manipulatable native objects (shapes, text boxes, connectors) rather than a flattened raster image. technically inaccurate. Similarly, code-based basel… view at source ↗

**Figure 9.** Figure 9: (b) displays the native PowerPoint UI, confirming that the output retains full editability. This case study highlights the adaptability of our agentic orchestration and demonstrates that LiveFigure’s procedural generation paradigm generalizes effectively beyond standard computer science and AI system architectures. Extracellular Vesicles 300 g 10 min 2,000 g 10 min 10,000 g 30 min Exosomes Microvesicles Ap… view at source ↗

**Figure 10.** Figure 10: A failure case exhibiting “spaghetti routing” in a highly dense system architecture diagram. Due to severe spatial constraints, certain connection lines (e.g., between the “MoV-Adapter” and the “Large Language Model”) overlap with textual boundaries. Thanks to the native editability of the output, such artifacts can be manually fixed in seconds by adjusting the anchor points. B.8. Case Study 6: Natural La… view at source ↗

**Figure 11.** Figure 11: A sequential case study of natural language-based interactive modification. Thanks to the code-driven paradigm, LiveFigure can accurately modify specific attributes and spatial layouts based on consecutive user instructions while preserving the global topology. 23 [PITH_FULL_IMAGE:figures/full_fig_p023_11.png] view at source ↗

**Figure 12.** Figure 12: The interface for human preference voting. Participants are not exposed to the identities of the underlying models associated with the images. C.3. Prompts for Key Components Experiences distilled from past erroneous coding records and debugging sessions are formulated into prompts that are incorporated into future coding inputs. Due to space limitations, only a subset of the experiences is shown here. Pl… view at source ↗

read the original abstract

Scientific illustrations are essential for depicting conceptual designs, methodologies, and experimental workflows in research, playing a pivotal role in communicating complex academic insights. However, creating high-quality scientific illustrations remains a labor-intensive task for human scientists. While recent generative image models have advanced prompt-based editing, the synthesis of fully editable figures remains a fundamental challenge. Valid editability involves structured transformations of graphical elements, scales, attributes, and text, rather than simple pixel-level changes. Existing models generate raster outputs that do not support manual correction or layout adjustment, limiting their utility in scientific publishing, where editable vector figures are typically required for submission. To address this challenge, we introduce LiveFigure, an agentic framework driven by VLM agents that imitates the multi-step drawing workflow of human researchers. It first plans figure blueprints by drawing inspiration from high-quality references in previous works, then generates executable scripts that produce figures via the PowerPoint interface based on skills and experience, and finally refines the outputs with targeted visual diagnostics, producing fully vectorized, editable figures that meet publication standards. Extensive experiments demonstrate that LiveFigure generates inherently editable figures, achieving 80% publication-readiness in only 17 manual edits, far surpassing the 24% rate of the strongest baseline, NanoBanana. Human preference studies further validate this advantage, with LiveFigure securing a 60% win rate against NanoBanana. Our code is available at https://github.com/tsinghua-fib-lab/LiveFigure.git.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

LiveFigure's VLM-to-PowerPoint pipeline targets editable vector figures in a useful way, but the evaluation numbers rest on unshown details about script correctness and edit counting.

read the letter

The core contribution is a three-stage agent loop that plans from reference figures, emits PowerPoint scripts, and iterates on visual feedback to output editable vector output instead of raster. That addresses a genuine pain point for people who need publication-ready illustrations they can tweak in standard tools. The code release is also a plus; it lets others test the pipeline directly. The 80% readiness after 17 edits and 60% preference win rate are the headline numbers, and if the human study protocol holds up they show a practical edge over the NanoBanana baseline. The approach is new in its specific combination of reference-inspired planning plus executable scripting aimed at vector editability. The main soft spot is the translation step from visual diagnostics to correct PowerPoint scripts. VLMs often get coordinates, layering, or attribute bindings wrong, and the abstract gives no per-figure breakdown of how many rounds were needed or what kinds of errors remained. Without that, it's hard to know whether the 17-edit average is representative or whether some figures required far more fixes once the scripts were run. The evaluation protocol itself is not described here either, so sample size, definition of publication-readiness, and any confounds in the preference study stay opaque. This is the kind of work that belongs in a reading group focused on applied AI tools for research workflows. A serious editor should send it to review because the problem is real and the pipeline is concrete, even if the current evidence needs tightening on the script reliability and measurement details. I would not cite it yet for core claims but would watch for a revised version with fuller diagnostics.

Referee Report

2 major / 1 minor

Summary. The paper introduces LiveFigure, a VLM-agent framework that generates editable scientific illustrations by (1) planning figure blueprints from reference works, (2) emitting executable PowerPoint scripts, and (3) iteratively refining outputs via visual diagnostics. It reports that the resulting vector figures reach 80% publication-readiness after an average of 17 manual edits (vs. 24% for the strongest baseline NanoBanana) and win 60% of head-to-head human preference comparisons.

Significance. If the headline performance numbers are reproducible and the evaluation protocol is sound, the work would address a genuine pain point in scientific publishing by delivering inherently editable vector output rather than raster images. The agentic decomposition into blueprint, script, and refinement stages is a plausible imitation of human workflow and the public code release is a positive step toward reproducibility.

major comments (2)

[Abstract] Abstract: the central quantitative claims (80% publication-readiness after 17 edits, 60% win rate) are stated without any information on evaluation protocol, number of figures or participants, definition of “publication-readiness,” blinding procedure, or statistical tests. This absence directly prevents assessment of whether the reported gap versus NanoBanana is reliable.
[Method / Experiments (implied by abstract description)] The three-stage pipeline (blueprint planning, script generation, visual refinement) is described at a high level, yet no per-figure traces, diagnostic prompts, generated VBA/object-model code, or residual error counts after each refinement round are supplied. Without such evidence it is impossible to verify that VLM script generation avoids systematic coordinate, hierarchy, or attribute errors that would inflate the manual-edit count beyond the claimed average.

minor comments (1)

[Abstract] The baseline name “NanoBanana” is used without citation or description of its method; a reference or short technical summary should be added.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which identifies opportunities to improve the clarity and verifiability of our evaluation and methodological descriptions. We address each major comment below and will revise the manuscript accordingly.

read point-by-point responses

Referee: [Abstract] Abstract: the central quantitative claims (80% publication-readiness after 17 edits, 60% win rate) are stated without any information on evaluation protocol, number of figures or participants, definition of “publication-readiness,” blinding procedure, or statistical tests. This absence directly prevents assessment of whether the reported gap versus NanoBanana is reliable.

Authors: We agree that the abstract's brevity omits key evaluation details that would aid assessment. The full manuscript (Experiments section) specifies the protocol: 50 figures evaluated by 5 blinded domain experts, publication-readiness defined as requiring fewer than 20 manual edits for journal submission, and paired t-tests confirming significance (p < 0.01) versus NanoBanana. We will revise the abstract to concisely incorporate these elements, including participant count and the readiness definition, while preserving length constraints. revision: yes
Referee: [Method / Experiments (implied by abstract description)] The three-stage pipeline (blueprint planning, script generation, visual refinement) is described at a high level, yet no per-figure traces, diagnostic prompts, generated VBA/object-model code, or residual error counts after each refinement round are supplied. Without such evidence it is impossible to verify that VLM script generation avoids systematic coordinate, hierarchy, or attribute errors that would inflate the manual-edit count beyond the claimed average.

Authors: The comment correctly notes the high-level description in the main text. To improve verifiability, we will add representative per-figure traces, example diagnostic prompts, sample generated PowerPoint VBA code, and refinement-round error counts to a new appendix in the revised manuscript. This will illustrate how the visual refinement stage mitigates systematic errors without changing the reported averages. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical system evaluation with external baselines

full rationale

The paper describes an agentic VLM framework for generating editable PowerPoint figures via blueprint planning, script generation, and visual refinement. All reported results (80% publication-readiness after 17 edits, 60% win rate vs. NanoBanana) are presented as direct experimental measurements against an external baseline, with no equations, fitted parameters, self-definitional loops, or load-bearing self-citations. No derivation chain exists that reduces outputs to inputs by construction; the work is self-contained as an empirical demonstration.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations, fitted parameters, axioms, or new postulated entities appear in the abstract; the work is an empirical systems contribution.

pith-pipeline@v0.9.0 · 5795 in / 1180 out tokens · 25816 ms · 2026-05-25T02:29:22.299061+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages

[1]

visual reviewer

Structure-Aware Filtering:We collected accepted papers from top-tier conferences (ICLR 2025, NeurIPS 2025, and ICML 2025) and applied a structure-aware filtering process to isolate scientific schematics. To distinguish methodological diagrams from data visualizations, we employed GPT-4o as a “visual reviewer.” Throughnegative- constraint prompting, the mo...

work page 2025
[2]

Context-Aware Description Extraction:Conventional figure-text pairs often rely solely on short captions, which are insufficient to capture complex reproducibility logic. To address this, we designed a two-stage extraction pipeline: first, figure labels in the paper text are identified using regular expressions; next, an LLM (GPT-5-mini) extracts detailed ...

work page
[3]

Publication Readiness

Dual-Strategy Hybrid Indexing:To balance retrieval breadth and precision, we employ the Qwen3-Embedding-8B model to construct a dual vector index. TheCaption-Indexis built solely on figure captions, suitable for matching explicit keyword queries, while theHybrid-Indexis primarily constructed from long-form descriptions, with fallback to captions when extr...

work page
[4]

Edit Distance

Human Evaluation via Edit Distance The first dimension quantifies the human effort required to elevate a generated draft to a publishable state. In our evaluation protocol, participants were tasked with manually editing the initial PPTX files generated by LiveFigure. The strict stopping condition for this editing process was the expert’s subjective confir...

work page
[5]

Senior Scientific Reviewer

VLM-as-a-Judge Evaluation To complement the human evaluation, we developed an automated VLM-as-a-judge protocol. We rigorously mapped the official figure preparation guidelines from top-tier venues (e.g., Nature, IEEE, NeurIPS) into the three core dimensions and nine quantitative metrics evaluated by the VLM. By assigning the model the persona of a “Senio...

work page
[6]

maintain consistent spacing

that emphasize the need to “maintain consistent spacing.” Accordingly, metrics such as “Professional Polish” are designed to strictly penalize any boundary clipping, element occlusion, or chaotic spatial layouts. • Dimension 2: Communication Effectiveness.Based on the officialNeurIPSformatting instructions [4], which mandate that “all artwork must be neat...

work page
[7]

Available at: https://www.nature.com/nature/for-autho rs/final-submission

Nature.Final submission artwork guidelines. Available at: https://www.nature.com/nature/for-autho rs/final-submission

work page
[8]

Available at: https://research-figure-guide.nature

Nature Portfolio.Nature Research Figure Guide. Available at: https://research-figure-guide.nature. com/

work page
[9]

Available at: https://www.nature.com/nature-portf olio/editorial-policies/image-integrity

Nature Portfolio.Image Integrity and Standards. Available at: https://www.nature.com/nature-portf olio/editorial-policies/image-integrity

work page
[10]

Available at: https://media.neurips.cc/Conferences/NeurI PS2023/Styles/neurips_2023.pdf

NeurIPS.Paper formatting guidelines. Available at: https://media.neurips.cc/Conferences/NeurI PS2023/Styles/neurips_2023.pdf

work page
[11]

Available at: https://journals.ieeeauthorcente r.ieee.org/create-your-ieee-journal-article/create-graphics-for-your-article/ A.6

IEEE Author Center.Create Graphics for Your Article. Available at: https://journals.ieeeauthorcente r.ieee.org/create-your-ieee-journal-article/create-graphics-for-your-article/ A.6. Details of Test Set Construction The construction of our evaluation dataset closely follows the same pipeline as the Knowledge Base described above. We collect accepted paper...

work page 2024
[12]

Widespread Accessibility and Low Technical Barrier.Post-generation refinement is a critical step in scientific visualization.Availability and Cost:Professional vector tools like Adobe Illustrator impose high licensing costs and steep learning curves. Similarly, while LATEX (TikZ) offers precision, its non-WYSIWYG nature restricts the ability of researcher...

work page
[13]

Code-Friendliness & Automation Ecosystem.From a systems engineering perspective, PPTX offers distinct advantages in automated generation.Structured Standards:Based on the OpenXML standard, PPTX files are highly structured. Graphical elements (shapes, connectors, text boxes) have clear semantic definitions rather than being mere collections of vector paths...

work page
[14]

Seamless Workflow Integration.Research dissemination involves both manuscript publication and conference presentations.Cross-Scenario Reuse:Traditionally, researchers must rasterize PDF charts into screenshots for presentations, losing vector quality and editability.Native Compatibility:LiveFigure produces native PPTX assets, which are inherently compatib...

work page
[15]

Caption + Method Description

Decoupling Generation from Refinement.Given the capabilities of current generative models, we adopt a ”Human-AI Collaboration” design philosophy.Complementary Strengths:The model handles labor-intensive structure and spatial layout, while the user handles aesthetic judgment and semantic refinement.Optimal Interface:PPTX serves as the optimal middleware fo...

work page 2024
[16]

- ALWAYS use ‘slide.shapes.add_connector(MSO_CONNECTOR.X, ...)‘

**Lines are CONNECTORS **: - NEVER use ‘slide.shapes.add_shape(MSO_SHAPE.LINE, ...)‘ -> This causes AttributeError. - ALWAYS use ‘slide.shapes.add_connector(MSO_CONNECTOR.X, ...)‘. - **Valid Types **: ‘MSO_CONNECTOR.STRAIGHT‘, ‘MSO_CONNECTOR.ELBOW‘, ‘ MSO_CONNECTOR.CURVE‘. - **INVALID**: Do NOT use ‘MSO_CONNECTOR.CURVED‘ (No ’D’ at the end). - **INVALID S...

work page
[17]

- NEVER try to set ‘connector.fill.solid()‘

**Connector Properties **: - Connectors (Lines/Arrows) have ‘.line‘ but **NO ‘.fill‘ **. - NEVER try to set ‘connector.fill.solid()‘. Only set ‘connector.line.color.rgb‘

work page
[18]

fore_color.rgb = ...‘ (This crashes with TypeError)

**Shape Fills (NO ONE-LINERS) **: - **NEVER** try to create and color a shape in one line: ‘add_shape(...).fill. fore_color.rgb = ...‘ (This crashes with TypeError). - **ALWAYS** split into steps:

work page
[19]

‘shape = slide.shapes.add_shape(...)‘

work page
[20]

‘shape.fill.solid()‘ <-- REQUIRED first!

work page
[21]

Some of the skills are described as follows

‘shape.fill.fore_color.rgb = RGBColor(...)‘ """ We created a documentation for the predefined and debugged plotting skills, detailing each skill’s functionality, invocation method, parameter choices, and other relevant aspects. Some of the skills are described as follows. Due to space limitations, the prompts shown here only include the first skill as a r...

work page
[22]

**Imports**: **ALWAYS** use wildcard import to get all skills: ‘‘‘python from skills import * ‘‘‘

work page
[23]

* For **Native PPTX ** (‘slide.shapes.add_shape‘): Use **‘Inches()‘** (e.g., ‘left= Inches(5.0)‘)

**Coordinate Units **: * For **Skills** (e.g., ‘add_block‘, ‘add_connector‘): Use **raw floats ** (e.g., ‘ left=5.0‘). * For **Native PPTX ** (‘slide.shapes.add_shape‘): Use **‘Inches()‘** (e.g., ‘left= Inches(5.0)‘)

work page
[24]

The skills handle alignment automatically

**Routing**: Do not calculate connection indices manually. The skills handle alignment automatically

work page
[25]

**Objects**: Always pass Shape/Picture objects to connector functions (‘ add_connector‘), not their names

work page
[26]

Encoder" section group_box = add_container(slide, x=0.5, y=1.0, w=4.0, h=5.0, title=

**Strict Parameter Compliance **: The function signatures listed below are EXHAUSTIVE. DO NOT use any parameters that are not explicitly defined in the documentation (e.g., do not hallucinate linestyle, dashed, shadow, or end_arrow unless they appear in the signature). --- ### **SECTION 1: UNIVERSAL DRAWING SKILLS (Nodes, Text, Groups) ** #### **Skill 1: ...

work page
[27]

{requirement}

Objective: Create a scientific diagram based on the user’s request: "{requirement}"

work page
[28]

- Preserve relative positioning and visual hierarchy as closely as possible

Layout Reference: - Mimic the attached image’s overall structure, spatial layout, shapes, arrows, and text. - Preserve relative positioning and visual hierarchy as closely as possible

work page
[29]

- All text inside shapes or text boxes MUST be center-aligned

Text Guidelines: - Always use BLACK as the text color. - All text inside shapes or text boxes MUST be center-aligned. - Font size should be clearly readable and proportionate to the corresponding shapes. - Avoid excessively small text

work page
[30]

- Coordinates directly determine alignment and the overall visual quality of the figure

Coordinate Precision: - Pay close attention to the precise placement of all shapes and text. - Coordinates directly determine alignment and the overall visual quality of the figure. - Sloppy alignment is unacceptable. Technical Specifications:

work page
[31]

Canvas Size: - Width = {w_cm} cm - Height = {h_cm} cm - You may adjust the canvas size ONLY if absolutely necessary

work page
[32]

{output_filename}

Output: - You MUST save the presentation EXACTLY as "{output_filename}"

work page
[33]

- This includes (but is not limited to): Presentation, Cm, Inches, RGBColor, MSO_AUTO_SHAPE_TYPE, PP_ALIGN, etc

Imports: - Include ALL required imports explicitly. - This includes (but is not limited to): Presentation, Cm, Inches, RGBColor, MSO_AUTO_SHAPE_TYPE, PP_ALIGN, etc. {asset_prompt_section} Best Practices: {PPTX_BEST_PRACTICES} 27 LiveFigure: Generating Editable Scientific Illustration with VLM Agents Tooling and API Constraints: {TOOLS_SPECIFICATION} IMPOR...

work page
[36]

DO NOT include any explanations, comments outside code, or natural language text

work page
[37]

"" When a bug occurs, the prompts for debugging are as follows: DEBUG_CODE_PROMPT =

The output MUST: - Start directly with import statements - End with the presentation save command """ When a bug occurs, the prompts for debugging are as follows: DEBUG_CODE_PROMPT = """ The following Python script failed to execute. -------------------------------------------------- [Error Log] {error_log} ------------------------------------------------...

work page
[38]

Analyze the Error Log to identify the syntax or logical issue

work page
[39]

Fix the code to resolve the error

work page
[40]

{output_filename}

Ensure the code saves the output EXACTLY as "{output_filename}"

work page
[41]

Return the COMPLETE and FIXED Python script

work page
[42]

Best Practices: {PPTX_BEST_PRACTICES} Tooling and API Constraints: {TOOLS_SPECIFICATION} IMPORTANT OUTPUT FORMAT (STRICT):

For parts of the code that do not involve errors, DO NOT modify them. Best Practices: {PPTX_BEST_PRACTICES} Tooling and API Constraints: {TOOLS_SPECIFICATION} IMPORTANT OUTPUT FORMAT (STRICT):

work page
[43]

Output RAW Python code ONLY

work page
[44]

DO NOT use Markdown code blocks (no ‘‘‘python)

work page
[45]

DO NOT explain the fix or include any natural language text

work page
[46]

"" The input prompts for a VLM that acts as a “visual critic

The output MUST start directly with import statements. """ The input prompts for a VLM that acts as a “visual critic” to perform diagnosis and output a structured Actionable Issue List are as follows. CRITIQUE_VISUAL_PROMPT = """ You are a Senior Design QA Engineer for scientific publications. Role & Goal: - You are given a single image representing the C...

work page
[47]

Move [Specific Element Name] LEFT/UP to avoid clipping

CANVAS & BOUNDARIES (CRITICAL) - Check whether any content (especially near the RIGHT or BOTTOM edges) is clipped or cut off. - Common failures: shapes, labels, or arrows exceeding slide boundaries. - Fix Advice Examples: - "Move [Specific Element Name] LEFT/UP to avoid clipping" - "Shift ALL elements LEFT by a small margin" - If absolutely necessary, adj...

work page
[48]

Reroute the arrow between [A] and [B] to avoid crossing [C]

CONNECTOR LOGIC & STYLE (CRITICAL) - Check whether any arrows cross THROUGH text boxes or shapes instead of routing around them (SEVERE ERROR). - Check whether arrow start/end points attach to the correct side of nodes. - Style Checks: - Arrowhead size (too large or clumsy?) - Line width (too thick like a stick or too thin to see?) - Scientific figures ty...

work page
[49]

Move [Specific Text Box] RIGHT by approximately [distance]

TEXT INTEGRITY - Check whether text spills out of its container. - Check whether font size is too large (crowded) or too small (unreadable). - Check font color: - Text should be BLACK or dark gray. - Fix Advice Examples: - "Move [Specific Text Box] RIGHT by approximately [distance]" - "Change [Specific Label] font color to BLACK" - "Widen [Specific Shape]...

work page
[50]

- Check whether colors look professional and publication-ready

VISUAL ALIGNMENT & STYLE - Check whether the logical layout structure matches the Reference Goal. - Check whether colors look professional and publication-ready. - Avoid neon or overly light colors unless they are semantically required. -------------------------------------------------- OUTPUT REQUIREMENTS (STRICT): 29 LiveFigure: Generating Editable Scie...

work page
[51]

[BOUNDARIES] The ’Output’ block on the far right is clipped -> Shift the ’Output’ block and its label LEFT by approximately 1 inch

work page
[52]

[CONNECTORS] The arrow from ’Encoder’ to ’Decoder’ crosses the text -> Change the connector type to Elbow

work page
[53]

[CONNECTORS] Arrowheads on the main pipeline are too large and obscure text -> Reduce arrowhead size to Medium

work page
[54]

[TEXT] The ’Feed Forward’ label is light gray -> Change font color to BLACK. """ 30

work page

[1] [1]

visual reviewer

Structure-Aware Filtering:We collected accepted papers from top-tier conferences (ICLR 2025, NeurIPS 2025, and ICML 2025) and applied a structure-aware filtering process to isolate scientific schematics. To distinguish methodological diagrams from data visualizations, we employed GPT-4o as a “visual reviewer.” Throughnegative- constraint prompting, the mo...

work page 2025

[2] [2]

Context-Aware Description Extraction:Conventional figure-text pairs often rely solely on short captions, which are insufficient to capture complex reproducibility logic. To address this, we designed a two-stage extraction pipeline: first, figure labels in the paper text are identified using regular expressions; next, an LLM (GPT-5-mini) extracts detailed ...

work page

[3] [3]

Publication Readiness

Dual-Strategy Hybrid Indexing:To balance retrieval breadth and precision, we employ the Qwen3-Embedding-8B model to construct a dual vector index. TheCaption-Indexis built solely on figure captions, suitable for matching explicit keyword queries, while theHybrid-Indexis primarily constructed from long-form descriptions, with fallback to captions when extr...

work page

[4] [4]

Edit Distance

Human Evaluation via Edit Distance The first dimension quantifies the human effort required to elevate a generated draft to a publishable state. In our evaluation protocol, participants were tasked with manually editing the initial PPTX files generated by LiveFigure. The strict stopping condition for this editing process was the expert’s subjective confir...

work page

[5] [5]

Senior Scientific Reviewer

VLM-as-a-Judge Evaluation To complement the human evaluation, we developed an automated VLM-as-a-judge protocol. We rigorously mapped the official figure preparation guidelines from top-tier venues (e.g., Nature, IEEE, NeurIPS) into the three core dimensions and nine quantitative metrics evaluated by the VLM. By assigning the model the persona of a “Senio...

work page

[6] [6]

maintain consistent spacing

that emphasize the need to “maintain consistent spacing.” Accordingly, metrics such as “Professional Polish” are designed to strictly penalize any boundary clipping, element occlusion, or chaotic spatial layouts. • Dimension 2: Communication Effectiveness.Based on the officialNeurIPSformatting instructions [4], which mandate that “all artwork must be neat...

work page

[7] [7]

Available at: https://www.nature.com/nature/for-autho rs/final-submission

Nature.Final submission artwork guidelines. Available at: https://www.nature.com/nature/for-autho rs/final-submission

work page

[8] [8]

Available at: https://research-figure-guide.nature

Nature Portfolio.Nature Research Figure Guide. Available at: https://research-figure-guide.nature. com/

work page

[9] [9]

Available at: https://www.nature.com/nature-portf olio/editorial-policies/image-integrity

Nature Portfolio.Image Integrity and Standards. Available at: https://www.nature.com/nature-portf olio/editorial-policies/image-integrity

work page

[10] [10]

Available at: https://media.neurips.cc/Conferences/NeurI PS2023/Styles/neurips_2023.pdf

NeurIPS.Paper formatting guidelines. Available at: https://media.neurips.cc/Conferences/NeurI PS2023/Styles/neurips_2023.pdf

work page

[11] [11]

Available at: https://journals.ieeeauthorcente r.ieee.org/create-your-ieee-journal-article/create-graphics-for-your-article/ A.6

IEEE Author Center.Create Graphics for Your Article. Available at: https://journals.ieeeauthorcente r.ieee.org/create-your-ieee-journal-article/create-graphics-for-your-article/ A.6. Details of Test Set Construction The construction of our evaluation dataset closely follows the same pipeline as the Knowledge Base described above. We collect accepted paper...

work page 2024

[12] [12]

Widespread Accessibility and Low Technical Barrier.Post-generation refinement is a critical step in scientific visualization.Availability and Cost:Professional vector tools like Adobe Illustrator impose high licensing costs and steep learning curves. Similarly, while LATEX (TikZ) offers precision, its non-WYSIWYG nature restricts the ability of researcher...

work page

[13] [13]

Code-Friendliness & Automation Ecosystem.From a systems engineering perspective, PPTX offers distinct advantages in automated generation.Structured Standards:Based on the OpenXML standard, PPTX files are highly structured. Graphical elements (shapes, connectors, text boxes) have clear semantic definitions rather than being mere collections of vector paths...

work page

[14] [14]

Seamless Workflow Integration.Research dissemination involves both manuscript publication and conference presentations.Cross-Scenario Reuse:Traditionally, researchers must rasterize PDF charts into screenshots for presentations, losing vector quality and editability.Native Compatibility:LiveFigure produces native PPTX assets, which are inherently compatib...

work page

[15] [15]

Caption + Method Description

Decoupling Generation from Refinement.Given the capabilities of current generative models, we adopt a ”Human-AI Collaboration” design philosophy.Complementary Strengths:The model handles labor-intensive structure and spatial layout, while the user handles aesthetic judgment and semantic refinement.Optimal Interface:PPTX serves as the optimal middleware fo...

work page 2024

[16] [16]

- ALWAYS use ‘slide.shapes.add_connector(MSO_CONNECTOR.X, ...)‘

**Lines are CONNECTORS **: - NEVER use ‘slide.shapes.add_shape(MSO_SHAPE.LINE, ...)‘ -> This causes AttributeError. - ALWAYS use ‘slide.shapes.add_connector(MSO_CONNECTOR.X, ...)‘. - **Valid Types **: ‘MSO_CONNECTOR.STRAIGHT‘, ‘MSO_CONNECTOR.ELBOW‘, ‘ MSO_CONNECTOR.CURVE‘. - **INVALID**: Do NOT use ‘MSO_CONNECTOR.CURVED‘ (No ’D’ at the end). - **INVALID S...

work page

[17] [17]

- NEVER try to set ‘connector.fill.solid()‘

**Connector Properties **: - Connectors (Lines/Arrows) have ‘.line‘ but **NO ‘.fill‘ **. - NEVER try to set ‘connector.fill.solid()‘. Only set ‘connector.line.color.rgb‘

work page

[18] [18]

fore_color.rgb = ...‘ (This crashes with TypeError)

**Shape Fills (NO ONE-LINERS) **: - **NEVER** try to create and color a shape in one line: ‘add_shape(...).fill. fore_color.rgb = ...‘ (This crashes with TypeError). - **ALWAYS** split into steps:

work page

[19] [19]

‘shape = slide.shapes.add_shape(...)‘

work page

[20] [20]

‘shape.fill.solid()‘ <-- REQUIRED first!

work page

[21] [21]

Some of the skills are described as follows

‘shape.fill.fore_color.rgb = RGBColor(...)‘ """ We created a documentation for the predefined and debugged plotting skills, detailing each skill’s functionality, invocation method, parameter choices, and other relevant aspects. Some of the skills are described as follows. Due to space limitations, the prompts shown here only include the first skill as a r...

work page

[22] [22]

**Imports**: **ALWAYS** use wildcard import to get all skills: ‘‘‘python from skills import * ‘‘‘

work page

[23] [23]

* For **Native PPTX ** (‘slide.shapes.add_shape‘): Use **‘Inches()‘** (e.g., ‘left= Inches(5.0)‘)

**Coordinate Units **: * For **Skills** (e.g., ‘add_block‘, ‘add_connector‘): Use **raw floats ** (e.g., ‘ left=5.0‘). * For **Native PPTX ** (‘slide.shapes.add_shape‘): Use **‘Inches()‘** (e.g., ‘left= Inches(5.0)‘)

work page

[24] [24]

The skills handle alignment automatically

**Routing**: Do not calculate connection indices manually. The skills handle alignment automatically

work page

[25] [25]

**Objects**: Always pass Shape/Picture objects to connector functions (‘ add_connector‘), not their names

work page

[26] [26]

Encoder" section group_box = add_container(slide, x=0.5, y=1.0, w=4.0, h=5.0, title=

**Strict Parameter Compliance **: The function signatures listed below are EXHAUSTIVE. DO NOT use any parameters that are not explicitly defined in the documentation (e.g., do not hallucinate linestyle, dashed, shadow, or end_arrow unless they appear in the signature). --- ### **SECTION 1: UNIVERSAL DRAWING SKILLS (Nodes, Text, Groups) ** #### **Skill 1: ...

work page

[27] [27]

{requirement}

Objective: Create a scientific diagram based on the user’s request: "{requirement}"

work page

[28] [28]

- Preserve relative positioning and visual hierarchy as closely as possible

Layout Reference: - Mimic the attached image’s overall structure, spatial layout, shapes, arrows, and text. - Preserve relative positioning and visual hierarchy as closely as possible

work page

[29] [29]

- All text inside shapes or text boxes MUST be center-aligned

Text Guidelines: - Always use BLACK as the text color. - All text inside shapes or text boxes MUST be center-aligned. - Font size should be clearly readable and proportionate to the corresponding shapes. - Avoid excessively small text

work page

[30] [30]

- Coordinates directly determine alignment and the overall visual quality of the figure

Coordinate Precision: - Pay close attention to the precise placement of all shapes and text. - Coordinates directly determine alignment and the overall visual quality of the figure. - Sloppy alignment is unacceptable. Technical Specifications:

work page

[31] [31]

Canvas Size: - Width = {w_cm} cm - Height = {h_cm} cm - You may adjust the canvas size ONLY if absolutely necessary

work page

[32] [32]

{output_filename}

Output: - You MUST save the presentation EXACTLY as "{output_filename}"

work page

[33] [33]

- This includes (but is not limited to): Presentation, Cm, Inches, RGBColor, MSO_AUTO_SHAPE_TYPE, PP_ALIGN, etc

Imports: - Include ALL required imports explicitly. - This includes (but is not limited to): Presentation, Cm, Inches, RGBColor, MSO_AUTO_SHAPE_TYPE, PP_ALIGN, etc. {asset_prompt_section} Best Practices: {PPTX_BEST_PRACTICES} 27 LiveFigure: Generating Editable Scientific Illustration with VLM Agents Tooling and API Constraints: {TOOLS_SPECIFICATION} IMPOR...

work page

[34] [36]

DO NOT include any explanations, comments outside code, or natural language text

work page

[35] [37]

"" When a bug occurs, the prompts for debugging are as follows: DEBUG_CODE_PROMPT =

The output MUST: - Start directly with import statements - End with the presentation save command """ When a bug occurs, the prompts for debugging are as follows: DEBUG_CODE_PROMPT = """ The following Python script failed to execute. -------------------------------------------------- [Error Log] {error_log} ------------------------------------------------...

work page

[36] [38]

Analyze the Error Log to identify the syntax or logical issue

work page

[37] [39]

Fix the code to resolve the error

work page

[38] [40]

{output_filename}

Ensure the code saves the output EXACTLY as "{output_filename}"

work page

[39] [41]

Return the COMPLETE and FIXED Python script

work page

[40] [42]

Best Practices: {PPTX_BEST_PRACTICES} Tooling and API Constraints: {TOOLS_SPECIFICATION} IMPORTANT OUTPUT FORMAT (STRICT):

For parts of the code that do not involve errors, DO NOT modify them. Best Practices: {PPTX_BEST_PRACTICES} Tooling and API Constraints: {TOOLS_SPECIFICATION} IMPORTANT OUTPUT FORMAT (STRICT):

work page

[41] [43]

Output RAW Python code ONLY

work page

[42] [44]

DO NOT use Markdown code blocks (no ‘‘‘python)

work page

[43] [45]

DO NOT explain the fix or include any natural language text

work page

[44] [46]

"" The input prompts for a VLM that acts as a “visual critic

The output MUST start directly with import statements. """ The input prompts for a VLM that acts as a “visual critic” to perform diagnosis and output a structured Actionable Issue List are as follows. CRITIQUE_VISUAL_PROMPT = """ You are a Senior Design QA Engineer for scientific publications. Role & Goal: - You are given a single image representing the C...

work page

[45] [47]

Move [Specific Element Name] LEFT/UP to avoid clipping

CANVAS & BOUNDARIES (CRITICAL) - Check whether any content (especially near the RIGHT or BOTTOM edges) is clipped or cut off. - Common failures: shapes, labels, or arrows exceeding slide boundaries. - Fix Advice Examples: - "Move [Specific Element Name] LEFT/UP to avoid clipping" - "Shift ALL elements LEFT by a small margin" - If absolutely necessary, adj...

work page

[46] [48]

Reroute the arrow between [A] and [B] to avoid crossing [C]

CONNECTOR LOGIC & STYLE (CRITICAL) - Check whether any arrows cross THROUGH text boxes or shapes instead of routing around them (SEVERE ERROR). - Check whether arrow start/end points attach to the correct side of nodes. - Style Checks: - Arrowhead size (too large or clumsy?) - Line width (too thick like a stick or too thin to see?) - Scientific figures ty...

work page

[47] [49]

Move [Specific Text Box] RIGHT by approximately [distance]

TEXT INTEGRITY - Check whether text spills out of its container. - Check whether font size is too large (crowded) or too small (unreadable). - Check font color: - Text should be BLACK or dark gray. - Fix Advice Examples: - "Move [Specific Text Box] RIGHT by approximately [distance]" - "Change [Specific Label] font color to BLACK" - "Widen [Specific Shape]...

work page

[48] [50]

- Check whether colors look professional and publication-ready

VISUAL ALIGNMENT & STYLE - Check whether the logical layout structure matches the Reference Goal. - Check whether colors look professional and publication-ready. - Avoid neon or overly light colors unless they are semantically required. -------------------------------------------------- OUTPUT REQUIREMENTS (STRICT): 29 LiveFigure: Generating Editable Scie...

work page

[49] [51]

[BOUNDARIES] The ’Output’ block on the far right is clipped -> Shift the ’Output’ block and its label LEFT by approximately 1 inch

work page

[50] [52]

[CONNECTORS] The arrow from ’Encoder’ to ’Decoder’ crosses the text -> Change the connector type to Elbow

work page

[51] [53]

[CONNECTORS] Arrowheads on the main pipeline are too large and obscure text -> Reduce arrowhead size to Medium

work page

[52] [54]

[TEXT] The ’Feed Forward’ label is light gray -> Change font color to BLACK. """ 30

work page