pith. machine review for the scientific record.

arxiv: 2604.00270 · v3 · submitted 2026-03-31 · 💻 cs.CV

Recognition: no theorem link

OmniSch: A Multimodal PCB Schematic Benchmark For Structured Diagram Visual Reasoning

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 23:14 UTC · model grok-4.3

classification 💻 cs.CV
keywords PCB schematics · multimodal models · diagram reasoning · netlist graphs · visual grounding · benchmark · electronic design automation · graph construction

The pith

Large multimodal models exhibit significant limitations in understanding PCB schematic diagrams and constructing spatial netlist graphs from them.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces OmniSch, a benchmark of 1,854 real-world PCB schematic diagrams, to evaluate large multimodal models on schematic understanding and netlist graph construction. It defines four tasks covering visual grounding of entities, topological diagram-to-graph reasoning, geometric reasoning for connection weights, and tool-augmented agentic reasoning. Results show that models exhibit unreliable fine-grained grounding, brittle parsing of layouts into graphs, inconsistent connectivity reasoning, and inefficient visual exploration. This matters because such graph representations form the backbone of electronic design automation workflows; closing these gaps could advance automated circuit design.
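To make the target representation concrete, a spatially weighted netlist graph pairs components and their pins with nets whose weights derive from the drawing's geometry. A minimal sketch in Python; the field names and weight semantics are illustrative assumptions, not the paper's schema:

```python
from dataclasses import dataclass, field

@dataclass
class Pin:
    """A component pin and its location on the schematic image."""
    name: str   # e.g. "VCC", "GND", "1"
    x: float    # pixel-space coordinates
    y: float

@dataclass
class Component:
    """A schematic symbol, e.g. a resistor or an IC."""
    ref: str                            # reference designator, e.g. "R1", "U3"
    kind: str                           # component type, e.g. "resistor"
    value: str = ""                     # e.g. "10k"; empty if unlabeled
    pins: list[Pin] = field(default_factory=list)

@dataclass
class Net:
    """An electrical net connecting pins, with a layout-dependent weight."""
    name: str                           # e.g. "NET_VBUS"
    endpoints: list[tuple[str, str]]    # (component ref, pin name) pairs
    weight: float = 0.0                 # geometry-derived, e.g. implied wire length

# The graph a model must construct is then just the two collections:
components: dict[str, Component] = {}
nets: list[Net] = []
```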

Core claim

OmniSch is the first comprehensive benchmark for assessing large multimodal models on schematic understanding and spatial netlist graph construction. It contains 1,854 real-world schematic diagrams and includes four tasks: visual grounding for schematic entities with 109.9K grounded instances, diagram-to-graph reasoning for topological relationships, geometric reasoning for layout-dependent weights, and tool-augmented agentic reasoning for visual search. Evaluations reveal substantial gaps in current LMMs, including unreliable fine-grained grounding, brittle layout-to-graph parsing, inconsistent global connectivity reasoning, and inefficient visual exploration.
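Of the four tasks, diagram-to-graph reasoning is the most graph-native; one natural way to score it is set overlap between predicted and gold connectivity. Below is a sketch that treats each net as a set of (component, pin) endpoints and requires exact matches (an assumed scoring scheme, not one confirmed by the paper):

```python
def net_f1(pred_nets, gold_nets):
    """F1 over nets, where each net is a collection of (component, pin) endpoints.

    Assumed scheme: a predicted net counts as correct only if its endpoint
    set exactly matches some gold net.
    """
    pred = {frozenset(n) for n in pred_nets}
    gold = {frozenset(n) for n in gold_nets}
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: the model recovers one of two gold nets exactly.
gold = [[("R1", "1"), ("U3", "VCC")], [("R1", "2"), ("GND", "1")]]
pred = [[("R1", "1"), ("U3", "VCC")]]
print(net_f1(pred, gold))  # precision 1.0, recall 0.5 -> F1 ~ 0.667
```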

What carries the argument

The OmniSch benchmark with its collection of 1,854 diagrams and four-task protocol that tests conversion of schematics into spatially weighted netlist graphs.

Load-bearing premise

The selected diagrams, annotation process, and evaluation tasks capture the essential difficulties in real-world PCB schematic analysis.

What would settle it

A large multimodal model achieving consistent high performance on all four OmniSch tasks across varied schematics would indicate the gaps are not fundamental.

Figures

Figures reproduced from arXiv: 2604.00270 by Akshit Kartik, Amey Santosh Rane, Kaiyuan Lin, Mahanth Gowda, Mingjia Wang, Muchuan Wang, Sharique Khatri, Sung-Liang Chen, Taiting Lu, Yi-Chao Chen, Yida Wang, Yifan Yang, Yincheng Jin, Yixi Wang, Yubo Wang, Yuxin Tian.

Figure 1: Large multimodal models fail to reliably perform core visual understanding…
Figure 2: Overview of the OmniSch benchmark with representative cases: 3,700 QA instances across 174 schematic designs. Existing datasets focus on SPICE-style schematic diagrams, which typically contain a limited number of entity types and samples, compared with the thousands of component types in practical schematic designs.
Figure 3: Comparison between different data annotation paradigms.
Figure 4: Statistical overview of the OmniSch benchmark. The dataset encompasses a diverse range of electronic domains, comprising 1–440 symbols, 1–1,200 pins, 1–400 nets, and 1–1,600 text instances. This large-scale diversity provides a comprehensive benchmark for the automatic generation and evaluation of schematic netlists.
Figure 5: Synthesized schematic variations generated by our custom EDA generative rendering engine. The first diagram shows the original export from the industrial EDA tool; all remaining diagrams are rendered by our engine under controlled variations. (a) original EDA export; (b) full text; (c) without all text; (d) without symbol names and values.
Figure 6: Overview of the ReAct-based agentic framework for evaluating how LMMs use tools in schematic-to-netlist conversion.
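Figure 6 describes a ReAct-style loop: the model alternates between free-form reasoning, tool invocations, and observations until it commits to an answer. A minimal version of such a loop follows; `model.generate`, the action format, and the tool set are stand-ins, since the paper's actual interface is not reproduced here:

```python
import re

def react_loop(model, question, image, tools, max_steps=8):
    """Minimal ReAct-style loop: reason, act with a tool, observe, repeat.

    `model` and `tools` are stand-ins: `model.generate` maps a transcript to
    the next step, and `tools` maps names to callables (e.g. crop, zoom, OCR).
    """
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model.generate("\n".join(transcript))
        if step.startswith("Answer:"):
            return step.removeprefix("Answer:").strip()
        # Assumed action format: "Action: tool_name(arg1, arg2, ...)"
        match = re.search(r"Action:\s*(\w+)\((.*)\)", step)
        if match is None:
            transcript.append(step)  # pure reasoning step, no tool call
            continue
        name, raw_args = match.groups()
        args = [a.strip() for a in raw_args.split(",") if a.strip()]
        observation = tools[name](image, *args)
        transcript += [step, f"Observation: {observation}"]
    return None  # step budget exhausted without a final answer
```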
read the original abstract

Recent large multimodal models (LMMs) have made rapid progress in visual grounding, document understanding, and diagram reasoning tasks. However, their ability to convert Printed Circuit Board (PCB) schematic diagrams into machine-readable, spatially weighted netlist graphs, jointly capturing component attributes, connectivity, and geometry, remains largely underexplored, even though such graph representations are the backbone of practical electronic design automation (EDA) workflows. To bridge this gap, we introduce OmniSch, the first comprehensive benchmark designed to assess LMMs on schematic understanding and spatial netlist graph construction. OmniSch contains 1,854 real-world schematic diagrams and includes four tasks: (1) visual grounding for schematic entities, with 109.9K grounded instances aligning 423.4K diagram semantic labels to their visual regions; (2) diagram-to-graph reasoning, understanding topological relationships among diagram elements; (3) geometric reasoning, constructing layout-dependent weights for each connection; and (4) tool-augmented agentic reasoning for visual search, invoking external tools to accomplish (1)-(3). Our results reveal substantial gaps in current LMMs' interpretation of schematic engineering artifacts, including unreliable fine-grained grounding, brittle layout-to-graph parsing, inconsistent global connectivity reasoning, and inefficient visual exploration.
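The abstract's "layout-dependent weights" admit a concrete reading: the weight of a connection reflects the geometry of its drawing, for instance the Manhattan distance between the pins it joins. The sketch below uses that distance as an assumed proxy; the paper's exact weight definition may differ (e.g. following drawn wire segments):

```python
def manhattan_weight(pin_a: tuple[float, float], pin_b: tuple[float, float]) -> float:
    """Manhattan (L1) distance between two pin locations in pixel space.

    An assumed proxy for a layout-dependent connection weight; the actual
    definition in the paper may instead trace the drawn wire path.
    """
    (xa, ya), (xb, yb) = pin_a, pin_b
    return abs(xa - xb) + abs(ya - yb)

# Example: a net joining R1 pin 1 at (120, 340) to U3 pin VCC at (480, 90).
weight = manhattan_weight((120, 340), (480, 90))  # 360 + 250 = 610
```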

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance; this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces OmniSch, a benchmark of 1,854 real-world PCB schematic diagrams containing 109.9K grounded instances, designed to evaluate large multimodal models on four tasks: visual grounding of schematic entities, diagram-to-graph reasoning for topological relationships, geometric reasoning to construct layout-dependent connection weights, and tool-augmented agentic reasoning that invokes external tools for the prior tasks. It claims to reveal substantial gaps in current LMMs, including unreliable fine-grained grounding, brittle layout-to-graph parsing, inconsistent global connectivity reasoning, and inefficient visual exploration.

Significance. If the dataset construction and evaluation protocol prove representative of real EDA workflows, the benchmark would offer a valuable, structured resource for measuring progress in multimodal diagram understanding and graph extraction, directly relevant to practical electronic design automation. The explicit focus on spatially weighted netlist construction and the inclusion of an agentic tool-use task distinguish it from prior diagram benchmarks and could guide targeted improvements in LMMs for engineering artifacts.

major comments (2)
  1. [Dataset Construction] Dataset section: No selection criteria, source diversity statistics, complexity metrics, or inter-annotator agreement scores are reported for the 1,854 diagrams or the 109.9K grounded instances and associated netlists. This information is load-bearing for the central claim that observed performance gaps reflect intrinsic model limitations rather than benchmark-specific artifacts.
  2. [Experiments and Results] Evaluation section: The results lack error bars, statistical tests, and detailed descriptions of baseline implementations and metric definitions for the four tasks. Without these, the assertions of 'substantial gaps' and 'brittle' performance cannot be rigorously assessed.
minor comments (1)
  1. [Abstract] Abstract: The phrasing '109.9K grounded instances aligning 423.4K diagram semantic labels' requires clarification on the exact relationship between these quantities.
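To make the referee's request for metric definitions concrete: grounding benchmarks commonly count a predicted box as correct when its intersection-over-union with the ground truth clears a threshold. A minimal sketch of that convention (a common default, not necessarily the paper's metric):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def grounding_accuracy(preds, golds, threshold=0.5):
    """Fraction of instances whose predicted box matches gold at IoU >= threshold."""
    hits = sum(iou(p, g) >= threshold for p, g in zip(preds, golds))
    return hits / len(golds)
```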

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important areas for improving the clarity and rigor of our benchmark presentation. We will revise the manuscript to incorporate additional details on dataset construction and evaluation protocols.

read point-by-point responses
  1. Referee: [Dataset Construction] Dataset section: No selection criteria, source diversity statistics, complexity metrics, or inter-annotator agreement scores are reported for the 1,854 diagrams or the 109.9K grounded instances and associated netlists. This information is load-bearing for the central claim that observed performance gaps reflect intrinsic model limitations rather than benchmark-specific artifacts.

    Authors: We agree that these details are necessary to support the benchmark's validity. In the revised manuscript, we will expand the Dataset section to report selection criteria, source diversity statistics, complexity metrics, and inter-annotator agreement scores for the diagrams and grounded instances. This addition will help demonstrate that the observed model limitations are not due to benchmark-specific artifacts. revision: yes

  2. Referee: [Experiments and Results] Evaluation section: The results lack error bars, statistical tests, and detailed descriptions of baseline implementations and metric definitions for the four tasks. Without these, the assertions of 'substantial gaps' and 'brittle' performance cannot be rigorously assessed.

    Authors: We acknowledge that the current evaluation reporting can be strengthened. In the revised manuscript, we will update the Evaluation section to include error bars, statistical tests, expanded descriptions of baseline implementations, and precise definitions of the metrics used for each of the four tasks. These changes will allow for a more rigorous assessment of the reported performance gaps. revision: yes
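The error bars the authors promise can come from a distribution-free procedure such as a percentile bootstrap over per-diagram scores. A minimal sketch, illustrative rather than the paper's protocol:

```python
import random

def bootstrap_ci(scores, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `scores`."""
    rng = random.Random(seed)
    n = len(scores)
    means = sorted(
        sum(rng.choice(scores) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Example: per-diagram accuracies for one model on one task.
low, high = bootstrap_ci([0.42, 0.55, 0.38, 0.61, 0.47])
```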

Circularity Check

0 steps flagged

No circularity; empirical benchmark with new data and direct evaluation

full rationale

This is a benchmark release paper with no mathematical derivations, fitted parameters, predictions, or equations. The central claims rest on introducing 1,854 new diagrams and four evaluation tasks, with performance gaps measured directly on that data. No self-citation chains, ansatzes, or renamings reduce any result to prior inputs by construction. The work stands on its own new data rather than leaning on external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations, free parameters, axioms, or invented entities; the paper is an empirical benchmark introduction.

pith-pipeline@v0.9.0 · 5587 in / 1024 out tokens · 49739 ms · 2026-05-13T23:14:28.081716+00:00 · methodology

discussion (0)

