EDA-Schema-V2: A Multimodal Schema, Open Datasets, and Benchmarks for Machine Learning in Digital Physical Design
Pith reviewed 2026-05-11 01:13 UTC · model grok-4.3
The pith
EDA-Schema-V2 supplies a multimodal schema and large open datasets from the full digital design flow to support machine learning research in physical design.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors present EDA-Schema-V2 as an open multimodal schema that organizes physical attributes and quality-of-results metrics across logic synthesis, floorplanning, placement, clock network synthesis, and routing. Using SkyWater 130nm, Nangate 45nm, IHP SG13G2 130nm, and ASAP 7nm process design kits with the OpenROAD flow, they generate 7776 design instances from the IWLS'05 benchmark suite through sweeps of clock period, core utilization, and aspect ratio. The resulting collection contains over 275 million gates, 75 million nets, and more than 36 million timing paths. They further identify twelve representative prediction tasks for timing, power, area, and routing metrics, supply initial
What carries the argument
EDA-Schema-V2 is the multimodal schema that supplies structured representations of physical attributes and quality-of-results metrics across the stages of the design flow from logic synthesis through detailed routing.
If this is right
- Machine learning models can be trained and compared on consistent stage-resolved data spanning synthesis to routing.
- The public dataset of 7776 instances supplies over 275 million gates and 75 million nets for developing predictors of timing, power, area, and routing.
- Twelve defined prediction tasks with baseline analyses create standardized points for measuring progress in ML-based physical design methods.
- Releasing both the schema and the data supports direct reproducibility across different research groups.
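One simple way to picture a stage-resolved comparison is to correlate an early-stage metric with its final-stage value across design instances. The sketch below is illustrative only: the record layout, the stage and metric names, and the `stage_predictability` helper are assumptions made for this example, the numbers are synthetic, and Pearson correlation is a stand-in rather than the baseline analysis reported in the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


def stage_predictability(instances, metric, early_stage, late_stage):
    """Correlate one metric at an early stage with the same metric at a later stage."""
    early = [inst[early_stage][metric] for inst in instances]
    late = [inst[late_stage][metric] for inst in instances]
    return pearson(early, late)


# SYNTHETIC records with a hypothetical layout (stage -> metric -> value).
instances = [
    {"placement": {"wirelength": w}, "routing": {"wirelength": 1.2 * w}}
    for w in (100.0, 150.0, 220.0, 310.0)
]

r = stage_predictability(instances, "wirelength", "placement", "routing")
print(f"placement -> routing wirelength correlation: {r:.3f}")
```

A value near 1 would suggest that the early-stage metric carries most of the information about the final one; a learned predictor is mainly worth the effort where this simple check comes out weak.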
Where Pith is reading between the lines
- If adopted by others, the schema could allow merging datasets across tool flows to create larger combined training collections.
- Models built on these open instances could be tested for how well they transfer to proprietary industrial netlists.
- Adding sweeps over additional variables such as cell library choices or power-grid densities would test whether the current parameter space is sufficient.
Load-bearing premise
That datasets generated from the IWLS'05 circuits with the specified open PDKs and OpenROAD parameter sweeps will prove representative for training ML models that generalize to industrial designs and other tools.
What would settle it
Train models on the released dataset and evaluate prediction accuracy on designs produced with a commercial EDA tool or different process technologies; substantially lower accuracy than on the original data would show the benchmarks lack broad utility.
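The test described above amounts to a held-out-group evaluation: fit on the data from the flows that were seen, then score on a flow that was never seen. The following is a minimal sketch under stated assumptions. The field names (`flow`, `early_delay`, `final_delay`), the one-feature least-squares model, and the records themselves are hypothetical and synthetic; none of them come from the released dataset or the paper's baselines.

```python
from statistics import mean


def split_by_group(records, group_key, held_out):
    """Hold out every record whose group matches `held_out`."""
    train = [r for r in records if r[group_key] != held_out]
    test = [r for r in records if r[group_key] == held_out]
    if not train or not test:
        raise ValueError("both splits must be non-empty")
    return train, test


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature is constant; cannot fit a slope")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def mean_abs_error(model, records, x_key, y_key):
    slope, intercept = model
    return mean(abs(slope * r[x_key] + intercept - r[y_key]) for r in records)


def generalization_gap(records, group_key, held_out, x_key, y_key):
    """Return (error on the seen groups, error on the held-out group)."""
    train, test = split_by_group(records, group_key, held_out)
    model = fit_line([r[x_key] for r in train], [r[y_key] for r in train])
    return (mean_abs_error(model, train, x_key, y_key),
            mean_abs_error(model, test, x_key, y_key))


# SYNTHETIC data: two "seen" flows share one trend, the held-out flow follows another.
records = [
    {"flow": flow, "early_delay": x, "final_delay": gain * x}
    for flow, gain in (("open_a", 2.0), ("open_b", 2.0), ("other_flow", 3.0))
    for x in (1.0, 2.0, 3.0)
]

seen_err, unseen_err = generalization_gap(
    records, "flow", "other_flow", "early_delay", "final_delay")
print(f"seen-flow MAE: {seen_err:.3f}, held-out-flow MAE: {unseen_err:.3f}")
```

The first number here is a training error, so a fairer comparison would also hold out some instances from the seen flows. The quantity of interest is the gap between the two errors, not either value alone.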
Original abstract
The continuous scaling of CMOS technology has significantly increased the complexity of very large-scale integrated circuits, driving interest in applying machine learning (ML) to electronic design automation (EDA). However, the limited availability of open and standardized datasets limits interoperability, comparability, and reproducibility in ML-based research. This paper introduces EDA-Schema-V2, an open multimodal schema that provides a structured framework for representing and analyzing datasets in digital physical design. The schema includes representations of physical attributes and quality-of-results metrics across multiple stages of the design flow, including logic synthesis, floorplanning, placement, clock network synthesis, and routing. Utilizing the SkyWater 130nm, Nangate 45nm, IHP SG13G2 130nm, and ASAP 7nm open-source process design kits with the OpenROAD tool flow, datasets of physical circuit designs from the IWLS'05 benchmark suite are generated and analyzed. The dataset comprises 7,776 design instances spanning 18 benchmark circuits and includes stage-resolved representations from synthesis through detailed routing, generated through parameter sweeps over clock period, core utilization, and aspect ratio. The dataset contains over 275 million gates, 75 million nets, and more than 36 million extracted timing paths. In addition, twelve representative prediction tasks spanning timing, power, area, and routing metrics are identified, along with baseline analyses that characterize stage-to-stage predictability across the design flow. The resulting datasets and baselines are publicly released to support reproducible ML research and establish standardized benchmarks for evaluating ML-based approaches in digital physical design.
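The instance count can be sanity-checked from the figures in the abstract: 7,776 instances over 18 circuits and four PDKs works out to 108 sweep points per circuit/PDK pair, assuming every pair receives the same number of runs. The sketch below checks that arithmetic and enumerates one full-factorial grid that would produce it. The grid values, the 6 x 6 x 3 factorization, and the placeholder circuit names are hypothetical; the abstract does not state the actual sweep values.

```python
from itertools import product

# Figures taken from the abstract.
TOTAL_INSTANCES = 7776
NUM_CIRCUITS = 18
PDKS = ("SkyWater 130nm", "Nangate 45nm", "IHP SG13G2 130nm", "ASAP 7nm")

# HYPOTHETICAL grid: any three axes whose sizes multiply to 108 would fit the totals.
CLOCK_PERIOD_SCALES = (0.8, 0.9, 1.0, 1.1, 1.2, 1.3)
CORE_UTILIZATIONS = (0.30, 0.40, 0.50, 0.60, 0.70, 0.80)
ASPECT_RATIOS = (0.5, 1.0, 2.0)


def points_per_pair(total, num_circuits, num_pdks):
    """Sweep points per (circuit, PDK) pair, assuming an even split."""
    quotient, remainder = divmod(total, num_circuits * num_pdks)
    if remainder:
        raise ValueError("total is not a multiple of the number of pairs")
    return quotient


def enumerate_runs(circuits, pdks):
    """List every run configuration of the full-factorial sweep."""
    return [
        {"circuit": c, "pdk": p, "clock_period_scale": t,
         "core_utilization": u, "aspect_ratio": a}
        for c, p, t, u, a in product(
            circuits, pdks, CLOCK_PERIOD_SCALES, CORE_UTILIZATIONS, ASPECT_RATIOS)
    ]


# Placeholder names; the 18 IWLS'05 circuits are not listed in the abstract.
circuits = [f"circuit_{i:02d}" for i in range(NUM_CIRCUITS)]
runs = enumerate_runs(circuits, PDKS)
per_pair = points_per_pair(TOTAL_INSTANCES, NUM_CIRCUITS, len(PDKS))
print(per_pair, len(runs))  # prints "108 7776"
```

If the real sweep is not an even full-factorial grid (for example, if failed runs were dropped or grids differ per PDK), the per-pair count of 108 would be an average rather than an exact figure.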
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces EDA-Schema-V2, a multimodal schema for representing physical attributes and QoR metrics across the digital physical design flow (synthesis through routing). It generates and releases a dataset of 7,776 instances from the 18 IWLS'05 circuits using SkyWater 130nm, Nangate 45nm, IHP SG13G2 130nm, and ASAP 7nm PDKs with OpenROAD, via sweeps over clock period, core utilization, and aspect ratio. The data includes stage-resolved representations, over 275 million gates, 75 million nets, and 36 million timing paths. Twelve prediction tasks (timing, power, area, routing) are defined with stage-to-stage baseline analyses, and all artifacts are publicly released to support reproducible ML research and standardized benchmarks.
Significance. If the released schema and datasets see adoption, the work could meaningfully advance reproducible ML-for-EDA research by supplying the first large-scale, open, stage-resolved multimodal resource spanning the full physical design flow. The scale of the generated data and the explicit baseline characterizations for cross-stage predictability are concrete strengths that lower the barrier for model development and comparison. The contribution is primarily infrastructural rather than algorithmic, but the open release directly addresses a documented community need.
major comments (1)
- [Abstract] Abstract: The central claim that the datasets and baselines 'establish standardized benchmarks for evaluating ML-based approaches in digital physical design' is load-bearing for the paper's stated contribution. This claim rests on an unverified assumption of representativeness; the generation pipeline is confined to IWLS'05 circuits, four specific open PDKs, and OpenROAD parameter sweeps, with no described cross-tool, cross-PDK, or cross-scale validation to show that timing/power/area distributions or netlist statistics remain predictive outside this regime.
minor comments (1)
- The abstract and introduction would benefit from an explicit statement of the schema's concrete data formats or serialization (e.g., how physical attributes and per-stage QoR metrics are encoded) to improve immediate usability for readers intending to adopt the schema.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive assessment of the work's potential to advance reproducible ML-for-EDA research. We address the major comment regarding the benchmark claim in the abstract.
- Referee: [Abstract] Abstract: The central claim that the datasets and baselines 'establish standardized benchmarks for evaluating ML-based approaches in digital physical design' is load-bearing for the paper's stated contribution. This claim rests on an unverified assumption of representativeness; the generation pipeline is confined to IWLS'05 circuits, four specific open PDKs, and OpenROAD parameter sweeps, with no described cross-tool, cross-PDK, or cross-scale validation to show that timing/power/area distributions or netlist statistics remain predictive outside this regime.
  Authors: We agree that the datasets are generated from a specific set of 18 IWLS'05 circuits, four open PDKs, and OpenROAD sweeps, without cross-tool or cross-scale validation to demonstrate broader representativeness. The manuscript does not assert that the timing, power, or area distributions are predictive outside this regime. The core contribution is the release of the first large-scale, open, stage-resolved multimodal schema and dataset spanning synthesis through routing, together with twelve explicitly defined prediction tasks and baseline results. To address the concern, we will revise the abstract to state that the work 'provides open datasets, a multimodal schema, and baselines that contribute to standardized benchmarks' rather than 'establish standardized benchmarks'. We will also add a short paragraph in the conclusions acknowledging the current scope limitations and encouraging community extensions to other tools and PDKs. This change ensures the claim accurately reflects the evidence while preserving the infrastructural value of the public release.
  Revision: yes
Circularity Check
No circularity: data-generation and release effort with independent process description
The paper introduces a schema, generates datasets via explicit parameter sweeps on IWLS'05 circuits with open PDKs and OpenROAD, identifies 12 prediction tasks, and releases the data plus baseline characterizations. No equations, fitted parameters renamed as predictions, self-citations used as load-bearing uniqueness theorems, or ansatzes smuggled in via prior work appear in the provided text. All claims rest on the described generation pipeline and public release rather than any reduction to inputs by construction.
Axiom & Free-Parameter Ledger
free parameters (3)
- clock period
- core utilization
- aspect ratio
axioms (1)
- domain assumption: the OpenROAD flow with the listed open PDKs produces valid, extractable physical designs for the IWLS'05 benchmarks
invented entities (1)
- EDA-Schema-V2: no independent evidence