pith. machine review for the scientific record.

arxiv: 2604.08021 · v1 · submitted 2026-04-09 · 💻 cs.DB

Recognition: no theorem link

SynQL: A Controllable and Scalable Rule-Based Framework for SQL Workload Synthesis for Performance Benchmarking

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:33 UTC · model grok-4.3

classification 💻 cs.DB
keywords SQL workload synthesis · query optimizer training · foreign-key graph traversal · synthetic data generation · database benchmarking · learned cost modeling · AST-based query generation · controllable SQL synthesis

The pith

SynQL generates valid, diverse SQL workloads by traversing a database's foreign-key graph and populating an abstract syntax tree under explicit parametric control.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Acquiring realistic SQL workloads for training learned query optimizers is difficult because privacy rules block access to production queries and anonymized traces often omit executable text. Existing fixed benchmarks lack variety for statistical learning while language-model generators frequently produce schema errors or overly simple joins. SynQL instead walks the live foreign-key graph to deterministically construct execution-ready queries that include multi-table joins, projections, aggregations, and range predicates. A configuration vector supplies direct control over join topology, analytical intensity, and predicate selectivity. Experiments on TPC-H and IMDb schemas show the resulting workloads reach near-maximal topological diversity and support training of tree-based cost models that achieve strong accuracy on held-out synthetic data with sub-millisecond inference.
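
A minimal sketch of the downstream evaluation shape described here, assuming the synthetic corpus has already been reduced to per-query feature vectors and measured costs: fit a tree-based regressor, score R² on a held-out synthetic split, and time single-query inference. The arrays below are random placeholders, so the printed numbers mean nothing; only the harness shape is illustrated, not the paper's actual setup.

```python
# Sketch only: evaluation harness for a tree-based cost model trained on a
# synthetic corpus. X and y are random stand-ins for per-query features and
# measured costs; the paper's actual feature encoding is not reproduced here.
import time
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((5000, 12))            # placeholder per-query feature vectors
y = rng.random(5000)                  # placeholder execution costs

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print("held-out synthetic R^2:", r2_score(y_te, model.predict(X_te)))

start = time.perf_counter()
model.predict(X_te[:1])               # single-query inference
print("inference latency (ms):", (time.perf_counter() - start) * 1e3)
```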

Core claim

SynQL is a deterministic rule-based framework that traverses a database's foreign-key graph to build an abstract syntax tree for the core analytical SQL fragment of multi-table joins with projections, aggregations, and predicates. A single configuration vector Θ explicitly governs join topology (Star, Chain, Fork), analytical intensity, and predicate selectivity, guaranteeing schema and syntactic validity by construction without probabilistic generation. On TPC-H and IMDb the method yields workloads with topological entropy of 1.53 bits; tree-based cost models trained on the synthetic corpus attain R² ≥ 0.79 on held-out synthetic test sets at sub-millisecond inference latency.
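
To make "a single configuration vector Θ" concrete, here is a minimal sketch of one possible shape for it; the field names (alpha_shape, p_agg, selectivity) and value ranges are editorial guesses inferred from the abstract and figure captions, not the authors' actual interface.

```python
# Illustrative only: one plausible layout for the configuration vector Theta.
# Field names and ranges are assumptions, not the paper's API.
from dataclasses import dataclass

@dataclass(frozen=True)
class Theta:
    alpha_shape: float    # topology bias: high -> Star, low -> Chain, mid -> Fork
    p_agg: float          # analytical intensity: probability a column is aggregated
    selectivity: float    # target fraction of rows passing range predicates
    max_join_depth: int   # upper bound on number of joined tables
    n_queries: int        # workload size N

star_heavy = Theta(alpha_shape=0.9, p_agg=1.0, selectivity=0.1,
                   max_join_depth=4, n_queries=10_000)
```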

What carries the argument

Foreign-key graph traversal that populates an abstract syntax tree for SQL queries, parameterized by a configuration vector Θ to control topology, intensity, and selectivity.
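
The figure captions indicate Phase I grows a join subgraph one foreign-key edge at a time, with αshape steering each expansion toward the root (Star) or the deepest frontier (Chain). The sketch below is an editorial reconstruction of that idea; the data structures, weighting formula, and the toy IMDb fragment are illustrative, not the paper's Algorithm 1.

```python
# Sketch only: alpha_shape-weighted expansion of a join blueprint over a
# foreign-key graph. Not the paper's Algorithm 1; the weighting is invented.
import random

def build_join_blueprint(fk_graph, root, depth, alpha_shape, rng=None):
    """fk_graph: dict table -> list of (neighbor_table, join_condition)."""
    rng = rng or random.Random(0)
    joined = {root: 0}                    # table -> depth at which it was joined
    edges = []
    for _ in range(depth):
        # candidate expansions: FK edges from any joined table to a new table
        candidates = [(t, nbr, cond) for t in joined
                      for nbr, cond in fk_graph.get(t, []) if nbr not in joined]
        if not candidates:
            break
        # high alpha_shape favours expanding from the root (Star);
        # low alpha_shape favours the deepest joined table (Chain)
        weights = [(alpha_shape if joined[t] == 0
                    else (1.0 - alpha_shape) * (joined[t] + 1)) + 1e-6
                   for t, _, _ in candidates]
        src, dst, cond = rng.choices(candidates, weights=weights, k=1)[0]
        joined[dst] = joined[src] + 1
        edges.append((src, dst, cond))
    return edges

imdb_fk = {"title": [("movie_info", "title.id = movie_info.movie_id"),
                     ("cast_info", "title.id = cast_info.movie_id")],
           "movie_info": [("info_type", "movie_info.info_type_id = info_type.id")]}
print(build_join_blueprint(imdb_fk, root="title", depth=3, alpha_shape=0.2))
```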

If this is right

  • Synthetic corpora can replace inaccessible production logs for training learned query optimizers.
  • Explicit control over join topology and predicate selectivity enables targeted generation of stress-test workloads.
  • Near-maximal topological entropy supports better statistical generalization than fixed-template benchmarks.
  • Trained cost models deliver accurate estimates at sub-millisecond latency suitable for real-time optimizer use.
  • The approach works across different schemas such as TPC-H and IMDb without requiring probabilistic sampling.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same graph-traversal technique could be adapted to synthesize training data for related database tasks such as index recommendation or rewrite rule learning.
  • If the synthetic patterns prove sufficiently representative, the framework could reduce dependence on anonymized traces that discard executable query text.
  • Direct evaluation of SynQL-trained models against proprietary real workloads would quantify generalization beyond the synthetic domain.
  • Combining SynQL with existing benchmark suites could create hybrid evaluation pipelines that test both synthetic diversity and real-world fidelity.

Load-bearing premise

Workloads generated by traversing the foreign-key graph and controlled by Θ are sufficiently representative of real-world analytical query patterns to serve as effective training data for learned optimizers.

What would settle it

Cost models trained on SynQL data achieving markedly lower accuracy on real production query traces than on the held-out synthetic test sets would falsify the claim that the synthetic workloads are effective substitutes.

Figures

Figures reproduced from arXiv: 2604.08021 by Amit Mankodi, Kahan Mehta.

Figure 1. SynQL pipeline overview. The database catalog feeds Phase I (Algorithm 1), which produces a join blueprint under topology bias αshape. Phase II (Algorithm 2) injects semantic content and compiles each query via an AST. Configuration vector Θ governs both phases; the outer loop repeats them N times to emit workload Q. view at source ↗
Figure 2. Effect of αshape on join topology. High values attach all tables to the root R (Star); low values extend the deepest frontier (Chain); intermediate values produce branching Forks. view at source ↗
Figure 3. Detailed walkthrough of a single SynQL iteration on the IMDb schema. Phase I (steps 1–3): root table title is selected, join depth 3 is sampled, and three αshape-weighted edge expansions produce a chain subgraph with FK join conditions shown on each edge. Phase II (steps 4–6): columns are sampled with full aggregation (Pagg = 1.0), a year predicate is injected, and the AST compiler auto-appends GROUP BY… view at source ↗
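
The Phase II step summarized in the Figure 3 caption (column sampling under Pagg, predicate injection, automatic GROUP BY completion) can be illustrated with a small compile step. The sketch below is an editorial assumption about how an AST-to-SQL compiler can enforce the GROUP BY invariant by construction; compile_select, its argument layout, and the IMDb column names are illustrative, not the paper's Algorithm 2.

```python
# Sketch only: a tiny AST-to-SQL compile step that mirrors every non-aggregated
# SELECT column into GROUP BY, so "unaggregated column" errors cannot occur.
# Not the paper's Algorithm 2; names and schema details are illustrative.
def compile_select(root_table, join_edges, select_cols, agg_cols, predicates):
    select_items = [f"{func}({col})" for col, func in agg_cols] + list(select_cols)
    parts = ["SELECT " + ", ".join(select_items), "FROM " + root_table]
    for _src, dst, cond in join_edges:
        parts.append(f"JOIN {dst} ON {cond}")
    if predicates:
        parts.append("WHERE " + " AND ".join(predicates))
    if agg_cols and select_cols:          # auto-append GROUP BY for validity
        parts.append("GROUP BY " + ", ".join(select_cols))
    return "\n".join(parts)

print(compile_select(
    root_table="title",
    join_edges=[("title", "movie_info", "title.id = movie_info.movie_id")],
    select_cols=["title.production_year"],
    agg_cols=[("movie_info.id", "COUNT")],
    predicates=["title.production_year > 2000"]))
```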
read the original abstract

Database research and the development of learned query optimisers rely heavily on realistic SQL workloads. Acquiring real-world queries is increasingly difficult, however, due to strict privacy regulations, and publicly released anonymised traces typically strip out executable query text to preserve confidentiality. Existing synthesis tools fail to bridge this training data gap: traditional benchmarks offer too few fixed templates for statistical generalisation, while Large Language Model (LLM) approaches suffer from schema hallucination (fabricating non-existent columns) and topological collapse (systematically defaulting to simplistic join patterns that fail to stress-test query optimisers). We propose SynQL, a deterministic workload synthesis framework that generates structurally diverse, execution-ready SQL workloads. As a foundational step toward bridging the training-data gap, SynQL targets the core SQL fragment -- multi-table joins with projections, aggregations, and range predicates -- which dominates analytical workloads. SynQL abandons probabilistic text generation in favour of traversing the live database's foreign-key graph to populate an Abstract Syntax Tree (AST), guaranteeing schema and syntactic validity by construction. A configuration vector $\Theta$ provides explicit, parametric control over join topology (Star, Chain, Fork), analytical intensity, and predicate selectivity. Experiments on TPC-H and IMDb show that SynQL produces near-maximally diverse workloads (Topological Entropy $H = 1.53$ bits) and that tree-based cost models trained on the synthetic corpus achieve $R^2 \ge 0.79$ on held-out synthetic test sets with sub-millisecond inference latency, establishing SynQL as an effective foundation for generating training data when production logs are inaccessible.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces SynQL, a deterministic rule-based framework that synthesizes execution-ready SQL workloads by traversing a live database's foreign-key graph to populate an AST for multi-table joins with projections, aggregations, and range predicates. A configuration vector Θ explicitly controls join topology (Star/Chain/Fork), analytical intensity, and predicate selectivity. On TPC-H and IMDb schemas, it reports near-maximal diversity via topological entropy H=1.53 bits and shows that tree-based cost models trained on the generated corpus achieve R² ≥ 0.79 on held-out synthetic test sets with sub-millisecond inference.

Significance. If the generated workloads are shown to be representative of real analytical patterns, SynQL would provide a scalable, controllable, and validity-guaranteed source of training data for learned query optimizers where privacy constraints limit access to production logs. The deterministic FK-graph traversal and parametric control avoid common pitfalls of LLM-based synthesis such as schema hallucination. The reported entropy and latency numbers, if reproducible, would be concrete strengths.

major comments (2)
  1. [Abstract] Abstract and Experiments section: the central claim that SynQL supplies effective training data for learned optimizers rests on tree-based models reaching R² ≥ 0.79, yet this is measured only on held-out workloads produced by the identical FK-graph traversal and Θ-controlled generation process; no results are reported on the standard 22 TPC-H queries, measured DBMS execution times, or transfer to real production traces, leaving the representativeness assumption untested and risking that models simply recover the deterministic generation rules.
  2. [Abstract] Abstract: the diversity claim (Topological Entropy H = 1.53 bits) and downstream R² values are presented without any comparison to existing synthesis baselines (fixed-template benchmarks or LLM approaches) on the same schemas and downstream task, so the asserted superiority in bridging the training-data gap cannot be assessed from the reported numbers alone.
minor comments (1)
  1. [Abstract] Abstract: the definition and normalization of Topological Entropy H are not provided, making it impossible to verify whether 1.53 bits is near-maximal for the given schemas.
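
One plausible reading of the undefined quantity, offered as an editorial assumption rather than the paper's formula: if topological entropy is Shannon entropy over the Star/Chain/Fork distribution of the workload, the three-class maximum is log2(3) ≈ 1.585 bits, which would make 1.53 bits near-maximal. The sketch below computes that quantity.

```python
# Sketch only: Shannon entropy over topology classes, one possible definition
# of "Topological Entropy"; the paper's own normalization is not given here.
import math
from collections import Counter

def topological_entropy(topology_labels):
    counts = Counter(topology_labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

workload = ["Star"] * 340 + ["Chain"] * 330 + ["Fork"] * 330
print(topological_entropy(workload))      # ~1.585 bits for a near-uniform mix
print(math.log2(3))                       # theoretical maximum for three classes
```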

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below, clarifying the intended scope of our contributions while acknowledging limitations in the current evaluation.

read point-by-point responses
  1. Referee: [Abstract] Abstract and Experiments section: the central claim that SynQL supplies effective training data for learned optimizers rests on tree-based models reaching R² ≥ 0.79, yet this is measured only on held-out workloads produced by the identical FK-graph traversal and Θ-controlled generation process; no results are reported on the standard 22 TPC-H queries, measured DBMS execution times, or transfer to real production traces, leaving the representativeness assumption untested and risking that models simply recover the deterministic generation rules.

    Authors: We agree that the reported R² ≥ 0.79 is measured exclusively on held-out synthetic workloads generated by the same deterministic FK-graph traversal and Θ-controlled process. This choice aligns with the paper's stated goal of providing a controllable, validity-guaranteed data source specifically for scenarios where production logs are inaccessible due to privacy constraints. The high topological entropy and parametric control are intended to ensure the synthetic distribution is rich enough for model training within that regime. We acknowledge that this leaves the broader representativeness to real analytical patterns untested and creates the possibility that models may partially recover generation rules rather than general cost-model features. In the revised manuscript we will add experiments that apply the trained models to the standard 22 TPC-H queries and report measured DBMS execution times for direct comparison. We will also expand the discussion to address potential transfer and the risk of rule recovery. revision: partial

  2. Referee: [Abstract] Abstract: the diversity claim (Topological Entropy H = 1.53 bits) and downstream R² values are presented without any comparison to existing synthesis baselines (fixed-template benchmarks or LLM approaches) on the same schemas and downstream task, so the asserted superiority in bridging the training-data gap cannot be assessed from the reported numbers alone.

    Authors: We accept that the manuscript presents the entropy and R² figures without quantitative head-to-head comparisons against fixed-template or LLM-based synthesizers on identical schemas and the same downstream cost-modeling task. While the text qualitatively contrasts the limitations of templates (low diversity) and LLMs (hallucination and collapse), direct numerical evidence is absent. In the revised version we will extend the experimental section to include such baselines: we will generate comparable workloads using TPC-H templates and publicly available LLM synthesis methods on the same TPC-H and IMDb schemas, then report topological entropy and the resulting R² of tree-based cost models trained on each corpus. This will enable an objective assessment of SynQL's relative effectiveness. revision: yes

standing simulated objections not resolved
  • Direct transfer results on real production traces cannot be provided, as the authors do not have access to such proprietary data.
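
For the transfer question raised in these exchanges (and under "What would settle it" above), the check itself is simple to state: score one synthetic-trained model on both the held-out synthetic split and a set of real queries with measured runtimes, then report the gap. The sketch below assumes such real-trace features and labels exist; all data loading is hypothetical, and nothing here reproduces an experiment the paper actually ran.

```python
# Sketch only: quantify the synthetic-to-real generalization gap for a cost
# model trained on a SynQL-style corpus. Real-trace features and labels are
# assumed to be available; this is the check the referee asks for, not a result.
from sklearn.metrics import r2_score

def generalization_gap(model, X_syn_test, y_syn_test, X_real, y_real):
    r2_syn = r2_score(y_syn_test, model.predict(X_syn_test))
    r2_real = r2_score(y_real, model.predict(X_real))
    return {"synthetic_r2": r2_syn, "real_r2": r2_real, "gap": r2_syn - r2_real}
```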

Circularity Check

0 steps flagged

No significant circularity; generative framework evaluated via standard held-out split on its own outputs

full rationale

The paper presents SynQL as a deterministic, rule-based generator that traverses FK graphs under parametric control Θ to produce ASTs. The reported R² ≥ 0.79 is an experimental outcome of training tree-based regressors on one synthetic corpus and testing on a held-out portion of the same corpus; this is ordinary ML practice and does not reduce any claimed prediction to a fitted parameter or self-defined quantity by construction. No self-citations, uniqueness theorems, or ansatzes are invoked to justify core results. The absence of real-query validation is a limitation of external validity, not a circularity in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the assumption that every schema contains a traversable foreign-key graph that can be used to generate representative analytical queries; no new entities are postulated and no parameters are fitted to data.

axioms (1)
  • domain assumption: The input database schema contains foreign-key relationships that can be traversed to produce valid multi-table joins.
    Invoked when describing how the AST is populated from the live database.
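
A minimal sketch of how that assumption could be checked against a live catalog before generation, assuming a PostgreSQL database reachable via psycopg2; the information_schema query is standard, but catalog details differ across systems and the connection string is hypothetical.

```python
# Sketch only: verify the schema exposes FK edges and that a chosen root table
# can reach other tables through them. Assumes PostgreSQL + psycopg2.
from collections import defaultdict, deque
import psycopg2

FK_EDGES_SQL = """
SELECT tc.table_name, ccu.table_name AS foreign_table
FROM information_schema.table_constraints tc
JOIN information_schema.constraint_column_usage ccu
  ON tc.constraint_name = ccu.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY';
"""

def fk_reachable_tables(dsn, root):
    """Return the set of tables reachable from `root` via foreign-key edges."""
    graph = defaultdict(set)
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(FK_EDGES_SQL)
        for src, dst in cur.fetchall():
            graph[src].add(dst)
            graph[dst].add(src)           # joins can follow FKs in either direction
    seen, queue = {root}, deque([root])
    while queue:
        for nbr in graph[queue.popleft()]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen
```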

pith-pipeline@v0.9.0 · 5590 in / 1345 out tokens · 37248 ms · 2026-05-10T18:33:52.920576+00:00 · methodology

discussion (0)

