Bridging the Smart City Cybersecurity Data Gap Through AI-Driven Synthetic Dataset Generation

John D. Hastings; Kyle Korman; Stephanie Polczynski; Varghese Vaidyan

arXiv: 2606.12225 · v1 · pith:O3COQ4FP · submitted 2026-06-10 · 💻 cs.CR


Stephanie Polczynski, John D. Hastings, Varghese Vaidyan, Kyle Korman

Pith reviewed 2026-06-27 09:16 UTC · model grok-4.3

classification 💻 cs.CR

keywords: cybersecurity · smart cities · synthetic data generation · generative AI · IoT security · dataset creation · attack simulation · network security


The pith

A generative AI framework produces synthetic cybersecurity datasets that replicate smart city device behaviors, network interactions, and attack scenarios.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Smart cities depend on interconnected sensors and IoT systems that create large attack surfaces, yet real datasets for testing defenses are often private, incomplete, or short on malicious activity. The paper proposes an AI-based synthetic data generation framework that uses generative models to build high-fidelity alternatives matching realistic conditions. These datasets are checked for protocol conformity, statistical resemblance to originals, and usefulness inside standard security tools. If the approach holds, researchers gain the ability to develop and validate threat models and defensive methods without the usual barriers of data access.

Core claim

The paper claims that an AI-based synthetic data generation framework built on generative artificial intelligence models can produce high-fidelity synthetic cybersecurity datasets that replicate realistic smart city device behaviors, network interactions, and cyber-attack scenarios. The resulting data are evaluated for conformity to protocol standards, statistical similarity to the original datasets, and utility in common security tools, with the stated aim of advancing threat modeling and defense evaluation.

What carries the argument

The AI-based synthetic data generation (SDG) framework that uses generative artificial intelligence models to create datasets replicating device behaviors, network interactions, and attack scenarios

If this is right

Researchers gain the ability to model smart city threats more effectively using accessible data.
Defensive techniques can be evaluated more comprehensively across varied attack scenarios.
Critical smart city infrastructures receive improved protection through better-tested cybersecurity methods.
Synthetic datasets conform to protocol standards and maintain statistical similarity to real data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the synthetic data proves effective in security tools, it could reduce dependence on privacy-restricted real collections for training detection systems.
The same generation approach might apply to other data-scarce cyber-physical domains such as industrial control systems.
A practical next step would be to measure how well models trained solely on the synthetic sets detect novel attack variants in live deployments.
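The privacy benefit suggested above only holds if the generator does not reproduce real records. A widely used screening check in synthetic-data evaluation is the distance to closest record (DCR). The manuscript does not propose this check. The brute-force sketch below assumes numeric feature vectors that are already scaled to comparable ranges, and the copy tolerance is an illustrative choice.

```python
import math

def distance_to_closest_record(real, synthetic):
    """For each synthetic row, the Euclidean distance to its nearest real row.
    Brute force, O(len(real) * len(synthetic)): fine for a sketch, slow at scale."""
    return [min(math.dist(s, r) for r in real) for s in synthetic]

def copy_rate(real, synthetic, tol=1e-9):
    """Fraction of synthetic rows that (near-)duplicate some real row.
    `tol` is an illustrative threshold, not a standard value."""
    dcr = distance_to_closest_record(real, synthetic)
    return sum(d <= tol for d in dcr) / len(dcr)
```

A copy rate well above zero, or DCR values much smaller than typical real-to-real distances, would suggest the generator is memorizing records rather than generalizing.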

Load-bearing premise

Generative AI models can be trained or prompted to output data that passes statistical similarity checks and proves useful in common security tools.
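Statistical similarity checks are commonly run per feature. The abstract does not name its test. As a hedged illustration, the two-sample Kolmogorov-Smirnov (KS) statistic measures the largest gap between the empirical distributions of a real and a synthetic feature. The feature name `packet_size` and the 0.1 pass threshold below are assumptions, not values from the paper.

```python
import bisect
import random

def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDFs of two samples (0 = identical, 1 = disjoint)."""
    a, b = sorted(real), sorted(synthetic)
    n, m = len(a), len(b)
    d = 0.0
    # Both empirical CDFs only change at observed values, so it is enough
    # to evaluate the gap at every value that appears in either sample.
    for x in set(a) | set(b):
        gap = abs(bisect.bisect_right(a, x) / n - bisect.bisect_right(b, x) / m)
        d = max(d, gap)
    return d

def similarity_report(real_cols, synth_cols, threshold=0.1):
    """Per-feature KS check. `threshold` is an illustrative cut-off, not a standard value."""
    report = {}
    for name, real_values in real_cols.items():
        d = ks_statistic(real_values, synth_cols[name])
        report[name] = (round(d, 3), d <= threshold)
    return report

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical feature: packet sizes in bytes from real vs. synthetic flows.
    real = {"packet_size": [rng.gauss(200, 30) for _ in range(500)]}
    close = {"packet_size": [rng.gauss(205, 30) for _ in range(500)]}
    shifted = {"packet_size": [rng.gauss(400, 30) for _ in range(500)]}
    print(similarity_report(real, close))
    print(similarity_report(real, shifted))
```

Marginal tests like this one miss correlations between features and temporal ordering. Passing them is therefore necessary but not sufficient evidence of fidelity.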

What would settle it

A direct comparison in which a security tool trained on the synthetic data shows markedly lower performance on actual smart city network traces than on the synthetic traces themselves.
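This comparison is usually called train-on-synthetic, test-on-real (TSTR). The minimal sketch below assumes labeled flow feature vectors. It uses a nearest-centroid classifier purely as a stand-in for a real detection tool; the paper names no detector.

```python
import math

def fit_centroids(rows, labels):
    """Train a nearest-centroid classifier: one mean feature vector per label."""
    sums, counts = {}, {}
    for x, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))

def accuracy(centroids, rows, labels):
    hits = sum(predict(centroids, x) == y for x, y in zip(rows, labels))
    return hits / len(labels)

def tstr_gap(synth_train, synth_test, real_test):
    """Train on synthetic data, then compare accuracy on held-out synthetic
    vs. real traces. A large positive gap means the synthetic data does not transfer."""
    model = fit_centroids(*synth_train)
    return accuracy(model, *synth_test) - accuracy(model, *real_test)
```

Each argument to `tstr_gap` is a `(rows, labels)` pair. In a real study, the same protocol would wrap an actual intrusion-detection model and use metrics beyond accuracy, such as per-class recall on attack traffic.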

Figures

Figures reproduced from arXiv: 2606.12225 by John D. Hastings, Kyle Korman, Stephanie Polczynski, Varghese Vaidyan.

Original abstract

Smart cities rely on interconnected cyber-physical systems that integrate sensors, IoT devices, cloud platforms, and AI-driven services and decision-making. While these systems enhance city services, they also introduce complex cybersecurity challenges due to their large attack surfaces, heterogeneous data flows, and evolving threat vectors. Developing and validating cybersecurity tools for smart cities requires high-quality datasets that accurately represent real operational conditions. However, real-world datasets are often incomplete, contain privacy-sensitive data, are difficult to access, or lack sufficient malicious activity to support tool development. This research addresses this critical gap by proposing an AI-based synthetic data generation (SDG) framework designed specifically for smart city cybersecurity research. The proposed framework leverages generative artificial intelligence models to produce high-fidelity synthetic cybersecurity datasets that replicate realistic device behaviors, network interactions, and cyber-attack scenarios. The synthetic datasets are evaluated for conformity to protocol standards, statistical similarity to original datasets, and utility in common security tools. The resulting synthetic data generation framework and evaluation metrics are expected to advance smart city cybersecurity by enabling researchers to model threats more effectively and evaluate defensive techniques more comprehensively to better protect critical smart city infrastructures.
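The abstract does not say which security tools or file formats the utility evaluation uses. One common interchange option, assumed here purely for illustration, is the classic libpcap file format, which packet tools such as Wireshark and Zeek can read. The standard-library sketch below serializes synthetic UDP packets. The IP addresses, MAC addresses, CoAP port 5683, and payload are hypothetical placeholders.

```python
import socket
import struct

PCAP_MAGIC = 0xA1B2C3D4   # classic libpcap format, microsecond timestamps
LINKTYPE_ETHERNET = 1

def ipv4_checksum(header: bytes) -> int:
    """RFC 1071 Internet checksum: ones'-complement of the ones'-complement sum of 16-bit words."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def udp_frame(src_ip: str, dst_ip: str, sport: int, dport: int, payload: bytes, ttl: int = 64) -> bytes:
    """Build one Ethernet/IPv4/UDP frame. MAC addresses are placeholders."""
    # A UDP checksum of 0 means "not computed", which IPv4 permits.
    udp = struct.pack("!HHHH", sport, dport, 8 + len(payload), 0) + payload
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + len(udp),  # version 4 / IHL 5, TOS, total length
        0, 0,                    # identification, flags + fragment offset
        ttl, 17, 0,              # TTL, protocol 17 (UDP), checksum placeholder
        socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
    )
    # The header checksum occupies bytes 10-11 of the IPv4 header.
    ip = header[:10] + struct.pack("!H", ipv4_checksum(header)) + header[12:]
    eth = bytes.fromhex("020000000002") + bytes.fromhex("020000000001") + struct.pack("!H", 0x0800)
    return eth + ip + udp

def pcap_bytes(packets) -> bytes:
    """Serialize (timestamp_seconds, frame) pairs as a classic pcap file."""
    out = [struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, LINKTYPE_ETHERNET)]
    for ts, frame in packets:
        sec, usec = divmod(round(ts * 1_000_000), 1_000_000)
        out.append(struct.pack("<IIII", sec, usec, len(frame), len(frame)))
        out.append(frame)
    return b"".join(out)
```

For example, writing `pcap_bytes([(0.0, udp_frame("10.0.0.5", "10.0.0.1", 40000, 5683, b"hello"))])` to a file opened in binary mode produces a one-packet capture. Flow-based tools would instead need flow records, for example as CSV.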

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a high-level proposal for synthetic cybersecurity data generation with no architecture, training details, or results.


The main takeaway is that the paper describes an idea for an AI-driven framework to create synthetic smart city cybersecurity datasets but supplies no model architecture, training procedure, data sources, or any evaluation outcomes.

It does a reasonable job stating the practical problem: real datasets are often unavailable due to privacy rules, access barriers, or insufficient attack examples. That matches what researchers in this area already know.

Nothing technically new appears. Generative models for synthetic data exist in other fields; this work simply names the smart city cybersecurity gap as a target without adding any method, conditioning approach, or loss design.

The central weakness is the unsupported claim that the generated data will match protocol standards, show statistical similarity, and prove useful in security tools. The text only says these properties are "expected," with no implementation or tests to back it up. That leaves the load-bearing assumption unexamined.

A reader already working on smart city security might find the problem statement useful as a prompt for their own project. Beyond that, there is little to engage with or cite.

I would not bring this to a reading group. I would not cite it. It does not deserve peer review until the authors implement the framework and report concrete results.

Referee Report

3 major / 0 minor

Summary. The manuscript proposes an AI-based synthetic data generation (SDG) framework for smart city cybersecurity research. It claims that generative AI models can be used to produce high-fidelity synthetic datasets replicating realistic device behaviors, network interactions, and cyber-attack scenarios. These datasets are to be evaluated for conformity to protocol standards, statistical similarity to original datasets, and utility in common security tools, with the expectation that the framework will advance threat modeling and defensive technique evaluation in smart city infrastructures.

Significance. If a concrete implementation of the proposed framework were developed and shown to meet the stated evaluation criteria, it could meaningfully address data scarcity and privacy issues in smart city cybersecurity research, enabling broader experimentation with defensive tools. The core idea of using generative models for this purpose aligns with existing needs in the field, though the manuscript provides no evidence that the approach is feasible or novel relative to prior synthetic data work in cybersecurity.

major comments (3)

[Abstract] Abstract: The framework is presented only as a high-level proposal with no specification of the generative models (e.g., GAN, VAE, transformer, or diffusion variants), conditioning inputs, architecture, or training procedure. This is load-bearing because the central claim that the SDG pipeline produces high-fidelity data replicating device behaviors and attacks rests entirely on the unstated assumption that such models can be configured to succeed.
[Abstract] Abstract: No training corpus, loss functions, or quantitative results are supplied for any of the three evaluation axes (protocol conformity, statistical similarity, downstream utility in security tools). The text uses only forward-looking language ('are expected to', 'will advance') rather than demonstrated outcomes, leaving the load-bearing assumption that the pipeline meets these criteria untested.
[Abstract] Abstract: The manuscript contains no derivation, pseudocode, or preliminary validation showing that synthetic data can pass the required checks for smart-city traffic and attacks; without these elements the proposal cannot be assessed for internal consistency or feasibility.
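The third comment asks how a synthetic record would be shown to pass protocol checks. The manuscript defines no rules or schema, so what follows is an illustration only: conformity can be screened with explicit field-level constraints. The sketch assumes a hypothetical flow-record schema. The IP protocol numbers and default ports (MQTT 1883, MQTT over TLS 8883, Modbus/TCP 502, CoAP 5683) are standard assignments; the choice of rules is an assumption.

```python
from dataclasses import dataclass

# IANA-assigned IP protocol numbers.
IP_PROTO = {"ICMP": 1, "TCP": 6, "UDP": 17}

# Default transport and port for a few protocols common in IoT deployments.
# Illustrative subset; real devices may legitimately use non-default ports.
APP_DEFAULTS = {
    "MQTT": ("TCP", 1883),
    "MQTT-TLS": ("TCP", 8883),
    "Modbus/TCP": ("TCP", 502),
    "CoAP": ("UDP", 5683),
}

@dataclass
class FlowRecord:
    # Hypothetical minimal schema for one synthetic flow record.
    ip_proto: int
    src_port: int
    dst_port: int
    ttl: int
    app: str

def conformity_errors(rec: FlowRecord) -> list:
    """Return human-readable rule violations (an empty list means the record passes)."""
    errors = []
    if rec.ip_proto not in IP_PROTO.values():
        errors.append(f"unknown IP protocol {rec.ip_proto}")
    for name, port in (("src_port", rec.src_port), ("dst_port", rec.dst_port)):
        if not 0 <= port <= 65535:  # ports are 16-bit fields
            errors.append(f"{name} {port} outside 0-65535")
    if not 0 <= rec.ttl <= 255:  # IPv4 TTL is an 8-bit field
        errors.append(f"ttl {rec.ttl} outside 0-255")
    if rec.app in APP_DEFAULTS:
        transport, port = APP_DEFAULTS[rec.app]
        if rec.ip_proto != IP_PROTO[transport]:
            errors.append(f"{rec.app} runs over {transport}")
        # Heuristic: flag the record if neither endpoint uses the default port.
        if port not in (rec.src_port, rec.dst_port):
            errors.append(f"{rec.app} not on default port {port}")
    return errors

def conformity_rate(records) -> float:
    """Fraction of records with no violations."""
    return sum(not conformity_errors(r) for r in records) / len(records)
```

Checks at this level only catch malformed headers. Verifying protocol state machines, such as a valid MQTT CONNECT/CONNACK sequence, would need a stateful parser.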

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the detailed review and constructive feedback on our manuscript. We agree that the work is presented as a high-level conceptual proposal and will use the comments to strengthen the description of the framework.

Point-by-point responses

Referee: [Abstract] Abstract: The framework is presented only as a high-level proposal with no specification of the generative models (e.g., GAN, VAE, transformer, or diffusion variants), conditioning inputs, architecture, or training procedure. This is load-bearing because the central claim that the SDG pipeline produces high-fidelity data replicating device behaviors and attacks rests entirely on the unstated assumption that such models can be configured to succeed.

Authors: We agree that the manuscript describes the framework at a conceptual level without specifying particular generative models, conditioning inputs, or training procedures. The current version focuses on the overall pipeline and evaluation strategy rather than implementation details. We will revise to include example model selections (e.g., conditional GANs or diffusion models for traffic and attack generation), conditioning on device types and protocols, and high-level architecture and training considerations. revision: yes
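The response mentions conditioning generation on device types and protocols. To make that concrete without claiming a specific model, the sketch below uses a toy conditional sampler that fits independent per-feature Gaussians for each (device_type, protocol) pair. A conditional GAN or diffusion model would replace `fit` and `sample` with learned networks. The row schema (`device_type`, `protocol`, `features`) is hypothetical.

```python
import random
import statistics
from collections import defaultdict

class ConditionalGaussianSampler:
    """Toy conditional generator: independent Gaussians per (device_type, protocol).

    A deliberately simple stand-in for a conditional GAN or diffusion model; it
    ignores correlations between features and any temporal structure.
    """

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.params = {}  # (device_type, protocol) -> [(mean, std) per feature]

    def fit(self, rows):
        groups = defaultdict(list)
        for row in rows:
            groups[(row["device_type"], row["protocol"])].append(row["features"])
        for condition, vectors in groups.items():
            self.params[condition] = [
                (statistics.fmean(col), statistics.pstdev(col)) for col in zip(*vectors)
            ]
        return self

    def sample(self, device_type, protocol, n):
        condition = (device_type, protocol)
        if condition not in self.params:
            raise KeyError(f"no training data for {condition}")
        return [
            [self.rng.gauss(mu, sigma) for mu, sigma in self.params[condition]]
            for _ in range(n)
        ]
```

The condition key is what lets a caller request, for instance, only smart-meter MQTT traffic. The same interface would apply to a learned model.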
Referee: [Abstract] Abstract: No training corpus, loss functions, or quantitative results are supplied for any of the three evaluation axes (protocol conformity, statistical similarity, downstream utility in security tools). The text uses only forward-looking language ('are expected to', 'will advance') rather than demonstrated outcomes, leaving the load-bearing assumption that the pipeline meets these criteria untested.

Authors: The referee is correct that no specific training corpora, loss functions, or quantitative results are provided. As this is a framework proposal rather than an empirical study, the manuscript does not include implemented results. In a revision we will specify example training sources (public IoT and smart-city datasets), suggest loss functions aligned with the three evaluation axes, and clarify that empirical outcomes are intended as future work. revision: yes
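The manuscript specifies no losses. One hypothetical way to align a training objective with the evaluation axes is a weighted sum of the generator's native loss and two penalty terms:

$$
\mathcal{L}(\theta) = \mathcal{L}_{\text{gen}}(\theta)
  + \lambda_{p}\,\mathbb{E}_{x \sim P_{\theta}}\Big[\sum_{k} \big(\max(0,\, \ell_k - x_k) + \max(0,\, x_k - u_k)\big)\Big]
  + \lambda_{s} \sum_{j=1}^{d} D\big(P^{\text{real}}_{j},\, P_{\theta,j}\big)
$$

The symbols are defined as follows:
- $\theta$: the generator parameters.
- $\mathcal{L}_{\text{gen}}$: the model's own loss (adversarial for a GAN, denoising for a diffusion model).
- $[\ell_k, u_k]$: an assumed valid range for protocol field $k$ (for example 0 to 65535 for a port).
- $D$: a differentiable distance between the real and synthetic marginal distributions of feature $j$, such as maximum mean discrepancy.
- $\lambda_p, \lambda_s \ge 0$: weights.

Downstream utility in security tools is hard to differentiate through, so it is more naturally measured after training than optimized directly.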
Referee: [Abstract] Abstract: The manuscript contains no derivation, pseudocode, or preliminary validation showing that synthetic data can pass the required checks for smart-city traffic and attacks; without these elements the proposal cannot be assessed for internal consistency or feasibility.

Authors: We acknowledge the absence of pseudocode, derivations, or preliminary validation. The manuscript is a high-level proposal, so these elements were not included. We will revise to add high-level pseudocode for the SDG pipeline and discuss feasibility based on related synthetic data literature for cybersecurity, while noting that concrete validation requires implementation. revision: yes
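The response promises high-level pseudocode for the SDG pipeline. Below is a minimal sketch of how the stages could be wired together. It assumes the generator, conformity check, and similarity score are supplied as callables (all names here are hypothetical). The pipeline rejects nonconforming records and stops once a target size is reached.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

Record = Sequence[float]

@dataclass
class PipelineReport:
    generated: int = 0
    rejected: int = 0
    similarity: float = float("nan")
    notes: List[str] = field(default_factory=list)

def run_sdg_pipeline(
    generate: Callable[[int], List[Record]],      # stage 1: draw a batch from a generative model
    is_conformant: Callable[[Record], bool],      # stage 2: protocol-conformity filter
    similarity: Callable[[List[Record]], float],  # stage 3: statistical similarity to real data
    n_target: int,
    batch_size: int = 256,
    max_batches: int = 100,
):
    """Generate, filter, and score a synthetic dataset of up to n_target records."""
    dataset: List[Record] = []
    report = PipelineReport()
    for _ in range(max_batches):
        batch = generate(batch_size)
        kept = [r for r in batch if is_conformant(r)]
        report.generated += len(batch)
        report.rejected += len(batch) - len(kept)
        dataset.extend(kept)
        if len(dataset) >= n_target:
            break
    else:  # loop finished without reaching the target size
        report.notes.append("max_batches reached before n_target")
    dataset = dataset[:n_target]
    report.similarity = similarity(dataset)
    # Stage 4, downstream utility in security tools, would run on `dataset` here.
    return dataset, report
```

Keeping each stage as a separate callable means the generator can be swapped (GAN, diffusion, or a simple baseline) without touching the validation logic. The rejection count doubles as a cheap conformity metric.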

standing simulated objections not resolved

The manuscript contains no implementation or empirical results, so actual quantitative outcomes, trained models, or full validation data cannot be supplied in response to requests for demonstrated performance.

Circularity Check

0 steps flagged

No circularity: high-level proposal with no derivations or self-referential claims

full rationale

The manuscript is a framework proposal that states an intent to use generative AI for synthetic datasets and lists evaluation criteria (protocol conformity, statistical similarity, utility) but supplies no equations, model architectures, training procedures, fitted parameters, or derivations. No self-citations appear in the provided text, and no step reduces a claimed result to its own inputs by construction. The central claim remains an unformalized expectation rather than a derived prediction, so no circularity patterns apply.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no specific free parameters, axioms, or invented entities are named or quantified in the provided text.

pith-pipeline@v0.9.1-grok · 5736 in / 1096 out tokens · 19076 ms · 2026-06-27T09:16:43.555826+00:00 · methodology


Reference graph

Works this paper leans on

32 extracted references · 21 canonical work pages

[1] Grand View Research, Smart cities market size, share — global industry report, 2019-2025, 2025. [Online]. Available: https://www.grandviewresearch.com/industry-analysis/smart-cities-market
[2] International Telecommunication Union, Digital transformation for people-centered cities, Sep. 2022. Accessed: May 31, 2026. [Online]. Available: https://www.itu.int/cities/about/
[3] J. Woetzel, Smart city technology for a more liveable future, Jun. 2018. [Online]. Available: https://www.mckinsey.com/capabilities/operations/our-insights/smart-cities-digital-solutions-for-a-more-livable-future
[4] R. Lea, Smart Cities: An Overview of the Technology Trends Driving Smart Cities, 2017. [Online]. Available: https://web.archive.org/web/20251207164621/https://www.ieee.org/content/dam/ieee-org/ieee/web/org/about/corporate/ieee-industry-advisory-board/ieee-smart-cities-trend-paper-2017.pdf
[5] M. Houichi, F. Jaidi, and A. Bouhoula, “Analysis of smart cities security: Challenges and advancements,” in 2022 15th International Conference on Security of Information and Networks (SIN), 2022, pp. 01–05. DOI: 10.1109/SIN56466.2022.9970494
[6] S. Mallapuram, N. Ngwum, F. Yuan, C. Lu, and W. Yu, “Smart city: The state of the art, datasets, and evaluation platforms,” in IEEE/ACIS 16th International Conference on Computer & Information Science (ICIS), IEEE, 2017, pp. 447–452. DOI: 10.1109/ICIS.2017.7960034
[7] M. Ma, S. M. Preum, M. Y. Ahmed, W. Tärneberg, A. Hendawi, and J. A. Stankovic, “Data sets, modeling, and decision making in smart cities: A survey,” ACM Transactions on Cyber-Physical Systems, vol. 4, no. 2, pp. 1–28, Apr. 2020. DOI: 10.1145/3355283
[8] M. A. Ferrag, O. Friha, D. Hamouda, L. Maglaras, and H. Janicke, “Edge-IIoTset: A new comprehensive realistic cyber security dataset of IoT and IIoT applications for centralized and federated learning,” IEEE Access, vol. 10, pp. 40281–40306, 2022. DOI: 10.1109/ACCESS.2022.3165809
[9] M. Zolanvari, M. A. Teixeira, L. Gupta, K. M. Khan, and R. Jain, “WUSTL-IIOT-2021 dataset for IIoT cybersecurity research,” 2021. [Online]. Available: https://www.cse.wustl.edu/~jain/iiot2/
[10] M. Al-Hawawreh, E. Sitnikova, and N. Aboutorab, “X-IIoTID: A connectivity-agnostic and device-agnostic intrusion data set for industrial internet of things,” IEEE Internet of Things Journal, vol. 9, no. 5, pp. 3962–3977, Mar. 2022. DOI: 10.1109/JIOT.2021.3102056
[11] N. Moustafa and J. Slay, “UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set),” in 2015 Military Communications and Information Systems Conference (MilCIS), Nov. 2015, pp. 1–6. DOI: 10.1109/MilCIS.2015.7348942
[12] N. Koroniotis, N. Moustafa, E. Sitnikova, and J. Slay, “Towards developing network forensic mechanism for botnet activities in the IoT based on machine learning techniques,” in Mobile Networks and Management, vol. 235, Springer, 2018, pp. 30–44. DOI: 10.1007/978-3-319-90775-8_3
[13] N. Moustafa, M. Keshky, E. Debiez, and H. Janicke, “Federated TON_IoT Windows Datasets for Evaluating AI-Based Security Applications,” in 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Guangzhou, China: IEEE, Dec. 2020, pp. 848–855. DOI: 10.1109/TrustCom50675.2020.00114
[14] R. Liu et al., Best practices and lessons learned on synthetic data,
[15] P. A. Osorio-Marulanda, G. Epelde, M. Hernandez, I. Isasa, N. M. Reyes, and A. B. Iraola, “Privacy mechanisms and evaluation metrics for synthetic data generation: A systematic review,” IEEE Access, vol. 12, pp. 88048–88074, 2024. DOI: 10.1109/ACCESS.2024.3417608
[16] D. Viana, R. Teixeira, J. Baptista, and T. Pinto, “Synthetic data generation models for time series: A literature review,” in 2024 International Conference on Electrical, Computer and Energy Technologies (ICECET), Jul. 2024, pp. 1–6. DOI: 10.1109/ICECET61485.2024.10698494
[17] K. Wang and M. Govindarasu, “FGSM-based synthetic data generation technique and application to anomaly detection in smart grid,” in 2024 IEEE Power & Energy Society General Meeting (PESGM), Jul. 2024, pp. 1–5. DOI: 10.1109/PESGM51994.2024.10688539
[18] A. Singh, Z. Bettouche, and A. Fischer, “Synthetic training-data generation for ML-based process mining tools,” in 2024 14th International Conference on Advanced Computer Information Technologies (ACIT), Sep. 2024, pp. 705–709. DOI: 10.1109/ACIT62333.2024.10712516
[19] I. Tenison, A. Chen, N. Singh, O. Dahleh, E. Zemour, and L. Kagal, “Private synthetic data generation for mixed type datasets,” in 2024 IEEE International Conference on Big Data (BigData), Dec. 2024, pp. 6379–6386. DOI: 10.1109/BigData62323.2024.10825249
[20] R. Myung, S. Choi, W. Choi, H. Yu, D. Lee, and E. Lee, “Elaborate synthetic data generation for internet of things services at smart home environment,” in 2016 International Conference on Computational Science and Computational Intelligence (CSCI), Dec. 2016, pp. 226–229. DOI: 10.1109/CSCI.2016.0050
[21] C.-L. Wu, Y.-Y. Chen, P.-Y. Chou, and C.-Y. Wang, “Synthetic packet traffic generative adversarial networks in multi agents with peer-to-peer and global priority queue generation,” IEEE Transactions on Network Science and Engineering, vol. 13, pp. 5851–5869, 2026, ISSN: 2327-4697. DOI: 10.1109/TNSE.2026.3653576
[22] R. Li et al., “IoTGemini: Modeling IoT network behaviors for synthetic traffic generation,” IEEE Transactions on Mobile Computing, vol. 23, no. 12, pp. 13240–13257, Dec. 2024, ISSN: 1558-0660. DOI: 10.1109/TMC.2024.3426600
[23] D. Jiang et al., “Toward synthetic network traffic generating in NTN-enabled IoT: A generative AI approach,” IEEE Internet of Things Journal, vol. 12, no. 2, pp. 2174–2187, Jan. 2025, ISSN: 2327-4662. DOI: 10.1109/JIOT.2024.3468209
[24] A. Schoen, G. Blanc, P.-F. Gimenez, Y. Han, F. Majorczyk, and L. Mé, “A tale of two methods: Unveiling the limitations of GAN and the rise of Bayesian networks for synthetic network traffic generation,” in 2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), ISSN: 2768-0657, Jul. 2024, pp. 273–286. DOI: 10.1109/EuroSPW61312.2024.00036
[25] S. Fioretto, E. Masciari, and E. V. Napolitano, “Explainable AI for network threat detection: Isolation forests and synthetic WiFi traffic,” in 2025 IEEE/ACS 22nd International Conference on Computer Systems and Applications (AICCSA), ISSN: 2161-5330, Oct. 2025, pp. 1–5. DOI: 10.1109/AICCSA66935.2025.11315249
[26] N.-T. Nguyen, T.-N. Le, K.-H. Le-Minh, and K.-H. Le, “Towards generating semi-synthetic datasets for network intrusion detection system,” in 2023 International Conference on Information Networking (ICOIN), Jan. 2023, pp. 62–66. DOI: 10.1109/ICOIN56518.2023.10048962
[27] C. Task, K. Bhagat, and G. Howarth, SDNist: Deidentified Data Report Tool, Apr. 2023. DOI: 10.18434/mds2-2943. [Online]. Available: https://github.com/usnistgov/SDNist
SDMetrics, Sep. 2025. [Online]. Available: https://docs.sdv.dev/sdmetrics
SynthEval, schneiderkamplab, Oct. 2025. [Online]. Available: https://github.com/schneiderkamplab/syntheval
[28] G. Santangelo, G. Nicora, R. Bellazzi, and A. Dagliati, “How good is your synthetic data? SynthRO, a dashboard to evaluate and benchmark synthetic tabular data,” BMC Medical Informatics and Decision Making, vol. 25, no. 1, p. 89, 2025. DOI: 10.1186/s12911-024-02731-9
[29] BMI “Mario Stefanelli” Lab - UNIPV, SynthRO, Jul. 2025. [Online]. Available: https://github.com/bmi-labmedinfo/SynthRO
[30] V. U. Krishnan, R. Dhumpati, V. E. Salis, M. B. K, K. Sutaria, and G. Abhyankar, “Employing generative adversarial networks for secure and reliable synthetic data generation in cyber security applications,” in 2025 4th International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE), Apr. 2025, pp. 1–5. DOI: 10.1109/ICDCECE65353.2025.11035442
[31] D. A. Ammara, J. Ding, and K. Tutschku, “Architectural selection framework for synthetic network traffic: Quantifying the fidelity–utility trade-off,” IEEE Access, vol. 14, pp. 468–484, 2026, ISSN: 2169-3536. DOI: 10.1109/ACCESS.2025.3646769
[32] Idaho National Labs, Malcolm, Feb. 2026. [Online]. Available: https://github.com/idaholab/Malcolm

[1] [1]

[Online]

Grand View Research,Smart cities market size, share — global industry report, 2019-2025, 2025. [Online]. Available: https://www. grandviewresearch.com/industry-analysis/smart-cities-market

2019

[2] [2]

International Telecommunication Union,Digital transformation for people-centered cities, Sep. 2022. Accessed: May 31, 2026. [Online]. Available: https://www.itu.int/cities/about/

2022

[3] [3]

Woetzel,Smart city technology for a more liveable future, Jun

J. Woetzel,Smart city technology for a more liveable future, Jun. 2018. [Online]. Available: https://www.mckinsey.com/capabilities/operations/ our-insights/smart-cities-digital-solutions-for-a-more-livable-future

2018

[4] [4]

Lea,Smart Cities: An Overview of the Technology Trends Driving Smart Cities

R. Lea,Smart Cities: An Overview of the Technology Trends Driving Smart Cities. 2017. [Online]. Available: https://web.archive.org/web/ 20251207164621/https://www.ieee.org/content/dam/ieee-org/ieee/web/ org/about/corporate/ieee- industry- advisory- board/ieee- smart- cities- trend-paper-2017.pdf

2017

[5] [5]

Analysis of smart cities security: Challenges and advancements,

M. Houichi, F. Jaidi, and A. Bouhoula, “Analysis of smart cities security: Challenges and advancements,” in2022 15th International Conference on Security of Information and Networks (SIN), 2022, pp. 01–05.DOI: 10.1109/SIN56466.2022.9970494

work page doi:10.1109/sin56466.2022.9970494 2022

[6] [6]

Smart city: The state of the art, datasets, and evaluation platforms,

S. Mallapuram, N. Ngwum, F. Yuan, C. Lu, and W. Yu, “Smart city: The state of the art, datasets, and evaluation platforms,” inIEEE/ACIS 16th International Conference on Computer & Information Science (ICIS), IEEE, 2017, 447–452.DOI: 10.1109/ICIS.2017.7960034

work page doi:10.1109/icis.2017.7960034 2017

[7] [7]

Data sets, modeling, and decision making in smart cities: A survey,

M. Ma, S. M. Preum, M. Y . Ahmed, W. T ¨arneberg, A. Hendawi, and J. A. Stankovic, “Data sets, modeling, and decision making in smart cities: A survey,” en,ACM Transactions on Cyber-Physical Systems, vol. 4, no. 2, 1–28, Apr. 2020.DOI: 10.1145/3355283

work page doi:10.1145/3355283 2020

[8] [8]

Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications for centralized and federated learning,

M. A. Ferrag, O. Friha, D. Hamouda, L. Maglaras, and H. Janicke, “Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications for centralized and federated learning,”IEEE Access, vol. 10, 40281–40306, 2022.DOI: 10.1109/ACCESS.2022. 3165809

work page doi:10.1109/access.2022 2022

[9] [9]

WUSTL-IIOT-2021 dataset for IIoT cybersecurity research,

M. Zolanvari, M. A. Teixeira, L. Gupta, K. M. Khan, and R. Jain, “WUSTL-IIOT-2021 dataset for IIoT cybersecurity research,” 2021. [Online]. Available: https://www.cse.wustl.edu/∼jain/iiot2/

2021

[10] [10]

X-iiotid: A connectivity-agnostic and device-agnostic intrusion data set for indus- trial internet of things,

M. Al-Hawawreh, E. Sitnikova, and N. Aboutorab, “X-iiotid: A connectivity-agnostic and device-agnostic intrusion data set for indus- trial internet of things,”IEEE Internet of Things Journal, vol. 9, no. 5, 3962–3977, Mar. 2022.DOI: 10.1109/JIOT.2021.3102056

work page doi:10.1109/jiot.2021.3102056 2022

[11] [11]

UNSW-NB15: A comprehensive data set for network intrusion detection systems (unsw-nb15 network data set),

N. Moustafa and J. Slay, “UNSW-NB15: A comprehensive data set for network intrusion detection systems (unsw-nb15 network data set),” in 2015 Military Communications and Information Systems Conference (MilCIS), Nov. 2015, 1–6.DOI: 10.1109/MilCIS.2015.7348942

work page doi:10.1109/milcis.2015.7348942 2015

[12] [12]

In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2025

N. Koroniotis, N. Moustafa, E. Sitnikova, and J. Slay, “Towards developing network forensic mechanism for botnet activities in the iot based on machine learning techniques,” inMobile Networks and Management, vol. 235, Springer, 2018, 30–44.DOI: 10.1007/978-3- 319-90775-8 3

work page doi:10.1007/978-3- 2018

[13] [13]

Federated TON IoT Windows Datasets for Evaluating AI-Based Security Ap- plications,

N. Moustafa, M. Keshky, E. Debiez, and H. Janicke, “Federated TON IoT Windows Datasets for Evaluating AI-Based Security Ap- plications,” in2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Guangzhou, China: IEEE, Dec. 2020, 848–855.DOI: 10 . 1109 / TrustCom50675.2020.00114

arXiv 2020

[14] [14]

Liu et al.,Best practices and lessons learned on synthetic data,

R. Liu et al.,Best practices and lessons learned on synthetic data,

[15] [15]

Privacy mechanisms and evaluation metrics for synthetic data generation: A systematic review,

P. A. Osorio-Marulanda, G. Epelde, M. Hernandez, I. Isasa, N. M. Reyes, and A. B. Iraola, “Privacy mechanisms and evaluation metrics for synthetic data generation: A systematic review,”IEEE Access, vol. 12, 88048–88074, 2024.DOI: 10.1109/ACCESS.2024.3417608

work page doi:10.1109/access.2024.3417608 2024

[16] [16]

Synthetic data generation models for time series: A literature review,

D. Viana, R. Teixeira, J. Baptista, and T. Pinto, “Synthetic data generation models for time series: A literature review,” in2024 Inter- national Conference on Electrical, Computer and Energy Technologies (ICECET, Jul. 2024, 1–6.DOI: 10.1109/ICECET61485.2024.10698494

work page doi:10.1109/icecet61485.2024.10698494 2024

[17] [17]

and Silva, Jéssica Alice A

K. Wang and M. Govindarasu, “Fgsm-based synthetic data generation technique and application to anomaly detection in smart grid,” in2024 IEEE Power & Energy Society General Meeting (PESGM), Jul. 2024, 1–5.DOI: 10.1109/PESGM51994.2024.10688539

work page doi:10.1109/pesgm51994.2024.10688539 2024

[18] [18]

Synthetic training-data gen- eration for ml-based process mining tools,

A. Singh, Z. Bettouche, and A. Fischer, “Synthetic training-data gen- eration for ml-based process mining tools,” in2024 14th International Conference on Advanced Computer Information Technologies (ACIT), Sep. 2024, 705–709.DOI: 10.1109/ACIT62333.2024.10712516

work page doi:10.1109/acit62333.2024.10712516 2024

[19] [19]

2024, 10.1109/BigData62323.2024.10825388

I. Tenison, A. Chen, N. Singh, O. Dahleh, E. Zemour, and L. Kagal, “Private synthetic data generation for mixed type datasets,” in2024 IEEE International Conference on Big Data (BigData), Dec. 2024, 6379–6386.DOI: 10.1109/BigData62323.2024.10825249

[20] R. Myung, S. Choi, W. Choi, H. Yu, D. Lee, and E. Lee, “Elaborate synthetic data generation for Internet of Things services at smart home environment,” in 2016 International Conference on Computational Science and Computational Intelligence (CSCI), Dec. 2016, 226–229. DOI: 10.1109/CSCI.2016.0050

[21] C.-L. Wu, Y.-Y. Chen, P.-Y. Chou, and C.-Y. Wang, “Synthetic packet traffic generative adversarial networks in multi agents with peer-to-peer and global priority queue generation,” IEEE Transactions on Network Science and Engineering, vol. 13, 5851–5869, 2026, ISSN: 2327-4697. DOI: 10.1109/TNSE.2026.3653576

[22] R. Li et al., “IoTGemini: Modeling IoT network behaviors for synthetic traffic generation,” IEEE Transactions on Mobile Computing, vol. 23, no. 12, 13240–13257, Dec. 2024, ISSN: 1558-0660. DOI: 10.1109/TMC.2024.3426600

[23] D. Jiang et al., “Toward synthetic network traffic generating in NTN-enabled IoT: A generative AI approach,” IEEE Internet of Things Journal, vol. 12, no. 2, 2174–2187, Jan. 2025, ISSN: 2327-4662. DOI: 10.1109/JIOT.2024.3468209

[24] A. Schoen, G. Blanc, P.-F. Gimenez, Y. Han, F. Majorczyk, and L. Me, “A tale of two methods: Unveiling the limitations of GAN and the rise of Bayesian networks for synthetic network traffic generation,” in 2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), ISSN: 2768-0657, Jul. 2024, 273–286. DOI: 10.1109/EuroSPW61312.2024.00036

[25] S. Fioretto, E. Masciari, and E. V. Napolitano, “Explainable AI for network threat detection: Isolation forests and synthetic WiFi traffic,” in 2025 IEEE/ACS 22nd International Conference on Computer Systems and Applications (AICCSA), ISSN: 2161-5330, Oct. 2025, 1–5. DOI: 10.1109/AICCSA66935.2025.11315249

[26] N.-T. Nguyen, T.-N. Le, K.-H. Le-Minh, and K.-H. Le, “Towards generating semi-synthetic datasets for network intrusion detection system,” in 2023 International Conference on Information Networking (ICOIN), Jan. 2023, 62–66. DOI: 10.1109/ICOIN56518.2023.10048962

[27] C. Task, K. Bhagat, and G. Howarth, SDNist: Deidentified Data Report Tool. Apr. 2023. DOI: 10.18434/mds2-2943 [Online]. Available: https://github.com/usnistgov/SDNist

[28] SDMetrics, Sep. 2025. [Online]. Available: https://docs.sdv.dev/sdmetrics

[29] SynthEval. schneiderkamplab, Oct. 2025. [Online]. Available: https://github.com/schneiderkamplab/syntheval

[30] G. Santangelo, G. Nicora, R. Bellazzi, and A. Dagliati, “How good is your synthetic data? SynthRO, a dashboard to evaluate and benchmark synthetic tabular data,” BMC Medical Informatics and Decision Making, vol. 25, no. 1, p. 89, 2025. DOI: 10.1186/s12911-024-02731-9

[31] BMI “Mario Stefanelli” Lab - UNIPV, SynthRO, Jul. 2025. [Online]. Available: https://github.com/bmi-labmedinfo/SynthRO

[32] V. U. Krishnan, R. Dhumpati, V. E. Salis, M. B. K, K. Sutaria, and G. Abhyankar, “Employing generative adversarial networks for secure and reliable synthetic data generation in cyber security applications,” in 2025 4th International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE), Apr. 2025, 1–5. DOI: 10.1109/ICDCECE65353.2025.11035442

[33] D. A. Ammara, J. Ding, and K. Tutschku, “Architectural selection framework for synthetic network traffic: Quantifying the fidelity–utility trade-off,” IEEE Access, vol. 14, 468–484, 2026, ISSN: 2169-3536. DOI: 10.1109/ACCESS.2025.3646769

[34] Idaho National Labs, Malcolm, Feb. 2026. [Online]. Available: https://github.com/idaholab/Malcolm