Synthetic but Not Realistic: The Evaluation Challenge in Generative Modelling for Structured Electronic Medical Records

Blanca Gallego; Louisa Jorm; Nicholas I-Hsien Kuo

arxiv: 2606.08903 · v1 · pith:46CD6L4Enew · submitted 2026-06-08 · 💻 cs.LG

Synthetic but Not Realistic: The Evaluation Challenge in Generative Modelling for Structured Electronic Medical Records

Nicholas I-Hsien Kuo , Blanca Gallego , Louisa Jorm This is my paper

Pith reviewed 2026-06-27 17:17 UTC · model grok-4.3

classification 💻 cs.LG

keywords synthetic EMR datagenerative modelsevaluation frameworkdescriptive fidelityclinical utilitystructural validitysubgroup structureeffect estimates

0 comments

The pith

Generative models for structured EMRs reproduce marginal distributions but fail to preserve subgroup structure, effect estimates, and dependency structure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an evaluation framework for synthetic electronic medical records drawn from epidemiology, separating descriptive fidelity, clinical utility, and structural validity. Applied to four generative approaches on a 50,000-person cohort with known ground truth, the tests show that every model captures basic distributions yet none keeps subgroups, causal effects, and dependencies intact together. Models that score well on standard statistical checks can still produce poor calibration and twisted relationships. This indicates that conventional similarity metrics overestimate how usable synthetic EMR data are for real clinical or scientific work.

Core claim

While all models reproduce marginal distributions, none simultaneously preserve subgroup structure, effect estimates, and dependency structure. Models with strong distributional fidelity can exhibit poor calibration and distorted relationships, leading to unreliable inference. These results show that current evaluation practices can overestimate synthetic data quality and motivate domain-informed assessment based on the ability to support valid clinical and scientific conclusions.

What carries the argument

Multi-dimensional evaluation framework grounded in epidemiology that measures descriptive fidelity, clinical utility, and structural validity on the PRIME-CVD cohort with known ground-truth structure.

If this is right

Generative models must be checked for their ability to answer predictive and causal questions, not only descriptive statistics.
High performance on marginal distributions does not ensure correct calibration or preserved relationships.
Standard evaluation practices risk overstating the quality of synthetic EMR data for downstream use.
Valid clinical and scientific conclusions from synthetic data require domain-specific structural checks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Synthetic EMRs from these models may produce misleading effect estimates if used in place of real data for observational studies.
The observed gap between distributional match and structural preservation suggests future models should target dependency structure explicitly.
Applying the same three-part test to other high-stakes domains could reveal whether the same evaluation shortfall exists elsewhere.

Load-bearing premise

The epidemiology-grounded three-part framework plus the PRIME-CVD cohort's known ground-truth structure is enough to conclude that current generative models cannot support valid clinical conclusions in EMR settings in general.

What would settle it

A generative model that, when trained on the PRIME-CVD cohort, simultaneously preserves subgroup structure, effect estimates, and dependency structure at levels comparable to the real data.

Figures

Figures reproduced from arXiv: 2606.08903 by Blanca Gallego, Louisa Jorm, Nicholas I-Hsien Kuo.

**Figure 2.** Figure 2: Distributional comparisons across IRSD quintiles for numeric and categorical variables. [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: Evaluation of clinical utility for models trained on real and synthetic data. [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Structural validity comparison using DAGs discovered through GES [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

read the original abstract

Synthetic healthcare data are widely proposed as privacy-preserving substitutes for real patient data, yet their evaluation remains dominated by statistical similarity and predictive performance that do not reflect clinical validity. We introduce a multi-dimensional evaluation framework grounded in epidemiology, assessing descriptive fidelity, clinical utility, and structural validity, corresponding to descriptive, predictive, and causal questions. We evaluate four representative generative paradigms - GAN-based, VAE-boosted, diffusion-based, and masked modelling - using PRIME-CVD, a 50,000-person cohort with known ground-truth structure. While all models reproduce marginal distributions, none simultaneously preserve subgroup structure, effect estimates, and dependency structure. Notably, models with strong distributional fidelity can exhibit poor calibration and distorted relationships, leading to unreliable inference. These results show that current evaluation practices can overestimate synthetic data quality and motivate domain-informed assessment based on the ability to support valid clinical and scientific conclusions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper's new epi-grounded framework shows models match marginals but fail on subgroups, effects, and dependencies on one cohort, with the single-dataset limit as the main constraint.

read the letter

This paper introduces a three-axis evaluation framework for synthetic EMRs drawn from epidemiology—descriptive fidelity, clinical utility, and structural validity—and applies it to four generative approaches on the PRIME-CVD cohort. The central empirical result is that all tested models reproduce marginal distributions yet none keep subgroup structure, effect estimates, and dependency structure intact simultaneously.

The work does well by connecting evaluation directly to the kinds of questions that matter for actual clinical or epidemiological use rather than stopping at statistical similarity. The choice of a cohort with known ground-truth structure lets them measure preservation of relationships without circularity, and the finding that strong distributional fidelity can still produce poor calibration and distorted relationships is a useful concrete observation.

The soft spot is the single 50k-person cohort. The claim that current evaluation practices overestimate quality and that models cannot support valid clinical conclusions rests entirely on results from PRIME-CVD. Without additional datasets, a representativeness argument, or sensitivity checks across cohort types, it is unclear whether the observed failures are general or specific to this cohort's properties.

This is for researchers who generate or evaluate synthetic health data and want to think beyond basic similarity metrics. Readers working on privacy-preserving datasets for medical AI would find the framework worth considering even if they question how far the results travel.

It deserves peer review because the question it raises is worth examining and the setup is clear enough for referees to assess the methods and the scope of the conclusions.

Referee Report

2 major / 1 minor

Summary. The paper introduces a multi-dimensional evaluation framework for synthetic EMR data, grounded in epidemiology, with three axes: descriptive fidelity, clinical utility, and structural validity. Using the PRIME-CVD cohort (50k persons with known ground-truth structure), it evaluates four generative paradigms (GAN-based, VAE-boosted, diffusion-based, masked modelling). The central empirical finding is that all models reproduce marginal distributions but none simultaneously preserve subgroup structure, effect estimates, and dependency structure; models with strong distributional fidelity can still show poor calibration and distorted relationships.

Significance. If the results hold, the work is significant for demonstrating that standard statistical-similarity metrics can overestimate the utility of synthetic EMR data for clinical inference. The framework's grounding in descriptive/predictive/causal questions and the use of a cohort with independent ground-truth structure provide a concrete, falsifiable basis for the critique of current evaluation practices. This could shift the field toward domain-informed assessment that better supports valid scientific conclusions.

major comments (2)

[Abstract and conclusion] Abstract and conclusion: the claim that current generative models cannot support valid clinical conclusions rests on results from a single 50k-person cohort (PRIME-CVD). No additional datasets, representativeness argument, or sensitivity analysis across cohort characteristics is described. This is load-bearing for the generalization beyond the specific PRIME-CVD setting.
[Methods] Methods (implied by abstract): the support for the central claim that 'none simultaneously preserve subgroup structure, effect estimates, and dependency structure' cannot be fully verified without the exact metrics, statistical tests, and full experimental protocol for each dimension of the framework.

minor comments (1)

[Abstract] The abstract would benefit from explicitly naming the four paradigms evaluated and the precise metrics used for each of the three evaluation axes.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the insightful comments on our manuscript. We address each major comment below and indicate where revisions will be made to strengthen the work.

read point-by-point responses

Referee: [Abstract and conclusion] Abstract and conclusion: the claim that current generative models cannot support valid clinical conclusions rests on results from a single 50k-person cohort (PRIME-CVD). No additional datasets, representativeness argument, or sensitivity analysis across cohort characteristics is described. This is load-bearing for the generalization beyond the specific PRIME-CVD setting.

Authors: We agree that the empirical demonstration is limited to the PRIME-CVD cohort, which was selected specifically because it provides known ground-truth structure for evaluating structural validity. This choice enables the central falsifiable test but does constrain broad generalization claims. In revision we will modify the abstract and conclusion to state that the findings are demonstrated on PRIME-CVD and that the framework is intended to be applied to other cohorts with comparable ground-truth information. We will also add an explicit limitations paragraph discussing cohort-specific factors and the desirability of multi-cohort validation. These changes will be made without altering the core empirical result. revision: yes
Referee: [Methods] Methods (implied by abstract): the support for the central claim that 'none simultaneously preserve subgroup structure, effect estimates, and dependency structure' cannot be fully verified without the exact metrics, statistical tests, and full experimental protocol for each dimension of the framework.

Authors: The manuscript already specifies the three evaluation axes, the concrete metrics (e.g., subgroup-stratified Kolmogorov-Smirnov distances, calibration slopes for effect estimates, and mutual-information or partial-correlation measures for dependencies), and the statistical tests used to assess preservation. To improve verifiability we will expand the Methods section with a dedicated subsection that lists every metric formula, the exact hypothesis tests (including multiplicity corrections), and the full experimental protocol, and we will release the corresponding code repository upon acceptance. These additions will make every quantitative claim directly reproducible from the provided description. revision: yes

Circularity Check

0 steps flagged

Empirical evaluation on external cohort with independent ground truth; no derivations or self-referential reductions

full rationale

The paper introduces a multi-dimensional evaluation framework grounded in epidemiology and applies it to compare four generative model classes on the PRIME-CVD cohort, which supplies independent ground-truth structure. All claims rest on direct empirical measurements of marginals, subgroups, effect estimates, and dependencies rather than any mathematical derivation, fitted parameter renamed as prediction, or self-citation chain. No equations or ansatzes appear that could reduce outputs to inputs by construction. The central finding—that no model simultaneously preserves all three validity dimensions—is therefore an independent observation relative to the input data and framework definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, new axioms, or invented entities are introduced; the work relies on standard epidemiological distinctions between descriptive, predictive, and causal questions plus established generative model families.

pith-pipeline@v0.9.1-grok · 5685 in / 1061 out tokens · 19179 ms · 2026-06-27T17:17:27.142937+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

93 extracted references · 8 canonical work pages · 6 internal anchors

[1]

Enriching data science and health care education: Application and impact of synthetic data sets through the health gym project.JMIR Medical Education, 10(1):e51388, 2024

Nicholas I-Hsien Kuo, Oscar Perez-Concha, Mark Hanly, Emmanuel Mnatzaganian, Brandon Hao, Marcus Di Sipio, Guolin Yu, Jash Vanjara, Ivy Cerelia Valerie, et al. Enriching data science and health care education: Application and impact of synthetic data sets through the health gym project.JMIR Medical Education, 10(1):e51388, 2024

2024
[2]

Evaluating the impact of health care data completeness for deep generative models.Methods of Information in Medicine, 62 (01/02):031–039, 2023

Benjamin Smith, Senne Van Steelandt, and Anahita Khojandi. Evaluating the impact of health care data completeness for deep generative models.Methods of Information in Medicine, 62 (01/02):031–039, 2023

2023
[3]

Synthetic data in machine learning for medicine and healthcare.Nature Biomedical Engineering, 5(6):493–497, 2021

Richard J Chen, Ming Y Lu, Tiffany Y Chen, Drew FK Williamson, and Faisal Mahmood. Synthetic data in machine learning for medicine and healthcare.Nature Biomedical Engineering, 5(6):493–497, 2021

2021
[4]

Transformation of medical care through gene therapy and human rights to life and health–balancing risks and benefits.European Journal of Health Law, 29(3-5):359–380, 2022

Anne Kjersti Befring. Transformation of medical care through gene therapy and human rights to life and health–balancing risks and benefits.European Journal of Health Law, 29(3-5):359–380, 2022

2022
[5]

Survey on synthetic data generation, evaluation methods and gans.Mathematics, 10(15):2733, 2022

Alvaro Figueira and Bruno Vaz. Survey on synthetic data generation, evaluation methods and gans.Mathematics, 10(15):2733, 2022

2022
[6]

Synthetic data generation: State of the art in health care domain.Computer Science Review, 48:100546, 2023

Hajra Murtaza, Musharif Ahmed, Naurin Farooq Khan, Ghulam Murtaza, Saad Zafar, and Ambreen Bano. Synthetic data generation: State of the art in health care domain.Computer Science Review, 48:100546, 2023

2023
[7]

Can i trust my fake data–a comprehensive quality assessment framework for synthetic tabular data in healthcare.International Journal of Medical Informatics, 185:105413, 2024

Vibeke Binz Vallevik, Aleksandar Babic, Serena E Marshall, Severin Elvatun, Helga MB Brøgger, Sharmini Alagaratnam, Bjørn Edwin, Narasimha R Veeraragavan, Anne Kjersti Befring, and Jan F Nygård. Can i trust my fake data–a comprehensive quality assessment framework for synthetic tabular data in healthcare.International Journal of Medical Informatics, 185:1...

2024
[8]

How to use a subgroup analysis: users’ guide to the medical literature.Jama, 311(4), 2014

Xin Sun, John PA Ioannidis, Thomas Agoritsas, Ana C Alba, and Gordon Guyatt. How to use a subgroup analysis: users’ guide to the medical literature.Jama, 311(4), 2014

2014
[9]

Estimating causal effects from epidemiological data

Miguel A Hernán and James M Robins. Estimating causal effects from epidemiological data. Journal of Epidemiology & Community Health, 60(7):578–586, 2006

2006
[10]

Use of directed acyclic graphs (dags) to identify confounders in applied health research: review and recommendations.International journal of epidemiology, 50(2):620–632, 2021

Peter WG Tennant, Eleanor J Murray, Kellyn F Arnold, Laurie Berrie, Matthew P Fox, Sarah C Gadd, Wendy J Harrison, Claire Keeble, Lynsie R Ranker, Johannes Textor, et al. Use of directed acyclic graphs (dags) to identify confounders in applied health research: review and recommendations.International journal of epidemiology, 50(2):620–632, 2021

2021
[11]

Prime- cvd: A parametrically rendered informatics medical environment for education in cardiovascular risk modelling.medRxiv, pages 2026–03, 2026

Nicholas I-Hsien Kuo, Marzia Hoque Tania, Blanca Gallego Luxan, and Louisa Jorm. Prime- cvd: A parametrically rendered informatics medical environment for education in cardiovascular risk modelling.medRxiv, pages 2026–03, 2026

2026
[12]

Auto-Encoding Variational Bayes

Diederik P Kingma and Max Welling. Auto-encoding variational bayes.arXiv preprint arXiv:1312.6114, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[13]

Generative adversarial nets.Advances in neural information processing systems, 27, 2014

Ian J Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets.Advances in neural information processing systems, 27, 2014

2014
[14]

Wasserstein generative adversarial networks

Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. InInternational conference on machine learning, pages 214–223. Pmlr, 2017

2017
[15]

Improved training of wasserstein gans.Advances in neural information processing systems, 30, 2017

Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. Improved training of wasserstein gans.Advances in neural information processing systems, 30, 2017

2017
[16]

Deep unsuper- vised learning using nonequilibrium thermodynamics

Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsuper- vised learning using nonequilibrium thermodynamics. InInternational conference on machine learning, pages 2256–2265. pmlr, 2015. 10

2015
[17]

Denoising diffusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020

2020
[18]

Extracting and composing robust features with denoising autoencoders

Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. InProceedings of the 25th international conference on Machine learning, pages 1096–1103, 2008

2008
[19]

Context encoders: Feature learning by inpainting

Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, and Alexei A Efros. Context encoders: Feature learning by inpainting. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 2536–2544, 2016

2016
[20]

Deep variational information bottleneck

Alexander A Alemi, Ian Fischer, Joshua V Dillon, and Kevin Murphy. Deep variational information bottleneck. InInternational Conference on Learning Representations, 2017

2017
[21]

Autoencoding beyond pixels using a learned similarity metric

Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, and Ole Winther. Autoencoding beyond pixels using a learned similarity metric. InInternational conference on machine learning, pages 1558–1566. PMLR, 2016

2016
[22]

Generative adversarial networks are special cases of artificial curiosity (1990) and also closely related to predictability minimization (1991).Neural Networks, 127: 58–66, 2020

Jürgen Schmidhuber. Generative adversarial networks are special cases of artificial curiosity (1990) and also closely related to predictability minimization (1991).Neural Networks, 127: 58–66, 2020

1990
[23]

NIPS 2016 Tutorial: Generative Adversarial Networks

Ian Goodfellow. Nips 2016 tutorial: Generative adversarial networks.arXiv preprint arXiv:1701.00160, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[24]

The health gym: synthetic health-related datasets for the development of reinforcement learning algorithms

Nicholas I-Hsien Kuo, Mark N Polizzotto, Simon Finfer, Federico Garcia, Anders Sönnerborg, Maurizio Zazzi, Michael Böhm, Rolf Kaiser, Louisa Jorm, and Sebastiano Barbieri. The health gym: synthetic health-related datasets for the development of reinforcement learning algorithms. Scientific data, 9(1):693, 2022

2022
[25]

Diffusion models beat gans on image synthesis

Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis. Advances in neural information processing systems, 34:8780–8794, 2021

2021
[26]

Score-Based Generative Modeling through Stochastic Differential Equations

Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations.arXiv preprint arXiv:2011.13456, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2011
[27]

Bert: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. InProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

2019
[28]

Behrt: transformer for electronic health records.Scientific reports, 10(1):7155, 2020

Yikuan Li, Shishir Rao, José Roberto Ayala Solares, Abdelaali Hassaine, Rema Ramakrishnan, Dexter Canoy, Yajie Zhu, Kazem Rahimi, and Gholamreza Salimi-Khorshidi. Behrt: transformer for electronic health records.Scientific reports, 10(1):7155, 2020

2020
[29]

Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow.arXiv preprint arXiv:2209.03003, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[30]

High- resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High- resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022

2022
[31]

Weak baselines and reporting biases lead to overoptimism in machine learning for fluid-related partial differential equations.Nature machine intelligence, 6(10):1256–1269, 2024

Nick McGreivy and Ammar Hakim. Weak baselines and reporting biases lead to overoptimism in machine learning for fluid-related partial differential equations.Nature machine intelligence, 6(10):1256–1269, 2024

2024
[32]

Sulla determinazione empirica di una legge didistribuzione.Giorn Dell’inst Ital Degli Att, 4:89–91, 1933

Kolmogorov An. Sulla determinazione empirica di una legge didistribuzione.Giorn Dell’inst Ital Degli Att, 4:89–91, 1933

1933
[33]

Table for estimating the goodness of fit of empirical distributions.The annals of mathematical statistics, 19(2):279–281, 1948

Nickolay Smirnov. Table for estimating the goodness of fit of empirical distributions.The annals of mathematical statistics, 19(2):279–281, 1948

1948
[34]

On information and sufficiency.The annals of mathematical statistics, 22(1):79–86, 1951

Solomon Kullback and Richard A Leibler. On information and sufficiency.The annals of mathematical statistics, 22(1):79–86, 1951. 11

1951
[35]

Similarity-based methods for word sense disambiguation

Ido Dagan, Lillian Lee, and Fernando Pereira. Similarity-based methods for word sense disambiguation. In35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics, pages 56–63, 1997

1997
[36]

Learning via hilbert space embedding of distributions.University of Sydney (2008), 17, 2008

Le Song. Learning via hilbert space embedding of distributions.University of Sydney (2008), 17, 2008

2008
[37]

A class of wasserstein metrics for probability distribu- tions.Michigan Mathematical Journal, 31(2):231–240, 1984

Clark R Givens and Rae Michael Shortt. A class of wasserstein metrics for probability distribu- tions.Michigan Mathematical Journal, 31(2):231–240, 1984

1984
[38]

Principal component analysis: A natural approach to data exploration.ACM Computing Surveys (CSUR), 54(4):1–34, 2021

Felipe L Gewers, Gustavo R Ferreira, Henrique F De Arruda, Filipi N Silva, Cesar H Comin, Diego R Amancio, and Luciano da F Costa. Principal component analysis: A natural approach to data exploration.ACM Computing Surveys (CSUR), 54(4):1–34, 2021

2021
[39]

Visualizing data using t-sne.Journal of machine learning research, 9(11), 2008

Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-sne.Journal of machine learning research, 9(11), 2008

2008
[40]

The central role of the propensity score in observational studies for causal effects.Biometrika, 70(1):41–55, 1983

Paul R Rosenbaum and Donald B Rubin. The central role of the propensity score in observational studies for causal effects.Biometrika, 70(1):41–55, 1983

1983
[41]

Adrian Ybañez, Rosein Ancheta, Samantha Shane Evangelista, Joerabell Lourdes Aro, Fatima Maturan, Nadine May Atibing, Egberto Selerio, Kafferine Yamagishi, and Lanndon Ocampo. How can we use machine learning for characterizing organizational identification-a study using clustering with picture fuzzy datasets.International Journal of Information Management...

2023
[42]

Comparison of tabular synthetic data generation techniques using propensity and cluster log metric.International Journal of Information Management Data Insights, 3(2): 100177, 2023

Aryan Pathare, Ramchandra Mangrulkar, Kartik Suvarna, Aryan Parekh, Govind Thakur, and Aruna Gawade. Comparison of tabular synthetic data generation techniques using propensity and cluster log metric.International Journal of Information Management Data Insights, 3(2): 100177, 2023

2023
[43]

Generating and evaluating cross-sectional synthetic electronic healthcare data: preserving data utility and patient privacy.Computational Intelligence, 37(2):819–851, 2021

Zhenchen Wang, Puja Myles, and Allan Tucker. Generating and evaluating cross-sectional synthetic electronic healthcare data: preserving data utility and patient privacy.Computational Intelligence, 37(2):819–851, 2021

2021
[44]

A multi-dimensional evaluation of synthetic data generators.IEEE Access, 10:11147–11158, 2022

Fida K Dankar, Mahmoud K Ibrahim, and Leila Ismail. A multi-dimensional evaluation of synthetic data generators.IEEE Access, 10:11147–11158, 2022

2022
[45]

Synthetic tab- ular data evaluation in the health domain covering resemblance, utility, and privacy dimensions

Mikel Hernadez, Gorka Epelde, Ane Alberdi, Rodrigo Cilla, and Debbie Rankin. Synthetic tab- ular data evaluation in the health domain covering resemblance, utility, and privacy dimensions. Methods of information in medicine, 62(S 01):e19–e38, 2023

2023
[46]

How faithful is your synthetic data? sample-level metrics for evaluating and auditing generative models

Ahmed Alaa, Boris Van Breugel, Evgeny S Saveliev, and Mihaela Van Der Schaar. How faithful is your synthetic data? sample-level metrics for evaluating and auditing generative models. In International conference on machine learning, pages 290–306. PMLR, 2022

2022
[47]

Evaluating identity disclosure risk in fully synthetic health data: model development and validation.Journal of medical Internet research, 22(11):e23139, 2020

Khaled El Emam, Lucy Mosquera, and Jason Bass. Evaluating identity disclosure risk in fully synthetic health data: model development and validation.Journal of medical Internet research, 22(11):e23139, 2020

2020
[48]

Ck4gen: A knowledge distillation framework for generating high-utility synthetic survival datasets in healthcare.arXiv preprint arXiv:2410.16872, 2024

Nicholas I-Hsien Kuo, Blanca Gallego, Louisa Jorm, et al. Ck4gen: A knowledge distillation framework for generating high-utility synthetic survival datasets in healthcare.arXiv preprint arXiv:2410.16872, 2024

work page arXiv 2024
[49]

Estimating 5-year absolute risk of cardiovascular disease using routinely collected electronic medical records from australian general practices.Heart, 2025

Nicholas I-Hsien Kuo, Sebastiano Barbieri, Clare Arnott, Blanca Gallego, Ziba Gandomkar, Shahana Ferdousi, Kirsty Douglas, Mark Woodward, and Louisa Jorm. Estimating 5-year absolute risk of cardiovascular disease using routinely collected electronic medical records from australian general practices.Heart, 2025

2025
[50]

Ruth Walker and Janet E Hiller. The index of relative socio-economic disadvantage: general population views on indicators used to determine area-based disadvantage.Australian and New Zealand journal of public health, 29(5):442–447, 2005. 12

2005
[51]

PRIME-CVD Data Asset 1: DAG-Simulated Cardiovascular Risk Cohort for Medical Informatics Education

Nicholas I-Hsien Kuo. PRIME-CVD Data Asset 1: DAG-Simulated Cardiovascular Risk Cohort for Medical Informatics Education. https://figshare.com/articles/dataset/ PRIME-CVD_Data_Asset_1_DAG-Simulated_Cardiovascular_Risk_Cohort_for_ Medical_Informatics_Education/31395765, February 2026

work page arXiv 2026
[52]

Nicholas I-Hsien Kuo, Federico Garcia, Anders Sönnerborg, Michael Böhm, Rolf Kaiser, Maurizio Zazzi, Mark Polizzotto, Louisa Jorm, Sebastiano Barbieri, et al. Generating synthetic clinical data that capture class imbalanced distributions with generative adversarial networks: Example using antiretroviral therapy for hiv.Journal of Biomedical Informatics, 1...

2023
[53]

Attention is all you need.Advances in neural information processing systems, 30, 2017

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.Advances in neural information processing systems, 30, 2017

2017
[54]

Synthetic health-related longitudinal data with mixed-type variables generated using diffusion models

Nicholas I-Hsien Kuo, Federico Garcia, Anders Sonnerborg, Michael Bohm, Rolf Kaiser, Maurizio Zazzi, Louisa Jorm, and Sebastiano Barbieri. Synthetic health-related longitudinal data with mixed-type variables generated using diffusion models. InNeurIPS 2023 Workshop on Synthetic Data Generation with Generative AI, 2023

2023
[55]

Nicholas I-Hsien Kuo, Blanca Gallego, and Louisa Jorm. Attention-based synthetic data generation for calibration-enhanced survival analysis: a case study for chronic kidney disease using electronic health records.Journal of Biomedical Informatics, page 104928, 2025

2025
[56]

Centrum voor Wiskunde en Informatica Amsterdam, 1995

Guido Van Rossum, Fred L Drake, et al.Python reference manual, volume 111. Centrum voor Wiskunde en Informatica Amsterdam, 1995

1995
[57]

identifying variables that independently predict

John B Carlin. “identifying variables that independently predict. . . ” is not a well-defined research task.Journal of Clinical Epidemiology, 189, 2026

2026
[58]

The distinction between causal, predictive, and descriptive research–there is still room for improvement.Journal of Clinical Epidemiology, page 111960, 2025

Brett P Dyer. The distinction between causal, predictive, and descriptive research–there is still room for improvement.Journal of Clinical Epidemiology, page 111960, 2025

2025
[59]

John Wiley & Sons, 2002

John D Kalbfleisch and Ross L Prentice.The statistical analysis of failure time data. John Wiley & Sons, 2002

2002
[60]

Regression models and life-tables.Journal of the royal statistical society: Series B (methodological), 34(2):187–202, 1972

David R Cox. Regression models and life-tables.Journal of the royal statistical society: Series B (methodological), 34(2):187–202, 1972

1972
[61]

Calibration: the achilles heel of predictive analytics.BMC medicine, 17(1):230, 2019

Ben Van Calster, David J McLernon, Maarten Van Smeden, Laure Wynants, and Ewout W Steyerberg. Calibration: the achilles heel of predictive analytics.BMC medicine, 17(1):230, 2019

2019
[62]

Learning equivalence classes of bayesian-network structures

David Maxwell Chickering. Learning equivalence classes of bayesian-network structures. Journal of machine learning research, 2(Feb):445–498, 2002

2002
[63]

Structural Intervention Distance (SID) for Evaluating Causal Graphs

Jonas Peters and Peter Bühlmann. Structural intervention distance (sid) for evaluating causal graphs.arXiv preprint arXiv:1306.1043, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[64]

Smote: synthetic minority over-sampling technique.Journal of artificial intelligence research, 16: 321–357, 2002

Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. Smote: synthetic minority over-sampling technique.Journal of artificial intelligence research, 16: 321–357, 2002

2002
[65]

Differential privacy

Cynthia Dwork. Differential privacy. InEncyclopedia of Cryptography, Security and Privacy, pages 649–652. Springer, 2025

2025
[66]

Array programming with numpy.nature, 585(7825):357–362, 2020

Charles R Harris, K Jarrod Millman, Stéfan J Van Der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J Smith, et al. Array programming with numpy.nature, 585(7825):357–362, 2020

2020
[67]

Data structures for statistical computing in python.scipy, 445(1):51–56, 2010

Wes McKinney et al. Data structures for statistical computing in python.scipy, 445(1):51–56, 2010. 13

2010
[68]

Scipy 1.0: fundamental algorithms for scientific computing in python.Nature methods, 17(3):261–272, 2020

Pauli Virtanen, Ralf Gommers, Travis E Oliphant, Matt Haberland, Tyler Reddy, David Courna- peau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, et al. Scipy 1.0: fundamental algorithms for scientific computing in python.Nature methods, 17(3):261–272, 2020

2020
[69]

Scikit- learn: Machine learning in python.the Journal of machine Learning research, 12:2825–2830, 2011

Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit- learn: Machine learning in python.the Journal of machine Learning research, 12:2825–2830, 2011

2011
[70]

Matplotlib: A 2d graphics environment.Computing in science & engineering, 9(3):90–95, 2007

John D Hunter. Matplotlib: A 2d graphics environment.Computing in science & engineering, 9(3):90–95, 2007

2007
[71]

Seaborn: statistical data visualization.Journal of open source software, 6 (60):3021, 2021

Michael L Waskom. Seaborn: statistical data visualization.Journal of open source software, 6 (60):3021, 2021

2021
[72]

lifelines: survival analysis in python.Journal of Open Source Software, 4(40):1317, 2019

Cameron Davidson-Pilon. lifelines: survival analysis in python.Journal of Open Source Software, 4(40):1317, 2019

2019
[73]

Pytorch: An imperative style, high-performance deep learning library.Advances in neural information processing systems, 32, 2019

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library.Advances in neural information processing systems, 32, 2019

2019
[74]

Causal-learn: Causal discovery in python.Journal of Machine Learning Research, 25(60):1–8, 2024

Yujia Zheng, Biwei Huang, Wei Chen, Joseph Ramsey, Mingming Gong, Ruichu Cai, Shohei Shimizu, Peter Spirtes, and Kun Zhang. Causal-learn: Causal discovery in python.Journal of Machine Learning Research, 25(60):1–8, 2024

2024
[75]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980, 2014. 14 Appendix: Additional Details to the Main Text Purpose of this Appendix.This appendix provides detailed methodological, implementation, and supplementary result information supporting the main text. It ensures reproducibility, clarifies evalu...

work page internal anchor Pith review Pith/arXiv arXiv 2014
[76]

an auxiliary autoencoder update on embedded real samples,
[77]

multiple critic updates using the WGAN-GP objective,
[78]

one generator update using the adversarial and correlation-alignment losses. Formally, the model optimises min G,A max D Exreal[D(xreal)]−E z[D(G(z))] +λ GP ·GP +λ corr · Lcorr +L A, with the understanding that LA is applied only to the auxiliary autoencoder parameters, while the adversarial objectives govern the generator and critic updates. In the imple...
[79]

samplex 0 from real data,
[80]

sample timestept∼ U {1, . . . , T},

Showing first 80 references.

[1] [1]

Enriching data science and health care education: Application and impact of synthetic data sets through the health gym project.JMIR Medical Education, 10(1):e51388, 2024

Nicholas I-Hsien Kuo, Oscar Perez-Concha, Mark Hanly, Emmanuel Mnatzaganian, Brandon Hao, Marcus Di Sipio, Guolin Yu, Jash Vanjara, Ivy Cerelia Valerie, et al. Enriching data science and health care education: Application and impact of synthetic data sets through the health gym project.JMIR Medical Education, 10(1):e51388, 2024

2024

[2] [2]

Evaluating the impact of health care data completeness for deep generative models.Methods of Information in Medicine, 62 (01/02):031–039, 2023

Benjamin Smith, Senne Van Steelandt, and Anahita Khojandi. Evaluating the impact of health care data completeness for deep generative models.Methods of Information in Medicine, 62 (01/02):031–039, 2023

2023

[3] [3]

Synthetic data in machine learning for medicine and healthcare.Nature Biomedical Engineering, 5(6):493–497, 2021

Richard J Chen, Ming Y Lu, Tiffany Y Chen, Drew FK Williamson, and Faisal Mahmood. Synthetic data in machine learning for medicine and healthcare.Nature Biomedical Engineering, 5(6):493–497, 2021

2021

[4] [4]

Transformation of medical care through gene therapy and human rights to life and health–balancing risks and benefits.European Journal of Health Law, 29(3-5):359–380, 2022

Anne Kjersti Befring. Transformation of medical care through gene therapy and human rights to life and health–balancing risks and benefits.European Journal of Health Law, 29(3-5):359–380, 2022

2022

[5] [5]

Survey on synthetic data generation, evaluation methods and gans.Mathematics, 10(15):2733, 2022

Alvaro Figueira and Bruno Vaz. Survey on synthetic data generation, evaluation methods and gans.Mathematics, 10(15):2733, 2022

2022

[6] [6]

Synthetic data generation: State of the art in health care domain.Computer Science Review, 48:100546, 2023

Hajra Murtaza, Musharif Ahmed, Naurin Farooq Khan, Ghulam Murtaza, Saad Zafar, and Ambreen Bano. Synthetic data generation: State of the art in health care domain.Computer Science Review, 48:100546, 2023

2023

[7] [7]

Can i trust my fake data–a comprehensive quality assessment framework for synthetic tabular data in healthcare.International Journal of Medical Informatics, 185:105413, 2024

Vibeke Binz Vallevik, Aleksandar Babic, Serena E Marshall, Severin Elvatun, Helga MB Brøgger, Sharmini Alagaratnam, Bjørn Edwin, Narasimha R Veeraragavan, Anne Kjersti Befring, and Jan F Nygård. Can i trust my fake data–a comprehensive quality assessment framework for synthetic tabular data in healthcare.International Journal of Medical Informatics, 185:1...

2024

[8] [8]

How to use a subgroup analysis: users’ guide to the medical literature.Jama, 311(4), 2014

Xin Sun, John PA Ioannidis, Thomas Agoritsas, Ana C Alba, and Gordon Guyatt. How to use a subgroup analysis: users’ guide to the medical literature.Jama, 311(4), 2014

2014

[9] [9]

Estimating causal effects from epidemiological data

Miguel A Hernán and James M Robins. Estimating causal effects from epidemiological data. Journal of Epidemiology & Community Health, 60(7):578–586, 2006

2006

[10] [10]

Use of directed acyclic graphs (dags) to identify confounders in applied health research: review and recommendations.International journal of epidemiology, 50(2):620–632, 2021

Peter WG Tennant, Eleanor J Murray, Kellyn F Arnold, Laurie Berrie, Matthew P Fox, Sarah C Gadd, Wendy J Harrison, Claire Keeble, Lynsie R Ranker, Johannes Textor, et al. Use of directed acyclic graphs (dags) to identify confounders in applied health research: review and recommendations.International journal of epidemiology, 50(2):620–632, 2021

2021

[11] [11]

Prime- cvd: A parametrically rendered informatics medical environment for education in cardiovascular risk modelling.medRxiv, pages 2026–03, 2026

Nicholas I-Hsien Kuo, Marzia Hoque Tania, Blanca Gallego Luxan, and Louisa Jorm. Prime- cvd: A parametrically rendered informatics medical environment for education in cardiovascular risk modelling.medRxiv, pages 2026–03, 2026

2026

[12] [12]

Auto-Encoding Variational Bayes

Diederik P Kingma and Max Welling. Auto-encoding variational bayes.arXiv preprint arXiv:1312.6114, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013

[13] [13]

Generative adversarial nets.Advances in neural information processing systems, 27, 2014

Ian J Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets.Advances in neural information processing systems, 27, 2014

2014

[14] [14]

Wasserstein generative adversarial networks

Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. InInternational conference on machine learning, pages 214–223. Pmlr, 2017

2017

[15] [15]

Improved training of wasserstein gans.Advances in neural information processing systems, 30, 2017

Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. Improved training of wasserstein gans.Advances in neural information processing systems, 30, 2017

2017

[16] [16]

Deep unsuper- vised learning using nonequilibrium thermodynamics

Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsuper- vised learning using nonequilibrium thermodynamics. InInternational conference on machine learning, pages 2256–2265. pmlr, 2015. 10

2015

[17] [17]

Denoising diffusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020

2020

[18] [18]

Extracting and composing robust features with denoising autoencoders

Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. InProceedings of the 25th international conference on Machine learning, pages 1096–1103, 2008

2008

[19] [19]

Context encoders: Feature learning by inpainting

Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, and Alexei A Efros. Context encoders: Feature learning by inpainting. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 2536–2544, 2016

2016

[20] [20]

Deep variational information bottleneck

Alexander A Alemi, Ian Fischer, Joshua V Dillon, and Kevin Murphy. Deep variational information bottleneck. InInternational Conference on Learning Representations, 2017

2017

[21] [21]

Autoencoding beyond pixels using a learned similarity metric

Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, and Ole Winther. Autoencoding beyond pixels using a learned similarity metric. InInternational conference on machine learning, pages 1558–1566. PMLR, 2016

2016

[22] [22]

Generative adversarial networks are special cases of artificial curiosity (1990) and also closely related to predictability minimization (1991).Neural Networks, 127: 58–66, 2020

Jürgen Schmidhuber. Generative adversarial networks are special cases of artificial curiosity (1990) and also closely related to predictability minimization (1991).Neural Networks, 127: 58–66, 2020

1990

[23] [23]

NIPS 2016 Tutorial: Generative Adversarial Networks

Ian Goodfellow. Nips 2016 tutorial: Generative adversarial networks.arXiv preprint arXiv:1701.00160, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[24] [24]

The health gym: synthetic health-related datasets for the development of reinforcement learning algorithms

Nicholas I-Hsien Kuo, Mark N Polizzotto, Simon Finfer, Federico Garcia, Anders Sönnerborg, Maurizio Zazzi, Michael Böhm, Rolf Kaiser, Louisa Jorm, and Sebastiano Barbieri. The health gym: synthetic health-related datasets for the development of reinforcement learning algorithms. Scientific data, 9(1):693, 2022

2022

[25] [25]

Diffusion models beat gans on image synthesis

Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis. Advances in neural information processing systems, 34:8780–8794, 2021

2021

[26] [26]

Score-Based Generative Modeling through Stochastic Differential Equations

Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations.arXiv preprint arXiv:2011.13456, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2011

[27] [27]

Bert: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. InProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

2019

[28] [28]

Behrt: transformer for electronic health records.Scientific reports, 10(1):7155, 2020

Yikuan Li, Shishir Rao, José Roberto Ayala Solares, Abdelaali Hassaine, Rema Ramakrishnan, Dexter Canoy, Yajie Zhu, Kazem Rahimi, and Gholamreza Salimi-Khorshidi. Behrt: transformer for electronic health records.Scientific reports, 10(1):7155, 2020

2020

[29] [29]

Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow.arXiv preprint arXiv:2209.03003, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[30] [30]

High- resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High- resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022

2022

[31] [31]

Weak baselines and reporting biases lead to overoptimism in machine learning for fluid-related partial differential equations.Nature machine intelligence, 6(10):1256–1269, 2024

Nick McGreivy and Ammar Hakim. Weak baselines and reporting biases lead to overoptimism in machine learning for fluid-related partial differential equations.Nature machine intelligence, 6(10):1256–1269, 2024

2024

[32] [32]

Sulla determinazione empirica di una legge didistribuzione.Giorn Dell’inst Ital Degli Att, 4:89–91, 1933

Kolmogorov An. Sulla determinazione empirica di una legge didistribuzione.Giorn Dell’inst Ital Degli Att, 4:89–91, 1933

1933

[33] [33]

Table for estimating the goodness of fit of empirical distributions.The annals of mathematical statistics, 19(2):279–281, 1948

Nickolay Smirnov. Table for estimating the goodness of fit of empirical distributions.The annals of mathematical statistics, 19(2):279–281, 1948

1948

[34] [34]

On information and sufficiency.The annals of mathematical statistics, 22(1):79–86, 1951

Solomon Kullback and Richard A Leibler. On information and sufficiency.The annals of mathematical statistics, 22(1):79–86, 1951. 11

1951

[35] [35]

Similarity-based methods for word sense disambiguation

Ido Dagan, Lillian Lee, and Fernando Pereira. Similarity-based methods for word sense disambiguation. In35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics, pages 56–63, 1997

1997

[36] [36]

Learning via hilbert space embedding of distributions.University of Sydney (2008), 17, 2008

Le Song. Learning via hilbert space embedding of distributions.University of Sydney (2008), 17, 2008

2008

[37] [37]

A class of wasserstein metrics for probability distribu- tions.Michigan Mathematical Journal, 31(2):231–240, 1984

Clark R Givens and Rae Michael Shortt. A class of wasserstein metrics for probability distribu- tions.Michigan Mathematical Journal, 31(2):231–240, 1984

1984

[38] [38]

Principal component analysis: A natural approach to data exploration.ACM Computing Surveys (CSUR), 54(4):1–34, 2021

Felipe L Gewers, Gustavo R Ferreira, Henrique F De Arruda, Filipi N Silva, Cesar H Comin, Diego R Amancio, and Luciano da F Costa. Principal component analysis: A natural approach to data exploration.ACM Computing Surveys (CSUR), 54(4):1–34, 2021

2021

[39] [39]

Visualizing data using t-sne.Journal of machine learning research, 9(11), 2008

Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-sne.Journal of machine learning research, 9(11), 2008

2008

[40] [40]

The central role of the propensity score in observational studies for causal effects.Biometrika, 70(1):41–55, 1983

Paul R Rosenbaum and Donald B Rubin. The central role of the propensity score in observational studies for causal effects.Biometrika, 70(1):41–55, 1983

1983

[41] [41]

Adrian Ybañez, Rosein Ancheta, Samantha Shane Evangelista, Joerabell Lourdes Aro, Fatima Maturan, Nadine May Atibing, Egberto Selerio, Kafferine Yamagishi, and Lanndon Ocampo. How can we use machine learning for characterizing organizational identification-a study using clustering with picture fuzzy datasets.International Journal of Information Management...

2023

[42] [42]

Comparison of tabular synthetic data generation techniques using propensity and cluster log metric.International Journal of Information Management Data Insights, 3(2): 100177, 2023

Aryan Pathare, Ramchandra Mangrulkar, Kartik Suvarna, Aryan Parekh, Govind Thakur, and Aruna Gawade. Comparison of tabular synthetic data generation techniques using propensity and cluster log metric.International Journal of Information Management Data Insights, 3(2): 100177, 2023

2023

[43] [43]

Generating and evaluating cross-sectional synthetic electronic healthcare data: preserving data utility and patient privacy.Computational Intelligence, 37(2):819–851, 2021

Zhenchen Wang, Puja Myles, and Allan Tucker. Generating and evaluating cross-sectional synthetic electronic healthcare data: preserving data utility and patient privacy.Computational Intelligence, 37(2):819–851, 2021

2021

[44] [44]

A multi-dimensional evaluation of synthetic data generators.IEEE Access, 10:11147–11158, 2022

Fida K Dankar, Mahmoud K Ibrahim, and Leila Ismail. A multi-dimensional evaluation of synthetic data generators.IEEE Access, 10:11147–11158, 2022

2022

[45] [45]

Synthetic tab- ular data evaluation in the health domain covering resemblance, utility, and privacy dimensions

Mikel Hernadez, Gorka Epelde, Ane Alberdi, Rodrigo Cilla, and Debbie Rankin. Synthetic tab- ular data evaluation in the health domain covering resemblance, utility, and privacy dimensions. Methods of information in medicine, 62(S 01):e19–e38, 2023

2023

[46] [46]

How faithful is your synthetic data? sample-level metrics for evaluating and auditing generative models

Ahmed Alaa, Boris Van Breugel, Evgeny S Saveliev, and Mihaela Van Der Schaar. How faithful is your synthetic data? sample-level metrics for evaluating and auditing generative models. In International conference on machine learning, pages 290–306. PMLR, 2022

2022

[47] [47]

Evaluating identity disclosure risk in fully synthetic health data: model development and validation.Journal of medical Internet research, 22(11):e23139, 2020

Khaled El Emam, Lucy Mosquera, and Jason Bass. Evaluating identity disclosure risk in fully synthetic health data: model development and validation.Journal of medical Internet research, 22(11):e23139, 2020

2020

[48] [48]

Ck4gen: A knowledge distillation framework for generating high-utility synthetic survival datasets in healthcare.arXiv preprint arXiv:2410.16872, 2024

Nicholas I-Hsien Kuo, Blanca Gallego, Louisa Jorm, et al. Ck4gen: A knowledge distillation framework for generating high-utility synthetic survival datasets in healthcare.arXiv preprint arXiv:2410.16872, 2024

work page arXiv 2024

[49] [49]

Estimating 5-year absolute risk of cardiovascular disease using routinely collected electronic medical records from australian general practices.Heart, 2025

Nicholas I-Hsien Kuo, Sebastiano Barbieri, Clare Arnott, Blanca Gallego, Ziba Gandomkar, Shahana Ferdousi, Kirsty Douglas, Mark Woodward, and Louisa Jorm. Estimating 5-year absolute risk of cardiovascular disease using routinely collected electronic medical records from australian general practices.Heart, 2025

2025

[50] [50]

Ruth Walker and Janet E Hiller. The index of relative socio-economic disadvantage: general population views on indicators used to determine area-based disadvantage.Australian and New Zealand journal of public health, 29(5):442–447, 2005. 12

2005

[51] [51]

PRIME-CVD Data Asset 1: DAG-Simulated Cardiovascular Risk Cohort for Medical Informatics Education

Nicholas I-Hsien Kuo. PRIME-CVD Data Asset 1: DAG-Simulated Cardiovascular Risk Cohort for Medical Informatics Education. https://figshare.com/articles/dataset/ PRIME-CVD_Data_Asset_1_DAG-Simulated_Cardiovascular_Risk_Cohort_for_ Medical_Informatics_Education/31395765, February 2026

work page arXiv 2026

[52] [52]

Nicholas I-Hsien Kuo, Federico Garcia, Anders Sönnerborg, Michael Böhm, Rolf Kaiser, Maurizio Zazzi, Mark Polizzotto, Louisa Jorm, Sebastiano Barbieri, et al. Generating synthetic clinical data that capture class imbalanced distributions with generative adversarial networks: Example using antiretroviral therapy for hiv.Journal of Biomedical Informatics, 1...

2023

[53] [53]

Attention is all you need.Advances in neural information processing systems, 30, 2017

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.Advances in neural information processing systems, 30, 2017

2017

[54] [54]

Synthetic health-related longitudinal data with mixed-type variables generated using diffusion models

Nicholas I-Hsien Kuo, Federico Garcia, Anders Sonnerborg, Michael Bohm, Rolf Kaiser, Maurizio Zazzi, Louisa Jorm, and Sebastiano Barbieri. Synthetic health-related longitudinal data with mixed-type variables generated using diffusion models. InNeurIPS 2023 Workshop on Synthetic Data Generation with Generative AI, 2023

2023

[55] [55]

Nicholas I-Hsien Kuo, Blanca Gallego, and Louisa Jorm. Attention-based synthetic data generation for calibration-enhanced survival analysis: a case study for chronic kidney disease using electronic health records.Journal of Biomedical Informatics, page 104928, 2025

2025

[56] [56]

Centrum voor Wiskunde en Informatica Amsterdam, 1995

Guido Van Rossum, Fred L Drake, et al.Python reference manual, volume 111. Centrum voor Wiskunde en Informatica Amsterdam, 1995

1995

[57] [57]

identifying variables that independently predict

John B Carlin. “identifying variables that independently predict. . . ” is not a well-defined research task.Journal of Clinical Epidemiology, 189, 2026

2026

[58] [58]

The distinction between causal, predictive, and descriptive research–there is still room for improvement.Journal of Clinical Epidemiology, page 111960, 2025

Brett P Dyer. The distinction between causal, predictive, and descriptive research–there is still room for improvement.Journal of Clinical Epidemiology, page 111960, 2025

2025

[59] [59]

John Wiley & Sons, 2002

John D Kalbfleisch and Ross L Prentice.The statistical analysis of failure time data. John Wiley & Sons, 2002

2002

[60] [60]

Regression models and life-tables.Journal of the royal statistical society: Series B (methodological), 34(2):187–202, 1972

David R Cox. Regression models and life-tables.Journal of the royal statistical society: Series B (methodological), 34(2):187–202, 1972

1972

[61] [61]

Calibration: the achilles heel of predictive analytics.BMC medicine, 17(1):230, 2019

Ben Van Calster, David J McLernon, Maarten Van Smeden, Laure Wynants, and Ewout W Steyerberg. Calibration: the achilles heel of predictive analytics.BMC medicine, 17(1):230, 2019

2019

[62] [62]

Learning equivalence classes of bayesian-network structures

David Maxwell Chickering. Learning equivalence classes of bayesian-network structures. Journal of machine learning research, 2(Feb):445–498, 2002

2002

[63] [63]

Structural Intervention Distance (SID) for Evaluating Causal Graphs

Jonas Peters and Peter Bühlmann. Structural intervention distance (sid) for evaluating causal graphs.arXiv preprint arXiv:1306.1043, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013

[64] [64]

Smote: synthetic minority over-sampling technique.Journal of artificial intelligence research, 16: 321–357, 2002

Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. Smote: synthetic minority over-sampling technique.Journal of artificial intelligence research, 16: 321–357, 2002

2002

[65] [65]

Differential privacy

Cynthia Dwork. Differential privacy. InEncyclopedia of Cryptography, Security and Privacy, pages 649–652. Springer, 2025

2025

[66] [66]

Array programming with numpy.nature, 585(7825):357–362, 2020

Charles R Harris, K Jarrod Millman, Stéfan J Van Der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J Smith, et al. Array programming with numpy.nature, 585(7825):357–362, 2020

2020

[67] [67]

Data structures for statistical computing in python.scipy, 445(1):51–56, 2010

Wes McKinney et al. Data structures for statistical computing in python.scipy, 445(1):51–56, 2010. 13

2010

[68] [68]

Scipy 1.0: fundamental algorithms for scientific computing in python.Nature methods, 17(3):261–272, 2020

Pauli Virtanen, Ralf Gommers, Travis E Oliphant, Matt Haberland, Tyler Reddy, David Courna- peau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, et al. Scipy 1.0: fundamental algorithms for scientific computing in python.Nature methods, 17(3):261–272, 2020

2020

[69] [69]

Scikit- learn: Machine learning in python.the Journal of machine Learning research, 12:2825–2830, 2011

Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit- learn: Machine learning in python.the Journal of machine Learning research, 12:2825–2830, 2011

2011

[70] [70]

Matplotlib: A 2d graphics environment.Computing in science & engineering, 9(3):90–95, 2007

John D Hunter. Matplotlib: A 2d graphics environment.Computing in science & engineering, 9(3):90–95, 2007

2007

[71] [71]

Seaborn: statistical data visualization.Journal of open source software, 6 (60):3021, 2021

Michael L Waskom. Seaborn: statistical data visualization.Journal of open source software, 6 (60):3021, 2021

2021

[72] [72]

lifelines: survival analysis in python.Journal of Open Source Software, 4(40):1317, 2019

Cameron Davidson-Pilon. lifelines: survival analysis in python.Journal of Open Source Software, 4(40):1317, 2019

2019

[73] [73]

Pytorch: An imperative style, high-performance deep learning library.Advances in neural information processing systems, 32, 2019

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library.Advances in neural information processing systems, 32, 2019

2019

[74] [74]

Causal-learn: Causal discovery in python.Journal of Machine Learning Research, 25(60):1–8, 2024

Yujia Zheng, Biwei Huang, Wei Chen, Joseph Ramsey, Mingming Gong, Ruichu Cai, Shohei Shimizu, Peter Spirtes, and Kun Zhang. Causal-learn: Causal discovery in python.Journal of Machine Learning Research, 25(60):1–8, 2024

2024

[75] [75]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980, 2014. 14 Appendix: Additional Details to the Main Text Purpose of this Appendix.This appendix provides detailed methodological, implementation, and supplementary result information supporting the main text. It ensures reproducibility, clarifies evalu...

work page internal anchor Pith review Pith/arXiv arXiv 2014

[76] [76]

an auxiliary autoencoder update on embedded real samples,

[77] [77]

multiple critic updates using the WGAN-GP objective,

[78] [78]

one generator update using the adversarial and correlation-alignment losses. Formally, the model optimises min G,A max D Exreal[D(xreal)]−E z[D(G(z))] +λ GP ·GP +λ corr · Lcorr +L A, with the understanding that LA is applied only to the auxiliary autoencoder parameters, while the adversarial objectives govern the generator and critic updates. In the imple...

[79] [79]

samplex 0 from real data,

[80] [80]

sample timestept∼ U {1, . . . , T},