MIDST Challenge at SaTML 2025: Membership Inference over Diffusion-models-based Synthetic Tabular data
Pith reviewed 2026-05-15 08:03 UTC · model grok-4.3
The pith
The MIDST challenge shows that membership inference attacks can quantify privacy leakage in synthetic tabular data from diffusion models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MIDST is a challenge that explores diffusion models for generating synthetic tabular data of mixed types and multi-relational structures with interconnected constraints, and it prompted the creation of specialized black-box and white-box membership inference attacks to evaluate how resistant the resulting data is to privacy threats.
What carries the argument
Membership inference attacks applied directly to diffusion models trained on tabular data, used as the metric to measure whether synthetic outputs leak information about the original training records.
If this is right
- Novel black-box membership inference attacks were developed specifically for diffusion models on tabular data.
- Novel white-box membership inference attacks were developed specifically for diffusion models on tabular data.
- The evaluation covers both single tables with mixed data types and multi-relational tables.
- Quantitative scores of privacy efficacy are produced by measuring attack success rates across the submitted methods.
- The challenge provides a benchmark for comparing privacy protection levels of different diffusion model approaches to synthetic tabular data.
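One way to turn "attack success rates" into a single score is the true-positive rate at a fixed false-positive rate, the metric quoted by one of the citing papers listed further down this page. The abstract does not say which metric MIDST itself used, so the following is only a minimal sketch under assumptions: `tpr_at_fpr` and its arguments are hypothetical names, and a higher score is assumed to mean "more likely a training member".

```python
def tpr_at_fpr(scores, is_member, max_fpr=0.1):
    """Best true-positive rate reachable while keeping the false-positive rate <= max_fpr.

    scores    -- attack confidence per record (assumed: higher = more likely a member)
    is_member -- ground-truth flags, True if the record was in the training set
    """
    members = [s for s, m in zip(scores, is_member) if m]
    non_members = [s for s, m in zip(scores, is_member) if not m]
    if not members or not non_members:
        raise ValueError("need at least one member and one non-member")

    best_tpr = 0.0  # predicting "non-member" for everything always satisfies the FPR cap
    # Try every observed score as a threshold: predict "member" when score >= threshold.
    for threshold in sorted(set(scores)):
        fpr = sum(s >= threshold for s in non_members) / len(non_members)
        if fpr <= max_fpr:
            tpr = sum(s >= threshold for s in members) / len(members)
            best_tpr = max(best_tpr, tpr)
    return best_tpr


# Toy data: three members, five non-members (made-up scores, not challenge results).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
is_member = [True, True, False, True, False, False, False, False]
print(tpr_at_fpr(scores, is_member, max_fpr=0.2))  # 1.0
```

Reporting the rate at a low false-positive cap, rather than plain accuracy, rewards attacks that identify a few members with high confidence, which is usually the more worrying failure mode for a data release.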
Where Pith is reading between the lines
- The same attack-based evaluation approach could be adapted to other generative models such as GANs or VAEs for tabular data.
- High-performing attacks in the challenge could guide the addition of privacy constraints directly into diffusion model training for tabular data.
- Standardized challenges like MIDST may become a routine step before releasing synthetic tabular datasets for public use.
Load-bearing premise
That success or failure of membership inference attacks on the synthetic outputs can accurately reflect the real privacy leakage from the original dataset used to train the diffusion models.
What would settle it
A finding that all submitted membership inference attacks achieve accuracy no better than random guessing on the challenge's held-out test sets would show that the evaluation method does not detect meaningful privacy leakage.
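Whether an attack is "no better than random guessing" can be checked with an exact one-sided binomial test. The sketch below assumes a balanced test set, so that guessing is right half the time; the function name `binomial_tail_p_value` is hypothetical and nothing here is taken from the challenge's own evaluation code.

```python
from math import comb


def binomial_tail_p_value(correct, total, chance=0.5):
    """P(X >= correct) for X ~ Binomial(total, chance).

    A small value means the observed number of correct membership guesses
    would be unlikely if the attack were guessing at the assumed chance rate.
    """
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


# Ten correct guesses out of ten on a balanced set: probability 1/1024 under pure guessing.
print(binomial_tail_p_value(10, 10))  # 0.0009765625
```

If every submitted attack produced a large p-value under this kind of test, the evaluation would have failed to detect leakage, which is the outcome described above.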
Original abstract
Synthetic data is often perceived as a silver-bullet solution to data anonymization and privacy-preserving data publishing. Drawn from generative models like diffusion models, synthetic data is expected to preserve the statistical properties of the original dataset while remaining resilient to privacy attacks. Recent developments of diffusion models have been effective on a wide range of data types, but their privacy resilience, particularly for tabular formats, remains largely unexplored. MIDST challenge sought a quantitative evaluation of the privacy gain of synthetic tabular data generated by diffusion models, with a specific focus on its resistance to membership inference attacks (MIAs). Given the heterogeneity and complexity of tabular data, multiple target models were explored for MIAs, including diffusion models for single tables of mixed data types and multi-relational tables with interconnected constraints. MIDST inspired the development of novel black-box and white-box MIAs tailored to these target diffusion models as a key outcome, enabling a comprehensive evaluation of their privacy efficacy. The MIDST GitHub repository is available at https://github.com/VectorInstitute/MIDST
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents the MIDST challenge at SaTML 2025, which evaluates the privacy resilience of synthetic tabular data generated by diffusion models against membership inference attacks (MIAs). It covers target models for single mixed-type tables and multi-relational tables, and claims that the challenge inspired novel black-box and white-box MIAs tailored to these models, enabling a comprehensive privacy evaluation. The work references a GitHub repository for resources but provides no attack algorithms, results, or quantitative metrics in the manuscript itself.
Significance. If the claimed novel MIAs and their evaluations were rigorously documented and shown to outperform prior methods with reproducible metrics, the challenge could meaningfully advance privacy assessment for diffusion-based tabular synthesis, an underexplored area. However, the absence of any technical details or results in the manuscript substantially reduces its standalone contribution to the literature.
major comments (1)
- [Abstract] The central claim that MIDST 'inspired the development of novel black-box and white-box MIAs tailored to these target diffusion models as a key outcome' is unsupported by any attack descriptions, novelty arguments relative to existing tabular or diffusion MIAs, success rates, or ablation studies. The manuscript limits itself to challenge setup and a GitHub link, leaving the primary asserted contribution dependent on external, unexamined submissions.
minor comments (1)
- The paper would benefit from a brief summary table of challenge submissions (e.g., attack types, AUC scores, or top-performing methods) even if full details are in the repository, to make the privacy evaluation claims more self-contained.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and for highlighting the need to better substantiate the manuscript's claims. The paper describes the MIDST challenge at SaTML 2025, which was designed to evaluate privacy resilience of diffusion-generated synthetic tabular data. We address the specific concern below and will revise the manuscript accordingly.
Point-by-point responses
- Referee: [Abstract] The central claim that MIDST 'inspired the development of novel black-box and white-box MIAs tailored to these target diffusion models as a key outcome' is unsupported by any attack descriptions, novelty arguments relative to existing tabular or diffusion MIAs, success rates, or ablation studies. The manuscript limits itself to challenge setup and a GitHub link, leaving the primary asserted contribution dependent on external, unexamined submissions.
  Authors: We acknowledge that the current manuscript focuses primarily on the challenge design, target models (single mixed-type tables and multi-relational tables), and the overall evaluation framework, with technical details of participant submissions referenced via the GitHub repository. The claim that the challenge inspired novel MIAs is based on the fact that multiple teams developed and submitted tailored black-box and white-box attacks specifically for diffusion-based tabular generators, which were not previously explored in this setting. To address the referee's valid point and make the contribution more self-contained, we will revise the manuscript to include a concise summary section describing the key innovations in the top-performing attacks (e.g., adaptations for mixed data types and relational constraints), high-level performance metrics from the challenge leaderboard, and brief novelty arguments relative to prior tabular MIAs. Full algorithms and ablations will remain in the repository and associated participant reports, as is standard for challenge papers.
  Revision: yes
Circularity Check
No circularity in challenge description paper
This manuscript describes the MIDST challenge setup for evaluating privacy of diffusion-based synthetic tabular data against membership inference attacks. It contains no equations, derivations, fitted parameters, predictions, or first-principles results. The statement that MIDST 'inspired the development of novel black-box and white-box MIAs' refers to external participant submissions (via GitHub link) rather than any internal reduction to the paper's own inputs. No self-citation load-bearing steps, ansatzes, uniqueness theorems, or renamings of known results appear. The derivation chain is empty; the paper is self-contained as a challenge report with no circular structure.
Axiom & Free-Parameter Ledger
Forward citations
Cited by 2 Pith papers
- FERMI: Exploiting Relations for Membership Inference Against Tabular Diffusion Models
  FERMI improves membership inference on tabular diffusion models by mapping relational auxiliary information into attack features, raising TPR at 0.1 FPR by up to 53% white-box and 22% black-box over single-table baselines.
- On Privacy Leakage in Tabular Diffusion Models: Influential Factors, Attacker Knowledge, and Metrics
  Tabular diffusion models leak membership information via attacks even with partial attacker knowledge, and common heuristic privacy metrics like distance-to-closest-record are unreliable.
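For readers unfamiliar with the distance-to-closest-record heuristic named in the second summary, here is a minimal sketch of how it is commonly computed. It assumes purely numeric, already-scaled columns and Euclidean distance; mixed-type tables would need a different distance, and `distance_to_closest_record` is a hypothetical name, not code from either paper.

```python
from math import dist


def distance_to_closest_record(synthetic, training):
    """For each synthetic row, the Euclidean distance to its nearest training row."""
    if not training:
        raise ValueError("training set must not be empty")
    return [min(dist(s, t) for t in training) for s in synthetic]


# Toy rows (made up): the first synthetic row is an exact copy of a training row.
training = [(0.0, 0.0), (3.0, 4.0)]
synthetic = [(0.0, 0.0), (3.0, 0.0)]
print(distance_to_closest_record(synthetic, training))  # [0.0, 3.0]
```

A distance of zero flags a verbatim copy, but large distances do not by themselves rule out leakage, which is consistent with the citing paper's conclusion that such heuristics are unreliable compared with running actual attacks.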