pith. machine review for the scientific record. sign in

arxiv: 2505.16527 · v3 · submitted 2025-05-22 · 💻 cs.LG

Recognition: unknown

Joint Relational Database Generation via Graph-Conditional Diffusion Models

Authors on Pith no claims yet
classification 💻 cs.LG
keywords generationmodelsrdbsrelationalsingle-tableapplicationsapproachautoregressive
0
0 comments X
read the original abstract

Building generative models for relational databases (RDBs) is important for many applications, such as privacy-preserving data release and augmenting real datasets. However, most prior works either focus on single-table generation or adapt single-table models to the multi-table setting by relying on autoregressive factorizations and sequential generation. These approaches limit parallelism, restrict flexibility in downstream applications, and compound errors due to commonly made conditional independence assumptions. In this paper, we propose a fundamentally different approach: jointly modeling all tables in an RDB without imposing any table order. By using a natural graph representation of RDBs, we propose the Graph-Conditional Relational Diffusion Model (GRDM), which leverages a graph neural network to jointly denoise row attributes and capture complex inter-table dependencies. Extensive experiments on six real-world RDBs demonstrate that our approach substantially outperforms autoregressive baselines in modeling multi-hop inter-table correlations and achieves state-of-the-art performance on single-table fidelity metrics. Our code is available at https://github.com/ketatam/rdb-diffusion.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. RelBench v2: A Large-Scale Benchmark and Repository for Relational Data

    cs.LG 2026-02 unverdicted novelty 7.0

    RelBench v2 expands a relational deep learning benchmark with four new large datasets and autocomplete tasks, showing models that use table relationships outperform single-table baselines.