DiSK: A Diffusion Model for Structured Knowledge

Bhaskar Mitra; James Hensman; Niklas Nolte; Ouail Kitouni

arxiv: 2312.05253 · v2 · pith:XKHTP5KYnew · submitted 2023-12-08 · 💻 cs.LG · cs.AI

classification 💻 cs.LG cs.AI

keywords structured, data, disk, models, diffusion, knowledge, model, approach

Structured (dictionary-like) data presents challenges for left-to-right language models, which can struggle with structured entities for a variety of reasons, such as formatting and sensitivity to the order in which attributes are presented. Tabular generative models suffer from a different set of limitations, such as their lack of flexibility. We introduce Diffusion Models of Structured Knowledge (DiSK), a new architecture and training approach specialized for structured data. DiSK handles text, categorical, and continuous numerical data using a Gaussian mixture model approach, which allows for improved precision when dealing with numbers. It employs diffusion training to model relationships between properties. Experiments demonstrate DiSK's state-of-the-art performance on tabular data modeling, synthesis, and imputation on over 15 datasets across diverse domains. DiSK provides an effective inductive bias for generative modeling and manipulation of structured data. The techniques we propose could open the door to improved knowledge manipulation in future language models.
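
The abstract mentions a Gaussian mixture model approach for numerical values without giving details. As a rough illustration only, the sketch below scores a scalar under a one-dimensional Gaussian mixture, which is the kind of quantity a mixture output for a numeric attribute could be trained on. It is not the paper's implementation: the function name `gmm_log_likelihood`, the two-component mixture, and all parameter values are hypothetical.

```python
import math


def gmm_log_likelihood(x, weights, means, stds):
    """Log-density of scalar x under a 1-D Gaussian mixture.

    weights must be positive and sum to 1; stds must be positive.
    """
    log_terms = []
    for w, mu, sigma in zip(weights, means, stds):
        # log N(x | mu, sigma^2), written out term by term
        log_norm = -0.5 * math.log(2.0 * math.pi) - math.log(sigma)
        log_terms.append(math.log(w) + log_norm - 0.5 * ((x - mu) / sigma) ** 2)
    # log-sum-exp over components for numerical stability
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


# Hypothetical mixture for a numeric attribute (values are illustrative only)
weights = [0.7, 0.3]
means = [1990.0, 2010.0]
stds = [5.0, 2.0]

# Negative log-likelihood of an observed value, usable as a training loss
nll = -gmm_log_likelihood(2010.0, weights, means, stds)
```

A narrow component centered near a value assigns it high density, which is one plausible reading of the precision claim for numbers; the paper itself should be consulted for the actual parameterization.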

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Large Language Diffusion Models
cs.CL 2025-02 unverdicted novelty 8.0

LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.

Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding
cs.CL 2025-05 conditional novelty 7.0

Fast-dLLM adds reusable KV cache blocks and selective parallel decoding to diffusion LLMs, closing most of the speed gap with autoregressive models without retraining.

LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning
cs.LG 2025-05 conditional novelty 6.0

LLaDA-V is a diffusion-based multimodal large language model that reaches competitive or state-of-the-art results on visual instruction tasks while using a non-autoregressive architecture.

Efficient Long-Context Modeling in Diffusion Language Models via Block Approximate Sparse Attention
cs.CV 2026-05 unverdicted novelty 5.0

BA-Att introduces pre-downsampled block selection with norm-sorting and diagonal covariance correction to approximate sparse attention, yielding up to 6.95x speedup at 50% sparsity across language, multimodal, and vid...