pith. machine review for the scientific record.

arxiv: 1802.08786 · v1 · submitted 2018-02-24 · 💻 cs.LG · cs.CL

Recognition: unknown

Syntax-Directed Variational Autoencoder for Structured Data

Authors on Pith: no claims yet
classification 💻 cs.LG cs.CL
keywords: data · syntax-directed · approach · autoencoder · check · constraints · discrete · generative
original abstract

Deep generative models have been enjoying success in modeling continuous data. However, it remains challenging to capture representations for discrete structures with formal grammars and semantics, e.g., computer programs and molecular structures. How to generate data that is both syntactically and semantically correct remains largely an open problem. Inspired by compiler theory, where syntax and semantics checking is performed via syntax-directed translation (SDT), we propose a novel syntax-directed variational autoencoder (SD-VAE) that introduces stochastic lazy attributes. This approach converts the offline SDT check into on-the-fly generated guidance for constraining the decoder. Compared to state-of-the-art methods, our approach enforces constraints on the output space so that the output is not only syntactically valid but also semantically reasonable. We evaluate the proposed model on applications in programming languages and molecules, including reconstruction and program/molecule optimization. The results demonstrate the effectiveness of incorporating syntactic and semantic constraints into discrete generative models, significantly outperforming current state-of-the-art approaches.
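The on-the-fly syntax guidance the abstract describes can be illustrated with a minimal sketch (hypothetical code, not the paper's implementation): a decoder that chooses one production rule per expansion step, with rules whose left-hand side does not match the nonterminal on top of the parse stack masked out before the choice. `GRAMMAR`, `RULES`, and `constrained_decode` are illustrative names; the real SD-VAE works on grammars for SMILES strings and a small programming language, and additionally propagates semantic attributes (the stochastic lazy attributes), which this toy syntax-only sketch omits.

```python
import numpy as np

# Toy context-free grammar for arithmetic expressions (illustrative only).
# Each nonterminal maps to its list of production right-hand sides.
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["(", "E", ")"], ["x"]],
}
# Flat rule list; the decoder scores one rule choice per expansion step.
RULES = [(lhs, rhs) for lhs, rhs_list in GRAMMAR.items() for rhs in rhs_list]

def constrained_decode(logits_per_step):
    """Greedy syntax-constrained decoding: at each expansion step, rules
    whose left-hand side does not match the nonterminal on top of the
    parse stack are masked to -inf before the argmax, so only
    syntactically valid derivations can be produced."""
    stack, output, step = ["E"], [], 0
    while stack:
        symbol = stack.pop()
        if symbol not in GRAMMAR:          # terminal: emit it directly
            output.append(symbol)
            continue
        if step >= len(logits_per_step):   # out of decoder steps: stop early
            break
        # On-the-fly syntax mask: True for rules applicable to `symbol`.
        mask = np.array([lhs == symbol for lhs, _ in RULES])
        scores = np.where(mask, logits_per_step[step], -np.inf)
        _, rhs = RULES[int(np.argmax(scores))]
        stack.extend(reversed(rhs))        # leftmost-derivation order
        step += 1
    return "".join(output)
```

For example, decoder scores favoring the rules E→T then T→x yield the string `x`, and any score vector that would pick a T-rule while expanding an E is ignored by the mask, which is the sense in which the output space is constrained to valid parses.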

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. MUSE: Resolving Manifold Misalignment in Visual Tokenization via Topological Orthogonality

    cs.CV · 2026-05 · unverdicted · novelty 6.0

    MUSE decouples reconstruction and semantic learning in visual tokenization via topological orthogonality, yielding SOTA generation quality and improved semantic performance over its teacher model.

  2. Machine Learning for Multi-messenger Probes of New Physics and Cosmology: A Review and Perspective

    hep-ph · 2026-04 · unverdicted · novelty 3.0

    A review summarizing machine learning methods for multi-messenger probes of dark matter and new physics, with a proposed plan for future integrated analyses.