Graph-GRPO: Training Graph Flow Models with Reinforcement Learning
Graph generation is a fundamental task with broad applications, such as drug discovery. Recently, graph generation based on discrete flow matching, also known as the graph flow model (GFM), has gained attention for its strong performance and flexible sampling. However, effectively aligning GFMs with complex human preferences or task-specific objectives remains a significant challenge. In this paper, we propose Graph-GRPO, an online reinforcement learning (RL) framework for training GFMs under verifiable rewards. Our method makes two key contributions: (1) we derive an analytical expression for the transition probability of GFMs, which replaces Monte Carlo sampling and enables fully differentiable rollouts for RL training; (2) we propose a refinement strategy that randomly perturbs specific nodes and edges in a graph and then regenerates them, allowing for localized exploration and self-improvement of generation quality. Extensive experiments on both synthetic and real datasets demonstrate the effectiveness of Graph-GRPO. With only 50 denoising steps, our method achieves Valid-Unique-Novelty scores of 95.0% on the planar dataset and 97.5% on the tree dataset. Moreover, Graph-GRPO achieves state-of-the-art performance on molecular optimization tasks, outperforming graph-based and fragment-based RL methods as well as classic genetic algorithms.
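The abstract does not spell out the training objective. As a rough illustration of the "GRPO" part of the name only, the sketch below shows the group-relative advantage commonly used in GRPO-style methods: rewards for a group of rollouts are centered on the group mean and scaled by the group standard deviation. The function name `group_relative_advantages`, the `eps` constant, and the example rewards are assumptions made for illustration, not details taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts (GRPO-style sketch).

    `rewards` holds one scalar reward per sampled graph in the group.
    `eps` is a hypothetical stabilizer that avoids division by zero
    when every rollout in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four rollouts scored by a binary verifiable reward
# (for instance, 1.0 if a generated graph passes a validity check).
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Rollouts that score above the group average receive a positive advantage and those below receive a negative one, so no separate learned value function is needed to form a baseline.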
Forward citations
Cited by 3 Pith papers
- Crys-JEPA: Accelerating Crystal Discovery via Embedding Screening and Generative Refinement
  Crys-JEPA introduces a joint embedding predictive architecture that creates an energy-aware latent space, enabling embedding-based stability screening and a refinement pipeline that yields up to 72.7% gains on the V.S...
- LEMON: Learning Executable Multi-Agent Orchestration via Counterfactual Reinforcement Learning
  LEMON trains an LLM orchestrator with counterfactual-augmented GRPO to produce deployable multi-agent specifications that reach state-of-the-art results on six reasoning and coding benchmarks.
- Composable Crystals: Controllable Materials Discovery via Concept Learning
  VQ-VAE concept learning enables controllable recombination of crystal motifs to generate structures with reported gains in validity-stability-uniqueness-novelty metrics on MP-20 and Alex-MP-20.