TBPO derives a token-level preference optimization objective from sequence-level pairwise data via Bregman divergence ratio matching that generalizes DPO and improves alignment quality.
Concrete score matching: Generalized score matching for discrete data
2 Pith papers cite this work. Polarity classification is still indexing.
years
2026 2verdicts
UNVERDICTED 2representative citing papers
SCMDM adapts trained masked diffusion models to condition denoising steps on their own prior clean predictions, cutting generative perplexity nearly in half on open-web text while improving discretized image, molecule, and genomic synthesis.
citing papers explorer
-
TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching
TBPO derives a token-level preference optimization objective from sequence-level pairwise data via Bregman divergence ratio matching that generalizes DPO and improves alignment quality.
-
Simple Self-Conditioning Adaptation for Masked Diffusion Models
SCMDM adapts trained masked diffusion models to condition denoising steps on their own prior clean predictions, cutting generative perplexity nearly in half on open-web text while improving discretized image, molecule, and genomic synthesis.