MMaDA: Multimodal Large Diffusion Language Models. arXiv preprint arXiv:2505.15809.
17 papers indexed on Pith cite this work. Polarity classification is still indexing.
Citation-role summary: baseline (1). Citation-polarity summary: baseline (1).
Representative citing papers
UniPath adaptively models coordination-path diversity in unified multimodal models by training a path-conditioned executor and using a lightweight planner for input-dependent selection, improving performance over fixed strategies.
RSPO interprets reward advantages as targets for relative log-ratios in dLLMs, calibrating noisy estimates to stabilize RLVR training and achieve strong gains on planning tasks with competitive math reasoning performance (see the sketch after this list).
ΔLPS is a gradient-guided discrete posterior sampler for inverse problems that works with masked or uniform discrete diffusion priors and outperforms prior discrete methods on image restoration tasks (see the sketch after this list).
BARD bridges autoregressive and diffusion VLMs with progressive block merging plus stage-wise intra-diffusion distillation, delivering 3x speedup and new SOTA on open dVLMs using under 4.4M data points.
b1 trains dLLMs to dynamically select reasoning block sizes via monotonic entropy descent with RL, improving coherence over fixed-size baselines on reasoning benchmarks (see the sketch after this list).
NoiseRater meta-learns instance-level importance scores for noise in diffusion training via bilevel optimization, then uses a two-stage pipeline to improve efficiency and generation quality on FFHQ and ImageNet.
Auditing five frontier VLMs reveals severe grounding failures (max 0.23 IoU, 19.1% Acc@0.5) and format collapse (up to 99% parse failure) in medical VQA; fine-tuning yields 85.5% SLAKE recall but perception remains the primary trustworthiness issue.
A discrete diffusion model tokenizes multimodal robotic data and uses a progress token to predict future states and task completion for scalable policy evaluation.
Stability-Weighted Decoding improves diffusion LLM accuracy by modulating token scores with temporal stability from KL divergence between prediction steps (see the sketch after this list).
Position and step penalty plus visual reasoning guidance fix premature answering and weak visual grounding in diffusion MLLMs, delivering up to 7.5% accuracy gains and over 3x speedup.
Tuna-2 shows pixel embeddings can replace vision encoders in unified multimodal models, achieving competitive or superior results on understanding and generation benchmarks.
UniGenDet unifies generative and discriminative models through symbiotic self-attention and detector-guided alignment to co-evolve image generation and authenticity detection.
DMax enables faster parallel decoding in diffusion language models by using on-policy training to recover from errors and soft embedding interpolations for iterative revision, boosting tokens per forward pass roughly 2-3x on benchmarks while preserving accuracy.
Motus unifies understanding, video generation, and action in one latent world model via MoT experts and optical-flow latent actions, reporting gains over prior methods in simulation and real robots.
TorchUMM is the first unified codebase and benchmark suite for standardized evaluation of diverse unified multimodal models on understanding, generation, and editing tasks.
Show-o2 unifies text, image, and video understanding and generation in a single autoregressive-plus-flow-matching model built on 3D causal VAE representations.
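The RSPO summary above describes a regression-style view of RLVR: the reward advantage becomes the target for a scaled policy-to-reference log-ratio. A minimal PyTorch sketch of that idea, assuming per-sequence log-likelihood estimates are available (in dLLMs these are typically noisy ELBO-style estimates) and a hypothetical scale beta; this is a reading of the one-line summary, not the paper's implementation:

```python
import torch

def rspo_style_loss(logp_theta, logp_ref, advantages, beta=0.1):
    """Regress the scaled relative log-ratio toward the reward advantage.

    logp_theta: per-sequence log-likelihood estimates under the policy, (B,).
    logp_ref:   same under a frozen reference model (no grad), (B,).
    advantages: (group-normalized) reward advantages, (B,).
    beta:       assumed scale on the log-ratio.
    """
    rel_log_ratio = beta * (logp_theta - logp_ref.detach())
    return ((rel_log_ratio - advantages.detach()) ** 2).mean()
```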
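ΔLPS's gradient guidance can be illustrated with the standard first-order trick from discrete Langevin samplers: tilt the diffusion prior's logits by a Taylor estimate of how each candidate token would change the measurement log-likelihood. A sketch under that assumption, with a hypothetical differentiable log_likelihood_fn standing in for the inverse-problem likelihood:

```python
import torch

def gradient_guided_logits(prior_logits, x_onehot, log_likelihood_fn):
    """Tilt discrete-diffusion prior logits toward the posterior.

    prior_logits: (L, V) logits from the denoiser.
    x_onehot:     (L, V) one-hot relaxation of the current sample.
    log_likelihood_fn: assumed differentiable scalar log p(y | x).
    """
    x = x_onehot.clone().requires_grad_(True)
    (grad,) = torch.autograd.grad(log_likelihood_fn(x), x)  # (L, V)
    # First-order Taylor estimate of the likelihood change from swapping
    # position i's current token for candidate v.
    taylor = grad - (grad * x_onehot).sum(-1, keepdim=True)
    return prior_logits + taylor  # sample the next state from these logits
```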
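For b1, "monotonic entropy descent" suggests a stopping rule: grow the next reasoning block while per-position predictive entropy keeps falling. A heuristic sketch of that rule (the paper trains the selection with RL; this only illustrates the entropy criterion, and max_block is an assumed cap):

```python
import torch

def block_size_by_entropy_descent(probs, max_block=32, eps=1e-9):
    """Pick the largest block over which predictive entropy is non-increasing.

    probs: (max_block, V) predicted distributions for candidate positions.
    Returns a block size in [1, max_block].
    """
    entropy = -(probs * (probs + eps).log()).sum(-1)  # per-position entropy
    size = 1
    while size < max_block and entropy[size] <= entropy[size - 1]:
        size += 1  # entropy still descending: extend the block
    return size
```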
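Stability-Weighted Decoding's score modulation can be sketched directly from the summary: measure each position's instability as the KL divergence between its distributions at consecutive denoising steps, then damp its confidence accordingly. A minimal sketch, with alpha as an assumed temperature on the stability term:

```python
import torch

def stability_weighted_scores(probs_prev, probs_curr, alpha=1.0):
    """Confidence scores damped by temporal instability between steps.

    probs_prev, probs_curr: (L, V) distributions at steps t-1 and t.
    Returns per-position scores for choosing which tokens to commit.
    """
    kl = (probs_curr * (probs_curr.clamp_min(1e-9).log()
                        - probs_prev.clamp_min(1e-9).log())).sum(-1)  # (L,)
    confidence = probs_curr.max(-1).values  # base score: top-1 probability
    return confidence * torch.exp(-alpha * kl)  # stable positions rank higher
```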
Citing papers explorer
- Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models
Introduces Block-R1 benchmark, Block-R1-41K dataset, and a conflict score to handle domain-specific optimal block sizes in RL post-training of diffusion LLMs.
- Discrete Langevin-Inspired Posterior Sampling
ΔLPS is a gradient-guided discrete posterior sampler for inverse problems that works with masked or uniform discrete diffusion priors and outperforms prior discrete methods on image restoration tasks.
- Break the Block: Dynamic-size Reasoning Blocks for Diffusion Large Language Models via Monotonic Entropy Descent with Reinforcement Learning
b1 trains dLLMs to dynamically select reasoning block sizes via monotonic entropy descent with RL, improving coherence over fixed-size baselines on reasoning benchmarks.
- NoiseRater: Meta-Learned Noise Valuation for Diffusion Model Training
NoiseRater meta-learns instance-level importance scores for noise in diffusion training via bilevel optimization, then uses a two-stage pipeline to improve efficiency and generation quality on FFHQ and ImageNet.
- DMax: Aggressive Parallel Decoding for dLLMs
DMax enables faster parallel decoding in diffusion language models by using on-policy training to recover from errors and soft embedding interpolations for iterative revision, boosting tokens per forward pass roughly 2-3x on benchmarks while preserving accuracy (see the soft-embedding sketch below).
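The soft embedding interpolation mentioned in the DMax summary can be sketched from the summary alone: rather than committing to argmax tokens between steps, feed the next forward pass the probability-weighted mixture of token embeddings so later steps can revise earlier guesses. A minimal sketch (names are illustrative, not from the paper):

```python
import torch

def soft_embedding(probs, embedding_table):
    """Expected embedding under the token distribution, kept 'soft' so
    uncertain positions stay revisable on the next iteration.

    probs:           (L, V) token distributions from the current step.
    embedding_table: (V, D) model token embeddings.
    """
    return probs @ embedding_table  # (L, D) probability-weighted mixture
```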