One Step Diffusion via Shortcut Models
18 Pith papers cite this work.
citing papers
-
Realtime-VLA FLASH: Speculative Inference Framework for Diffusion-based VLAs
A new speculative inference system speeds up diffusion-based vision-language-action (VLA) policies to 19.1 ms average latency (a 3.04x speedup) on LIBERO by replacing most full 58 ms inferences with 7.8 ms draft rounds while preserving task performance.
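As a back-of-envelope illustration (not the paper's actual scheduler), the reported numbers fall out of mixing cheap draft rounds with occasional full-model fallbacks; the acceptance rate below is a hypothetical free parameter solved for from the summary's figures:

```python
# Hypothetical latency model for speculative inference: most steps are served
# by a cheap draft round; rejected drafts also pay for a full model pass.
DRAFT_MS = 7.8    # draft-round latency (from the summary)
FULL_MS = 58.0    # full diffusion inference latency (from the summary)

def average_latency(accept_rate: float) -> float:
    """Expected per-step latency when a fraction `accept_rate` of steps
    is served by the draft alone and the rest pay draft + full cost."""
    return accept_rate * DRAFT_MS + (1 - accept_rate) * (DRAFT_MS + FULL_MS)

# Solve for the acceptance rate that would reproduce the reported 19.1 ms.
target = 19.1
rate = (DRAFT_MS + FULL_MS - target) / FULL_MS
print(f"implied acceptance rate: {rate:.2f}")    # ~0.81
print(f"check: {average_latency(rate):.1f} ms")  # 19.1 ms
```

The point of the sketch: even a modest draft-acceptance rate amortizes the expensive full passes down to the reported average.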
-
DriftXpress: Faster Drifting Models via Projected RKHS Fields
DriftXpress approximates drifting kernels via projected RKHS fields to lower training cost of one-step generative models while matching original FID scores.
-
One-Step Generative Modeling via Wasserstein Gradient Flows
W-Flow achieves state-of-the-art one-step ImageNet 256x256 generation at 1.29 FID by training a static neural network to follow a Wasserstein gradient flow that minimizes Sinkhorn divergence, delivering roughly 100x faster sampling than comparable multi-step models.
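For context, the Sinkhorn divergence in the summary is built from entropically regularized optimal-transport costs. A minimal sketch of the classic Sinkhorn iterations on two small point clouds (sizes, the regularization strength `eps`, and iteration count are illustrative, not from the paper; the debiased divergence would further subtract the self-costs OT(x,x) and OT(y,y)):

```python
import numpy as np

def sinkhorn_cost(x, y, eps=0.5, iters=200):
    """Entropic OT cost between uniform point clouds x (n x d) and y (m x d)."""
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared distances
    K = np.exp(-C / eps)                                # Gibbs kernel
    a = np.full(len(x), 1 / len(x))                     # uniform marginals
    b = np.full(len(y), 1 / len(y))
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(iters):                              # Sinkhorn fixed point
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                     # transport plan
    return float((P * C).sum())

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 2))
y = rng.normal(loc=2.0, size=(32, 2))
print(f"entropic OT cost: {sinkhorn_cost(x, y):.3f}")
```

A one-step generator trained against such a cost never needs an ODE solve at sampling time, which is where the quoted ~100x speedup over multi-step models comes from.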
-
How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance
FMRG is a training-free, single-trajectory guidance method for flow models, derived from optimal control, that achieves strong reward alignment with only 3 NFEs (network function evaluations).
-
Isokinetic Flow Matching for Pathwise Straightening of Generative Flows
Isokinetic Flow Matching adds a lightweight regularization term to flow matching that penalizes acceleration along paths via self-guided finite differences, yielding straighter trajectories and large gains in few-step sampling quality on CIFAR-10.
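A minimal sketch of the mechanism named in the summary: estimate acceleration along a discretized path with a central finite difference and penalize its squared norm. The toy trajectories and weighting below are illustrative, not the paper's training loop:

```python
import numpy as np

def acceleration_penalty(traj: np.ndarray, dt: float) -> float:
    """Mean squared acceleration of a discretized path traj[t] ~ x(t * dt),
    estimated via the central difference (x[t+1] - 2 x[t] + x[t-1]) / dt^2."""
    acc = (traj[2:] - 2 * traj[1:-1] + traj[:-2]) / dt**2
    return float((acc ** 2).mean())

ts = np.linspace(0.0, 1.0, 33)
straight = np.outer(ts, np.ones(2))                    # constant-velocity path
curved = np.stack([ts, np.sin(2 * np.pi * ts)], axis=1)  # oscillating path

dt = ts[1] - ts[0]
print(acceleration_penalty(straight, dt))   # ~0: straight paths incur no penalty
print(acceleration_penalty(curved, dt))     # large: curvature is penalized
```

Driving this penalty toward zero pushes sampling trajectories toward straight lines, which is exactly what makes few-step Euler integration accurate.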
-
VOSR: A Vision-Only Generative Model for Image Super-Resolution
VOSR shows that competitive generative image super-resolution with faithful structures can be achieved by training a diffusion-style model from scratch on visual data alone, using a vision encoder for guidance and a restoration-oriented sampling strategy.
-
Training Agents Inside of Scalable World Models
Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.
-
Tyche: One Step Flow for Efficient Probabilistic Weather Forecasting
Tyche achieves competitive probabilistic weather forecasting skill and calibration using a single-step flow model with JVP-regularized training and rollout finetuning.
-
Physical Fidelity Reconstruction via Improved Consistency-Distilled Flow Matching for Dynamical Systems
A one-step consistency model distilled from an optimal-transport flow-matching teacher reconstructs high-fidelity dynamical-system flows from low-fidelity data with a 12x speedup, half the parameters, and 23.1% better SSIM than scratch-trained baselines.
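As a toy illustration of one-step distillation (not the paper's setup): a multi-step teacher integrates the known ODE dx/dt = -x with many Euler steps, and a single-step student map x -> w * x is fit by least squares to reproduce the teacher's endpoints:

```python
import numpy as np

def teacher_endpoint(x0: np.ndarray, steps: int = 100) -> np.ndarray:
    """Multi-step teacher: Euler-integrate dx/dt = -x from t = 0 to t = 1."""
    x, dt = x0.copy(), 1.0 / steps
    for _ in range(steps):
        x = x + dt * (-x)
    return x

rng = np.random.default_rng(0)
x0 = rng.normal(size=(256, 1))          # initial states
x1 = teacher_endpoint(x0)               # teacher's 100-step endpoints

# One-step student: scalar map x -> w * x, fit by least squares.
w = float((x0 * x1).sum() / (x0 * x0).sum())
print(f"student weight {w:.4f} vs exact exp(-1) = {np.exp(-1):.4f}")
```

The student recovers the teacher's endpoint map (0.99^100 ≈ 0.366) in a single evaluation, which is the source of the inference speedup; in the paper the student is of course a neural network, not a scalar.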
-
OGPO: Sample Efficient Full-Finetuning of Generative Control Policies
OGPO is a sample-efficient off-policy method for full finetuning of generative control policies that reaches SOTA on robotic manipulation tasks and can recover from poor behavior-cloning initializations without expert data.
-
FlowS: One-Step Motion Prediction via Local Transport Conditioning
FlowS achieves state-of-the-art single-step motion prediction on the Waymo Open Motion Dataset by using scene-conditioned anchor trajectories and a step-consistent displacement field to make local transport accurate in one Euler step.
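The "one Euler step" framing amounts to predicting a displacement field and applying it once: x1 = x0 + v(x0). A toy sketch with a hypothetical constant-attraction field standing in for the trained network (nothing here is the paper's model):

```python
import numpy as np

def one_step_euler(anchors: np.ndarray, displacement) -> np.ndarray:
    """Move each anchor point by one application of a displacement field:
    x1 = x0 + v(x0). `displacement` is any callable standing in for the net."""
    return anchors + displacement(anchors)

# Toy field transporting every point toward a goal at (10, 5) in one step.
goal = np.array([10.0, 5.0])
field = lambda x: goal - x

anchors = np.array([[0.0, 0.0], [3.0, -1.0]])
print(one_step_euler(anchors, field))   # both rows land on [10, 5]
```

When the field is accurate on the anchors' neighborhoods ("local transport"), a single step suffices and the usual multi-step ODE solve is skipped entirely.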
-
Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation
Mutual Forcing trains a single native autoregressive audio-video model with mutually reinforcing few-step and multi-step modes via self-distillation to match 50-step baselines at 4-8 steps.
-
FASTER: Value-Guided Sampling for Fast RL
FASTER models multi-candidate denoising as an MDP and trains a value function to filter actions early, delivering the performance of full sampling at lower cost in diffusion RL policies.
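A minimal sketch of value-guided filtering as described: sample several candidate actions, score them with a value function partway through denoising, and keep only the top-k for the expensive remaining steps. The value function and k below are illustrative stand-ins, not the paper's learned critic:

```python
import numpy as np

def filter_candidates(candidates: np.ndarray, value_fn, k: int) -> np.ndarray:
    """Keep the k candidates with the highest value-function scores so the
    remaining (expensive) denoising steps run on fewer actions."""
    scores = np.array([value_fn(c) for c in candidates])
    keep = np.argsort(scores)[-k:]           # indices of the top-k scores
    return candidates[keep]

rng = np.random.default_rng(1)
candidates = rng.normal(size=(16, 4))        # 16 partially denoised actions
value_fn = lambda a: -np.abs(a).sum()        # toy value: prefer small actions
survivors = filter_candidates(candidates, value_fn, k=4)
print(survivors.shape)                       # (4, 4)
```

The cost saving is the ratio of candidates dropped early (here 12 of 16) times the cost of the denoising steps they skip.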
-
Self-Adversarial One Step Generation via Condition Shifting
APEX derives self-adversarial gradients from condition-shifted velocity fields in flow models to achieve high-fidelity one-step generation, outperforming much larger models and multi-step teachers.
-
MENO: MeanFlow-Enhanced Neural Operators for Dynamical Systems
MENO enhances neural operators with MeanFlow to restore multi-scale accuracy in dynamical-system predictions while keeping inference costs low, achieving up to 2x better power-spectrum accuracy and 12x faster inference than diffusion-enhanced baselines on phase-field, Kolmogorov-flow, and active-matter benchmarks.
-
Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation
Salt improves low-step video generation quality by adding endpoint-consistent regularization to distribution matching distillation and using cache-conditioned feature alignment for autoregressive models.
-
SAM 3D: 3Dfy Anything in Images
SAM 3D reconstructs 3D objects from single images with geometry, texture, and pose, using human- and model-annotated data at scale and synthetic-to-real training, and wins human preference comparisons 5:1.
-
Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
Visual generation models are evolving from passive renderers to interactive agentic world modelers, but current systems lack spatial reasoning, temporal consistency, and causal understanding, with evaluations overemphasizing perceptual quality.