Simpler diffusion (SiD2): 1.5 FID on ImageNet512 with pixel-space diffusion
12 Pith papers cite this work. Polarity classification is still being indexed.
Citation roles: background (1)
Citation polarities: background (1)
Citing papers
-
Forward-Learned Discrete Diffusion: Learning how to noise to denoise faster
FLDD learns non-Markovian marginal and posterior distributions for the forward process so that a factorized reverse process can better match the target and produce higher-quality samples in fewer steps.
-
3D-Belief: Embodied Belief Inference via Generative 3D World Modeling
3D-Belief maintains and updates explicit 3D beliefs about partially observed environments to enable multi-hypothesis imagination and improved performance on embodied tasks.
-
History-Guided Video Diffusion
DFoT (Diffusion Forcing Transformer) enables flexible history conditioning in video diffusion, and its history guidance methods boost temporal consistency and support long rollouts.
-
MIMFlow: Integrating Masked Image Modeling with Normalizing Flows for End-to-End Image Generation
MIMFlow uses a VAE on masked images to feed semantic latents to a normalizing flow while a decoder handles high-frequency details, reporting an FID of 2.50 and 71.3% linear-probing accuracy on ImageNet 256x256 with 128 tokens.
-
GPIC: A Giant Permissive Image Corpus for Visual Generation
GPIC is a new 28-trillion-pixel permissively licensed image corpus with 100M training examples for visual generative modeling.
-
WavFlow: Audio Generation in Waveform Space
WavFlow generates audio directly in waveform space via flow matching on 2D token grids built from raw patches plus amplitude lifting, matching latent-based methods on VGGSound and AudioCaps without an intermediate compression stage.
-
SRC-Flow: Compact Semantic Representations Enable Normalizing Flows for Image Generation
SRC-Flow compresses RAE features via a Semantic Representation Compressor into a low-dimensional space, enabling normalizing flows to reach gFID 1.65 on ImageNet 256x256 and 2.07 on 512x512 while retaining exact likelihoods.
-
L2P: Unlocking Latent Potential for Pixel Generation
L2P repurposes pre-trained latent diffusion models (LDMs) for direct pixel generation via large-patch tokenization and shallow-layer training on synthetic data, matching the source model's performance with training on 8 GPUs and enabling native 4K output.
-
Normalizing Flows with Iterative Denoising
iTARFlow augments normalizing flows with diffusion-style iterative denoising during sampling while preserving end-to-end likelihood training, reaching competitive results on ImageNet 64/128/256.
-
Diagnosing and Improving Diffusion Models by Estimating the Optimal Loss Value
The paper derives a closed-form optimal loss for diffusion models under a unified formulation and provides variance-controlled estimators for it. It reports that subtracting this optimal value improves model diagnosis and training schedules and reveals power-law scaling.
-
Synthesis of discrete-continuous quantum circuits with multimodal diffusion models
A multimodal diffusion model generates discrete gate selections and continuous parameters for quantum circuit compilation, with the authors claiming better gate counts and noise resilience than prior methods.
-
Cosmos World Foundation Model Platform for Physical AI
The Cosmos platform supplies open-source pre-trained world models and supporting tools for building fine-tunable digital world simulations to train Physical AI.