MetaEarth-MM unifies multi-modal remote sensing image generation and any-to-any translation across five modalities via scene-centered joint modeling on the new EarthMM dataset.
hub
Scalable diffusion models with transformers
18 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
4DLidarOpen is a new open dataset providing synchronized 4D FMCW Lidar velocity measurements, multi-Lidar and camera data, and 3D bounding-box annotations with track IDs to support benchmarks on 3D detection, BEV segmentation, flow prediction, and motion forecasting.
Latent space probing on CogVideoX achieves 97.29% F1 for adult content detection on a new 11k-clip dataset with 4-6ms overhead.
DiffTSP applies discrete diffusion to knowledge graph triple set prediction, recovering all missing triples simultaneously via edge-masking noise reversal and a structure-aware transformer, achieving SOTA on three datasets.
TacticGen generates realistic, adaptable football tactics via a multi-agent diffusion transformer trained on 3.3M events and 100M frames, supporting rule-, language-, or model-based guidance at inference time.
SetFlow is a flow-matching generative model for permutation-invariant MIL bags in representation space that produces synthetic data improving classification performance and enabling training on synthetic data alone.
MIMIC-D enables multi-modal multi-agent coordination via joint training of decentralized diffusion policies using only local information.
ShardTensor is a domain-parallelism system for SciML that enables flexible scaling of extreme-resolution spatial datasets by removing the constraint of batch size one per device.
A cold diffusion model with direct and delta-normalized reverse processes, using UNet and transformer backbones, outperforms diffusion baselines for dereverberating acoustic and electronic drum stems on in-domain and out-of-domain tests.
The work creates identity-consistent synthetic makeup data via ConsistentBeauty and adapts models to real images using reinforcement learning in RealBeauty, achieving better identity preservation and real-world performance than prior methods.
Chain-of-Details (CoD) is a cascaded TTS method that explicitly models temporal coarse-to-fine dynamics with a shared decoder, achieving competitive performance using significantly fewer parameters.
TAX-DPD combines a feed-forward dense GMM for global placement priors with disentangled point cloud diffusion for local geometry and pose to achieve precise robotic object placement.
NC-Diffusion matches quantization noise to the diffusion forward process, adds an adaptive frequency filter and zero-shot enhancement, and reports superior fidelity on benchmarks.
A primitive-based truncated diffusion model with keypoint attention encoding generates more efficient and diverse trajectories for mobile manipulators than vanilla diffusion in cluttered 3D simulations.
Video generation models can function as world simulators if efficiency gaps in spatiotemporal modeling are bridged via organized paradigms, architectures, and algorithms.
Flow-Opt combines a flow-matching DiT model with a custom differentiable safety filter and learned initialization to enable fast centralized trajectory optimization for tens of robots.
RF-HiT uses rectified flow and a multi-scale hierarchical transformer to reach 91.27% Dice on ACDC and 87.40% on BraTS 2021 with only 10.14 GFLOPs, 13.6M parameters, and three inference steps.
AnimateAnyMesh++ animates arbitrary 3D meshes from text using an expanded 300K-identity DyMesh-XL dataset, a power-law topology-aware DyMeshVAE-Flex, and a variable-length rectified-flow generator to produce semantically accurate, temporally coherent animations in seconds.
citing papers explorer
-
One Pass for All: A Discrete Diffusion Model for Knowledge Graph Triple Set Prediction
DiffTSP applies discrete diffusion to knowledge graph triple set prediction, recovering all missing triples simultaneously via edge-masking noise reversal and a structure-aware transformer, achieving SOTA on three datasets.
-
TacticGen: Grounding Adaptable and Scalable Generation of Football Tactics
TacticGen generates realistic, adaptable football tactics via a multi-agent diffusion transformer trained on 3.3M events and 100M frames, supporting rule-, language-, or model-based guidance at inference time.