Image-to-image translation with conditional adversarial networks
5 Pith papers cite this work.
Citation roles: method (1)
Citation polarities: use method (1)
citing papers explorer
- UniTriGen: Unified Triplet Generation of Aligned Visible-Infrared-Label for Few-Shot RGB-T Semantic Segmentation
  UniTriGen uses unified diffusion in a shared latent space plus lightweight adapters and scene-balanced sampling to produce high-quality aligned VIS-IR-Label triplets from limited paired data, improving few-shot RGB-T semantic segmentation.
- Spatial Gram Alignment for Ultra-High-Resolution Image Synthesis
  Spatial Gram Alignment aligns internal self-similarities of LDM features with foundation priors to reconcile global structure and fine details in ultra-high-resolution text-to-image synthesis.
- InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation
  InsightTok improves text and face fidelity in discrete image tokenization via content-aware perceptual losses, with gains transferring to autoregressive generation.
- Conditional Diffusion Posterior Alignment for Sparse-View CT Reconstruction
  CDPA scales diffusion-based reconstruction to large 3D volumes by conditioning 2D models on initial 3D reconstructions plus data-consistency alignment, delivering state-of-the-art results on synthetic and real CBCT data.
- Emu3.5: Native Multimodal Models are World Learners
  Emu3.5 is a native multimodal world model pre-trained on over 10 trillion vision-language tokens with next-token prediction, post-trained via reinforcement learning, and accelerated by Discrete Diffusion Adaptation for efficient interleaved generation and world exploration.