pith. sign in

hub

Gans trained by a two time-scale update rule converge to a local nash equilibrium

12 Pith papers cite this work. Polarity classification is still indexing.

12 Pith papers citing it

hub tools

citation-role summary

method 2 background 1 baseline 1

citation-polarity summary

clear filters

representative citing papers

Guidance Watermarking for Diffusion Models

cs.CR · 2025-09-26 · unverdicted · novelty 6.0

Guidance watermarking steers diffusion denoising steps via gradients from an off-the-shelf watermark decoder to embed marks during generation, converting post-hoc schemes into in-generation ones while remaining complementary to VAE modifications.

Emu3: Next-Token Prediction is All You Need

cs.CV · 2024-09-27 · unverdicted · novelty 6.0

Emu3 shows that next-token prediction on a unified discrete token space for text, images, and video lets a single transformer outperform task-specific models such as SDXL and LLaVA-1.6 in multimodal generation and perception.

Lossless Anti-Distillation Sampling

cs.LG · 2026-05-12 · unverdicted · novelty 5.0

LADS is a sampling method that keeps benign user generations statistically identical to the original model while forcing correlated samples across a distiller's multiple accounts, provably worsening their generalization via uniform convergence bounds.

ModelScope Text-to-Video Technical Report

cs.CV · 2023-08-12 · unverdicted · novelty 4.0

ModelScopeT2V is a 1.7-billion-parameter text-to-video model built on Stable Diffusion that adds temporal modeling and outperforms prior methods on three evaluation metrics.

citing papers explorer

Showing 9 of 9 citing papers after filters.

  • Causal-Adapter: Taming Text-to-Image Diffusion for Faithful Counterfactual Generation cs.CV · 2025-09-29 · unverdicted · none · ref 9 · 2 links

    Causal-Adapter adapts frozen diffusion backbones via structural causal modeling, prompt-aligned injection, and conditioned token contrastive loss to achieve faithful counterfactual generation with strong attribute control and identity preservation.

  • Guidance Watermarking for Diffusion Models cs.CR · 2025-09-26 · unverdicted · none · ref 24

    Guidance watermarking steers diffusion denoising steps via gradients from an off-the-shelf watermark decoder to embed marks during generation, converting post-hoc schemes into in-generation ones while remaining complementary to VAE modifications.

  • Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling cs.CV · 2025-07-10 · unverdicted · none · ref 31

    Geometry Forcing aligns video diffusion representations with geometric foundation model features via angular cosine and scale regression objectives to improve 3D consistency in generated videos.

  • 2ndMatch: Finetuning Pruned Diffusion Models via Second-Order Jacobian Matching cs.GR · 2025-06-03 · unverdicted · none · ref 16

    2ndMatch finetunes pruned diffusion models via second-order Jacobian matching inspired by Finite-Time Lyapunov Exponents to reduce the quality gap with dense models on image generation tasks.

  • Emu3: Next-Token Prediction is All You Need cs.CV · 2024-09-27 · unverdicted · none · ref 29

    Emu3 shows that next-token prediction on a unified discrete token space for text, images, and video lets a single transformer outperform task-specific models such as SDXL and LLaVA-1.6 in multimodal generation and perception.

  • Scaling Autoregressive Models for Content-Rich Text-to-Image Generation cs.CV · 2022-06-22 · unverdicted · none · ref 52

    Scaling an autoregressive Transformer to 20B parameters for text-to-image generation using image token sequences achieves new SOTA zero-shot FID of 7.23 and fine-tuned FID of 3.22 on MS-COCO.

  • Lossless Anti-Distillation Sampling cs.LG · 2026-05-12 · unverdicted · none · ref 85

    LADS is a sampling method that keeps benign user generations statistically identical to the original model while forcing correlated samples across a distiller's multiple accounts, provably worsening their generalization via uniform convergence bounds.

  • Reconstruction or Semantics? What Makes a Latent Space Useful for Robotic World Models cs.CV · 2026-05-07 · unverdicted · none · ref 24

    Semantic latent spaces from pretrained encoders outperform reconstruction-based spaces for robotic world models on planning and downstream policy performance.

  • ModelScope Text-to-Video Technical Report cs.CV · 2023-08-12 · unverdicted · none · ref 15

    ModelScopeT2V is a 1.7-billion-parameter text-to-video model built on Stable Diffusion that adds temporal modeling and outperforms prior methods on three evaluation metrics.