T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion Models
16 Pith papers cite this work. Polarity classification is still indexing.
Citation roles so far: background (1).
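As orientation for the list below: the pattern the title describes is a small trainable encoder that maps a spatial condition (sketch, depth, pose) to multi-scale features added into a frozen text-to-image UNet. A minimal PyTorch sketch of that pattern, with illustrative widths and layer choices rather than the paper's exact architecture:

```python
import torch
import torch.nn as nn

class TinyAdapter(nn.Module):
    """Illustrative T2I-Adapter-style condition encoder (not the paper's exact design)."""
    def __init__(self, cond_channels=1, widths=(64, 128, 256, 512)):
        super().__init__()
        blocks, in_ch = [], cond_channels
        for w in widths:
            blocks.append(nn.Sequential(
                nn.Conv2d(in_ch, w, 3, stride=2, padding=1),  # halve resolution per scale
                nn.SiLU(),
                nn.Conv2d(w, w, 3, padding=1),
            ))
            in_ch = w
        self.blocks = nn.ModuleList(blocks)

    def forward(self, cond):
        feats, x = [], cond
        for block in self.blocks:
            x = block(x)
            feats.append(x)   # one guidance map per UNet encoder scale
        return feats

adapter = TinyAdapter()
for f in adapter(torch.randn(1, 1, 512, 512)):
    print(tuple(f.shape))   # (1, 64, 256, 256), (1, 128, 128, 128), ...
# At sampling time each map would be added to the matching frozen UNet
# encoder activation: h_k = h_k + feats[k].
```

Because the base model stays frozen, such adapters train cheaply and several can be combined at inference.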
Citing papers
- Delta-Adapter: Scalable Exemplar-Based Image Editing with Single-Pair Supervision
Delta-Adapter extracts a semantic delta from a single image pair via a pre-trained vision encoder and injects it through a Perceiver adapter, enabling exemplar-based editing supervised by just one pair.
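A minimal sketch of how such a pipeline could fit together, assuming the delta is the token-wise difference between the vision-encoder outputs of the edited and source images and that the Perceiver adapter is a standard latent-query resampler; every name and size below is a hypothetical stand-in, not the paper's code:

```python
import torch
import torch.nn as nn

class PerceiverResampler(nn.Module):
    """Latent-query resampler standing in for the Perceiver adapter (details assumed)."""
    def __init__(self, dim=768, num_latents=16, num_heads=8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ff = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim * 4),
                                nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, delta_tokens):                       # (B, N, dim)
        q = self.latents.unsqueeze(0).expand(delta_tokens.size(0), -1, -1)
        out, _ = self.attn(q, delta_tokens, delta_tokens)  # latents read the delta
        return out + self.ff(out)                          # (B, num_latents, dim)

# Semantic delta from one exemplar pair, via any frozen vision encoder:
B, N, D = 1, 257, 768                               # e.g. ViT patch tokens
src, edit = torch.randn(B, N, D), torch.randn(B, N, D)
edit_embedding = PerceiverResampler()(edit - src)   # assumed definition of the delta
print(edit_embedding.shape)   # would condition the diffusion model's cross-attention
```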
- AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
A single motion module trained on videos adds temporally coherent animation to any personalized text-to-image model derived from the same base without additional tuning.
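Mechanically, the motion module is self-attention along the frame axis, slotted between the frozen per-frame layers so that only motion is learned. A simplified single-layer sketch with illustrative dimensions (the real module also uses positional encodings and multiple layers):

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Sketch of an AnimateDiff-style motion module (heavily simplified).

    Attends across the frame axis at every spatial location, leaving the
    frozen per-frame image layers untouched.
    """
    def __init__(self, dim=320, heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                                      # x: (B, F, C, H, W)
        B, F, C, H, W = x.shape
        t = x.permute(0, 3, 4, 1, 2).reshape(B * H * W, F, C)  # frames as a sequence
        n = self.norm(t)
        t = t + self.attn(n, n, n)[0]                          # residual temporal attention
        return t.reshape(B, H, W, F, C).permute(0, 3, 4, 1, 2)

frames = torch.randn(1, 16, 320, 8, 8)        # 16 frames of UNet features
print(TemporalAttention()(frames).shape)      # torch.Size([1, 16, 320, 8, 8])
```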
- The two clocks and the innovation window: When and how generative models learn rules
Generative models learn rules before memorizing data, creating an innovation window whose width depends on dataset size and rule complexity, observed in both diffusion and autoregressive architectures.
- Stylistic Attribute Control in Latent Diffusion Models
A technique for parametric stylistic control in latent diffusion models learns disentangled directions from synthetic datasets and applies them via guidance composition while preserving semantics.
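One plausible reading of "guidance composition", sketched below: start from classifier-free guidance and add the learned style directions, each weighted by its own strength, to the predicted noise. The exact composition rule and where the directions live are assumptions:

```python
import torch

def composed_guidance(eps_uncond, eps_text, style_dirs, weights, cfg_scale=7.5):
    """Classifier-free guidance plus weighted learned style directions.

    A plausible form of 'guidance composition'; the paper's exact rule is
    not reproduced here, so treat this as an assumption.
    """
    eps = eps_uncond + cfg_scale * (eps_text - eps_uncond)
    for d, w in zip(style_dirs, weights):
        eps = eps + w * d    # each direction nudges denoising toward one attribute
    return eps

shape = (1, 4, 64, 64)                         # SD-style latent noise prediction
eps_u, eps_t = torch.randn(shape), torch.randn(shape)
dirs = [torch.randn(shape) for _ in range(2)]  # stand-ins for learned directions
print(composed_guidance(eps_u, eps_t, dirs, weights=[0.4, -0.2]).shape)
```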
- Map2World: Segment Map Conditioned Text to 3D World Generation
Map2World produces scale-consistent 3D worlds from text and arbitrary segment maps via a detail enhancer that incorporates global structure information.
- PhysEdit: Physically-Consistent Region-Aware Image Editing via Adaptive Spatio-Temporal Reasoning
PhysEdit introduces adaptive reasoning depth and spatial masking to make image editing faster and more instruction-aligned without retraining the base model.
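The spatial-masking half can be pictured with the standard region-aware blend used across diffusion editors, sketched below; whether PhysEdit applies exactly this blend is an assumption:

```python
import torch

def masked_blend(edited, source, mask):
    """Region-aware latent blend: the edit proceeds inside the mask, the source
    is kept outside it, so unedited regions cannot drift during denoising.
    (A generic technique; PhysEdit's exact mechanism is not claimed here.)
    """
    return mask * edited + (1.0 - mask) * source

latents = torch.randn(1, 4, 64, 64)                  # source image latents
edited = latents + 0.1 * torch.randn_like(latents)   # stand-in for one edit step
mask = torch.zeros(1, 1, 64, 64)
mask[..., 16:48, 16:48] = 1.0                        # region the instruction targets
print(masked_blend(edited, latents, mask).shape)     # torch.Size([1, 4, 64, 64])
```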
- UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
UniVidX unifies diverse video generation tasks into one conditional diffusion model using stochastic condition masking, decoupled gated LoRAs, and cross-modal self-attention.
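Stochastic condition masking is the training-time trick of independently dropping each conditioning signal so that one model covers every subset at inference (text-to-video, image-to-video, and so on). A sketch, with hypothetical condition names and shapes:

```python
import torch

def mask_conditions(conds, nulls, keep_prob=0.5):
    """Independently replace each conditioning signal with a null embedding
    during training, so a single model learns every condition subset.
    Condition names, shapes, and the keep probability are assumptions.
    """
    out = {}
    for name, c in conds.items():
        keep = torch.rand(()).item() < keep_prob
        out[name] = c if keep else nulls[name]
    return out

conds = {"text": torch.randn(1, 77, 768), "first_frame": torch.randn(1, 257, 768)}
nulls = {k: torch.zeros_like(v) for k, v in conds.items()}  # learned nulls in practice
masked = mask_conditions(conds, nulls)
print({k: bool(v.abs().sum()) for k, v in masked.items()})  # which signals survived
```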
- MetaSR: Content-Adaptive Metadata Orchestration for Generative Super-Resolution
MetaSR adaptively orchestrates metadata in a DiT-based generative SR model to deliver up to 1 dB PSNR gains and 50% bitrate savings across diverse content and degradations.
- Camera Control for Text-to-Image Generation via Learning Viewpoint Tokens
Viewpoint tokens learned on a mixed 3D-rendered and photorealistic dataset enable precise camera control in text-to-image generation while factorizing geometry from appearance and transferring to unseen object categories.
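A minimal sketch of viewpoint-token conditioning, assuming the tokens come from a small MLP over explicit camera parameters and are appended to the prompt embedding; the parametrization, token count, and sizes are assumptions:

```python
import torch
import torch.nn as nn

class ViewpointTokens(nn.Module):
    """Assumed design: camera parameters become a few extra 'words' that the
    cross-attention layers read alongside the text tokens.
    """
    def __init__(self, cam_dim=3, token_dim=768, num_tokens=4):
        super().__init__()
        self.num_tokens = num_tokens
        self.mlp = nn.Sequential(nn.Linear(cam_dim, 512), nn.SiLU(),
                                 nn.Linear(512, num_tokens * token_dim))

    def forward(self, cam, text_tokens):             # cam: (B, cam_dim)
        vp = self.mlp(cam).view(cam.size(0), self.num_tokens, -1)
        return torch.cat([text_tokens, vp], dim=1)   # appended to the prompt

text = torch.randn(1, 77, 768)
cam = torch.tensor([[0.5, 0.1, 2.0]])       # e.g. azimuth, elevation, distance
print(ViewpointTokens()(cam, text).shape)   # torch.Size([1, 81, 768])
```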
- VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
VBench-2.0 is a benchmark suite that uses VLMs, LLMs, and anomaly-detection methods to automatically evaluate video generative models on five dimensions of intrinsic faithfulness: Human Fidelity, Controllability, Creativity, Physics, and Commonsense.
- CameraCtrl: Enabling Camera Control for Text-to-Video Generation
CameraCtrl enables accurate camera pose control in video diffusion models through a trained plug-and-play module and dataset choices emphasizing diverse camera trajectories with matching appearance.
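CameraCtrl represents each camera as a per-pixel Plücker embedding of the view rays, which its trained pose encoder then consumes. A standalone sketch of the embedding itself (the encoder and its injection points are omitted):

```python
import torch

def plucker_embedding(K_inv, R, t, H, W):
    """Per-pixel Plücker ray map for one camera (standalone sketch).

    K_inv: inverse intrinsics; R: camera-to-world rotation; t: camera center.
    Each pixel gets 6 channels, (o x d, d): the ray direction d plus its
    moment, which identify the view ray independent of parametrization.
    """
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).float()  # (H, W, 3)
    d = (pix @ K_inv.T) @ R.T               # per-pixel ray directions, world frame
    d = d / d.norm(dim=-1, keepdim=True)
    o = t.expand(H, W, 3)                   # ray origins = camera center
    return torch.cat([torch.cross(o, d, dim=-1), d], dim=-1)       # (H, W, 6)

emb = plucker_embedding(torch.eye(3), torch.eye(3), torch.zeros(3), 32, 32)
print(emb.shape)   # torch.Size([32, 32, 6]); the pose encoder consumes maps like this
```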
- VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
Open-source text-to-video and image-to-video diffusion models generate high-quality 1024x576 videos, with the I2V variant claimed as the first to strictly preserve reference image content.
- IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
IP-Adapter adds effective image prompting to text-to-image diffusion models using a lightweight decoupled cross-attention adapter that works alongside text prompts and other controls.
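The decoupled cross-attention is concrete enough to sketch: queries are shared with the frozen UNet, image tokens get their own key/value projections (the only newly trained weights in the original design), and the two attention results are summed with a tunable scale. Dimensions here are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoupledCrossAttention(nn.Module):
    """Minimal IP-Adapter-style decoupled cross-attention (sizes illustrative)."""
    def __init__(self, dim=768, heads=8, ip_scale=1.0):
        super().__init__()
        self.heads, self.ip_scale = heads, ip_scale
        self.to_q = nn.Linear(dim, dim)
        self.to_k_text, self.to_v_text = nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.to_k_img, self.to_v_img = nn.Linear(dim, dim), nn.Linear(dim, dim)

    def _attn(self, q, k, v):
        B, N, D = q.shape
        q, k, v = (t.view(B, -1, self.heads, D // self.heads).transpose(1, 2)
                   for t in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v)
        return out.transpose(1, 2).reshape(B, N, D)

    def forward(self, x, text_tokens, image_tokens):
        q = self.to_q(x)                                  # queries shared across streams
        text_out = self._attn(q, self.to_k_text(text_tokens), self.to_v_text(text_tokens))
        img_out = self._attn(q, self.to_k_img(image_tokens), self.to_v_img(image_tokens))
        return text_out + self.ip_scale * img_out         # decoupled streams, summed

attn = DecoupledCrossAttention()
x = torch.randn(1, 4096, 768)                 # UNet spatial tokens
out = attn(x, torch.randn(1, 77, 768), torch.randn(1, 4, 768))
print(out.shape)                              # torch.Size([1, 4096, 768])
```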
- Any2Any 3D Diffusion Models with Knowledge Transfer: A Radiotherapy Planning Study
DiffKT3D transfers priors from video diffusion models to 3D radiotherapy dose prediction via modality-specific embeddings and clinically guided RL, reducing voxel MAE from 2.07 to 1.93 and claiming SOTA over the GDP-HMM challenge winner.
- Step1X-Edit: A Practical Framework for General Image Editing
Step1X-Edit integrates a multimodal LLM with a diffusion decoder, trained on a custom high-quality dataset, to deliver image editing performance that surpasses open-source baselines and approaches proprietary models on the new GEdit-Bench.
- Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
A comprehensive survey of PEFT algorithms for large models, covering their performance, overhead, applications, and real-world system implementations.
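For a taste of the methods the survey covers, here is LoRA, perhaps the best-known reparameterization-style PEFT technique, in minimal form; the rank and scaling values are arbitrary:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """LoRA in minimal form: the frozen weight W gains a trainable low-rank
    update scale * B @ A, so only r * (in + out) parameters are trained.
    """
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base.requires_grad_(False)                    # frozen pretrained layer
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)   # 12288 trainable next to 590592 frozen parameters
```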