arXiv preprint arXiv:2310.01506 (2023)

Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu · 2023 · arXiv 2310.01506

18 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · dataset: 1 · method: 1

citation-polarity summary

background: 1 · use dataset: 1 · use method: 1

representative citing papers

ResetEdit: Precise Text-guided Editing of Generated Image via Resettable Starting Latent

cs.CV · 2026-04-28 · unverdicted · novelty 7.0

ResetEdit embeds a recoverable discrepancy signal during image generation in diffusion models to reconstruct an approximate original latent for high-fidelity text-guided editing.

$Z^2$-Sampling: Zero-Cost Zigzag Trajectories for Semantic Alignment in Diffusion Models

cs.CV · 2026-04-26 · unverdicted · novelty 7.0

Z²-Sampling implicitly realizes zero-cost zigzag trajectories for curvature-aware semantic alignment in diffusion models by reducing multi-step paths via operator dualities and temporal caching while synthesizing a directional derivative penalty.

Prompt-Guided Image Editing with Masked Logit Nudging in Visual Autoregressive Models

cs.CV · 2026-04-16 · unverdicted · novelty 7.0

Masked Logit Nudging aligns visual autoregressive model logits with source token maps under target prompts inside cross-attention masks, delivering top image editing results on PIE benchmarks and strong reconstructions on COCO and OpenImages while running faster than diffusion approaches.

RewardFlow: Generate Images by Optimizing What You Reward

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

RewardFlow unifies differentiable rewards including a new VQA-based one and uses a prompt-aware adaptive policy with Langevin dynamics to achieve state-of-the-art image editing and compositional generation.

LPNSR: Optimal Noise-Guided Diffusion Image Super-Resolution Via Learnable Noise Prediction

cs.CV · 2026-03-22 · conditional · novelty 7.0

LPNSR derives optimal intermediate noise for diffusion SR via MLE and implements it with an LR-guided noise predictor, reaching SOTA perceptual quality in 4 steps without text priors.

Preserving Source Video Realism: High-Fidelity Face Swapping for Cinematic Quality

cs.CV · 2025-12-08 · unverdicted · novelty 7.0

LivingSwap is the first video reference-guided face swapping model that uses keyframe conditioning and temporal stitching to preserve source video realism with high fidelity across long sequences.

Delta Rectified Flow Sampling for Text-to-Image Editing

cs.CV · 2025-09-01 · unverdicted · novelty 7.0

DRFS is a new inversion-free editing technique for rectified flow models that models source-target velocity discrepancies and applies a time-dependent shift to improve fidelity and unify prior methods like DDS and FlowEdit.

In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer

cs.CV · 2025-04-29 · unverdicted · novelty 7.0

ICEdit achieves state-of-the-art instructional image editing in Diffusion Transformers via in-context generation, requiring only 0.1% of prior training data and 1% trainable parameters.

VAGS: Velocity Adaptive Guidance Scale for Image Editing and Generation

cs.CV · 2026-05-15 · accept · novelty 6.0

VAGS adapts the CFG scale at each ODE step using velocity alignment signals to raise structural fidelity in editing and sample quality in generation over fixed-scale baselines.

LimeCross: Context-Conditioned Layered Image Editing with Structural Consistency

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

LimeCross enables text-guided editing of individual layers in composite images by conditioning on cross-layer context via bi-stream attention while preserving layer integrity and introducing the LayerEditBench benchmark.

Rethinking Where to Edit: Task-Aware Localization for Instruction-Based Image Editing

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

Task-aware localization via attention cues and feature centroids from source/target streams in IIE models improves non-edit consistency while preserving instruction following.

FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing

cs.CV · 2025-09-26 · conditional · novelty 6.0

FlashEdit delivers real-time localized text-guided image editing under 0.2 seconds via cycle-consistent one-step inversion, background shield, and sparsified spatial cross-attention, achieving over 150x speedup on PIE-Bench.

EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning

cs.CV · 2025-09-24 · unverdicted · novelty 6.0

EditVerse unifies image and video editing and generation in one transformer model via unified token sequences and in-context learning, trained jointly on curated video editing data plus image/video corpora and evaluated on a new instruction-based benchmark.

CPAM: Context-Preserving Adaptive Manipulation for Zero-Shot Real Image Editing

cs.CV · 2025-06-23 · unverdicted · novelty 6.0

CPAM proposes a context-preserving adaptive manipulation method for zero-shot real image editing in diffusion models via preservation adaptation and localized extraction modules, outperforming prior techniques on a new IMBA benchmark.

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

cs.CV · 2024-01-25 · unverdicted · novelty 6.0

Grounded SAM integrates Grounding DINO and SAM to support text-prompted open-world detection and segmentation, achieving 48.7 mean AP on SegInW zero-shot with the base detector and huge segmenter.

Stable and Near-Reversible Diffusion ODE Solvers for Image Editing

cs.CV · 2026-05-12 · unverdicted · novelty 5.0

Near-reversible Runge-Kutta diffusion ODE solvers with vector-field smoothing improve stability and edit fidelity for large changes in text-guided image editing compared to exactly reversible alternatives.

Semantic Granularity Navigation in Image Editing

cs.CV · 2026-05-20

DirectEdit: Step-Level Accurate Inversion for Flow-Based Image Editing

cs.CV · 2026-05-04

citing papers explorer

Showing 18 of 18 citing papers.

ResetEdit: Precise Text-guided Editing of Generated Image via Resettable Starting Latent
cs.CV · 2026-04-28 · unverdicted · none · ref 10

$Z^2$-Sampling: Zero-Cost Zigzag Trajectories for Semantic Alignment in Diffusion Models
cs.CV · 2026-04-26 · unverdicted · none · ref 14

Prompt-Guided Image Editing with Masked Logit Nudging in Visual Autoregressive Models
cs.CV · 2026-04-16 · unverdicted · none · ref 22

RewardFlow: Generate Images by Optimizing What You Reward
cs.CV · 2026-04-09 · unverdicted · none · ref 19

LPNSR: Optimal Noise-Guided Diffusion Image Super-Resolution Via Learnable Noise Prediction
cs.CV · 2026-03-22 · conditional · none · ref 40

Preserving Source Video Realism: High-Fidelity Face Swapping for Cinematic Quality
cs.CV · 2025-12-08 · unverdicted · none · ref 20

Delta Rectified Flow Sampling for Text-to-Image Editing
cs.CV · 2025-09-01 · unverdicted · none · ref 13

In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer
cs.CV · 2025-04-29 · unverdicted · none · ref 16

VAGS: Velocity Adaptive Guidance Scale for Image Editing and Generation
cs.CV · 2026-05-15 · accept · none · ref 30

LimeCross: Context-Conditioned Layered Image Editing with Structural Consistency
cs.CV · 2026-05-11 · unverdicted · none · ref 21

Rethinking Where to Edit: Task-Aware Localization for Instruction-Based Image Editing
cs.CV · 2026-04-22 · unverdicted · none · ref 17

FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing
cs.CV · 2025-09-26 · conditional · none · ref 13

EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning
cs.CV · 2025-09-24 · unverdicted · none · ref 11

CPAM: Context-Preserving Adaptive Manipulation for Zero-Shot Real Image Editing
cs.CV · 2025-06-23 · unverdicted · none · ref 21

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks
cs.CV · 2024-01-25 · unverdicted · none · ref 21

Stable and Near-Reversible Diffusion ODE Solvers for Image Editing
cs.CV · 2026-05-12 · unverdicted · none · ref 21

Semantic Granularity Navigation in Image Editing
cs.CV · 2026-05-20 · unreviewed · ref 7

DirectEdit: Step-Level Accurate Inversion for Flow-Based Image Editing
cs.CV · 2026-05-04 · unreviewed · ref 25
