EditMGT: Unleashing potentials of masked generative transformers in image editing

Wei Chow, Linfeng Li, Lingdong Kong, Zefeng Li, Qi Xu, Hang Song, Tian Ye, Xian Wang, Jinbin Bai, Shilin Xu, et al · 2025 · arXiv 2512.11715

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

baseline 1 dataset 1 method 1

citation-polarity summary

baseline 1 use dataset 1 use method 1

representative citing papers

Aurora: Unified Video Editing with a Tool-Using Agent

cs.CV · 2026-05-18 · unverdicted · novelty 7.0

Aurora introduces a VLM-based agent that converts raw user video edit requests into structured conditioning inputs for a unified diffusion transformer, improving performance on underspecified tasks via a new benchmark.

Edit Where You Mean: Region-Aware Adapter Injection for Mask-Free Local Image Editing

cs.CV · 2026-04-26 · unverdicted · novelty 7.0

A co-trained adapter framework enables mask-free local editing in DiTs by factorizing edit semantics from spatial location and jointly learning a mask predictor.

UniEditBench: A Unified and Cost-Effective Benchmark for Image and Video Editing via Distilled MLLMs

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

UniEditBench unifies image and video editing evaluation with a nine-plus-eight operation taxonomy and cost-effective 4B/8B distilled MLLM evaluators that align with human judgments.

SIGMA: Semantic-Difference Instruction-Grounding Mask Annotator for Text-Driven Image Manipulation Localization

cs.CV · 2026-05-27 · unverdicted · novelty 6.0

SIGMA generates accurate IML masks via semantic feature differencing and instruction-guided cross-modal refinement, yielding a 1.1M training set that boosts six detectors by 18.34% F1 on five datasets.

LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

LLaDA2.0-Uni unifies multimodal understanding and generation inside one discrete diffusion large language model with a semantic tokenizer, MoE backbone, and diffusion decoder.

Steering Visual Generation in Unified Multimodal Models with Understanding Supervision

cs.CV · 2026-05-07 · unverdicted · novelty 5.0

Using understanding tasks as direct supervision during post-training improves image generation and editing in unified multimodal models.

citing papers explorer

Showing 6 of 6 citing papers.

Aurora: Unified Video Editing with a Tool-Using Agent cs.CV · 2026-05-18 · unverdicted · none · ref 5
Aurora introduces a VLM-based agent that converts raw user video edit requests into structured conditioning inputs for a unified diffusion transformer, improving performance on underspecified tasks via a new benchmark.
Edit Where You Mean: Region-Aware Adapter Injection for Mask-Free Local Image Editing cs.CV · 2026-04-26 · unverdicted · none · ref 42
A co-trained adapter framework enables mask-free local editing in DiTs by factorizing edit semantics from spatial location and jointly learning a mask predictor.
UniEditBench: A Unified and Cost-Effective Benchmark for Image and Video Editing via Distilled MLLMs cs.CV · 2026-04-17 · unverdicted · none · ref 9
UniEditBench unifies image and video editing evaluation with a nine-plus-eight operation taxonomy and cost-effective 4B/8B distilled MLLM evaluators that align with human judgments.
SIGMA: Semantic-Difference Instruction-Grounding Mask Annotator for Text-Driven Image Manipulation Localization cs.CV · 2026-05-27 · unverdicted · none · ref 34
SIGMA generates accurate IML masks via semantic feature differencing and instruction-guided cross-modal refinement, yielding a 1.1M training set that boosts six detectors by 18.34% F1 on five datasets.
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model cs.CV · 2026-04-22 · unverdicted · none · ref 8
LLaDA2.0-Uni unifies multimodal understanding and generation inside one discrete diffusion large language model with a semantic tokenizer, MoE backbone, and diffusion decoder.
Steering Visual Generation in Unified Multimodal Models with Understanding Supervision cs.CV · 2026-05-07 · unverdicted · none · ref 8
Using understanding tasks as direct supervision during post-training improves image generation and editing in unified multimodal models.

EditMGT: Unleashing potentials of masked generative transformers in image editing

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer