Med-Banana: Learning Quality-Controlled Medical Image Editing from Success-and-Failure Trajectories
read the original abstract
Text-guided medical image editing must satisfy the requested pathology while preserving anatomy, modality-specific appearance, and clinical plausibility. However, existing datasets largely supervise editors with final accepted edits and discard the failed attempts produced during generation. We argue that these failures provide essential supervision for quality control: they specify what should be rejected, why an edit is medically or visually invalid, and how the instruction should be revised. We present Med-Banana, a trajectory-supervised framework for quality-controlled medical image editing. We introduce Med-Banana-80K, a large-scale resource of success-and-failure editing trajectories with candidate images, verification outcomes, rejection reasons, and prompt refinements. Building on it, Med-Banana jointly trains an editor, verifier, and refiner, enabling edit--verify--refine inference from accepted and rejected attempts. Experiments across MLLM judges, blind expert assessment, source-preservation and real--synthetic separability probes demonstrate consistent improvements over open medical image editors. Code and data are publicly available.
This paper has not been read by Pith yet.
Forward citations
Cited by 2 Pith papers
-
LLM-as-a-Judge in Healthcare: A Scoping Analysis of Applications, Methods, and Human Alignment
Scoping review of 134 studies on LLM-as-a-Judge in healthcare finds concentration in clinical decision support and NLP, frequent use of OpenAI models with prompt engineering, and moderate-to-strong human alignment whe...
-
Beyond Visual Forensics: Auditing Multimodal Robustness for Synthetic Medical Image Detection
VLMs for synthetic medical image detection overweight accompanying metadata, as demonstrated by a new paired benchmark that fixes the image and varies controlled text variants across imaging modalities.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.