pith. machine review for the scientific record.

arxiv: 2605.13493 · v1 · submitted 2026-05-13 · 💻 cs.CV

Recognition: unknown

PhysEditBench: A Protocol-Conditioned Benchmark for Dense Physical-Map Prediction with Image Editors

Authors on Pith · no claims yet

Pith reviewed 2026-05-14 20:12 UTC · model grok-4.3

classification 💻 cs.CV
keywords image editors · physical map prediction · dense prediction benchmark · depth estimation · surface normals · albedo · roughness · metallic maps

The pith

General image editors can output physical maps from RGB photos but trail specialized models on depth, normals, and albedo under fixed rules.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether prompt-guided image editors can serve as drop-in replacements for task-specific networks that predict depth, surface normals, albedo, roughness, and metallic maps from one photograph. It builds PhysEditBench to enforce the same input format, output format, and scoring rules for every editor and every map type, drawing test images from OpenRooms-FF, InteriorVerse, and procedurally generated scenes. Results show that dedicated models still lead on geometric properties, while editors reach or surpass them on some scalar scores for roughness and metallic yet produce structurally broken or lighting-sensitive outputs. If the gap narrows under these same rules, a single editor could replace multiple specialized predictors without retraining. The benchmark deliberately withholds optimal prompting so that reported scores reflect performance under controlled conditions rather than best-case interaction.
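For concreteness, here is a minimal sketch of what "one fixed protocol per target" could look like in code. The names below (Protocol, evaluate_editor, the prompt and metric fields) are illustrative assumptions, not the benchmark's actual interface.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np


@dataclass(frozen=True)
class Protocol:
    """A fixed evaluation protocol for one target map (hypothetical structure)."""
    target: str            # "depth", "normal", "albedo", "roughness", or "metallic"
    prompt_template: str   # the single instruction every editor receives
    output_format: str     # e.g. "single-channel map encoded as an 8-bit grayscale image"
    metric: Callable[[np.ndarray, np.ndarray, np.ndarray], float]  # (pred, gt, mask) -> score


def evaluate_editor(editor, samples, protocol: Protocol) -> float:
    """Score one editor on one target, applying the same protocol to every image."""
    scores = []
    for rgb, gt_map, valid_mask in samples:
        pred = editor(rgb, protocol.prompt_template)  # no per-image prompt tuning allowed
        scores.append(protocol.metric(pred, gt_map, valid_mask))
    return float(np.mean(scores))
```

The point of the structure is that every editor and every specialized baseline passes through the same evaluate_editor loop for a given target, so differences in scores cannot come from differences in prompting or scoring.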

Core claim

Under protocol-conditioned evaluation that fixes allowed inputs, output formats, and scoring procedures, general-purpose image editors produce lower accuracy than specialized dense-prediction models on depth, normal, and albedo maps; on roughness and metallic maps they can equal or exceed the baselines on certain scalar metrics while still exhibiting structural errors, sparsity artifacts, and lighting sensitivity.

What carries the argument

PhysEditBench, the protocol-conditioned benchmark that pairs single RGB images with ground-truth physical maps, valid-region masks, and lighting-stress subsets, and that scores every editor under a single fixed protocol per target map.
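A rough sketch of what one such benchmark record and a mask-respecting score might look like; the field names below are assumptions for illustration, not PhysEditBench's released schema.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class BenchmarkSample:
    rgb: np.ndarray           # H x W x 3 input photograph
    gt_map: np.ndarray        # H x W (or H x W x 3 for normals) ground-truth physical map
    valid_mask: np.ndarray    # H x W boolean mask; only these pixels count toward the score
    lighting_stress: bool     # True if the image belongs to a lighting-stress subset


def masked_mae(pred: np.ndarray, sample: BenchmarkSample) -> float:
    """Mean absolute error restricted to the valid region."""
    m = sample.valid_mask
    return float(np.abs(pred[m] - sample.gt_map[m]).mean())
```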

If this is right

  • A single general editor could eventually handle multiple physical-property predictions if structural consistency improves under the same fixed rules.
  • Material maps such as roughness remain sensitive to lighting changes even when scalar metrics look competitive.
  • Curated datasets with scene-level sampling and quality masks provide a stable substrate for comparing editors against specialized models.
  • Stronger editors already generate more map-like outputs, indicating that scaling or better training may close the gap on geometric targets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The protocol approach could be applied to other dense-prediction tasks to make fair comparisons between general and specialized models.
  • If editors improve on metallic maps generated procedurally, the same method might supply ground truth for additional hard-to-measure properties.
  • Persistent lighting sensitivity suggests that future editor training should include explicit illumination variation during fine-tuning.
  • Success on this benchmark would reduce the engineering cost of deploying separate networks for each physical map.

Load-bearing premise

The chosen fixed protocols represent the realistic capabilities of image editors without letting them use more effective prompting or interaction strategies.

What would settle it

Re-running the identical test images and editors but replacing the benchmark's fixed protocol with each editor's own best free-form prompting and observing whether map accuracy rises substantially above the protocol scores.
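A sketch of how that comparison could be scored; the editor, the prompt sets, and the higher-is-better metric are placeholders, not anything the paper specifies.

```python
import numpy as np


def protocol_gap(editor, samples, fixed_prompt, candidate_prompts, metric):
    """Estimate how much the fixed protocol understates an editor's best-case accuracy.

    Assumes `metric(pred, gt, mask)` returns a higher-is-better score; for error
    metrics, take the per-image minimum over prompts instead of the maximum.
    """
    fixed_scores, best_scores = [], []
    for rgb, gt_map, valid_mask in samples:
        fixed_scores.append(metric(editor(rgb, fixed_prompt), gt_map, valid_mask))
        best_scores.append(max(
            metric(editor(rgb, p), gt_map, valid_mask) for p in candidate_prompts
        ))
    return float(np.mean(best_scores) - np.mean(fixed_scores))
```

A gap near zero would support the load-bearing premise; a large positive gap would mean the protocol scores understate what editors can actually do.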

Figures

Figures reproduced from arXiv: 2605.13493 by Jiaxin Yang, Muxin Liu, Weixuan Liu, Xiaojuan Qi, Yu Hou, Zeming Chen, Ze Yuan, Zhongrui Wang.

Figure 1. The motivation and result snapshot of PhysEditBench. PhysEditBench evaluates whether general-purpose image editors can predict dense physical maps from a single RGB indoor image without task-specific training. The benchmark covers depth, normal, albedo, roughness, and metallic maps under target-specific access protocols. The figure shows representative predictions and primary-metric comparisons between pro…
Figure 2. Benchmark overview. The benchmark is organized around three components: dataset construction, target-specific experimental design, and protocol diagnostics. Left: scene-level metadata are built from source-wise sampling, multi-view annotation, VLM-assisted auditing, and reviewer arbitration, followed by main-axis and photometric stress-axis split construction. Middle: each target uses a fixed access setti…
Figure 3. Representative qualitative examples across the five evaluated targets. Each row shows the input RGB image, ground-truth map, and predictions from representative specialized predictors and proprietary image editors for one target. The examples complement the quantitative results by illustrating differences in spatial structure, map fidelity, and target-specific failure patterns across depth, normal, albedo,…
Figure 4. The three fixed RGB–normal exemplar pairs used in the main normal-evaluation protocol. Each row shows …
Figure 5. Representative qualitative examples for depth. Each row shows the input RGB image, ground-truth map, …
Figure 6. Representative qualitative examples for normal. Each two rows show the input RGB image, ground-truth …
Figure 7. Representative qualitative examples for albedo. Each row shows the input RGB image, ground-truth map, …
Figure 8. Representative qualitative examples for roughness. Each row shows the input RGB image, ground-truth …
Figure 9. Representative qualitative examples for metallic. Each two rows show the input RGB image, ground-truth …
Figure 10. Representative WAN2.7-Image normal-representation failures under the evaluated normal access setting.
Figure 11. Representative InteriorVerse metallic-channel reliability examples. Within each pair, the left image is the …
Figure 12. Representative InteriorVerse roughness-channel reliability examples. Within each pair, the left image is …
read the original abstract

Can general-purpose image editors predict physical maps from a single RGB image? General-purpose image editors differ from standard task-specific dense-prediction models: they do not directly take an image and output a physical map. Instead, they must be guided by prompts, examples, or image-based textual cues. To this end, we introduce PhysEditBench, a novel protocol-conditioned benchmark to evaluate and standardize image editors in dense physical-map prediction that covers five targets: depth, normal, albedo, roughness, and metallic maps. For evaluation data, we build a target-dependent benchmark substrate. We use OpenRooms-FF for depth, surface normal, albedo, and roughness; InteriorVerse as an additional source for depth, normal, and albedo; and a new procedurally generated source for metallic maps. We curate the data with quality checks, valid-region masks, scene-level sampling, and lighting-based stress subsets to ensure reliable and diverse evaluation. For each target, PhysEditBench defines a fixed protocol that specifies the allowed input, expected output format, and scoring procedure. Each score, therefore, reflects the performance of a model under a specified protocol, rather than its best possible performance under all prompts or interaction modes. Experimental results show that specialized models remain much stronger on depth, normal, and albedo, and stronger image editors can produce more reasonable map-like outputs. For roughness and metallic, image editors can match or outperform specialized baselines on some scalar metrics, but they still suffer from structural errors, sparsity effects, and sensitivity to lighting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces PhysEditBench, a protocol-conditioned benchmark for evaluating general-purpose image editors on dense physical-map prediction from single RGB images across five targets: depth, surface normals, albedo, roughness, and metallic. It constructs target-specific evaluation substrates from OpenRooms-FF, InteriorVerse, and procedurally generated metallic data, applying curation steps including quality checks, valid-region masks, scene sampling, and lighting-based stress subsets. Fixed protocols per target specify allowed inputs, output formats, and scoring procedures. Experiments show specialized models outperform editors on depth, normal, and albedo; editors produce more reasonable outputs with stronger models and can match or exceed specialized baselines on scalar metrics for roughness and metallic, though they exhibit structural errors, sparsity, and lighting sensitivity.

Significance. If the benchmark protocols and results hold, the work provides a standardized, reproducible framework for assessing image editors on physical-property tasks that are conventionally addressed by task-specific models. It highlights domain-specific strengths and gaps, particularly the editors' competitiveness on scalar roughness/metallic metrics versus persistent structural shortcomings. Credit is due for the multi-source dataset curation, explicit lighting stress tests, and separation of scalar versus structural evaluation, which together enable targeted diagnosis of editor limitations.

major comments (2)
  1. [Abstract, §3] Abstract and evaluation protocol description: the fixed protocols are central to the claim that scores reflect performance 'under a specified protocol, rather than its best possible performance,' yet the manuscript provides only high-level specifications without the exact prompt templates, in-context example counts, or output-format constraints used for each target. This directly affects reproducibility and the interpretation of editor vs. specialized-model comparisons.
  2. [§4, Table 2] Results section on roughness and metallic: the claim that editors 'can match or outperform specialized baselines on some scalar metrics' is load-bearing for the overall narrative, but the manuscript does not report the precise scalar metrics (e.g., MAE, RMSE) or the number of lighting-stress samples on which this holds, nor does it quantify the structural errors (e.g., via edge F1 or spatial correlation) that are said to remain.
minor comments (2)
  1. [§2.2] Dataset curation paragraph: state the exact number of retained scenes per target after quality checks and the precise definition of 'valid-region masks' so that future extensions can replicate the substrate.
  2. [Figure 4] Figure captions and legends: ensure every qualitative example explicitly annotates which protocol (including any example-image count) was applied to each editor output.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback and positive assessment of PhysEditBench. We address each major comment below and will incorporate the suggested clarifications in the revised manuscript.

read point-by-point responses
  1. Referee: [Abstract, §3] Abstract and evaluation protocol description: the fixed protocols are central to the claim that scores reflect performance 'under a specified protocol, rather than its best possible performance,' yet the manuscript provides only high-level specifications without the exact prompt templates, in-context example counts, or output-format constraints used for each target. This directly affects reproducibility and the interpretation of editor vs. specialized-model comparisons.

    Authors: We agree that the exact prompt templates, in-context example counts, and output-format constraints are necessary for full reproducibility. In the revised version we will expand the protocol description in §3 (and add a dedicated appendix) with the complete prompt templates, example counts, and output-format constraints for each of the five targets. These details will also be released with the benchmark code and data. revision: yes

  2. Referee: [§4, Table 2] Results section on roughness and metallic: the claim that editors 'can match or outperform specialized baselines on some scalar metrics' is load-bearing for the overall narrative, but the manuscript does not report the precise scalar metrics (e.g., MAE, RMSE) or the number of lighting-stress samples on which this holds, nor does it quantify the structural errors (e.g., via edge F1 or spatial correlation) that are said to remain.

    Authors: We acknowledge that the current presentation of scalar results and structural-error quantification is insufficient. In the revision we will update Table 2 and the §4 text to report the exact MAE and RMSE values, the precise number of lighting-stress samples, and additional structural metrics (edge F1 and spatial correlation) for roughness and metallic. This will make the comparison with specialized baselines fully transparent. revision: yes
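For readers unfamiliar with the metrics named above, a minimal sketch of masked RMSE, spatial (Pearson) correlation, and a gradient-based edge F1 follows. These are generic formulations assumed here for illustration, not necessarily the exact definitions the revision will adopt.

```python
import numpy as np


def masked_rmse(pred, gt, mask):
    """Root-mean-square error over valid pixels only."""
    d = pred[mask] - gt[mask]
    return float(np.sqrt(np.mean(d ** 2)))


def spatial_correlation(pred, gt, mask):
    """Pearson correlation of predicted and ground-truth values over valid pixels."""
    p, g = pred[mask], gt[mask]
    return float(np.corrcoef(p, g)[0, 1])


def edge_f1(pred, gt, mask, thresh=0.1):
    """F1 between binary edge maps built from finite-difference gradient magnitudes."""
    def edges(x):
        gy, gx = np.gradient(x)
        return (np.hypot(gx, gy) > thresh) & mask

    pe, ge = edges(pred), edges(gt)
    tp = np.logical_and(pe, ge).sum()
    prec = tp / max(pe.sum(), 1)
    rec = tp / max(ge.sum(), 1)
    return float(2 * prec * rec / max(prec + rec, 1e-8))
```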

Circularity Check

0 steps flagged

No significant circularity

full rationale

This is an empirical benchmark paper that defines fixed protocols for evaluating image editors on external datasets (OpenRooms-FF, InteriorVerse, procedural metallic maps) with explicit curation steps and quality checks. No derivations, equations, fitted parameters, or self-citation chains are present that reduce any claim to its own inputs by construction. All performance claims are qualified against the stated protocols and external ground truth, satisfying the self-contained benchmark criterion.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The benchmark rests on domain assumptions about data quality and protocol fairness rather than new derivations or entities.

axioms (2)
  • domain assumption Curated datasets from OpenRooms-FF, InteriorVerse, and the procedural metallic source provide accurate and representative ground truth for physical maps.
    Invoked in the data curation section of the abstract; no independent verification details provided beyond quality checks.
  • domain assumption Fixed protocols reflect realistic usage of image editors without introducing bias from prompt variation.
    Central to the protocol-conditioned design described in the abstract.

pith-pipeline@v0.9.0 · 5599 in / 1422 out tokens · 39580 ms · 2026-05-14T20:12:51.860093+00:00 · methodology

discussion (0)

