Recognition: unknown
PostureObjectStitch: Anomaly Image Generation Considering Assembly Relationships in Industrial Scenarios
Pith reviewed 2026-05-10 14:13 UTC · model grok-4.3
The pith
A diffusion model generates industrial anomaly images that respect component assembly poses and relationships.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PostureObjectStitch separates multi-view images into high-frequency, texture, and RGB features via condition decoupling, then adapts these features across diffusion time-steps through temporal modulation to build consistent coarse-to-fine outputs. A conditional loss strengthens key industrial elements while a geometric prior directs component placement to satisfy assembly relationships.
What carries the argument
Condition decoupling of multi-view inputs into separate feature streams, combined with temporal modulation in diffusion and a geometric prior that enforces assembly relationships.
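The mechanism is easiest to see in a toy form. Below is a minimal, dependency-light sketch (numpy only) of what condition decoupling and feature temporal modulation could look like; the specific filters, the linear timestep schedule, and the function names are illustrative assumptions, since the paper's implementation is not public.

```python
# Illustrative sketch only: the concrete filters and the timestep weighting are
# assumptions about one plausible realization of "condition decoupling" and
# "feature temporal modulation", not the authors' implementation.
import numpy as np

def local_mean(img: np.ndarray, k: int = 5) -> np.ndarray:
    """Box filter via shifted sums; keeps the sketch dependency-free."""
    pad = np.pad(img, k // 2, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    h, w = img.shape
    for dy in range(k):
        for dx in range(k):
            out += pad[dy:dy + h, dx:dx + w]
    return out / (k * k)

def decouple_conditions(view: np.ndarray) -> dict:
    """Split one H x W x 3 uint8 view into RGB, high-frequency, and texture streams."""
    rgb = view.astype(np.float64) / 255.0
    gray = rgb.mean(axis=-1)
    high_freq = gray - local_mean(gray)   # edges, seams, part boundaries
    texture = np.sqrt(np.maximum(0.0, local_mean(gray ** 2) - local_mean(gray) ** 2))
    return {"rgb": rgb, "high_freq": high_freq, "texture": texture}

def temporal_weights(t: int, num_steps: int = 1000) -> dict:
    """Timestep-dependent mixing: coarse RGB layout early (large t), fine detail late."""
    s = t / num_steps                      # 1.0 = pure noise, 0.0 = clean image
    return {"rgb": s, "high_freq": 1.0 - s, "texture": 0.5}   # assumed schedule, not from the paper

if __name__ == "__main__":
    view = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)  # stand-in for one multi-view input
    streams = decouple_conditions(view)
    for t in (900, 500, 100):
        w = temporal_weights(t)
        print(t, {k: round(v, 2) for k, v in w.items()}, {k: v.shape for k, v in streams.items()})
```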
If this is right
- The generated images can supplement limited real anomaly data to train stronger industrial detection models.
- Progressive generation maintains multi-view consistency while adding fine details only after coarse structure is set.
- The method is shown to outperform prior techniques on the MureCom dataset and the contributed DreamAssembly dataset.
- Downstream anomaly detection performance improves when models are trained with the assembly-aware synthetic images.
Where Pith is reading between the lines
- Similar geometric priors could be tested in other constrained generation tasks such as robotic scene assembly or mechanical part layouts.
- The feature decoupling step may help diffusion models in any domain where multiple input views must remain consistent with physical structure.
- If the prior scales without heavy tuning, it offers a route to reduce manual labeling in quality-control pipelines for complex products.
Load-bearing premise
The geometric prior and conditional loss together force generated images to show correct component positions and semantics without creating new misalignments or visual artifacts.
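To make this premise concrete, here is a hedged sketch of one form such a geometric prior could take: a quadratic penalty on each component's deviation from the center and in-plane rotation prescribed by an assembly specification. The weights, the 2D pose parameterization, and the field names are assumptions for illustration, not the paper's definition.

```python
# Minimal sketch of a possible "geometric prior" penalty, assuming each component's
# target placement is given as a 2D center and in-plane rotation from an assembly spec.
import numpy as np

def pose_penalty(pred_center, pred_angle_deg, target_center, target_angle_deg,
                 w_pos: float = 1.0, w_rot: float = 0.1) -> float:
    """Penalty that grows with translation and rotation error of one component."""
    pos_err = np.linalg.norm(np.asarray(pred_center, float) - np.asarray(target_center, float))
    ang_err = (pred_angle_deg - target_angle_deg + 180.0) % 360.0 - 180.0   # wrap to [-180, 180)
    return w_pos * pos_err ** 2 + w_rot * ang_err ** 2

def assembly_prior(components: list) -> float:
    """Sum of per-component pose penalties; would be added to the diffusion training loss."""
    return sum(pose_penalty(c["pred_center"], c["pred_angle"],
                            c["target_center"], c["target_angle"]) for c in components)

if __name__ == "__main__":
    parts = [
        {"pred_center": (101.0, 52.0), "pred_angle": 3.0,
         "target_center": (100.0, 50.0), "target_angle": 0.0},   # slightly misplaced bolt
        {"pred_center": (40.0, 40.0), "pred_angle": 92.0,
         "target_center": (40.0, 40.0), "target_angle": 90.0},   # well-placed bracket
    ]
    print("assembly prior penalty:", round(assembly_prior(parts), 3))
```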
What would settle it
Quantitative metrics or visual checks against real assembled industrial images: if generated parts are frequently rotated or shifted relative to one another in violation of the claimed assembly rules, the load-bearing premise fails; if pose deviations stay within assembly tolerances, it holds.
Original abstract
Image generation technology can synthesize condition-specific images to supplement real-world industrial anomaly data and enhance anomaly detection model performance. Existing generation techniques rarely account for the pose and orientation of industrial components in assembly, making the generated images difficult to utilize for downstream application. To solve this, we propose a novel image synthesis approach, called PostureObjectStitch, that achieves accurate generation to meet the requirement of industrial assembly. A condition decoupling approach is introduced to separate input multi-view images into high-frequency, texture, and RGB features. The feature temporal modulation mechanism adapts these features across diffusion model time-steps, enabling progressive generation from coarse to fine details while maintaining consistency. To ensure semantic accuracy, we introduce a conditional loss that enhances critical industrial elements and a geometric prior that guides component positioning for correct assembly relationships. Comprehensive experimental results on the MureCom dataset, our newly contributed DreamAssembly dataset, and the downstream application validate the outstanding performance of our method.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes PostureObjectStitch, a diffusion-based image synthesis method for generating anomaly images of industrial components that respects assembly relationships. It decouples multi-view inputs into high-frequency, texture, and RGB features, applies feature temporal modulation across diffusion timesteps for coarse-to-fine generation, introduces a conditional loss to emphasize critical elements, and employs a geometric prior to enforce correct component positioning. The approach is evaluated on the MureCom dataset, the newly contributed DreamAssembly dataset, and a downstream anomaly detection task, with claims of superior performance over existing methods.
Significance. If the central claims are substantiated, the work addresses an important gap in synthetic data generation for industrial anomaly detection by explicitly modeling assembly poses and relationships, which prior diffusion-based approaches largely ignore. The release of the DreamAssembly dataset represents a concrete, reusable contribution that could benchmark future methods in this domain. The combination of geometric priors with conditional losses in a diffusion framework offers a technically grounded direction for controllable generation in structured scenes.
Major comments (1)
- [Experimental Results] The central technical claim—that the geometric prior and conditional loss successfully enforce correct assembly relationships and semantic accuracy—rests on indirect evidence from downstream anomaly detection gains rather than direct quantitative validation. No metrics for pose deviation, component alignment error, overlap, or geometric fidelity on the generated DreamAssembly outputs are reported, leaving open the possibility that performance improvements arise from texture realism or other factors unrelated to the proposed priors.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and for recognizing the importance of modeling assembly relationships in industrial anomaly image generation. We address the single major comment below.
Point-by-point responses
Referee: The central technical claim—that the geometric prior and conditional loss successfully enforce correct assembly relationships and semantic accuracy—rests on indirect evidence from downstream anomaly detection gains rather than direct quantitative validation. No metrics for pose deviation, component alignment error, overlap, or geometric fidelity on the generated DreamAssembly outputs are reported, leaving open the possibility that performance improvements arise from texture realism or other factors unrelated to the proposed priors.
Authors: We agree that direct quantitative metrics would provide stronger and more isolated evidence for the contribution of the geometric prior and conditional loss. The current evaluation relies on downstream anomaly detection performance on DreamAssembly (plus qualitative results), which demonstrates practical utility but does not directly quantify geometric fidelity. In the revised manuscript we will add explicit metrics on the generated DreamAssembly outputs, including pose deviation, component alignment error, and overlap ratios, computed by comparing synthesized assemblies against the known ground-truth configurations provided in the dataset. These will be reported alongside the existing downstream results to better separate the effect of the proposed priors from general improvements in texture or realism.
Revision: yes
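For reference, the promised geometric-fidelity checks could be computed along these lines, assuming DreamAssembly ships per-component ground-truth poses and masks; the exact definitions below (degrees, pixels, mask IoU) are our assumptions rather than the manuscript's.

```python
# Hedged sketch of pose deviation, alignment error, and overlap metrics for
# generated assemblies, given per-component ground truth (assumed available).
import numpy as np

def pose_deviation(pred_angle_deg: float, gt_angle_deg: float) -> float:
    """Absolute in-plane rotation error in degrees, wrapped to [0, 180]."""
    d = abs(pred_angle_deg - gt_angle_deg) % 360.0
    return min(d, 360.0 - d)

def alignment_error(pred_center, gt_center) -> float:
    """Euclidean distance between predicted and ground-truth component centers (pixels)."""
    return float(np.linalg.norm(np.asarray(pred_center, float) - np.asarray(gt_center, float)))

def mask_iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Overlap ratio (intersection over union) between binary component masks."""
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return float(inter) / float(union) if union else 1.0

if __name__ == "__main__":
    gt = np.zeros((64, 64), bool); gt[10:30, 10:30] = True
    pred = np.zeros((64, 64), bool); pred[12:32, 11:31] = True   # slightly shifted synthetic part
    print("pose deviation (deg):", pose_deviation(88.0, 90.0))
    print("alignment error (px):", round(alignment_error((21.5, 21.0), (20.0, 20.0)), 2))
    print("overlap IoU:", round(mask_iou(pred, gt), 3))
```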
Circularity Check
No significant circularity; novel components added to standard diffusion models
Full rationale
The paper proposes independent additions (condition decoupling, feature temporal modulation, conditional loss, geometric prior) to diffusion models and contributes a new DreamAssembly dataset. These are described as new mechanisms for enforcing assembly relationships and semantic accuracy rather than being defined in terms of the outputs they produce or fitted to the target results by construction. Experimental validation on MureCom, DreamAssembly, and downstream tasks is presented without circular reduction of predictions to inputs, self-citation chains, or ansatz smuggling. The derivation chain is self-contained.
Axiom & Free-Parameter Ledger
Free parameters (1)
- loss weighting coefficients
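The entry above presumably refers to the weights that balance the conditional loss and the geometric prior against the base diffusion objective; a plausible, assumed form (the symbols are ours, not the paper's) is:

```latex
% Assumed composite objective; \lambda_{\mathrm{cond}} and \lambda_{\mathrm{geo}} are the
% free loss-weighting coefficients noted in the ledger.
\mathcal{L}_{\mathrm{total}}
  = \mathcal{L}_{\mathrm{diffusion}}
  + \lambda_{\mathrm{cond}}\,\mathcal{L}_{\mathrm{cond}}
  + \lambda_{\mathrm{geo}}\,\mathcal{L}_{\mathrm{geo}}
```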
Axioms (1)
- Domain assumption: Diffusion models conditioned on decoupled multi-view features can produce consistent progressive generation from coarse to fine while preserving assembly semantics.
Invented entities (1)
- geometric prior (no independent evidence)