pith.

arXiv: 2605.17500 · v1 · pith:WH4KMRSQ · submitted 2026-05-17 · 💻 cs.LG · cs.CV

The Silent Brush: Evaluating Artistic Style Leakage in AI Art Generation

Pith reviewed 2026-05-20 14:28 UTC · model grok-4.3

classification 💻 cs.LG cs.CV
keywords: The Silent Brush · artistic style leakage · text-to-image models · evaluation protocol · diffusion models · style interaction · AI art generation

The pith

Text-to-image models reproduce artistic styles from their training data even when the prompt never references them, because individual artworks are encoded with uneven strength and interact with one another during generation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that generative models trained on web data containing artworks can output stylistic traits from those works even when the prompt never mentions them. This unintended resurfacing, called The Silent Brush, is traced to differences in how strongly individual artworks are represented inside the model and how those representations combine during generation. A sympathetic reader would care because it clarifies a mechanism of unrequested style reuse that existing near-duplicate retrieval and membership-inference checks do not catch. The authors introduce Art Arena, a protocol that quantifies encoding strength, pairwise interactions, and the rate of unprompted style appearance, then apply it to Stable Diffusion v1.5, SDXL, and SANA-1.5. The central finding is that these differences produce asymmetric blending, where stronger representations dominate weaker ones in the output.

Core claim

The Silent Brush is the reappearance of stylistic traits from training artworks in model generations without explicit prompt mention. Art Arena is the evaluation protocol that measures how strongly artworks are encoded, how they interact with one another, and how frequently their stylistic traits reappear in generated outputs. Results on widely used diffusion models show that The Silent Brush arises from differences in representational strength and interaction dynamics between artworks, leading to asymmetric blending in model generations.

What carries the argument

Art Arena, the evaluation protocol that measures representational strength of individual artworks, their interaction dynamics, and the frequency of unprompted stylistic reappearance in outputs.
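As a rough illustration of two of the quantities Art Arena reports, the sketch below assumes that every generated image and every reference artwork has already been mapped to a style embedding (for instance by a CLIP or CSD encoder, two of the proximities named in the figure captions) and uses cosine similarity as proximity. The function names, the averaging rule for encoding strength, and the threshold `tau` are hypothetical choices for illustration, not the paper's definitions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two style embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def encoding_strength(reference: Vector, imitations: List[Vector]) -> float:
    """Hypothetical strength score: mean proximity to the reference artwork of images
    generated from a prompt that names it explicitly (e.g. "<title> in the style of <artist>")."""
    return sum(cosine(reference, g) for g in imitations) / len(imitations)


def reappearance_frequency(reference: Vector, unprompted: List[Vector], tau: float) -> float:
    """Hypothetical leakage rate: share of images from prompts that never name the
    artwork whose proximity to it reaches the threshold tau."""
    hits = sum(1 for g in unprompted if cosine(reference, g) >= tau)
    return hits / len(unprompted)


# Toy 2-D embeddings, purely illustrative.
ref = [1.0, 0.0]
print(encoding_strength(ref, [[1.0, 0.0], [0.0, 1.0]]))  # 0.5
print(reappearance_frequency(ref, [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]], tau=0.7))  # 0.6666666666666666
```

In the toy data, two of the three unprompted images reach the threshold, so the leakage rate is 2/3; with real embeddings, both scores depend heavily on the encoder and on the chosen `tau`.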

If this is right

  • Stronger-encoded artworks impose their styles on outputs even when weaker artworks are the intended reference.
  • Pairwise interaction dynamics between artworks are not symmetric, so style dominance depends on which pair is involved.
  • The same leakage pattern appears across multiple diffusion-based text-to-image systems when evaluated with the same protocol.
  • Evaluation of unintended reuse must track interaction dynamics rather than relying only on near-duplicate retrieval or membership inference.
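The asymmetric-interaction points above can be made concrete with a small sketch. It assumes that for each pair of artworks A and B, several images have been generated from prompts combining elements of both, and that each image has a proximity score to A and to B. The `dominance` score and the `margin` used to call a pair balanced are illustrative assumptions, not the paper's definitions.

```python
from statistics import mean
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def dominance(prox_a: List[float], prox_b: List[float]) -> float:
    """Mean signed gap between each blended image's proximity to A and to B.
    Positive values mean A's style dominates the blend; negative means B's does."""
    if len(prox_a) != len(prox_b) or not prox_a:
        raise ValueError("need matched, non-empty proximity lists")
    return mean(a - b for a, b in zip(prox_a, prox_b))


def classify_pairs(scores: Dict[Pair, Tuple[List[float], List[float]]],
                   margin: float) -> Dict[Pair, str]:
    """Label each pair with the dominating artwork, or 'balanced' within the margin."""
    labels = {}
    for (a, b), (prox_a, prox_b) in scores.items():
        d = dominance(prox_a, prox_b)
        labels[(a, b)] = a if d > margin else b if d < -margin else "balanced"
    return labels


labels = classify_pairs(
    {
        ("A", "B"): ([0.8, 0.7], [0.3, 0.4]),
        ("A", "C"): ([0.4, 0.5], [0.6, 0.7]),
        ("B", "C"): ([0.5, 0.52], [0.5, 0.49]),
    },
    margin=0.05,
)
print(labels)  # {('A', 'B'): 'A', ('A', 'C'): 'C', ('B', 'C'): 'balanced'}
```

In this toy data, A dominates B but is dominated by C, which illustrates how dominance can depend on the pairing rather than on a single global ranking.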

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the measured strength differences prove stable, targeted removal or down-weighting of dominant artworks during training could reduce specific style leakages.
  • The protocol could be adapted to test whether similar asymmetric blending occurs for non-artistic concepts such as object categories or color palettes.
  • Prompt engineering or post-generation style transfer might serve as practical countermeasures once the dominant artworks for a given output are identified.

Load-bearing premise

The Art Arena protocol accurately isolates stylistic leakage from confounding factors such as prompt phrasing, model architecture specifics, or dataset composition, allowing the measured interactions to generalize across text-to-image systems.

What would settle it

If Art Arena measurements of unequal representational strength between two artworks consistently produce symmetric rather than asymmetric style blending in controlled generations, the claimed causal link between strength differences and asymmetric outcomes would be falsified.
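One way to operationalize this falsification criterion, sketched here under stated assumptions, is an exact one-sided sign test. For every pair with unequal measured strength, it checks whether the blend leans toward the stronger artwork. The tuple layout, the tolerance `eps` for calling a blend symmetric, and the choice of a sign test are illustrative assumptions rather than the paper's procedure. Symmetric blends count against the claim, so a preponderance of them would push the p-value toward 1.

```python
from math import comb
from typing import Iterable, Tuple


def strength_sign_test(pairs: Iterable[Tuple[float, float, float]],
                       eps: float = 0.0) -> Tuple[int, int, float]:
    """pairs holds (strength_a, strength_b, dominance_ab) per artwork pair, where
    dominance_ab > 0 means the blend leans toward A.
    Returns (agreements, pairs tested, one-sided exact p-value) under the null that
    the blend direction is a fair coin flip unrelated to the strength ordering."""
    agree = n = 0
    for s_a, s_b, d in pairs:
        if s_a == s_b:
            continue  # no strength difference, so nothing to predict
        n += 1
        # A symmetric blend (|d| <= eps) is not counted as agreement.
        if abs(d) > eps and (d > 0) == (s_a > s_b):
            agree += 1
    # P(X >= agree) for X ~ Binomial(n, 0.5)
    p = sum(comb(n, k) for k in range(agree, n + 1)) / 2 ** n
    return agree, n, p


# Five toy pairs, each tilted toward the stronger artwork.
print(strength_sign_test([(0.9, 0.2, 0.3), (0.8, 0.1, 0.2), (0.7, 0.3, 0.1),
                          (0.6, 0.2, 0.05), (0.5, 0.4, 0.02)]))  # (5, 5, 0.03125)
```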

Figures

Figures reproduced from arXiv: 2605.17500 by Ashutosh Ranjan, Ninad Joshi, Shirish Karande, Vivek Srivastava.

Figure 1: The Silent Brush: An image generated by a text-to-image model that, without naming Café Terrace at Night or The Scream in the prompt, visibly incorporates elements from both artworks. Prompt used for image generation: “A solitary, androgynous figure stands rigid in the foreground, clutching its head in a silent scream as golden café light spills onto the street behind. The glowing awning and receding table…
Figure 2: We present the distribution of fitness scores for SD v1.5, SDXL, and SANA…
Figure 3: Motif Duel for SD v1.5 evaluated under the fidelity-based proximity (CSD). The challenger contributes the motif, which is paired with Defender 1 and Defender 2 to form two composite prompts. The generated images are then evaluated for proximity to both the challenger and the corresponding defender. The figure shows that the motif composed with different defenders yields stylistically distinct outputs as ref…
Figure 4: Motif-Extraction Prompt: Prompt used to extract motif names and natural-language motif descriptions for an artwork, based on terminology commonly used in reliable art-historical sources. The sources used are mentioned in…
Figure 5: Sources used to extract motif names and descriptions as commonly referenced by curators,…
Figure 6: Motif-based challenger prompt construction: We enumerate motif combinations and blend them into coherent, style-neutral scene descriptions used as challenger prompts in Motif Duels. Including clear Style Influence Constraints and Prompt Construction Rules (refer…
Figure 7: Imitation of the artwork Impression, Sunrise by Claude Monet using the SDXL, SD v1.5, and SANA-1.5 text-to-image models. The prompt used for imitation was: “Impression, Sunrise in the style of Claude Monet.” We evaluate the degree of imitation using CLIP (Semantics), LPIPS (Aesthetics), and CSD (Fidelity). Higher CLIP and CSD scores indicate stronger imitation, whereas lower LPIPS scores indicate stronger…
Figure 8: Motif Duel for SD v1.5 evaluated under the semantic-based proximity (CLIP). The challenger contributes the motif, which is paired with Defender 1 and Defender 2 to form two composite prompts. The generated images are then evaluated for proximity to both the challenger and the corresponding defender. The figure shows that the motif composed with different defenders yields stylistically distinct outputs as re…
Figure 9: Motif Duel for SD v1.5 evaluated under the aesthetics-based proximity (LPIPS). Lower score indicates better performance. The challenger contributes the motif, which is paired with Defender 1 and Defender 2 to form two composite prompts. The generated images are then evaluated for proximity to both the challenger and the corresponding defender. The figure shows that the motif composed with different defender…
Figure 10: Motif Duel for SDXL evaluated under the semantic-based proximity (CLIP). Higher score indicates better performance. The challenger contributes the motif, which is paired with Defender 1 and Defender 2 to form two composite prompts. The generated images are then evaluated for proximity to both the challenger and the corresponding defender. The figure shows that the motif composed with different defenders yi…
Figure 11: Motif Duel for SDXL evaluated under the fidelity-based proximity (CSD). Higher score indicates better performance. The challenger contributes the motif, which is paired with Defender 1 and Defender 2 to form two composite prompts. The generated images are then evaluated for proximity to both the challenger and the corresponding defender. The figure shows that the motif composed with different defenders yie…
Figure 12: Motif Duel for SDXL evaluated under the aesthetics-based proximity (LPIPS). Lower score indicates better performance. The challenger contributes the motif, which is paired with Defender 1 and Defender 2 to form two composite prompts. The generated images are then evaluated for proximity to both the challenger and the corresponding defender. The figure shows that the motif composed with different defenders…
Figure 13: Motif Duel for SANA-1.5 evaluated under the semantic-based proximity (CLIP). Higher score indicates better performance. The challenger contributes the motif, which is paired with Defender 1 and Defender 2 to form two composite prompts. The generated images are then evaluated for proximity to both the challenger and the corresponding defender. The figure shows that the motif composed with different defender…
Figure 14: Motif Duel for SANA-1.5 evaluated under the fidelity-based proximity (CSD). Higher score indicates better performance. The challenger contributes the motif, which is paired with Defender 1 and Defender 2 to form two composite prompts. The generated images are then evaluated for proximity to both the challenger and the corresponding defender. The figure shows that the motif composed with different defenders…
Figure 15: Motif Duel for SANA-1.5 evaluated under the aesthetics-based proximity (LPIPS). Lower score indicates better performance. The challenger contributes the motif, which is paired with Defender 1 and Defender 2 to form two composite prompts. The generated images are then evaluated for proximity to both the challenger and the corresponding defender. The figure shows that the motif composed with different defend…
Figure 16: Representative Motif Duel instance for SD v1.5 under semantics-based proximity, with a challenger artwork tested against four distinct defenders. The challenger artwork is Sunset on the Seine at Lavacourt, Winter Effect by Claude Monet and the motif-derived prompt is given by: “The scene features muted winter sunset light with pastel tones, a river reflecting shimmering light, a hazy softness over structu…
Figure 17: Representative Motif Duel instance for SDXL under semantics-based proximity, with a challenger artwork tested against four distinct defenders. The challenger artwork is Vincent’s Bedroom in Arles by Vincent van Gogh and the motif-derived prompt is given by: “A room with a prominent yellow bed, plain wooden chairs with green cushions, and framed pictures on the walls featuring portraits and landscapes.” Co…
Figure 18: Representative Motif Duel instance for SANA-1.5 under semantics-based proximity, with a challenger artwork tested against four distinct defenders. The challenger artwork is Green Coca Cola Bottles by Andy Warhol and the motif-derived prompt is given by: “Rows of repeating Coca-Cola bottles arranged in a flat layout, emphasizing a lack of visual depth or dimensional perspective.” Column (a) shows the chall…
Figure 19: Sensitivity analysis of the threshold (𝛿) used to award rounds for SD v1.5 with proximity (a) Semantics, (b) Aesthetics, and (c) Fidelity. The x-axis shows the threshold values and the y-axis shows the number of artworks (defender and challenger) with at least one win.
Figure 20: Sensitivity analysis of the threshold (𝛿) used to award rounds for SDXL with proximity (a) Semantics, (b) Aesthetics, and (c) Fidelity. The x-axis shows the threshold values and the y-axis shows the number of artworks (defender and challenger) with at least one win.
Figure 21: Sensitivity analysis of the threshold (𝛿) used to award rounds for SANA-1.5 with proximity (a) Semantics, (b) Aesthetics, and (c) Fidelity. The x-axis shows the threshold values and the y-axis shows the number of artworks (defender and challenger) with at least one win.
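The captions above describe how Motif Duel rounds are scored: the challenger's motif is combined with a defender into a composite prompt, the generated image is compared with both artworks, and a threshold δ decides the round. The sketch below is one plausible reading of that rule, not the paper's code. It assumes that a round goes to whichever artwork is closer by more than δ, that anything within δ is a draw, and that for LPIPS (where lower means closer) the gap is sign-flipped. The names `Round`, `award`, and `sensitivity` are hypothetical. The sweep mirrors the quantity plotted in Figures 19 to 21, the number of artworks with at least one win.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Round:
    challenger: str
    defender: str
    prox_challenger: float  # proximity of the composite-prompt image to the challenger
    prox_defender: float    # proximity of the same image to the defender


def award(r: Round, delta: float, higher_is_closer: bool = True) -> Optional[str]:
    """Return the winning artwork, or None for a draw within delta."""
    gap = r.prox_challenger - r.prox_defender
    if not higher_is_closer:  # distance metrics such as LPIPS: lower means closer
        gap = -gap
    if gap > delta:
        return r.challenger
    if gap < -delta:
        return r.defender
    return None


def artworks_with_a_win(rounds: List[Round], delta: float,
                        higher_is_closer: bool = True) -> int:
    """Count distinct artworks (challengers or defenders) that win at least one round."""
    winners = {award(r, delta, higher_is_closer) for r in rounds}
    winners.discard(None)
    return len(winners)


def sensitivity(rounds: List[Round], deltas: Iterable[float],
                higher_is_closer: bool = True) -> Dict[float, int]:
    """Threshold sweep: artworks with at least one win for each delta."""
    return {d: artworks_with_a_win(rounds, d, higher_is_closer) for d in deltas}


rounds = [
    Round("A", "B", 0.80, 0.50),
    Round("A", "C", 0.60, 0.55),
    Round("D", "B", 0.30, 0.70),
]
print(sensitivity(rounds, [0.0, 0.1, 0.35, 0.5]))  # {0.0: 2, 0.1: 2, 0.35: 1, 0.5: 0}
```

As δ grows, narrow wins turn into draws, so the count of artworks with a win can only stay flat or fall. This monotone shape is what a threshold-sensitivity plot of this kind would be expected to show.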
Original abstract

Generative text-to-image models are typically trained on large-scale web-scraped datasets that include diverse visual content such as copyrighted and stylistically distinctive artworks, raising concerns about ownership, attribution, and the unintended reuse of protected visual expressions. A key issue is that models can learn stylistic patterns from this data and reproduce them in generated outputs without any explicit reference in the prompt. We refer to this phenomenon as The Silent Brush, where such learned styles reappear even when they are not requested. Existing evaluation methods mainly focus on near-duplicate retrieval or membership inference and do not account for this form of unintended stylistic resurfacing across prompts. To address these gaps, we first formulate guiding principles for evaluation of The Silent Brush. We then introduce Art Arena, an evaluation protocol that measures how strongly artworks are encoded, how they interact, and how frequently their stylistic traits reappear in generated outputs without explicit mention in prompts. We evaluate Art Arena on widely used text-to-image diffusion models, including Stable Diffusion v1.5, Stable Diffusion XL (SDXL), and SANA-1.5, and design it to generalize across text-to-image generative systems. Our results show that The Silent Brush arises from differences in representational strength and interaction dynamics between artworks, leading to asymmetric blending in model generations. Code and evaluation resources are available at: https://anonymous.4open.science/r/ArtArena-EBE4.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper defines 'The Silent Brush' as unintended stylistic leakage in text-to-image diffusion models: artistic styles learned from web-scraped training data reappear in generated outputs without being prompted. It outlines guiding principles for evaluation, introduces the Art Arena protocol to quantify encoding strength, inter-artwork interactions, and the reappearance frequency of stylistic traits, and applies it to Stable Diffusion v1.5, SDXL, and SANA-1.5. The central claim is that the observed asymmetric blending arises from differences in representational strength and interaction dynamics rather than from prompt or dataset artifacts. Code and evaluation resources are released to support applying the protocol to other systems.

Significance. If the Art Arena protocol can be shown to isolate stylistic leakage from confounders, the work would provide a useful empirical tool for studying unintended style reproduction in generative models, with relevance to copyright, attribution, and model auditing. The release of code and evaluation resources supports potential reproducibility and extension by others.

major comments (2)
  1. [Abstract and §3] Art Arena protocol description: the protocol measures encoding strength and reappearance frequency but provides no explicit controls such as fixed prompt templates, balanced sampling of style frequencies from the pretraining corpus, or ablations of conditioning mechanisms. This leaves open whether the reported asymmetries reflect intrinsic representational properties or prompt sensitivity and dataset-frequency effects.
  2. [Results and evaluation sections] The abstract and main claims assert that The Silent Brush arises from representational strength differences leading to asymmetric blending, yet no quantitative metrics, error bars, data exclusion criteria, or statistical tests are referenced. Without these, it is not possible to assess whether the central empirical observations support the causal attribution over alternative explanations.
minor comments (2)
  1. [Introduction] The distinction between The Silent Brush and prior membership-inference or near-duplicate retrieval methods could be sharpened with a brief comparison table.
  2. [§4] Model evaluations: clarify how the protocol supports generalization claims for text-to-image systems beyond the three tested models.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our work. We have carefully addressed each major comment below and revised the manuscript accordingly to improve experimental controls and statistical rigor. These changes strengthen the presentation of the Art Arena protocol and the support for our central claims regarding stylistic leakage.

Point-by-point responses
  1. Referee: [Abstract and §3] Art Arena protocol description: the protocol measures encoding strength and reappearance frequency but provides no explicit controls such as fixed prompt templates, balanced sampling of style frequencies from the pretraining corpus, or ablations of conditioning mechanisms. This leaves open whether the reported asymmetries reflect intrinsic representational properties or prompt sensitivity and dataset-frequency effects.

    Authors: We appreciate this observation on experimental design. The original Art Arena protocol employed a fixed set of neutral prompt templates across all model evaluations to reduce prompt-induced variability, as noted in §3. To further isolate effects, the revised manuscript now includes an explicit ablation of conditioning mechanisms (new Appendix B) and a sensitivity analysis varying prompt phrasing. Regarding balanced sampling, our sampling was intentionally drawn from the empirical distribution of styles in web-scraped corpora to reflect realistic leakage conditions rather than artificial balance; however, we have added a new subsection in §3.3 discussing frequency biases and report results from a balanced subsample experiment showing that the observed asymmetries persist. These revisions support attribution to representational strength and interaction dynamics while acknowledging dataset influences. Revision: yes.

  2. Referee: [Results and evaluation sections] The abstract and main claims assert that The Silent Brush arises from representational strength differences leading to asymmetric blending, yet no quantitative metrics, error bars, data exclusion criteria, or statistical tests are referenced. Without these, it is not possible to assess whether the central empirical observations support the causal attribution over alternative explanations.

    Authors: We acknowledge the value of enhanced statistical reporting for validating the causal claims. The revised manuscript now reports mean encoding strength and reappearance frequency with standard deviations across 5 independent runs per condition, includes error bars on all figures in §4, and specifies data exclusion criteria (generations with CLIP similarity below 0.2 to reference styles are excluded). We have added statistical tests including paired t-tests for blending asymmetry significance (p < 0.01 reported for key comparisons) and ANOVA for interaction effects between artworks, with full results in a new Table 3. These quantitative elements provide clearer support for the role of representational strength differences over prompt or frequency artifacts alone. Revision: yes.
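A minimal sketch of the reporting steps named in this simulated response: the CLIP-similarity exclusion rule at 0.2, mean and standard deviation across runs, and a paired t statistic for blending asymmetry. It uses only the Python standard library and stops at the t statistic and its degrees of freedom. A p-value would then come from the t distribution, for example via `scipy.stats.ttest_rel`, which performs the same paired test. The function names and the pairing used in the toy example (proximity to artwork A versus artwork B in matched blends) are illustrative assumptions.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def exclude_low_clip(scores: Sequence[float], clip_sims: Sequence[float],
                     min_clip: float = 0.2) -> List[float]:
    """Drop generations whose CLIP similarity to the reference style is below min_clip."""
    return [s for s, c in zip(scores, clip_sims) if c >= min_clip]


def mean_sd(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation, e.g. across independent runs."""
    return mean(values), stdev(values)


def paired_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched measurements x[i], y[i]."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences are constant; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs))), len(diffs) - 1


# Toy numbers: proximity to artwork A vs. artwork B in five matched blends.
t, df = paired_t([0.90, 0.80, 0.85, 0.95, 0.70], [0.60, 0.55, 0.65, 0.60, 0.50])
print(round(t, 2), df)  # 8.92 4
```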

Circularity Check

0 steps flagged

No circularity: empirical protocol with independent observations

Full rationale

The paper introduces the Art Arena evaluation protocol, a set of guiding principles and measurements for the encoding strength, interactions, and reappearance frequency of stylistic traits in text-to-image models. It reports empirical results on specific models (Stable Diffusion v1.5, SDXL, SANA-1.5) and attributes asymmetric blending to representational differences. No equations, fitted parameters, or predictions are described that reduce by construction to the protocol inputs or to self-citations. The argument runs from protocol design to reported observations and stands on its own, without self-referential definitions or load-bearing self-citations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Review is based solely on the abstract; no explicit free parameters, axioms, or invented entities beyond the named phenomenon are detailed. The protocol implicitly assumes that stylistic strength and interactions can be isolated and quantified in a generalizable way, but specifics are absent.

invented entities (1)
  • The Silent Brush (no independent evidence)
    purpose: Naming the phenomenon of unintended stylistic resurfacing in generated outputs without explicit prompts
    Introduced as a descriptive term for the observed behavior in the abstract.

pith-pipeline@v0.9.0 · 5791 in / 1321 out tokens · 45535 ms · 2026-05-20T14:28:32.753585+00:00 · methodology
