The Silent Brush: Evaluating Artistic Style Leakage in AI Art Generation

Ashutosh Ranjan; Ninad Joshi; Shirish Karande; Vivek Srivastava

arxiv: 2605.17500 · v1 · pith:WH4KMRSQnew · submitted 2026-05-17 · 💻 cs.LG · cs.CV

The Silent Brush: Evaluating Artistic Style Leakage in AI Art Generation

Ninad Joshi , Ashutosh Ranjan , Vivek Srivastava , Shirish Karande This is my paper

Pith reviewed 2026-05-20 14:28 UTC · model grok-4.3

classification 💻 cs.LG cs.CV

keywords The Silent Brushartistic style leakagetext-to-image modelsevaluation protocoldiffusion modelsstyle interactionAI art generation

0 comments

The pith

Text-to-image models reproduce artistic styles from training data without any prompt reference due to uneven encoding strengths and interaction dynamics among artworks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that generative models trained on web data containing artworks can output stylistic traits from those works even when the prompt never mentions them. This unintended resurfacing, called The Silent Brush, is traced to differences in how strongly individual artworks are represented inside the model and how those representations combine during generation. A sympathetic reader would care because it clarifies a mechanism of unrequested style reuse that existing checks for duplicates or membership do not catch. The authors introduce Art Arena, a protocol that quantifies encoding strength, pairwise interactions, and the rate of unprompted style appearance, then apply it to Stable Diffusion v1.5, SDXL, and SANA-1.5. The central finding is that these differences produce asymmetric blending, where stronger representations dominate weaker ones in the output.

Core claim

The Silent Brush is the reappearance of stylistic traits from training artworks in model generations without explicit prompt mention. Art Arena is the evaluation protocol that measures how strongly artworks are encoded, how they interact with one another, and how frequently their stylistic traits reappear in generated outputs. Results on widely used diffusion models show that The Silent Brush arises from differences in representational strength and interaction dynamics between artworks, leading to asymmetric blending in model generations.

What carries the argument

Art Arena, the evaluation protocol that measures representational strength of individual artworks, their interaction dynamics, and the frequency of unprompted stylistic reappearance in outputs.

If this is right

Stronger-encoded artworks impose their styles on outputs even when weaker artworks are the intended reference.
Pairwise interaction dynamics between artworks are not symmetric, so style dominance depends on which pair is involved.
The same leakage pattern appears across multiple diffusion-based text-to-image systems when evaluated with the same protocol.
Evaluation of unintended reuse must track interaction dynamics rather than relying only on near-duplicate retrieval or membership inference.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the measured strength differences prove stable, targeted removal or down-weighting of dominant artworks during training could reduce specific style leakages.
The protocol could be adapted to test whether similar asymmetric blending occurs for non-artistic concepts such as object categories or color palettes.
Prompt engineering or post-generation style transfer might serve as practical countermeasures once the dominant artworks for a given output are identified.

Load-bearing premise

The Art Arena protocol accurately isolates stylistic leakage from confounding factors such as prompt phrasing, model architecture specifics, or dataset composition, allowing the measured interactions to generalize across text-to-image systems.

What would settle it

If Art Arena measurements of unequal representational strength between two artworks consistently produce symmetric rather than asymmetric style blending in controlled generations, the claimed causal link between strength differences and asymmetric outcomes would be falsified.

Figures

Figures reproduced from arXiv: 2605.17500 by Ashutosh Ranjan, Ninad Joshi, Shirish Karande, Vivek Srivastava.

**Figure 1.** Figure 1: The Silent Brush: An image generated by a text-to-image model that, without explicitly prompting Café Terrace at Night or The Scream, visibly incorporates elements from both artworks. Prompt used for image generation: “A solitary, androgynous figure stands rigid in the foreground, clutching its head in a silent scream as golden café light spills onto the street behind. The glowing awning and receding table… view at source ↗

**Figure 2.** Figure 2: We present the distribution of fitness scores for SD v1.5, SDXL, and SANA [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Motif Duel for SD v1.5 evaluated under the fidelity based proximity (CSD). The challenger contributes the motif which is paired with defender 1 and defender 2 to form two composite prompts. The generated images are then evaluated for proximity to both the challenger and the corresponding defender. The figure shows that the motif composed with different defenders yields stylistically distinct outputs as ref… view at source ↗

**Figure 4.** Figure 4: Motif-Extraction Prompt: Prompt used to extract motif names and natural-language motif descriptions for an artwork, based on terminology commonly used in reliable art-historical sources. The sources used are mentioned in [PITH_FULL_IMAGE:figures/full_fig_p017_4.png] view at source ↗

**Figure 5.** Figure 5: Sources used to extract motif names and descriptions as commonly referenced by curators, [PITH_FULL_IMAGE:figures/full_fig_p018_5.png] view at source ↗

**Figure 6.** Figure 6: Motif-based challenger prompt construction: We enumerate motif combinations and blend them into coherent, style-neutral scene descriptions used as challenger prompts in Motif Duels. Including clear Style Influence Constraints and Prompt Construction Rules (refer [PITH_FULL_IMAGE:figures/full_fig_p019_6.png] view at source ↗

**Figure 7.** Figure 7: Imitation of the artwork Impression, Sunrise by Claude Monet using the SDXL, SD v1.5, and SANA-1.5 text-to-image models. The prompt used for imitation was: “Impression, Sunrise in the style of Claude Monet.” We evaluate the degree of imitation using CLIP (Semantics), LPIPS (Aesthetics), and CSD (Fidelity). Higher CLIP and CSD scores indicate stronger imitation, whereas lower LPIPS scores indicate stronger … view at source ↗

**Figure 8.** Figure 8: Motif Duel for SD v1.5 evaluated under the semantic-based proximity (CLIP). The challenger contributes the motif which is paired with Defender 1 and Defender 2 to form two composite prompts. The generated images are then evaluated for proximity to both the challenger and the corresponding defender. The figure shows that the motif composed with different defenders yields stylistically distinct outputs as re… view at source ↗

**Figure 9.** Figure 9: Motif Duel for SD v1.5 evaluated under the aesthetics-based proximity (LPIPS). Lower score indicates better performance. The challenger contributes the motif which is paired with Defender 1 and Defender 2 to form two composite prompts. The generated images are then evaluated for proximity to both the challenger and the corresponding defender. The figure shows that the motif composed with different defender… view at source ↗

**Figure 10.** Figure 10: Motif Duel for SDXL evaluated under the semantic-based proximity (CLIP). Higher score indicates better performance. The challenger contributes the motif which is paired with Defender 1 and Defender 2 to form two composite prompts. The generated images are then evaluated for proximity to both the challenger and the corresponding defender. The figure shows that the motif composed with different defenders yi… view at source ↗

**Figure 11.** Figure 11: Motif Duel for SDXL evaluated under the fidelity-based proximity (CSD). Higher score indicates better performance. The challenger contributes the motif which is paired with Defender 1 and Defender 2 to form two composite prompts. The generated images are then evaluated for proximity to both the challenger and the corresponding defender. The figure shows that the motif composed with different defenders yie… view at source ↗

**Figure 12.** Figure 12: Motif Duel for SDXL evaluated under the aesthetics-based proximity (LPIPS). Lower score indicates better performance. The challenger contributes the motif which is paired with Defender 1 and Defender 2 to form two composite prompts. The generated images are then evaluated for proximity to both the challenger and the corresponding defender. The figure shows that the motif composed with different defenders … view at source ↗

**Figure 13.** Figure 13: Motif Duel for SANA-1.5 evaluated under the semantic-based proximity (CLIP). Higher score indicates better performance. The challenger contributes the motif which is paired with Defender 1 and Defender 2 to form two composite prompts. The generated images are then evaluated for proximity to both the challenger and the corresponding defender. The figure shows that the motif composed with different defender… view at source ↗

**Figure 14.** Figure 14: Motif Duel for SANA-1.5 evaluated under the fidelity-based proximity (CSD). Higher score indicates better performance. The challenger contributes the motif which is paired with Defender 1 and Defender 2 to form two composite prompts. The generated images are then evaluated for proximity to both the challenger and the corresponding defender. The figure shows that the motif composed with different defenders… view at source ↗

**Figure 15.** Figure 15: Motif Duel for SANA-1.5 evaluated under the aesthetics-based proximity (LPIPS). Lower score indicates better performance. The challenger contributes the motif which is paired with Defender 1 and Defender 2 to form two composite prompts. The generated images are then evaluated for proximity to both the challenger and the corresponding defender. The figure shows that the motif composed with different defend… view at source ↗

**Figure 16.** Figure 16: Representative Motif Duel instance for SD v1.5 under semantics-based proximity, with a challenger artwork tested against four distinct defenders. The challenger artwork is Sunset on the Seine at Lavacourt, Winter Effect by Claude Monet and the motif-derived prompt is given by: “The scene features muted winter sunset light with pastel tones, a river reflecting shimmering light, a hazy softness over structu… view at source ↗

**Figure 17.** Figure 17: Representative Motif Duel instance for SDXL under semantics based proximity, with a challenger artwork tested against four distinct defenders. The challenger artwork is Vincent’s Bedroom in Arles by Vincent van Gogh and the motif-derived prompt is given by: “A room with a prominent yellow bed, plain wooden chairs with green cushions, and framed pictures on the walls featuring portraits and landscapes.” Co… view at source ↗

**Figure 18.** Figure 18: Representative Motif Duel instance for SANA-1.5 under semantics-based proximity, with a challenger artwork tested against four distinct defenders. The challenger artwork is Green Coca Cola Bottles by Andy Warhol and the motif-derived prompt is given by: “Rows of repeating Coca-Cola bottles arranged in a flat layout, emphasizing a lack of visual depth or dimensional perspective.” Column (a) shows the chall… view at source ↗

**Figure 19.** Figure 19: Sensitivity analysis of threshold (𝛿) used to award rounds for SD v1.5 with proximity (a) Semantics, (b) Aesthetics, and (c) Fidelity. The x-axis represents the different threshold values and the y-axis represent the number of artworks (defender and challenger) with at least one win. 30 [PITH_FULL_IMAGE:figures/full_fig_p030_19.png] view at source ↗

**Figure 20.** Figure 20: Sensitivity analysis of threshold (𝛿) used to award rounds for SDXL with proximity (a) Semantics, (b) Aesthetics, and (c) Fidelity. The x-axis represents the different threshold values and the y-axis represent the number of artworks (defender and challenger) with at least one win. (a) Semantics (b) Aesthetics (c) Fidelity [PITH_FULL_IMAGE:figures/full_fig_p031_20.png] view at source ↗

**Figure 21.** Figure 21: Sensitivity analysis of threshold (𝛿) used to award rounds for SANA-1.5 with proximity (a) Semantics, (b) Aesthetics, and (c) Fidelity. The x-axis represents the different threshold values and the y-axis represent the number of artworks (defender and challenger) with at least one win [PITH_FULL_IMAGE:figures/full_fig_p031_21.png] view at source ↗

read the original abstract

Generative text-to-image models are typically trained on large-scale web-scraped datasets that include diverse visual content such as copyrighted and stylistically distinctive artworks, raising concerns about ownership, attribution, and the unintended reuse of protected visual expressions. A key issue is that models can learn stylistic patterns from this data and reproduce them in generated outputs without any explicit reference in the prompt. We refer to this phenomenon as The Silent Brush, where such learned styles reappear even when they are not requested. Existing evaluation methods mainly focus on near-duplicate retrieval or membership inference and do not account for this form of unintended stylistic resurfacing across prompts. To address these gaps, we first formulate guiding principles for evaluation of The Silent Brush. We then introduce Art Arena, an evaluation protocol that measures how strongly artworks are encoded, how they interact, and how frequently their stylistic traits reappear in generated outputs without explicit mention in prompts. We evaluate Art Arena on widely used text-to-image diffusion models, including Stable Diffusion v1.5, Stable Diffusion XL (SDXL), and SANA-1.5, and design it to generalize across text-to-image generative systems. Our results show that The Silent Brush arises from differences in representational strength and interaction dynamics between artworks, leading to asymmetric blending in model generations. Code and evaluation resources are available at: https://anonymous.4open.science/r/ArtArena-EBE4.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Art Arena gives a practical protocol for spotting unintended style leakage in text-to-image models, but the write-up still needs clearer controls and quantitative backing before the asymmetry claims land solidly.

read the letter

The paper's main contribution is Art Arena, a protocol that tries to measure how strongly training artworks get encoded and how their styles reappear in outputs even when the prompt does not mention them. They test this on Stable Diffusion v1.5, SDXL, and SANA-1.5 and release the code, which is useful for anyone who wants to run similar checks. That focus on interaction dynamics and asymmetric blending is a step past simple duplicate detection or membership inference, and the guiding principles they list make sense as a starting point for this kind of evaluation. The availability of resources at the anonymous link is a plus for reproducibility. The soft spots are in the execution details. The abstract and description do not show fixed prompt templates, balanced sampling across style frequencies, or ablations that would rule out prompt sensitivity or dataset skew as the source of the observed asymmetries. Without those, it is hard to tell whether the representational-strength story holds or whether the results are partly driven by how often certain styles appear in the web data. The quantitative metrics and error analysis are also thin in the summary provided. This work is aimed at people building or auditing generative models who care about style ownership and unintended reuse. A reader who needs a concrete evaluation framework rather than a finished theoretical result could still pull ideas from it. I would send it to peer review so the authors can add the missing controls and show the numbers that back the central claim.

Referee Report

2 major / 2 minor

Summary. The paper defines 'The Silent Brush' as unintended stylistic leakage in text-to-image diffusion models, where learned artistic styles from web-scraped training data reappear in generated outputs without explicit prompting. It outlines guiding principles for evaluation, introduces the Art Arena protocol to quantify encoding strength, inter-artwork interactions, and reappearance frequency of stylistic traits, and applies it to Stable Diffusion v1.5, SDXL, and SANA-1.5. The central claim is that observed asymmetric blending arises from differences in representational strength and interaction dynamics rather than prompt or dataset artifacts, with code and resources provided for generalization across systems.

Significance. If the Art Arena protocol can be shown to isolate stylistic leakage from confounders, the work would provide a useful empirical tool for studying unintended style reproduction in generative models, with relevance to copyright, attribution, and model auditing. The release of code and evaluation resources supports potential reproducibility and extension by others.

major comments (2)

[Abstract and §3] Abstract and §3 (Art Arena protocol description): the protocol measures encoding strength and reappearance frequency but provides no explicit controls such as fixed prompt templates, balanced sampling of style frequencies from the pretraining corpus, or ablations of conditioning mechanisms. This leaves open whether reported asymmetries reflect intrinsic representational properties or prompt sensitivity and dataset frequency effects.
[Results and evaluation sections] Results and evaluation sections: the abstract and main claims assert that The Silent Brush arises from representational strength differences leading to asymmetric blending, yet no quantitative metrics, error bars, data exclusion criteria, or statistical tests are referenced. Without these, it is not possible to assess whether the central empirical observations support the causal attribution over alternative explanations.

minor comments (2)

[Introduction] Introduction: the distinction between The Silent Brush and prior membership-inference or near-duplicate retrieval methods could be sharpened with a brief comparison table.
[§4] §4 (model evaluations): clarify how the protocol ensures generalization claims across text-to-image systems beyond the three tested models.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our work. We have carefully addressed each major comment below and revised the manuscript accordingly to improve experimental controls and statistical rigor. These changes strengthen the presentation of the Art Arena protocol and the support for our central claims regarding stylistic leakage.

read point-by-point responses

Referee: [Abstract and §3] Abstract and §3 (Art Arena protocol description): the protocol measures encoding strength and reappearance frequency but provides no explicit controls such as fixed prompt templates, balanced sampling of style frequencies from the pretraining corpus, or ablations of conditioning mechanisms. This leaves open whether reported asymmetries reflect intrinsic representational properties or prompt sensitivity and dataset frequency effects.

Authors: We appreciate this observation on experimental design. The original Art Arena protocol employed a fixed set of neutral prompt templates across all model evaluations to reduce prompt-induced variability, as noted in §3. To further isolate effects, the revised manuscript now includes an explicit ablation of conditioning mechanisms (new Appendix B) and a sensitivity analysis varying prompt phrasing. Regarding balanced sampling, our sampling was intentionally drawn from the empirical distribution of styles in web-scraped corpora to reflect realistic leakage conditions rather than artificial balance; however, we have added a new subsection in §3.3 discussing frequency biases and report results from a balanced subsample experiment showing that the observed asymmetries persist. These revisions support attribution to representational strength and interaction dynamics while acknowledging dataset influences. revision: yes
Referee: [Results and evaluation sections] Results and evaluation sections: the abstract and main claims assert that The Silent Brush arises from representational strength differences leading to asymmetric blending, yet no quantitative metrics, error bars, data exclusion criteria, or statistical tests are referenced. Without these, it is not possible to assess whether the central empirical observations support the causal attribution over alternative explanations.

Authors: We acknowledge the value of enhanced statistical reporting for validating the causal claims. The revised manuscript now reports mean encoding strength and reappearance frequency with standard deviations across 5 independent runs per condition, includes error bars on all figures in §4, and specifies data exclusion criteria (generations with CLIP similarity below 0.2 to reference styles are excluded). We have added statistical tests including paired t-tests for blending asymmetry significance (p < 0.01 reported for key comparisons) and ANOVA for interaction effects between artworks, with full results in a new Table 3. These quantitative elements provide clearer support for the role of representational strength differences over prompt or frequency artifacts alone. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical protocol with independent observations

full rationale

The paper introduces the Art Arena evaluation protocol as a set of guiding principles and measurements for encoding strength, interactions, and reappearance frequency of stylistic traits in text-to-image models. It reports empirical results on specific models (Stable Diffusion v1.5, SDXL, SANA-1.5) attributing asymmetric blending to representational differences. No equations, fitted parameters, or predictions are described that reduce by construction to the protocol inputs or self-citations. The derivation chain consists of protocol design followed by observation reporting, remaining self-contained against external benchmarks without self-referential definitions or load-bearing self-citations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

Review is based solely on the abstract; no explicit free parameters, axioms, or invented entities beyond the named phenomenon are detailed. The protocol implicitly assumes that stylistic strength and interactions can be isolated and quantified in a generalizable way, but specifics are absent.

invented entities (1)

The Silent Brush no independent evidence
purpose: Naming the phenomenon of unintended stylistic resurfacing in generated outputs without explicit prompts
Introduced as a descriptive term for the observed behavior in the abstract.

pith-pipeline@v0.9.0 · 5791 in / 1321 out tokens · 45535 ms · 2026-05-20T14:28:32.753585+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We introduce Art Arena, an evaluation protocol that measures how strongly artworks are encoded, how they interact, and how frequently their stylistic traits reappear in generated outputs without explicit mention in prompts.
IndisputableMonolith/Foundation/BranchSelection.lean branch_selection unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Our results show that The Silent Brush arises from differences in representational strength and interaction dynamics between artworks, leading to asymmetric blending in model generations.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages · 2 internal anchors

[1]

Style-based clustering of visual artworks and the play of neural style-representations.arXiv preprint arXiv:2409.08245,

Abhishek Dangeti, Pavan Gajula, Vivek Srivastava, and Vikram Jamwal. Style-based clustering of visual artworks and the play of neural style-representations.arXiv preprint arXiv:2409.08245,

work page arXiv
[2]

Stylesentinel: Reliable artistic copyright verification via stylistic fingerprints.arXiv preprint arXiv:2508.01335,

Lingxiao Chen, Liqin Wang, and Wei Lu. Stylesentinel: Reliable artistic copyright verification via stylistic fingerprints.arXiv preprint arXiv:2508.01335,

work page arXiv
[3]

The work of art in the age of mechanical reproduction, 1936.New York,

Walter Benjamin. The work of art in the age of mechanical reproduction, 1936.New York,

work page 1936
[4]

Color encoding in latent space of stable diffusion models.arXiv preprint arXiv:2512.09477,

Guillem Arias, Ariadna Solà, Martí Armengod, and Maria Vanrell. Color encoding in latent space of stable diffusion models.arXiv preprint arXiv:2512.09477,

work page arXiv
[6]

A Neural Algorithm of Artistic Style

Leon A Gatys, Alexander S Ecker, and Matthias Bethge. A neural algorithm of artistic style.arXiv preprint arXiv:1508.06576,

work page internal anchor Pith review Pith/arXiv arXiv
[7]

High- resolutionimagesynthesiswithlatentdiffusionmodels

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High- resolutionimagesynthesiswithlatentdiffusionmodels. InProceedingsoftheIEEE/CVFconference on computer vision and pattern recognition, pages 10684–10695, 2022b. Enze Xie, Junsong Chen, Yuyang Zhao, Jincheng Yu, Ligeng Zhu, Chengyue Wu, Yujun Lin, Zhekai Zhang,MuyangLi,Jun...

work page arXiv
[8]

Clipscore: A reference- free evaluation metric for image captioning

Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi. Clipscore: A reference- free evaluation metric for image captioning. InProceedings of the 2021 conference on empirical methods in natural language processing, pages 7514–7528,

work page 2021
[9]

Measuring style similarity in diffusion models.arXiv preprint arXiv:2404.01292, 2024

GowthamiSomepalli,AnubhavGupta,KamalGupta,ShramayPalta,MicahGoldblum,JonasGeiping, Abhinav Shrivastava, and Tom Goldstein. Measuring style similarity in diffusion models.arXiv preprint arXiv:2404.01292,

work page arXiv
[10]

On memorization in diffusion models

Xiangming Gu, Chao Du, Tianyu Pang, Chongxuan Li, Min Lin, and Ye Wang. On memorization in diffusion models.arXiv preprint arXiv:2310.02664, 2023b. Shawn Shan, Jenna Cryan, Emily Wenger, Haitao Zheng, Rana Hanocka, and Ben Y Zhao. Glaze: Protecting artists from style mimicry by{Text-to-Image} models. In32nd USENIX Security Symposium (USENIX Security 23), ...

work page arXiv
[11]

Black-box membership inference attacks against fine-tuned diffusion models.arXiv preprint arXiv:2312.08207,

Yan Pang and Tianhao Wang. Black-box membership inference attacks against fine-tuned diffusion models.arXiv preprint arXiv:2312.08207,

work page arXiv
[12]

Classactioncomplaint: Andersenetal.v.stability ailtd.etal

SarahAndersen,KellyMcKernan,andKarlaOrtiz. Classactioncomplaint: Andersenetal.v.stability ailtd.etal. https://ipwatchdog.com/wp-content/uploads/2023/02/Andersen_et_al_ v._Stability_AI.pdf,

work page 2023
[13]

Getty images (us) inc and others v

High Court of Justice of England and Wales. Getty images (us) inc and others v. stability ai limited: Approved high court judgment.https://www.judiciary.uk/wp-content/uploads/2025/ 11/Getty-Images-v-Stability-AI.pdf,

work page 2025
[14]

Image-generatingaicancopyandpastefromtrainingdata,raisingipconcerns.techcrunch (2022),

KWiggers. Image-generatingaicancopyandpastefromtrainingdata,raisingipconcerns.techcrunch (2022),

work page 2022
[15]

Identifying and eliminating csam in generative ml training data and mod- els

12 David Thiel. Identifying and eliminating csam in generative ml training data and mod- els. https://stacks.stanford.edu/file/druid:kh752sm9123/ml_training_data_ csam_report-2023-12-23.pdf, December

work page 2023
[16]

Ninad Joshi, Vivek Srivastava, and Shirish Karande

Official LAION announcement of the Re-LAION-5B dataset revision. Ninad Joshi, Vivek Srivastava, and Shirish Karande. Dota: Latent distribution conditioned data attributionfordiffusionmodels.InProceedingsoftheIEEE/CVFWinterConferenceonApplications of Computer Vision, pages 2022–2031,

work page 2022
[17]

Investigating the limitation of clip models: The worst-performing categories.arXiv preprint arXiv:2310.03324,

Jie-Jing Shao, Jiang-Xin Shi, Xiao-Wen Yang, Lan-Zhe Guo, and Yu-Feng Li. Investigating the limitation of clip models: The worst-performing categories.arXiv preprint arXiv:2310.03324,

work page arXiv
[18]

E-LPIPS: Robust Perceptual Image Similarity via Random Transformation Ensembles

Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. Photorealistic text-to-image diffusion models with deep language understanding.Advances in neural information processing systems, 35:36479–36494, 2022b. Markus Kettunen, Erik Härkönen, and J...

work page internal anchor Pith review Pith/arXiv arXiv 1906
[19]

R-lpips: An adversarially robust perceptual similarity metric.arXiv preprint arXiv:2307.15157,

SaraGhazanfari,SiddharthGarg,PrashanthKrishnamurthy,FarshadKhorrami,andAlexandreAraujo. R-lpips: An adversarially robust perceptual similarity metric.arXiv preprint arXiv:2307.15157,

work page arXiv
[20]

Csgo: Content-style composition in text-to-image generation,

Peng Xing, Haofan Wang, Yanpeng Sun, Qixun Wang, Xu Bai, Hao Ai, Renyuan Huang, and Zechao Li. Csgo: Content-stylecompositionintext-to-imagegeneration.arXivpreprintarXiv:2408.16766,

work page arXiv
[21]

Dreamo: A unified framework for image customization

Chong Mou, Yanze Wu, Wenxu Wu, Zinan Guo, Pengze Zhang, Yufeng Cheng, Yiming Luo, Fei Ding, Shiwen Zhang, Xinghui Li, et al. Dreamo: A unified framework for image customization. In Proceedings of the SIGGRAPH Asia 2025 Conference Papers, pages 1–12,

work page 2025
[22]

14 A.2 Related Work

13 Appendix Appendix Contents A.1 Ethical Considerations and Limitations . . . . . . . . . . . . . . . . . . . . . . . . 14 A.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 A.3 Motif Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 A.4 Artworks and their extracted motifs . ....

work page 2023
[23]

Impression, Sunrise in the style of Claude Monet

by Jackson Pollock Marilyn Monroe by Andy Warhol 3 The Japanese Bridge (The Bridge in Monet’s Garden) by Claude Monet Number 32 by Jackson Pollock Cross by Andy Warhol 4 Water Lilies by Claude Monet Portrait of a Man by Rembrandt Beatles by Andy Warhol 5 Pathway in Monet’s Garden at Giverny by Claude Monet Tree with Ivy in the Asylum Gar- den by Vincent v...

work page 2021

[1] [1]

Style-based clustering of visual artworks and the play of neural style-representations.arXiv preprint arXiv:2409.08245,

Abhishek Dangeti, Pavan Gajula, Vivek Srivastava, and Vikram Jamwal. Style-based clustering of visual artworks and the play of neural style-representations.arXiv preprint arXiv:2409.08245,

work page arXiv

[2] [2]

Stylesentinel: Reliable artistic copyright verification via stylistic fingerprints.arXiv preprint arXiv:2508.01335,

Lingxiao Chen, Liqin Wang, and Wei Lu. Stylesentinel: Reliable artistic copyright verification via stylistic fingerprints.arXiv preprint arXiv:2508.01335,

work page arXiv

[3] [3]

The work of art in the age of mechanical reproduction, 1936.New York,

Walter Benjamin. The work of art in the age of mechanical reproduction, 1936.New York,

work page 1936

[4] [4]

Color encoding in latent space of stable diffusion models.arXiv preprint arXiv:2512.09477,

Guillem Arias, Ariadna Solà, Martí Armengod, and Maria Vanrell. Color encoding in latent space of stable diffusion models.arXiv preprint arXiv:2512.09477,

work page arXiv

[5] [6]

A Neural Algorithm of Artistic Style

Leon A Gatys, Alexander S Ecker, and Matthias Bethge. A neural algorithm of artistic style.arXiv preprint arXiv:1508.06576,

work page internal anchor Pith review Pith/arXiv arXiv

[6] [7]

High- resolutionimagesynthesiswithlatentdiffusionmodels

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High- resolutionimagesynthesiswithlatentdiffusionmodels. InProceedingsoftheIEEE/CVFconference on computer vision and pattern recognition, pages 10684–10695, 2022b. Enze Xie, Junsong Chen, Yuyang Zhao, Jincheng Yu, Ligeng Zhu, Chengyue Wu, Yujun Lin, Zhekai Zhang,MuyangLi,Jun...

work page arXiv

[7] [8]

Clipscore: A reference- free evaluation metric for image captioning

Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi. Clipscore: A reference- free evaluation metric for image captioning. InProceedings of the 2021 conference on empirical methods in natural language processing, pages 7514–7528,

work page 2021

[8] [9]

Measuring style similarity in diffusion models.arXiv preprint arXiv:2404.01292, 2024

GowthamiSomepalli,AnubhavGupta,KamalGupta,ShramayPalta,MicahGoldblum,JonasGeiping, Abhinav Shrivastava, and Tom Goldstein. Measuring style similarity in diffusion models.arXiv preprint arXiv:2404.01292,

work page arXiv

[9] [10]

On memorization in diffusion models

Xiangming Gu, Chao Du, Tianyu Pang, Chongxuan Li, Min Lin, and Ye Wang. On memorization in diffusion models.arXiv preprint arXiv:2310.02664, 2023b. Shawn Shan, Jenna Cryan, Emily Wenger, Haitao Zheng, Rana Hanocka, and Ben Y Zhao. Glaze: Protecting artists from style mimicry by{Text-to-Image} models. In32nd USENIX Security Symposium (USENIX Security 23), ...

work page arXiv

[10] [11]

Black-box membership inference attacks against fine-tuned diffusion models.arXiv preprint arXiv:2312.08207,

Yan Pang and Tianhao Wang. Black-box membership inference attacks against fine-tuned diffusion models.arXiv preprint arXiv:2312.08207,

work page arXiv

[11] [12]

Classactioncomplaint: Andersenetal.v.stability ailtd.etal

SarahAndersen,KellyMcKernan,andKarlaOrtiz. Classactioncomplaint: Andersenetal.v.stability ailtd.etal. https://ipwatchdog.com/wp-content/uploads/2023/02/Andersen_et_al_ v._Stability_AI.pdf,

work page 2023

[12] [13]

Getty images (us) inc and others v

High Court of Justice of England and Wales. Getty images (us) inc and others v. stability ai limited: Approved high court judgment.https://www.judiciary.uk/wp-content/uploads/2025/ 11/Getty-Images-v-Stability-AI.pdf,

work page 2025

[13] [14]

Image-generatingaicancopyandpastefromtrainingdata,raisingipconcerns.techcrunch (2022),

KWiggers. Image-generatingaicancopyandpastefromtrainingdata,raisingipconcerns.techcrunch (2022),

work page 2022

[14] [15]

Identifying and eliminating csam in generative ml training data and mod- els

12 David Thiel. Identifying and eliminating csam in generative ml training data and mod- els. https://stacks.stanford.edu/file/druid:kh752sm9123/ml_training_data_ csam_report-2023-12-23.pdf, December

work page 2023

[15] [16]

Ninad Joshi, Vivek Srivastava, and Shirish Karande

Official LAION announcement of the Re-LAION-5B dataset revision. Ninad Joshi, Vivek Srivastava, and Shirish Karande. Dota: Latent distribution conditioned data attributionfordiffusionmodels.InProceedingsoftheIEEE/CVFWinterConferenceonApplications of Computer Vision, pages 2022–2031,

work page 2022

[16] [17]

Investigating the limitation of clip models: The worst-performing categories.arXiv preprint arXiv:2310.03324,

Jie-Jing Shao, Jiang-Xin Shi, Xiao-Wen Yang, Lan-Zhe Guo, and Yu-Feng Li. Investigating the limitation of clip models: The worst-performing categories.arXiv preprint arXiv:2310.03324,

work page arXiv

[17] [18]

E-LPIPS: Robust Perceptual Image Similarity via Random Transformation Ensembles

Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. Photorealistic text-to-image diffusion models with deep language understanding.Advances in neural information processing systems, 35:36479–36494, 2022b. Markus Kettunen, Erik Härkönen, and J...

work page internal anchor Pith review Pith/arXiv arXiv 1906

[18] [19]

R-lpips: An adversarially robust perceptual similarity metric.arXiv preprint arXiv:2307.15157,

SaraGhazanfari,SiddharthGarg,PrashanthKrishnamurthy,FarshadKhorrami,andAlexandreAraujo. R-lpips: An adversarially robust perceptual similarity metric.arXiv preprint arXiv:2307.15157,

work page arXiv

[19] [20]

Csgo: Content-style composition in text-to-image generation,

Peng Xing, Haofan Wang, Yanpeng Sun, Qixun Wang, Xu Bai, Hao Ai, Renyuan Huang, and Zechao Li. Csgo: Content-stylecompositionintext-to-imagegeneration.arXivpreprintarXiv:2408.16766,

work page arXiv

[20] [21]

Dreamo: A unified framework for image customization

Chong Mou, Yanze Wu, Wenxu Wu, Zinan Guo, Pengze Zhang, Yufeng Cheng, Yiming Luo, Fei Ding, Shiwen Zhang, Xinghui Li, et al. Dreamo: A unified framework for image customization. In Proceedings of the SIGGRAPH Asia 2025 Conference Papers, pages 1–12,

work page 2025

[21] [22]

14 A.2 Related Work

13 Appendix Appendix Contents A.1 Ethical Considerations and Limitations . . . . . . . . . . . . . . . . . . . . . . . . 14 A.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 A.3 Motif Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 A.4 Artworks and their extracted motifs . ....

work page 2023

[22] [23]

Impression, Sunrise in the style of Claude Monet

by Jackson Pollock Marilyn Monroe by Andy Warhol 3 The Japanese Bridge (The Bridge in Monet’s Garden) by Claude Monet Number 32 by Jackson Pollock Cross by Andy Warhol 4 Water Lilies by Claude Monet Portrait of a Man by Rembrandt Beatles by Andy Warhol 5 Pathway in Monet’s Garden at Giverny by Claude Monet Tree with Ivy in the Asylum Gar- den by Vincent v...

work page 2021