pith. machine review for the scientific record.

arxiv: 2605.02583 · v1 · submitted 2026-05-04 · 💻 cs.CV

Recognition: 2 Lean theorem links

Stylistic Attribute Control in Latent Diffusion Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 18:39 UTC · model grok-4.3

classification 💻 cs.CV
keywords latent diffusion models · stylistic control · image editing · disentangled directions · guidance composition · synthetic datasets · DDIM inversion

The pith

Learning disentangled editing directions from synthetic datasets enables precise continuous control over stylistic attributes in latent diffusion models while preserving content.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to give users parametric control over specific stylistic features in diffusion-generated images without the content shifts that text prompts often produce. It does so by training on synthetic images filtered for one style at a time to extract editing directions for attributes such as outlines, local contrast, watercolor effects, and geometric patterns. These directions are then composed with the base model through guidance to close the domain gap between the fine-tuned and original models. A regularization loss during training and optimized null embeddings during inversion further ensure that edits remain consistent when applied to real photographs.
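To make the mechanism concrete, here is a minimal sketch of guidance composition at a single denoising step, assuming two denoisers with a shared call signature. All names are illustrative; the paper's exact formulation (including its ControlNet conditioning and activation timesteps) may differ.

```python
import torch

def composed_noise_prediction(eps_base, eps_style, z_t, t, cond, strength):
    """One denoising step of guidance composition (illustrative sketch).

    eps_base:  frozen foundation denoiser, epsilon_theta(z_t, t, cond)
    eps_style: stylistically finetuned denoiser, epsilon_A(z_t, t, cond)
    strength:  continuous stylistic parameter; 0 recovers the base model,
               larger values push along the learned editing direction.
    """
    e_base = eps_base(z_t, t, cond)    # content-anchoring prediction
    e_style = eps_style(z_t, t, cond)  # stylistic prediction
    # The residual (e_style - e_base) acts as the editing direction;
    # adding a scaled copy to the base prediction applies the style
    # while the base model keeps semantics in place.
    return e_base + strength * (e_style - e_base)

# Toy usage with stand-in denoisers:
if __name__ == "__main__":
    z = torch.randn(1, 4, 64, 64)  # SD-like latent shape (assumption)
    base = lambda z, t, c: torch.zeros_like(z)
    style = lambda z, t, c: torch.ones_like(z)
    out = composed_noise_prediction(base, style, z, t=500, cond=None, strength=0.7)
    print(out.mean())  # -> 0.7
```

The key design point this sketch illustrates is that the continuous parameter lives in the composition weight rather than in a text prompt, which is what allows smooth, content-preserving adjustment.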

Core claim

The approach learns disentangled editing directions from stylistically filtered synthetic datasets and applies them through guidance composition in latent diffusion models. Together with a training regularization loss and DDIM inversion enhanced with optimized null-conditional embeddings, this achieves fine-grained parametric control of stylistic attributes on both generated and real images while keeping the original semantics intact.

What carries the argument

Disentangled editing directions learned from stylistically filtered synthetic datasets, composed via guidance to control stylistic attributes parametrically in latent diffusion models.

Load-bearing premise

Disentangled editing directions learned from synthetic datasets will transfer to real images through guidance composition without causing unintended content changes or domain gaps.

What would settle it

Observing systematic semantic alterations or loss of edit precision when the learned directions are applied via guidance composition to a diverse set of real-world photographs.

Figures

Figures reproduced from arXiv:2605.02583 by Benito Buchheim, Jürgen Döllner, Max Reimann.

Figure 1. Our method enables editing of individual stylistic attributes in LDM …
Figure 2. Comparison of an xDoG [WKO12] filtered LDM-generated input image with our xDoG-finetuned method.
Figure 3. Overview of our approach. During training, attention layers (dark orange) of an edge-conditioned ControlNet are finetuned to …
Figure 4. Using our stylization guidance (g_A) without ControlNet guidance. Here we increase the linewidth parameter. While results vary in line strength, they are not spatially stable.
Figure 5. Examples of training images stylized with watercolor fil…
Figure 6. Examples from PaintTransformer [LLH∗21] as targets for the optimization of the null-text embedding ∅_t, minimizing the error between z*_t and z_{t−1}(z_t, ∅_t, c_p), the latent code obtained from one step of DDIM sampling with guidance w = 7.5. This procedure is repeated at each timestep, initialising z_{t−1} and ∅_{t−1} with the previous step's results and yielding optimized embeddings {∅_t}, t = 1…T. (A minimal sketch of this optimization follows the figure list.)
Figure 7. Comparison of training outputs for λ_strokewidth = 1 with and without regularization. Regularization leads to early convergence. (a) Canny edge map; (b) black point = 1; (c) details = 1.
Figure 8. Outputs of only ε_A. Finetuned layers, conditioned on the edge map (a), learn to discard most colour and scene information.
Figure 10. Reconstruction (rec.) and editing of the blackpoint parame…
Figure 14. Change in output compared to the original ε_θ only.
Figure 11. Comparison of continuous editing. Top row: our method varies contour width smoothly by adjusting parameter values and acti…
Figure 12. Varying the activation time step and the line-width …
Figure 13. Stroke size variations using the PaintTransformer …
Figure 16. Sample images from the user study. Here the users were …
Figure 17. User method preferences. Shown are the preference …
Figure 18. Training behaviour for parameters and regularization …
Figure 19. Similarity of parameters over training steps.
Figure 21. Extended Fig. …
Figure 22. Parameter variation for the ArtBench-trained model. The parameter encodes the "expressionism" style, and we use the prompt "a…
Figure 23. Training progress with different regularizations. Note that the outputs are at…
Figure 24. Qualitative comparison to related methods. We show results from the user study. Note that for fair comparison, all images were …
Figure 25. Influence of increasing stylization strength (extension of Fig. …).
Figure 26. Extended Fig. …
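The Figure 6 caption compresses the null-text optimization into notation. Below is a minimal sketch of that loop, assuming a differentiable ddim_step(z_t, null_emb, c_p) that performs one guided DDIM sampling step (the guidance weight, e.g. w = 7.5, is assumed to live inside it); the optimizer choice and iteration counts are illustrative, not the paper's reported settings.

```python
import torch
import torch.nn.functional as F

def optimize_null_embeddings(ddim_step, z_stars, null_init, c_p,
                             n_iters=10, lr=1e-2):
    """Per-timestep null-embedding optimization (sketch).

    z_stars:   latents z*_T ... z*_0 from DDIM inversion of the input image.
    ddim_step: callable (z_t, null_emb, c_p) -> z_{t-1}, one guided DDIM
               sampling step with the prompt embedding c_p.
    Returns one optimized null embedding per timestep.
    """
    null_emb = null_init.clone()
    optimized = []
    z_t = z_stars[0]  # start from the inverted latent at t = T
    for t in range(len(z_stars) - 1):
        null_emb = null_emb.detach().requires_grad_(True)
        opt = torch.optim.Adam([null_emb], lr=lr)
        for _ in range(n_iters):
            z_prev = ddim_step(z_t, null_emb, c_p)
            # pull the sampled latent back onto the inversion trajectory
            loss = F.mse_loss(z_prev, z_stars[t + 1])
            opt.zero_grad()
            loss.backward()
            opt.step()
        optimized.append(null_emb.detach())
        with torch.no_grad():
            # initialise the next step with this step's results,
            # as described in the Figure 6 caption
            z_t = ddim_step(z_t, null_emb, c_p)
    return optimized
```

This is the standard null-text inversion pattern (cf. reference [12]); the paper builds on it so that real photographs can be inverted and then edited with the learned stylistic directions.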
Original abstract

Text-to-image diffusion models have revolutionized image synthesis and editing, but precise control over stylistic attributes remains a challenge, often causing unintended content modifications. We propose an approach for fine-grained parametric control of stylistic attributes in latent diffusion models by learning disentangled editing directions from synthetic datasets. We use guidance composition to close the domain gap between stylistically finetuned and foundation models, preserving the original image semantics while applying stylistic adjustments. To ensure consistent edits, we introduce a training regularization loss and enhance DDIM inversion with optimized null-conditional embeddings for real image editing. We validate our approach by learning from stylistically filtered synthetic datasets varying a range of stylistic attributes, including outlines, local contrast, watercolorization effects, and geometric patterns. Our evaluations demonstrate that compared to current text-based editing techniques, our method offers well-integrated, more precise and continuously adjustable stylistic modifications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to introduce a technique for precise stylistic attribute control in latent diffusion models. By learning disentangled editing directions from stylistically filtered synthetic datasets and applying them via guidance composition, along with a regularization loss and optimized DDIM inversion using null-conditional embeddings, the method aims to achieve stylistic edits on real images without altering content semantics. It is tested on various stylistic attributes such as outlines, local contrast, watercolorization effects, and geometric patterns, asserting better performance than text-based editing methods in terms of integration, precision, and continuous adjustability.

Significance. Should the proposed method prove effective in transferring style directions from synthetic to real domains without content drift, it would represent a meaningful advance in controllable image synthesis. This could facilitate more accurate and flexible stylistic modifications in applications ranging from digital art to automated design, addressing a persistent challenge in diffusion-based editing where text prompts often lead to unintended changes.

major comments (2)
  1. The central claim that guidance composition and regularization enable precise stylistic edits on real images without domain-induced content drift is load-bearing, yet the abstract provides no quantitative support such as content preservation metrics (e.g., semantic similarity or segmentation IoU before/after editing) or ablations showing orthogonality of learned directions to content axes on real distributions.
  2. Evaluations section: the superiority over text-based techniques is asserted via 'well-integrated, more precise' modifications, but no specific metrics, baselines, tables, or statistical tests are referenced, leaving the continuous adjustability and precision claims without verifiable grounding.
minor comments (2)
  1. The abstract could clarify the backbone model (e.g., specific Stable Diffusion variant) and the exact procedure for stylistically filtering the synthetic datasets.
  2. A diagram showing the composition of guidance signals and the regularization loss formulation would improve readability of the method.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We agree that strengthening the quantitative grounding of our claims will improve the manuscript. We have revised the paper to incorporate additional metrics, ablations, and tables as detailed below.

Point-by-point responses
  1. Referee: The central claim that guidance composition and regularization enable precise stylistic edits on real images without domain-induced content drift is load-bearing, yet the abstract provides no quantitative support such as content preservation metrics (e.g., semantic similarity or segmentation IoU before/after editing) or ablations showing orthogonality of learned directions to content axes on real distributions.

    Authors: We agree that the abstract does not contain quantitative metrics and that this weakens the presentation of the central claim. The full manuscript contains qualitative results across multiple attributes, but to directly address the concern we have added content-preservation metrics (CLIP cosine similarity and LPIPS) computed on real images before and after editing, plus an ablation that measures the correlation of the learned directions with content features on real data. These additions appear in a new quantitative evaluation subsection and are summarized in the abstract. revision: yes

  2. Referee: Evaluations section: the superiority over text-based techniques is asserted via 'well-integrated, more precise' modifications, but no specific metrics, baselines, tables, or statistical tests are referenced, leaving the continuous adjustability and precision claims without verifiable grounding.

    Authors: We acknowledge that the original evaluations section relied primarily on visual comparisons without tabulated metrics or statistical tests. In the revision we have inserted a new table that reports quantitative comparisons against text-based baselines (InstructPix2Pix and Prompt-to-Prompt) using CLIP directional similarity for integration, participant preference scores (N=50) for perceived precision, and a smoothness metric for continuous adjustability. Paired t-tests are included to assess statistical significance of the observed differences (a sketch of these metrics follows this list). revision: yes
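A hedged sketch of the content-preservation metrics named in these responses, CLIP cosine similarity and LPIPS, plus the paired test from response 2. The model variants (ViT-B/32, AlexNet backbone) and image size are illustrative choices, not the configuration the authors report.

```python
import numpy as np
import torch
import torch.nn.functional as F
import clip    # pip install git+https://github.com/openai/CLIP.git
import lpips   # pip install lpips
from scipy.stats import ttest_rel

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, clip_preprocess = clip.load("ViT-B/32", device=device)
lpips_net = lpips.LPIPS(net="alex").to(device)

def to_lpips_tensor(pil_img, size=256):
    """PIL image -> (1, 3, H, W) tensor in [-1, 1], the range LPIPS expects."""
    arr = np.asarray(pil_img.convert("RGB").resize((size, size)), dtype=np.float32)
    return torch.from_numpy(arr).permute(2, 0, 1).div(127.5).sub(1.0).unsqueeze(0).to(device)

@torch.no_grad()
def content_preservation(original, edited):
    """Return (CLIP cosine similarity, LPIPS distance) for one image pair."""
    a = clip_model.encode_image(clip_preprocess(original).unsqueeze(0).to(device))
    b = clip_model.encode_image(clip_preprocess(edited).unsqueeze(0).to(device))
    cos = F.cosine_similarity(a, b).item()   # higher = content better preserved
    dist = lpips_net(to_lpips_tensor(original), to_lpips_tensor(edited)).item()
    return cos, dist

# Paired significance test over per-image scores for two methods,
# as referenced in response 2 (scores_ours / scores_baseline are
# hypothetical arrays, one entry per evaluation image):
# t_stat, p_value = ttest_rel(scores_ours, scores_baseline)
```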

Circularity Check

0 steps flagged

No circularity: derivation builds on standard components with independent evaluations

Full rationale

The paper's chain starts from existing latent diffusion models and DDIM inversion, then introduces learning of editing directions on synthetic stylistic data, guidance composition for domain gap, a regularization loss, and optimized null embeddings. These are presented as novel additions whose effectiveness is asserted via described evaluations on filtered synthetic datasets and comparisons to text-based methods. No step reduces a claimed result to a fitted parameter or self-defined quantity by construction, no load-bearing self-citation chain is invoked for uniqueness or ansatz, and no renaming of known patterns occurs. The central claims rest on empirical validation rather than tautological re-expression of inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The claim rests on domain assumptions about diffusion model guidance and synthetic data providing disentangled attributes; no free parameters or invented entities are explicitly detailed in the abstract.

axioms (2)
  • Domain assumption: guidance composition can close the domain gap between stylistically finetuned and foundation latent diffusion models while preserving semantics. Invoked to apply learned directions without content changes.
  • Domain assumption: synthetic datasets with controlled stylistic variations yield disentangled editing directions that generalize to real images. Core premise for the learning and validation approach.

pith-pipeline@v0.9.0 · 5443 in / 1275 out tokens · 32671 ms · 2026-05-08T18:39:13.533813+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: the paper's claim is directly supported by a theorem in the formal canon.
  • supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: the paper appears to rely on the theorem as machinery.
  • contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

124 extracted references · 41 canonical work pages · 4 internal anchors

  1. Brooks, T., Holynski, A., Efros, A. A. InstructPix2Pix: Learning to Follow Image Editing Instructions. arXiv:2211.09800.
  2. Chefer, H., Alaluf, Y., Vinker, Y., Wolf, L., et al. Attend-and-Excite. arXiv:2301.13826, May 2023.
  3. In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023. doi:10.1109/ICCV51070.2023.00390.
  4. Less Is … 2023.
  5. Classifier-Free Diffusion Guidance. arXiv, Jul 2022. doi:10.48550/arXiv.2207.12598.
  6. arXiv:2310.10343, Oct 2023.
  7. Multi-… arXiv:2312.04337, 2023.
  8. arXiv:2310.15160, Oct 2023.
  9. DINOv2: Learning Robust Visual Features without Supervision. arXiv:2304.07193, Apr 2023.
  10. DeeperForensics-1.0: A Large-Scale Dataset for Real-World Face Forgery Detection. CVPR, Jun 2020. doi:10.1109/CVPR42600.2020.00115.
  11. arXiv:2404.03145, 2024.
  12. Null-Text Inversion for Editing Real Images Using Guided Diffusion Models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
  13. Mou, C., Wang, X., Xie, L., Wu, Y., Zhang, J., Qi, Z., Shan, Y., Qie, X. T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion Models. arXiv:2302.08453.
  14. Compositional Visual Generation with Composable Diffusion Models. arXiv:2206.01714, Jun 2022.
  15. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2023. doi:10.1109/CVPR52729.2023.02155.
  16. Synthetic-… Jul 2023. doi:10.24132/csrn.3301.16.
  17. Z… IEEE Transactions on Pattern Analysis and Machine Intelligence.
  18. Guo, Y., Yang, C., Rao, A., Liang, Z., Wang, Y., Qiao, Y., Agrawala, M., Lin, D., Dai, B. AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning. arXiv:2307.04725.
  19. Zero-1-to-3: Zero-Shot One Image to 3D Object. In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV). doi:10.1109/ICCV51070.2023.00853.
  20. Zero123++: A Single Image to Consistent Multi-view Diffusion Base Model. arXiv:2310.15110, Oct 2023.
  21. (no data extracted)
  22. Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance. arXiv:2403.14781, 2024.
  23. Fine-Tuning … 2022.
  24. Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation. arXiv:2211.12572, Nov 2022.
  25. DreamSim: Learning New Dimensions of Human Visual Similarity Using Synthetic Data. arXiv:2306.09344.
  26. Tumanyan, N., Geyer, M., Bagon, S., Dekel, T. Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation. arXiv:2211.12572, 2022.
  27. arXiv:2306.00984, Jun 2023.
  28. Zhang, L., Agrawala, M. Adding Conditional Control to Text-to-Image Diffusion Models. arXiv:2302.05543.
  29. Zhao, J., Zheng, H., Wang, C., Lan, L., Huang, W., Yang, W. Null-Text … arXiv:2305.06710.
  30. (no data extracted)
  31. Diffusion … Jul 2023. doi:10.1145/3588432.3591558.
  32. Diversify, … arXiv:2312.02253, 2023.
  33. arXiv:2310.01830, Oct 2023.
  34. Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M. J. SMPL: A Skinned Multi-Person Linear Model. ACM Trans. Graphics (Proc. SIGGRAPH Asia), Oct.
  35. High-Resolution Image Synthesis with Latent Diffusion Models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
  36. Deep Unsupervised Learning Using Nonequilibrium Thermodynamics. In: International Conference on Machine Learning (ICML).
  37. Denoising Diffusion Probabilistic Models. In: Advances in Neural Information Processing Systems (NeurIPS).
  38. Score-Based Generative Modeling through Stochastic Differential Equations. In: International Conference on Learning Representations (ICLR).
  39. U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI).
  40. Denoising Diffusion Implicit Models. In: International Conference on Learning Representations (ICLR).
  41. DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps. In: Advances in Neural Information Processing Systems (NeurIPS).
  42. Prompt-to-Prompt Image Editing with Cross-Attention Control. In: International Conference on Learning Representations (ICLR).
  43. Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).
  44. Microsoft COCO: Common Objects in Context. In: Computer Vision, ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V, 2014.
  45. Synthetic Training for Accurate 3D Human Pose and Shape Estimation in the Wild. arXiv:2009.10013.
  46. Learning from Synthetic Humans. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  47. AI Choreographer: Music Conditioned 3D Dance Generation with AIST++. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).
  48. Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV).
  49. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In: CVPR.
  50. Improving Image Generation with Better Captions. Computer Science. https://cdn.openai.com/papers/dall-e-3.pdf.
  51. Gemini: A Family of Highly Capable Multimodal Models. arXiv:2312.11805.
  52. Demystifying MMD GANs. arXiv:1801.01401.
  53. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. In: Advances in Neural Information Processing Systems (NeurIPS).
  54. Improved Techniques for Training GANs. In: Advances in Neural Information Processing Systems (NeurIPS).
  55. StyleGAN-Fusion: Diffusion Guided Domain Adaptation of Image Generators. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).
  56. StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).
  57. StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators. ACM Transactions on Graphics (TOG), 2022.
  58. Analyzing and Improving the Image Quality of StyleGAN. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
  59. Compositional Visual Generation with Composable Diffusion Models. In: European Conference on Computer Vision (ECCV), 2022.
  60. Real-Time Monocular Depth Estimation Using Synthetic Data with Domain Adaptation via Image Style Transfer. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  61. Auto-Encoding Variational Bayes. arXiv:1312.6114.
  62. A Comprehensive Survey on Test-Time Adaptation under Distribution Shifts. arXiv:2303.15361.
  63. Test-Time Training with Self-Supervision for Generalization under Distribution Shifts. In: International Conference on Machine Learning (ICML), 2020.
  64. A Review of Domain Adaptation without Target Labels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
  65. Kyprianidis, J. E., Collomosse, J., Wang, T., Isenberg, T. State of the 'Art': A Taxonomy of Artistic Stylization Techniques for Images and Video. May 2013.
  66. Gatys, L. A., Ecker, A. S., Bethge, M. Image Style Transfer Using Convolutional Neural Networks. 2016. doi:10.1109/CVPR.2016.265.
  67. Chung, J., Hyun, S., Heo, J.-P. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
  68. FreeStyle: Free Lunch for Text-Guided Style Transfer Using Diffusion Models. arXiv, 2024.
  69. Diffusion Models Beat GANs on Image Synthesis. In: Advances in Neural Information Processing Systems (NeurIPS).
  70. BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. 2022.
  71. Real-Time Video Abstraction. ACM TOG, 25(3).
  72. Paint Transformer: Feed Forward Neural Painting with Stroke Prediction. In: Proc. ICCV.
  73. Interactive Watercolor Rendering with Temporal Coherence and Abstraction. In: Proc. NPAR.
  74. Deep Bilateral Learning for Real-Time Image Enhancement. ACM Transactions on Graphics (TOG), 2017.
  75. A Style-Based Generator Architecture for Generative Adversarial Networks. In: Proc. CVPR.
  76. Adam: A Method for Stochastic Optimization.
  77. Generative Adversarial Nets.
  78. Painterly Rendering with Curved Brush Strokes of Multiple Sizes. In: Proc. SIGGRAPH.
  79. Winnemöller … In: Image and Video-Based Artistic Stylisation, 2012.
  80. Style Injection in Diffusion: A Training-Free Approach for Adapting Large-Scale Diffusion Models for Style Transfer. arXiv:2312.09008.
Showing first 80 references.