Recognition: unknown
Diffusion Templates: A Unified Plugin Framework for Controllable Diffusion
Pith reviewed 2026-05-08 04:20 UTC · model grok-4.3
The pith
Diffusion Templates decouple control capabilities from specific diffusion models using a shared plugin interface for composable generation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that defining control injection at the systems level through Template models, a Template cache, and a Template pipeline unifies a broad range of controllable diffusion tasks while preserving modularity and extensibility. The Template cache serves as the key standardized interface that accepts heterogeneous capability carriers, allowing the same pipeline to support structural, color, editing, and other controls without tying the abstraction to any single control architecture. This enables a single runtime to load and compose multiple Template caches across evolving diffusion backbones.
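The page contains no code, only the component names; a minimal sketch of how the three roles could be typed in Python follows. The class and method names mirror the paper's terminology, but every signature and field here is an assumption, not the authors' actual interface.

```python
# Hypothetical sketch of the three roles named in the paper; all
# signatures are assumptions, not the authors' interface.
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass
class TemplateCache:
    """Carrier-agnostic container for one injected capability."""
    carrier: str                        # e.g. "kv_cache" or "lora"
    payload: dict[str, Any]             # carrier-specific tensors or weights
    meta: dict[str, Any] = field(default_factory=dict)


class TemplateModel(Protocol):
    """Maps an arbitrary task-specific input to a Template cache."""
    def encode(self, task_input: Any) -> TemplateCache: ...


class TemplatePipeline:
    """Loads, merges, and injects Template caches into a base runtime."""
    def __init__(self, base_model: Any) -> None:
        self.base_model = base_model
        self.caches: list[TemplateCache] = []

    def load(self, cache: TemplateCache) -> None:
        self.caches.append(cache)

    def generate(self, prompt: str, **kwargs: Any) -> Any:
        for cache in self.caches:
            self._inject(cache)         # dispatch on cache.carrier
        return self.base_model(prompt, **kwargs)

    def _inject(self, cache: TemplateCache) -> None:
        # Carrier-specific plugins would live behind this call; one
        # possible registry is sketched under the rebuttal below.
        raise NotImplementedError(cache.carrier)
```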
What carries the argument
The Template cache, which functions as a standardized, architecture-independent interface for injecting capabilities from Template models into the base diffusion runtime.
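To make "architecture-independent" concrete, and continuing the sketch above: a KV-Cache-style control and a LoRA-style control can occupy the same container type, so the pipeline's load and merge paths never branch on the control architecture. The payload shapes below are illustrative guesses, not values from the paper.

```python
import torch  # payload contents below are illustrative guesses

# A structural control carried as attention key/value tensors.
edge_control = TemplateCache(
    carrier="kv_cache",
    payload={"kv": {"block_0": (torch.zeros(1, 8, 64), torch.zeros(1, 8, 64))}},
    meta={"task": "structural_control"},
)

# An aesthetic control carried as low-rank weight deltas (A, B).
style_control = TemplateCache(
    carrier="lora",
    payload={"deltas": {"attn.q_proj.weight": (torch.zeros(4, 64), torch.zeros(64, 4))}},
    meta={"task": "aesthetic_alignment"},
)
```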
If this is right
- Multiple controls such as inpainting and aesthetic alignment can be composed within a single generation without custom integration code (a usage sketch follows this list).
- Capabilities developed for one diffusion backbone can be transferred to others by swapping the base model while keeping the same Template caches.
- New tasks can be added by implementing only a Template model and cache, reusing the existing pipeline and training infrastructure.
- The framework supports ongoing evolution of diffusion backbones without forcing reimplementation of existing controls.
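A usage sketch of the composition claim in the first bullet, still under the assumptions of the sketches above; `backbone`, `inpaint_model`, `aesthetic_model`, `image`, and `mask` are hypothetical stand-ins for a diffusion backbone and two entries in a Template model zoo.

```python
# Hypothetical composition of two independently built capabilities;
# every name here is a stand-in, not an API from the paper.
pipeline = TemplatePipeline(base_model=backbone)
pipeline.load(inpaint_model.encode({"image": image, "mask": mask}))
pipeline.load(aesthetic_model.encode({"strength": 0.8}))
result = pipeline.generate("a lighthouse at dusk, film grain")
```

Swapping `backbone` for a different diffusion model while keeping both caches is exactly the transfer scenario in the second bullet.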
Where Pith is reading between the lines
- Community contributions could accumulate as a shared library of Templates, reducing duplicated effort across research groups.
- A common runtime might make direct comparisons between control methods more straightforward by isolating differences to the Template layer.
- The same abstraction could be tested for compatibility with non-diffusion generative models if the cache interface is kept general.
Load-bearing premise
That an interface defined at the systems level can support all relevant control mechanisms without needing changes specific to each control architecture.
What would settle it
A new control technique that cannot be expressed as a Template model plus cache without modifying the base diffusion model's inference code or runtime hooks.
read the original abstract
Controllable diffusion methods have substantially expanded the practical utility of diffusion models, but they are typically developed as isolated, backbone-specific systems with incompatible training pipelines, parameter formats, and runtime hooks. This fragmentation makes it difficult to reuse infrastructure across tasks, transfer capabilities across backbones, or compose multiple controls within a single generation pipeline. We present Diffusion Templates, a unified and open plugin framework that decouples base-model inference from controllable capability injection. The framework is organized around three components: Template models that map arbitrary task-specific inputs to an intermediate capability representation, a Template cache that functions as a standardized interface for capability injection, and a Template pipeline that loads, merges, and injects one or more Template caches into the base diffusion runtime. Because the interface is defined at the systems level rather than tied to a specific control architecture, heterogeneous capability carriers such as KV-Cache and LoRA can be supported under the same abstraction. Based on this design, we build a diverse model zoo spanning structural control, brightness adjustment, color adjustment, image editing, super-resolution, sharpness enhancement, aesthetic alignment, content reference, local inpainting, and age control. These case studies show that Diffusion Templates can unify a broad range of controllable generation tasks while preserving modularity, composability, and practical extensibility across rapidly evolving diffusion backbones. All resources will be open sourced, including code, models, and datasets.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Diffusion Templates, a unified plugin framework for controllable diffusion that decouples base-model inference from capability injection via three components: Template models (mapping task inputs to intermediate representations), a Template cache (standardized injection interface), and a Template pipeline (loading, merging, and injecting caches into the runtime). It claims this systems-level abstraction supports heterogeneous carriers (e.g., KV-Cache, LoRA) without architecture-specific ties, enabling unification, modularity, composability, and extensibility. The authors present a model zoo of ten case studies (structural control, brightness/color adjustment, editing, super-resolution, sharpness, aesthetic alignment, content reference, inpainting, age control) and commit to open-sourcing code, models, and datasets.
Significance. If the abstraction truly permits uniform handling of disparate carriers and backbones while preserving quality and efficiency, the work could meaningfully reduce fragmentation in controllable diffusion by enabling infrastructure reuse, cross-backbone transfer, and multi-control composition. The explicit open-sourcing of all resources is a concrete strength that would support reproducibility and community extensions.
major comments (2)
- [Abstract] Abstract (design description of Template cache and pipeline): the claim that 'the interface is defined at the systems level rather than tied to a specific control architecture' allowing KV-Cache and LoRA under the same abstraction is load-bearing for the unification claim, yet the manuscript provides no concrete specification, pseudocode, or merge logic showing how distinct operations (runtime tensor modification for KV-Cache vs. weight-matrix updates for LoRA) are dispatched uniformly without carrier-specific handlers inside the standardized interface.
- [Abstract] Abstract (case studies paragraph): the assertion that the framework 'can unify a broad range of controllable generation tasks while preserving modularity, composability, and practical extensibility' lacks any quantitative support; no performance metrics, quality comparisons, efficiency measurements, or ablation results are reported for the ten tasks, leaving the central claim that unification occurs 'without loss of quality or efficiency' unsubstantiated.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for acknowledging the potential of Diffusion Templates to address fragmentation in controllable diffusion. We address each major comment below with clarifications and commit to revisions that strengthen the manuscript without overstating the current content.
read point-by-point responses
- Referee: [Abstract] Abstract (design description of Template cache and pipeline): the claim that 'the interface is defined at the systems level rather than tied to a specific control architecture' allowing KV-Cache and LoRA under the same abstraction is load-bearing for the unification claim, yet the manuscript provides no concrete specification, pseudocode, or merge logic showing how distinct operations (runtime tensor modification for KV-Cache vs. weight-matrix updates for LoRA) are dispatched uniformly without carrier-specific handlers inside the standardized interface.
Authors: We agree that the abstract does not contain a concrete specification, pseudocode, or explicit merge logic, which leaves the systems-level unification claim insufficiently supported in that section. The framework design intends for the Template cache to act as a carrier-agnostic container and for the pipeline to perform generic loading and merging, with carrier-specific dispatch handled via extensible plugins (one possible form of this dispatch is sketched after these responses). To resolve this, we will revise the abstract to include a concise description of the interface and add pseudocode plus a merge-logic diagram to the main text in the revised manuscript. This will explicitly illustrate uniform handling at the systems level. revision: yes
- Referee: [Abstract] Abstract (case studies paragraph): the assertion that the framework 'can unify a broad range of controllable generation tasks while preserving modularity, composability, and practical extensibility' lacks any quantitative support; no performance metrics, quality comparisons, efficiency measurements, or ablation results are reported for the ten tasks, leaving the central claim that unification occurs 'without loss of quality or efficiency' unsubstantiated.
Authors: We acknowledge that the abstract reports no quantitative metrics, comparisons, or ablations, so the claim of unification without loss of quality or efficiency is not numerically substantiated there. The ten case studies currently serve to demonstrate breadth and feasibility through qualitative examples across tasks and backbones. In the revision we will add quantitative evaluations, including quality metrics (e.g., FID, CLIP scores), efficiency measurements (e.g., runtime overhead), and limited baseline comparisons plus composition ablations for a subset of tasks. These additions will be supported by the open-sourced code, models, and datasets. revision: yes
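As flagged in the first response above, one plausible shape for those "extensible plugins" is a handler registry keyed on carrier type: the pipeline core stays carrier-agnostic while the carrier-specific operations (weight-space merges for LoRA, runtime tensor injection for KV-Cache) live in replaceable handlers. The sketch below illustrates that idea under this page's assumptions; the handler bodies are stubs, not the paper's actual merge logic.

```python
from typing import Any, Callable

Handler = Callable[[Any, dict[str, Any]], None]
HANDLERS: dict[str, Handler] = {}

def register(carrier: str) -> Callable[[Handler], Handler]:
    """Register a carrier-specific injection plugin."""
    def deco(fn: Handler) -> Handler:
        HANDLERS[carrier] = fn
        return fn
    return deco

@register("lora")
def inject_lora(model: Any, payload: dict[str, Any]) -> None:
    # Weight-space injection: fold low-rank deltas into linear layers
    # (standard LoRA merge W' = W + BA; scaling factor omitted).
    for name, (A, B) in payload["deltas"].items():
        model.get_parameter(name).data += B @ A

@register("kv_cache")
def inject_kv(model: Any, payload: dict[str, Any]) -> None:
    # Activation-space injection: stash extra key/value tensors where
    # the runtime's attention hooks can attend over them.
    model.extra_kv = payload["kv"]

def inject(model: Any, cache: Any) -> None:
    # The pipeline core only ever touches this one entry point.
    HANDLERS[cache.carrier](model, cache.payload)
```

Under this design, registering a new carrier means adding one handler; the referee's concern would materialize only if some control required edits inside `inject` itself or inside the base model's forward pass, which is also the falsifier named under "What would settle it" above.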
Circularity Check
No circularity: high-level systems framework with no derivations or self-referential reductions
full rationale
The paper presents a descriptive architectural framework consisting of Template models, a Template cache interface, and a Template pipeline for unifying controllable diffusion tasks. No mathematical equations, derivations, parameter fittings, or predictive claims appear in the abstract or described content. The central claim, that a systems-level interface enables support for heterogeneous carriers such as KV-Cache and LoRA, is advanced as a design property rather than derived from prior results or self-defined quantities. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps. The work stands as a self-contained engineering abstraction rather than one that reduces to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption An interface defined at the systems level can support heterogeneous capability carriers such as KV-Cache and LoRA under the same abstraction.
invented entities (3)
- Template model: no independent evidence
- Template cache: no independent evidence
- Template pipeline: no independent evidence
Forward citations
Cited by 2 Pith papers
- BRIDGE: Background Routing and Isolated Discrete Gating for Coarse-Mask Local Editing
  BRIDGE uses separate main and subject paths plus a discrete gate on positional embeddings to improve local edits with coarse masks, raising local SigLIP2-T from 0.39 to 0.50 on its benchmark.
- BRIDGE: Background Routing and Isolated Discrete Gating for Coarse-Mask Local Editing
  BRIDGE improves coarse-mask local image editing in DiT models by routing background and subject paths separately and using a discrete geometric gate on positional embeddings to reduce mask-shape bias.
Reference graph
Works this paper leans on
- [1] Reyna Abhyankar, Zijian He, Vikranth Srivatsa, Hao Zhang, and Yiying Zhang. InferCept: Efficient intercept support for augmented large language model inference. arXiv preprint arXiv:2402.01869, 2024.
- [2] Anthropic. Model context protocol specification. Technical specification, 2024. https://modelcontextprotocol.io/
- [3] Anthropic. Introducing agent skills. Product announcement, 2025. https://www.anthropic.com/news/skills, accessed April 12, 2026.
- [4] Black Forest Labs. FLUX.1 model family. Technical report/model release, 2024. https://blackforestlabs.ai/
- [5] John Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6):679–698, 1986.
- [6] Die Chen, Zhongjie Duan, Zhiwen Li, Cen Chen, Daoyuan Chen, Yaliang Li, and Yingda Chen. AttriCtrl: Fine-grained control of aesthetic attribute intensity in diffusion models. arXiv preprint arXiv:2508.02151, 2025.
- [7] Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Yue Wu, Zhongdao Wang, James Kwok, Ping Luo, Huchuan Lu, et al. PixArt-α: Fast training of diffusion transformer for photorealistic text-to-image synthesis. arXiv preprint arXiv:2310.00426, 2023.
- [8] Tri Dao. FlashAttention-2: Faster attention with better parallelism and work partitioning. arXiv preprint arXiv:2307.08691, 2023.
- [9] Tri Dao, Dan Fu, Stefano Ermon, Atri Rudra, and Christopher Ré. FlashAttention: Fast and memory-efficient exact attention with IO-awareness. Advances in Neural Information Processing Systems, 35:16344–16359, 2022.
- [10] Zhongjie Duan, Qianyi Zhao, Cen Chen, Daoyuan Chen, Wenmeng Zhou, Yaliang Li, and Yingda Chen. ArtAug: Enhancing text-to-image generation through synthesis-understanding interaction. arXiv preprint arXiv:2412.12888, 2024.
- [11] Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. Scaling rectified flow transformers for high-resolution image synthesis. In Forty-first International Conference on Machine Learning, 2024.
- [12] Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit Haim Bermano, Gal Chechik, and Daniel Cohen-Or. An image is worth one word: Personalizing text-to-image generation using textual inversion. In The Eleventh International Conference on Learning Representations, 2023.
- [13] Yoav HaCohen, Benny Brazowski, Nisan Chiprut, Yaki Bitterman, Andrew Kvochko, Avishai Berkowitz, Daniel Shalem, Daphna Lifschitz, Dudu Moshe, Eitan Porat, et al. LTX-2: Efficient joint audio-visual foundation model. arXiv preprint arXiv:2601.03233, 2026.
- [14] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
- [15] Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022.
- [16] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022.
- [17] Dongfu Jiang, Max Ku, Tianle Li, Yuansheng Ni, Shizhuo Sun, Rongqi Fan, and Wenhu Chen. GenAI Arena: An open evaluation platform for generative models. Advances in Neural Information Processing Systems, 37:79889–79908, 2024.
- [18] Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
- [19] Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, and Omer Levy. Pick-a-Pic: An open dataset of user preferences for text-to-image generation. Advances in Neural Information Processing Systems, 36:36652–36663, 2023.
- [20] Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jiangfeng Xiong, Xin Li, Bo Wu, Jianwei Zhang, et al. HunyuanVideo: A systematic framework for large video generative models. arXiv preprint arXiv:2412.03603, 2024.
- [21] Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph Gonzalez, Hao Zhang, and Ion Stoica. Efficient memory management for large language model serving with PagedAttention. In Proceedings of the 29th Symposium on Operating Systems Principles, pages 611–626, 2023.
- [22] Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, and Deming Chen. SnapKV: LLM knows what you are looking for before generation. Advances in Neural Information Processing Systems, 37:22947–22970, 2024.
- [23] Zhimin Li, Jianwei Zhang, Qin Lin, Jiangfeng Xiong, Yanxin Long, Xinchi Deng, Yingfang Zhang, Xingchao Liu, Minbin Huang, Zedong Xiao, et al. Hunyuan-DiT: A powerful multi-resolution diffusion transformer with fine-grained Chinese understanding. arXiv preprint arXiv:2405.08748, 2024.
- [24] Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian Zhang, Zhongang Qi, and Ying Shan. T2I-Adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 4296–4304, 2024.
- [25] OpenAI. Function calling and tool use in OpenAI models. Technical documentation, 2023. https://platform.openai.com/docs/guides/function-calling
- [26] William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4195–4205, 2023.
- [27] Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. SDXL: Improving latent diffusion models for high-resolution image synthesis. arXiv preprint arXiv:2307.01952, 2023.
- [28] Ruoyu Qin, Zheming Li, Weiran He, Jialei Cui, Heyi Tang, Feng Ren, Teng Ma, Shangming Cai, Yineng Zhang, Mingxing Zhang, et al. Mooncake: A KVCache-centric disaggregated architecture for LLM serving. ACM Transactions on Storage, 2024.
- [29] Yujia Qin, Shengding Hu, Yankai Lin, Weize Chen, Ning Ding, Ganqu Cui, Zheni Zeng, Xuanhe Zhou, Yufei Huang, Chaojun Xiao, et al. Tool learning with foundation models. ACM Computing Surveys, 57(4):1–40, 2024.
- [30] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
- [31] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67, 2020.
- [32] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
- [33] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
- [34] Rasmus Rothe, Radu Timofte, and Luc Van Gool. DEX: Deep expectation of apparent age from a single image. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 10–15, 2015.
- [35] Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22500–22510, 2023.
- [36] Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems, 36:68539–68551, 2023.
- [37] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020.
- [38] Vikranth Srivatsa, Zijian He, Reyna Abhyankar, Dongming Li, and Yiying Zhang. Preble: Efficient distributed prompt scheduling for LLM serving. arXiv preprint arXiv:2407.00023, 2024.
- [39] Michael Tschannen, Alexey Gritsenko, Xiao Wang, Muhammad Ferjad Naeem, Ibrahim Alabdulmohsin, Nikhil Parthasarathy, Talfan Evans, Lucas Beyer, Ye Xia, Basil Mustafa, et al. SigLIP 2: Multilingual vision-language encoders with improved semantic understanding, localization, and dense features. arXiv preprint arXiv:2502.14786, 2025.
- [40] Team Wan, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianxiao Yang, et al. Wan: Open and advanced large-scale video generative models. arXiv preprint arXiv:2503.20314, 2025.
- [41] Xintao Wang, Liangbin Xie, Chao Dong, and Ying Shan. Real-ESRGAN: Training real-world blind super-resolution with pure synthetic data. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1905–1914, 2021.
- [42] Chenfei Wu, Jiahao Li, Jingren Zhou, Junyang Lin, Kaiyuan Gao, Kun Yan, Sheng-ming Yin, Shuai Bai, Xiao Xu, Yilei Chen, et al. Qwen-Image technical report. arXiv preprint arXiv:2508.02324, 2025.
- [43] Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, et al. The rise and potential of large language model based agents: A survey. Science China Information Sciences, 68(2):121101, 2025.
- [44] Enze Xie, Junsong Chen, Junyu Chen, Han Cai, Haotian Tang, Yujun Lin, Zhekai Zhang, Muyang Li, Ligeng Zhu, Yao Lu, et al. SANA: Efficient high-resolution image synthesis with linear diffusion transformers. arXiv preprint arXiv:2410.10629, 2024.
- [45] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025.
- [46] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. In The Eleventh International Conference on Learning Representations, 2022.
- [47] Hu Ye, Jun Zhang, Sibo Liu, Xiao Han, and Wei Yang. IP-Adapter: Text compatible image prompt adapter for text-to-image diffusion models. arXiv preprint arXiv:2308.06721, 2023.
- [48] Hong Zhang, Zhongjie Duan, Xingjun Wang, Yingda Chen, and Yu Zhang. EliGen: Entity-level controlled image generation with regional attention. In Proceedings of the 7th ACM International Conference on Multimedia in Asia, pages 1–7, 2025.
- [49] Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3836–3847, 2023.
- [50] Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, Ruisi Cai, Zhao Song, Yuandong Tian, Christopher Ré, Clark Barrett, et al. H2O: Heavy-hitter oracle for efficient generative inference of large language models. Advances in Neural Information Processing Systems, 36:34661–34710, 2023.