Creo: From One-Shot Image Generation to Progressive, Co-Creative Ideation
Pith reviewed 2026-05-10 12:35 UTC · model grok-4.3
The pith
Creo shows that building images in progressive stages with decision locking gives users more control and yields more varied results than one-shot generation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Creo scaffolds image generation by progressing from rough sketches to high-resolution outputs, exposing intermediary abstractions where users can make incremental changes. Each stage can be modified with manual changes and AI-assisted operations, enabling fine-grained, step-wise control through a locking mechanism that preserves prior decisions so subsequent edits affect only specified regions or attributes. Users remain in the loop, making and verifying decisions across stages, while the system applies diffs instead of regenerating full images, reducing drift as fidelity increases.
What carries the argument
A multi-stage pipeline whose locking mechanism preserves prior decisions during later targeted edits, and which applies incremental diffs instead of regenerating full images.
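The locking idea can be illustrated with a minimal sketch. It assumes the image is a small grid of integer values and that a lock is a boolean mask of the same shape; the summary above does not say how Creo represents images or computes diffs, so `apply_locked_diff` and `locked_drift` are hypothetical names used only for illustration, not the authors' implementation.

```python
from typing import List

Grid = List[List[int]]
Mask = List[List[bool]]


def apply_locked_diff(canvas: Grid, proposal: Grid, locked: Mask) -> Grid:
    """Return a new canvas that adopts proposed values only in unlocked cells.

    Hypothetical helper: locked cells keep their current value, so a later
    edit cannot overwrite a decision the user has already fixed.
    """
    return [
        [old if is_locked else new
         for old, new, is_locked in zip(canvas_row, proposal_row, lock_row)]
        for canvas_row, proposal_row, lock_row in zip(canvas, proposal, locked)
    ]


def locked_drift(before: Grid, after: Grid, locked: Mask) -> int:
    """Count locked cells whose value changed between two canvases."""
    return sum(
        1
        for b_row, a_row, lock_row in zip(before, after, locked)
        for b, a, is_locked in zip(b_row, a_row, lock_row)
        if is_locked and b != a
    )


# Toy data (assumed for illustration): the user has locked the left column.
canvas = [[1, 1], [2, 2]]
locked = [[True, False], [True, False]]
# Stand-in for a full regeneration, which changes every cell.
proposal = [[9, 9], [9, 9]]

result = apply_locked_diff(canvas, proposal, locked)
```

In this toy setting, accepting `proposal` wholesale changes both locked cells, while `result` leaves them untouched. That is the sense in which a masked diff reduces drift compared with full regeneration.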
Load-bearing premise
The findings of the user study and embedding analysis, obtained with the tested participants and tasks, will generalize to other users and other creative work, and the locking feature will not introduce new rigidity or usability problems.
What would settle it
A follow-up study with more participants and varied tasks that finds no increase in reported ownership or no reduction in output homogeneity for Creo compared with one-shot generation.
Original abstract
Text-to-image (T2I) systems enable rapid generation of high-fidelity imagery but are misaligned with how visual ideas develop. T2I systems generate outputs that make implicit visual decisions on behalf of the user, often introduce fine-grained details that can anchor users prematurely and limit their ability to keep options open early on, and cause unintended changes during editing that are difficult to correct and reduce users' sense of control. To address these concerns, we present Creo, a multi-stage T2I system that scaffolds image generation by progressing from rough sketches to high-resolution outputs, exposing intermediary abstractions where users can make incremental changes. Sketch-like abstractions invite user editing and allow users to keep design options open when ideas are still forming due to their provisional nature. Each stage in Creo can be modified with manual changes and AI-assisted operations, enabling fine-grained, step-wise control through a locking mechanism that preserves prior decisions so subsequent edits affect only specified regions or attributes. Users remain in the loop, making and verifying decisions across stages, while the system applies diffs instead of regenerating full images, reducing drift as fidelity increases. A comparative study with a one-shot baseline shows that participants felt stronger ownership over Creo outputs, as they were able to trace their decisions in building up the image. Furthermore, embedding-based analysis indicates that Creo outputs are less homogeneous than one-shot results. These findings suggest that multi-stage generation, combined with intermediate control and decision locking, is a key design principle for improving controllability, user agency, creativity, and output diversity in generative systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Creo, a multi-stage text-to-image system that scaffolds generation from rough sketches to high-resolution outputs using editable intermediary abstractions and a decision-locking mechanism that applies diffs to preserve prior user decisions. A comparative user study against a one-shot baseline reports higher user ownership with Creo, and an embedding analysis indicates less homogeneous outputs. The authors conclude that multi-stage generation combined with intermediate control and decision locking is a key design principle for improving controllability, agency, creativity, and diversity in generative systems.
Significance. If the empirical results prove robust, this work offers a concrete design principle for co-creative interfaces that could meaningfully advance human-AI collaboration in visual ideation by reducing premature commitment and unintended drift. The progressive scaffolding approach directly targets well-documented pain points in current T2I tools and provides initial evidence that such workflows can increase user agency and output variety.
major comments (2)
- [Abstract / User Study] The user study (described in the abstract and evaluation sections) compares only the full Creo system against a one-shot baseline and includes no ablation conditions that retain multi-stage progression while disabling the decision-locking mechanism. This design cannot isolate whether locking (via diff application to preserve prior decisions) is necessary for the reported gains in ownership and reduced drift, or whether any staged interface would produce similar benefits. The distinction is load-bearing for the central claim that 'multi-stage generation, combined with intermediate control and decision locking, is a key design principle.'
- [Abstract] The abstract reports ownership gains and an embedding-based homogeneity analysis but omits essential methodological details: participant count, task descriptions, statistical tests, exact embedding model and distance metric, and any controls for the locking component. Without these, the soundness and generalizability of the empirical support for the design principle cannot be assessed.
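To make concrete what an "embedding model and distance metric" would need to pin down, here is a minimal sketch of one common homogeneity measure: the mean pairwise cosine distance over a set of image embeddings. This is an assumption for illustration only. The abstract does not state which embedding model or metric the authors used, and `mean_pairwise_cosine_distance` is a hypothetical name.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def mean_pairwise_cosine_distance(embeddings: Sequence[Sequence[float]]) -> float:
    """Average cosine distance over all unordered pairs of embeddings.

    Lower values mean the outputs are more alike (more homogeneous).
    """
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)


# Toy vectors standing in for image embeddings (assumed, not real data).
near_identical = [[1.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
varied = [[1.0, 0.0], [0.0, 1.0]]

homogeneous_score = mean_pairwise_cosine_distance(near_identical)
diverse_score = mean_pairwise_cosine_distance(varied)
```

Under this metric, a claim that one condition is "less homogeneous" corresponds to a higher mean pairwise distance for that condition. Whether the difference is meaningful still depends on the embedding model, sample size, and statistical test, which are the details the comment above asks for.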
minor comments (1)
- Additional diagrams illustrating how the locking mechanism operates across stages and how diffs are applied would improve the clarity of the system description.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive feedback. We address each major comment point by point below, providing clarifications and indicating revisions where appropriate.
- Referee: [Abstract / User Study] The user study (described in the abstract and evaluation sections) compares only the full Creo system against a one-shot baseline and includes no ablation conditions that retain multi-stage progression while disabling the decision-locking mechanism. This design cannot isolate whether locking (via diff application to preserve prior decisions) is necessary for the reported gains in ownership and reduced drift, or whether any staged interface would produce similar benefits. The distinction is load-bearing for the central claim that 'multi-stage generation, combined with intermediate control and decision locking, is a key design principle.'
  Authors: We agree that an ablation isolating the decision-locking mechanism from multi-stage progression alone would strengthen causal claims about its specific role. However, the locking mechanism (via diff-based preservation of prior decisions) is tightly integrated into Creo's multi-stage workflow; without it, the system would default to full regeneration at each stage, reintroducing the very drift and loss of agency the design seeks to mitigate. The one-shot baseline was chosen to represent the dominant current paradigm in T2I tools. We have added a dedicated paragraph in the Discussion section acknowledging this as a limitation of the current study design, explaining the integrated rationale, and outlining plans for future ablation experiments. The central claim is framed around the combined system rather than isolated components.
  Revision: partial
- Referee: [Abstract] The abstract reports ownership gains and an embedding-based homogeneity analysis but omits essential methodological details: participant count, task descriptions, statistical tests, exact embedding model and distance metric, and any controls for the locking component. Without these, the soundness and generalizability of the empirical support for the design principle cannot be assessed.
  Authors: We appreciate this observation. While abstracts are length-constrained, we have revised the abstract to incorporate the requested details: participant count, a concise description of the two ideation tasks, the statistical tests applied to ownership measures, the embedding model and distance metric used for the homogeneity analysis, and explicit mention that decision locking is active in the Creo condition. These additions are drawn directly from the Evaluation section and improve transparency without altering the abstract's core message.
  Revision: yes
Circularity Check
No circularity: empirical system evaluation without derivations or self-referential fits
The paper describes a multi-stage T2I interface and supports its claims via a comparative user study (ownership, agency) plus embedding homogeneity analysis against a one-shot baseline. No equations, parameters, or derivations appear; the central design principle is presented as an empirical finding rather than a mathematical result that reduces to its inputs. No self-citation chains, uniqueness theorems, or fitted quantities renamed as predictions are invoked. The study design limitations noted by the skeptic concern experimental controls, not circular reasoning in any derivation chain.