arXiv: 2604.13956 · v1 · submitted 2026-04-15 · 💻 cs.HC · cs.AI · cs.CV

Creo: From One-Shot Image Generation to Progressive, Co-Creative Ideation

Zoe De Simone, Angie Boggust, Fredo Durand, Ashia Wilson, Arvind Satyanarayan

Pith reviewed 2026-05-10 12:35 UTC · model grok-4.3

classification 💻 cs.HC · cs.AI · cs.CV

keywords: text-to-image generation, multi-stage generation, decision locking, user control, co-creation, generative AI, human-AI interaction, image editing

The pith

Creo shows that building images in progressive stages with decision locking gives users more control and yields more varied results than one-shot generation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Text-to-image systems generate complete pictures in one step, which often locks in details too early and makes later edits cause unwanted drift. Creo instead moves through stages from rough sketches to finished images, showing intermediate versions where users can edit manually or with AI help. A locking mechanism keeps earlier choices fixed so only chosen parts change in later steps, and the system updates by applying small differences rather than redrawing everything. A study found that users felt they owned the final images more because they could follow their own decisions through the process, and the results were less uniform than those from a standard one-shot system.

Core claim

Creo scaffolds image generation by progressing from rough sketches to high-resolution outputs, exposing intermediary abstractions where users can make incremental changes. Each stage can be modified with manual changes and AI-assisted operations, enabling fine-grained, step-wise control through a locking mechanism that preserves prior decisions so subsequent edits affect only specified regions or attributes. Users remain in the loop, making and verifying decisions across stages, while the system applies diffs instead of regenerating full images, reducing drift as fidelity increases.

What carries the argument

The multi-stage pipeline with a locking mechanism that preserves prior decisions during targeted later edits and uses incremental diffs instead of full regenerations.
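
The locking-plus-diff idea can be illustrated with a toy sketch. This page does not describe Creo's actual implementation, so everything below is an assumption made for illustration: images are small grids of floats, a lock is a boolean mask, and `apply_locked_diff` is a hypothetical helper name. The sketch only shows the principle that a proposed change is applied as a difference to unlocked pixels while locked pixels keep their earlier values.

```python
from typing import List

Image = List[List[float]]  # toy grayscale image: rows of pixel values
Mask = List[List[bool]]    # True marks a pixel whose earlier decision is locked


def apply_locked_diff(current: Image, proposed: Image, locked: Mask) -> Image:
    """Return current plus the (proposed - current) diff, skipping locked pixels.

    Hypothetical helper: not taken from the paper, only an illustration of
    diff-based updates that leave locked regions untouched.
    """
    result: Image = []
    for cur_row, new_row, lock_row in zip(current, proposed, locked):
        out_row = []
        for cur, new, is_locked in zip(cur_row, new_row, lock_row):
            diff = new - cur  # the change requested at this pixel
            out_row.append(cur if is_locked else cur + diff)
        result.append(out_row)
    return result


# Toy data: the left column is locked, the right column is free to change.
current = [[0.25, 0.25], [0.5, 0.5]]
proposed = [[1.0, 0.75], [0.0, 1.0]]
locked = [[True, False], [True, False]]

edited = apply_locked_diff(current, proposed, locked)
print(edited)
```

In this toy case the locked left column keeps its values from `current`, while the right column takes the proposed values. A full regeneration would instead have replaced every pixel with `proposed`.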

Load-bearing premise

The findings from the user study and embedding analysis, obtained with the tested participants and tasks, will hold for other people and other kinds of creative work, and the locking feature will not introduce new rigidity or usability problems.

What would settle it

A follow-up study with more participants and varied tasks that finds no increase in reported ownership or no reduction in output homogeneity for Creo compared with one-shot generation.

Figures

Figures reproduced from arXiv: 2604.13956 by Angie Boggust, Arvind Satyanarayan, Ashia Wilson, Fredo Durand, Zoe De Simone.

**Figure 1.** Creo is a multi-stage image generation workflow. Unlike current text-to-image systems, Creo starts from a rough sketch, allowing users to progressively make visual decisions, like viewpoint, composition, color, lighting, and style.

**Figure 2.** Creo decomposes image generation into multiple stages. From a prompt, it generates (1) multiple viewpoints, after which the illustrator (2) refines composition, (3) color, (4) lighting, and (5) style in any order. Stages can be completed in any order.

**Figure 3.** Creo supports non-linear creative workflows by combining edits in earlier stages (e.g., adding a hat to the composition) with existing upstream decisions (e.g., dog color).

**Figure 4.** Creo reduces design anchoring and homogenization. Visual embeddings of the images that users produced using Creo are less tightly clustered than those from GPT.

**Figure 5.** Creo supports people’s non-linear creative workflows. Participants P4-P9 frequently explored stages out of order and revisited stages to generate their images.

**Figure 6.** Two entry points into Creo’s progressive ideation workflow. Users can begin with an existing image. This image is reverse engineered into the lighting, color and composition stages. Users can edit each of the representations, as shown in the bottom half of the image, moving across stages and propagating decisions across stages. This allows for iterative refinement of visual decisions on existing images.

**Figure 7.** Creo decomposes image generation into multiple stages. From a prompt, it generates (1) multiple viewpoints, after which the illustrator (2) refines composition, (3) color, (4) lighting, and (5) style in any order. Stages can be completed in any order.

**Figure 8.** Diagram depicting how editing instructions and

**Figure 9.** Diagram depicting how editing instructions and constraints are passed across layers, including revision loops.

Original abstract

Text-to-image (T2I) systems enable rapid generation of high-fidelity imagery but are misaligned with how visual ideas develop. T2I systems generate outputs that make implicit visual decisions on behalf of the user, often introduce fine-grained details that can anchor users prematurely and limit their ability to keep options open early on, and cause unintended changes during editing that are difficult to correct and reduce users' sense of control. To address these concerns, we present Creo, a multi-stage T2I system that scaffolds image generation by progressing from rough sketches to high-resolution outputs, exposing intermediary abstractions where users can make incremental changes. Sketch-like abstractions invite user editing and allow users to keep design options open when ideas are still forming due to their provisional nature. Each stage in Creo can be modified with manual changes and AI-assisted operations, enabling fine-grained, step-wise control through a locking mechanism that preserves prior decisions so subsequent edits affect only specified regions or attributes. Users remain in the loop, making and verifying decisions across stages, while the system applies diffs instead of regenerating full images, reducing drift as fidelity increases. A comparative study with a one-shot baseline shows that participants felt stronger ownership over Creo outputs, as they were able to trace their decisions in building up the image. Furthermore, embedding-based analysis indicates that Creo outputs are less homogeneous than one-shot results. These findings suggest that multi-stage generation, combined with intermediate control and decision locking, is a key design principle for improving controllability, user agency, creativity, and output diversity in generative systems.
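
The embedding-based homogeneity comparison can be sketched as a mean pairwise distance over image embeddings. The abstract does not name the embedding model or the distance metric, so cosine distance and the function names here are assumptions, and the vectors are made-up toy data rather than results from the study.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; assumed metric, not confirmed by the paper."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def mean_pairwise_distance(embeddings: List[Sequence[float]]) -> float:
    """Average distance over all unordered pairs; lower means more homogeneous."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)


# Toy embeddings: one tightly clustered set and one spread-out set.
tight = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

print(mean_pairwise_distance(tight))
print(mean_pairwise_distance(spread))
```

Under this assumed measure, a set of outputs with a larger mean pairwise distance is less homogeneous. A real analysis would also need the embedding model, the sample sizes, and a statistical comparison between conditions.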

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

Creo gives a concrete staged interface for T2I with locking that users like for ownership, but the study only compares the whole system to one-shot and leaves the locking contribution untested.

The paper builds Creo as a progressive T2I tool that starts with sketch-like stages and moves to higher fidelity, letting users edit manually or with AI help at each step. A locking feature applies diffs so earlier decisions stay put while later changes target only chosen parts. This directly tackles the anchoring and drift problems that come with single-pass generators. The user study finds participants felt more ownership over the results because they could follow their own choices through the stages, and an embedding check shows the outputs spread out more than one-shot baselines. Those are the main empirical points and they line up with the design goals.

The work is new in how it packages the stages, the locking primitive, and the diff-based updates into one system for this domain, plus the direct comparison on ownership and homogeneity. It does a clean job of naming the usability issues with current T2I tools and showing a workable alternative that keeps the user in the loop.

The soft spot is the evaluation. The comparative study runs full Creo against a one-shot baseline but does not include an ablation that keeps the stages while removing locking, or the reverse. That means the data cannot separate whether the locking mechanism itself reduces drift and raises agency or whether any staged interface would produce similar effects. The abstract also skips participant counts, exact tasks, and statistical details, so the strength of the ownership and diversity claims is hard to judge without the full methods. If the paper has those numbers and perhaps some additional controls, the limitation shrinks.

This is aimed at HCI researchers and tool builders who work on generative interfaces. Anyone thinking about co-creation or controllability in image models will find usable design ideas and a starting empirical comparison.

It is worth sending to peer review because the core idea is practical and the study provides some evidence, even if the design needs tighter isolation of the locking component and fuller reporting on the user data.

Referee Report

2 major / 1 minor

Summary. The paper introduces Creo, a multi-stage text-to-image system that scaffolds generation from rough sketches to high-resolution outputs using editable intermediary abstractions and a decision-locking mechanism that applies diffs to preserve prior user decisions. A comparative user study against a one-shot baseline reports higher user ownership with Creo, and an embedding analysis indicates less homogeneous outputs. The authors conclude that multi-stage generation combined with intermediate control and decision locking is a key design principle for improving controllability, agency, creativity, and diversity in generative systems.

Significance. If the empirical results prove robust, this work offers a concrete design principle for co-creative interfaces that could meaningfully advance human-AI collaboration in visual ideation by reducing premature commitment and unintended drift. The progressive scaffolding approach directly targets well-documented pain points in current T2I tools and provides initial evidence that such workflows can increase user agency and output variety.

major comments (2)

[Abstract / User Study] The user study (described in the abstract and evaluation sections) compares only the full Creo system against a one-shot baseline and includes no ablation conditions that retain multi-stage progression while disabling the decision-locking mechanism. This design cannot isolate whether locking (via diff application to preserve prior decisions) is necessary for the reported gains in ownership and reduced drift, or whether any staged interface would produce similar benefits. The distinction is load-bearing for the central claim that 'multi-stage generation, combined with intermediate control and decision locking, is a key design principle.'
[Abstract] The abstract reports ownership gains and an embedding-based homogeneity analysis but omits essential methodological details: participant count, task descriptions, statistical tests, exact embedding model and distance metric, and any controls for the locking component. Without these, the soundness and generalizability of the empirical support for the design principle cannot be assessed.

minor comments (1)

Additional diagrams illustrating the locking mechanism across stages and how diffs are applied would improve clarity of the system description.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive feedback. We address each major comment point by point below, providing clarifications and indicating revisions where appropriate.

Referee: [Abstract / User Study] The user study (described in the abstract and evaluation sections) compares only the full Creo system against a one-shot baseline and includes no ablation conditions that retain multi-stage progression while disabling the decision-locking mechanism. This design cannot isolate whether locking (via diff application to preserve prior decisions) is necessary for the reported gains in ownership and reduced drift, or whether any staged interface would produce similar benefits. The distinction is load-bearing for the central claim that 'multi-stage generation, combined with intermediate control and decision locking, is a key design principle.'

Authors: We agree that an ablation isolating the decision-locking mechanism from multi-stage progression alone would strengthen causal claims about its specific role. However, the locking mechanism (via diff-based preservation of prior decisions) is tightly integrated into Creo's multi-stage workflow; without it, the system would default to full regeneration at each stage, reintroducing the very drift and loss of agency the design seeks to mitigate. The one-shot baseline was chosen to represent the dominant current paradigm in T2I tools. We have added a dedicated paragraph in the Discussion section acknowledging this as a limitation of the current study design, explaining the integrated rationale, and outlining plans for future ablation experiments. The central claim is framed around the combined system rather than isolated components.

revision: partial

Referee: [Abstract] The abstract reports ownership gains and an embedding-based homogeneity analysis but omits essential methodological details: participant count, task descriptions, statistical tests, exact embedding model and distance metric, and any controls for the locking component. Without these, the soundness and generalizability of the empirical support for the design principle cannot be assessed.

Authors: We appreciate this observation. While abstracts are length-constrained, we have revised the abstract to incorporate the requested details: participant count, a concise description of the two ideation tasks, the statistical tests applied to ownership measures, the embedding model and distance metric used for the homogeneity analysis, and explicit mention that decision locking is active in the Creo condition. These additions are drawn directly from the Evaluation section and improve transparency without altering the abstract's core message.

revision: yes

Circularity Check

0 steps flagged

No circularity: empirical system evaluation without derivations or self-referential fits

The paper describes a multi-stage T2I interface and supports its claims via a comparative user study (ownership, agency) plus embedding homogeneity analysis against a one-shot baseline. No equations, parameters, or derivations appear; the central design principle is presented as an empirical finding rather than a mathematical result that reduces to its inputs. No self-citation chains, uniqueness theorems, or fitted quantities renamed as predictions are invoked. The study design limitations noted by the skeptic concern experimental controls, not circular reasoning in any derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an applied HCI system paper with no formal mathematical axioms, free parameters, or postulated physical entities; the central claims rest on the design of the Creo stages and the validity of the user study protocol.

pith-pipeline@v0.9.0 · 5599 in / 1082 out tokens · 20149 ms · 2026-05-10T12:35:25.848147+00:00 · methodology
