Conversational Customization of Productivity Systems: A Design Probe of Malleable AI Interfaces
Pith reviewed 2026-05-13 02:35 UTC · model grok-4.3
The pith
Users turn their email inbox into a flexible data layer by authoring custom behaviors through natural language conversations rather than treating it as a fixed interface.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In a design probe of a conversationally customizable email system, users created and refined inbox categories, interface elements, and workflow behaviors through natural language. These customizations were typically grounded in existing patterns that participants adapted to their own needs. The inbox therefore became a flexible data layer shaped by user-authored features rather than a fixed interface. Customization also introduced risks such as mis-specified behavior, unintended filtering, and uncertainty about outcomes, which participants managed through ongoing oversight and iterative refinement.
What carries the argument
A conversationally customizable email interface in which users iteratively restructure categories, add interface elements, and author new workflow behaviors directly in natural language.
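One way to picture the "inbox as data layer" idea is the minimal sketch below. It is an illustration only, not the system described in the paper: the names `Message`, `Feature`, and `Inbox` are hypothetical, and the predicate is hand-written, whereas in a conversational system it would presumably be generated from the user's natural language request.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Message:
    sender: str
    subject: str


@dataclass
class Feature:
    """Hypothetical user-authored category: a name plus a predicate over messages."""
    name: str
    matches: Callable[[Message], bool]


@dataclass
class Inbox:
    """Messages are plain data; features are views layered on top of them."""
    messages: list[Message]
    features: list[Feature] = field(default_factory=list)

    def view(self) -> dict[str, list[Message]]:
        """Group each message under the first feature that matches it.

        Messages that match no feature stay in the default "Inbox" group.
        """
        groups: dict[str, list[Message]] = {f.name: [] for f in self.features}
        groups["Inbox"] = []
        for message in self.messages:
            for feature in self.features:
                if feature.matches(message):
                    groups[feature.name].append(message)
                    break
            else:
                groups["Inbox"].append(message)
        return groups


inbox = Inbox([
    Message("billing@shop.example", "Your receipt"),
    Message("ana@example.org", "Lunch?"),
])
# Stand-in for a rule the user might request with "group my receipts together".
inbox.features.append(Feature("Receipts", lambda m: "receipt" in m.subject.lower()))
grouped = inbox.view()
```

The point of the sketch is that adding or editing a `Feature` changes only the view, never the underlying messages, which is one plausible reading of "data layer" here.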
If this is right
- Users adapt familiar patterns to create specialized functionality instead of generating entirely novel features.
- The inbox shifts from a fixed interface to a data layer that users actively shape through their own instructions.
- Customization brings risks including mis-specified behavior and unintended filtering that require user oversight.
- Systems must provide support for iterative refinement, visibility into AI behavior, and safe experimentation.
Where Pith is reading between the lines
- The same conversational approach could let users reshape other productivity tools such as calendars or task lists by layering custom behaviors on top of existing data.
- Over longer periods, users might develop more complex layered customizations that depend on the AI correctly interpreting successive instructions.
- Designers of future tools could add explicit safeguards such as preview modes or rollback options to reduce the cost of unintended filtering.
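The preview and rollback safeguards mentioned in the last point can be sketched as follows. This is a hypothetical illustration under simple assumptions (rules are functions over subject lines, and rollback restores a snapshot of the rule list); `RuleSet` and its methods are invented for this example and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable

# A rule returns True when a subject line should be hidden (assumed shape).
Rule = Callable[[str], bool]


@dataclass
class RuleSet:
    rules: list[tuple[str, Rule]] = field(default_factory=list)
    history: list[list[tuple[str, Rule]]] = field(default_factory=list)

    def preview(self, rule: Rule, subjects: list[str]) -> list[str]:
        """Dry run: report what the rule would hide without changing any state."""
        return [s for s in subjects if rule(s)]

    def apply(self, name: str, rule: Rule) -> None:
        """Snapshot the current rules, then add the new one."""
        self.history.append(list(self.rules))
        self.rules.append((name, rule))

    def rollback(self) -> None:
        """Restore the snapshot taken before the most recent apply()."""
        if self.history:
            self.rules = self.history.pop()

    def visible(self, subjects: list[str]) -> list[str]:
        """Subjects not hidden by any active rule."""
        return [s for s in subjects if not any(rule(s) for _, rule in self.rules)]


subjects = ["Weekly newsletter", "Invoice newsletter settings", "Lunch?"]
rule_set = RuleSet()
hide_newsletters: Rule = lambda s: "newsletter" in s.lower()

# The preview surfaces an over-broad match before anything is filtered.
would_hide = rule_set.preview(hide_newsletters, subjects)
```

In this example the preview reveals that the rule would also hide "Invoice newsletter settings", the kind of unintended filtering the review describes, and `rollback()` undoes the rule if it was applied anyway.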
Load-bearing premise
Short-term use of one specific email-based design probe captures how conversational customization would play out in long-term, real-world productivity work across different tools and users.
What would settle it
A follow-up study in which participants use the system for weeks across multiple productivity tools and show no increase in ongoing oversight or refinement of customizations would indicate the observed pattern of risk management does not generalize.
Original abstract
Customization has long been a central goal in interactive systems, yet prior work shows that end-user tailoring occurs infrequently and is often confined to initial setup or moments of breakdown. Recent advances in generative AI suggest that highly malleable systems, where users can modify system behavior through natural language, are now technically feasible. However, it remains unclear how such malleability is used in practice: What kinds of customizations do users create, when do they choose to customize, and how do these modifications shape their experience of everyday tools? We present a design probe that uses a conversationally customizable email system as an instrument to study how users create and refine functionality within everyday tools. The system allows users to iteratively modify their inbox by restructuring categories, introducing interface elements, and authoring new workflow behaviors directly through natural language interaction. We study how participants create, refine, and use these features over several days within their own email workflows. We find that users' customizations are often grounded in existing patterns, which they adapt and specialize to fit their needs, rather than generating entirely novel functionality. Malleability changes how users engage with their inbox, shifting it from a fixed interface to a flexible data layer shaped through user-authored features. At the same time, customization introduces new forms of risk, including mis-specified behavior, unintended filtering, and uncertainty around outcomes, which users manage through ongoing oversight and refinement. These findings highlight how conversational customization becomes embedded within everyday interaction, and point toward the need for systems that support iterative refinement, visibility into behavior, and safe experimentation as users shape their own tools.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a design probe of a conversationally customizable email system that lets users restructure categories, add interface elements, and author workflow behaviors through natural language. In a study where participants used the system within their own email workflows for several days, the authors report that customizations are typically adaptations of existing patterns rather than entirely novel functionality. They claim that malleability transforms the inbox from a fixed interface into a flexible data layer shaped by user-authored features, while also introducing risks such as mis-specified behavior, unintended filtering, and outcome uncertainty that users address through ongoing oversight and iterative refinement.
Significance. If the results hold, this work is significant for HCI research on end-user customization and malleable AI interfaces. By embedding a functional probe in participants' real workflows, it supplies concrete qualitative evidence of how conversational customization integrates into daily use, revealing adaptive behaviors alongside emergent oversight challenges. These insights can guide the design of future systems that better support iterative refinement, behavioral visibility, and safe experimentation in productivity tools.
major comments (2)
- Abstract and Findings: The central claim that malleability produces a durable shift 'from a fixed interface to a flexible data layer shaped through user-authored features' rests on observations from a probe lasting only 'several days.' This duration is insufficient to distinguish sustained changes in engagement from novelty effects or temporary adaptation, directly weakening the generalization to everyday productivity workflows stated in the abstract.
- Method and Discussion: The probe is confined to a single email system, yet the claims extend to 'productivity systems' broadly and to how users manage risks like mis-specification across tools. Without data from additional domains or a cross-tool comparison, the observed risk-management strategies (ongoing oversight and refinement) may be domain-specific rather than generalizable.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. We address each major point below, clarifying the scope of our design probe while revising the manuscript to better reflect its exploratory nature and limitations.
Point-by-point responses
- Referee: Abstract and Findings: The central claim that malleability produces a durable shift 'from a fixed interface to a flexible data layer shaped through user-authored features' rests on observations from a probe lasting only 'several days.' This duration is insufficient to distinguish sustained changes in engagement from novelty effects or temporary adaptation, directly weakening the generalization to everyday productivity workflows stated in the abstract.
  Authors: We agree that the short duration of the probe precludes strong claims about long-term durability or sustained shifts in engagement. The study was explicitly framed as a design probe to surface initial patterns of conversational customization within participants' real workflows, not as a longitudinal evaluation. We have revised the abstract to remove references to a 'durable shift' and to emphasize observed behaviors during the multi-day period. We have also expanded the discussion and added a dedicated limitations paragraph that acknowledges potential novelty effects and calls for future longitudinal work to assess persistence. These changes align our claims more precisely with the data collected.
  Revision: yes
- Referee: Method and Discussion: The probe is confined to a single email system, yet the claims extend to 'productivity systems' broadly and to how users manage risks like mis-specification across tools. Without data from additional domains or a cross-tool comparison, the observed risk-management strategies (ongoing oversight and refinement) may be domain-specific rather than generalizable.
  Authors: We acknowledge that confining the probe to email limits direct evidence for generalizability across productivity systems. The email domain was chosen because it provides a rich, authentic workflow for observing customization and risk management in situ. We have revised the discussion to explicitly qualify that the specific oversight strategies observed may be influenced by email characteristics and to recommend cross-domain studies. At the same time, the core mechanisms (adapting existing patterns through conversation and managing uncertainty via iterative refinement) are positioned as transferable insights for malleable AI interfaces, supported by connections to prior end-user programming literature. We do not claim domain-independence but maintain that the probe yields actionable HCI implications.
  Revision: partial
Circularity Check
No circularity: empirical qualitative design probe with no derivations or self-referential fitting
Full rationale
The paper is a design-probe study that deploys a conversational email system with participants for several days and reports observed customization patterns, risks, and engagement shifts. No equations, parameters, predictions, or derivations appear in the abstract or described claims. Central findings (customizations grounded in existing patterns; inbox as flexible data layer; risk management via oversight) are presented as direct outcomes of the user study rather than reductions to fitted inputs, self-definitions, or self-citation chains. The work is self-contained against external benchmarks as an exploratory qualitative instrument; no load-bearing step reduces by construction to its own inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Natural language interaction enables users to meaningfully modify system behavior in productivity tools.
Figure 6 (UIST 2026, "A User Interaction Model"): The conversational customization loop. Users identify friction in their daily workflow, propose a change through the natural language meta-interface, and the system generates a standalone feature bundle. Afte...