pith. machine review for the scientific record.

arxiv: 2604.05203 · v1 · submitted 2026-04-06 · 💻 cs.HC

Recognition: no theorem link

Decision-Oriented Programming with Aporia

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:47 UTC · model grok-4.3

classification 💻 cs.HC
keywords decision-oriented programming · AI coding agents · design decisions · mental models · user study · test suites · human-AI collaboration · programming tools

The pith

Explicit decision tracking in AI-assisted programming keeps programmers engaged and helps them form more accurate mental models of their code.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes decision-oriented programming as a paradigm where key design decisions serve as the shared, explicit medium between programmers and AI agents. Decisions are structured, interactively elicited by the agent, and each one is directly traceable to the generated code. Aporia implements this vision with a persistent Decision Bank for storing and editing decisions, proactive design questions to elicit them, and conversion of each decision into an executable test suite for validation. In a study of 14 programmers, the approach increased engagement with design, supported exploration and validation steps, and produced mental models five times less likely to disagree with the actual code than those formed using a standard coding agent.

Core claim

Decision-oriented programming treats decisions as the primary interface between the programmer and the AI agent. Decisions are made explicit and structured, co-authored through interactive elicitation, and each one is traceable back to the resulting code. Aporia realizes this by maintaining a Decision Bank of persistent, editable decisions; proactively asking programmers design questions; and encoding decisions as executable test suites that validate the implementation. In a user study, this approach increased engagement in the design process, supported exploration and validation, and led to significantly more accurate programmer understanding of the code.

What carries the argument

The Decision Bank, which stores persistent, editable decisions that the agent elicits through questions and encodes as executable test suites linking intent directly to implementation.
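
To make this concrete, here is a minimal TypeScript sketch (the paper's artifact is a VS Code extension, hence the language) of what a persistent, editable decision store with a test-suite encoding could look like. The paper does not publish Aporia's internals, so every name here (Decision, TestCase, DecisionBank, newRecordId) is a hypothetical illustration of the described design, not Aporia's actual API.

```typescript
// Hypothetical sketch of a Decision Bank: persistent, editable decisions,
// each traceable to code and encoded as executable checks.

interface TestCase {
  name: string;
  run: () => Promise<boolean>; // true if the implementation honors the decision
}

interface Decision {
  id: string;
  question: string;   // the design question the agent asked
  answer: string;     // the option the programmer chose
  rationale?: string; // optional free-text justification
  codeRefs: string[]; // files/symbols the decision traces to
  tests: TestCase[];  // executable encoding of the decision
}

class DecisionBank {
  private decisions = new Map<string, Decision>();

  record(decision: Decision): void {
    this.decisions.set(decision.id, decision); // persistent store
  }

  edit(id: string, update: Partial<Decision>): void {
    const existing = this.decisions.get(id); // decisions stay editable
    if (existing) this.decisions.set(id, { ...existing, ...update });
  }

  // Validate the current implementation against every recorded decision.
  async validate(): Promise<Map<string, boolean>> {
    const results = new Map<string, boolean>();
    for (const d of this.decisions.values()) {
      const outcomes = await Promise.all(d.tests.map((t) => t.run()));
      results.set(d.id, outcomes.every(Boolean));
    }
    return results;
  }
}

// Usage: one elicited decision with a self-contained check.
const newRecordId = (): string => crypto.randomUUID(); // stand-in for generated code

const bank = new DecisionBank();
bank.record({
  id: "ids-uuid",
  question: "Should record IDs be sequential integers or UUIDs?",
  answer: "UUIDs",
  codeRefs: ["src/records.ts#newRecordId"],
  tests: [
    {
      name: "generated IDs are not plain integers",
      run: async () => Number.isNaN(Number(newRecordId())),
    },
  ],
});
```

The property the sketch tries to capture is traceability: each decision carries both references into the code and executable checks, so editing a decision in the bank changes what the implementation is validated against.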

Load-bearing premise

The observed gains in engagement and mental model accuracy from the 14-participant study will generalize and stem specifically from explicit decision tracking rather than other differences in the interface or agent behavior.

What would settle it

A controlled follow-up study with a larger group of programmers in which a baseline agent asks similar design questions but omits the structured Decision Bank and test encoding, measuring whether the differences in mental model accuracy and engagement disappear.
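
As a concrete rendering of that ablation, a minimal TypeScript sketch of the two conditions follows; all field names are hypothetical, and the point is only that the arms differ in nothing but the structured decision tracking.

```typescript
// Hypothetical ablation design: two agent configurations matched on every
// dimension except the decision-tracking features under test.

interface AgentConfig {
  model: string;                // same underlying LLM in both arms
  asksDesignQuestions: boolean; // both arms proactively ask design questions
  decisionBank: boolean;        // persistent, editable decision store
  testEncoding: boolean;        // decisions compiled to executable test suites
}

const treatment: AgentConfig = {
  model: "same-llm-in-both-arms",
  asksDesignQuestions: true,
  decisionBank: true,
  testEncoding: true,
};

// The baseline keeps the questions but drops the structured tracking, so any
// remaining gap in mental-model accuracy or engagement is attributable to the
// Decision Bank and test encoding rather than to question-asking itself.
const baseline: AgentConfig = {
  ...treatment,
  decisionBank: false,
  testEncoding: false,
};
```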

Figures

Figures reproduced from arXiv: 2604.05203 by Benjamin C. Pierce, Harrison Goldstein, Hila Peleg, Nadia Polikarpova, Raven Rothkopf, Saketh Ram Kasibatla, Sorin Lerner.

Figure 1. We describe Decision-Oriented Programming, a paradigm for supporting human decision-making in AI-assisted …
Figure 2. An example paper details page in NotCRP.
Figure 3. Aporia is a VS Code extension that supports decision-oriented programming. Users enter their high-level goal into the Goal Field (1), which prompts Aporia to generate design questions as Question Bubbles (2). Clicking on a question opens the Question Detail View (3), which includes Aporia's argument for the question, as well as references to relevant code. Users can answer the question with a yes/no response …
Figure 4. Sample correctness analysis of post-task survey responses from P12 (Task A, left) and P3 (Task B, right). Free …
Figure 5. Distribution of participants' post-study survey Likert …
Figure 6. Participants' self-reported decision-making strategies.
Figure 7. Participants' self-reported validation strategies.
read the original abstract

AI agents allow developers to express computational intent abstractly, reducing cognitive effort and helping achieve flow during programming. Increased abstraction, however, comes at a cost: developers cede decision-making authority to agents, often without realizing that important design decisions are being made without them. We aim to bring these decisions to the foreground in a paradigm we dub decision-oriented programming. In DOP, (1) decisions are explicit and structured, serving as the shared medium between the programmer and the agent; (2) decisions are co-authored interactively, with the agent proactively eliciting them from the programmer; and (3) each decision is traceable to code. As a step towards this vision, we have built Aporia, a design probe that tracks decisions in a persistent, editable Decision Bank; elicits them by asking programmers design questions; and encodes each decision as an executable test suite that can be used to validate the implementation. In a user study of 14 programmers, Aporia increased engagement in the design process and scaffolded both exploration and validation. Participants also gained a more accurate understanding of their implementations, with their mental models 5x less likely to disagree with the code than a baseline coding agent.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces decision-oriented programming (DOP) as a paradigm in which design decisions are explicit and structured as the shared medium between programmer and AI agent, co-authored interactively, and traceable to code. It presents Aporia, a design probe implementing DOP via a persistent editable Decision Bank, proactive elicitation through design questions, and encoding of each decision as an executable test suite for validation. A user study with 14 programmers reports that Aporia increases engagement in the design process, scaffolds exploration and validation, and produces mental models 5x less likely to disagree with the code than a baseline coding agent.

Significance. If the user-study results hold after methodological clarification, the work could meaningfully advance human-AI collaboration in programming by foregrounding decision authority rather than abstracting it away. The design-probe approach and quantitative outcome on mental-model accuracy provide a concrete, falsifiable starting point for tools that aim to preserve developer understanding and agency.

major comments (2)
  1. [Abstract and User Study] The central empirical claim of a '5x' reduction in mental-model disagreement with the code is load-bearing for the paper's contribution, yet the manuscript supplies no description of how mental models were elicited or measured, how disagreement was operationalized or scored, the exact baseline agent implementation, participant selection or demographics, task details, session length, or any statistical analysis supporting the 5x factor. These omissions prevent evaluation of the result's validity.
  2. [User Study] The comparison to the baseline coding agent must demonstrate that the baseline was matched on all interface and capability dimensions except explicit decision tracking; without this, it is impossible to isolate the effect of the Decision Bank and elicitation features from other differences that could drive the observed engagement and accuracy gains.
minor comments (2)
  1. [Introduction] The three DOP principles are listed in the abstract but would benefit from a concise table or enumerated list in the introduction to make their distinctions from related ideas (e.g., test-driven development) immediately clear.
  2. [Abstract] The phrase '5x less likely' should be accompanied by the raw rates or probabilities (e.g., baseline disagreement rate vs. Aporia rate) so readers can assess the practical magnitude.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review. The comments correctly identify gaps in methodological reporting that limit evaluability of the user-study claims. We have revised the manuscript to supply the requested details and clarifications. Our point-by-point responses follow.

read point-by-point responses
  1. Referee: [Abstract and User Study] The central empirical claim of a '5x' reduction in mental-model disagreement with the code is load-bearing for the paper's contribution, yet the manuscript supplies no description of how mental models were elicited or measured, how disagreement was operationalized or scored, the exact baseline agent implementation, participant selection or demographics, task details, session length, or any statistical analysis supporting the 5x factor. These omissions prevent evaluation of the result's validity.

    Authors: We agree that the original manuscript omitted critical methodological details required to assess the 5x claim. This was a reporting oversight. The revised User Study section now provides: (1) the exact protocol for eliciting mental models (post-task structured interviews plus code-behavior prediction tasks), (2) the operationalization and scoring of disagreement (binary mismatch between participant prediction and observed execution trace, aggregated across tasks; see the sketch after these responses), (3) the baseline agent specification (identical underlying LLM and chat interface, differing only in the absence of the Decision Bank and proactive elicitation), (4) participant recruitment, demographics, and screening criteria, (5) task descriptions and session durations, and (6) the descriptive frequency analysis underlying the 5x ratio together with its limitations. These additions enable readers to evaluate internal validity. revision: yes

  2. Referee: [User Study] The comparison to the baseline coding agent must demonstrate that the baseline was matched on all interface and capability dimensions except explicit decision tracking; without this, it is impossible to isolate the effect of the Decision Bank and elicitation features from other differences that could drive the observed engagement and accuracy gains.

    Authors: We concur that isolating the contribution of explicit decision tracking requires the baseline to be matched on all other interface and capability dimensions. The revised manuscript now includes a dedicated subsection describing the baseline implementation and the controls used to equate it with Aporia on chat latency, code-generation quality, UI layout, and available commands. We also added a limitations paragraph discussing residual confounds and the rationale for the chosen between-subjects design. These changes make the attribution argument explicit and falsifiable. revision: yes
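
To make the scoring described in response 1 concrete, here is a minimal TypeScript sketch of the kind of descriptive frequency analysis that could yield a '5x' figure: binary prediction-vs-trace mismatches aggregated per condition. The record shape and the counts are invented for illustration; they are not the paper's data.

```typescript
// Hypothetical scoring sketch: disagreement rate per condition and the ratio.

interface PredictionProbe {
  condition: "aporia" | "baseline";
  mismatch: boolean; // participant prediction disagreed with the execution trace
}

function disagreementRate(
  probes: PredictionProbe[],
  condition: PredictionProbe["condition"],
): number {
  const subset = probes.filter((p) => p.condition === condition);
  const mismatches = subset.filter((p) => p.mismatch).length;
  return mismatches / subset.length;
}

// Made-up counts: 3/60 mismatched probes with Aporia, 15/60 with the baseline.
const probes: PredictionProbe[] = [
  ...Array.from({ length: 60 }, (_, i) => ({
    condition: "aporia" as const,
    mismatch: i < 3,
  })),
  ...Array.from({ length: 60 }, (_, i) => ({
    condition: "baseline" as const,
    mismatch: i < 15,
  })),
];

const aporiaRate = disagreementRate(probes, "aporia");     // 0.05
const baselineRate = disagreementRate(probes, "baseline"); // 0.25
console.log(`ratio: ${(baselineRate / aporiaRate).toFixed(1)}x`); // "ratio: 5.0x"
```

Reporting the raw rates alongside the ratio, as the referee's second minor comment asks, is what makes the practical magnitude of such a figure visible.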

Circularity Check

0 steps flagged

No circularity: empirical user study with no derivation chain

full rationale

This paper presents an empirical design-probe study of the Aporia system rather than any mathematical derivation, first-principles result, or fitted model. The central claims rest on a 14-participant user study measuring engagement, scaffolding, and mental-model accuracy; these outcomes are reported directly from participant data and do not reduce by construction to prior definitions, self-citations, or fitted parameters. No equations, ansatzes, uniqueness theorems, or self-referential predictions appear in the provided abstract or study description. The work is therefore self-contained against external benchmarks (the study itself) with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claim rests on domain assumptions about the value of explicit decisions in human-AI programming collaboration and on the effectiveness of encoding decisions as test suites; no free parameters are introduced, and the two invented entities (the Decision Bank and the Aporia probe) carry no independent evidence beyond the system itself.

axioms (2)
  • domain assumption Making design decisions explicit and interactive improves programmer engagement and mental model accuracy
    Core premise of the DOP paradigm stated in the abstract
  • ad hoc to paper Encoding each decision as an executable test suite provides effective validation
    Described as one of the three core properties of Aporia
invented entities (2)
  • Decision Bank no independent evidence
    purpose: Persistent, editable store for decisions that serves as shared medium between programmer and agent
    New construct introduced to track and surface decisions
  • Aporia design probe no independent evidence
    purpose: Implementation that elicits decisions via questions and encodes them as tests
    The concrete artifact built to explore the DOP vision

pith-pipeline@v0.9.0 · 5525 in / 1550 out tokens · 56234 ms · 2026-05-10T18:47:42.842125+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

59 extracted references · 30 canonical work pages · 1 internal anchor

  1. [1] [n. d.]. Agent Context Protocol. https://zed.dev/acp
  2. [2] [n. d.]. Model Context Protocol. https://modelcontextprotocol.io/
  3. [3] [n. d.]. Visual Studio Code. https://code.visualstudio.com
  4. [4] Anthropic. 2026. Claude Code CLI. https://code.claude.com/docs/en/cli-reference
  5. [5] Anthropic. 2026. Introducing Claude Sonnet 4.6. https://www.anthropic.com/news/claude-sonnet-4-6
  6. [6] Shraddha Barke, Michael B. James, and Nadia Polikarpova. 2023. Grounded Copilot: How Programmers Interact with Code-Generating Models. Proceedings of the ACM on Programming Languages 7, OOPSLA1 (April 2023), 85–111. doi:10.1145/3586030
  7. [7] Kent Beck. 2002. Test Driven Development: By Example. Addison-Wesley Longman, Amsterdam.
  8. [8] Kent Beck. 2015. Test-Driven Development: By Example (20. printing ed.). Addison-Wesley, Boston.
  9. [9] Markus Borg, Dave Hewett, Nadim Hagatulah, Noric Couderc, Emma Söderberg, Donald Graham, Uttam Kini, and Dave Farley. 2026. Echoes of AI: Investigating the Downstream Effects of AI Assistants on Software Maintainability. arXiv:2507.00788 [cs.SE]. https://arxiv.org/abs/2507.00788
  10. [10] Dibyendu Brinto Bose. 2025. From Prompts to Properties: Rethinking LLM Code Generation with Property-Based Testing. In Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering. 1660–1665.
  11. [11] Virginia Braun and Victoria Clarke. 2006. Using Thematic Analysis in Psychology. Qualitative Research in Psychology 3, 2 (2006), 77–101.
  12. [12] Ruijia Cheng, Titus Barik, Alan Leung, Fred Hohman, and Jeffrey Nichols. 2024. BISCUIT: Scaffolding LLM-Generated Code with Ephemeral UIs in Computational Notebooks. In 2024 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC). 13–23. doi:10.1109/VL/HCC60511.2024.00012
  13. [13] Koen Claessen and John Hughes. 2000. QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs. In Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming (ICFP '00), Montreal, Canada, September 18–21, 2000, Martin Odersky and Philip Wadler (Eds.). ACM, 268–279. doi:10.1145/351240.351266
  14. [14] Coder. 2026. code-server: VS Code in the Browser. https://github.com/coder/code-server
  15. [15] Cursor. 2023. Cursor. https://cursor.com
  16. [16] Sarah Fakhoury, Aaditya Naik, Georgios Sakkas, Saikat Chakraborty, and Shuvendu K. Lahiri. 2024. LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation. IEEE Transactions on Software Engineering 50, 9 (2024), 2254–2268. doi:10.1109/TSE.2024.3428972
  17. [17] Muhammad Shoaib Farooq, Uzma Omer, Amna Ramzan, Mansoor Ahmad Rasheed, and Zabihullah Atal. 2023. Behavior Driven Development: A Systematic Literature Review. IEEE Access 11 (2023), 88008–88024. doi:10.1109/ACCESS.2023.3302356
  18. [18] Ahmed Fawzy, Amjed Tahir, and Kelly Blincoe. 2025. Vibe Coding in Practice: Motivations, Challenges, and a Future Outlook – a Grey Literature Review. arXiv:2510.00328 [cs.SE]. https://arxiv.org/abs/2510.00328
  19. [19] K. J. Kevin Feng, Kevin Pu, Matt Latzke, Tal August, Pao Siangliulue, Jonathan Bragg, Daniel S. Weld, Amy X. Zhang, and Joseph Chee Chang. 2026. Cocoa: Co-Planning and Co-Execution with AI Agents. arXiv:2412.10999 [cs.HC]. https://arxiv.org/abs/2412.10999
  20. [20] Katy Ilonka Gero, Chelse Swoopes, Ziwei Gu, Jonathan K. Kummerfeld, and Elena L. Glassman. 2024. Supporting Sensemaking of Large Language Model Outputs at Scale. In Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI 2024), Honolulu, HI, USA, May 11–16, 2024, Florian 'Floyd' Mueller, Penny Kyburz, Julie R. Williamson, Corina Sas, Ma…
  21. [21] Emmanuel Anaya González, Raven Rothkopf, Sorin Lerner, and Nadia Polikarpova. 2025. HiLDE: Intentional Code Generation via Human-in-the-Loop Decoding. In 2025 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC). 222–233. doi:10.1109/VL-HCC65237.2025.00032
  22. [22] Kevin Han, Siddharth Maddikayala, Tim Knappe, Om Patel, Austen Liao, and Amir Barati Farimani. 2026. TDFlow: Agentic Workflows for Test Driven Development. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), Vera Demberg, Kentaro Inui, and Lluís Marquez (Eds.). Association …
  23. [23] Sandra G. Hart and Lowell E. Staveland. 1988. Development of NASA-TLX (Task Load Index): Results of Empirical and Theoretical Research. In Advances in Psychology, Vol. 52. Elsevier, 139–183.
  24. [24] Grace Murray Hopper. 1952. The Education of a Computer. In Proceedings of the 1952 ACM National Meeting (Pittsburgh) (ACM '52). Association for Computing Machinery, New York, NY, USA, 243–249. doi:10.1145/609784.609818
  25. [25] Grace Murray Hopper. 1969. Standardization of High-Level Languages. In Proceedings of the May 14–16, 1969, Spring Joint Computer Conference (Boston, Massachusetts) (AFIPS '69 (Spring)). Association for Computing Machinery, New York, NY, USA, 608–609. doi:10.1145/1476793.1476890
  26. [26] Yiran Hu, Nan Jiang, Shanchao Liang, Yi Wu, and Lin Tan. 2025. TENET: Leveraging Tests Beyond Validation for Code Generation. arXiv:2509.24148 [cs.SE]. https://arxiv.org/abs/2509.24148
  27. [27] Ruanqianqian Huang, Avery Reyna, Sorin Lerner, Haijun Xia, and Brian Hempel. …
  28. [28] Professional Software Developers Don't Vibe, They Control: AI Agent Use for Coding in 2025. arXiv:2512.14012 [cs.SE]. https://arxiv.org/abs/2512.14012
  29. [29] GitHub Inc. 2022. GitHub Copilot. https://github.com/features/copilot
  30. [30] Stack Exchange Inc. [n. d.]. 2025 Stack Overflow Developer Survey. https://survey.stackoverflow.co/2025
  31. [31] Windsurf Inc. 2024. Windsurf. https://windsurf.com
  32. [32] Eshin Jolly. 2018. Pymer4: Connecting R and Python for Linear Mixed Modeling. Journal of Open Source Software 3, 31 (2018), 862.
  33. [33] Majeed Kazemitabaar, Jack Williams, Ian Drosos, Tovi Grossman, Austin Z. Henley, Carina Negreanu, and Advait Sarkar. 2024. Improving Steering and Verification in AI-Assisted Data Analysis with Interactive Task Decomposition. In Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology (UIST '24). ACM. doi:10.1145/3654777.3…
  34. [34] Maurice George Kendall and Jean Dickinson Gibbons. 1962. Rank Correlation Methods. (1962).
  35. [35] Werner Kunz and Horst W. J. Rittel. 1970. Issues as Elements of Information Systems. Technical Report 131. Institute of Urban and Regional Development, University of California, Berkeley, California.
  36. [36] Thomas D. LaToza, Maryam Arab, Dastyni Loksa, and Amy J. Ko. 2020. Explicit Programming Strategies. Empirical Software Engineering 25, 4 (2020), 2416–2449. doi:10.1007/S10664-020-09810-1
  37. [37] Yunhao Liang, Ruixuan Ying, Shiwen Ni, and Zhe Cui. 2026. Scaling Test-Driven Code Generation from Functions to Classes: An Empirical Study. arXiv:2602.03557 [cs.SE]. https://arxiv.org/abs/2602.03557
  38. [38] Wendy E. Mackay and Joanna McGrenere. 2025. Comparative Structured Observation. ACM Transactions on Computer-Human Interaction 32, 2, Article 14 (April 2025), 27 pages. doi:10.1145/3711838
  39. [39] Allan MacLean, Richard M. Young, Victoria Bellotti, and Thomas P. Moran. 1991. Questions, Options, and Criteria: Elements of Design Space Analysis. Human-Computer Interaction 6, 3–4 (1991), 201–250. doi:10.1080/07370024.1991.9667168
  40. [40] Richard McElreath. 2018. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Chapman and Hall/CRC.
  41. [41] Dirk Merkel. 2014. Docker: Lightweight Linux Containers for Consistent Development and Deployment. Linux Journal 2014, 239 (2014), 2.
  42. [42] Microsoft Corporation. 2026. Visual Studio Code, Version 1.109. https://visualstudio.com
  43. [43] Thomas P. Moran and John M. Carroll. 1996. Design Rationale: Concepts, Techniques, and Use. CRC Press.
  44. [44] Jakob Nielsen. 1994. Heuristic Evaluation. John Wiley & Sons, Inc., USA, 25–62.
  45. [45] Anthropic PBC. 2025. Claude Code. https://claude.com/product/claude-code
  46. [46] Veronica Pimenova, Sarah Fakhoury, Christian Bird, Margaret-Anne Storey, and Madeline Endres. 2025. Good Vibrations? A Qualitative Study of Co-Creation, Communication, Flow, and Trust in Vibe Coding. arXiv:2509.12491 [cs.SE]. https://arxiv.org/abs/2509.12491
  47. [47] Kevin Pu, Daniel Lazaro, Ian Arawjo, Haijun Xia, Ziang Xiao, Tovi Grossman, and Yan Chen. 2025. Assistance or Disruption? Exploring and Evaluating the Design and Trade-offs of Proactive AI Programming Support. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–21.
  48. [48] Advait Sarkar and Ian Drosos. 2025. Vibe Coding: Programming Through Conversation with Artificial Intelligence. arXiv:2506.23253 [cs.HC]. https://arxiv.org/abs/2506.23253
  49. [49] Margaret-Anne Storey. 2026. From Technical Debt to Cognitive and Intent Debt: Rethinking Software Health in the Age of AI. arXiv:2603.22106 [cs.SE]. https://arxiv.org/abs/2603.22106
  50. [50] Sangho Suh, Meng Chen, Bryan Min, Toby Jia-Jun Li, and Haijun Xia. 2024. Luminate: Structured Generation and Exploration of Design Space with Large Language Models for Human-AI Co-Creation. In Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI 2024), Honolulu, HI, USA, May 11–16, 2024, Florian 'Floyd' Mueller, Penny Kyburz, Julie …
  51. [51] Mojtaba Vaismoradi, Hannele Turunen, and Terese Bondas. 2013. Content Analysis and Thematic Analysis: Implications for Conducting a Qualitative Descriptive Study. Nursing & Health Sciences 15, 3 (2013), 398–405.
  52. [52] Priyan Vaithilingam, Elena L. Glassman, Jeevana Priya Inala, and Chenglong Wang. 2024. DynaVis: Dynamically Synthesized UI Widgets for Visualization Editing. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI '24). Association for Computing Machinery, New York, NY, USA, Article 985, 17 pages. doi:10.1145/…
  53. [53] Priyan Vaithilingam, Munyeong Kim, Frida-Cecilia Acosta-Parenteau, Daniel Lee, Amine Mhedhbi, Elena L. Glassman, and Ian Arawjo. 2025. Semantic Commit: Helping Users Update Intent Specifications for AI Memory at Scale. In Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology (UIST 2025), Busan, Korea, 28 September 2025 – 1 O…
  54. [54] Priyan Vaithilingam, Tianyi Zhang, and Elena L. Glassman. 2022. Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models. In CHI Conference on Human Factors in Computing Systems Extended Abstracts. ACM, New Orleans, LA, USA, 1–7. doi:10.1145/3491101.3519665
  55. [55] Vasudev Vikram, Caroline Lemieux, Joshua Sunshine, and Rohan Padhye. 2023. Can Large Language Models Write Good Property-Based Tests? arXiv preprint arXiv:2307.04346 (2023).
  56. [56] Frank Wilcoxon. 1992. Individual Comparisons by Ranking Methods. Springer New York, New York, NY, 196–202. doi:10.1007/978-1-4612-4380-9_16
  57. [57] Ryan Yen, Jiawen Stefanie Zhu, Sangho Suh, Haijun Xia, and Jian Zhao. 2024. CoLadder: Manipulating Code Generation via Multi-Level Blocks. In Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology (UIST 2024), Pittsburgh, PA, USA, October 13–16, 2024, Lining Yao, Mayank Goel, Alexandra Ion, and Pedro Lopes (Eds.). ACM, 11:1–1…
  58. [58] J. D. Zamfirescu-Pereira, Eunice Jun, Michael Terry, Qian Yang, and Bjoern Hartmann. 2025. Beyond Code Generation: LLM-Supported Exploration of the Program Design Space. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI 2025), Yokohama, Japan, 26 April – 1 May 2025, Naomi Yamashita, Vanessa Evers, Koji Yatani, Sharon Xia…
  59. [59] Wenshuo Zhang, Leixian Shen, Shuchang Xu, Jindu Wang, Jian Zhao, Huamin Qu, and Linping Yuan. 2025. NeuroSync: Intent-Aware Code-Based Problem Solving via Direct LLM Understanding Modification. In Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology (UIST 2025), Busan, Korea, 28 September 2025 – 1 October 2025, Andrea Bianc…