Pith · machine review for the scientific record

arXiv: 2604.10587 · v1 · submitted 2026-04-12 · 💻 cs.HC

CogInstrument: Modeling Cognitive Processes for Bidirectional Human-LLM Alignment in Planning Tasks

Pith reviewed 2026-05-10 15:52 UTC · model grok-4.3

classification 💻 cs.HC
keywords cognitive motifs · human-LLM alignment · planning tasks · bidirectional collaboration · graphical reasoning · causal dependencies · user agency · editable interfaces

The pith

CogInstrument turns implicit human reasoning into editable graphical motifs with causal links to improve alignment with LLMs during planning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that standard LLM chat interfaces hide the causal structure of human planning, so users cannot easily check or fix the logic behind outputs. CogInstrument extracts cognitive motifs from natural language, displays them as graphs of linked concepts, and lets users edit those graphs to negotiate changes with the model. A within-subjects study with twelve participants found that this approach supports more precise revisions and greater reuse than text-only dialogue. The system is presented as a way to give both sides a shared, inspectable model of the reasoning process. If the motifs accurately capture what matters, the result is higher user agency and trust in the collaboration.

Core claim

CogInstrument models user reasoning as cognitive motifs, which are compositional units of concepts joined by causal dependencies. These motifs are pulled from dialogue, shown as editable graphical structures, and used as the medium for iterative inspection and reconciliation between the human and the LLM. The paper states that this externalization converts opaque planning conversations into verifiable, revisable representations that both parties can negotiate directly.
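To fix intuition for the representation, here is a minimal sketch of a cognitive motif as a set of concept nodes joined by typed causal edges. The triple (C_μ, E_μ, φ_μ) and the three dependency types (enable, constraint, determine) come from the paper's Figure 7; the class name, field names, and the toy trip-planning example are editorial assumptions, not the paper's implementation.

```python
from dataclasses import dataclass

# The paper's Figure 7 names three dependency types among concepts.
DEPENDENCY_TYPES = {"enable", "constraint", "determine"}

@dataclass
class Motif:
    """Hypothetical encoding of a motif mu = (C_mu, E_mu, phi_mu)."""
    concepts: set[str]                 # C_mu: concept nodes in the motif
    edges: list[tuple[str, str, str]]  # E_mu: (source, dep_type, target)
    label: str = ""                    # phi_mu: abstract reasoning function

    def __post_init__(self):
        # Reject edges with unknown dependency types or dangling endpoints.
        for src, dep, dst in self.edges:
            if dep not in DEPENDENCY_TYPES:
                raise ValueError(f"unknown dependency type: {dep}")
            if src not in self.concepts or dst not in self.concepts:
                raise ValueError("edge endpoints must be concept nodes")

# A toy planning motif: budget constrains venue, venue determines date.
trip = Motif(
    concepts={"budget", "venue", "date"},
    edges=[("budget", "constraint", "venue"),
           ("venue", "determine", "date")],
    label="constrains",
)
print(len(trip.concepts), len(trip.edges))  # 3 2
```

The validation hook is one place where "revisable" becomes concrete: any edit that breaks the motif's internal consistency is rejected at construction time rather than silently passed to the model.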

What carries the argument

Cognitive motifs: revisable units of concepts connected by explicit causal dependencies that are extracted from natural language and rendered as editable graphs for bidirectional negotiation.

If this is right

  • Users can revise specific causal assumptions instead of restarting entire dialogues when an LLM output is misaligned.
  • Saved motifs become reusable templates that transfer across related planning problems.
  • The LLM receives explicit structural constraints rather than only surface-level text instructions.
  • Verification steps become possible at each causal link, raising the chance that flawed premises are caught before final plans are accepted.
  • The collaboration gains a persistent, inspectable record of the reasoning that both sides can reference later.
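The first and third bullets can be made concrete with a hedged sketch: a targeted revision replaces a single causal edge rather than rewording the whole dialogue, and the updated graph is re-serialized as explicit structural constraints for the model. The function names and the constraint format are illustrative assumptions, not the paper's API.

```python
# Hypothetical sketch: a motif graph as (source, dep_type, target) triples.

def revise_edge(edges, old, new):
    """Swap one causal edge instead of restarting the dialogue."""
    return [new if e == old else e for e in edges]

def serialize_constraints(edges):
    """Render edges as explicit constraint lines for a model prompt."""
    return "\n".join(f"- {s} [{dep}] {t}" for s, dep, t in edges)

edges = [("budget", "constraint", "venue"),
         ("venue", "determine", "date")]

# The user decides the date drives the venue, not the reverse: flip one edge.
edges = revise_edge(edges,
                    old=("venue", "determine", "date"),
                    new=("date", "determine", "venue"))

print(serialize_constraints(edges))
```

Feeding the serialized edges back to the LLM, rather than free text, is what gives the model "explicit structural constraints rather than only surface-level text instructions."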

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the motif representation proves stable across users, it could become a common intermediate layer for other planning tools that need to share causal structure with people.
  • The same graphical editing approach might extend to non-LLM systems where transparent reasoning chains are required, such as decision-support software.
  • Automated detection of motif conflicts could be added later to flag when user edits contradict earlier assumptions.
  • Longer-term use might reveal whether repeated motif editing leads to users developing more explicit mental models of their own planning habits.
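The conflict-flagging idea in the third bullet could start from something as simple as a cycle check over the causal graph: if a user edit makes two concepts each determine the other, the loop is worth surfacing. This is an editorial sketch under that assumption (the paper does not describe such a detector); it is a plain depth-first search over the directed edge set.

```python
# Hypothetical conflict check: flag directed cycles introduced by user
# edits to a motif graph, since mutual causal dependence usually signals
# contradictory assumptions.

def find_cycle(edges):
    """Return a list of concepts forming a directed cycle, or None."""
    graph = {}
    for src, _dep, dst in edges:
        graph.setdefault(src, []).append(dst)

    WHITE, GRAY, BLACK = 0, 1, 2
    color, stack = {}, []

    def dfs(node):
        color[node] = GRAY
        stack.append(node)
        for nxt in graph.get(node, []):
            if color.get(nxt, WHITE) == GRAY:   # back edge: cycle found
                return stack[stack.index(nxt):] + [nxt]
            if color.get(nxt, WHITE) == WHITE:
                found = dfs(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color.get(node, WHITE) == WHITE:
            found = dfs(node)
            if found:
                return found
    return None

# An edit that makes budget and venue determine each other gets flagged.
edges = [("budget", "determine", "venue"),
         ("venue", "determine", "budget")]
print(find_cycle(edges))  # ['budget', 'venue', 'budget']
```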

Load-bearing premise

Human planning reasoning can be decomposed into discrete, revisable cognitive motifs whose causal links are reliably extractable from natural language and usefully edited in graphical form.

What would settle it

A replication study in which participants using the motif graphs show no measurable gain in detecting or correcting logical errors in LLM plans compared with a matched text-only interface would undermine the claim that the graphical externalization improves alignment.

Figures

Figures reproduced from arXiv: 2604.10587 by Anqi Wang, Dongyijie Pan, Pan Hui, Xin Tong.

Figure 1: From intent-centric prompting to cognition-centric interaction. We introduce a pipeline that extracts and structures …
Figure 2: Representative clarification probes for different …
Figure 3: A motif is a reusable reasoning pattern with con…
Figure 4: CogInstrument interface. Panels (A–E) provide synchronized views of the underlying reasoning state, ranging from high-level dialogue planning (A) and structural reasoning mapping (B–D) to direct intervention and patch management (E).
Figure 5: Three interaction modes: user-driven revision, …
Figure 6: Paired participant trajectories and condition means.
Figure 7: CogInstrument consists of three dependency types among concepts: enable, constraint, and determine. A motif is a reusable cognitive dependency pattern including at least two concepts. Appendix A.4 gives the formal definition: a cognitive motif is μ = (C_μ, E_μ, φ_μ), where C_μ ⊆ C are the concept nodes in the motif, E_μ the causal edges within it, and φ_μ an abstract reasoning function (e.g., "constrai…").
Figure 8: System framework of CogInstrument.
Figure 9: Full system-log timelines for Participants P1–P4. These traces mainly illustrate lighter structural uptake and several …
Figure 10: Full system-log timelines for Participants P5–P8. Compared with Figure …
Figure 11: Full system-log timelines for Participants P9–P12. These traces highlight the broadest range of strategies in the …
Original abstract

Although Large Language Models (LLMs) demonstrate proficiency in knowledge-intensive tasks, current interfaces frequently precipitate cognitive misalignment by failing to externalize users' underlying reasoning structures. Existing tools typically represent intent as "flat lists," thereby disregarding the causal dependencies and revisable assumptions inherent in human decision-making. We introduce CogInstrument, a system that represents user reasoning through cognitive motifs: compositional, revisable units comprising concepts linked by causal dependencies. CogInstrument extracts these motifs from natural language interactions and renders them as editable graphical structures to facilitate bidirectional alignment. This structural externalization enables both the user and the LLM to inspect, negotiate, and reconcile reasoning processes iteratively. A within-subjects study (N=12) demonstrates that CogInstrument explicitly surfaces implicit reasoning structures, facilitating more targeted revision and reusability over conventional LLM-based dialogue interfaces. By enabling users to verify the logical grounding of LLM outputs, CogInstrument significantly enhances user agency, trust, and structural control over the collaboration. This work formalizes cognitive motifs as a fundamental unit for human-LLM alignment, providing a novel framework for achieving structured, reasoning-based human-AI collaboration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper introduces CogInstrument, a system that extracts 'cognitive motifs'—compositional units of concepts linked by causal dependencies—from natural language interactions in planning tasks and renders them as editable graphical structures. This externalization is intended to enable bidirectional alignment by allowing users and LLMs to inspect, revise, and reconcile reasoning processes. A within-subjects study with N=12 participants is presented as demonstrating that the system supports more targeted revision and reusability than conventional flat LLM dialogue interfaces, thereby increasing user agency, trust, and structural control.

Significance. If the empirical results are robust, the work could contribute a structured alternative to current LLM interfaces by formalizing cognitive motifs as a unit for human-AI alignment. The graphical rendering approach has potential to improve inspectability of reasoning chains in collaborative planning. The conceptual framing is novel within HCI, though its impact hinges on stronger validation of the motif extraction and measurable benefits.

major comments (2)
  1. [§5 (User Study)] The within-subjects evaluation with N=12 reports qualitative benefits in revision, reusability, agency, and trust but provides no task descriptions, dependent variables, quantitative metrics, statistical tests, effect sizes, or controls for order effects and interface novelty. This directly undermines the abstract's claim that CogInstrument 'significantly enhances' these outcomes, as differences could stem from confounds rather than the motif structure itself.
  2. [§3.2 (Cognitive Motif Extraction)] The procedure for identifying causal dependencies and revisable assumptions from natural language is described at a high level without validation against human annotations, inter-rater reliability, or ablation showing that the graphical representation (vs. text alone) drives the reported gains. This is load-bearing for the central claim that motifs accurately decompose reasoning into editable units.
minor comments (3)
  1. [Abstract] The abstract and introduction could include one or two concrete examples of a cognitive motif (e.g., a planning scenario with extracted concepts and dependencies) to clarify the representation before the system description.
  2. [Figure 2] Figure captions and the system architecture diagram would benefit from explicit labels indicating which components handle extraction versus rendering versus LLM negotiation.
  3. [§2] Related work should reference prior HCI systems on externalizing reasoning (e.g., argument mapping or causal diagramming tools) to better position the novelty of cognitive motifs.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. The feedback identifies key areas where additional detail and precision are needed to support our claims. We respond to each major comment below, indicating the revisions we will incorporate in the next version of the manuscript.

Point-by-point responses
  1. Referee: [§5 (User Study)] The within-subjects evaluation with N=12 reports qualitative benefits in revision, reusability, agency, and trust but provides no task descriptions, dependent variables, quantitative metrics, statistical tests, effect sizes, or controls for order effects and interface novelty. This directly undermines the abstract's claim that CogInstrument 'significantly enhances' these outcomes, as differences could stem from confounds rather than the motif structure itself.

    Authors: We agree that §5 requires substantial expansion to provide a clearer account of the evaluation. The study was exploratory and relied primarily on qualitative data from semi-structured interviews and interaction logs to surface themes around targeted revision and perceived agency. In the revision we will: (1) add explicit task descriptions for the planning scenarios used, (2) define the dependent variables (e.g., revision count, reuse of motifs, Likert-scale measures of agency and trust), (3) report any quantitative observations collected (e.g., edit frequency, session duration), and (4) include a dedicated limitations subsection addressing order effects, interface novelty, and the absence of statistical testing given the small sample. We will also revise the abstract to replace the phrase 'significantly enhances' with 'supports greater' or 'facilitates improved' to align with the exploratory, qualitative nature of the evidence and to avoid implying statistical significance. revision: partial

  2. Referee: [§3.2 (Cognitive Motif Extraction)] The procedure for identifying causal dependencies and revisable assumptions from natural language is described at a high level without validation against human annotations, inter-rater reliability, or ablation showing that the graphical representation (vs. text alone) drives the reported gains. This is load-bearing for the central claim that motifs accurately decompose reasoning into editable units.

    Authors: We acknowledge that §3.2 currently presents the extraction process at a high level. We will expand the section with the full prompt template, decision rules for detecting causal links and revisable assumptions, and multiple concrete input-output examples from the study sessions. While the extraction is LLM-driven rather than manually annotated, we will add a small-scale validation subsection comparing a sample of automatically extracted motifs against independent human annotations (with agreement metrics). The main study already contrasts the full graphical motif interface against a standard text-only LLM dialogue baseline; we will strengthen the discussion to clarify how this comparison isolates the contribution of the structured, editable motif representation versus flat text. An explicit ablation of graphical versus textual motif rendering is beyond the current scope but will be noted as future work. revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims rest on empirical user study

full rationale

The paper introduces CogInstrument as a novel interface for externalizing cognitive motifs and evaluates its benefits via a within-subjects user study (N=12). No mathematical derivations, equations, fitted parameters, or load-bearing self-citations appear in the abstract or described content. The central claims about improved revision, reusability, agency, trust, and structural control are presented as outcomes of the empirical demonstration rather than any self-referential definitions, constructed predictions, or reductions to prior inputs by construction. The formalization of cognitive motifs is introduced as a new framework without tautological loops or ansatz smuggling.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach rests on domain assumptions about human cognition rather than new mathematical constructs; no free parameters or invented physical entities are introduced.

axioms (1)
  • domain assumption Human decision-making in planning tasks involves causal dependencies and revisable assumptions that can be decomposed into compositional cognitive motifs.
    Invoked as the basis for extracting and rendering reasoning structures from natural language interactions.
invented entities (1)
  • cognitive motifs (no independent evidence)
    purpose: To serve as the fundamental unit for representing and externalizing user reasoning in a form editable by both humans and LLMs.
    Newly defined compositional units without independent empirical validation outside this work.

pith-pipeline@v0.9.0 · 5501 in / 1268 out tokens · 38935 ms · 2026-05-10T15:52:45.643022+00:00 · methodology


Reference graph

Works this paper leans on

80 extracted references · 55 canonical work pages

  1. [1]

    M., Basappa, R.,Bergsmann,S.,Bouneffouf,D.,Callaghan,P.,Cavazza, M., Chaminade, T.,

    Saleema Amershi, Maya Cakmak, W. Bradley Knox, and Todd Kulesza. 2014. Power to the People: The Role of Humans in In- teractive Machine Learning.AI Magazine35, 4 (2014), 105–120. arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1609/aimag.v35i4.2513 doi:10.1609/aimag.v35i4.2513

  2. [2]

    Chris Argyris. 2002. Teaching Smart People How to Learn.Reflections: The SoL Journal4, 2 (Dec. 2002), 4–15. doi:10.1162/152417302762251291

  3. [3]

    Michel Beaudouin-Lafon. 2000. Instrumental interaction: an interaction model for designing post-WIMP user interfaces. InProceedings of the SIGCHI Conference on Human Factors in Computing Systems(The Hague, The Netherlands)(CHI ’00). Association for Computing Machinery, New York, NY, USA, 446–453. doi:10.114 5/332040.332473

  4. [4]

    Bradshaw, Paul J

    Jeffrey M. Bradshaw, Paul J. Feltovich, Hyuckchul Jung, Shriniwas Kulkarni, William Taysom, and Andrzej Uszok. 2003. Dimensions of Adjustable Auton- omy and Mixed-Initiative Interaction. InInternational Workshop on Conceptual Autonomy. Springer, 17–39

  5. [5]

    Ulrik Brandes and Boris Köpf. 2002. Fast and Simple Horizontal Coordinate Assignment.Graph Drawing(2002), 31–44

  6. [6]

    Virginia Braun and Victoria Clarke. 2019. Reflecting on reflexive thematic analysis. Qualitative research in sport, exercise and health11, 4 (2019), 589–597

  7. [7]

    Ruth M. J. Byrne. 2005.The Rational Imagination: How People Create Alternatives to Reality. MIT Press, Cambridge, MA

  8. [9]

    DaEun Choi, Kihoon Son, Jaesang Yu, HyunJoon Jung, and Juho Kim. 2025. IdeaBlocks: Expressing and Reusing Exploratory Intents for Design Exploration with Generative AI. InAdjunct Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology. ACM, Busan Republic of Korea, 1–4. doi:10.1145/3746058.3759001

  9. [10]

    H., & Brennan, S

    Herbert H. Clark and Susan E. Brennan. 1991. Grounding in Communication. In Perspectives on Socially Shared Cognition. American Psychological Association, 127–149. doi:10.1037/10096-006

  10. [11]

    Adam J Coscia, Shunan Guo, Eunyee Koh, and Alex Endert. 2025. OnGoal: Tracking and Visualizing Conversational Goals in Multi-Turn Dialogue with Large Language Models. InProceedings of the 38th Annual ACM Symposium on User Interface Software and Technology. ACM, Busan Republic of Korea, 1–18. doi:10.1145/3746059.3747746

  11. [12]

    Gray, Erik Heishman, Fayin Li, Azriel Rosenfeld, Michael J

    Zoran Duric, Wayne D. Gray, Erik Heishman, Fayin Li, Azriel Rosenfeld, Michael J. Schoelles, Christian Schunn, and Harry Wechsler. 2002. Integrating Perceptual and Cognitive Modeling for Adaptive and Intelligent Human-Computer Interac- tion.Proc. IEEE90, 7 (jul 2002), 1272–1289. doi:10.1109/JPROC.2002.801449

  12. [13]

    Karin Ericsson and Herbert A. Simon. 1980. Verbal reports as data.Psychological Review87 (1980), 215–251. https://api.semanticscholar.org/CorpusID:144763091

  13. [14]

    Li Feng, Ryan Yen, Yuzhe You, Mingming Fan, Jian Zhao, and Zhicong Lu. 2024. CoPrompt: Supporting Prompt Sharing and Referring in Collaborative Natural Language Programming. InProceedings of the CHI Conference on Human Factors in Computing Systems. ACM, Honolulu HI USA, 1–21. doi:10.1145/3613904.3642212

  14. [15]

    1988.Knowledge in Flux: Modeling the Dynamics of Epistemic States

    Peter Gärdenfors. 1988.Knowledge in Flux: Modeling the Dynamics of Epistemic States. MIT Press, Cambridge, MA

  15. [16]

    Dedre Gentner. 1983. Structure-mapping: A theoretical framework for analogy. Cognitive Science7, 2 (1983), 155–170

  16. [17]

    Frederic Gmeiner, Nicolai Marquardt, Michael Bentley, Hugo Romat, Michel Pahud, David Brown, Asta Roseway, Nikolas Martelaro, Kenneth Holstein, Ken Hinckley, and Nathalie Riche. 2025. Intent Tagging: Exploring Micro-Prompting Interactions for Supporting Granular Human-GenAI Co-Creation Workflows. In Proceedings of the 2025 CHI Conference on Human Factors ...

  17. [18]

    Goodman, Joshua B

    Noah D. Goodman, Joshua B. Tenenbaum, and Tobias Gerstenberg. 2015.Concepts in a Probabilistic Language of Thought. The MIT Press, 623–654. http://www.js tor.org/stable/j.ctt17kk9nr.27

  18. [19]

    Sobel, Laura E

    Alison Gopnik, Clark Glymour, David M. Sobel, Laura E. Schulz, Tamar Kushnir, and David Danks. 2004. A Theory of Causal Learning in Children: Causal Maps and Bayes Nets.Psychological Review111, 1 (2004), 3–32. doi:10.1037/0033- 295X.111.1.3

  19. [20]

    Griffiths, Nick Chater, Charles Kemp, Amy Perfors, and Joshua B

    Thomas L. Griffiths, Nick Chater, Charles Kemp, Amy Perfors, and Joshua B. Tenenbaum. 2010. Probabilistic models of cognition: Exploring representations and inductive biases.Trends in Cognitive Sciences14, 8 (2010), 357–364

  20. [21]

    Alicia Guo, Shreya Sathyanarayanan, Leijie Wang, Jeffrey Heer, and Amy X. Zhang. 2025. From Pen to Prompt: How Creative Writers Integrate AI into their Writing Practice. InProceedings of the 2025 Conference on Creativity and Cognition. ACM, Virtual United Kingdom, 527–545. doi:10.1145/3698061.3726910

  21. [22]

    and Nilsson, Nils J

    Peter E. Hart, Nils J. Nilsson, and Bertram Raphael. 1968. A Formal Basis for the Heuristic Determination of Minimum Cost Paths.IEEE Transactions on Systems Science and Cybernetics4, 2 (1968), 100–107. doi:10.1109/TSSC.1968.300136

  22. [23]

    Jeffrey Heer. 2019. Agency plus Automation: Designing Artificial Intelligence into Interactive Systems.Proceedings of the National Academy of Sciences (PNAS) 116, 6 (2019), 1844–1850. doi:10.1073/pnas.1807184115

  23. [24]

    Eric Horvitz. 1999. Principles of Mixed-Initiative User Interfaces. InProceedings of the SIGCHI Conference on Human Factors in Computing Systems. 159–166. doi:10.1145/302979.303030

  24. [25]

    Ziheng Huang, Kexin Quan, Joel Chan, and Stephen MacNeil. 2023. CausalMap- per: Challenging designers to think in systems with Causal Maps and Large Language Model. InCreativity and Cognition. ACM, Virtual Event USA, 325–329. doi:10.1145/3591196.3596818 TLDR: CausalMapper is presented, a mixed- initiative system, that leverages a large language model (LLM...

  25. [26]

    1995.Cognition in the Wild

    Edwin Hutchins. 1995.Cognition in the Wild. The MIT Press. doi:10.7551/mitpre ss/1881.001.0001

  26. [27]

    Hutchins, James D

    Edwin L. Hutchins, James D. Hollan, and Donald A. Norman. 1985. Direct manipulation interfaces.Hum.-Comput. Interact.1, 4 (Dec. 1985), 311–338. doi:10 .1207/s15327051hci0104_2

  27. [28]

    Dirk Ifenthaler. 2011. Identifying cross-domain distinguishing features of cog- nitive structure.Educational Technology Research and Development59, 6 (Dec. Conference acronym ’XX, June 03–05, 2018, Woodstock, NY Anqi Wang, Dongyijie Pan, Xin Tong, and Pan Hui 2011), 817–840. doi:10.1007/s11423-011-9207-4

  28. [29]

    Schulz, and Joshua B

    Julian Jara-Ettinger, Laura E. Schulz, and Joshua B. Tenenbaum. 2020. The Naïve Utility Calculus as a unified, quantitative framework for action understanding. Cognitive Psychology123 (2020), 101334. doi:10.1016/j.cogpsych.2020.101334

  29. [30]

    Dae Hyun Kim, Daeheon Jeong, Shakhnozakhon Yadgarova, Hyungyu Shin, Jinho Son, Hariharan Subramonyam, and Juho Kim. 2025. PlanTogether: Facilitating AI Application Planning Using Information Graphs and Large Language Models. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. ACM, Yokohama Japan, 1–23. doi:10.1145/3706598.3714044

  30. [31]

    Tae Soo Kim, Yoonjoo Lee, Minsuk Chang, and Juho Kim. 2023. Cells, Gen- erators, and Lenses: Design Framework for Object-Oriented Interaction with Large Language Models. InProceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. ACM, San Francisco CA USA, 1–18. doi:10.1145/3586183.3606833

  31. [32]

    Yoonsu Kim, Brandon Chin, Kihoon Son, Seoyoung Kim, and Juho Kim. 2025. IntentFlow: Interactive Support for Communicating Intent with LLMs in Writing Tasks. doi:10.48550/arXiv.2507.22134

  32. [33]

    Lake, Ruslan Salakhutdinov, and Joshua B

    Brenden M. Lake, Ruslan Salakhutdinov, and Joshua B. Tenenbaum. 2015. Human- level concept learning through probabilistic program induction.Science350, 6266 (2015), 1332–1338

  33. [34]

    Lake, Tomer D

    Brenden M. Lake, Tomer D. Ullman, Joshua B. Tenenbaum, and Samuel J. Gersh- man. 2017. Building machines that learn and think like people.Behavioral and Brain Sciences40 (2017), e253. doi:10.1017/S0140525X16001837

  34. [35]

    Chance Jiajie Li, Jiayi Wu, Zhenze Mo, Ao Qu, Yuhan Tang, Kaiya Ivy Zhao, Yulu Gan, Jie Fan, Jiangbo Yu, Jinhua Zhao, Paul Liang, Luis Alonso, and Kent Larson

  35. [36]

    Simulating society requires simulating thought,

    Simulating Society Requires Simulating Thought. arXiv:2506.06958 [cs] doi:10.48550/arXiv.2506.06958

  36. [37]

    Shuai Ma, Qiaoyi Chen, Xinru Wang, Chengbo Zheng, Zhenhui Peng, Ming Yin, and Xiaojuan Ma. 2025. Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems. Article 261, 261:1–261:23 pages. doi:10.1145/3706598.3713423

  37. [38]

    Sara McNeil. 2015. Visualizing mental models: understanding cognitive change to support teaching and learning of multimedia design and development.Educa- tional Technology Research and Development63, 1 (Feb. 2015), 73–96. doi:10.1007/ s11423-014-9354-5

  38. [39]

    Yu Mei, Yuanxi Wang, Shiyi Wang, Qingyang Wan, Zhuojun Li, Chun Yu, Weinan Shi, and Yuanchun Shi. 2025. InterQuest: A Mixed-Initiative Framework for Dynamic User Interest Modeling in Conversational Search. InProceedings of the 38th Annual ACM Symposium on User Interface Software and Technology. ACM, Busan Republic of Korea, 1–23. doi:10.1145/3746059.3747753

  39. [40]

    Donald A. Norman. 1986. Cognitive Engineering. InUser Centered System Design (0 ed.). CRC Press, Boca Raton, 31–62. doi:10.1201/b15703-3

  40. [41]

    Donald A. Norman. 2013.The design of everyday things(rev. and expanded edition ed.). MIT press, Cambridge (Mass.)

  41. [42]

    Judith Reitman Olson and Gary M. Olson. 1995. The Growth of Cognitive Modeling in Human-Computer Interaction Since GOMS. InReadings in Human– Computer Interaction: Toward the Year 2000, Ronald M. Baecker, Jonathan Grudin, William A. S. Buxton, and Saul Greenberg (Eds.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 603–625. doi:10.1016/B978-0-0...

  42. [43]

    2009.Causality: Models, Reasoning, and Inference(2 ed.)

    Judea Pearl. 2009.Causality: Models, Reasoning, and Inference(2 ed.). Cambridge University Press, Cambridge, UK

  43. [44]

    2002.People and Technology: A Cognitive Approach to Contempo- rary Instruments

    Pierre Rabardel. 2002.People and Technology: A Cognitive Approach to Contempo- rary Instruments. Université Paris 8, Paris. Translated by Heidi Wood

  44. [45]

    Nathalie Riche, Anna Offenwanger, Frederic Gmeiner, David Brown, Hugo Romat, Michel Pahud, Nicolai Marquardt, Kori Inkpen, and Ken Hinckley. 2025. AI- Instruments: Embodying Prompts as Instruments to Abstract & Reflect Graphical Interface Commands as General-Purpose Tools. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems. ACM...

  45. [46]

    Rumelhart

    David E. Rumelhart. 1980. Schemata: The building blocks of cognition. In Theoretical Issues in Reading Comprehension, Rand J. Spiro, Bertram C. Bruce, and William F. Brewer (Eds.). Lawrence Erlbaum Associates, Hillsdale, NJ, 33–58

  46. [47]

    Gaver, Jacob Beaver, and Steve Benford

    Dario D. Salvucci and Frank J. Lee. 2003. Simple cognitive modeling in a complex cognitive architecture. InProceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, Ft. Lauderdale Florida USA, 265–272. doi:10.1145/ 642611.642658

  47. [48]

    Omar Shaikh, Hussein Mozannar, Gagan Bansal, Adam Fourney, and Eric Horvitz

  48. [49]

    Shaikh, H

    Navigating Rifts in Human-LLM Grounding: Study and Benchmark.arXiv preprint arXiv:2503.13975(2025). arXiv:2503.13975 [cs.CL] https://arxiv.org/abs/ 2503.13975

  49. [50]

    Bernstein

    Omar Shaikh, Shardul Sapkota, Shan Rizvi, Eric Horvitz, Joon Sung Park, Diyi Yang, and Michael S. Bernstein. 2025. Creating General User Models from Computer Use. InProceedings of the 38th Annual ACM Symposium on User Interface Software and Technology. ACM, Busan Republic of Korea, 1–23. doi:10.1145/3746059.3747722

  50. [51]

    Xinyu Shi, Yinghou Wang, Ryan Rossi, and Jian Zhao. 2025. Brickify: Enabling Expressive Design Intent Specification through Direct Manipulation on Design Tokens. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI ’25). Association for Computing Machinery, New York, NY, USA, Article 424, 20 pages. doi:10.1145/3706598.3714087

  51. [52]

    Kihoon Son, DaEun Choi, Tae Soo Kim, Young-Ho Kim, Sangdoo Yun, and Juho Kim. 2025. ClearFairy: Capturing Creative Workflows through Decision Structur- ing, In-Situ Questioning, and Rationale Inference. doi:10.48550/arXiv.2509.14537 arXiv:2509.14537 [cs]

  52. [53]

    Hari Subramonyam, Roy Pea, Christopher Pondoc, Maneesh Agrawala, and Colleen Seifert. 2024. Bridging the Gulf of Envisioning: Cognitive Challenges in Prompt Based Interactions with LLMs. InProceedings of the CHI Conference on Human Factors in Computing Systems. ACM, Honolulu HI USA, 1–19. doi:10.114 5/3613904.3642754

  53. [54]

    Hari Subramonyam, Divy Thakkar, Andrew Ku, Juergen Dieber, and Anoop K. Sinha. 2025. Prototyping with Prompts: Emerging Approaches and Challenges in Generative AI Design for Collaborative Software Teams. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI ’25). Association for Computing Machinery, New York, NY, USA, Article...

  54. [55]

    Kozo Sugiyama, Shojiro Tagawa, and Mitsuhiko Toda. 1981. Methods for Visual Understanding of Hierarchical System Structures.IEEE Transactions on Systems, Man, and Cybernetics11, 2 (1981), 109–125

  55. [56]

    Sangho Suh, Meng Chen, Bryan Min, Toby Jia-Jun Li, and Haijun Xia. 2024. Luminate: Structured Generation and Exploration of Design Space with Large Language Models for Human-AI Co-Creation. InProceedings of the CHI Conference on Human Factors in Computing Systems. ACM, Honolulu HI USA, 1–26. doi:10.1 145/3613904.3642400

  56. [57]

    Sangho Suh, Bryan Min, Srishti Palani, and Haijun Xia. 2023. Sensecape: En- abling Multilevel Exploration and Sensemaking with Large Language Models. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. ACM, San Francisco CA USA, 1–18. doi:10.1145/3586183.3606756

  57. [58]

    John Sweller. 1988. Cognitive load during problem solving: Effects on learning. Cognitive Science12, 2 (1988), 257–285

  58. [59]

    Lev Tankelevitch, Viktor Kewenig, Auste Simkute, Ava Elizabeth Scott, Advait Sarkar, Abigail Sellen, and Sean Rintel. 2024. The Metacognitive Demands and Opportunities of Generative AI. InProceedings of the CHI Conference on Human Factors in Computing Systems. Article 680, 680:1–680:24 pages. doi:10.1145/3613 904.3642902

  59. [60]

    Robert E. Tarjan. 1972. Depth-First Search and Linear Graph Algorithms.SIAM J. Comput.1, 2 (1972), 146–160. doi:10.1137/0201010

  60. [61]

    Tauber and David Ackermann

    Michael J. Tauber and David Ackermann. 1991.Mental models and human- computer interaction 2. Number 7 in Human factors in information technology. North-Holland Distributors for the U.S.A. and Canada, Elsevier Science Pub. Co, Amsterdam New York New York, N.Y., U.S.A

  61. [62]

    Joshua B. Tenenbaum, Charles Kemp, Thomas L. Griffiths, and Noah D. Goodman. 2011. How to Grow a Mind: Statistics, Structure, and Abstraction. Science 331, 6022 (2011), 1279–1285. doi:10.1126/science.1192788

  65. [66]

    Priyan Vaithilingam, Munyeong Kim, Frida-Cecilia Acosta-Parenteau, Daniel Lee, Amine Mhedhbi, Elena L. Glassman, and Ian Arawjo. 2025. Semantic Commit: Helping Users Update Intent Specifications for AI Memory at Scale. In Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology. Article 137, 137:1–137:18 pages. doi:10.1145/374...

  66. [67]

    Priyan Vaithilingam, Munyeong Kim, Frida-Cecilia Acosta-Parenteau, Daniel Lee, Amine Mhedhbi, Elena L. Glassman, and Ian Arawjo. 2025. Semantic Commit: Helping Users Update Intent Specifications for AI Memory at Scale. doi:10.48550/arXiv.2504.09283 arXiv:2504.09283 [cs]

  67. [68]

    Anqi Wang, Zhengyi Li, Xin Tong, and Pan Hui. 2026. DesignerlyLoop: Forming Design Intent through Curated Reasoning for Human-LLM Alignment. arXiv:2511.15331 [cs.HC] https://arxiv.org/abs/2511.15331

  68. [69]

    Xingyi Wang, Xiaozheng Wang, Sunyup Park, and Yaxing Yao. 2025. Mental Models of Generative AI Chatbot Ecosystems. In Proceedings of the 30th International Conference on Intelligent User Interfaces. ACM, Cagliari Italy, 1016–1031. doi:10.1145/3708359.3712125

  69. [70]

    Justin D. Weisz, Jessica He, Michael Muller, Gabriela Hoefer, Rachel Miles, and Werner Geyer. 2024. Design Principles for Generative AI Applications. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’24). Association for Computing Machinery, New York, NY, USA, Article 378, 22 pages. doi:10.1145/36139...

  70. [71]

    Thomas Willemain. 2019. Visualization and The Process of Modeling: A Cognitive-theoretic View. doi:10.1287/9beec4ec-43ac-4cf9-912b-eb018324857f

  71. [72]

    Lionel Wong, Gabriel Grand, Alexander K. Lew, Noah D. Goodman, Vikash K. Mansinghka, Jacob Andreas, and Joshua B. Tenenbaum. 2023. From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought. arXiv:2306.12672 [cs.CL] https://arxiv.org/abs/2306.12672

  72. [73]

    Tongshuang Wu, Michael Terry, and Carrie Jun Cai. 2022. AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts. In CHI Conference on Human Factors in Computing Systems. ACM, New Orleans LA USA, 1–22. doi:10.1145/3491102.3517582

  73. [74]

    Qian Yang, Jina Suh, Nan-Chen Chen, and Gonzalo Ramos. 2018. Grounding interactive machine learning tool design in how non-experts actually build models. In Proceedings of the 2018 Designing Interactive Systems Conference. 573–584

  74. [75]

    Ryan Yen and Jian Zhao. 2024. Memolet: Reifying the Reuse of User-AI Conversational Memories. In Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology. Article 58, 58:1–58:22 pages. doi:10.1145/3654777.3676388

  75. [76]

    Yiwen Yin, Yu Mei, Chun Yu, Toby Jia-Jun Li, Aamir Khan Jadoon, Sixiang Cheng, Weinan Shi, Mohan Chen, and Yuanchun Shi. 2025. From Operation to Cognition: Automatic Modeling Cognitive Dependencies from User Demonstrations for GUI Task Automation. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. ACM, Yokohama Japan, 1–24. do...

  76. [77]

    Matej Zečević, Moritz Willig, Devendra Singh Dhami, and Kristian Kersting. 2023. Causal Parrots: Large Language Models May Talk Causality But Are Not Causal. arXiv:2308.13067 [cs.AI] https://arxiv.org/abs/2308.13067

  77. [78]

    Chao Zhang, Kexin Ju, Zhuolun Han, Yu-Chun Grace Yen, and Jeffrey M. Rzeszotarski. 2025. Synthia: Visually Interpreting and Synthesizing Feedback for Writing Revision. In Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology. ACM, Busan Republic of Korea, 1–16. doi:10.1145/3746059.3747703

  78. [79]

    Wenshuo Zhang, Leixian Shen, Shuchang Xu, Jindu Wang, Jian Zhao, Huamin Qu, and Linping Yuan. 2025. NeuroSync: Intent-Aware Code-Based Problem Solving via Direct LLM Understanding Modification. doi:10.1145/3746059.3747668

  79. [80]

    Zhongyi Zhou, Jing Jin, Vrushank Phadnis, Xiuxiu Yuan, Jun Jiang, Xun Qian, Kristen Wright, Mark Sherwood, Jason Mayes, Jingtao Zhou, Yiyi Huang, Zheng Xu, Yinda Zhang, Johnny Lee, Alex Olwal, David Kim, Ram Iyengar, Na Li, and Ruofei Du. 2025. InstructPipe: Generating Visual Blocks Pipelines with Human Instructions and LLMs. In Proceedings of the 2025 CHI...

  80. [81]

    John Zimmerman, Jodi Forlizzi, and Shelley Evenson. 2007. Research Through Design as a Method for Interaction Design Research in HCI. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’07). Association for Computing Machinery, New York, NY, USA, 493–502. doi:10.1145/1240624.1240704

A Framework: Representative Model

A.1 Typ...