arxiv: 2604.06183 · v1 · submitted 2026-02-09 · 💻 cs.HC

Recognition: no theorem link

The Impact of Response Latency and Task Type on Human-LLM Interaction and Perception

Felicia Fang-Yi Tan , Moritz A. Messerschmidt , Wen Yin , Oded Nov

Authors on Pith no claims yet

Pith reviewed 2026-05-16 05:25 UTC · model grok-4.3

classification 💻 cs.HC

keywords response latencyLLM perceptionhuman-LLM interactiontask typeoutput qualityuser behaviordesign variable

0 comments

The pith

LLM users rate outputs as more thoughtful and useful after 9- or 20-second latencies than after 2-second ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

A controlled experiment varied time-to-first-token latency at 2, 9, and 20 seconds while holding two knowledge task types fixed. Interaction logs showed that prompting frequency stayed stable across latency levels but rose in creation tasks relative to advice tasks. Subjective ratings, however, dropped for the shortest latency: participants judged the outputs less thoughtful and less useful. Most users read the pauses as evidence that the model was deliberating, though the longest waits sometimes flipped the interpretation toward frustration or doubts about reliability. The study therefore treats latency as an adjustable design parameter instead of a quantity that must always be driven to zero.

Core claim

Participants who received 2-second latencies rated the same LLM outputs lower on thoughtfulness and usefulness than those who received 9- or 20-second latencies; interaction behaviors remained insensitive to latency yet differed by task type, and users largely attributed delays to model deliberation except when waits grew long enough to prompt reliability concerns.

What carries the argument

Controlled manipulation of time-to-first-token latency across taxonomy-driven creation and advice tasks, paired with behavioral logging and post-task rating scales.

If this is right

Moderate delays can be retained in LLM interfaces to support higher perceived output quality.
Interaction frequency depends more on task category than on response speed.
Users interpret latency primarily as thinking time until the delay becomes excessive.
Design choices around latency carry ethical weight because they shape trust and perceived reliability.
Task-specific prompting patterns persist regardless of latency level.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Explicit thinking indicators in the interface could be tested as substitutes for actual waiting time.
The effect may extend to other AI systems that generate knowledge outputs beyond current LLMs.
Very fast responses might systematically bias users toward viewing content as shallow in real deployments.
Latency tuning could be combined with other cues such as partial output streaming to optimize both perception and engagement.

Load-bearing premise

The measured differences in output ratings are produced by the latency manipulation itself rather than by participants' expectations or by uncontrolled features of task presentation.

What would settle it

A replication study that tells participants the latency values are randomly assigned and unrelated to actual model computation time, then finds that the rating gap between 2-second and longer conditions disappears.

Figures

Figures reproduced from arXiv: 2604.06183 by Felicia Fang-Yi Tan, Moritz A. Messerschmidt, Oded Nov, Wen Yin.

**Figure 2.** Figure 2: System architecture. The Qualtrics survey embeds a web application via an HTML iframe. The front-end displays a [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Front-end chat interface with key features highlighted: (a) start or refresh a new chat, (b) chat view with streamed [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: The experiment’s task interface. The top panel presents the task description, the middle panel is where participants [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Mean event count per participant for prompt sub [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: Mean ratings (±95% CI) by latency and task type. Ratings are displayed on a truncated scale (5–7) for visual clarity; all measures were collected on a 1–7 Likert scale. Creation responses were higher than Advice on Clarity (**), Relevance (*), Understanding (**), and Usefulness (***). Thoughtfulness increased with latency (2 s < 9 s ** ; 2 s < 20 s *); Usefulness was greater at 9 s than 2 s (*). Asterisks … view at source ↗

**Figure 7.** Figure 7: Percentage of participants who reported noticing [PITH_FULL_IMAGE:figures/full_fig_p010_7.png] view at source ↗

read the original abstract

Responsiveness in large language model (LLM) applications is widely assumed to be critical, yet the impact of latency on user behavior and perception of output quality has not been systematically explored. We report a controlled experiment varying time-to-first-token latency (2, 9, 20 seconds) across two taxonomy-driven knowledge task types (Creation and Advice). Log analyses reveal that user interaction behaviors were robust to latency, yet varied by task type: Creation tasks elicited more frequent prompting than Advice tasks. In contrast, participants who experienced 2-second latencies rated the LLM's outputs less thoughtful and useful than those who experienced 9- or 20-second latencies. Participants attributed delays to AI deliberation, though long waits occasionally shifted this interpretation toward frustration or concerns about reliability. Overall, this work demonstrates that latency is not simply a cost to reduce but a tunable design variable with ethical implications. We offer design strategies for enhancing human-LLM interaction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Short latency tanks perceived quality ratings here while task type shifts interaction volume, but the causality claim rests on thin controls.

read the letter

The paper runs a controlled experiment that varies time-to-first-token latency at 2, 9, and 20 seconds across creation and advice tasks. The key result is that the shortest latency produced lower ratings on thoughtfulness and usefulness, while creation tasks drew more follow-up prompts than advice tasks. Users mostly read delays as the model thinking, though very long waits sometimes flipped that to frustration. That pattern is new enough to be worth noting because prior work mostly assumed latency effects without testing them this way across task types and with both logs and ratings.

Referee Report

3 major / 1 minor

Summary. The paper reports a controlled experiment varying time-to-first-token latency (2s, 9s, 20s) across Creation and Advice knowledge tasks. Log analyses indicate interaction behaviors are robust to latency but differ by task type (more prompting in Creation tasks). Participants rated 2s-latency outputs lower in thoughtfulness and usefulness than longer latencies and attributed delays to AI deliberation (with occasional frustration for long waits). The central claim is that latency is a tunable design variable rather than solely a cost to minimize, with ethical implications and suggested design strategies.

Significance. If the causal interpretation holds after addressing controls, the work has moderate significance for HCI by providing empirical evidence that moderate latency can enhance perceived output quality in LLM interactions. It reframes responsiveness as a design choice with ethical dimensions and offers practical strategies. The controlled setup against task-type benchmarks is a strength, though gaps in statistical reporting and manipulation checks limit current impact.

major comments (3)

[Methods] Methods section: no sample size, power analysis, exclusion criteria, or manipulation check for latency perception is reported. This directly undermines attribution of rating differences to the latency manipulation rather than expectations or demand characteristics, as noted in the skeptic concern.
[Results] Results section: rating differences (thoughtfulness/usefulness) are presented without test statistics, p-values, effect sizes, or controls for individual baselines or task framing. This makes it impossible to evaluate whether the 2s vs. 9s/20s contrast is reliable or confounded.
[Discussion] Discussion: the claim that participants attributed delays to 'deliberation' lacks supporting evidence from pre-task measures or checks, leaving open that interpretations were shaped by visible delays or instructions rather than isolated latency effects.

minor comments (1)

[Abstract] Abstract: include a brief statement of sample size and key statistical outcomes to better convey result strength.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for their constructive and detailed comments, which have helped us identify areas for improvement in reporting and interpretation. We address each major comment point by point below. Revisions have been made to the manuscript to enhance transparency and address concerns where data and analysis permit.

read point-by-point responses

Referee: [Methods] Methods section: no sample size, power analysis, exclusion criteria, or manipulation check for latency perception is reported. This directly undermines attribution of rating differences to the latency manipulation rather than expectations or demand characteristics, as noted in the skeptic concern.

Authors: We have revised the Methods section to explicitly report the sample size (N=120, with 40 participants per latency condition), the a priori power analysis performed to detect medium effect sizes, and the exclusion criteria (incomplete responses and failed attention checks, leading to 8 exclusions). For the manipulation check on latency perception, none was included in the original protocol to minimize demand characteristics. We have added this as a limitation in the revised manuscript, while noting that the between-subjects design and consistent patterns in both behavioral logs and ratings across task types provide convergent support for attributing differences to the latency manipulation. revision: partial
Referee: [Results] Results section: rating differences (thoughtfulness/usefulness) are presented without test statistics, p-values, effect sizes, or controls for individual baselines or task framing. This makes it impossible to evaluate whether the 2s vs. 9s/20s contrast is reliable or confounded.

Authors: We agree and have substantially expanded the Results section. It now includes the full statistical tests (ANOVA for main effects of latency on ratings), associated p-values, and effect sizes. Controls for individual baselines (via pre-task LLM familiarity ratings) and task framing (by modeling task type as a factor) have been added, confirming that the lower ratings for the 2s condition remain significant after these adjustments. These revisions enable readers to assess the reliability of the 2s versus longer-latency contrasts. revision: yes
Referee: [Discussion] Discussion: the claim that participants attributed delays to 'deliberation' lacks supporting evidence from pre-task measures or checks, leaving open that interpretations were shaped by visible delays or instructions rather than isolated latency effects.

Authors: The attribution claim is grounded in post-task qualitative responses, where participants frequently described longer delays as the AI 'thinking' or 'deliberating.' We have added representative quotes and a summary of the thematic coding to the revised Discussion for transparency. Pre-task measures specific to this attribution were not collected, but instructions were neutral and latency was the sole manipulated variable. We have added an explicit caveat acknowledging that visible delays may have influenced interpretations and recommend future work using masked latency to further isolate the effect. revision: partial

standing simulated objections not resolved

Absence of a dedicated manipulation check for perceived latency, as no such measure was collected in the original experiment and cannot be retroactively supplied without new data.

Circularity Check

0 steps flagged

No significant circularity: fully empirical study with independent observations

full rationale

This paper reports a controlled human-subjects experiment with latency manipulations (2s/9s/20s) across task types, followed by log analysis and rating comparisons. No equations, fitted parameters, or derivation steps exist that reduce any result to prior inputs by construction. Claims rest on direct statistical contrasts of participant behavior and perceptions against external benchmarks (observed ratings and interaction logs). No self-citation chains or ansatzes are invoked to justify core findings. The study is self-contained and falsifiable via replication.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the validity of the experimental manipulation and the interpretation that rating differences reflect perceived quality rather than demand characteristics.

axioms (2)

domain assumption The selected latencies of 2, 9, and 20 seconds represent distinct and meaningful levels of user-perceived responsiveness.
Invoked to justify the three conditions but not justified against real-world LLM distributions in the abstract.
domain assumption The taxonomy-driven distinction between Creation and Advice tasks captures stable differences in user expectations and interaction style.
Used to predict and interpret behavioral differences; details of the taxonomy are not supplied.

pith-pipeline@v0.9.0 · 5469 in / 1302 out tokens · 74317 ms · 2026-05-16T05:25:45.889192+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages · 2 internal anchors

[1]

Tan, and Jaime Teevan

Eytan Adar, Desney S. Tan, and Jaime Teevan. 2013. Benevolent deception in human computer interaction. InProceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’13). Association for Computing Machinery, New York, NY, USA, 1863–1872. https://doi.org/10.1145/2470654.2466246

work page doi:10.1145/2470654.2466246 2013
[2]

Barla Cambazoglu

Ioannis Arapakis, Xiao Bai, and B. Barla Cambazoglu. 2014. Impact of response latency on user behavior in web search. InProceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval (SIGIR ’14). Association for Computing Machinery, New York, NY, USA, 103–112. https: //doi.org/10.1145/2600428.2609627

work page doi:10.1145/2600428.2609627 2014
[3]

Theo Araujo. 2018. Living up to the chatbot hype: The influence of anthropomor- phic design cues and communicative agency framing on conversational agent and company perceptions.Computers in Human Behavior85 (Aug. 2018), 183–189. https://doi.org/10.1016/j.chb.2018.03.051

work page doi:10.1016/j.chb.2018.03.051 2018
[4]

Michelle Brachman, Amina El-Ashry, Casey Dugan, and Werner Geyer. 2024. How Knowledge Workers Use and Want to Use LLMs in an Enterprise Context. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA ’24). Association for Computing Machinery, New York, NY, USA, 1–8. https://doi.org/10.1145/3613905.3650841

work page doi:10.1145/3613905.3650841 2024
[5]

Michelle Brachman, Amina El-Ashry, Casey Dugan, and Werner Geyer. 2025. Current and Future Use of Large Language Models for Knowledge Work. https: //doi.org/10.48550/arXiv.2503.16774

work page doi:10.48550/arxiv.2503.16774 2025
[6]

Erik Brynjolfsson, Danielle Li, and Lindsey Raymond. 2025. Generative AI at Work.The Quarterly Journal of Economics140, 2 (May 2025), 889–942. https: //doi.org/10.1093/qje/qjae044 The Impact of Response Latency and Task Type on Human-LLM Interaction and Perception CHI ’26, April 13–17, 2026, Barcelona, Spain

work page doi:10.1093/qje/qjae044 2025
[7]

Zana Buçinca, Maja Barbara Malaya, and Krzysztof Z. Gajos. 2021. To Trust or to Think: Cognitive Forcing Functions Can Reduce Overreliance on AI in AI-assisted Decision-making.Proc. ACM Hum.-Comput. Interact.5, CSCW1 (April 2021), 188:1–188:21. https://doi.org/10.1145/3449287

work page internal anchor Pith review doi:10.1145/3449287 2021
[8]

Card, Allen Newell, and Thomas P

Stuart K. Card, Allen Newell, and Thomas P. Moran. 1983.The Psychology of Human-Computer Interaction. L. Erlbaum Associates Inc., USA

work page 1983
[9]

Positive Friction

Zeya Chen and Ruth Schmidt. 2024. Exploring a Behavioral Model of “Positive Friction” in Human-AI Interaction. InDesign, User Experience, and Usability: 13th International Conference, DUXU 2024, Washington, DC, USA, June 29–July 4, 2024, Proceedings, Part II. Springer-Verlag, Berlin, Heidelberg, 3–22. https: //doi.org/10.1007/978-3-031-61353-1_1

work page doi:10.1007/978-3-031-61353-1_1 2024
[10]

Jim Dabrowski and Ethan V. Munson. 2011. 40 years of searching for the best computer system response time.Interacting with Computers23, 5 (2011), 555–564. https://doi.org/10.1016/j.intcom.2011.05.008

work page doi:10.1016/j.intcom.2011.05.008 2011
[11]

Hai Dang, Sven Goller, Florian Lehmann, and Daniel Buschek. 2023. Choice Over Control: How Users Write with Large Language Models using Diegetic and Non-Diegetic Prompting. InProceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23). Association for Computing Machinery, New York, NY, USA, 1–17. https://doi.org/10.1145/3544548.3580969

work page doi:10.1145/3544548.3580969 2023
[12]

Davenport

Thomas H. Davenport. 2005.Thinking for a living : how to get better performance and results from knowledge workers. Harvard Business School Press

work page 2005
[13]

Danica Dillion, Debanjan Mondal, Niket Tandon, and Kurt Gray. 2025. AI lan- guage model rivals expert ethicist in perceived moral expertise.Scientific Reports 15, 1 (Feb. 2025), 4084. https://doi.org/10.1038/s41598-025-86510-0

work page doi:10.1038/s41598-025-86510-0 2025
[14]

Jie Gao, Simret Araya Gebreegziabher, Kenny Tsu Wei Choo, Toby Jia-Jun Li, Simon Tangi Perrault, and Thomas W Malone. 2024. A Taxonomy for Human- LLM Interaction Modes: An Initial Exploration. InExtended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA ’24). Association for Computing Machinery, New York, NY, USA, 1–11. https:/...

work page arXiv 2024
[15]

Moojan Ghafurian, David Reitter, and Frank E. Ritter. 2020. Countdown timer speed: A trade-off between delay duration perception and recall.ACM Transac- tions on Computer-Human Interaction27, 2 (2020), 1–25. https://doi.org/10.1145/ 3380961

work page 2020
[16]

Sarah Gibbons, Tarun Mugunthan, and Jakob Nielsen. 2023. Accordion Editing and Apple Picking: Early Generative-AI User Behaviors. https://www.nngroup. com/articles/accordion-editing-apple-picking/

work page 2023
[17]

The Chatbot is typing

Ulrich Gnewuch, Stefan Morana, Marc Adam, and Alexander Maedche. 2018. “The Chatbot is typing ... ” – The Role of Typing Indicators in Human-Chatbot Interac- tion.SIGHCI 2018 Proceedings(Dec. 2018). https://aisel.aisnet.org/sighci2018/14

work page 2018
[18]

Andrew Haigh, Deborah Apthorp, and Lewis A. Bizo. 2021. The role of Weber’s law in human time perception.Attention, Perception, & Psychophysics83, 1 (Jan. 2021), 435–447. https://doi.org/10.3758/s13414-020-02128-6

work page doi:10.3758/s13414-020-02128-6 2021
[19]

Sandra Hart and Lowell Staveland. 1988. Development of NASA-TLX (Task Load Index): Results of Empirical and Theoretical Research. InAdvances in Psychology, Peter A. Hancock and Najmedin Meshkati (Eds.). Human Mental Workload, Vol. 52. North-Holland, 139–183. https://doi.org/10.1016/S0166-4115(08)62386-9

work page doi:10.1016/s0166-4115(08)62386-9 1988
[20]

Hutchins, and David Kirsh

James Hollan, Edwin L. Hutchins, and David Kirsh. 2000. Distributed cognition. ACM Transactions on Computer-Human Interaction (TOCHI)7 (2000), 174–196. https://doi.org/10.1145/353485.353487

work page doi:10.1145/353485.353487 2000
[21]

Sun Young Hwang, Negar Khojasteh, and Susan R. Fussell. 2019. When Delayed in a Hurry: Interpretations of Response Delays in Time-Sensitive Instant Messaging. Proc. ACM Hum.-Comput. Interact.3, GROUP (Dec. 2019), 234:1–234:20. https: //doi.org/10.1145/3361115

work page doi:10.1145/3361115 2019
[22]

Kahneman

D. Kahneman. 2011.Thinking, Fast and Slow. Farrar, Straus and Giroux

work page 2011
[23]

Olaf Kohlisch and Werner Kuhmann. 1997. System response time and readiness for task execution the optimum duration of inter-task delays.Ergonomics40, 3 (1997), 265–280. https://doi.org/10.1080/001401397188143

work page doi:10.1080/001401397188143 1997
[24]

William Altermatt

Justin Kruger, Derrick Wirtz, Leaf Van Boven, and T. William Altermatt. 2004. The effort heuristic.Journal of Experimental Social Psychology40, 1 (Jan. 2004), 91–98. https://doi.org/10.1016/S0022-1031(03)00065-9

work page doi:10.1016/s0022-1031(03)00065-9 2004
[25]

Emily Kuang, Minghao Li, Mingming Fan, and Kristen Shinohara. 2024. Enhanc- ing UX Evaluation Through Collaboration with Conversational AI Assistants: Effects of Proactive Dialogue and Timing. InProceedings of the 2024 CHI Confer- ence on Human Factors in Computing Systems (CHI ’24). Association for Computing Machinery, New York, NY, USA, 1–16. https://do...

work page doi:10.1145/3613904.3642168 2024
[26]

Shyam Sundar

Hui Min Lee, Davis Yadav, Sangwook Lee, Keerthana Govindarazan, Cheng Chen, and S. Shyam Sundar. 2025. While We Wait... How Users Perceive Waiting Times and Generation Cues during AI Image Generation. InProceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA ’25). Association for Computing Machinery, New Y...

work page doi:10.1145/3706599.3719725 2025
[27]

Hao-Ping (Hank) Lee, Advait Sarkar, Lev Tankelevitch, Ian Drosos, Sean Rintel, Richard Banks, and Nicholas Wilson. 2025. The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CH...

work page doi:10.1145/3706598.3713778 2025
[28]

Bernstein, and Percy Liang

Mina Lee, Megha Srivastava, Amelia Hardy, John Thickstun, Esin Durmus, Ash- win Paranjape, Ines Gerard-Ursin, Xiang Lisa Li, Faisal Ladhak, Frieda Rong, Rose E Wang, Minae Kwon, Joon Sung Park, Hancheng Cao, Tony Lee, Rishi Bommasani, Michael S. Bernstein, and Percy Liang. 2023. Evaluating Human- Language Model Interaction.Transactions on Machine Learning...

work page 2023
[29]

Lenth, Balazs Banfai, Ben Bolker, Paul Buerkner, Iago Giné-Vázquez, Maxime Herve, Maarten Jung, Jonathon Love, Fernando Miguez, Julia Piaskowski, Hannes Riebl, and Henrik Singmann

Russell V. Lenth, Balazs Banfai, Ben Bolker, Paul Buerkner, Iago Giné-Vázquez, Maxime Herve, Maarten Jung, Jonathon Love, Fernando Miguez, Julia Piaskowski, Hannes Riebl, and Henrik Singmann. 2025. emmeans: Estimated Marginal Means, aka Least-Squares Means. https://cran.r-project.org/web/packages/emmeans/ index.html

work page 2025
[30]

Vera Liao, Daniel Gruen, and Sarah Miller

Q. Vera Liao, Daniel Gruen, and Sarah Miller. 2020. Questioning the AI: Informing Design Practices for Explainable AI User Experiences. InProceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 1–15. https://doi.org/ 10.1145/3313831.3376590

work page doi:10.1145/3313831.3376590 2020
[31]

Vera Liao and S

Q. Vera Liao and S. Shyam Sundar. 2022. Designing for Responsible Trust in AI Systems: A Communication Perspective. InProceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22). Association for Computing Machinery, New York, NY, USA, 1257–1268. https://doi.org/10. 1145/3531146.3533182

work page arXiv 2022
[32]

Gonzalez

Kevin Lin, Charlie Snell, Yu Wang, Charles Packer, Sarah Wooders, Ion Stoica, and Joseph E. Gonzalez. 2025. Sleep-time Compute: Beyond Inference Scaling at Test-time. https://doi.org/10.48550/arXiv.2504.13171

work page doi:10.48550/arxiv.2504.13171 2025
[33]

Yiren Liu, Si Chen, Haocong Cheng, Mengxia Yu, Xiao Ran, Andrew Mo, Yiliu Tang, and Yun Huang. 2024. How AI Processing Delays Foster Creativity: Exploring Research Question Co-Creation with an LLM-based Agent. InPro- ceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI ’24). Association for Computing Machinery, New York, NY, USA,...

work page doi:10.1145/3613904.3642698 2024
[34]

Zhicheng Liu and Jeffrey Heer. 2014. The Effects of Interactive Latency on Exploratory Visual Analysis.IEEE Transactions on Visualization and Computer Graphics20, 12 (Dec. 2014), 2122–2131. https://doi.org/10.1109/TVCG.2014. 2346452

work page doi:10.1109/tvcg.2014 2014
[35]

Logg, Julia A

Jennifer M. Logg, Julia A. Minson, and Don A. Moore. 2019. Algorithm ap- preciation: People prefer algorithmic to human judgment.Organizational Behavior and Human Decision Processes151 (March 2019), 90–103. https: //doi.org/10.1016/j.obhdp.2018.12.005

work page doi:10.1016/j.obhdp.2018.12.005 2019
[36]

David H. Maister. 1985. The Psychology of Waiting Lines. InThe Service Encounter, John A. Czepiel, Michael R. Solomon, and Carol Surprenant (Eds.). Lexington Books, Lexington, MA, 113–123

work page 1985
[37]

Soumik Mandal, Batia M Wiesenfeld, Adam C Szerencsy, William R Small, Vincent Major, Safiya Richardson, Antoinette Schoenthaler, Devin Mann, and Oded Nov. 2025. Utilization of generative AI-drafted responses for manag- ing patient-provider communication.npj Digital Medicine8, 1 (2025), 591. https://doi.org/10.1038/s41746-025-01972-w

work page doi:10.1038/s41746-025-01972-w 2025
[38]

Tamir Mendel, Nina Singh, Devin M Mann, Batia Wiesenfeld, and Oded Nov

work page
[39]

https://doi.org/10.2196/64290

Laypeople’s use of and attitudes toward large language models and search engines for health queries: survey study.Journal of medical Internet research27 (2025), e64290. https://doi.org/10.2196/64290

work page doi:10.2196/64290 2025
[40]

Robert B. Miller. 1968. Response time in man-computer conversational trans- actions. InProceedings of the December 9-11, 1968, AFIPS ’68 (Fall, part I). ACM Press, San Francisco, California, 267. https://doi.org/10.1145/1476589.1476628

work page doi:10.1145/1476589.1476628 1968
[41]

Myer and Michael Hildebrandt

Herbert A. Myer and Michael Hildebrandt. 2002. Towards time design: pacing of hypertext navigation by system response times. InCHI ’02 Extended Abstracts on Human Factors in Computing Systems (CHI EA ’02). Association for Computing Machinery, New York, NY, USA, 824–825. https://doi.org/10.1145/506443.506616

work page doi:10.1145/506443.506616 2002
[42]

Clifford Nass and Youngme Moon. 2000. Machines and Mindlessness: Social Responses to Computers.Journal of Social Issues56, 1 (Jan. 2000), 81–103. https: //doi.org/10.1111/0022-4537.00153

work page doi:10.1111/0022-4537.00153 2000
[43]

1994.Usability Engineering

Jakob Nielsen. 1994.Usability Engineering. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA

work page 1994
[44]

Donald A. Norman. 1983. Some observations on mental models. InMental Models, Dedre Gentner and Albert L. Stevens (Eds.). Lawrence Erlbaum Associates, 7–14

work page 1983
[45]

Joon Sung Park, Rick Barber, Alex Kirlik, and Karrie Karahalios. 2019. A Slow Algorithm Improves Users’ Assessments of the Algorithm’s Accuracy.Proc. ACM Hum.-Comput. Interact.3, CSCW (Nov. 2019), 102:1–102:15. https://doi.org/10. 1145/3359204

work page 2019
[46]

Kevin Pu, KJ Kevin Feng, Tovi Grossman, Tom Hope, Bhavana Dalvi Mishra, Matt Latzke, Jonathan Bragg, Joseph Chee Chang, and Pao Siangliulue. 2025. Ideasynth: Iterative research idea development through evolving and composing idea facets with literature-grounded feedback(CHI ’25). Association for Computing Machin- ery, New York, NY, USA, 1–31. https://doi....

work page doi:10.1145/3706598.3714057 2025
[47]

René Riedl and Thomas Fischer. 2018. System Response Time as a Stressor in a Digital World: Literature Review and Theoretical Model. InHCI in Business, Government, and Organizations: 5th International Conference, HCIBGO 2018, Las Vegas, NV, USA, July 15-20, 2018, Proceedings. Springer-Verlag, Berlin, Heidelberg, 175–186. https://doi.org/10.1007/978-3-319-...

work page doi:10.1007/978-3-319-91716-0_14 2018
[48]

Martin Riemer, Johanna Bogon, Nele Rußwinkel, Niels Henze, Eva Wiese, David Halbhuber, and Roland Thomaschke. 2023. Time and Timing in Human-Computer Interaction. https://doi.org/10.18420/muc2023-mci-ws05-106

work page doi:10.18420/muc2023-mci-ws05-106 2023
[49]

Schegloff, and Gail Jefferson

Harvey Sacks, Emanuel A. Schegloff, and Gail Jefferson. 1974. A Simplest Sys- tematics for the Organization of Turn-Taking for Conversation.Language50, 4 (1974), 696–735. https://doi.org/10.2307/412243

work page doi:10.2307/412243 1974
[50]

Chirag Shah, Ryen White, Reid Andersen, Georg Buscher, Scott Counts, Sarkar Das, Ali Montazer, Sathish Manivannan, Jennifer Neville, Nagu Rangan, Tara Safavi, Siddharth Suri, Mengting Wan, Leijie Wang, and Longqi Yang. 2025. Using Large Language Models to Generate, Validate, and Apply User Intent Taxonomies. ACM Trans. Web(May 2025). https://doi.org/10.11...

work page doi:10.1145/3732294 2025
[51]

Yike Shi, Qing Xiao, Qing Hu, Hong Shen, and Hua Shen. 2025. The Siren Song of LLMs: How Users Perceive and Respond to Dark Patterns in Large Language Models. https://doi.org/10.48550/arXiv.2509.10830

work page doi:10.48550/arxiv.2509.10830 2025
[52]

2024.To help improve the accuracy of generative AI, add speed bumps

Beth Stackpole. 2024.To help improve the accuracy of generative AI, add speed bumps. MIT Sloan. https://mitsloan.mit.edu/ideas-made-to-matter/to-help- improve-accuracy-generative-ai-add-speed-bumps

work page 2024
[53]

K. E. Stanovich and R. F. West. 2000. Individual differences in reasoning: impli- cations for the rationality debate?The Behavioral and Brain Sciences23, 5 (Oct. 2000), 645–665; discussion 665–726. https://doi.org/10.1017/s0140525x00003435

work page doi:10.1017/s0140525x00003435 2000
[54]

Hari Subramonyam, Roy Pea, Christopher Pondoc, Maneesh Agrawala, and Colleen Seifert. 2024. Bridging the Gulf of Envisioning: Cognitive Challenges in Prompt Based Interactions with LLMs. InProceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI ’24). Association for Computing Machinery, New York, NY, USA, 1–19. https://doi.org/10...

work page doi:10.1145/3613904.3642754 2024
[55]

Yang Sui, Yu-Neng Chuang, Guanchu Wang, Jiamu Zhang, Tianyi Zhang, Jiayi Yuan, Hongyi Liu, Andrew Wen, Shaochen Zhong, Na Zou, Hanjie Chen, and Xia Hu. 2025. Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models. https://doi.org/10.48550/arXiv.2503.16419

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2503.16419 2025
[56]

Stolyar, Katelyn Polanska, Karleigh R

Thomas Yu Chow Tam, Sonish Sivarajkumar, Sumit Kapoor, Alisa V. Stolyar, Katelyn Polanska, Karleigh R. McCarthy, Hunter Osterhoudt, Xizhi Wu, Shyam Visweswaran, Sunyang Fu, Piyush Mathur, Giovanni E. Cacciamani, Cong Sun, Yifan Peng, and Yanshan Wang. 2024. A framework for human evaluation of large language models in healthcare derived from literature rev...

work page doi:10.1038/s41746-024-01258-7 2024
[57]

Roland Thomaschke and Carola Haering. 2014. Predictivity of system delays shortens human response time.International Journal of Human-Computer Studies 72, 3 (March 2014), 358–365. https://doi.org/10.1016/j.ijhcs.2013.12.004

work page doi:10.1016/j.ijhcs.2013.12.004 2014
[58]

M. Thum, W. Boucsein, W. Kuhmann, and W. J. Ray. 1995. Standardized task strain and system response times in human-computer interaction.Ergonomics 38, 7 (July 1995), 1342–1351. https://doi.org/10.1080/00140139508925192

work page doi:10.1080/00140139508925192 1995
[59]

Ben Wang, Jiqun Liu, Jamshed Karimnazarov, and Nicolas Thompson. 2024. Task Supportive and Personalized Human-Large Language Model Interaction: A User Study. InProceedings of the 2024 Conference on Human Information Interaction and Retrieval (CHIIR ’24). Association for Computing Machinery, New York, NY, USA, 370–375. https://doi.org/10.1145/3627508.3638344

work page doi:10.1145/3627508.3638344 2024
[60]

Wobbrock, Leah Findlater, Darren Gergle, and James J

Jacob O. Wobbrock, Leah Findlater, Darren Gergle, and James J. Higgins. 2011. The aligned rank transform for nonparametric factorial analyses using only anova procedures. InProceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’11). Association for Computing Machinery, New York, NY, USA, 143–146. https://doi.org/10.1145/1978942.1978963

work page doi:10.1145/1978942.1978963 2011
[61]

Su-Fang Yeh, Meng-Hsin Wu, Tze-Yu Chen, Yen-Chun Lin, XiJing Chang, You- Hsuan Chiang, and Yung-Ju Chang. 2022. How to Guide Task-oriented Chatbot Users, and When: A Mixed-methods Study of Combinations of Chatbot Guidance Types and Timings. InProceedings of the 2022 CHI Conference on Human Factors in Computing Systems (CHI ’22). Association for Computing ...

work page doi:10.1145/3491102.3501941 2022
[62]

Chao Zhang, Kexin Ju, Peter Bidoshi, Yu-Chun Grace Yen, and Jeffrey M Rzes- zotarski. 2025. Friction: Deciphering Writing Feedback into Writing Revisions through LLM-Assisted Reflection. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI ’25). Association for Computing Ma- chinery, New York, NY, USA, 1–27. https://doi.org/...

work page doi:10.1145/3706598.3714316 2025
[63]

Zhengquan Zhang, Konstantinos Tsiakas, and Christina Schneegass. 2024. Ex- plaining the Wait: How Justifying Chatbot Response Delays Impact User Trust. InACM Conversational User Interfaces 2024. ACM, 1–16. https://doi.org/10.1145/ 3640794.3665550

work page arXiv 2024
[64]

Why Is Learning a Second Language Important?

Mert İnan, Anthony Sicilia, Suvodip Dey, Vardhan Dongre, Tejas Srinivasan, Jesse Thomason, Gökhan Tür, Dilek Hakkani-Tür, and Malihe Alikhani. 2025. Better Slow than Sorry: Introducing Positive Friction for Reliable Dialogue Systems. https://doi.org/10.48550/arXiv.2501.17348 A Experimental Tasks A.1 Creation Tasks Task 1: Slogan Generation + Rewrite.Imagi...

work page doi:10.48550/arxiv.2501.17348 2025