pith. machine review for the scientific record.

arxiv: 2604.23136 · v1 · submitted 2026-04-25 · 💻 cs.CY · cs.HC

Recognition: unknown

How Researchers Navigate Accountability, Transparency, and Trust When Using AI Tools in Early-Stage Research: A Think-Aloud Study

Houjiang Liu, Matthew Lease, Sanjana Gautam, Yujin Choi

Pith reviewed 2026-05-08 07:17 UTC · model grok-4.3

classification 💻 cs.CY cs.HC
keywords accountability · transparency · trust · AI tools · early-stage research · responsible AI · think-aloud study · LLM

The pith

Researchers using AI in early-stage work develop their own checks because AI outputs hide uncertainty and lack clear origins.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how 15 researchers actually use LLM-based AI tools while exploring literature, synthesizing ideas, and forming research directions. It shows that the confident presentation of AI results makes it harder for researchers, who remain accountable, to spot where extra scrutiny is needed. Opaque retrieval steps also prevent easy tracing of where information comes from, while trust in the tools proves unstable and quick to break. In response, participants created practical workarounds to keep their own judgment reliable. These patterns matter because AI is entering core research steps where individual responsibility cannot be handed off.

Core claim

The confident tone of AI outputs misrepresents epistemic uncertainty, making it more difficult for researchers, who remain ultimately accountable, to identify which outputs require the greatest scrutiny. Opaque retrieval and content construction make provenance difficult to establish for transparency. Trust in AI is fragile, context-dependent, and easily eroded. In response, participant researchers develop compensatory strategies to restore scholarly judgment under uncertainty.

What carries the argument

Think-aloud observations of 15 researchers performing literature exploration, synthesis, and ideation with LLM tools, which surface the compensatory strategies they create to handle uncertainty and provenance gaps.

If this is right

  • Researchers must treat AI outputs as provisional and add extra verification steps to meet their accountability obligations.
  • Provenance tracking becomes a user-driven task because AI systems do not supply clear source trails.
  • Trust in AI tools requires repeated calibration because it shifts with task type and prior experience.
  • Deliberate choices in how AI is integrated into early research are needed to keep accountability, transparency, and informed trust intact.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • AI tools for research could reduce user burden by surfacing uncertainty estimates and source links directly in their outputs.
  • The same accountability pressures may appear when professionals in medicine or law adopt similar generative tools.
  • Research training programs might need to include explicit practice in spotting and correcting for AI-induced gaps in uncertainty and provenance.

Load-bearing premise

Verbal reports from a think-aloud study with 15 researchers accurately reflect their real-time judgments and workarounds without being changed by the presence of observers or the study setting.

What would settle it

Direct observation of researchers using the same AI tools in their normal unrecorded workflows to check whether the same compensatory strategies appear without prompting from a study protocol.

read the original abstract

In the early stages of scientific research, researchers rely on core scholarly judgments to identify relevant literature, assess credible evidence, and determine which directions merit pursuit. As AI tools become increasingly integrated into these early-stage workflows, the scholarly judgments that were once transparent and attributable to individual researchers become obscured, raising critical Responsible AI (RAI) concerns around accountability, transparency, and trust. Yet how these three dimensions manifest in real-time, in-situ scholarly practice remains largely unexplored. To address this gap, we conducted a think-aloud study with 15 researchers to examine how they used AI tools powered by large language models (LLMs) across early-stage research tasks, including literature exploration, synthesis, and research ideation. Our key findings address the tripartite constructs of accountability, transparency, and trust. First, the confident tone of AI outputs misrepresents epistemic uncertainty, making it more difficult for researchers (who are ultimately accountable) to identify which outputs require the greatest scrutiny. Second, opaque retrieval and content construction make provenance difficult to establish for transparency. Third, trust in AI is fragile, context-dependent, and easily eroded. In response, participant researchers were seen to develop compensatory strategies to restore scholarly judgment under uncertainty. Overall, our findings serve to contextualize AI-mediated research as a RAI problem grounded in lived researcher experience and motivate attention to deliberate AI integration that preserves accountability, supports transparency, and fosters informed trust.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance; this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript describes a think-aloud study with 15 researchers using LLM-powered AI tools for early-stage research tasks like literature exploration, synthesis, and ideation. It claims that AI's confident tone misrepresents epistemic uncertainty, hindering accountability by making it hard to identify outputs needing scrutiny; opaque retrieval and content construction impede transparency by obscuring provenance; trust in AI is fragile, context-dependent, and easily eroded; and researchers develop compensatory strategies to restore scholarly judgment under uncertainty.

Significance. If these observations hold, the paper provides valuable empirical grounding for Responsible AI concerns in academic research. It illustrates how AI characteristics affect core scholarly judgments in practice and suggests ways to better integrate AI while maintaining accountability, transparency, and informed trust. This contributes to understanding human-AI collaboration in science.

major comments (1)
  1. [Methods] The think-aloud protocol is the primary method for capturing real-time behaviors, but the paper does not discuss or mitigate potential reactivity effects. Requiring concurrent verbalization can increase cognitive load and lead to more cautious or compensatory behaviors than in natural silent use, which directly threatens the validity of the observed strategies for handling uncertainty, provenance, and trust. Without addressing this (e.g., via silent control conditions or post-hoc checks), the central claims lack sufficient grounding.
minor comments (2)
  1. [Abstract] The abstract does not provide details on task prompts, specific AI tools, participant selection, coding scheme, or inter-rater reliability, which are important for assessing the study's rigor and findings.
  2. [Discussion] Consider adding more concrete examples or quotes from participants to illustrate the compensatory strategies, as this would make the findings more vivid and persuasive.
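The referee's request for inter-rater reliability can be made concrete. As an illustration only (the abstract does not say which statistic, if any, the authors computed), Cohen's kappa is one standard chance-corrected agreement measure for two coders applying a thematic codebook; the code labels below are hypothetical, not taken from the paper:

```python
from collections import Counter

def cohen_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders' label sequences."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed agreement: fraction of items both coders labeled identically.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected chance agreement, from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes two coders might assign to think-aloud excerpts.
a = ["verify", "verify", "trust", "provenance", "trust", "verify"]
b = ["verify", "trust",  "trust", "provenance", "trust", "verify"]
print(round(cohen_kappa(a, b), 2))  # → 0.74
```

A kappa around 0.74, as in this toy run, is conventionally read as substantial agreement, though the number is only interpretable alongside the codebook and coding procedure the referee asks for.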

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive and detailed feedback, which highlights an important methodological consideration. We address the major comment below and will revise the manuscript to strengthen the presentation of our methods.

read point-by-point responses
  1. Referee: [Methods] The think-aloud protocol is the primary method for capturing real-time behaviors, but the paper does not discuss or mitigate potential reactivity effects. Requiring concurrent verbalization can increase cognitive load and lead to more cautious or compensatory behaviors than in natural silent use, which directly threatens the validity of the observed strategies for handling uncertainty, provenance, and trust. Without addressing this (e.g., via silent control conditions or post-hoc checks), the central claims lack sufficient grounding.

    Authors: We agree that the manuscript does not explicitly discuss potential reactivity effects of the concurrent think-aloud protocol, and this is a valid methodological concern. Think-aloud was chosen as the primary method because it allows capture of real-time scholarly judgments during AI-assisted tasks without the distortions introduced by retrospective accounts, which aligns with our focus on in-situ accountability, transparency, and trust processes. However, we recognize that concurrent verbalization may have increased cognitive load or prompted more deliberate compensatory strategies than would occur in silent use. In the revised version, we will add a paragraph to the Methods section explaining the rationale for this protocol (drawing on established HCI and cognitive psychology literature) and expand the Limitations section to acknowledge reactivity as a potential influence on the observed behaviors. We will also note that no silent control conditions or formal post-hoc reactivity checks were included, as the study was designed as an initial qualitative exploration with a small sample prioritizing rich process data; this will be framed as a limitation and a direction for future comparative work.

    revision: yes

Circularity Check

0 steps flagged

No circularity: empirical qualitative study with direct observational grounding

full rationale

The paper reports findings from a think-aloud protocol involving 15 researchers performing literature exploration, synthesis, and ideation tasks with LLMs. All central claims (confident tone misrepresenting uncertainty, opaque provenance, fragile trust, and compensatory strategies) are presented as direct summaries of participant verbalizations and behaviors observed in the study sessions. No equations, fitted parameters, predictions, or first-principles derivations exist. No self-citations are invoked to justify uniqueness theorems or ansatzes that would reduce the findings to prior inputs. The study is self-contained against its own empirical data; the derivation chain consists solely of thematic analysis of recorded think-aloud sessions and does not loop back to its own assumptions by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard qualitative HCI assumptions without free parameters, new entities, or ad-hoc inventions; it applies established think-aloud methods to a new context.

axioms (1)
  • domain assumption Think-aloud protocols can reveal real-time decision-making processes and compensatory strategies in scholarly tasks.
    Invoked to interpret participant verbalizations as reflective of accountability, transparency, and trust judgments during AI-assisted work.

pith-pipeline@v0.9.0 · 5567 in / 1529 out tokens · 57242 ms · 2026-05-08T07:17:33.929374+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

79 extracted references · 22 canonical work pages · 2 internal anchors

  1. [1]

    Muhammad Naveed Akbar. 2025. Use of artificial intelligence tools by doctoral students: a mixed-methods explanatory-sequential investigation. Journal of Further and Higher Education (2025), 1–19

  2. [2]

    Abdulrahman M Al-Zahrani. 2024. The impact of generative AI tools on researchers and research: Implications for academia in higher education. Innovations in Education and Teaching International 61, 5 (2024), 1029–1043

  3. [3]

    Hikari Ando, Rosanna Cousins, and Carolyn Young. 2014. Achieving saturation in thematic analysis: Development and refinement of a codebook. Comprehensive Psychology 3 (2014), 03–CP

  4. [4]

    Wenceslao Arroyo-Machado, Jinghuan Ma, Tipeng Chen, Timothy P Johnson, Shaika Islam, Lesley Michalegko, and Eric Welch. 2025. Generative AI and academic scientists in US universities: Perception, experience, and adoption intentions. PloS one 20, 8 (2025), e0330416

  5. [5]

    Tita Alissa Bach, Magnhild Kaarstad, Elizabeth Solberg, and Aleksandar Babic. 2025. Insights into suggested Responsible AI (RAI) practices in real-world settings: a systematic literature review. AI and Ethics 5, 3 (2025), 3185–3232

  7. [7]

    Emily M Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. ACM, New York, NY, USA. doi:10.1145/3442188.3445922

  8. [8]

    Sophie Berretta, Alina Tausch, Greta Ontrup, Björn Gilles, Corinna Peifer, and Annette Kluge. 2023. Defining human-AI teaming the human-centered way: a scoping review and network analysis. Frontiers in Artificial Intelligence 6 (2023), 1250725

  9. [9]

    Marcel Binz, Stephan Alaniz, Adina Roskies, Balazs Aczel, Carl T Bergstrom, Colin Allen, Daniel Schad, Dirk Wulff, Jevin D West, Qiong Zhang, Richard M Shiffrin, Samuel J Gershman, Vencislav Popov, Emily M Bender, Marco Marelli, Matthew M Botvinick, Zeynep Akata, and Eric Schulz. 2025. How should the advancement of large language models affect the practic...

  10. [10]

    Wayne C Booth, Gregory G Colomb, and Joseph M Williams. 2009. The Craft of research, third edition. University of Chicago Press, Chicago, IL

  11. [11]

    Anna Carobene, Andrea Padoan, Federico Cabitza, Giuseppe Banfi, and Mario Plebani. 2024. Rising adoption of artificial intelligence in scientific publishing: evaluating the role, risks, and ethical implications in paper drafting and review process. Clinical Chemistry and Laboratory Medicine (CCLM) 62, 5 (2024), 835–843

  12. [12]

    Elizabeth Charters. 2003. The use of think-aloud methods in qualitative research: an introduction to think-aloud methods. Brock Education Journal 12, 2 (2003)

  13. [13]

    Jiaqi Chen, Yanzhe Zhang, Yutong Zhang, Yijia Shao, and Diyi Yang. 2025. Generative Interfaces for Language Models. arXiv preprint arXiv:2508.19227 (2025)

  14. [14]

    Qiguang Chen, Mingda Yang, Libo Qin, Jinhao Liu, Zheng Yan, Jiannan Guan, Dengyun Peng, Yiyan Ji, Hanjing Li, Mengkang Hu, et al. 2025. AI4Research: A Survey of Artificial Intelligence for Scientific Research. arXiv preprint arXiv:2507.01903 (2025)

  15. [15]

    Nicholas Clark, Hua Shen, Bill Howe, and Tanushree Mitra. 2025. Epistemic alignment: A mediating framework for user-LLM knowledge delivery. arXiv preprint arXiv:2504.01205 (2025)

  16. [16]

    A Feder Cooper, Emanuel Moss, Benjamin Laufer, and Helen Nissenbaum. 2022. Accountability in an algorithmic society: relationality, responsibility, and robustness in machine learning. In Proceedings of the 2022 ACM conference on fairness, accountability, and transparency. 864–876

  17. [17]

    Eric Corbett and Remi Denton. 2023. Interrogating the T in FAccT. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency. 1624–1634

  18. [18]

    Manuel Alejandro Cruz-Aguilar. 2025. The epistemic revolution of AI: reconfiguring the foundations of scientific knowledge. AI & SOCIETY (2025), 1–17

  19. [19]

    Advait Deshpande and Helen Sharp. 2022. Responsible AI Systems: Who are the Stakeholders? In Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society (AIES ’22). Association for Computing Machinery, New York, NY, USA, 227–236. doi:10.1145/3514094.3534187

  20. [20]

    David M Douglas. 2025. Researchers’ perceptions of automating scientific research. AI & SOCIETY 40, 5 (2025), 4131–4144

  21. [21]

    Mingming Fan, Serina Shi, and Khai N Truong. 2020. Practices and Challenges of Using Think-Aloud Protocols in Industry: An International Survey. Journal of Usability Studies 15, 2 (2020)

  22. [22]

    K J Kevin Feng, Kevin Pu, Matt Latzke, Tal August, Pao Siangliulue, Jonathan Bragg, Daniel S Weld, Amy X Zhang, and Joseph Chee Chang. 2026. Cocoa: Co-planning and co-execution with AI agents. arXiv [cs.HC] (18 Feb. 2026). arXiv:2412.10999 [cs.HC] doi:10.48550/arXiv.2412.10999

  23. [23]

    Andrea Ferrario and Michele Loi. 2022. How explainability contributes to trust in AI. In Proceedings of the 2022 ACM conference on fairness, accountability, and transparency. 1457–1466

  24. [24]

    Marsha E Fonteyn, Benjamin Kuipers, and Susan J Grobe. 1993. A description of think aloud method and protocol analysis. Qualitative health research 3, 4 (1993), 430–441

  25. [25]

    Ben Gansky and Sean McDonald. 2022. CounterFAccTual: How FAccT undermines its organizing principles. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1982–1992

  26. [26]

    Sanjana Gautam, Mohit Chandra, Ankolika De, Tatiana Chakravorti, Girik Malik, and Munmun De Choudhury. 2025. Towards Experience-Centered AI: A Framework for Integrating Lived Experience in Design and Development. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, Vol. 8. 1062–1077

  27. [27]

    José Mauro Granjeiro, Altair Antoninha Del Bel Cury, Jaime Aparecido Cury, Mike Bueno, Manoel Damião Sousa-Neto, and Carlos Estrela. 2025. The future of scientific writing: AI tools, benefits, and ethical implications. Brazilian Dental Journal 36 (2025), e25–6471

  28. [28]

    Jingjing Hu and Xuesong Andy Gao. 2017. Using think-aloud protocol in self-regulated reading research. Educational Research Review 22 (2017), 181–193

  29. [29]

    Paul Humphreys. 2020. Why automated science should be cautiously welcomed. In A Critical Reflection on Automated Science: Will Science Remain Human? Springer, 11–26

  30. [30]

    Maurice Jakesch, Zana Buçinca, Saleema Amershi, and Alexandra Olteanu. 2022. How different groups prioritize ethical values for responsible AI. In 2022 ACM Conference on Fairness Accountability and Transparency. ACM, New York, NY, USA, 310–323. doi:10.1145/3531146.3533097

  31. [31]

    Hyeonsu Kang, Joseph Chee Chang, Yongsung Kim, and Aniket Kittur. 2022. Threddy: An interactive system for personalized thread-based exploration and organization of scientific literature. In Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology. ACM, New York, NY, USA. doi:10.1145/3526113.3545660

  32. [32]

    Hyeonsu B Kang, Tongshuang Wu, Joseph Chee Chang, and Aniket Kittur. 2023. Synergi: A mixed-initiative system for scholarly synthesis and sensemaking. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST ’23, Article 43). ACM, New York, NY, USA, 1–19. doi:10.1145/3586183.3606759

  33. [33]

    Shivani Kapania, Ruiyi Wang, Toby Jia-Jun Li, Tianshi Li, and Hong Shen. 2025. ’I’m Categorizing LLM as a Productivity Tool’: Examining Ethics of LLM Use in HCI Research Practices. Proceedings of the ACM on Human-Computer Interaction 9, 2 (2025), 1–26

  34. [34]

    Mohamed Khalifa and Mona Albadawy. 2024. Using artificial intelligence in academic writing and research: An essential productivity tool. Computer Methods and Programs in Biomedicine Update 5 (2024), 100145

  35. [35]

    David Klahr and Herbert A Simon. 1999. Studies of scientific discovery: Complementary approaches and convergent findings. Psychological Bulletin 125, 5 (1999), 524

  36. [36]

    Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research

  37. [37]

    Benjamin Laufer, Sameer Jain, A Feder Cooper, Jon Kleinberg, and Hoda Heidari. 2022. Four years of FAccT: A reflexive, mixed-methods analysis of research contributions, shortcomings, and future prospects. In Proceedings of the 2022 ACM conference on fairness, accountability, and transparency. 401–426

  39. [39]

    Weixin Liang, Yaohui Zhang, Zhengxuan Wu, Haley Lepp, Wenlong Ji, Xuandong Zhao, Hancheng Cao, Sheng Liu, Siyu He, Zhi Huang, et al. 2024. Mapping the increasing use of LLMs in scientific papers. arXiv preprint arXiv:2404.01268 (2024)

  40. [40]

    Q Vera Liao and S Shyam Sundar. 2022. Designing for Responsible Trust in AI Systems: A Communication Perspective. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22). Association for Computing Machinery, New York, NY, USA, 1257–1268. doi:10.1145/3531146.3533182

  41. [41]

    Zhehui Liao, Maria Antoniak, Inyoung Cheong, Evie Yu-Yen Cheng, Ai-Heng Lee, Kyle Lo, Joseph Chee Chang, and Amy X Zhang. 2024. LLMs as research tools: A large scale survey of researchers’ usage and perceptions. arXiv preprint arXiv:2411.05025 (2024)

  42. [42]

    Yiren Liu, Pranav Sharma, Mehul Oswal, Haijun Xia, and Yun Huang. 2025. PersonaFlow: Designing LLM-Simulated Expert Perspectives for Enhanced Research Ideation. In Proceedings of the 2025 ACM Designing Interactive Systems Conference. 506–534

  43. [43]

    Kyle Lo, Joseph Chee Chang, Andrew Head, Jonathan Bragg, Amy X Zhang, Cassidy Trier, Chloe Anastasiades, Tal August, Russell Authur, Danielle Bragg, Erin Bransom, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Yen-Sung Chen, Evie Yu-Yen Cheng, Yvonne Chou, Doug Downey, Rob Evans, Raymond Fok, Fangzhou Hu, Regan Huff, Dongyeop Kang, Rodney Kinney, ...

  44. [44]

    Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha. 2024. The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery. arXiv preprint arXiv:2408.06292 (2024)

  46. [46]

    Heljä Lundgrén-Laine and Sanna Salanterä. 2010. Think-aloud technique and protocol analysis in clinical decision-making research. Qualitative health research 20, 4 (2010), 565–575

  47. [47]

    Arianna Manzini, Geoff Keeling, Nahema Marchal, Kevin R McKee, Verena Rieser, and Iason Gabriel. 2024. Should users trust advanced AI assistants? Justified trust as a function of competence and alignment. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency. 1174–1186

  48. [48]

    Siddharth Mehrotra, Carolina Centeio Jorge, Catholijn M Jonker, and Myrthe L Tielman. 2024. Integrity-based explanations for fostering appropriate trust in AI agents. ACM Transactions on Interactive Intelligent Systems 14, 1 (2024), 1–36

  49. [49]

    Meredith Ringel Morris. 2023. Scientists’ Perspectives on the Potential for Generative AI in their Fields. arXiv preprint arXiv:2304.01420 (2023)

  50. [50]

    Kristoffer L Nielbo, Folgert Karsdorp, Melvin Wevers, Alie Lassche, Rebekah B Baglini, Mike Kestemont, and Nina Tahmasebi. 2024. Quantitative text analysis. Nature Reviews Methods Primers 4, 1 (2024), 25

  51. [51]

    Gabrielle O’Brien. 2025. How Scientists Use Large Language Models to Program. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–16

  52. [52]

    Adetoun A Oyelude. 2024. Artificial intelligence (AI) tools for academic research. Library Hi Tech News 41, 8 (2024), 18–20

  53. [53]

    Saumya Pareek, Eduardo Velloso, and Jorge Goncalves. 2024. Trust Development and Repair in AI-Assisted Decision-Making during Complementary Expertise. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency. 546–561

  54. [54]

    Anh Ngoc Quynh Phan and Chloe Le. 2025. AI as research partner: key implications of using AI for data visualisation in qualitative research. International Journal of Social Research Methodology (2025), 1–8

  55. [55]

    Robert Pinzolits. 2024. AI in academia: An overview of selected tools and their areas of application. MAP Education and Humanities 4 (2024), 37–50

  56. [56]

    Kevin Pu, KJ Kevin Feng, Tovi Grossman, Tom Hope, Bhavana Dalvi Mishra, Matt Latzke, Jonathan Bragg, Joseph Chee Chang, and Pao Siangliulue. 2025. IdeaSynth: Iterative research idea development through evolving and composing idea facets with literature-grounded feedback. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–31

  57. [57]

    Habeeb Ibrahim Abdul Razack, Sam T Mathew, Fathinul Fikri Ahmad Saad, and Saleh A Alqahtani. 2021. Artificial intelligence-assisted tools for redefining the communication landscape of the scholarly world. Science Editing 8, 2 (2021), 134–144

  58. [58]

    Anka Reuel, Patrick Connolly, Kiana Jafari Meimandi, Shekhar Tewari, Jakub Wiatrak, Dikshita Venkatesh, and Mykel Kochenderfer. 2025. Responsible AI in the global context: Maturity model and survey. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency. 2505–2541

  59. [59]

    Cynthia Rudin. 2019. Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. Nature machine intelligence 1, 5 (May 2019), 206–215. doi:10.1038/s42256-019-0048-x

  60. [60]

    Daniel Schiff, Bogdana Rakova, Aladdin Ayesh, Anat Fanti, and Michael Lennon. 2020. Principles to practices for responsible AI: closing the gap. arXiv preprint arXiv:2006.04707 (2020)

  62. [62]

    Samuel Schmidgall, Yusheng Su, Ze Wang, Ximeng Sun, Jialian Wu, Xiaodong Yu, Jiang Liu, Michael Moor, Zicheng Liu, and Emad Barsoum. 2025. Agent Laboratory: Using LLM agents as research assistants. arXiv [cs.HC] (17 June 2025). arXiv:2501.04227 [cs.HC]

  63. [63]

    Hope Schroeder, Marianne Aubin Le Quéré, Casey Randazzo, David Mimno, and Sarita Schoenebeck. 2024. Large language models in qualitative research: Can we do the data justice? arXiv preprint arXiv:2410.07362 (2024)

  64. [64]

    Leixian Shen, Enya Shen, Yuyu Luo, Xiaocong Yang, Xuming Hu, Xiongshuai Zhang, Zhiwei Tai, and Jianmin Wang. 2022. Towards natural language interfaces for data visualization: A survey. IEEE transactions on visualization and computer graphics 29, 6 (2022), 3121–3144

  65. [65]

    Yang Shi, Tian Gao, Xiaohan Jiao, and Nan Cao. 2023. Understanding design collaboration between designers and artificial intelligence: a systematic literature review. Proceedings of the ACM on Human-Computer Interaction 7, CSCW2 (2023), 1–35

  66. [66]

    Scott Spillias, Paris Tuohy, Matthew Andreotta, Ruby Annand-Jones, Fabio Boschetti, Christopher Cvitanovic, Joseph Duggan, Elisabeth A Fulton, Denis B Karcher, Cecile Paris, et al. 2024. Human-AI collaboration to identify literature for evidence synthesis. Cell Reports Sustainability 1, 7 (2024)

  67. [67]

    Chris Stokel-Walker. 2023. ChatGPT listed as author on research papers: many scientists disapprove. Nature 613, 7945 (Jan. 2023), 620–621. doi:10.1038/d41586-023-00107-z

  68. [68]

    Lu Sun, Stone Tao, Junjie Hu, and Steven P Dow. 2024. MetaWriter: Exploring the potential and perils of AI writing support in scientific peer review. Proceedings of the ACM on Human-Computer Interaction 8, CSCW1 (2024), 1–32

  69. [69]

    Cecilie Steenbuch Traberg, Jon Roozenbeek, and Sander van der Linden. 2026. AI is turning research into a scientific monoculture. Communications Psychology 4, 1 (2026), 37

  70. [70]

    Richard Van Noorden and Jeffrey M Perkel. 2023. AI and science: what 1,600 researchers think. Nature 621, 7980 (2023), 672–675

  71. [71]

    Maike Vollstedt and Sebastian Rezat. 2019. An introduction to grounded theory with a special focus on axial coding and the coding paradigm. Compendium for early career researchers in mathematics education 13, 1 (2019), 81–100

  72. [72]

    Kelly B Wagman, Matthew T Dearing, and Marshini Chetty. 2025. Generative AI Uses and Risks for Knowledge Workers in a Science Organization. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–17

  73. [73]

    Hanchen Wang, Tianfan Fu, Yuanqi Du, Wenhao Gao, Kexin Huang, Ziming Liu, Payal Chandak, Shengchao Liu, Peter Van Katwyk, Andreea Deac, et al. 2023. Scientific discovery in the age of artificial intelligence. Nature 620, 7972 (2023), 47–60

  74. [74]

    Hanchen Wang, Tianfan Fu, Yuanqi Du, Wenhao Gao, Kexin Huang, Ziming Liu, Payal Chandak, Shengchao Liu, Peter Van Katwyk, Andreea Deac, Anima Anandkumar, Karianne Bergen, Carla P Gomes, Shirley Ho, Pushmeet Kohli, Joan Lasenby, Jure Leskovec, Tie-Yan Liu, Arjun Manrai, Debora Marks, Bharath Ramsundar, Le Song, Jimeng Sun, Jian Tang, Petar Veličković, Max ...

  75. [75]

    Haomin Wen, Zhenjie Wei, Yan Lin, Jiyuan Wang, Yuxuan Liang, and Huaiyu Wan. 2024. OverleafCopilot: Empowering academic writing in Overleaf with large language models. arXiv preprint arXiv:2403.09733 (2024)

  76. [76]

    Yongjun Xu, Xin Liu, Xin Cao, Changping Huang, Enke Liu, Sen Qian, Xingchen Liu, Yanjun Wu, Fengliang Dong, Cheng-Wei Qiu, et al. 2021. Artificial intelligence: A powerful paradigm for scientific research. The Innovation 2, 4 (2021)

  77. [77]

    Yuchi Yahagi, Rintaro Chujo, Yuga Harada, Changyo Han, Kohei Sugiyama, and Takeshi Naemura. 2025. PaperWave: Listening to Research Papers as Conversational Podcasts Scripted by LLM. In Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems. 1–10

  78. [78]

    Lixiang Yan, Vanessa Echeverria, Gloria Milena Fernandez-Nieto, Yueqiao Jin, Zachari Swiecki, Linxuan Zhao, Dragan Gašević, and Roberto Martinez-Maldonado. 2024. Human-AI collaboration in thematic analysis using ChatGPT: A user study and design recommendations. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems. 1–7

  79. [79]

    Chengbo Zheng, Yuanhao Zhang, Zeyu Huang, Chuhan Shi, Minrui Xu, and Xiaojuan Ma. 2024. DiscipLink: Unfolding interdisciplinary information seeking process via human-AI co-exploration. In Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology. 1–20