pith. machine review for the scientific record.

arxiv: 2604.23136 · v1 · submitted 2026-04-25 · 💻 cs.CY · cs.HC

Recognition: unknown

How Researchers Navigate Accountability, Transparency, and Trust When Using AI Tools in Early-Stage Research: A Think-Aloud Study

Houjiang Liu, Matthew Lease, Sanjana Gautam, Yujin Choi

Pith reviewed 2026-05-08 07:17 UTC · model grok-4.3

classification 💻 cs.CY cs.HC
keywords accountability · transparency · trust · AI tools · early-stage research · responsible AI · think-aloud study · LLM

The pith

Researchers using AI in early-stage work develop their own checks because AI outputs hide uncertainty and lack clear origins.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how 15 researchers actually use LLM-based AI tools while exploring literature, synthesizing ideas, and forming research directions. It shows that the confident presentation of AI results makes it harder for researchers, who remain accountable, to spot where extra scrutiny is needed. Opaque retrieval steps also prevent easy tracing of where information comes from, while trust in the tools proves unstable and quick to break. In response, participants created practical workarounds to keep their own judgment reliable. These patterns matter because AI is entering core research steps where individual responsibility cannot be handed off.

Core claim

The confident tone of AI outputs misrepresents epistemic uncertainty, making it more difficult for researchers, who remain ultimately accountable, to identify which outputs require the greatest scrutiny. Opaque retrieval and content construction make provenance difficult to establish for transparency. Trust in AI is fragile, context-dependent, and easily eroded. In response, participant researchers develop compensatory strategies to restore scholarly judgment under uncertainty.

What carries the argument

Think-aloud observations of 15 researchers performing literature exploration, synthesis, and ideation with LLM tools, which surface the compensatory strategies they create to handle uncertainty and provenance gaps.

If this is right

  • Researchers must treat AI outputs as provisional and add extra verification steps to meet their accountability obligations.
  • Provenance tracking becomes a user-driven task because AI systems do not supply clear source trails.
  • Trust in AI tools requires repeated calibration because it shifts with task type and prior experience.
  • Deliberate choices in how AI is integrated into early research are needed to keep accountability, transparency, and informed trust intact.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • AI tools for research could reduce user burden by surfacing uncertainty estimates and source links directly in their outputs.
  • The same accountability pressures may appear when professionals in medicine or law adopt similar generative tools.
  • Research training programs might need to include explicit practice in spotting and correcting for AI-induced gaps in uncertainty and provenance.

Load-bearing premise

Verbal reports from a think-aloud study with 15 researchers accurately reflect their real-time judgments and workarounds without being changed by the presence of observers or the study setting.

What would settle it

Direct observation of researchers using the same AI tools in their normal unrecorded workflows to check whether the same compensatory strategies appear without prompting from a study protocol.

read the original abstract

In the early stages of scientific research, researchers rely on core scholarly judgments to identify relevant literature, assess credible evidence, and determine which directions merit pursuit. As AI tools become increasingly integrated into these early-stage workflows, the scholarly judgments that were once transparent and attributable to individual researchers become obscured, raising critical Responsible AI (RAI) concerns around accountability, transparency, and trust. Yet how these three dimensions manifest in real-time, in-situ scholarly practice remains largely unexplored. To address this gap, we conducted a think-aloud study with 15 researchers to examine how they used AI tools powered by large language models (LLMs) across early-stage research tasks, including literature exploration, synthesis, and research ideation. Our key findings address the tripartite constructs of accountability, transparency, and trust. First, the confident tone of AI outputs misrepresents epistemic uncertainty, making it more difficult for researchers (who are ultimately accountable) to identify which outputs require the greatest scrutiny. Second, opaque retrieval and content construction make provenance difficult to establish for transparency. Third, trust in AI is fragile, context-dependent, and easily eroded. In response, participant researchers were seen to develop compensatory strategies to restore scholarly judgment under uncertainty. Overall, our findings serve to contextualize AI-mediated research as a RAI problem grounded in lived researcher experience and motivate attention to deliberate AI integration that preserves accountability, supports transparency, and fosters informed trust.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance; this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript describes a think-aloud study with 15 researchers using LLM-powered AI tools for early-stage research tasks like literature exploration, synthesis, and ideation. It claims that AI's confident tone misrepresents epistemic uncertainty, hindering accountability by making it hard to identify outputs needing scrutiny; opaque retrieval and content construction impede transparency by obscuring provenance; trust in AI is fragile, context-dependent, and easily eroded; and researchers develop compensatory strategies to restore scholarly judgment under uncertainty.

Significance. If these observations hold, the paper provides valuable empirical grounding for Responsible AI concerns in academic research. It illustrates how AI characteristics affect core scholarly judgments in practice and suggests ways to better integrate AI while maintaining accountability, transparency, and informed trust. This contributes to understanding human-AI collaboration in science.

major comments (1)
  1. [Methods] The think-aloud protocol is the primary method for capturing real-time behaviors, but the paper does not discuss or mitigate potential reactivity effects. Requiring concurrent verbalization can increase cognitive load and lead to more cautious or compensatory behaviors than in natural silent use, which directly threatens the validity of the observed strategies for handling uncertainty, provenance, and trust. Without addressing this (e.g., via silent control conditions or post-hoc checks), the central claims lack sufficient grounding.
minor comments (2)
  1. [Abstract] The abstract does not provide details on task prompts, specific AI tools, participant selection, coding scheme, or inter-rater reliability, which are important for assessing the study's rigor and findings.
  2. [Discussion] Consider adding more concrete examples or quotes from participants to illustrate the compensatory strategies, as this would make the findings more vivid and persuasive.
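The referee's request for inter-rater reliability can be made concrete. As an illustration only (the abstract does not say which statistic, if any, the authors computed), Cohen's kappa is one standard chance-corrected agreement measure for two coders applying a thematic codebook; the code labels below are hypothetical, not taken from the paper:

```python
from collections import Counter

def cohen_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders' label sequences."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed agreement: fraction of items both coders labeled identically.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected chance agreement, from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes two coders might assign to think-aloud excerpts.
a = ["verify", "verify", "trust", "provenance", "trust", "verify"]
b = ["verify", "trust",  "trust", "provenance", "trust", "verify"]
print(round(cohen_kappa(a, b), 2))  # → 0.74
```

A kappa around 0.74, as in this toy run, is conventionally read as substantial agreement, though the number is only interpretable alongside the codebook and coding procedure the referee asks for.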

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive and detailed feedback, which highlights an important methodological consideration. We address the major comment below and will revise the manuscript to strengthen the presentation of our methods.

read point-by-point responses
  1. Referee: [Methods] The think-aloud protocol is the primary method for capturing real-time behaviors, but the paper does not discuss or mitigate potential reactivity effects. Requiring concurrent verbalization can increase cognitive load and lead to more cautious or compensatory behaviors than in natural silent use, which directly threatens the validity of the observed strategies for handling uncertainty, provenance, and trust. Without addressing this (e.g., via silent control conditions or post-hoc checks), the central claims lack sufficient grounding.

    Authors: We agree that the manuscript does not explicitly discuss potential reactivity effects of the concurrent think-aloud protocol, and this is a valid methodological concern. Think-aloud was chosen as the primary method because it allows capture of real-time scholarly judgments during AI-assisted tasks without the distortions introduced by retrospective accounts, which aligns with our focus on in-situ accountability, transparency, and trust processes. However, we recognize that concurrent verbalization may have increased cognitive load or prompted more deliberate compensatory strategies than would occur in silent use. In the revised version, we will add a paragraph to the Methods section explaining the rationale for this protocol (drawing on established HCI and cognitive psychology literature) and expand the Limitations section to acknowledge reactivity as a potential influence on the observed behaviors. We will also note that no silent control conditions or formal post-hoc reactivity checks were included, as the study was designed as an initial qualitative exploration with a small sample prioritizing rich process data; this will be framed as a limitation and a direction for future comparative work.

    revision: yes

Circularity Check

0 steps flagged

No circularity: empirical qualitative study with direct observational grounding

full rationale

The paper reports findings from a think-aloud protocol involving 15 researchers performing literature exploration, synthesis, and ideation tasks with LLMs. All central claims (confident tone misrepresenting uncertainty, opaque provenance, fragile trust, and compensatory strategies) are presented as direct summaries of participant verbalizations and behaviors observed in the study sessions. No equations, fitted parameters, predictions, or first-principles derivations exist. No self-citations are invoked to justify uniqueness theorems or ansatzes that would reduce the findings to prior inputs. The study is self-contained against its own empirical data; the derivation chain consists solely of thematic analysis of recorded think-aloud sessions and does not loop back to its own assumptions by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard qualitative HCI assumptions without free parameters, new entities, or ad-hoc inventions; it applies established think-aloud methods to a new context.

axioms (1)
  • domain assumption Think-aloud protocols can reveal real-time decision-making processes and compensatory strategies in scholarly tasks.
    Invoked to interpret participant verbalizations as reflective of accountability, transparency, and trust judgments during AI-assisted work.

pith-pipeline@v0.9.0 · 5567 in / 1529 out tokens · 57242 ms · 2026-05-08T07:17:33.929374+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

79 extracted references · 22 canonical work pages · 2 internal anchors

  1. [1]

    Muhammad Naveed Akbar. 2025. Use of artificial intelligence tools by doctoral students: a mixed-methods explanatory-sequential investigation. Journal of Further and Higher Education (2025), 1–19

  2. [2]

    Abdulrahman M Al-Zahrani. 2024. The impact of generative AI tools on researchers and research: Implications for academia in higher education. Innovations in Education and Teaching International 61, 5 (2024), 1029–1043

  3. [3]

    Hikari Ando, Rosanna Cousins, and Carolyn Young. 2014. Achieving saturation in thematic analysis: Development and refinement of a codebook. Comprehensive Psychology 3 (2014), 03–CP

  4. [4]

    Wenceslao Arroyo-Machado, Jinghuan Ma, Tipeng Chen, Timothy P Johnson, Shaika Islam, Lesley Michalegko, and Eric Welch. 2025. Generative AI and academic scientists in US universities: Perception, experience, and adoption intentions. PloS one 20, 8 (2025), e0330416

  5. [5]

    Tita Alissa Bach, Magnhild Kaarstad, Elizabeth Solberg, and Aleksandar Babic. 2025. Insights into suggested Responsible AI (RAI) practices in real-world settings: a systematic literature review. AI and Ethics 5, 3 (2025), 3185–3232

  7. [7]

    Emily M Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. ACM, New York, NY, USA. doi:10.1145/3442188.3445922

  8. [8]

    Sophie Berretta, Alina Tausch, Greta Ontrup, Björn Gilles, Corinna Peifer, and Annette Kluge. 2023. Defining human-AI teaming the human-centered way: a scoping review and network analysis. Frontiers in Artificial Intelligence 6 (2023), 1250725

  9. [9]

    Marcel Binz, Stephan Alaniz, Adina Roskies, Balazs Aczel, Carl T Bergstrom, Colin Allen, Daniel Schad, Dirk Wulff, Jevin D West, Qiong Zhang, Richard M Shiffrin, Samuel J Gershman, Vencislav Popov, Emily M Bender, Marco Marelli, Matthew M Botvinick, Zeynep Akata, and Eric Schulz. 2025. How should the advancement of large language models affect the practic...

  10. [10]

    Wayne C Booth, Gregory G Colomb, and Joseph M Williams. 2009. The Craft of research, third edition. University of Chicago Press, Chicago, IL

  11. [11]

    Anna Carobene, Andrea Padoan, Federico Cabitza, Giuseppe Banfi, and Mario Plebani. 2024. Rising adoption of artificial intelligence in scientific publishing: evaluating the role, risks, and ethical implications in paper drafting and review process. Clinical Chemistry and Laboratory Medicine (CCLM) 62, 5 (2024), 835–843

  12. [12]

    Elizabeth Charters. 2003. The use of think-aloud methods in qualitative research: an introduction to think-aloud methods. Brock Education Journal 12, 2 (2003)

  13. [13]

    Jiaqi Chen, Yanzhe Zhang, Yutong Zhang, Yijia Shao, and Diyi Yang. 2025. Generative Interfaces for Language Models. arXiv preprint arXiv:2508.19227 (2025)

  14. [14]

    Qiguang Chen, Mingda Yang, Libo Qin, Jinhao Liu, Zheng Yan, Jiannan Guan, Dengyun Peng, Yiyan Ji, Hanjing Li, Mengkang Hu, et al. 2025. AI4Research: A Survey of Artificial Intelligence for Scientific Research. arXiv preprint arXiv:2507.01903 (2025)

  15. [15]

    Nicholas Clark, Hua Shen, Bill Howe, and Tanushree Mitra. 2025. Epistemic alignment: A mediating framework for user-LLM knowledge delivery. arXiv preprint arXiv:2504.01205 (2025)

  16. [16]

    A Feder Cooper, Emanuel Moss, Benjamin Laufer, and Helen Nissenbaum. 2022. Accountability in an algorithmic society: relationality, responsibility, and robustness in machine learning. In Proceedings of the 2022 ACM conference on fairness, accountability, and transparency. 864–876

  17. [17]

    Eric Corbett and Remi Denton. 2023. Interrogating the T in FAccT. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency. 1624–1634

  18. [18]

    Manuel Alejandro Cruz-Aguilar. 2025. The epistemic revolution of AI: reconfiguring the foundations of scientific knowledge. AI & SOCIETY (2025), 1–17

  19. [19]

    Advait Deshpande and Helen Sharp. 2022. Responsible AI Systems: Who are the Stakeholders? In Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society (AIES ’22). Association for Computing Machinery, New York, NY, USA, 227–236. doi:10.1145/3514094.3534187

  20. [20]

    David M Douglas. 2025. Researchers’ perceptions of automating scientific research. AI & SOCIETY 40, 5 (2025), 4131–4144

  21. [21]

    Mingming Fan, Serina Shi, and Khai N Truong. 2020. Practices and Challenges of Using Think-Aloud Protocols in Industry: An International Survey. Journal of Usability Studies 15, 2 (2020)

  22. [22]

    K J Kevin Feng, Kevin Pu, Matt Latzke, Tal August, Pao Siangliulue, Jonathan Bragg, Daniel S Weld, Amy X Zhang, and Joseph Chee Chang. 2026. Cocoa: Co-planning and co-execution with AI agents. arXiv [cs.HC] (18 Feb. 2026). arXiv:2412.10999 [cs.HC] doi:10.48550/arXiv.2412.10999

  23. [23]

    Andrea Ferrario and Michele Loi. 2022. How explainability contributes to trust in AI. In Proceedings of the 2022 ACM conference on fairness, accountability, and transparency. 1457–1466

  24. [24]

    Marsha E Fonteyn, Benjamin Kuipers, and Susan J Grobe. 1993. A description of think aloud method and protocol analysis. Qualitative health research 3, 4 (1993), 430–441

  25. [25]

    Ben Gansky and Sean McDonald. 2022. CounterFAccTual: How FAccT undermines its organizing principles. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1982–1992

  26. [26]

    Sanjana Gautam, Mohit Chandra, Ankolika De, Tatiana Chakravorti, Girik Malik, and Munmun De Choudhury. 2025. Towards Experience-Centered AI: A Framework for Integrating Lived Experience in Design and Development. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, Vol. 8. 1062–1077

  27. [27]

    José Mauro Granjeiro, Altair Antoninha Del Bel Cury, Jaime Aparecido Cury, Mike Bueno, Manoel Damião Sousa-Neto, and Carlos Estrela. 2025. The future of scientific writing: AI tools, benefits, and ethical implications. Brazilian Dental Journal 36 (2025), e25–6471

  28. [28]

    Jingjing Hu and Xuesong Andy Gao. 2017. Using think-aloud protocol in self-regulated reading research. Educational Research Review 22 (2017), 181–193

  29. [29]

    Paul Humphreys. 2020. Why automated science should be cautiously welcomed. In A Critical Reflection on Automated Science: Will Science Remain Human? Springer, 11–26

  30. [30]

    Maurice Jakesch, Zana Buçinca, Saleema Amershi, and Alexandra Olteanu. 2022. How different groups prioritize ethical values for responsible AI. In 2022 ACM Conference on Fairness Accountability and Transparency. ACM, New York, NY, USA, 310–323. doi:10.1145/3531146.3533097

  31. [31]

    Hyeonsu Kang, Joseph Chee Chang, Yongsung Kim, and Aniket Kittur. 2022. Threddy: An interactive system for personalized thread-based exploration and organization of scientific literature. In Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology. ACM, New York, NY, USA. doi:10.1145/3526113.3545660

  32. [32]

    Hyeonsu B Kang, Tongshuang Wu, Joseph Chee Chang, and Aniket Kittur. 2023. Synergi: A mixed-initiative system for scholarly synthesis and sensemaking. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST ’23, Article 43). ACM, New York, NY, USA, 1–19. doi:10.1145/3586183.3606759

  33. [33]

    Shivani Kapania, Ruiyi Wang, Toby Jia-Jun Li, Tianshi Li, and Hong Shen. 2025. ’I’m Categorizing LLM as a Productivity Tool’: Examining Ethics of LLM Use in HCI Research Practices. Proceedings of the ACM on Human-Computer Interaction 9, 2 (2025), 1–26

  34. [34]

    Mohamed Khalifa and Mona Albadawy. 2024. Using artificial intelligence in academic writing and research: An essential productivity tool. Computer Methods and Programs in Biomedicine Update 5 (2024), 100145

  35. [35]

    David Klahr and Herbert A Simon. 1999. Studies of scientific discovery: Complementary approaches and convergent findings. Psychological Bulletin 125, 5 (1999), 524

  36. [36]

    Anton Korinek. 2023. Language models and cognitive automation for economic research. Technical Report. National Bureau of Economic Research

  37. [37]

    Benjamin Laufer, Sameer Jain, A Feder Cooper, Jon Kleinberg, and Hoda Heidari. 2022. Four years of FAccT: A reflexive, mixed-methods analysis of research contributions, shortcomings, and future prospects. In Proceedings of the 2022 ACM conference on fairness, accountability, and transparency. 401–426

  39. [39]

    Weixin Liang, Yaohui Zhang, Zhengxuan Wu, Haley Lepp, Wenlong Ji, Xuandong Zhao, Hancheng Cao, Sheng Liu, Siyu He, Zhi Huang, et al. 2024. Mapping the increasing use of LLMs in scientific papers. arXiv preprint arXiv:2404.01268 (2024)

  40. [40]

    Q Vera Liao and S Shyam Sundar. 2022. Designing for Responsible Trust in AI Systems: A Communication Perspective. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22). Association for Computing Machinery, New York, NY, USA, 1257–1268. doi:10.1145/3531146.3533182

  41. [41]

    Zhehui Liao, Maria Antoniak, Inyoung Cheong, Evie Yu-Yen Cheng, Ai-Heng Lee, Kyle Lo, Joseph Chee Chang, and Amy X Zhang. 2024. LLMs as research tools: A large scale survey of researchers’ usage and perceptions. arXiv preprint arXiv:2411.05025 (2024)

  42. [42]

    Yiren Liu, Pranav Sharma, Mehul Oswal, Haijun Xia, and Yun Huang. 2025. PersonaFlow: Designing LLM-Simulated Expert Perspectives for Enhanced Research Ideation. In Proceedings of the 2025 ACM Designing Interactive Systems Conference. 506–534

  43. [43]

    Kyle Lo, Joseph Chee Chang, Andrew Head, Jonathan Bragg, Amy X Zhang, Cassidy Trier, Chloe Anastasiades, Tal August, Russell Authur, Danielle Bragg, Erin Bransom, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Yen-Sung Chen, Evie Yu-Yen Cheng, Yvonne Chou, Doug Downey, Rob Evans, Raymond Fok, Fangzhou Hu, Regan Huff, Dongyeop Kang, Rodney Kinney, ...

  44. [44]

    Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha. 2024. The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery. arXiv preprint arXiv:2408.06292 (2024)

  46. [46]

    Heljä Lundgrén-Laine and Sanna Salanterä. 2010. Think-aloud technique and protocol analysis in clinical decision-making research. Qualitative health research 20, 4 (2010), 565–575

  47. [47]

    Arianna Manzini, Geoff Keeling, Nahema Marchal, Kevin R McKee, Verena Rieser, and Iason Gabriel. 2024. Should users trust advanced AI assistants? Justified trust as a function of competence and alignment. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency. 1174–1186

  48. [48]

    Siddharth Mehrotra, Carolina Centeio Jorge, Catholijn M Jonker, and Myrthe L Tielman. 2024. Integrity-based explanations for fostering appropriate trust in AI agents. ACM Transactions on Interactive Intelligent Systems 14, 1 (2024), 1–36

  49. [49]

    Meredith Ringel Morris. 2023. Scientists’ Perspectives on the Potential for Generative AI in their Fields. arXiv preprint arXiv:2304.01420 (2023)

  50. [50]

    Kristoffer L Nielbo, Folgert Karsdorp, Melvin Wevers, Alie Lassche, Rebekah B Baglini, Mike Kestemont, and Nina Tahmasebi. 2024. Quantitative text analysis. Nature Reviews Methods Primers 4, 1 (2024), 25

  51. [51]

    Gabrielle O’Brien. 2025. How Scientists Use Large Language Models to Program. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–16

  52. [52]

    Adetoun A Oyelude. 2024. Artificial intelligence (AI) tools for academic research. Library Hi Tech News 41, 8 (2024), 18–20

  53. [53]

    Saumya Pareek, Eduardo Velloso, and Jorge Goncalves. 2024. Trust Development and Repair in AI-Assisted Decision-Making during Complementary Expertise. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency. 546–561

  54. [54]

    Anh Ngoc Quynh Phan and Chloe Le. 2025. AI as research partner: key implications of using AI for data visualisation in qualitative research. International Journal of Social Research Methodology (2025), 1–8

  55. [55]

    Robert Pinzolits. 2024. AI in academia: An overview of selected tools and their areas of application. MAP Education and Humanities 4 (2024), 37–50

  56. [56]

    Kevin Pu, KJ Kevin Feng, Tovi Grossman, Tom Hope, Bhavana Dalvi Mishra, Matt Latzke, Jonathan Bragg, Joseph Chee Chang, and Pao Siangliulue. 2025. IdeaSynth: Iterative research idea development through evolving and composing idea facets with literature-grounded feedback. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–31

  57. [57]

    Habeeb Ibrahim Abdul Razack, Sam T Mathew, Fathinul Fikri Ahmad Saad, and Saleh A Alqahtani. 2021. Artificial intelligence-assisted tools for redefining the communication landscape of the scholarly world. Science Editing 8, 2 (2021), 134–144

  58. [58]

    Anka Reuel, Patrick Connolly, Kiana Jafari Meimandi, Shekhar Tewari, Jakub Wiatrak, Dikshita Venkatesh, and Mykel Kochenderfer. 2025. Responsible AI in the global context: Maturity model and survey. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency. 2505–2541

  59. [59]

    Cynthia Rudin. 2019. Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. Nature machine intelligence 1, 5 (May 2019), 206–215. doi:10.1038/s42256-019-0048-x

  60. [60]

    Daniel Schiff, Bogdana Rakova, Aladdin Ayesh, Anat Fanti, and Michael Lennon. 2020. Principles to practices for responsible AI: closing the gap. arXiv preprint arXiv:2006.04707 (2020)

  62. [62]

    Samuel Schmidgall, Yusheng Su, Ze Wang, Ximeng Sun, Jialian Wu, Xiaodong Yu, Jiang Liu, Michael Moor, Zicheng Liu, and Emad Barsoum. 2025. Agent Laboratory: Using LLM agents as research assistants. arXiv [cs.HC] (17 June 2025). arXiv:2501.04227 [cs.HC]

  63. [63]

    Hope Schroeder, Marianne Aubin Le Quéré, Casey Randazzo, David Mimno, and Sarita Schoenebeck. 2024. Large language models in qualitative research: Can we do the data justice? arXiv preprint arXiv:2410.07362 (2024)

  64. [64]

    Leixian Shen, Enya Shen, Yuyu Luo, Xiaocong Yang, Xuming Hu, Xiongshuai Zhang, Zhiwei Tai, and Jianmin Wang. 2022. Towards natural language interfaces for data visualization: A survey. IEEE transactions on visualization and computer graphics 29, 6 (2022), 3121–3144

  65. [65]

    Yang Shi, Tian Gao, Xiaohan Jiao, and Nan Cao. 2023. Understanding design collaboration between designers and artificial intelligence: a systematic literature review. Proceedings of the ACM on Human-Computer Interaction 7, CSCW2 (2023), 1–35

  66. [66]

    Scott Spillias, Paris Tuohy, Matthew Andreotta, Ruby Annand-Jones, Fabio Boschetti, Christopher Cvitanovic, Joseph Duggan, Elisabeth A Fulton, Denis B Karcher, Cecile Paris, et al. 2024. Human-AI collaboration to identify literature for evidence synthesis. Cell Reports Sustainability 1, 7 (2024)

  67. [67]

    Chris Stokel-Walker. 2023. ChatGPT listed as author on research papers: many scientists disapprove. Nature 613, 7945 (Jan. 2023), 620–621. doi:10.1038/d41586-023-00107-z

  68. [68]

    Lu Sun, Stone Tao, Junjie Hu, and Steven P Dow. 2024. MetaWriter: Exploring the potential and perils of AI writing support in scientific peer review. Proceedings of the ACM on Human-Computer Interaction 8, CSCW1 (2024), 1–32

  69. [69]

    Cecilie Steenbuch Traberg, Jon Roozenbeek, and Sander van der Linden. 2026. AI is turning research into a scientific monoculture. Communications Psychology 4, 1 (2026), 37

  70. [70]

    Richard Van Noorden and Jeffrey M Perkel. 2023. AI and science: what 1,600 researchers think. Nature 621, 7980 (2023), 672–675

  71. [71]

    Maike Vollstedt and Sebastian Rezat. 2019. An introduction to grounded theory with a special focus on axial coding and the coding paradigm. Compendium for early career researchers in mathematics education 13, 1 (2019), 81–100

  72. [72]

    Kelly B Wagman, Matthew T Dearing, and Marshini Chetty. 2025. Generative AI Uses and Risks for Knowledge Workers in a Science Organization. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–17

  73. [73]

    Hanchen Wang, Tianfan Fu, Yuanqi Du, Wenhao Gao, Kexin Huang, Ziming Liu, Payal Chandak, Shengchao Liu, Peter Van Katwyk, Andreea Deac, et al. 2023. Scientific discovery in the age of artificial intelligence. Nature 620, 7972 (2023), 47–60

  74. [74]

    Hanchen Wang, Tianfan Fu, Yuanqi Du, Wenhao Gao, Kexin Huang, Ziming Liu, Payal Chandak, Shengchao Liu, Peter Van Katwyk, Andreea Deac, Anima Anandkumar, Karianne Bergen, Carla P Gomes, Shirley Ho, Pushmeet Kohli, Joan Lasenby, Jure Leskovec, Tie-Yan Liu, Arjun Manrai, Debora Marks, Bharath Ramsundar, Le Song, Jimeng Sun, Jian Tang, Petar Veličković, Max ...

  75. [75]

    Haomin Wen, Zhenjie Wei, Yan Lin, Jiyuan Wang, Yuxuan Liang, and Huaiyu Wan. 2024. OverleafCopilot: Empowering academic writing in Overleaf with large language models. arXiv preprint arXiv:2403.09733 (2024)

  76. [76]

    Yongjun Xu, Xin Liu, Xin Cao, Changping Huang, Enke Liu, Sen Qian, Xingchen Liu, Yanjun Wu, Fengliang Dong, Cheng-Wei Qiu, et al. 2021. Artificial intelligence: A powerful paradigm for scientific research. The Innovation 2, 4 (2021)

  77. [77]

    Yuchi Yahagi, Rintaro Chujo, Yuga Harada, Changyo Han, Kohei Sugiyama, and Takeshi Naemura. 2025. PaperWave: Listening to Research Papers as Conversational Podcasts Scripted by LLM. In Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems. 1–10

  78. [78]

    Lixiang Yan, Vanessa Echeverria, Gloria Milena Fernandez-Nieto, Yueqiao Jin, Zachari Swiecki, Linxuan Zhao, Dragan Gašević, and Roberto Martinez-Maldonado. 2024. Human-AI collaboration in thematic analysis using ChatGPT: A user study and design recommendations. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems. 1–7

  79. [79]

    Chengbo Zheng, Yuanhao Zhang, Zeyu Huang, Chuhan Shi, Minrui Xu, and Xiaojuan Ma. 2024. DiscipLink: Unfolding interdisciplinary information seeking process via human-AI co-exploration. In Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology. 1–20