pith. machine review for the scientific record.

arxiv: 2604.05151 · v1 · submitted 2026-04-06 · 💻 cs.CY

Recognition: 1 theorem link

· Lean Theorem

Context Collapse: Barriers to Adoption for Generative AI in Workplace Settings

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 18:53 UTC · model grok-4.3

classification 💻 cs.CY
keywords generative AI · workplace adoption · context awareness · human-AI interaction · context collapse · user strategies · AI deployment barriers

The pith

Generative AI tools in workplaces fail because distinct user contexts collapse into one another or rot over time, reducing the value of any context-accounting efforts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper uses expert interviews to show that generative AI systems pressed into workplace use do not meet expectations for handling context. Developers, users, and social scientists hold different ideas of what context means, and computational methods hit specific pitfalls as a result. Multiple contexts merge or degrade, which makes sustained efforts to track context less effective. Users respond with practical workarounds. The authors conclude that simply gathering more context data is not the solution and instead call for interactional practices that better embed these tools in actual use.

Core claim

Expert interviews reveal that generative AI tools fail to account for users' contexts in workplace settings. Distinct contexts collapse into one another or rot and degrade over time, lowering the utility of any computational effort to represent context. Tool developers, users, and social scientists conceptualize context differently, producing concrete pitfalls in current approaches. Users adopt specific strategies to work around these shortfalls. The paper ends by advocating a move from indiscriminate data collection toward interactional practices that embed GenAI systems more appropriately in their contexts of use.

What carries the argument

Context collapse and rot, the processes by which multiple distinct contexts merge into one another or degrade over time, form the mechanism that undermines computational attempts to account for context in generative AI tools.
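The paper's mechanism is qualitative, but the intuition can be sketched numerically. In the toy model below, the feature vectors, the monthly drift rate, and the use of cosine similarity as a proxy for "utility" are all illustrative assumptions, not anything the paper proposes:

```python
import math

def cos(a, b):
    # Cosine similarity: a toy proxy for how well a stored context
    # representation fits a context of use.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Two distinct workplace contexts, represented as hypothetical feature vectors.
legal_review = [1.0, 0.0]
marketing_copy = [0.0, 1.0]

# "Collapse": a single merged user profile averages the two contexts,
# so it fits neither as well as a dedicated representation would.
merged = [(a + b) / 2 for a, b in zip(legal_review, marketing_copy)]
collapse_fit = cos(merged, legal_review)  # ~0.71, down from 1.0

def drift(v, radians):
    # Rotate a 2-D vector: a stand-in for gradual change in the real context.
    x, y = v
    return [x * math.cos(radians) - y * math.sin(radians),
            x * math.sin(radians) + y * math.cos(radians)]

# "Rot": the real context drifts a little each month while the stored
# snapshot stays frozen, so the snapshot's fit decays monotonically.
snapshot = legal_review
current = legal_review
fits = []
for month in range(6):
    fits.append(cos(snapshot, current))
    current = drift(current, 0.15)  # assumed small monthly drift

print(round(collapse_fit, 2))           # 0.71
print([round(f, 2) for f in fits])      # [1.0, 0.99, 0.96, 0.9, 0.83, 0.73]
```

Under these assumptions, no amount of extra data collected at snapshot time prevents the decay; only re-engaging with the current context (the paper's "interactional practices") restores fit.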

If this is right

  • Any system that tries to represent workplace context through data collection alone will lose effectiveness as contexts collapse or rot.
  • Users will continue to improvise their own strategies to compensate for the tools' inability to handle shifting contexts.
  • Differing ideas of context among developers, users, and social scientists create predictable failures in how generative AI is designed and adopted.
  • Shifting design focus from data collection to ongoing interactional practices would better match how people actually use these tools.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Design teams could test lightweight, ongoing user feedback loops instead of one-time context data gathering to reduce collapse effects.
  • The same collapse pattern may appear in other AI tools that rely on static user profiles, such as recommendation or productivity systems.
  • Organizations adopting generative AI might benefit from explicit training on spotting when context has rotted and how to refresh it.

Load-bearing premise

The interviewed experts' experiences and strategies sufficiently represent those of typical users and developers and can be generalized without further checks.

What would settle it

A workplace deployment study that tracks whether context data collected for a generative AI tool stays relevant or shows clear degradation in usefulness after several months of use.

read the original abstract

As generative AI technologies are pressed into service in workplace settings, current approaches to account for the contexts in which such technologies are used fall short of users' expectations and needs. This paper empirically demonstrates, through expert interviews, both how these tools fail to account for users' context and how users deploy concrete strategies address such failures. The paper analyzes how context is variously conceptualized by tool developers, users, and social scientists to identify specific pitfalls inherent in computational approaches to context. Multiple distinct contexts tend to collapse into one another or rot, degrading over time, reducing the utility of any efforts to account for context. The paper concludes with a provocation to shift from an indiscriminate collection of context-relevant data toward a more interactional set of practices to embed GenAI systems more appropriately into users' contexts of use.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that generative AI tools deployed in workplace settings fail to account for users' contexts, resulting in 'context collapse' (where distinct contexts merge) or 'context rot' (degradation over time), which reduces the utility of context-aware features. Drawing on expert interviews, it demonstrates specific tool failures and user mitigation strategies, contrasts computational conceptualizations of context with those from users and social scientists, and concludes by advocating a shift from indiscriminate data collection to interactional practices for embedding GenAI systems more appropriately in use contexts.

Significance. If the empirical findings hold, the work makes a meaningful contribution to CSCW and HCI by providing a grounded critique of context modeling in generative AI, identifying collapse and rot as recurring barriers to adoption. The provocation toward interactional practices offers a constructive design-oriented alternative to purely technical solutions, with potential to inform more robust workplace AI deployments and future studies on context dynamics.

major comments (2)
  1. [Methods] Methods section: The manuscript provides no details on the expert interview sample (size, selection criteria, participant roles such as developers versus end-users, or demographics), interview protocol, or qualitative analysis approach. Since the central claims about context collapse, rotting, and user strategies rest entirely on interpretive analysis of these interviews, the absence of this information prevents assessment of generalizability and leaves the evidence strength unverifiable.
  2. [Findings] Findings/Discussion: The claim that 'multiple distinct contexts tend to collapse into one another or rot' is presented as an empirical demonstration, yet the text offers few concrete, quoted examples from the interviews showing how collapse or rot manifests in specific workplace scenarios or how the identified user strategies directly counteract it. This makes the load-bearing phenomenon somewhat underspecified relative to the abstract's description.
minor comments (1)
  1. [Abstract] The abstract contains a minor grammatical issue ('how users deploy concrete strategies address such failures' is missing 'to').

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights important areas for strengthening the transparency and specificity of our empirical claims. We address each major comment below and will revise the manuscript to incorporate additional details and examples.

read point-by-point responses
  1. Referee: [Methods] Methods section: The manuscript provides no details on the expert interview sample (size, selection criteria, participant roles such as developers versus end-users, or demographics), interview protocol, or qualitative analysis approach. Since the central claims about context collapse, rotting, and user strategies rest entirely on interpretive analysis of these interviews, the absence of this information prevents assessment of generalizability and leaves the evidence strength unverifiable.

    Authors: We agree that the current Methods section lacks sufficient detail for readers to fully assess the evidence base. In the revised manuscript, we will add a dedicated subsection describing the sample (including size, selection criteria such as expertise in AI deployment or workplace use, participant roles distinguishing developers from end-users, and relevant demographics), the semi-structured interview protocol, and the qualitative analysis approach (thematic analysis with iterative coding). This will directly address concerns about verifiability and generalizability. revision: yes

  2. Referee: [Findings] Findings/Discussion: The claim that 'multiple distinct contexts tend to collapse into one another or rot' is presented as an empirical demonstration, yet the text offers few concrete, quoted examples from the interviews showing how collapse or rot manifests in specific workplace scenarios or how the identified user strategies directly counteract it. This makes the load-bearing phenomenon somewhat underspecified relative to the abstract's description.

    Authors: We acknowledge that the presentation of the core phenomena would benefit from greater specificity. In the revised Findings section, we will include additional direct quotes and anonymized workplace scenarios drawn from the interviews to illustrate how context collapse and rot occur in practice, as well as how the documented user strategies mitigate these issues. This will make the empirical grounding more explicit while preserving participant confidentiality. revision: yes

Circularity Check

0 steps flagged

No significant circularity; qualitative empirical analysis is self-contained

full rationale

The paper advances its claims about context collapse through interpretive analysis of expert interviews, without any equations, derivations, fitted parameters, or self-referential definitions. Central premises rest on primary data collection rather than reducing to prior outputs or self-citations by construction. This matches the default expectation for non-circular empirical work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on qualitative social science assumptions about the validity of interview data for understanding real-world technology use barriers, with no free parameters or new entities introduced.

axioms (1)
  • domain assumption Expert interviews can reliably reveal users' strategies and conceptualizations of context in AI use.
    The paper's empirical demonstration relies on this to support claims about failures and strategies.

pith-pipeline@v0.9.0 · 5443 in / 1285 out tokens · 77434 ms · 2026-05-10T18:53:24.433347+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith reviews without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

137 extracted references · 25 canonical work pages · 4 internal anchors
