Pith · machine review for the scientific record

arxiv: 2604.28053 · v1 · submitted 2026-04-30 · 💻 cs.CY · cs.AI


To Build or Not to Build? Factors that Lead to Non-Development or Abandonment of AI Systems


Pith reviewed 2026-05-07 06:39 UTC · model grok-4.3

classification 💻 cs.CY cs.AI
keywords AI abandonment · responsible AI · non-development · AI ethics · scoping review · AI incident database · practitioner survey · development factors

The pith

Diverse practical factors, not just ethics, often lead organizations to abandon AI development.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Responsible AI research typically examines the use and impacts of AI systems after they are deployed. This paper shifts focus to the earlier decision of whether to build such systems at all. By reviewing academic and grey literature, analyzing real-world abandonment cases from an incident database, and surveying practitioners, the authors identify a range of factors that lead to non-development or abandonment. Their taxonomy shows that while ethical concerns play a role, practical issues such as resource constraints, regulatory hurdles, organizational dynamics, and development challenges are frequently central. This matters for designing more effective interventions that guide AI development responsibly from the start.

Core claim

The paper establishes that decisions to abandon AI development occur throughout the lifecycle and are driven by six categories of factors: ethical concerns, stakeholder feedback, development lifecycle challenges, organizational dynamics, resource constraints, and legal/regulatory concerns. Empirical data from cases and surveys indicate that non-ethics-related levers often motivate these decisions, contrasting with the emphasis in responsible AI communities on ethical risks. Synthesizing this, the work points to gaps in research and opportunities to better support appropriate engagement or disengagement with AI projects.

What carries the argument

Taxonomy of six categories of factors contributing to AI abandonment, derived from thematic analysis and validated with case data.
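The six categories are concrete enough to sketch as a case-tagging scheme. A minimal illustration follows: the category names come from the paper, but the case records, the `tally_factors` helper, and the pre/post-deployment split shown here are hypothetical, not the authors' tooling.

```python
from enum import Enum

# The six factor categories from the paper's taxonomy (names from the source).
class AbandonmentFactor(Enum):
    ETHICAL_CONCERNS = "ethical concerns"
    STAKEHOLDER_FEEDBACK = "stakeholder feedback"
    LIFECYCLE_CHALLENGES = "development lifecycle challenges"
    ORGANIZATIONAL_DYNAMICS = "organizational dynamics"
    RESOURCE_CONSTRAINTS = "resource constraints"
    LEGAL_REGULATORY = "legal/regulatory concerns"

def tally_factors(cases):
    """Count how often each factor category is tagged across case records."""
    counts = {factor: 0 for factor in AbandonmentFactor}
    for case in cases:
        for factor in case["factors"]:
            counts[factor] += 1
    return counts

# Hypothetical case records, tagged by lifecycle stage as the paper's
# pre- vs post-deployment comparison would require.
cases = [
    {"stage": "pre-deployment",
     "factors": [AbandonmentFactor.RESOURCE_CONSTRAINTS,
                 AbandonmentFactor.ORGANIZATIONAL_DYNAMICS]},
    {"stage": "post-deployment",
     "factors": [AbandonmentFactor.ETHICAL_CONCERNS,
                 AbandonmentFactor.STAKEHOLDER_FEEDBACK]},
]
counts = tally_factors(cases)
# The paper's central contrast: how much of the tally is non-ethics-related.
non_ethics = sum(n for f, n in counts.items()
                 if f is not AbandonmentFactor.ETHICAL_CONCERNS)
```

On these two invented cases, three of the four factor tags fall outside the ethical-concerns category, mirroring the kind of distributional comparison the paper draws from its incident-database and survey data.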

If this is right

  • Responsible AI research should broaden its scope to include diverse pre-deployment factors beyond ethics.
  • Early lifecycle interventions can influence which AI systems are built.
  • Organizations can apply the taxonomy to assess projects and decide on continuation or abandonment.
  • Better understanding of these factors can improve AI governance and reduce wasted resources on doomed projects.
  • Support for disengagement decisions can lead to more responsible innovation overall.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The taxonomy might be useful for similar decisions in other tech fields like biotechnology or autonomous vehicles.
  • Future work could develop decision-support tools based on these categories for AI teams.
  • If non-ethical factors dominate, policies focused only on ethics might not address the main barriers to responsible AI.
  • This highlights the need for more representative data collection on AI project decisions across different sectors and sizes of organizations.

Load-bearing premise

The sources used, including the scoping review, AI incident database cases, and practitioner survey, together give a representative picture of the actual factors driving real-world AI non-development and abandonment.

What would settle it

A broad, unbiased survey or audit of AI development projects across multiple industries and company sizes that finds ethical concerns to be the dominant reason for abandonment in the majority of cases.

Figures

Figures reproduced from arXiv: 2604.28053 by Jatinder Singh, Shreya Chappidi.

Figure 1. Overall methodology, including the scoping-literature workflow, used to establish and analyze categories of factors.
Original abstract

Responsible AI research typically focuses on examining the use and impacts of deployed AI systems. Yet, there is currently limited visibility into the pre-deployment decisions to pursue building such systems in the first place. Decisions taken in the earlier stages of development shape which systems are ultimately released, and therefore represent potential, but underexplored, points for intervention. As such, this paper investigates factors influencing AI non-development and abandonment throughout the development lifecycle. Specifically, we first perform a scoping review of academic literature, civil society resources, and grey literature including journalism and industry reports. Through thematic analysis of these sources, we develop a taxonomy of six categories of factors contributing to AI abandonment: ethical concerns, stakeholder feedback, development lifecycle challenges, organizational dynamics, resource constraints, and legal/regulatory concerns. Then, we collect data on real-world cases of AI system abandonment via an AI incident database and a practitioner survey to evidence and compare factors that drive abandonment both prior to and following system deployment. While academic responsible AI communities often emphasize ethical risks as reasons to not develop AI, our empirical analysis of these cases demonstrates the diverse, and often non-ethics-related, levers that motivate organizations to abandon AI development. Synthesizing evidence from our taxonomy and related case study analyses, we identify gaps and opportunities in current responsible AI research to (1) engage with the diverse range of levers that influence organizations to abandon AI development, and (2) better support appropriate (dis)engagement with AI system development.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper performs a scoping review of academic literature, civil society resources, and grey literature on AI non-development and abandonment, followed by thematic analysis to derive a taxonomy of six factor categories (ethical concerns, stakeholder feedback, development lifecycle challenges, organizational dynamics, resource constraints, and legal/regulatory concerns). It then analyzes real-world cases from an AI incident database and collects data via a practitioner survey to compare factors influencing abandonment before versus after deployment. The central claim is that responsible AI research over-emphasizes ethical risks, whereas empirical evidence from these sources shows diverse and often non-ethics-related levers drive organizations to abandon AI development; the paper concludes by identifying gaps and opportunities for responsible AI research to better engage with these levers and support appropriate disengagement.

Significance. If the empirical claims hold after addressing sampling and methodological details, this work would be significant for broadening responsible AI scholarship beyond post-deployment ethics and impacts to include pre-deployment decision points. The mixed-methods design—scoping review with thematic analysis yielding a new six-category taxonomy, supplemented by incident database cases and practitioner survey data—provides a concrete foundation for understanding a wider set of organizational levers. This synthesis of literature with primary case and survey evidence is a strength, as is the explicit identification of research gaps around supporting (dis)engagement decisions. The result could inform more balanced interventions in AI governance and practice.

major comments (3)
  1. [Scoping review and thematic analysis] The manuscript provides no quantitative details on the number of sources screened or included, the search strategy, inclusion/exclusion criteria, or the process used to validate the six themes and taxonomy. These omissions are load-bearing because the taxonomy underpins the subsequent empirical comparison and the claim that non-ethics factors are diverse and prominent.
  2. [AI incident database analysis (case collection)] The selection criteria for cases, the total number analyzed, and any discussion of selection bias are not provided. Incident databases are known to over-represent deployed systems with visible harms or failures and to under-represent pre-deployment internal abandonments driven by resource or organizational factors; without addressing this, the reported distribution of factors and the contrast with ethics-focused literature cannot be taken as representative.
  3. [Practitioner survey] No information is given on sample size, recruitment method, response rate, or respondent demographics. This is critical for evaluating self-selection and social-desirability biases, which directly affect the reliability of the survey evidence used to support the central claim that non-ethics levers often dominate abandonment decisions.
minor comments (3)
  1. [Taxonomy] The taxonomy would be clearer if presented in a dedicated table or figure that includes brief definitions and one illustrative example per category.
  2. [Abstract and introduction] The abstract and introduction should explicitly state the number of sources reviewed, database cases analyzed, and survey respondents to allow readers to assess scale immediately.
  3. [Discussion or limitations] A short limitations subsection discussing potential biases in the three data sources (scoping review, incident database, survey) would strengthen the manuscript even if the main claims are retained.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments, which identify important gaps in methodological transparency and potential biases. These points strengthen the paper by ensuring readers can properly evaluate the evidence. We address each major comment below and indicate planned revisions to the manuscript.

Point-by-point responses
  1. Referee: [Scoping review and thematic analysis] The manuscript provides no quantitative details on the number of sources screened or included, the search strategy, inclusion/exclusion criteria, or the process used to validate the six themes and taxonomy. These omissions are load-bearing because the taxonomy underpins the subsequent empirical comparison and the claim that non-ethics factors are diverse and prominent.

    Authors: We agree that the scoping review and thematic analysis section requires substantially more methodological detail to support the taxonomy. In the revised manuscript we will expand this section to report: the search strategy (databases, keywords, date range, and grey literature sources), the number of records identified, screened, and included (with a PRISMA-style flow diagram), explicit inclusion/exclusion criteria, and the thematic analysis procedure (inductive coding steps, how the six categories were iteratively refined, and validation methods such as author discussion or pilot coding). These additions will make the derivation of the taxonomy fully transparent and allow readers to assess its robustness. revision: yes

  2. Referee: [AI incident database analysis (case collection)] The selection criteria for cases, the total number analyzed, and any discussion of selection bias are not provided. Incident databases are known to over-represent deployed systems with visible harms or failures and to under-represent pre-deployment internal abandonments driven by resource or organizational factors; without addressing this, the reported distribution of factors and the contrast with ethics-focused literature cannot be taken as representative.

    Authors: We acknowledge the omission of case-selection details and the need to address selection bias. In the revision we will add a dedicated subsection specifying the database used, the total number of incidents reviewed, the exact selection criteria applied, and the final number of cases analyzed. We will also include an explicit limitations paragraph discussing the known biases of incident databases (over-representation of high-visibility post-deployment failures) and how this may under-sample internal pre-deployment decisions driven by resources or organizational dynamics. We will qualify the generalizability of the database findings, note that the practitioner survey provides complementary evidence less subject to this bias, and adjust the strength of claims accordingly while retaining the multi-method contribution. revision: yes

  3. Referee: [Practitioner survey] No information is given on sample size, recruitment method, response rate, or respondent demographics. This is critical for evaluating self-selection and social-desirability biases, which directly affect the reliability of the survey evidence used to support the central claim that non-ethics levers often dominate abandonment decisions.

    Authors: We agree that survey methodology details are essential for evaluating bias and reliability. In the revised manuscript we will report the sample size, recruitment channels (e.g., professional networks, AI practitioner communities, conferences), response rate, and respondent demographics (roles, experience, organization type and size). We will also add a discussion of self-selection bias (respondents may skew toward those interested in responsible AI) and social-desirability bias, describe mitigation steps (anonymous format, neutral wording), and explain how these factors are considered when interpreting the finding that non-ethics levers are prominent. This will allow readers to assess the survey evidence appropriately. revision: yes

Circularity Check

0 steps flagged

No circularity in empirical synthesis and taxonomy development

Full rationale

The paper's derivation proceeds from an external scoping review of literature/reports (used to thematically derive a six-category taxonomy) to independent collection of new case data from an AI incident database and a practitioner survey (used to evidence and compare factors). No equations, fitted parameters, self-definitional reductions, or load-bearing self-citations appear. The claim that empirical cases show diverse non-ethics levers rests on synthesis of external sources plus newly gathered data rather than any input being renamed or forced as output by construction. This is a standard qualitative empirical structure with no reduction to the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper introduces no new mathematical parameters or postulated entities. It rests on standard qualitative-research assumptions about source representativeness and thematic validity.

axioms (2)
  • domain assumption The selected academic, civil society, and grey literature sources collectively capture the main factors influencing AI development decisions.
    The scoping review treats these sources as sufficient to generate a comprehensive taxonomy.
  • domain assumption The AI incident database entries and practitioner survey responses accurately reflect real organizational reasons for abandonment.
    The empirical comparison depends on these data sources being representative and unbiased.

pith-pipeline@v0.9.0 · 5571 in / 1465 out tokens · 77621 ms · 2026-05-07T06:39:16.766436+00:00 · methodology


Reference graph

Works this paper leans on

178 extracted references · 141 canonical work pages · 1 internal anchor

  1. [1]

    Mohamed Abdalla. 2025. $100,000 or the Robot Gets It! Tech Workers’ Resistance Guide: Tech Worker Actions, History, Risks, Impacts, and the Case for a Radical Flank.Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society8, 1 (Oct. 2025), 2–14. doi:10.1609/aies.v8i1.36526

  2. [2]

    Jiang, Cella Sum, Maarten Sap, and Sauvik Das

    William Agnew, Harry H. Jiang, Cella Sum, Maarten Sap, and Sauvik Das. 2024. Data Defenses Against Large Language Models. (2024). doi:10.48550/ARXIV.2410.13138 Publisher: arXiv Version Number: 1

  3. [3]

    Zo Ahmed. 2024. AI coding assistants do not boost productivity or prevent burnout, study finds. https://www.techspot.com/news/ 104945-ai-coding-assistants-do-not-boost-productivity-or.html

  4. [4]

    AIAAIC. 2025. AI, Algorithmic and Automation Incidents and Controversies (AIAAIC). https://www.aiaaic.org/

  5. [5]

    Leah Hope Ajmani, Nuredin Ali Abdelkadir, and Stevie Chancellor. 2025. Secondary Stakeholders in AI: Fighting for, Brokering, and Navigating Agency. InProceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’25). Association for Computing Machinery, New York, NY, USA, 1095–1107. doi:10.1145/3715275.3732071

  6. [6]

    Omiye, Ilies Ghanzouri, John Hanson Cabot, and Elsie Gyang Ross

    Saeed Amal, Lida Safarnejad, Jesutofunmi A. Omiye, Ilies Ghanzouri, John Hanson Cabot, and Elsie Gyang Ross. 2022. Use of Multi- Modal Data and Machine Learning to Improve Cardiovascular Disease Care.Frontiers in Cardiovascular Medicine9 (April 2022), 840262. doi:10.3389/fcvm.2022.840262

  7. [7]

    Archer Amon, Zhipeng Yin, Zichong Wang, Avash Palikhe, Tongjia Yu, and Wenbin Zhang. 2026. Uncertain Boundaries: Multidisciplinary Approaches to Copyright Issues in Generative AI.SIGKDD Explor. Newsl.27, 2 (Dec. 2026), 1–12. doi:10.1145/3787470.3787472

  8. [8]

    Markus Anderljung, Joslyn Barnhart, Anton Korinek, Jade Leung, Cullen O’Keefe, Jess Whittlestone, Shahar Avin, Miles Brundage, Justin Bullock, Duncan Cass-Beggs, Ben Chang, Tantum Collins, Tim Fist, Gillian Hadfield, Alan Hayes, Lewis Ho, Sara Hooker, Eric Horvitz, Noam Kolt, Jonas Schuett, Yonadav Shavit, Divya Siddarth, Robert Trager, and Kevin Wolf. 20...

  9. [9]

    Dani Anguiano and Lois Beckett. 2023. How Hollywood writers triumphed over AI – and why it matters.The Guardian(Oct. 2023). https://www.theguardian.com/culture/2023/oct/01/hollywood-writers-strike-artificial-intelligence

  10. [10]

    Guido Appenzeller, Matt Bornstein, and Martin Casado. 2023. Navigating the High Cost of AI Compute. https://a16z.com/navigating- the-high-cost-of-ai-compute/

  11. [11]

    Zahra Ashktorab, Michael Desmond, Qian Pan, James Johnson, Michelle Brachman, Casey Dugan, Marina Danilevsky, and Werner Geyer. 2025. Emerging Reliance Behaviors in Human-AI Content Grounded Data Generation: The Role of Cognitive Forcing Functions and Hallucinations.Proceedings of the 4th Annual Symposium on Human-Computer Interaction for Work(June 2025),...

  12. [12]

    Rob Ashmore, Radu Calinescu, and Colin Paterson. 2022. Assuring the Machine Learning Lifecycle: Desiderata, Methods, and Challenges. Comput. Surveys54, 5 (June 2022), 1–39. doi:10.1145/3453444

  13. [13]

    Deepika Badampudi, Claes Wohlin, and Kai Petersen. 2016. Software component decision-making: In-house, OSS, COTS or outsourcing - A systematic literature review.Journal of Systems and Software121 (Nov. 2016), 105–124. doi:10.1016/j.jss.2016.07.027

  14. [14]

    Baumer and M

    Eric P.S. Baumer and M. Six Silberman. 2011. When the implication is not to design (technology). InProceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’11). Association for Computing Machinery, New York, NY, USA, 2271–2274. doi:10.1145/1978942.1979275 Factors that Lead to Non-Development or Abandonment of AI Systems FAccT ’26, ...

  15. [15]

    Andrew Bell, Oded Nov, and Julia Stoyanovich. 2023. Think about the stakeholders first! Toward an algorithmic transparency playbook for regulatory compliance.Data & Policy5 (Jan. 2023), e12. doi:10.1017/dap.2023.8

  16. [16]

    Digital medicine and the curse of dimensionality,

    Visar Berisha, Chelsea Krantsevich, P. Richard Hahn, Shira Hahn, Gautam Dasarathy, Pavan Turaga, and Julie Liss. 2021. Digital medicine and the curse of dimensionality.npj Digital Medicine4, 1 (Oct. 2021), 153. doi:10.1038/s41746-021-00521-5

  17. [17]

    Castro, Ryutaro Tanno, Anton Schwaighofer, Kerem C

    Mélanie Bernhardt, Daniel C. Castro, Ryutaro Tanno, Anton Schwaighofer, Kerem C. Tezcan, Miguel Monteiro, Shruthi Bannur, Matthew P. Lungren, Aditya Nori, Ben Glocker, Javier Alvarez-Valle, and Ozan Oktay. 2022. Active label cleaning for improved dataset quality under resource constraints.Nature Communications13, 1 (March 2022), 1161. doi:10.1038/s41467-0...

  18. [18]

    Umang Bhatt and Holli Sargeant. 2024. When Should Algorithms Resign? A Proposal for AI Governance.Computer57, 10 (Oct. 2024), 99–103. doi:10.1109/MC.2024.3431328

  19. [19]

    Scheirer

    Stella Biderman and Walter J. Scheirer. 2021. Pitfalls in Machine Learning Research: Reexamining the Development Cycle. doi:10. 48550/arXiv.2011.02832 arXiv:2011.02832 [cs]

  20. [20]

    Emily Black, Rakshit Naidu, Rayid Ghani, Kit Rodolfa, Daniel Ho, and Hoda Heidari. 2023. Toward Operationalizing Pipeline-aware ML Fairness: A Research Agenda for Developing Practical Guidelines and Tools. InEquity and Access in Algorithms, Mechanisms, and Optimization. ACM, Boston MA USA, 1–11. doi:10.1145/3617694.3623259

  21. [21]

    William Boag, Harini Suresh, Bianca Lepe, and Catherine D’Ignazio. 2022. Tech Worker Organizing for Power and Accountability. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22). Association for Computing Machinery, New York, NY, USA, 452–463. doi:10.1145/3531146.3533111

  22. [22]

    Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology.Qualitative Research in Psychology3, 2 (Jan. 2006), 77–101. doi:10.1191/1478088706qp063oa Publisher: Routledge _eprint: https://doi.org/10.1191/1478088706qp063oa

  23. [23]

    2025.Deep Dive: Economics of the AI Build-Out

    Claire Burch. 2025.Deep Dive: Economics of the AI Build-Out. Technical Report. Contrary Research. https://research.contrary.com/ report/the-economics-of-ai-build-out

  24. [24]

    Matt Burgess. 2024. How to Stop Your Data From Being Used to Train AI.Wired(Oct. 2024). https://www.wired.com/story/how-to- stop-your-data-from-being-used-to-train-ai/ Section: tags

  25. [25]

    Garance Burke and Hilke Schellmann. 2024. Researchers say an AI-powered transcription tool used in hospitals invents things no one ever said. https://apnews.com/article/ai-artificial-intelligence-health-business-90020cdf5fa16c79ca2e5b6c4c9bbb14 Section: Technology

  26. [26]

    Center for Labor and a Just Economy. 2025. Bargaining Victory: SEIU Local 688 won strong protections for workers and the public in AI agreement with Pennsylvania Governor Shapiro. https://clje.law.harvard.edu/news/bargaining-victory-seiu-local-688-won-strong- protections-for-workers-and-the-public-in-ai-agreement-with-pennsylvania-governor-shapiro/

  27. [27]

    Belue, Stephanie A

    Shreya Chappidi, Mason J. Belue, Stephanie A. Harmon, Sarisha Jagasia, Ying Zhuge, Erdal Tasci, Baris Turkbey, Jatinder Singh, Kevin Camphausen, and Andra V. Krauze. 2025. From manual clinical criteria to machine learning algorithms: Comparing outcome endpoints derived from diverse electronic health record data modalities.PLOS Digital Health4, 5 (May 2025...

  28. [28]

    Shreya Chappidi, Jennifer Cobbe, Chris Norval, Anjali Mazumder, and Jatinder Singh. 2025. Accountability Capture: How Record-Keeping to Support AI Transparency and Accountability (Re)shapes Algorithmic Oversight. doi:10.48550/arXiv.2510.04609 arXiv:2510.04609 [cs]

  29. [29]

    Hyesun Choung, Prabu David, and Arun Ross. 2023. Trust in AI and Its Role in the Acceptance of AI Technologies.International Journal of Human–Computer Interaction39, 9 (May 2023), 1727–1739. doi:10.1080/10447318.2022.2050543

  30. [30]

    Jennifer Cobbe, Michelle Seng Ah Lee, and Jatinder Singh. 2021. Reviewable Automated Decision-Making: A Framework for Accountable Algorithmic Systems. InProceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. ACM, Virtual Event Canada, 598–609. doi:10.1145/3442188.3445921

  31. [31]

    Jennifer Cobbe and Jatinder Singh. 2021. Artificial intelligence as a service: Legal responsibilities, liabilities, and policy challenges. Computer Law & Security Review42 (2021), 105573. doi:10.1016/j.clsr.2021.105573

  32. [32]

    Jennifer Cobbe, Michael Veale, and Jatinder Singh. 2023. Understanding accountability in algorithmic supply chains. In2023 ACM Conference on Fairness, Accountability, and Transparency. ACM, Chicago IL USA, 1186–1197. doi:10.1145/3593013.3594073

  33. [33]

    Eric Corbett, Remi Denton, and Sheena Erete. 2023. Power and Public Participation in AI. InProceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO ’23). Association for Computing Machinery, New York, NY, USA, 1–13. doi:10.1145/3617694.3623228

  34. [34]

    Nicholas Kluge Corrêa, Camila Galvão, James William Santos, Carolina Del Pino, Edson Pontes Pinto, Camila Barbosa, Diogo Massmann, Rodrigo Mambrini, Luiza Galvão, Edmund Terem, and Nythamar De Oliveira. 2023. Worldwide AI ethics: A review of 200 guidelines and recommendations for AI governance.Patterns4, 10 (Oct. 2023), 100857. doi:10.1016/j.patter.2023.100857

  35. [35]

    Cross, Michael A

    James L. Cross, Michael A. Choma, and John A. Onofrey. 2024. Bias in medical AI: Implications for clinical decision-making.PLOS Digital Health3, 11 (Nov. 2024), e0000651. doi:10.1371/journal.pdig.0000651 FAccT ’26, June 25–28, 2026, Montreal, QC, Canada Chappidi & Singh

  36. [36]

    Data Center Watch. 2025. $64 billion of data center projects have been blocked or delayed amid local opposition. https://www. datacenterwatch.org/report

  37. [37]

    Fred D. Davis. 1989. Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology.MIS Quarterly13, 3 (1989), 319–340. doi:10.2307/249008 Publisher: Management Information Systems Research Center, University of Minnesota

  38. [38]

    Matthew DeCamp and Charlotta Lindvall. 2020. Latent bias and the implementation of artificial intelligence in medicine.Journal of the American Medical Informatics Association27, 12 (Dec. 2020), 2020–2023. doi:10.1093/jamia/ocaa094

  39. [39]

    Innovation & Technology Department for Science and Media & Sport Department for Digital, Culture. 2021. Quantifying the UK Data Skills Gap - Full report. https://www.gov.uk/government/publications/quantifying-the-uk-data-skills-gap/quantifying-the-uk-data- skills-gap-full-report

  40. [40]

    Alicia DeVrio, Motahhare Eslami, and Kenneth Holstein. 2024. Building, Shifting, & Employing Power: A Taxonomy of Responses From Below to Algorithmic Harm. InProceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’24). Association for Computing Machinery, New York, NY, USA, 1093–1106. doi:10.1145/3630106.3658958

  41. [41]

    Smith, Nicole DeCario, and Will Buchanan

    Jesse Dodge, Taylor Prewitt, Remi Tachet des Combes, Erika Odmark, Roy Schwartz, Emma Strubell, Alexandra Sasha Luccioni, Noah A. Smith, Nicole DeCario, and Will Buchanan. 2022. Measuring the Carbon Intensity of AI in Cloud Instances. InProceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22). Association for Comput...

  42. [42]

    Upol Ehsan, Ranjit Singh, Jacob Metcalf, and Mark Riedl. 2022. The Algorithmic Imprint. In2022 ACM Conference on Fairness Accountability and Transparency. ACM, Seoul Republic of Korea, 1305–1317. doi:10.1145/3531146.3533186

  43. [43]

    Maria Eriksson, Erasmo Purificato, Arman Noroozian, Joao Vinagre, Guillaume Chaslot, Emilia Gomez, and David Fernandez-Llorca

  44. [44]

    Can we trust AI benchmarks? An interdisciplinary review of current issues in AI evaluation.arXiv preprint arXiv:2502.06559, 2025

    Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation. doi:10.48550/arXiv.2502.06559 arXiv:2502.06559 [cs]

  45. [45]

    et al.: Dermatologist- level classification of skin cancer with deep neural networks

    Andre Esteva, Brett Kuprel, Roberto A. Novoa, Justin Ko, Susan M. Swetter, Helen M. Blau, and Sebastian Thrun. 2017. Dermatologist- level classification of skin cancer with deep neural networks.Nature542, 7639 (Feb. 2017), 115–118. doi:10.1038/nature21056

  46. [46]

    Sheryl Estrada. 2025. MIT report: 95% of generative AI pilots at companies are failing. https://fortune.com/2025/08/18/mit-report-95- percent-generative-ai-pilots-at-companies-failing-cfo/

  47. [47]

    Aidatul Fitriyah and Daryna Dzemish Abdulovna. 2024. EU’s AI Regulation Approaches and Their Implication for Human Rights. Media Iuris7, 3 (Oct. 2024), 417–438. doi:10.20473/mi.v7i3.62050

  48. [48]

    Flores, Alejandro Schuler, Anne Verena Eberhard, Jeffrey W

    Alyssa M. Flores, Alejandro Schuler, Anne Verena Eberhard, Jeffrey W. Olin, John P. Cooke, Nicholas J. Leeper, Nigam H. Shah, and Elsie G. Ross. 2021. Unsupervised Learning for Automated Detection of Coronary Artery Disease Subgroups.Journal of the American Heart Association10, 23 (Dec. 2021), e021976. doi:10.1161/JAHA.121.021976

  49. [49]

    Luciano Floridi and Josh Cowls. 2019. A Unified Framework of Five Principles for AI in Society.Harvard Data Science Review1, 1 (July 2019). doi:10.1162/99608f92.8cd550d1 Publisher: The MIT Press

  50. [50]

    James Fodor. 2025. Line Goes Up? Inherent Limitations of Benchmarks for Evaluating Large Language Models. doi:10.48550/arXiv.2502. 14318 arXiv:2502.14318 [cs]

  51. [51]

    Riccardo Fogliato, Alexandra Chouldechova, and Max G’Sell. 2020. Fairness Evaluation in Presence of Biased Noisy Labels. InProceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics. PMLR, 2325–2336. https://proceedings.mlr.press/ v108/fogliato20a.html ISSN: 2640-3498

  52. [52]

    Freeman and James L

    Richard B. Freeman and James L. Medoff. 1979. The Two Faces of Unionism. https://papers.ssrn.com/abstract=261218

  53. [53]

    Would You Want an AI Tutor?

    Caterina Fuligni, Daniel Domínguez Figaredo, and Julia Stoyanovich. 2025. "Would You Want an AI Tutor?"Understanding Stakeholder Perceptions of LLM-based Systems in the Classroom. https://www.semanticscholar.org/paper/ 469a07c98826b0d937ae846b4e1c3ca0f1d0d84f

  54. [54]

    Vahid Garousi, Michael Felderer, and Mika V. Mäntylä. 2019. Guidelines for including grey literature and conducting multivocal literature reviews in software engineering.Information and Software Technology106 (Feb. 2019), 101–121. doi:10.1016/j.infsof.2018.09.006

  55. [55]

    Mäntylä, and Aurona Rainer

    Vahid Garousi, Michael Felderer, Mika V. Mäntylä, and Austen Rainer. 2020. Benefitting from the Grey Literature in Software Engineering Research. InContemporary Empirical Methods in Software Engineering, Michael Felderer and Guilherme Horta Travassos (Eds.). Springer International Publishing, Cham, 385–413. doi:10.1007/978-3-030-32489-6_14

  56. [56]

    Gartner. 2024. Gartner Predicts 30% of Generative AI Projects Will Be Abandoned After Proof of Concept By End of

  57. [57]

    https://www.gartner.com/en/newsroom/press-releases/2024-07-29-gartner-predicts-30-percent-of-generative-ai-projects- will-be-abandoned-after-proof-of-concept-by-end-of-2025

  58. [58]

    Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. Datasheets for Datasets. http://arxiv.org/abs/1803.09010 doi:10.48550/arXiv.1803.09010 arXiv:1803.09010 [cs]

  60. [60]

    Apoorva Gondimalla, Varshinee Sreekanth, Govind Joshi, Whitney Nelson, Eunsol Choi, Stephen C. Slota, Sherri R. Greenberg, Kenneth R. Fleischmann, and Min Kyung Lee. 2024. Aligning Data with the Goals of an Organization and Its Workers: Designing Data Labeling for Social Service Case Notes. In Proceedings of the CHI Conference on Human Factors in Computing...

  61. [61]

    Mary L. Gray and Siddharth Suri. 2019. Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass. https://www.semanticscholar.org/paper/Ghost-Work%3A-How-to-Stop-Silicon-Valley-from-a-New-Gray-Suri/47f936331872d75df74473883bb65068c14fa7da

  62. [62]

    Daniel Greene, Anna Lauren Hoffmann, and Luke Stark. 2019. Better, Nicer, Clearer, Fairer: A Critical Assessment of the Movement for Ethical Artificial Intelligence and Machine Learning. In Proceedings of the 52nd Hawaii International Conference on System Sciences (Critical and Ethical Studies of Digital and Social Media). doi:10.24251/HICSS.2019.258

  63. [63]

    Trisha Greenhalgh, Joseph Wherton, Chrysanthi Papoutsi, Jennifer Lynch, Gemma Hughes, Christine A’Court, Susan Hinder, Nick Fahy, Rob Procter, and Sara Shaw. 2017. Beyond Adoption: A New Framework for Theorizing and Evaluating Nonadoption, Abandonment, and Challenges to the Scale-Up, Spread, and Sustainability of Health and Care Technologies. Journal of Me...

  64. [64]

    Luke Guerdan, Amanda Coston, Zhiwei Steven Wu, and Kenneth Holstein. 2023. Ground(less) Truth: A Causal Framework for Proxy Labels in Human-Algorithm Decision-Making. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 688–704. doi:10.1145/3593013.3594036

  65. [65]

    Luke Guerdan, Devansh Saxena, Stevie Chancellor, Zhiwei Steven Wu, and Kenneth Holstein. 2025. Measurement as Bricolage: Examining How Data Scientists Construct Target Variables for Predictive Modeling Tasks. doi:10.48550/arXiv.2507.02819 arXiv:2507.02819

  66. [66]

    Hangzhi Guo, Pranav Narayanan Venkit, Eunchae Jang, Mukund Srinath, Wenbo Zhang, Bonam Mingole, Vipul Gupta, Kush R. Varshney, S. Shyam Sundar, and Amulya Yadav. 2024. Hey GPT, Can You be More Racist? Analysis from Crowdsourced Attempts to Elicit Biased Content from Generative AI. doi:10.48550/arXiv.2410.15467 arXiv:2410.15467 [cs]

  67. [67]

    Zach Harned, Matthew P. Lungren, and Pranav Rajpurkar. 2019. Machine Vision, Medical AI, and Malpractice. https://papers.ssrn.com/abstract=3442249

  68. [68]

    Will Henshall. 2024. The Billion-Dollar Price Tag of Building AI. https://time.com/6984292/cost-artificial-intelligence-compute-epoch-report/

  69. [69]

    Eleanore Hickman and Martin Petrin. 2021. Trustworthy AI and Corporate Governance: The EU’s Ethics Guidelines for Trustworthy Artificial Intelligence from a Company Law Perspective. European Business Organization Law Review 22, 4 (Dec. 2021), 593–625. doi:10.1007/s40804-021-00224-0

  70. [70]

    Henry David Jeffry Hogg, Mohaimen Al-Zubaidy, Technology Enhanced Macular Services Study Reference Group, James Talks, Alastair K. Denniston, Christopher J. Kelly, Johann Malawana, Chrysanthi Papoutsi, Marion Dawn Teare, Pearse A. Keane, Fiona R. Beyer, and Gregory Maniatopoulos. 2023. Stakeholder Perspectives of Clinical Artificial Intelligence Implement...

  71. [71]

    Tomasz Hollanek, Yulu Pi, Cosimo Fiorini, Virginia Vignali, Dorian Peters, and Eleanor Drage. 2025. A Toolkit for Compliance, a Toolkit for Justice: Drawing on Cross-sectoral Expertise to Develop a Pro-justice EU AI Act Toolkit. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’25). Association for Computing Ma...

  72. [72]

    Tomasz Hollanek, Yulu Pi, Dorian Peters, Selen Yakar, and Eleanor Drage. 2025. The EU AI Act in Development Practice: A Pro-justice Approach. doi:10.48550/arXiv.2504.20075 arXiv:2504.20075 [cs]

  73. [73]

    Aspen Hopkins, Sarah H. Cen, Isabella Struckman, Andrew Ilyas, Luis Videgaray, and Aleksander Mądry. 2025. AI Supply Chains: An Emerging Ecosystem of AI Actors, Products, and Services. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 8, 2 (Oct. 2025), 1266–1277. doi:10.1609/aies.v8i2.36628

  74. [74]

    Anna Ida Hudig, Emma Kallina, and Jatinder Singh. 2026. “It’s Just a Wild, Wild West”: Harnessing Public Procurement as an AI Governance Mechanism. In Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems. ACM, Barcelona, Spain, 22. doi:10.1145/3772318.3791968

  75. [75]

    Wiebke Hutiri, Orestis Papakyriakopoulos, and Alice Xiang. 2024. Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech Generators. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’24). Association for Computing Machinery, New York, NY, USA, 359–376. doi:10.1145/3630106.3658911

  76. [76]

    IBM Institute for Business Value. 2025. 2025 CEO Study - 5 mindshifts to supercharge business growth. Technical Report, 32nd edition. https://www.ibm.com/downloads/documents/us-en/12f5a711174dc2ac

  77. [77]

    Irina Ivanova. 2025. As Klarna flips from AI-first to hiring people again, a new landmark survey reveals most AI projects fail to deliver. https://fortune.com/2025/05/09/klarna-ai-humans-return-on-investment/

  78. [78]

    Abigail Z. Jacobs. 2021. Measurement as governance in and for responsible AI. http://arxiv.org/abs/2109.05658 arXiv:2109.05658 [cs]

  79. [79]

    Abigail Z. Jacobs and Hanna Wallach. 2021. Measurement and Fairness. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’21). Association for Computing Machinery, New York, NY, USA, 375–385. doi:10.1145/3442188.3445901

  80. [80]

    Pierre Le Jeune, Jiaen Liu, Luca Rossi, and Matteo Dora. 2025. RealHarm: A Collection of Real-World Language Model Application Failures. doi:10.48550/arXiv.2504.10277 arXiv:2504.10277

Showing first 80 references.