Recognition: 2 theorem links · Lean Theorem
Ethical and social risks of harm from Language Models
Pith reviewed 2026-05-11 18:19 UTC · model grok-4.3
The pith
Language models pose 21 specific ethical and social risks across six main categories, and mitigating them requires organizational action.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Large-scale language models carry risks that include the perpetuation of stereotypes and exclusion, leaks of private data or inference of sensitive information, generation of false or misleading content that erodes trust, use by malicious actors to cause harm, unsafe or deceptive interactions with conversational agents, and broader effects such as job displacement and environmental costs that may fall disproportionately on certain groups. The paper reviews 21 such risks in depth, discusses their points of origin, and outlines potential mitigations, while highlighting organizational responsibilities and the need for collaboration.
What carries the argument
A framework of six risk areas that organizes harms from language models into discrimination/exclusion/toxicity, information hazards, misinformation harms, malicious uses, human-computer interaction harms, and automation/access/environmental harms, enabling systematic analysis and mitigation planning.
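A minimal sketch (not from the paper) of how that six-area framework might be encoded as the kind of audit checklist the review gestures at. The area names follow the abstract; the risks listed under each area are an illustrative subset drawn from the abstract's wording, not the paper's full enumeration of 21 risks.

```python
# Hypothetical sketch: the six risk areas as an audit checklist.
# Area names follow the abstract; the risks below are illustrative,
# not the paper's full set of 21.
RISK_AREAS = {
    "I. Discrimination, Exclusion and Toxicity": [
        "perpetuation of stereotypes", "unfair discrimination",
        "exclusionary norms", "toxic language",
        "lower performance by social group",
    ],
    "II. Information Hazards": [
        "private data leaks", "inference of sensitive information",
    ],
    "III. Misinformation Harms": [
        "false or misleading information", "erosion of trust in shared information",
    ],
    "IV. Malicious Uses": [
        "actors using LMs to cause harm",
    ],
    "V. Human-Computer Interaction Harms": [
        "unsafe use", "manipulation or deception by conversational agents",
    ],
    "VI. Automation, Access, and Environmental Harms": [
        "job automation", "environmental harm", "disparate access",
    ],
}


def audit_checklist(findings: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return, per risk area, the listed risks with no recorded finding yet."""
    return {
        area: [risk for risk in risks if risk not in findings.get(area, [])]
        for area, risks in RISK_AREAS.items()
    }


if __name__ == "__main__":
    # Example: an audit that has so far only examined toxic language.
    open_items = audit_checklist({
        "I. Discrimination, Exclusion and Toxicity": ["toxic language"],
    })
    for area, remaining in open_items.items():
        print(f"{area}: {len(remaining)} item(s) still to review")
```

A real checklist would need the paper's full 21 risks plus deployment-specific evaluation criteria; this only illustrates how the taxonomy could structure such an audit.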
If this is right
- Organizations bear responsibility for implementing mitigations tailored to each risk category.
- Collaboration across stakeholders is required to address the full set of risks effectively.
- Further research should expand toolkits for assessing and evaluating the 21 risks in language models.
- Responsible development of language models requires in-depth understanding of their points of origin and potential knock-on effects.
Where Pith is reading between the lines
- The taxonomy could serve as a checklist for auditing specific language model deployments in practice.
- It implies that interdisciplinary teams are needed to translate these risk categories into concrete technical safeguards.
- If widely adopted, the structure might influence standards for evaluating new models before release.
Load-bearing premise
The analysis assumes that the identified risks are grounded in established literature from multiple disciplines and that outlining them will foster responsible innovation, without providing new empirical validation for each risk's likelihood or severity.
What would settle it
A large-scale empirical study of deployed language models that finds zero instances of discrimination, data leakage, misinformation generation, or the other listed harms in real user interactions would challenge the paper's risk landscape.
Original abstract
This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed. A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary expertise and literature from computer science, linguistics, and social sciences. We outline six specific risk areas: I. Discrimination, Exclusion and Toxicity, II. Information Hazards, III. Misinformation Harms, V. Malicious Uses, V. Human-Computer Interaction Harms, VI. Automation, Access, and Environmental Harms. The first area concerns the perpetuation of stereotypes, unfair discrimination, exclusionary norms, toxic language, and lower performance by social group for LMs. The second focuses on risks from private data leaks or LMs correctly inferring sensitive information. The third addresses risks arising from poor, false or misleading information including in sensitive domains, and knock-on risks such as the erosion of trust in shared information. The fourth considers risks from actors who try to use LMs to cause harm. The fifth focuses on risks specific to LLMs used to underpin conversational agents that interact with human users, including unsafe use, manipulation or deception. The sixth discusses the risk of environmental harm, job automation, and other challenges that may have a disparate effect on different social groups or communities. In total, we review 21 risks in-depth. We discuss the points of origin of different risks and point to potential mitigation approaches. Lastly, we discuss organisational responsibilities in implementing mitigations, and the role of collaboration and participation. We highlight directions for further research, particularly on expanding the toolkit for assessing and evaluating the outlined risks in LMs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript provides a structured review of ethical and social risks from large language models (LMs), categorizing them into six areas: I. Discrimination, Exclusion and Toxicity; II. Information Hazards; III. Misinformation Harms; IV. Malicious Uses; V. Human-Computer Interaction Harms; VI. Automation, Access, and Environmental Harms. It reviews 21 risks in depth by drawing on multidisciplinary literature, and discusses their origins and mitigations, organizational responsibilities, collaboration needs, and future research directions to support responsible innovation.
Significance. If the taxonomy and analysis hold, the paper makes a significant contribution by synthesizing existing multidisciplinary literature into an organizing framework for LM risks. This can help guide responsible development and policy in AI. Credit is due for the explicit coverage of mitigation approaches, points of origin, and calls for participation and further empirical work on risk assessment, which are appropriate strengths for a synthesis review without new data.
Minor comments (1)
- [Abstract] The risk areas are listed as I, II, III, V, V, VI; this skips IV and duplicates V. Renumbering to I-VI would improve clarity and professionalism.
Simulated Author's Rebuttal
We thank the referee for their positive and accurate summary of our manuscript, as well as for the recommendation to accept. We will correct the abstract's risk-area numbering to run I through VI as suggested. We are pleased that the taxonomy of 21 risks across the six areas, along with the discussion of origins, mitigations, and future directions, is viewed as a significant contribution to responsible innovation in language models.
Circularity Check
No significant circularity
Full rationale
The paper is a multidisciplinary literature review that synthesizes 21 risks across six areas from external sources in computer science, linguistics, and social sciences. No new derivations, equations, parameter fits, or predictions are introduced; the taxonomy is presented explicitly as an organizing framework drawn from established literature rather than constructed from the paper's own assumptions. No self-citations serve as load-bearing premises, and all claims trace to independent prior work rather than reducing to the paper's own inputs by definition or construction.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
Cost.FunctionalEquation · washburn_uniqueness_aczel · unclear
The analysis assumes that the identified risks are grounded in established literature from multiple disciplines and that outlining them will foster responsible innovation, without providing new empirical validation for each risk's likelihood or severity.
Forward citations
Cited by 44 Pith papers
-
VoxSafeBench: Not Just What Is Said, but Who, How, and Where
VoxSafeBench reveals that speech language models recognize social norms from text but fail to apply them when acoustic cues like speaker or scene determine the appropriate response.
-
Mechanism Plausibility in Generative Agent-Based Modeling
Introduces the Mechanism Plausibility Scale to distinguish generative sufficiency from mechanistic plausibility in LLM-based agent-based models.
-
BiAxisAudit: A Novel Framework to Evaluate LLM Bias Across Prompt Sensitivity and Response-Layer Divergence
BiAxisAudit measures LLM bias on two axes—across-prompt sensitivity via factorial grids and within-response divergence via split coding—revealing that task format explains as much variance as model choice and that 63....
-
PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI
Persona-driven workflow and interface improve automated and human-AI red-teaming of generative AI by incorporating diverse perspectives into adversarial prompt creation.
-
Decoding-Time Debiasing via Process Reward Models: From Controlled Fill-in to Open-Ended Generation
Decoding-time use of process reward models for bias mitigation raises fairness scores by up to 0.40 on a bilingual benchmark while preserving fluency across four LLMs and extends to open-ended generation with low overhead.
-
Toward Fair Speech Technologies: A Comprehensive Survey of Bias and Fairness in Speech AI
The paper delivers a unified framework for fairness in speech technologies by formalizing seven definitions, organizing research into three paradigms, diagnosing pipeline-specific biases, and mapping mitigations to th...
-
LLM-Assisted Empirical Software Engineering: Systematic Literature Review and Research Agenda
A systematic review of 50 studies identifies 69 LLM-assisted tasks in empirical software engineering, concentrated in data processing and analysis with gaps in human-centered integration and reproducibility reporting.
-
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
-
Reliable Self-Harm Risk Screening via Adaptive Multi-Agent LLM Systems
Adaptive multi-agent LLM pipelines with bandit-based sampling achieve lower false positive rates (0.095 vs 0.159) than single-agent models on two behavioral health datasets while maintaining similar false negative rates.
-
LLM-as-Judge Framework for Evaluating Tone-Induced Hallucination in Vision-Language Models
Ghost-100 benchmark shows prompt tone drives hallucination rates and intensities in VLMs, with non-monotonic peaks at intermediate pressure and task-specific differences that aggregate metrics hide.
-
IntervenSim: Intervention-Aware Social Network Simulation for Opinion Dynamics
IntervenSim is an intervention-aware social network simulation that couples source interventions with crowd interactions in a feedback loop, improving MAPE by 41.6% and DTW by 66.9% over prior static frameworks on rea...
-
A Generalist Agent
Gato is a multi-modal, multi-task, multi-embodiment generalist policy using one transformer network to handle text, vision, games, and robotics tasks.
-
OPT: Open Pre-trained Transformer Language Models
OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.
-
Flamingo: a Visual Language Model for Few-Shot Learning
Flamingo models reach new state-of-the-art few-shot results on image and video tasks by bridging frozen vision and language models with cross-attention layers trained on interleaved web-scale data.
-
TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching
TBPO derives a token-level preference optimization objective from sequence-level pairwise data via Bregman divergence ratio matching that generalizes DPO and improves alignment quality.
-
Overtrained, Not Misaligned
Emergent misalignment arises from overtraining after primary task convergence and is preventable by early stopping, which retains 93% of task performance on average.
-
Navigating the Sea of LLM Evaluation: Investigating Bias in Toxicity Benchmarks
Toxicity benchmarks for LLMs produce inconsistent results when task type, input domain, or model changes, revealing intrinsic evaluation biases.
-
PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI
PersonaTeaming Workflow improves automated red-teaming attack success rates over RainbowPlus using personas while maintaining diversity, and PersonaTeaming Playground supports human-AI collaboration in red-teaming as ...
-
Ethics Testing: Proactive Identification of Generative AI System Harms
Ethics testing is introduced as a systematic approach to generate tests that identify software harms induced by unethical behavior in generative AI outputs.
-
Transient Turn Injection: Exposing Stateless Multi-Turn Vulnerabilities in Large Language Models
Transient Turn Injection is a new attack that evades LLM moderation by spreading harmful intent over multiple isolated turns using automated agents.
-
AlignCultura: Towards Culturally Aligned Large Language Models?
Align-Cultura introduces the CULTURAX dataset and shows that culturally fine-tuned LLMs improve joint HHH scores by 4-6%, cut cultural failures by 18%, and gain 10-12% efficiency with minimal leakage.
-
Human-Guided Harm Recovery for Computer Use Agents
Introduces harm recovery as a post-execution safeguard for computer-use agents, operationalized via a human-preference rubric, reward model, and BackBench benchmark that shows improved recovery trajectories.
-
The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems
Salami Attack chains low-risk inputs to cumulatively trigger high-risk LLM behaviors, achieving over 90% success on GPT-4o and Gemini while resisting some defenses.
-
Safety, Security, and Cognitive Risks in State-Space Models: A Systematic Threat Analysis with Spectral, Stateful, and Capacity Attacks
State-space models are vulnerable to three new attack types that corrupt state integrity, with experiments showing up to 156x output changes and 6x higher targeted corruption than random inputs.
-
Towards an AI co-scientist
A multi-agent AI system generates novel biomedical hypotheses that show promising experimental validation in drug repurposing for leukemia, new targets for liver fibrosis, and a bacterial gene transfer mechanism.
-
BloombergGPT: A Large Language Model for Finance
BloombergGPT is a 50B parameter LLM trained on a 708B token mixed financial and general dataset that outperforms prior models on financial benchmarks while preserving general LLM performance.
-
Ignore Previous Prompt: Attack Techniques For Language Models
PromptInject shows that simple adversarial prompts can cause goal hijacking and prompt leaking in GPT-3, exploiting its stochastic behavior.
-
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
RLHF-aligned language models show increasing resistance to red teaming with scale up to 52B parameters, unlike prompted or rejection-sampled models, supported by a released dataset of 38,961 attacks.
-
Language Models (Mostly) Know What They Know
Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
-
Emergent Abilities of Large Language Models
Emergent abilities are capabilities present in large language models but absent in smaller ones and cannot be predicted by extrapolating smaller model performance.
-
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
RLHF alignment training on language models boosts NLP performance, supports skill specialization, enables weekly online updates with fresh human data, and shows a linear relation between RL reward and sqrt(KL divergen...
-
PaLM: Scaling Language Modeling with Pathways
PaLM 540B demonstrates continued scaling benefits by setting new few-shot SOTA results on hundreds of benchmarks and outperforming humans on BIG-bench.
-
LaMDA: Language Models for Dialog Applications
LaMDA shows that fine-tuning on human-value annotations and consulting external knowledge sources significantly improves safety and factual grounding in large dialog models beyond what scaling alone achieves.
-
Beyond Inefficiency: Systemic Costs of Incivility in Multi-Agent Monte Carlo Simulations
Monte Carlo simulations of LLM agents confirm that toxic debates take 25% longer to converge, with larger delays in smaller models, and show a first-mover advantage independent of toxicity.
-
Quantifying and Predicting Disagreement in Graded Human Ratings
Annotation disagreement on toxic language can be moderately predicted from textual features, with high-opposition items proving harder for models to estimate accurately.
-
Representational Harms in LLM-Generated Narratives Against Global Majority Nationalities
LLMs generate narratives containing persistent stereotypes, erasure, and one-dimensional portrayals of Global Majority national identities, with minoritized groups overrepresented in subordinated roles by more than fi...
-
BodhiPromptShield: Pre-Inference Prompt Mediation for Suppressing Privacy Propagation in LLM/VLM Agents
BodhiPromptShield reduces stage-wise privacy propagation in LLM/VLM agents from 10.7% to 7.1% on the Controlled Prompt-Privacy Benchmark by mediating sensitive spans before inference and restoring only at authorized b...
-
Sociodemographic Biases in Educational Counselling by Large Language Models
LLMs show sociodemographic biases in educational counseling that are amplified by vague student descriptions and substantially reduced by concrete individualized details.
-
The PICCO Framework for Large Language Model Prompting: A Taxonomy and Reference Architecture for Prompt Structure
PICCO is a five-element reference architecture (Persona, Instructions, Context, Constraints, Output) for structuring LLM prompts, derived from synthesizing prior frameworks along with a taxonomy distinguishing prompt ...
-
Measuring the metacognition of AI
Meta-d' and signal detection theory provide quantitative tools to assess metacognitive sensitivity and risk-based regulation in large language models.
-
PaLM 2 Technical Report
PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.
-
Gemma: Open Models Based on Gemini Research and Technology
Gemma introduces open 2B and 7B LLMs derived from Gemini technology that beat comparable open models on 11 of 18 text tasks and come with safety assessments.
-
AI Trust OS -- A Continuous Governance Framework for Autonomous AI Observability and Zero-Trust Compliance in Enterprise Environments
AI Trust OS is a proposed always-on operating layer that discovers undocumented AI systems via telemetry and produces continuous zero-trust compliance artifacts for regulations including ISO 42001, EU AI Act, SOC 2, G...
-
Gemma 2: Improving Open Language Models at a Practical Size
Gemma 2 models achieve leading performance at their sizes by combining established Transformer modifications with knowledge distillation for the 2B and 9B variants.