Recognition: 2 theorem links · Lean Theorem
Ethical and social risks of harm from Language Models
Pith reviewed 2026-05-11 18:19 UTC · model grok-4.3
The pith
Language models pose 21 specific ethical and social risks across six main categories, and mitigating them requires organizational action.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Large-scale language models carry risks that include the perpetuation of stereotypes and exclusion, leaks of private data or inference of sensitive information, generation of false or misleading content that erodes trust, use by malicious actors to cause harm, unsafe or deceptive interactions with conversational agents, and broader effects such as job displacement and environmental costs that may fall disproportionately on certain groups. The paper reviews 21 such risks in depth, discusses their points of origin, and outlines potential mitigations, while highlighting organizational responsibilities and the need for collaboration.
What carries the argument
A framework of six risk areas that organizes harms from language models into discrimination/exclusion/toxicity, information hazards, misinformation harms, malicious uses, human-computer interaction harms, and automation/access/environmental harms, enabling systematic analysis and mitigation planning.
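A minimal sketch (not from the paper) of how that six-area framework might be encoded as the kind of audit checklist the review gestures at. The area names follow the abstract; the risks listed under each area are an illustrative subset drawn from the abstract's wording, not the paper's full enumeration of 21 risks.

```python
# Hypothetical sketch: the six risk areas as an audit checklist.
# Area names follow the abstract; the risks below are illustrative,
# not the paper's full set of 21.
RISK_AREAS = {
    "I. Discrimination, Exclusion and Toxicity": [
        "perpetuation of stereotypes", "unfair discrimination",
        "exclusionary norms", "toxic language",
        "lower performance by social group",
    ],
    "II. Information Hazards": [
        "private data leaks", "inference of sensitive information",
    ],
    "III. Misinformation Harms": [
        "false or misleading information", "erosion of trust in shared information",
    ],
    "IV. Malicious Uses": [
        "actors using LMs to cause harm",
    ],
    "V. Human-Computer Interaction Harms": [
        "unsafe use", "manipulation or deception by conversational agents",
    ],
    "VI. Automation, Access, and Environmental Harms": [
        "job automation", "environmental harm", "disparate access",
    ],
}


def audit_checklist(findings: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return, per risk area, the listed risks with no recorded finding yet."""
    return {
        area: [risk for risk in risks if risk not in findings.get(area, [])]
        for area, risks in RISK_AREAS.items()
    }


if __name__ == "__main__":
    # Example: an audit that has so far only examined toxic language.
    open_items = audit_checklist({
        "I. Discrimination, Exclusion and Toxicity": ["toxic language"],
    })
    for area, remaining in open_items.items():
        print(f"{area}: {len(remaining)} item(s) still to review")
```

A real checklist would need the paper's full 21 risks plus deployment-specific evaluation criteria; this only illustrates how the taxonomy could structure such an audit.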
If this is right
- Organizations bear responsibility for implementing mitigations tailored to each risk category.
- Collaboration across stakeholders is required to address the full set of risks effectively.
- Further research should expand toolkits for assessing and evaluating the 21 risks in language models.
- Responsible development of language models requires in-depth understanding of their points of origin and potential knock-on effects.
Where Pith is reading between the lines
- The taxonomy could serve as a checklist for auditing specific language model deployments in practice.
- It implies that interdisciplinary teams are needed to translate these risk categories into concrete technical safeguards.
- If widely adopted, the structure might influence standards for evaluating new models before release.
Load-bearing premise
The analysis assumes that the identified risks are grounded in established literature from multiple disciplines and that outlining them will foster responsible innovation, without providing new empirical validation for each risk's likelihood or severity.
What would settle it
A large-scale empirical study of deployed language models that finds zero instances of discrimination, data leakage, misinformation generation, or the other listed harms in real user interactions would challenge the paper's risk landscape.
Original abstract
This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed. A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary expertise and literature from computer science, linguistics, and social sciences. We outline six specific risk areas: I. Discrimination, Exclusion and Toxicity, II. Information Hazards, III. Misinformation Harms, V. Malicious Uses, V. Human-Computer Interaction Harms, VI. Automation, Access, and Environmental Harms. The first area concerns the perpetuation of stereotypes, unfair discrimination, exclusionary norms, toxic language, and lower performance by social group for LMs. The second focuses on risks from private data leaks or LMs correctly inferring sensitive information. The third addresses risks arising from poor, false or misleading information including in sensitive domains, and knock-on risks such as the erosion of trust in shared information. The fourth considers risks from actors who try to use LMs to cause harm. The fifth focuses on risks specific to LLMs used to underpin conversational agents that interact with human users, including unsafe use, manipulation or deception. The sixth discusses the risk of environmental harm, job automation, and other challenges that may have a disparate effect on different social groups or communities. In total, we review 21 risks in-depth. We discuss the points of origin of different risks and point to potential mitigation approaches. Lastly, we discuss organisational responsibilities in implementing mitigations, and the role of collaboration and participation. We highlight directions for further research, particularly on expanding the toolkit for assessing and evaluating the outlined risks in LMs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript provides a structured review of ethical and social risks from large language models (LMs), categorizing them into six areas: I. Discrimination, Exclusion and Toxicity; II. Information Hazards; III. Misinformation Harms; IV. Malicious Uses; V. Human-Computer Interaction Harms; VI. Automation, Access, and Environmental Harms. It reviews 21 risks in depth by drawing on multidisciplinary literature, and discusses their origins and mitigations, organizational responsibilities, collaboration needs, and future research directions to support responsible innovation.
Significance. If the taxonomy and analysis hold, the paper makes a significant contribution by synthesizing existing multidisciplinary literature into an organizing framework for LM risks. This can help guide responsible development and policy in AI. Credit is due for the explicit coverage of mitigation approaches, points of origin, and calls for participation and further empirical work on risk assessment, which are appropriate strengths for a synthesis review without new data.
Minor comments (1)
- [Abstract] The risk areas are listed as I, II, III, V, V, VI; this skips IV and duplicates V. Renumbering to I-VI would improve clarity and professionalism.
Simulated Author's Rebuttal
We thank the referee for their positive and accurate summary of our manuscript, as well as for the recommendation to accept. We will correct the abstract's risk-area numbering to run I through VI as suggested. We are pleased that the taxonomy of 21 risks across the six areas, along with the discussion of origins, mitigations, and future directions, is viewed as a significant contribution to responsible innovation in language models.
Circularity Check
No significant circularity
Full rationale
The paper is a multidisciplinary literature review that synthesizes 21 risks across six areas from external sources in computer science, linguistics, and social sciences. No new derivations, equations, parameter fits, or predictions are introduced; the taxonomy is presented explicitly as an organizing framework drawn from established literature rather than constructed from the paper's own assumptions. No self-citations serve as load-bearing premises, and all claims trace to independent prior work rather than reducing to the paper's own inputs by definition or construction.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
Cost.FunctionalEquation · washburn_uniqueness_aczel · unclear
The analysis assumes that the identified risks are grounded in established literature from multiple disciplines and that outlining them will foster responsible innovation, without providing new empirical validation for each risk's likelihood or severity.
Forward citations
Cited by 44 Pith papers
-
VoxSafeBench: Not Just What Is Said, but Who, How, and Where
VoxSafeBench reveals that speech language models recognize social norms from text but fail to apply them when acoustic cues like speaker or scene determine the appropriate response.
-
Mechanism Plausibility in Generative Agent-Based Modeling
Introduces the Mechanism Plausibility Scale to distinguish generative sufficiency from mechanistic plausibility in LLM-based agent-based models.
-
BiAxisAudit: A Novel Framework to Evaluate LLM Bias Across Prompt Sensitivity and Response-Layer Divergence
BiAxisAudit measures LLM bias on two axes—across-prompt sensitivity via factorial grids and within-response divergence via split coding—revealing that task format explains as much variance as model choice and that 63....
-
PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI
Persona-driven workflow and interface improve automated and human-AI red-teaming of generative AI by incorporating diverse perspectives into adversarial prompt creation.
-
Decoding-Time Debiasing via Process Reward Models: From Controlled Fill-in to Open-Ended Generation
Decoding-time use of process reward models for bias mitigation raises fairness scores by up to 0.40 on a bilingual benchmark while preserving fluency across four LLMs and extends to open-ended generation with low overhead.
-
Toward Fair Speech Technologies: A Comprehensive Survey of Bias and Fairness in Speech AI
The paper delivers a unified framework for fairness in speech technologies by formalizing seven definitions, organizing research into three paradigms, diagnosing pipeline-specific biases, and mapping mitigations to th...
-
LLM-Assisted Empirical Software Engineering: Systematic Literature Review and Research Agenda
A systematic review of 50 studies identifies 69 LLM-assisted tasks in empirical software engineering, concentrated in data processing and analysis with gaps in human-centered integration and reproducibility reporting.
-
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
-
Reliable Self-Harm Risk Screening via Adaptive Multi-Agent LLM Systems
Adaptive multi-agent LLM pipelines with bandit-based sampling achieve lower false positive rates (0.095 vs 0.159) than single-agent models on two behavioral health datasets while maintaining similar false negative rates.
-
LLM-as-Judge Framework for Evaluating Tone-Induced Hallucination in Vision-Language Models
Ghost-100 benchmark shows prompt tone drives hallucination rates and intensities in VLMs, with non-monotonic peaks at intermediate pressure and task-specific differences that aggregate metrics hide.
-
IntervenSim: Intervention-Aware Social Network Simulation for Opinion Dynamics
IntervenSim is an intervention-aware social network simulation that couples source interventions with crowd interactions in a feedback loop, improving MAPE by 41.6% and DTW by 66.9% over prior static frameworks on rea...
-
A Generalist Agent
Gato is a multi-modal, multi-task, multi-embodiment generalist policy using one transformer network to handle text, vision, games, and robotics tasks.
-
OPT: Open Pre-trained Transformer Language Models
OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.
-
Flamingo: a Visual Language Model for Few-Shot Learning
Flamingo models reach new state-of-the-art few-shot results on image and video tasks by bridging frozen vision and language models with cross-attention layers trained on interleaved web-scale data.
-
TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching
TBPO derives a token-level preference optimization objective from sequence-level pairwise data via Bregman divergence ratio matching that generalizes DPO and improves alignment quality.
-
Overtrained, Not Misaligned
Emergent misalignment arises from overtraining after primary task convergence and is preventable by early stopping, which retains 93% of task performance on average.
-
Navigating the Sea of LLM Evaluation: Investigating Bias in Toxicity Benchmarks
Toxicity benchmarks for LLMs produce inconsistent results when task type, input domain, or model changes, revealing intrinsic evaluation biases.
-
PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI
PersonaTeaming Workflow improves automated red-teaming attack success rates over RainbowPlus using personas while maintaining diversity, and PersonaTeaming Playground supports human-AI collaboration in red-teaming as ...
-
Ethics Testing: Proactive Identification of Generative AI System Harms
Ethics testing is introduced as a systematic approach to generate tests that identify software harms induced by unethical behavior in generative AI outputs.
-
Transient Turn Injection: Exposing Stateless Multi-Turn Vulnerabilities in Large Language Models
Transient Turn Injection is a new attack that evades LLM moderation by spreading harmful intent over multiple isolated turns using automated agents.
-
AlignCultura: Towards Culturally Aligned Large Language Models?
Align-Cultura introduces the CULTURAX dataset and shows that culturally fine-tuned LLMs improve joint HHH scores by 4-6%, cut cultural failures by 18%, and gain 10-12% efficiency with minimal leakage.
-
Human-Guided Harm Recovery for Computer Use Agents
Introduces harm recovery as a post-execution safeguard for computer-use agents, operationalized via a human-preference rubric, reward model, and BackBench benchmark that shows improved recovery trajectories.
-
The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems
Salami Attack chains low-risk inputs to cumulatively trigger high-risk LLM behaviors, achieving over 90% success on GPT-4o and Gemini while resisting some defenses.
-
Safety, Security, and Cognitive Risks in State-Space Models: A Systematic Threat Analysis with Spectral, Stateful, and Capacity Attacks
State-space models are vulnerable to three new attack types that corrupt state integrity, with experiments showing up to 156x output changes and 6x higher targeted corruption than random inputs.
-
Towards an AI co-scientist
A multi-agent AI system generates novel biomedical hypotheses that show promising experimental validation in drug repurposing for leukemia, new targets for liver fibrosis, and a bacterial gene transfer mechanism.
-
BloombergGPT: A Large Language Model for Finance
BloombergGPT is a 50B parameter LLM trained on a 708B token mixed financial and general dataset that outperforms prior models on financial benchmarks while preserving general LLM performance.
-
Ignore Previous Prompt: Attack Techniques For Language Models
PromptInject shows that simple adversarial prompts can cause goal hijacking and prompt leaking in GPT-3, exploiting its stochastic behavior.
-
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
RLHF-aligned language models show increasing resistance to red teaming with scale up to 52B parameters, unlike prompted or rejection-sampled models, supported by a released dataset of 38,961 attacks.
-
Language Models (Mostly) Know What They Know
Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
-
Emergent Abilities of Large Language Models
Emergent abilities are capabilities present in large language models but absent in smaller ones and cannot be predicted by extrapolating smaller model performance.
-
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
RLHF alignment training on language models boosts NLP performance, supports skill specialization, enables weekly online updates with fresh human data, and shows a linear relation between RL reward and sqrt(KL divergen...
-
PaLM: Scaling Language Modeling with Pathways
PaLM 540B demonstrates continued scaling benefits by setting new few-shot SOTA results on hundreds of benchmarks and outperforming humans on BIG-bench.
-
LaMDA: Language Models for Dialog Applications
LaMDA shows that fine-tuning on human-value annotations and consulting external knowledge sources significantly improves safety and factual grounding in large dialog models beyond what scaling alone achieves.
-
Beyond Inefficiency: Systemic Costs of Incivility in Multi-Agent Monte Carlo Simulations
Monte Carlo simulations of LLM agents confirm that toxic debates take 25% longer to converge, with larger delays in smaller models, and show a first-mover advantage independent of toxicity.
-
Quantifying and Predicting Disagreement in Graded Human Ratings
Annotation disagreement on toxic language can be moderately predicted from textual features, with high-opposition items proving harder for models to estimate accurately.
-
Representational Harms in LLM-Generated Narratives Against Global Majority Nationalities
LLMs generate narratives containing persistent stereotypes, erasure, and one-dimensional portrayals of Global Majority national identities, with minoritized groups overrepresented in subordinated roles by more than fi...
-
BodhiPromptShield: Pre-Inference Prompt Mediation for Suppressing Privacy Propagation in LLM/VLM Agents
BodhiPromptShield reduces stage-wise privacy propagation in LLM/VLM agents from 10.7% to 7.1% on the Controlled Prompt-Privacy Benchmark by mediating sensitive spans before inference and restoring only at authorized b...
-
Sociodemographic Biases in Educational Counselling by Large Language Models
LLMs show sociodemographic biases in educational counseling that are amplified by vague student descriptions and substantially reduced by concrete individualized details.
-
The PICCO Framework for Large Language Model Prompting: A Taxonomy and Reference Architecture for Prompt Structure
PICCO is a five-element reference architecture (Persona, Instructions, Context, Constraints, Output) for structuring LLM prompts, derived from synthesizing prior frameworks along with a taxonomy distinguishing prompt ...
-
Measuring the metacognition of AI
Meta-d' and signal detection theory provide quantitative tools to assess metacognitive sensitivity and risk-based regulation in large language models.
-
PaLM 2 Technical Report
PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.
-
Gemma: Open Models Based on Gemini Research and Technology
Gemma introduces open 2B and 7B LLMs derived from Gemini technology that beat comparable open models on 11 of 18 text tasks and come with safety assessments.
-
AI Trust OS -- A Continuous Governance Framework for Autonomous AI Observability and Zero-Trust Compliance in Enterprise Environments
AI Trust OS is a proposed always-on operating layer that discovers undocumented AI systems via telemetry and produces continuous zero-trust compliance artifacts for regulations including ISO 42001, EU AI Act, SOC 2, G...
-
Gemma 2: Improving Open Language Models at a Practical Size
Gemma 2 models achieve leading performance at their sizes by combining established Transformer modifications with knowledge distillation for the 2B and 9B variants.