United Nations

Towards safer pretraining: Analyzing and filtering harmful content in webscale datasets for responsible LLMs · 2025 · arXiv 2505.02009

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Epistemic Injustice in Language Models: An Audit of Pretraining Filters and Guardrails

cs.CL · 2026-06-04 · unverdicted · novelty 5.0 · ref 23

An audit finds language model filters and guardrails disproportionately suppress mentions of marginalized groups via lexical cues while failing to catch explicit harms.
