hub

C row S -Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models

Nangia, Nikita, Vania, Clara, Bhalerao, Rasika, Bowman, Samuel R · 2020 · DOI 10.18653/v1/2020.emnlp-main.154

16 Pith papers cite this work. Polarity classification is still indexing.

16 Pith papers citing it

open at publisher browse 16 citing papers

hub tools

JSON dossier citing papers JSON publisher DOI

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

cs.CL · 2023-04-03 · accept · novelty 8.0

Pythia releases 16 identically trained LLMs with full checkpoints and data tools to study training dynamics, scaling, memorization, and bias in language models.

Polar: A Benchmark for Evaluating Political Bias in LLMs

cs.CL · 2026-06-11 · unverdicted · novelty 7.0

Polar is a new cross-context benchmark showing LLM political bias measurements are not fixed but vary with country, issue, model, and language.

Is She Even Relevant? When BERT Ignores Explicit Gender Cues

cs.CL · 2026-05-08 · conditional · novelty 7.0

A Dutch BERT model encodes gender linearly by epoch 20 but does not dynamically update its representations when explicit female cues contradict learned stereotypical associations in short sentence templates.

Wait, am I Being Fair? Characterizing Deductive Stereotyping and Mitigating It with Fair-GCG

cs.CL · 2026-06-30 · unverdicted · novelty 6.0

The paper characterizes deductive stereotyping in LLMs and introduces Fair-GCG to discover injection phrases that improve fairness across benchmarks, reasoning, and real-world tasks.

Debiasing Without Protected Attributes: Latent Concept Erasure from Textual Profiles

cs.CL · 2026-06-10 · unverdicted · novelty 6.0

H-SAL erases latent concepts from text profiles using self-descriptions as implicit debiasing signals and shows competitive performance on a new multi-domain Stack Exchange helpfulness benchmark.

On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance

cs.CL · 2026-05-30 · unverdicted · novelty 6.0

LLMs correct only 34.8% of zero-shot annotation errors via prompting, and Definition-Specific Familiarity correlates positively with performance (partial r = +0.41) while memorization metrics do not.

Omissive Bias in Religious Representation: Benchmarking LLM Answers to Everyday Ethical Decision-making

cs.LG · 2026-05-23 · unverdicted · novelty 6.0

LLMs show omissive bias by underrepresenting religious frameworks in responses to non-religious ethical questions relative to human expectations.

Compared to What? Baselines and Metrics for Counterfactual Prompting

cs.CL · 2026-05-01 · conditional · novelty 6.0

Counterfactual prompting effects on LLMs are often indistinguishable from those caused by meaning-preserving paraphrases, causing most previously reported demographic sensitivities to disappear under proper statistical comparison.

Contrastive Analysis of Linguistic Representations in Large Language Model Outputs through Structured Synthetic Data Generation and Abstracted N-gram Associations

cs.CL · 2026-04-19 · unverdicted · novelty 6.0

A methodological framework detects subtle group-associated linguistic biases in LLM outputs by generating controlled synthetic minimal pairs, abstracting n-grams, and ranking high-signal fragments with a PMI variant for expert review.

Can We Trust a Black-box LLM? LLM Untrustworthy Boundary Detection via Bias-Diffusion and Multi-Agent Reinforcement Learning

cs.AI · 2026-04-07 · unverdicted · novelty 6.0

GMRL-BD detects untrustworthy topic boundaries for black-box LLMs by combining bias-diffusion on a Wikipedia KG with multi-agent RL, supported by a released dataset labeling biases in models like Llama2 and Qwen2.

Invisible Influences: Investigating Implicit Intersectional Biases through Persona Engineering in Large Language Models

cs.CL · 2026-03-16 · unverdicted · novelty 6.0

The paper proposes the BADx metric to quantify persona-induced amplification of implicit intersectional biases in five LLMs, showing that context modulates bias beyond what static embedding tests capture.

Enhancing Chat Language Models by Scaling High-quality Instructional Conversations

cs.CL · 2023-05-23 · conditional · novelty 6.0

UltraChat supplies 1.5 million high-quality multi-turn dialogues that, when used to fine-tune LLaMA, produce UltraLLaMA, which outperforms prior open-source chat models including Vicuna.

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

cs.CL · 2022-11-09 · unverdicted · novelty 6.0

BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.

Galactica: A Large Language Model for Science

cs.CL · 2022-11-16 · unverdicted · novelty 5.0

Galactica, a science-specialized LLM, reports higher scores than GPT-3, Chinchilla, and PaLM on LaTeX knowledge, mathematical reasoning, and medical QA benchmarks while outperforming general models on BIG-bench.

Topics as Proxies for Sociodemographics: How Conversational Context Affects LLM Answers

cs.CL · 2026-06-01 · unverdicted · novelty 4.0

LLMs show minimal sociodemographic disparities in advice because they infer user demographics poorly from history; conversation topics are the main predictor and act as proxies for groups.

Benchmark Data Contamination of Large Language Models: A Survey

cs.CL · 2024-06-06 · unverdicted · novelty 3.0

A survey reviewing benchmark data contamination in LLMs, its impact on evaluation, and alternative assessment approaches.

citing papers explorer

Showing 0 of 0 citing papers after filters.

No citing papers match the current filters.

C row S -Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer