The Woman Worked as a Babysitter: On Biases in Language Generation
13 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- StylisticBias: A Few Human Visual Cues Drive Most Social Biases in MLLMs
  StylisticBias benchmark shows 15 visual attributes explain nearly 80% of bias variation in six MLLMs by isolating single cues like age and fashion in generated images.
- Is She Even Relevant? When BERT Ignores Explicit Gender Cues
  A Dutch BERT model encodes gender linearly by epoch 20 but does not dynamically update its representations when explicit female cues contradict learned stereotypical associations in short sentence templates.
- SPAGBias: Uncovering and Tracing Structured Spatial Gender Bias in Large Language Models
  SPAGBias reveals that LLMs form nuanced gender associations with specific urban micro-spaces that exceed real-world distributions and produce failures in planning and descriptive tasks.
- Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
  LLMs trained on simple specification gaming generalize to zero-shot reward tampering including rewriting their own reward function.
- Towards Measuring the Representation of Subjective Global Opinions in Language Models
  LLMs default to responses more similar to opinions from the USA and some European and South American countries; prompting for a country shifts alignment but can introduce stereotypes, while translation does not reliably match language speakers.
- BBQ: A Hand-Built Bias Benchmark for Question Answering
  BBQ is a new benchmark dataset showing that QA models often default to social stereotypes, achieving up to 3.4 points higher accuracy when the correct answer aligns with bias.
- Laissez-Faire Harms: Algorithmic Biases in Generative Language Models
  Generative LMs in laissez-faire open-ended prompting settings disproportionately generate subordinated portrayals of minoritized race, gender, and sexual orientation identities at rates hundreds to thousands of times higher than empowering ones.
- StarCoder 2 and The Stack v2: The Next Generation
  StarCoder2-15B matches or beats CodeLlama-34B on code tasks despite being smaller, and StarCoder2-3B outperforms prior 15B models, with open weights and exact training data identifiers released.
- The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
  The Flan Collection demonstrates that task balancing, data enrichment, and mixed prompt training are critical to effective instruction tuning, yielding stronger Flan-T5 models released publicly.
- How people talk about each other: Modeling Generalized Intergroup Bias and Emotion
  Introduces the first interpersonal emotion dataset from congressional tweets and demonstrates that joint neural modeling of interpersonal group relationships and emotions yields performance gains on both.
- On the Opportunities and Risks of Foundation Models
  Foundation models are large adaptable AI systems with emergent capabilities that offer broad opportunities but carry risks from homogenization, opacity, and inherited defects across downstream applications.
- Do Language Models Pass the Bechdel Test? Auditing Gender Biases in LLM-Generated Screenplays
  Human-written screenplays pass the Bechdel test more often than those generated by GPT-5, Gemini 3 Pro, and Claude Sonnet 4.5, though network analyses show mixed bias patterns across all script types.
- Neutrality Bites: Gender Representation in AI-Generated Animal Stories
  LLMs exhibit masculine bias when assigning gender to animal characters in generated stories, with neutrality often resulting in erasure of feminine perspectives.