Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models
9 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background (1)
Citation polarities: background (1)
citing papers explorer
- Narrative over Numbers: The Identifiable Victim Effect and its Amplification Under Alignment and Reasoning in Large Language Models
  Large language models display the identifiable victim effect at roughly twice the human baseline, strongly amplified by instruction tuning and chain-of-thought prompting but inverted by reasoning-specialized models.
- Polar: A Benchmark for Evaluating Political Bias in LLMs
  Polar is a new cross-context benchmark showing LLM political bias measurements are not fixed but vary with country, issue, model, and language.
- Large Language Models are Perplexed by some Political Parties
  LLMs exhibit higher perplexity on far-right and nationalist party texts than social-democratic ones, consistent across models and languages with correlation to translation metrics.
- Which Institutional Frameworks Do Chatbots Assume? Auditing Jurisdictional Defaults in Multilingual LLMs
  LLMs default to U.S. frameworks for English prompts and China frameworks for Chinese prompts on jurisdiction-underspecified legal-administrative queries, with the pattern holding across all seven tested models.
- How Far Will They Go? Red-Teaming Online Influence with Large Language Models
  An empirical red-teaming study measures political Overton Windows across more than 30 open-source LLMs from 10 families and finds left-leaning bias, inverse size correlation, regional variation, and variable jailbreak effectiveness.
- Who Gets the Kidney? Human-AI Alignment, Indecision, and Moral Values
  LLMs deviate from human moral preferences in kidney allocation scenarios and rarely express indecision, though low-rank fine-tuning with few examples can improve both consistency and uncertainty calibration.
- The Geography of Algorithmic Judgment: LLM Intermediaries, Place Identity, and Racial Steering in Housing Search
  Behavioral audit finds emergent, city-dependent racial steering in LLM housing recommendations that changes with user identity and preference context.
- What Is The Political Content in LLMs' Pre- and Post-Training Data?
  Training data for open LLMs is systematically left-leaning, with pre-training corpora containing more political material than post-training data and model stances aligning with data distributions.
- How Value Induction Reshapes LLM Behaviour
  Inducing targeted values in LLMs through fine-tuning causes spillover to related or opposing values, boosts safety metrics, and increases anthropomorphic and sycophantic language across all tested values.