Political Neutrality as Balanced Approval: A Large-Scale Human Evaluation of AI Responses

David Zhai Yang; Jonathan Stray; Miu Nicole Takagi; Serina Chang; Steven Luo

arxiv: 2605.28911 · v1 · pith:WPFP2D2Pnew · submitted 2026-05-27 · 💻 cs.CY

Pith reviewed 2026-06-29 09:37 UTC · model grok-4.3

keywords: AI political neutrality, balanced approval, human evaluation, large-scale study, PARETO dataset, controversial issues, model responses, user ratings

The pith

AI responses can achieve high approval from both sides of controversial issues.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines AI political neutrality as producing answers that maximize approval across opposing groups while keeping approval balanced between them. Researchers tested the definition by collecting ratings from thousands of participants on responses to 20 divisive U.S. issues drawn from real online questions. The results show that responses scoring well with both sides are possible on every issue, even when the two sides disagree sharply on substance. Default outputs from most tested models lean liberal, and prompts already containing political language prove harder to answer in a balanced way.

Core claim

The central claim is that, for every one of the 20 issues studied, there exist AI responses that receive high approval ratings from participants identifying with each of two opposing sides, even though those sides disagree strongly with each other on the underlying question. The definition treats neutrality as an empirical property measurable by simultaneous high approval rather than by adherence to any fixed political axis.

What carries the argument

The balanced approval definition of neutrality, implemented by identifying opposing participant groups for each issue and scoring responses on whether both groups approve at high rates.
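As a rough illustration of this scoring idea, the sketch below filters a set of candidate responses down to the Pareto frontier and then picks the one with the highest worse-side approval. It is a sketch under stated assumptions, not the authors' implementation: the `Response` type, the function names, and the approval numbers are all hypothetical, and maximizing the lower of the two approval rates is only one assumed way to approximate the "maximum equal approval" point described in the Figure 1 caption.

```python
from typing import NamedTuple


class Response(NamedTuple):
    name: str
    approval_a: float  # share of side-A participants who approve (0..1)
    approval_b: float  # share of side-B participants who approve (0..1)


def pareto_frontier(responses):
    """Keep responses that no other response beats on both sides at once."""
    frontier = []
    for r in responses:
        dominated = any(
            o.approval_a >= r.approval_a
            and o.approval_b >= r.approval_b
            and (o.approval_a > r.approval_a or o.approval_b > r.approval_b)
            for o in responses
        )
        if not dominated:
            frontier.append(r)
    return frontier


def most_balanced(responses):
    """Pick the frontier response with the highest worse-side approval.

    Maximizing min(a, b) is an assumed discrete stand-in for the point where
    the frontier meets the equal-approval line; ties go to the smaller gap.
    """
    return max(
        pareto_frontier(responses),
        key=lambda r: (
            min(r.approval_a, r.approval_b),
            -abs(r.approval_a - r.approval_b),
        ),
    )


# Made-up approval rates, for illustration only (not PARETO data).
candidates = [
    Response("leans_a", 0.90, 0.40),
    Response("leans_b", 0.45, 0.88),
    Response("balanced", 0.78, 0.75),
    Response("weak", 0.60, 0.55),
]
best = most_balanced(candidates)
```

With these invented numbers, `weak` is dominated by `balanced` and drops off the frontier, and `balanced` is selected because its lower approval rate (0.75) is the highest among the remaining candidates.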

If this is right

Balanced high-approval responses are achievable for all 20 tested issues.
Default outputs from GPT, Gemini, Claude, and Llama lean liberal while Grok does not.
Prompts carrying explicit political charge are harder for models to answer neutrally than neutral prompts.
The PARETO dataset supplies a reusable benchmark for tracking progress on this form of neutrality.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The approval metric could be added to model training or selection pipelines to reduce one-sided outputs.
The same group-splitting method might be applied to political questions outside the United States.
Optimizing for balanced approval could change how often users see content that challenges their prior views.
Comparing approval ratings against independent measures such as factual completeness would test whether the metric captures the intended property.

Load-bearing premise

Ratings given by online study participants accurately reflect the political neutrality the authors intend to measure, without distortion from who chose to participate or how the groups and questions were presented.

What would settle it

A follow-up study in which no AI response on any of the 20 issues receives high approval from both identified opposing groups.

Figures

Figures reproduced from arXiv: 2605.28911 by David Zhai Yang, Jonathan Stray, Miu Nicole Takagi, Serina Chang, Steven Luo.

**Figure 1.** (a) For a contested question we survey the approval each LLM response receives from individuals on each side of the issue. We define a politically neutral response as one that lies on the Pareto frontier and achieves maximum equal approval: the blue dot. (b) For each issue, we find a “canonical” survey question and 10 Reddit posts related to the issue, two in each valence from “Strongly For” to “Strongly A…

**Figure 2.** Three sample AI responses to a “for” valence user prompt on the issue of whether to shift…

**Figure 3.** Summary of our results over all 20 issues.

**Figure 4.** Approval scores per issue, model and model stance, from participants on the more…

**Figure 5.** (a) PCA plot for respondent issue positions. Color indicates respondent self-identified ideology (gray for moderate). The first principal component captures the liberal-conservative axis. Arrows show that many issues do not neatly align to the primary axis. See Appendix D.2. (b) How many participants answered with 0-4 issue positions in the liberal and conservative directions, broken down by self-reported …

**Figure 6.** The maximum equal approval point, where the Pareto frontier intersects the line of equal…

Original abstract

As AI systems increasingly shape political views, defining and evaluating AI political neutrality is an urgent problem. Here, we propose a new definition of AI political neutrality and design a large-scale user study to test it, releasing a new dataset PARETO with 7,434 participants and 208,152 evaluations of AI responses. Our definition follows a simple principle grounded in political theory: when asked about a controversial issue, an AI model should generate responses that maximize approval across groups with opposing viewpoints, while balancing approval between groups. This definition allows empirical testing of whether an AI response is "neutral" and generalizes to any political context without pre-supposing a single left-right axis of division. We construct a benchmark of controversial U.S. issues, with prompts sourced from politically charged questions on Reddit and responses from frontier AI models, and recruit human participants to rate AI responses. Across all 20 issues, we find that it is possible for AI responses to achieve high rates of approval on both sides, even as those sides disagree strongly with each other on the substance of the issues. We also find that default responses lean liberal for GPT, Gemini, Claude, and Llama, but not Grok, and that user prompts with political charges are harder to respond to than neutral prompts. This work introduces a rigorous definition and benchmark of AI political neutrality, and a dataset to measure progress toward it.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

New balanced-approval definition of AI neutrality plus a large public dataset, but methods details on group assignment and stats are thin enough to need checking.

The paper defines AI political neutrality as responses that get high approval from both sides of an issue while balancing the approval rates between those sides. The authors back this with a study of 7,434 participants producing 208,152 ratings on 20 U.S. issues, with prompts sourced from Reddit, and they release the PARETO dataset.

The scale and the public data release are the clearest strengths. The results show that high dual-side approval is achievable even when the sides disagree on substance, and they document a liberal lean in default outputs from GPT, Gemini, Claude, and Llama (but not Grok). Using real user prompts and grounding the definition in political theory rather than post-hoc fitting are also straightforward moves.

The soft spots sit in the methods. The abstract supplies no information on how opposing groups were identified (whether by broad self-reported ideology or by each participant's actual stance on the specific issue), nor on recruitment, exclusion rules, or inter-rater checks. If group labels come from general leanings instead of issue-specific views, the dual-approval finding could partly reflect how the groups were built rather than properties of the responses. This echoes the load-bearing premise noted above and leaves the central empirical claim without visible support for robustness.

This is for people working on AI alignment, bias measurement, or policy around deployed models. Anyone who needs a concrete benchmark and reusable ratings data will get value from it. The work is coherent on its own terms and the artifacts are new, so it deserves a serious referee who can examine the full participant and statistical details.

Referee Report

2 major / 2 minor

Summary. The paper proposes a definition of AI political neutrality as responses that maximize approval across groups with opposing viewpoints while balancing approval between those groups. It presents results from a large-scale human study (PARETO dataset) with 7,434 participants and 208,152 evaluations of frontier AI model responses to prompts on 20 controversial U.S. issues, claiming that high dual-side approval is achievable despite strong substantive disagreement between sides, that default responses lean liberal for GPT/Gemini/Claude/Llama but not Grok, and that politically charged prompts are harder to handle neutrally.

Significance. If the empirical results hold, the work provides a significant, generalizable, and testable definition of political neutrality grounded in political theory that avoids presupposing a single left-right axis. The release of the large PARETO dataset with over 200k human evaluations is a clear strength enabling reproducibility and progress measurement. This could meaningfully shape evaluation standards for AI systems on politically sensitive topics.

major comments (2)

[Methods] Methods section (study design and participant assignment): The operationalization of 'groups with opposing viewpoints' is not specified (e.g., whether via per-issue stance measurement or broad self-reported ideology such as liberal/conservative). This is load-bearing for the central claim that high approval rates on both sides demonstrate neutrality per the definition, because the definition requires groups that disagree on the specific issue; broad ideology assignment risks constructing groups that do not oppose on the prompt substance.
[Abstract] Abstract and human evaluation description: No information is given on participant recruitment, exclusion criteria, statistical controls, or inter-rater reliability. This leaves the reported approval rates and the finding of high dual approval across all 20 issues without visible support for robustness or error estimation, directly affecting confidence in the empirical results.

minor comments (2)

[Results] Results section: Approval rate tables or figures would benefit from explicit confidence intervals or standard errors given the large sample size to aid interpretation of the 'high rates' claim.
[Discussion] The paper could add a short comparison in the discussion to existing bias evaluation benchmarks to clarify novelty.
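To make the first minor comment concrete, here is a minimal sketch of one standard way to attach an interval to an approval rate, the Wilson score interval for a binomial proportion. The function name and the counts are hypothetical, and the sketch assumes each rating has already been reduced to approve or not approve; it is not a description of the paper's analysis.

```python
import math


def wilson_interval(approvals, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = approvals / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Made-up counts: 150 approvals out of 200 ratings from one side.
low, high = wilson_interval(150, 200)
```

For these invented counts the interval runs from roughly 0.69 to 0.80. Because each participant in a study like this rates several responses, the ratings are unlikely to be independent, so a simple binomial interval would probably be too narrow; a clustered or bootstrap interval over participants may be a better fit.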

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight important areas for improving methodological transparency. We address each major comment below and commit to revisions that strengthen the paper without altering its core claims or results.

Referee: [Methods] Methods section (study design and participant assignment): The operationalization of 'groups with opposing viewpoints' is not specified (e.g., whether via per-issue stance measurement or broad self-reported ideology such as liberal/conservative). This is load-bearing for the central claim that high approval rates on both sides demonstrate neutrality per the definition, because the definition requires groups that disagree on the specific issue; broad ideology assignment risks constructing groups that do not oppose on the prompt substance.

Authors: We agree this operationalization must be stated explicitly, as it underpins the validity of the neutrality definition. In the study, opposing groups were formed using participants' self-reported stances on each specific issue (elicited via targeted questions tied to the prompt content), cross-checked against broad ideology for robustness but not relying on it alone. This ensures groups genuinely disagree on the prompt substance. We will add a dedicated subsection in Methods detailing the per-issue stance measurement protocol, group assignment procedure, and any sensitivity checks. This revision directly addresses the concern.
revision: yes

Referee: [Abstract] Abstract and human evaluation description: No information is given on participant recruitment, exclusion criteria, statistical controls, or inter-rater reliability. This leaves the reported approval rates and the finding of high dual approval across all 20 issues without visible support for robustness or error estimation, directly affecting confidence in the empirical results.

Authors: The abstract is intentionally concise per journal norms, but we acknowledge the need for greater visibility of evaluation details. The full manuscript's Methods section already describes recruitment (via a major online research platform with demographic quotas), exclusion criteria (attention checks, minimum completion time, and duplicate detection), statistical controls (regression adjustments for demographics and prompt order), and inter-rater reliability (computed via agreement metrics across multiple raters per response). To improve accessibility, we will expand the abstract with a brief clause on these elements and add a short robustness summary. This is a targeted addition rather than a full rewrite.
revision: partial
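The per-issue grouping discussed in the first exchange can be sketched as follows. This is an illustration only: the record layout, the 5-point stance scale, the cutoffs for "for" and "against", and the choice to drop midpoint respondents are all assumptions made for the example, not details confirmed by the paper.

```python
from collections import defaultdict

# Hypothetical records: (participant_id, stance_on_issue, approves_response).
# stance_on_issue is an assumed 5-point scale: 1 = strongly against, 5 = strongly for.
ratings = [
    ("p1", 5, True),
    ("p2", 4, True),
    ("p3", 4, False),
    ("p4", 3, True),   # midpoint on the issue: left out of both groups
    ("p5", 2, True),
    ("p6", 1, False),
    ("p7", 1, True),
]


def side_of(stance):
    """Map a per-issue stance to a side, or None for the midpoint."""
    if stance >= 4:
        return "for"
    if stance <= 2:
        return "against"
    return None


def approval_by_side(records):
    """Return the share of approving participants within each side."""
    counts = defaultdict(lambda: [0, 0])  # side -> [approvals, total]
    for _pid, stance, approves in records:
        side = side_of(stance)
        if side is None:
            continue
        counts[side][0] += int(approves)
        counts[side][1] += 1
    return {side: ok / total for side, (ok, total) in counts.items()}


rates = approval_by_side(ratings)
```

Grouping on the issue-specific stance, rather than on a broad liberal or conservative label, is what guarantees that the two groups being compared actually disagree about the question the response addresses.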

Circularity Check

0 steps flagged

No circularity: definition proposed independently and tested via new empirical ratings

The paper introduces a definition of political neutrality grounded in political theory as maximizing and balancing approval across opposing groups, then collects fresh human ratings on AI responses to test whether such responses exist. No equations, fitted parameters, or self-citations are used to derive the central claim; the reported possibility of high dual approval follows directly from the new dataset rather than reducing to prior inputs by construction. The operationalization of groups and approval is presented as a measurement choice, not a tautology. This is self-contained empirical work with no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the proposed definition itself and on the assumption that online approval ratings validly operationalize neutrality. No free parameters or invented entities are introduced in the abstract.

axioms (1)

Domain assumption: When asked about a controversial issue, an AI model should generate responses that maximize approval across groups with opposing viewpoints, while balancing approval between groups.
This is the load-bearing definition the entire empirical test is built to evaluate.
