pith. machine review for the scientific record.

arxiv: 2605.01006 · v2 · submitted 2026-05-01 · 💻 cs.CL · cs.CY

Recognition: no theorem link

Can AI Debias the News? LLM Interventions Improve Cross-Partisan Receptivity but LLMs Overestimate Their Own Effectiveness

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 00:43 UTC · model grok-4.3

classification 💻 cs.CL cs.CY
keywords LLM debiasing · partisan news · cross-partisan receptivity · ideological framing · trustworthiness judgments · backfire effect · human oversight · silicon participants

The pith

LLM reframing of partisan headlines raises conservatives' trust in liberal content without backfire, but models overestimate their own results.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether large language models can reduce partisan divides by debiasing news headlines at scale. It finds that merely swapping emotive words for milder synonyms produces no change in how human readers judge opposing news. A deeper intervention that alters the ideological framing itself raises conservatives' sense of trustworthiness, completeness, and willingness to engage with liberal headlines, and this improvement does not trigger rejection among liberal readers. Simulated LLM participants show effects in the same direction but of greater size, and the models' guesses about which reader traits predict responsiveness differ from the actual human patterns observed.

Core claim

Substantive LLM reframing of liberal news headlines increases conservatives' perceived trustworthiness, completeness, and engagement without producing a backfire among liberals, whereas lexical debiasing has no human effect. LLM-simulated responses align directionally with humans but are larger in magnitude, and the models' implicit account of who responds to debiasing diverges from the psychological profile that actually predicts human responsiveness.

What carries the argument

The contrast between lexical debiasing through synonym replacement and substantive ideological reframing, measured by comparing human participant judgments against LLM-simulated participant responses in two pre-registered experiments.
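
The paper's actual prompt wording is not reproduced on this page, so the contrast can only be sketched. A minimal illustration of the two intervention styles, assuming the OpenAI Python client (any chat-completion API would do); the model name and both instructions are invented stand-ins, not the paper's materials:

    # Illustrative sketch only: prompt wording is invented, not the paper's materials.
    from openai import OpenAI

    client = OpenAI()

    # Study 1 style: surface-level lexical debiasing (no human effect reported).
    LEXICAL = ("Rewrite this news headline, replacing emotionally charged words "
               "with more moderate synonyms. Do not alter the claims or framing.")

    # Study 2 style: substantive reframing of the ideological frame itself
    # (the intervention that moved human judgments).
    REFRAME = ("Rewrite this news headline so its ideological framing is neutral: "
               "remove one-sided framing and state the underlying facts even-handedly.")

    def debias(headline: str, instruction: str, model: str = "gpt-4o") -> str:
        """Return a debiased version of the headline under the given instruction."""
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "system", "content": instruction},
                      {"role": "user", "content": headline}],
        )
        return response.choices[0].message.content.strip()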

If this is right

  • Debiasing must change ideological framing rather than surface wording to affect real readers' trust judgments.
  • The absence of backfire effects holds for the tested liberal sample when conservatives receive reframed content.
  • Models need external human evaluation because their simulated effect sizes exceed observed human responses; a calibration check of this kind is sketched after this list.
  • Models' predictions about which reader characteristics moderate the intervention do not match actual human moderators.
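
The third bullet is a quantitative claim, so the check is mechanical: estimate the same standardized effect in the human and silicon samples and compare. A minimal sketch with synthetic ratings (all numbers below are invented; the study's real data sit at the OSF links quoted in the reference graph):

    # Synthetic illustration of the silicon-vs-human calibration check.
    import numpy as np

    def cohens_d(treated, control):
        """Standardized mean difference with a pooled standard deviation."""
        n1, n2 = len(treated), len(control)
        pooled = ((n1 - 1) * treated.var(ddof=1) +
                  (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2)
        return (treated.mean() - control.mean()) / np.sqrt(pooled)

    rng = np.random.default_rng(0)
    # Invented 1-7 trustworthiness ratings: a modest human effect and a
    # same-direction but inflated silicon effect.
    human_treated   = rng.normal(4.3, 1.2, 200)
    human_control   = rng.normal(4.0, 1.2, 200)
    silicon_treated = rng.normal(5.2, 0.8, 200)
    silicon_control = rng.normal(4.0, 0.8, 200)

    d_h = cohens_d(human_treated, human_control)
    d_s = cohens_d(silicon_treated, silicon_control)
    print(f"human d = {d_h:.2f}, silicon d = {d_s:.2f}, "
          f"inflation = {d_s / d_h:.1f}x")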

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same framing changes could be applied to full articles or social media posts to test whether the trust gains persist in richer contexts.
  • Collecting human response data to retrain models might reduce the gap between simulated and actual effect sizes.
  • Repeated exposure to reframed content over time could be tested to see whether initial receptivity grows into sustained cross-partisan habits.

Load-bearing premise

Headline-only stimuli and the recruited conservative and liberal samples produce effects that generalize beyond the experimental setting to real-world news consumption and broader populations.

What would settle it

A study that presents the same reframed headlines inside full news articles, or to a demographically broader sample, and finds no rise in conservatives' willingness to engage would show that the reported improvement does not hold outside the tested conditions.

read the original abstract

Partisan news media erode cross-partisan trust, but large language models (LLMs) offer a potential means of debiasing such content at scale. Across two pre-registered experiments, we tested whether LLM-generated debiasing of liberal news headlines could improve conservative readers' trust-relevant judgments. Study 1 found that subtle lexical debiasing (replacing emotive words with more moderate synonyms) had no effect on any outcome. Study 2 found that a more substantive reframing intervention significantly increased conservatives' perceived trustworthiness, completeness, and willingness to engage with liberal news headlines, without producing a backfire effect among a sample of liberals. In Study 1, the intervention produced robust effects among LLM-simulated silicon participants, whereas it had no impact on human readers. In Study 2, the intervention's effects among silicon participants aligned directionally with human responses but were significantly larger in magnitude for some outcomes. Moderation analyses revealed that the model's implicit theory of who responds to debiasing diverged from the psychological profile that actually predicted human responsiveness. These findings demonstrate that LLM-based debiasing can improve cross-partisan receptivity when targeting ideological framing rather than surface-level language, but that current models lack both the quantitative accuracy and qualitative psychological fidelity to evaluate their own interventions without human oversight.
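
The abstract's "silicon participants" are LLMs answering the survey items in place of humans. The paper's persona and item wording are not given on this page, so the following is a generic sketch of the technique, again assuming the OpenAI Python client; the persona, items, and model name are invented for illustration:

    # Generic silicon-participant sketch; persona and items are invented stand-ins.
    import re
    from openai import OpenAI

    client = OpenAI()

    PERSONA = ("You are a politically conservative adult in the United States "
               "answering a survey about news headlines.")

    ITEMS = {
        "trustworthiness": "How trustworthy is this headline?",
        "completeness":    "How complete is this headline?",
        "engagement":      "How willing would you be to read the full article?",
    }

    def simulate_ratings(headline: str, model: str = "gpt-4o") -> dict:
        """Collect 1-7 ratings on each outcome from one simulated participant."""
        ratings = {}
        for outcome, item in ITEMS.items():
            response = client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": PERSONA},
                    {"role": "user",
                     "content": f"Headline: {headline}\n{item} "
                                "Reply with a single number from 1 to 7."},
                ],
            )
            match = re.search(r"[1-7]", response.choices[0].message.content)
            ratings[outcome] = int(match.group()) if match else None
        return ratings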

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript reports two pre-registered experiments testing LLM-generated debiasing of liberal news headlines. Study 1 finds that subtle lexical substitutions produce no effects on human readers' trust-relevant judgments (despite effects in LLM-simulated participants). Study 2 finds that substantive reframing increases conservatives' perceived trustworthiness, completeness, and engagement without backfire among liberals; silicon participants show directional alignment but larger effect magnitudes, and moderation analyses indicate divergence between the model's implicit theory of responsiveness and actual human moderators.

Significance. If the directional findings hold under broader conditions, the work provides empirical evidence that targeted LLM reframing (rather than surface lexical changes) can enhance cross-partisan receptivity to news content, with direct implications for scalable interventions against partisan media erosion. The pre-registered design, use of human participant data, and explicit comparison to LLM self-evaluation are strengths that ground the claim that current models overestimate their effectiveness and lack psychological fidelity for autonomous deployment.

major comments (1)
  1. [Study 2] Study 2 methods and results: the central claim that LLM reframing improves cross-partisan receptivity (and that LLMs overestimate their own effectiveness) is demonstrated exclusively with headline-only stimuli. The introduction positions the research as addressing erosion of trust in partisan news media, yet no data are provided on full-length articles where narrative context, sourcing, and counter-framing could attenuate or reverse the observed gains in trustworthiness and engagement. This scope limitation directly affects the load-bearing generalizability of both the human effects and the silicon-human divergence.
minor comments (2)
  1. [Abstract] Abstract: the summary supplies no sample sizes, effect sizes, confidence intervals, or key statistical tests, forcing readers to consult the full text for quantitative assessment of the reported improvements and overestimation.
  2. [Discussion] Discussion: the limitations paragraph could more explicitly discuss potential demand effects in the recruited partisan samples and whether headline isolation might inflate effects relative to naturalistic news consumption.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comment point-by-point below.

read point-by-point responses
  1. Referee: [Study 2] Study 2 methods and results: the central claim that LLM reframing improves cross-partisan receptivity (and that LLMs overestimate their own effectiveness) is demonstrated exclusively with headline-only stimuli. The introduction positions the research as addressing erosion of trust in partisan news media, yet no data are provided on full-length articles where narrative context, sourcing, and counter-framing could attenuate or reverse the observed gains in trustworthiness and engagement. This scope limitation directly affects the load-bearing generalizability of both the human effects and the silicon-human divergence.

    Authors: We agree that the experiments use headline-only stimuli, as stated in the title, abstract, and methods. The introduction frames the broader problem of partisan media trust erosion, but the pre-registered research questions, hypotheses, and interventions specifically target headlines to isolate lexical versus substantive reframing effects under controlled conditions. This design choice avoids confounds from article length or narrative structure. We acknowledge that this limits generalizability to full-length articles, where sourcing and counter-framing could moderate effects. In revision, we will expand the Discussion to explicitly note this scope limitation, its implications for the human results and silicon-human divergence, and directions for future work on full articles. Headlines nonetheless represent a high-stakes, frequently encountered news format where initial receptivity is formed, providing a valid and internally consistent test of the central claims.

    revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical pre-registered experiments

full rationale

The paper reports two pre-registered human experiments (Study 1 lexical debiasing, Study 2 substantive reframing) plus parallel LLM-simulated silicon participants, with all outcome measures (trustworthiness, completeness, engagement) collected from recruited liberal and conservative samples. No equations, parameter fitting, derivations, or self-citations are used to generate the central claims; results are grounded directly in participant data rather than model outputs or prior author theorems. The silicon-human divergence and moderation analyses are likewise empirical comparisons, not reductions to fitted inputs or self-referential definitions. This is a standard empirical design with independent human data, so no load-bearing step reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Empirical study with no free parameters or invented entities; relies on standard experimental psychology assumptions about response validity and generalizability.

axioms (1)
  • domain assumption Participant responses reflect genuine shifts in receptivity rather than demand characteristics or social desirability bias.
    Standard assumption in survey-based psychology experiments but not tested or discussed in the provided abstract.

pith-pipeline@v0.9.0 · 5535 in / 1137 out tokens · 52246 ms · 2026-05-11T00:43:02.446661+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

8 extracted references · 2 canonical work pages

  1. [1]

    softened

    to identify the range of media trust scores at which the effect of debiasing on trustworthiness was statistically significant. The analysis revealed that the effect of condition became significant (p < .05) for media trust scores above 0.35. Given that the observed range of media trust scores was 1.00 to 7.00, this threshold falls below the observed range...

  2. [2]

    rather than a stimulus-driven reaction to specific textual cues (Caparos et al., 2015), then interventions that leave the underlying argumentative frame intact should fail regardless of how thoroughly individual words are softened. This interpretation is consistent with our human null findings and motivates the more substantive reframing intervention test...

  3. [3]

    were collected the same day, with the remaining 3 Democrat responses submitted in the early hours of 29 July 2025 (UTC). We note that the articles used in Study 2 were the same set originally scraped on 26 April 2025 for Study 1, introducing a roughly three-month lag between scraping and data collection. However, given the volum...

  4. [4]

    Simple effects and simple slopes reported with z-values were obtained from emmeans-based contrasts. All data, materials, and code can be obtained at https://osf.io/c2fpe/overview?view_only=02561ad5eb08446d877760a9aad35d88 (pre-registration), and https://osf.io/na78b/overview?view_only=8503c0c0d91649db80f903e05d5ba5b3 (code, materials).

  5. [5]

    These effects among conservative silicon participants were substantially larger than the corresponding human effects across all outcomes

    and openness to consider the article’s perspective (β = −0.04, 95% CI [−0.12, 0.03], z = 1.21, p = .225). These effects among conservative silicon participants were substantially larger than the corresponding human effects across all outcomes. Moderation by Individual Differences (Stage 3). In-group identification significantly ...

  6. [6]

    Surface-level linguistic changes leave the underlying argumentative structure intact and are therefore insufficient to shift how partisan readers evaluate out-group content

    rather than to individual word choices. Surface-level linguistic changes leave the underlying argumentative structure intact and are therefore insufficient to shift how partisan readers evaluate out-group content. Second, more invasive debiasing did not backfire among the outlet's ideologically aligned audience. As mentioned in ...

  7. [7]

    and underscore the need for human-in-the-loop oversight in any editorial deployment of such systems (Mosqueira-Rey et al., 2023). Moderation and Individual Differences Across both studies, cognitive flexibility and in-group identification did not meaningfully moderate the effect of debiasing among human participants. This null pattern is informative. It s...

  8. [8]

    https://www.pewresearch.org/politics/2016/06/22/partisanship-and-political-animosity-in-2016/

    Pew Research Center. https://www.pewresearch.org/politics/2016/06/22/partisanship-and-political-animosity-in-2016/ Gopnik, A. (2022, July 15). What AI still doesn’t know how to do. Wall Street Journal. https://www.wsj.com/tech/ai/what-ai-still-doesnt-know-how-to-do-11657891316 Hackenburg, K., Ibrahim, L., Tappin, B. M., & Tsakir...
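
Reference [1] above describes a floodlight (Johnson-Neyman style) analysis: scanning the moderator, here media trust, for the region where the conditional effect of the debiasing condition is significant. A minimal sketch of that computation using statsmodels; the data, coefficients, and cutoffs below are synthetic, not the paper's:

    # Synthetic floodlight / Johnson-Neyman scan; not the paper's data or model.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 400
    condition = rng.integers(0, 2, n)        # 0 = control, 1 = debiased
    media_trust = rng.uniform(1.0, 7.0, n)   # 1-7 moderator
    trustworthiness = (3.5 + 0.1 * condition + 0.2 * media_trust
                       + 0.15 * condition * media_trust + rng.normal(0, 1, n))

    X = sm.add_constant(np.column_stack([condition, media_trust,
                                         condition * media_trust]))
    fit = sm.OLS(trustworthiness, X).fit()
    b, V = fit.params, fit.cov_params()

    # Conditional effect of condition at moderator value m is b1 + b3*m,
    # with variance Var(b1) + m^2 Var(b3) + 2m Cov(b1, b3).
    for m in np.arange(1.0, 7.1, 0.5):
        eff = b[1] + b[3] * m
        se = np.sqrt(V[1, 1] + m**2 * V[3, 3] + 2 * m * V[1, 3])
        z = eff / se
        print(f"media trust {m:.1f}: effect {eff:.2f}, z {z:.2f}, "
              f"significant: {abs(z) > 1.96}")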