When AI Takes Sides on Questions of Faith: Persistent Asymmetries in AI-Mediated Faith Guidance

Brett Israelsen; Josh Coates; Julie Park; Nancy Fulda; Pete Whiting; Sheryl Carty

arxiv: 2605.22975 · v2 · pith:A3PYQVYYnew · submitted 2026-05-21 · 💻 cs.CL · cs.CY

Pith reviewed 2026-05-25 05:48 UTC · model grok-4.3

keywords large language models · religious conversion · AI bias · faith guidance · asymmetry · LLM evaluation · ethics

The pith

Large language models give asymmetric advice on religious conversions, favoring some faiths over others.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether LLMs treat questions about switching religions symmetrically and finds they do not. Models consistently use more encouraging language for transitions toward Catholic, Bahá'í, and Sikh faiths while using more discouraging language for transitions toward Atheism, Agnosticism, or Jehovah's Witnesses. This pattern holds across 20 models and 182 religion pairings when the same query is reversed. A reader would care because these repeatable differences could shape real user decisions if AI systems are used for personal guidance at scale. The asymmetries appear tied to model behavior rather than the scoring method alone.

Core claim

When prompted for advice on hypothetical faith transitions and then asked the reversed question, every tested LLM produced consistent asymmetries: higher support for joining some religions and lower support for leaving them, while the opposite held for others. Catholic, Bahá'í, and Sikh faiths received broadly favorable treatment on average, whereas Atheists, Agnostics, and Jehovah's Witnesses were primarily disfavored. The pattern varied by model size and provider yet remained reproducible across multiple trials, phrasings, and dataset variations.

What carries the argument

A human-verified LLM-as-a-judge framework that scores the encouraging versus discouraging language in model responses to simulated user queries about joining or leaving a given religion.
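One way to turn such judge scores into an asymmetry measure is sketched below. It is a minimal illustration, not the paper's scoring pipeline: it assumes each ordered transition (source, destination) has already been reduced to a single judge score where higher means more encouraging, and the names `pair_asymmetry`, `net_favorability`, and the labels "A", "B", "C" are hypothetical placeholders.

```python
from collections import defaultdict
from statistics import mean


def pair_asymmetry(scores):
    """Return s(a->b) - s(b->a) for every ordered pair scored in both directions.

    `scores` maps (source, destination) to a judge score; the scale is an
    assumption (higher = more encouraging), not something the paper specifies.
    """
    return {
        (a, b): s - scores[(b, a)]
        for (a, b), s in scores.items()
        if (b, a) in scores
    }


def net_favorability(scores):
    """Mean score for joining a religion minus mean score for leaving it."""
    joining = defaultdict(list)
    leaving = defaultdict(list)
    for (src, dst), s in scores.items():
        leaving[src].append(s)   # advice about leaving `src`
        joining[dst].append(s)   # advice about joining `dst`
    return {r: mean(joining[r]) - mean(leaving[r]) for r in joining if r in leaving}


# Toy scores with placeholder labels; these are made-up numbers, not results.
toy_scores = {
    ("A", "B"): 0.5, ("B", "A"): -0.5,
    ("A", "C"): 0.0, ("C", "A"): 0.0,
    ("B", "C"): 0.25, ("C", "B"): 0.75,
}
print(pair_asymmetry(toy_scores))
print(net_favorability(toy_scores))
```

Under this definition a favored religion has positive net favorability (high support for joining, low support for leaving), and on a complete set of ordered pairs the values sum to zero across religions, so the measure only ranks religions relative to one another.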

If this is right

All 20 tested models exhibit reproducible asymmetry in religious advice.
The specific pattern of favored and disfavored religions differs by model size and provider.
Asymmetries remain stable across changes in question phrasing and the set of religion pairings.
Any imbalances that are reproduced at scale could carry real-world effects on users.
The observed preferences are a property of model behavior rather than an artifact of the evaluation method.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

AI developers may want to audit training data or alignment processes for similar religion-related patterns before deploying models in advisory roles.
Individuals using AI for faith-related questions could benefit from cross-checking outputs against multiple models or human sources.
The results suggest a need to examine whether similar asymmetries appear in other domains involving personal identity or belief.
Controlled experiments could test whether targeted fine-tuning on balanced conversion examples reduces the observed differences.

Load-bearing premise

The LLM judge accurately captures the tested models' genuine preferences instead of injecting its own systematic biases when scoring language.

What would settle it

Re-running the full set of queries with a different judge model or with human scorers and finding that the direction or strength of the asymmetries changes or disappears.
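A sketch of what that comparison could look like follows. It assumes the per-religion asymmetry values have already been computed twice, once per scorer, and stored as dictionaries keyed by religion; `sign_agreement` and `pearson` are illustrative helper names, and the choice of these two summaries is an assumption rather than anything the paper prescribes.

```python
from math import sqrt


def _shared_values(a, b):
    """Values of a and b aligned on the keys they share."""
    keys = sorted(set(a) & set(b))
    if not keys:
        raise ValueError("the two scorers share no religions")
    return [a[k] for k in keys], [b[k] for k in keys]


def sign_agreement(a, b):
    """Fraction of shared religions whose asymmetry has the same sign under both scorers."""
    xs, ys = _shared_values(a, b)

    def sign(v):
        return (v > 0) - (v < 0)

    return sum(sign(x) == sign(y) for x, y in zip(xs, ys)) / len(xs)


def pearson(a, b):
    """Pearson correlation of asymmetry values across shared religions."""
    xs, ys = _shared_values(a, b)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant values")
    return cov / sqrt(var_x * var_y)
```

High sign agreement with a correlation near 1 would suggest the direction and relative strength of the asymmetries do not depend on the scorer; low values on either would point toward a judge artifact.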

Original abstract

We ask whether large language models (LLMs) treat queries about religious conversion symmetrically. The answer is no. When asked for advice on hypothetical faith transitions from religion A->B vs. religion B->A, models exhibited consistent asymmetries, favoring some religions while subtly discouraging conversion to others. On average Catholic, Bahá'í, and Sikh religions were broadly favored (high support for joining, low support for leaving), while Atheists, Agnostics, and Jehovah's Witnesses were primarily disfavored. Patterns varied by model size and model provider, with Grok 4.20 exhibiting the strongest asymmetries. We tested 20 commercial and open-source language models across 182 religion pairings using a human-verified LLM-as-judge framework. Each model was probed via interactions with a simulated user asking for advice on a potential faith conversion. Models tended to use more encouraging language for some faith transitions over others; these patterns were systematically repeatable across multiple trials. All LLMs tested exhibited reproducible asymmetry, though the pattern of preferences differed for each. Overall preferences persist across multiple question phrasings and variations in the religious pairing dataset. Taken together, these results suggest that asymmetry is a robust property of model behavior rather than an artifact of how the models' answers were scored. It is important to consider that any imbalances deployed and reproduced at scale can have real-world implications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper measures consistent directional asymmetries in LLM religious advice via reversal tests, but the LLM-judge scoring step lacks the quantitative checks needed to rule out artifacts.

The core result is that 20 LLMs, when asked for advice on hypothetical faith transitions and then the reverse, produce repeatable asymmetries: Catholic, Bahá'í, and Sikh transitions tend to receive more encouraging language while Atheist, Agnostic, and Jehovah's Witness transitions receive more discouraging language. The pattern holds across model sizes and providers, with Grok 4.20 showing the largest effect. That is the main new observation relative to earlier bias studies that did not focus on religious conversion pairs or use systematic reversal.

Referee Report

2 major / 2 minor

Summary. The manuscript reports an empirical study of 20 commercial and open-source LLMs probed on 182 religion-pair queries about hypothetical faith transitions. Using a human-verified LLM-as-a-judge framework, the authors find reproducible asymmetries: models on average favor conversions toward Catholic, Bahá'í, and Sikh faiths (high encouragement to join, low encouragement to leave) while disfavoring transitions involving Atheism, Agnosticism, and Jehovah's Witnesses. Patterns vary by model and provider (strongest in Grok 4.20) but persist across phrasings and are claimed not to be scoring artifacts.

Significance. If the measured asymmetries reflect the probed models' output distributions rather than downstream judge artifacts, the work documents a reproducible form of value-laden bias in LLMs on sensitive personal-advice domains. The multi-model scope and human-verification step are strengths; however, the absence of detailed prompting, rubric, and statistical controls in the reported methods limits the strength of the robustness claim.

major comments (2)

[Methods] Methods section: the description of the LLM-as-a-judge prompt, scoring rubric, and human-verification protocol is insufficient to evaluate whether the judge introduces religion-correlated lexical biases that could produce or amplify the reported pattern (Catholic/Bahá'í/Sikh favored, Atheist/Agnostic/JW disfavored). No inter-annotator agreement statistics or ablation replacing the judge with full human scoring on the full set are provided, which is load-bearing for the central claim that asymmetries reside in the probed models.
[Abstract and Results] Abstract and Results: the assertion that 'results are not an artifact of how the models' answers were scored' and that asymmetries 'persist across multiple question phrasings' lacks quantitative support such as effect-size comparisons, statistical tests for phrasing invariance, or exclusion criteria for outlier responses. Without these, it is impossible to determine whether the asymmetries survive basic robustness checks.

minor comments (2)

[Abstract] The abstract states '182 religion pairings' but does not list the exact set of religions or the pairing construction method; a table or appendix listing the 182 pairs would improve reproducibility.
[Results] Model-size and provider variation is mentioned but not accompanied by a table breaking down asymmetry strength by model family or parameter count.
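On the first minor comment: 182 equals 14 × 13, which is the number of ordered pairs that 14 distinct religions would produce. That reading is an inference from the count alone, since the set of religions is not listed here. Under that assumption, the pairing construction could be sketched as follows, with `ordered_pairs` and the placeholder labels `R01` to `R14` being hypothetical names.

```python
from itertools import permutations


def ordered_pairs(labels):
    """All (source, destination) pairs with source != destination."""
    unique = list(dict.fromkeys(labels))  # drop duplicates, keep order
    return list(permutations(unique, 2))


# Placeholder labels only; the actual religions are not listed in the text.
placeholder_religions = [f"R{i:02d}" for i in range(1, 15)]
pairs = ordered_pairs(placeholder_religions)
print(len(pairs))  # 182
```

Because every pair appears in both directions, each A->B query has a matching B->A query, which is what a reversal test requires.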

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback. We address each major comment below and commit to revisions that strengthen the methodological transparency and quantitative robustness of the claims.

Referee: [Methods] Methods section: the description of the LLM-as-a-judge prompt, scoring rubric, and human-verification protocol is insufficient to evaluate whether the judge introduces religion-correlated lexical biases that could produce or amplify the reported pattern (Catholic/Bahá'í/Sikh favored, Atheist/Agnostic/JW disfavored). No inter-annotator agreement statistics or ablation replacing the judge with full human scoring on the full set are provided, which is load-bearing for the central claim that asymmetries reside in the probed models.

Authors: We agree the original Methods section lacked sufficient detail. In revision we will add the complete LLM-as-a-judge prompts, the full scoring rubric with anchor examples, and a precise description of the human-verification protocol. We will also report inter-annotator agreement (Fleiss' kappa) on a 100-response subset double-annotated by three humans. A full human re-scoring of every response is not feasible at the scale of the study; instead we will add an ablation on a stratified 200-response sample comparing judge scores to human scores, confirming high agreement and absence of religion-correlated systematic discrepancies. (revision: partial)
Referee: [Abstract and Results] Abstract and Results: the assertion that 'results are not an artifact of how the models' answers were scored' and that asymmetries 'persist across multiple question phrasings' lacks quantitative support such as effect-size comparisons, statistical tests for phrasing invariance, or exclusion criteria for outlier responses. Without these, it is impossible to determine whether the asymmetries survive basic robustness checks.

Authors: We accept that the original text would be strengthened by explicit quantitative evidence. The revised Results section will include standardized effect-size comparisons (Cohen's d) between primary and alternative phrasings, statistical tests (repeated-measures ANOVA with post-hoc contrasts) for phrasing invariance, and documented outlier detection/exclusion criteria together with sensitivity analyses demonstrating that the reported asymmetries remain statistically significant after outlier removal. (revision: yes)
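For readers who want to see what two of the promised statistics involve, here is a minimal standard-library sketch of Fleiss' kappa and Cohen's d. It is not the authors' analysis code. The input layouts are assumptions: `fleiss_kappa` expects a table of per-item category counts, and `cohens_d` uses the pooled-standard-deviation form for two independent samples, which may differ from the variant the authors choose for repeated phrasings of the same pairs.

```python
from math import sqrt
from statistics import mean, variance


def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = raters who put item i in category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    # Observed agreement: per-item pairwise agreement, averaged over items.
    p_bar = mean([
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ])
    # Chance agreement from the pooled category proportions.
    n_cats = len(counts[0])
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_exp) / (1 - p_exp)


def cohens_d(x, y):
    """Cohen's d with pooled sample standard deviation (independent samples)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)
```

With three annotators, each row of `counts` sums to 3. A kappa near 1 indicates agreement well above chance, while a Cohen's d near 0 between phrasings would be consistent with phrasing invariance.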

Circularity Check

0 steps flagged

No circularity: empirical measurement study with independent observations

This paper reports an empirical study that probes multiple LLMs with 182 religion-pair queries, scores responses via a human-verified LLM-as-a-judge framework, and aggregates observed asymmetries in language use. No equations, derivations, fitted parameters, or predictions appear in the provided text. The central claim rests on repeatable patterns across models, phrasings, and trials rather than any self-referential reduction, self-citation chain, or ansatz. The setup is self-contained against external benchmarks (human verification and cross-model consistency), so no load-bearing step reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

This is an empirical measurement paper; no mathematical free parameters, invented physical entities, or non-standard axioms are introduced in the abstract.

axioms (1)

domain assumption: LLM-generated text can be reliably scored for encouragement versus discouragement by another LLM after human verification of the judge.
The paper relies on an LLM-as-a-judge framework to quantify asymmetries.

pith-pipeline@v0.9.0 · 5796 in / 1308 out tokens · 19213 ms · 2026-05-25T05:48:03.977548+00:00 · methodology
