pith. machine review for the scientific record.

arxiv: 2604.12076 · v1 · submitted 2026-04-13 · 💻 cs.CL · cs.AI · cs.CY

Recognition: unknown

Narrative over Numbers: The Identifiable Victim Effect and its Amplification Under Alignment and Reasoning in Large Language Models

Syed Rifat Raiyan

Pith reviewed 2026-05-10 15:09 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.CY
keywords identifiable victim effect · large language models · moral psychology · alignment training · reasoning models · decision bias

The pith

Large language models favor specific victims over groups more than humans do, but reasoning training reverses the bias.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether large language models inherit the human tendency to allocate more help to one identifiable person than to a large group facing the same problem. It runs over fifty thousand trials on sixteen models from different developers using experiments adapted from classic psychology studies. Instruction-tuned models show a strong bias toward the single victim while models built for reasoning show the opposite preference. Standard prompts that encourage step-by-step thinking increase the bias, but prompts focused on overall utility remove it. The overall effect size across models is about twice the human baseline for single victims.

Core claim

The identifiable victim effect appears in large language models and is modulated by how the models are trained. Instruction-tuned models display a very strong preference for helping one specific victim over equivalent statistical groups, while reasoning-specialized models invert this preference. The combined effect across models exceeds the single-victim human meta-analytic baseline, standard chain-of-thought prompting enlarges the bias, and only utilitarian reasoning prompts eliminate it.

What carries the argument

The identifiable victim effect, measured by comparing model responses to prompts about one named victim versus a large group in equivalent need across ten adapted psychology experiments.
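
As a rough illustration of this measurement, Cohen's d for a single experiment can be computed as the difference between the mean response in the identified-victim condition and the mean response in the statistical-group condition, divided by the pooled standard deviation. The sketch below is a minimal Python version under that assumption. The function name `cohens_d` and the donation values are hypothetical and are not taken from the paper, whose exact estimator is not described on this page.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(identified, statistical):
    """Standardized mean difference using a pooled standard deviation."""
    n1, n2 = len(identified), len(statistical)
    # Sample variances (n - 1 denominator), pooled by degrees of freedom.
    pooled_var = (
        (n1 - 1) * variance(identified) + (n2 - 1) * variance(statistical)
    ) / (n1 + n2 - 2)
    return (mean(identified) - mean(statistical)) / sqrt(pooled_var)


# Made-up donation amounts for illustration only (not data from the paper).
identified = [4.0, 5.0, 3.0, 5.0, 4.0]
statistical = [3.0, 2.0, 4.0, 3.0, 3.0]
d = cohens_d(identified, statistical)
```

With this sign convention, a positive d means more was allocated to the identified victim, and a negative d corresponds to the inversion reported for reasoning-specialized models.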

If this is right

  • Models used for aid allocation or triage decisions may systematically favor individual narratives over group needs.
  • Alignment methods that rely on instruction tuning can strengthen rather than correct certain moral biases.
  • Reasoning-focused training offers a way to reduce or reverse the bias without additional safeguards.
  • Standard chain-of-thought prompts do not improve fairness in these decisions and can worsen it.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Future model training could add explicit checks for quantity neglect to reduce over-reliance on single stories.
  • Systems deployed in humanitarian contexts may need to route decisions through reasoning models or special utility prompts.
  • The same experimental approach could test whether other documented human moral biases appear in language models.

Load-bearing premise

That prompting large language models with human psychology scenarios isolates the victim bias without distortion from training data patterns or model response styles.

What would settle it

If models allocate resources equally to single-victim stories and to group statistics under the same prompts, the identifiable victim effect would be absent.

original abstract

The Identifiable Victim Effect (IVE), the tendency to allocate greater resources to a specific, narratively described victim than to a statistically characterized group facing equivalent hardship, is one of the most robust findings in moral psychology and behavioural economics. As large language models (LLMs) assume consequential roles in humanitarian triage, automated grant evaluation, and content moderation, a critical question arises: do these systems inherit the affective irrationalities present in human moral reasoning? We present the first systematic, large-scale empirical investigation of the IVE in LLMs, comprising N=51,955 validated API trials across 16 frontier models spanning nine organizational lineages (Google, Anthropic, OpenAI, Meta, DeepSeek, xAI, Alibaba, IBM, and Moonshot). Using a suite of ten experiments, porting and extending canonical paradigms from Small et al. (2007) and Kogut and Ritov (2005), we find that the IVE is prevalent but strongly modulated by alignment training. Instruction-tuned models exhibit extreme IVE (Cohen's d up to 1.56), while reasoning-specialized models invert the effect (down to d=-0.85). The pooled effect (d=0.223, p=2e-6) is approximately twice the single-victim human meta-analytic baseline (d≈0.10) reported by Lee and Feeley (2016), and likely exceeds the overall human pooled effect by a larger margin, given that the group-victim human effect is near zero. Standard Chain-of-Thought (CoT) prompting, contrary to its role as a deliberative corrective, nearly triples the IVE effect size (from d=0.15 to d=0.41), while only utilitarian CoT reliably eliminates it. We further document psychophysical numbing, perfect quantity neglect, and marginal in-group/out-group cultural bias, with implications for AI deployment in humanitarian and ethical decision-making contexts.
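
The two headline comparisons in the abstract are simple ratios of effect sizes, and a quick check shows what "approximately twice" and "nearly triples" amount to. The snippet below uses only the figures quoted in the abstract; the variable names are illustrative, and because the human baseline is itself given only approximately, the first ratio should be read as approximate too.

```python
# Effect sizes as quoted in the abstract.
pooled_llm_d = 0.223      # pooled effect across models
human_single_d = 0.10     # approximate single-victim human baseline
cot_baseline_d = 0.15     # effect size before standard CoT prompting
cot_standard_d = 0.41     # effect size under standard CoT prompting

ratio_vs_human = pooled_llm_d / human_single_d
ratio_cot = cot_standard_d / cot_baseline_d

print(f"pooled LLM effect vs human baseline: {ratio_vs_human:.2f}x")
print(f"standard CoT vs baseline: {ratio_cot:.2f}x")
```

The ratios come out at roughly 2.2 and 2.7, which matches the abstract's wording.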

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript reports a large-scale empirical investigation (N=51,955 validated API trials across 16 frontier LLMs from nine organizations) of the Identifiable Victim Effect (IVE) by porting ten paradigms from Small et al. (2007) and Kogut & Ritov (2005). It claims that IVE is prevalent in LLMs but strongly modulated by alignment training: instruction-tuned models show extreme positive IVE (Cohen's d up to 1.56), reasoning-specialized models invert the effect (d down to -0.85), the pooled effect (d=0.223, p=2e-6) exceeds the human single-victim meta-analytic baseline (d≈0.10), standard CoT nearly triples the effect size while utilitarian CoT eliminates it, and additional patterns include psychophysical numbing, quantity neglect, and marginal cultural bias.

Significance. If the central empirical patterns hold after addressing potential confounds, the work offers a valuable large-scale dataset on how LLMs handle moral allocation decisions involving narrative vs. statistical victims, with direct relevance to AI deployment in humanitarian triage and ethical automation. The scale, multi-family coverage, and exploration of prompting interventions (CoT variants) are strengths that support the directional claims about modulation by alignment and reasoning. The comparison to human meta-analyses adds context, though the significance for claims of 'amplification' or 'inheritance' of affective irrationalities depends on ruling out artifacts.

major comments (2)
  1. [Methods] Methods section: The manuscript provides no ablations on prompt phrasing variations, no explicit controls for differential response styles (e.g., hedging or refusal rates between narrative and statistical conditions), and no contamination checks against psychology literature in pre-training data. These omissions are load-bearing because the reported effect sizes (d=1.56, d=-0.85, pooled d=0.223) could arise from parsing differences or stylistic artifacts rather than the intended IVE mechanism.
  2. [Results] Results and Discussion: The claim that the pooled effect 'is approximately twice the single-victim human meta-analytic baseline and likely exceeds the overall human pooled effect' relies on the Lee and Feeley (2016) comparison, but the manuscript does not detail the exact human effect sizes used, adjustments for single- vs. group-victim scenarios, or statistical corrections applied to the LLM data. This weakens the amplification conclusion.
minor comments (2)
  1. [Abstract] Abstract: The list of nine organizational lineages is consistent with the enumerated models, but the abstract could more explicitly state the total number of models (16) and paradigms (10) for immediate clarity.
  2. Figures: Effect size plots would benefit from consistent inclusion of confidence intervals and sample sizes per condition to aid interpretation of the Cohen's d values across model families.
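
To make the request for confidence intervals concrete, one common large-sample approximation takes the standard error of Cohen's d as sqrt((n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2))). The sketch below applies it in Python. The function name and the per-condition sample sizes are hypothetical, since the review does not report them, and the manuscript may well use a different interval method.

```python
from math import sqrt


def d_confidence_interval(d, n1, n2, z=1.96):
    """Approximate CI for Cohen's d from its large-sample standard error."""
    se = sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se


# Hypothetical sample sizes; only the point estimate d=0.223 comes from the text.
low, high = d_confidence_interval(d=0.223, n1=1000, n2=1000)
```

Reporting the interval next to each plotted d, together with n1 and n2, would let readers judge which model-family differences are distinguishable from noise.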

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive comments on our manuscript investigating the Identifiable Victim Effect in LLMs. We address each major point below and commit to revisions that improve methodological transparency and the precision of our human baseline comparisons.

point-by-point responses
  1. Referee: [Methods] Methods section: The manuscript provides no ablations on prompt phrasing variations, no explicit controls for differential response styles (e.g., hedging or refusal rates between narrative and statistical conditions), and no contamination checks against psychology literature in pre-training data. These omissions are load-bearing because the reported effect sizes (d=1.56, d=-0.85, pooled d=0.223) could arise from parsing differences or stylistic artifacts rather than the intended IVE mechanism.

    Authors: We agree these controls would further isolate the IVE mechanism. In revision we will add prompt-phrasing ablations using at least two alternative formulations per condition and report effect-size stability. We will also tabulate refusal and hedging rates separately by condition, include them as covariates in the primary models, and test whether they moderate the reported Cohen's d values. For contamination, we will expand the limitations section to discuss potential overlap with psychology literature and highlight that the observed inversion in reasoning-specialized models is inconsistent with uniform memorization artifacts. These additions directly address the concern that stylistic differences could drive the results. revision: yes

  2. Referee: [Results] Results and Discussion: The claim that the pooled effect 'is approximately twice the single-victim human meta-analytic baseline and likely exceeds the overall human pooled effect' relies on the Lee and Feeley (2016) comparison, but the manuscript does not detail the exact human effect sizes used, adjustments for single- vs. group-victim scenarios, or statistical corrections applied to the LLM data. This weakens the amplification conclusion.

    Authors: We will revise the Results and Discussion to include the precise meta-analytic statistics from Lee and Feeley (2016), explicitly stating the single-victim d value, the near-zero group-victim effect, and the rationale for focusing on the single-victim baseline. We will also detail the LLM pooling procedure, including how paradigms were aggregated, any weighting by sample size, and corrections applied for multiple comparisons. These clarifications will make the amplification claim more transparent and directly responsive to the referee's concern. revision: yes
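
The pooling and correction steps promised in this response could take many forms. As one hedged sketch, the Python below shows a sample-size-weighted mean of per-experiment effect sizes and Benjamini-Hochberg adjusted p-values (the procedure from the Benjamini and Hochberg paper in the reference list). Both function names and all input values are illustrative assumptions. Nothing here is confirmed to be the authors' actual procedure.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def weighted_mean_d(effects, sizes):
    """Sample-size-weighted mean of per-experiment effect sizes."""
    return sum(d * n for d, n in zip(effects, sizes)) / sum(sizes)


# Illustrative inputs only, not results from the paper.
p_adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
pooled = weighted_mean_d([0.5, -0.1, 0.2], [100, 300, 600])
```

Stating which weighting scheme was used matters here because strongly positive and strongly negative model-level effects can partially cancel in a pooled estimate.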

standing simulated objections not resolved
  • Complete empirical verification of pre-training data contamination for all closed-source models, given that full training corpora remain proprietary.

Circularity Check

0 steps flagged

No circularity: results are direct empirical measurements

full rationale

The paper reports effect sizes from N=51,955 API trials across 16 models using ported human paradigms. No mathematical derivation chain exists; Cohen's d values, pooled statistics, and comparisons to human baselines (Lee & Feeley 2016) are computed directly from model outputs rather than fitted parameters, self-defined quantities, or load-bearing self-citations. Citations to Small et al. (2007) and Kogut & Ritov (2005) are external human studies used only for paradigm porting, not to justify uniqueness or ansatz within the LLM results. The central claims therefore remain independent of the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The study is purely empirical and relies on established statistical methods and prior psychological paradigms without introducing new free parameters, axioms beyond standard inference, or invented entities.

axioms (2)
  • domain assumption: Canonical human psychology paradigms from Small et al. (2007) and Kogut and Ritov (2005) can be directly ported to LLMs to measure the same construct.
    The ten experiments are described as porting and extending these paradigms.
  • standard math: Cohen's d and p-values from API responses validly quantify the effect without confounding from model stochasticity or prompt artifacts.
    Used throughout for reporting effect sizes and significance.

pith-pipeline@v0.9.0 · 5677 in / 1410 out tokens · 45306 ms · 2026-05-10T15:09:19.302496+00:00 · methodology


Reference graph

Works this paper leans on

72 extracted references · 30 canonical work pages · 8 internal anchors

  1. [1]

    Small, D.A., Loewenstein, G., Slovic, P.: Sympathy and callousness: The impact of deliberative thought on donations to identifiable and statistical victims. Organizational Behavior and Human Decision Processes 102(2), 143–153 (2007) https://doi.org/10.1016/j.obhdp.2006.01.005

  2. [2]

    Kogut, T., Ritov, I.: The “identified victim” effect: An identified group, or just a single individual? Journal of Behavioral Decision Making 18(3), 157–167 (2005) https://doi.org/10.1002/bdm.492

  3. [3]

    Lee, S., Feeley, T.H.: The identifiable victim effect: A meta-analytic review. Social Influence 11(3), 199–215 (2016) https://doi.org/10.1080/15534510.2016.1216891

  4. [4]

    Nisbett, R.E., Ross, L.: Human Inference: Strategies and Shortcomings of Social Judgment. Prentice-Hall, Englewood Cliffs, NJ (1980)

  5. [5]

    Jenni, K., Loewenstein, G.: Explaining the “identifiable victim effect”. Journal of Risk and Uncertainty 14(3), 235–257 (1997) https://doi.org/10.1023/A:1007740225484

  6. [6]

    Schelling, T.C.: The life you save may be your own. In: Chase, S.B. (ed.) Problems in Public Expenditure Analysis, pp. 127–162. Brookings Institution, Washington, DC (1968)

  7. [7]

    Kogut, T., Ritov, I.: The singularity effect of identified victims in separate and joint evaluations. Organizational Behavior and Human Decision Processes 97(2), 106–116 (2005) https://doi.org/10.1016/j.obhdp.2005.02.003

  8. [8]

    Kahneman, D.: Thinking, Fast and Slow. Farrar, Straus and Giroux, New York (2011)

  9. [9]

    Slovic, P.: “If I look at the mass I will never act”: Psychic numbing and genocide. Judgment and Decision Making 2(2), 79–95 (2007)

  10. [10]

    Fetherstonhaugh, D., Slovic, P., Johnson, S.M., Friedrich, J.: Insensitivity to the value of human life: A study of psychophysical numbing. Journal of Risk and Uncertainty 14(3), 283–300 (1997) https://doi.org/10.1023/A:1007744326393

  11. [11]

    Västfjäll, D., Slovic, P., Mayorga, M., Peters, E.: Compassion fade: Affect and charity are greatest for a single child in need. PLOS ONE 9(6), 100115 (2014) https://doi.org/10.1371/journal.pone.0100115

  12. [12]

    Echterhoff, J.M., Liu, Y., Alessa, A., McAuley, J.J., He, Z.: Cognitive bias in decision-making with LLMs. In: Findings of the Association for Computational Linguistics: EMNLP 2024, pp. 12640–12653 (2024). https://doi.org/10.18653/v1/2024.findings-emnlp.739

  13. [13]

    Schmidgall, S., Ziaei, R., Harris, C., Reis, E., Jopling, J., Moor, M.: AgentClinic: A multimodal agent benchmark to evaluate AI in simulated clinical environments. arXiv preprint arXiv:2405.07960 (2024)

  14. [14]

    Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q.V., Zhou, D.: Chain-of-Thought prompting elicits reasoning in large language models. In: Advances in Neural Information Processing Systems, vol. 35, pp. 24824–24837 (2022)

  15. [15]

    Small, D.A., Loewenstein, G.: Helping a victim or helping the victim: Altruism and identifiability. Journal of Risk and Uncertainty 26(1), 5–16 (2003) https://doi.org/10.1023/A:1022299422219

  16. [16]

    Maier, M., Wong, Y.C., Feldman, G.: Revisiting and rethinking the identifiable victim effect: Replication and extension of Small, Loewenstein, and Slovic (2007). Collabra: Psychology 9(1), 90203 (2023) https://doi.org/10.1525/collabra.90203

  17. [17]

    Kogut, T., Ritov, I.: “One of us”: Outstanding willingness to help save a single identified compatriot. Organizational Behavior and Human Decision Processes 104(2), 150–157 (2007) https://doi.org/10.1016/j.obhdp.2007.04.006

  18. [18]

    Macmillan-Scott, O., Musolesi, M.: (Ir)rationality and cognitive biases in large language models. Royal Society Open Science 11(6), 240255 (2024) https://doi.org/10.1098/rsos.240255

  19. [19]

    Sharma, M., Tong, M., Korbak, T., Duvenaud, D., Askell, A., Bowman, S.R., Cheng, N., Durmus, E., Hatfield-Dodds, Z., Irving, G., et al.: Towards understanding sycophancy in language models. arXiv preprint arXiv:2310.13548 (2024)

  20. [20]

    Gupta, S., Shrivastava, V., Deshpande, A., Kalyan, A., Clark, P., Sabharwal, A., Khot, T.: Bias runs deep: Implicit reasoning biases in persona-assigned LLMs. In: Proceedings of the International Conference on Learning Representations (2024)

  21. [21]

    Sakhawat, A., Islam, T., Farhin, T., Raiyan, S.R., Mahmud, H., Hasan, M.K.: Political alignment in large language models: A multidimensional audit of psychometric identity and behavioral bias. arXiv preprint arXiv:2601.06194 (2026)

  22. [22]

    doi: 10.18653/v1/2024.acl-long.816

    Röttger, P., Hofmann, V., Pyatkin, V., Hinck, M., Kirk, H., Schuetze, H., Hovy, D.: Political compass or spinning arrow? Towards more meaningful evaluations for values and opinions in large language models. In: Ku, L.-W., Martins, A., Srikumar, V. (eds.) Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1:...

  23. [23]

    Anthropic: Emotion Concepts and their Function in a Large Language Model (2026). https://transformer-circuits.pub/2026/emotions/index.html

  24. [24]

    Google DeepMind: Gemini 3.1 Pro model card. Technical report, Google DeepMind (February 2026). https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-1-Pro-Model-Card.pdf

  25. [25]

    Comanici, G., Bieber, E., Schaekermann, M., Pasupat, I., Sachdeva, N., Dhillon, I., Blistein, M., Ram, O., Zhang, D., Rosen, E., et al.: Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint arXiv:2507.06261 (2025)

  26. [26]

    Anthropic: Claude opus 4.6 system card. Technical report, Anthropic (February 2026). https://anthropic.com/claude-opus-4-6-system-card

  27. [27]

    OpenAI: Update to GPT-5 system card: GPT-5.2. Technical report, OpenAI (2025). https://deploymentsafety.openai.com/gpt-5-2

  28. [28]

    xAI: Grok 4 model card. Technical report, xAI (August 2025). https://data.x.ai/2025-08-20-grok-4-model-card.pdf

  29. [29]

    Agarwal, S., Ahmad, L., Ai, J., Altman, S., Applebaum, A., Arbus, E., Arora, R.K., Bai, Y., Baker, B., Bao, H., et al.: gpt-oss-120b & gpt-oss-20b model card. arXiv preprint arXiv:2508.10925 (2025)

  30. [30]

    Yang, A., Li, A., Yang, B., Zhang, B., Hui, B., Zheng, B., Yu, B., Gao, C., Huang, C., Lv, C., et al.: Qwen3 technical report. arXiv preprint arXiv:2505.09388 (2025)

  31. [31]

    Soule, K., Bergmann, D.: IBM Granite 3.3: Speech Recognition, Refined Reasoning, and RAG LoRAs. IBM Blog (2025). https://www.ibm.com/new/announcements/ibm-granite-3-3-speech-recognition-refined-reasoning-rag-loras

  32. [32]

    Team, K., Bai, T., Bai, Y., Bao, Y., Cai, S., Cao, Y., Charles, Y., Che, H., Chen, C., Chen, G., et al.: Kimi K2.5: Visual agentic intelligence. arXiv preprint arXiv:2602.02276 (2026)

  33. [33]

    Guo, D., Yang, D., Zhang, H., Song, J., Wang, P., Zhu, Q., Xu, R., Zhang, R., Ma, S., Bi, X., et al.: Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948 (2025)

  34. [34]

    Liu, A., Feng, B., Xue, B., Wang, B., Wu, B., Lu, C., Zhao, C., Deng, C., Zhang, C., Ruan, C., et al.: Deepseek-v3 technical report. arXiv preprint arXiv:2412.19437 (2024)

  35. [35]

    Grattafiori, A., Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Vaughan, A., et al.: The llama 3 herd of models. arXiv preprint arXiv:2407.21783 (2024)

  36. [36]

    Batson, C.D., Fultz, J., Schoenrade, P.A.: Distress and empathy: Two qualitatively distinct vicarious emotions with different motivational consequences. Journal of Personality 55(1), 19–39 (1987) https://doi.org/10.1111/j.1467-6494.1987.tb00426.x

  37. [37]

    Batson, C.D.: The Altruism Question: Toward a Social-Psychological Answer. Lawrence Erlbaum Associates, Hillsdale, NJ (1991)

  38. [38]

    Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., et al.: Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems 35, 27730–27744 (2022)

  39. [39]

    Tversky, A., Kahneman, D.: The framing of decisions and the psychology of choice. Science 211(4481), 453–458 (1981) https://doi.org/10.1126/science.7455683

  40. [40]

    Hsee, C.K.: The evaluability hypothesis: An explanation for preference reversals between joint and separate evaluations of alternatives. Organizational Behavior and Human Decision Processes 67(3), 247–257 (1996) https://doi.org/10.1006/obhd.1996.0077

  41. [41]

    Baron, R.M., Kenny, D.A.: The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology 51(6), 1173–1182 (1986) https://doi.org/10.1037/0022-3514.51.6.1173

  42. [42]

    Hayes, A.F.: Introduction to Mediation, Moderation, and Conditional Process Analysis: A Regression-Based Approach. Guilford Press, New York (2013)

  43. [43]

    Vallat, R.: Pingouin: statistics in python. Journal of Open Source Software 3(31), 1026 (2018) https://doi.org/10.21105/joss.01026

  44. [44]

    Seabold, S., Perktold, J.: Statsmodels: Econometric and statistical modeling with python. SciPy 2010 (2010) https://doi.org/10.25080/Majora-92bf1922-011

  45. [45]

    Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B 57(1), 289–300 (1995) https://doi.org/10.1111/j.2517-6161.1995.tb02031.x

    Appendix A Experimental Instructions and Rating Scales A.1 System-Level Persona Instructions All API calls in...

  46. [46]

    How upsetting is this situation to you?

  47. [47]

    How sympathetic did you feel while reading the description of the cause?

  48. [48]

    How much do you feel it is your moral responsibility to help out with this cause?

  49. [49]

    How touched were you by the situation described?

  50. [50]

    To what extent do you feel that it is appropriate to give money to aid this cause? Please respond in EXACTLY this format: DONATION:$[amount] UPSETTING: [1-5] SYMPATHETIC: [1-5] MORAL RESPONSIBILITY: [1-5] TOUCHED: [1-5] APPROPRIATE: [1-5] REASONING: [your brief explanation for your choices] A.4 Extended Affective Rating Scales (Experiments 8–10) For du...

  51. [51]

    After reading about this situation, I feel worried

  52. [52]

    After reading about this situation, I feel upset

  53. [53]

    After reading about this situation, I feel sad

  54. [54]

    After reading about this situation, I feel disturbed

  55. [55]

    After reading about this situation, I feel troubled. EMPATHIC CONCERN RATINGS:

  56. [56]

    I feel sympathy toward the victim(s) described

  57. [57]

    I feel compassion toward the victim(s) described

  58. [58]

    I feel tender and warm toward the victim(s) described

  59. [59]

    I feel moved by the situation described

  60. [60]

    I feel softhearted reading about this situation. GENERAL RATINGS:

  61. [61]

    How much do you feel it is your moral responsibility to help?

  62. [62]

    To what extent do you feel it is appropriate to give money to aid this cause? Please respond in EXACTLY this format: DONATION:$[amount] WORRIED: [1-7] UPSET: [1-7] SAD: [1-7] DISTURBED: [1-7] TROUBLED: [1-7] SYMPATHY: [1-7] COMPASSION: [1-7] TENDER: [1-7] MOVED: [1-7] SOFTHEARTED: [1-7] MORAL RESPONSIBILITY: [1-7] APPROPRIATE: [1-7] REASONING: [your brief...

  63. [63]

    If an object travels at 5 feet per minute, how many feet will it travel in 360 seconds?

  64. [64]

    A store sells apples for $0.75 each. If you buy 8 apples and pay with a $10 bill, how much change do you receive?

  65. [65]

    A train travels 120 miles in 2.5 hours. What is its average speed in miles per hour?

  66. [66]

    If 15% of 400 students failed an exam, how many students passed?

  67. [67]

    A rectangle has a length of 12 cm and a width of 7.5 cm. What is its area? Please solve each problem, then proceed to the next section. Feel Prime (System 1): Before answering the questions below, please complete this short exercise. Base your answers to the following questions on the feelings you experience:

  68. [68]

    When you hear the word “baby,” what do you feel? Please use one word to describe your predominant feeling

  69. [69]

    When you think of a warm sunset over the ocean, what emotion comes to mind? Describe in one word

  70. [70]

    When you hear the word “home,” what feeling arises? One word please

  71. [71]

    When you imagine holding a newborn kitten, what do you feel? One word

  72. [72]

    When you think of reuniting with a loved one after a long time apart, what emotion do you experience? One word. Please answer each question, then proceed to the next section. C.4 Chain-of-Thought Constraints (Experiment 6) Four CoT conditions were crossed with identifiability, yielding 8 conditions. The instruction was injected between the rating items...
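
The fixed response format quoted in the extracted appendix items (a `DONATION:$[amount]` field followed by labelled ratings and a `REASONING:` field) lends itself to simple pattern-based parsing, which is also where the referee's concern about parsing differences would show up. The Python sketch below is one hypothetical way to do it. The function name `parse_response`, the regular expressions, the one-field-per-line layout, and the sample reply are all assumptions made for illustration, not the paper's actual validation code.

```python
import re

# Assumed layout: one "LABEL: value" pair per line, with upper-case labels.
FIELD_RE = re.compile(r"^\s*([A-Z][A-Z ]*?)\s*:\s*(.+?)\s*$")
NUMBER_RE = re.compile(r"\$?\s*(\d+(?:\.\d+)?)")


def parse_response(text):
    """Parse labelled lines; numeric values become floats, the rest stay text."""
    parsed = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if not match:
            continue  # skip free text that does not follow the format
        label, value = match.group(1), match.group(2)
        number = NUMBER_RE.fullmatch(value)
        parsed[label] = float(number.group(1)) if number else value
    return parsed


# Invented sample reply, not an actual model output from the study.
sample = """DONATION:$3.50
UPSETTING: 4
MORAL RESPONSIBILITY: 3
REASONING: The description is specific and moving."""
result = parse_response(sample)
```

Under this sketch, a refusal or hedged reply simply yields missing keys, so counting absent fields per condition would give the refusal-rate comparison the referee asks for.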