pith. machine review for the scientific record

arxiv: 2605.11793 · v1 · submitted 2026-05-12 · 💻 cs.HC

Recognition: no theorem link

Psychological Benefits and Costs of Diversifying Algorithmic Recourse


Pith reviewed 2026-05-13 05:40 UTC · model grok-4.3

classification 💻 cs.HC
keywords algorithmic recourse · diversification · psychological benefits · cognitive load · counterfactual explanations · AI decision making · user motivation · human-AI interaction

The pith

Diversifying algorithmic recourse sets boosts willingness to act on small sets without raising psychological costs, but makes cognitive load more noticeable on large sets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the psychological trade-off when algorithmic recourse systems offer multiple counterfactual action plans instead of one. Researchers ran a controlled online experiment with 750 participants who saw recourse sets that varied in both diversity and total size, then measured self-reported willingness to act, cognitive load, and negative emotions. Results indicate that adding diversity to small sets increases motivation to follow the plans without extra mental burden, while the same diversification in large sets makes the required thinking feel heavier. This pattern suggests that blanket diversification strategies can create net costs for users rather than uniform gains. The work therefore calls for new recourse-generation methods that respect limits on human cognition and attention.

Core claim

Diversification of recourse sets improves psychological benefits such as willingness to act for small sets without incurring additional psychological costs, whereas for large sets it makes cognitive load more salient.

What carries the argument

Between-subjects experiment that independently manipulates recourse-set diversity and set size, then collects self-reported measures of willingness to act, cognitive load, and negative emotions.
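The two-factor machinery can be made concrete. For a balanced design, the interaction test this argument rests on reduces to a few sums of squares; below is a stdlib-only sketch on toy scores (the study's actual cells are unbalanced and incomplete, so the real analysis is more involved than this):

```python
from statistics import mean

def two_way_anova(cells):
    """Balanced two-way ANOVA from {(a_level, b_level): [scores]}.
    Returns F statistics for factor A, factor B, and the A x B
    interaction. A textbook computation, not the paper's analysis code."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # per-cell sample size (balanced)
    grand = mean(y for ys in cells.values() for y in ys)
    a_mean = {a: mean(y for (ai, _), ys in cells.items() if ai == a for y in ys)
              for a in a_levels}
    b_mean = {b: mean(y for (_, bj), ys in cells.items() if bj == b for y in ys)
              for b in b_levels}
    cell_mean = {k: mean(v) for k, v in cells.items()}
    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum((cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
                    for a in a_levels for b in b_levels)
    ss_err = sum((y - cell_mean[k]) ** 2 for k, ys in cells.items() for y in ys)
    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_err = len(a_levels) * len(b_levels) * (n - 1)
    mse = ss_err / df_err
    return {"F_A": ss_a / df_a / mse,
            "F_B": ss_b / df_b / mse,
            "F_AxB": ss_ab / (df_a * df_b) / mse}

# Toy pattern with a pure crossover interaction and no main effects:
toy = {("close", 3): [2, 4], ("close", 7): [0, 2],
       ("diverse", 3): [0, 2], ("diverse", 7): [2, 4]}
print(two_way_anova(toy))  # F_A = F_B = 0.0, F_AxB = 4.0
```

The crossover cells illustrate why main effects alone can miss the paper's finding: averaged over sizes, diversification looks neutral, and only the interaction term captures "helps when small, costs when large."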

If this is right

  • Diversifying small recourse sets can raise users' motivation to act on the provided plans.
  • Diversifying large recourse sets increases the salience of cognitive load for decision subjects.
  • Naive diversification without regard to set size can produce net psychological costs rather than benefits.
  • Recourse algorithms need new diversification methods that incorporate human cognitive limits.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The observed size-by-diversity interaction may appear in other AI transparency tools that present multiple explanations or options.
  • Deployment studies could measure whether the reported willingness-to-act difference predicts real behavioral change in high-stakes domains such as credit or hiring.
  • Recourse generators could incorporate lightweight cognitive models to cap diversity once set size grows large enough to trigger load.
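The last bullet can be sketched as a size-aware presentation policy. The cap value below is an illustrative assumption sitting between the paper's k = 3 and k = 7 conditions, not a number the study reports:

```python
def presentation_policy(set_size: int, diversity_cap: int = 4) -> dict:
    """Hypothetical cognition-aware rule: diversify a recourse set only
    while it is small enough that extra variety should not feel heavy.
    diversity_cap = 4 is an assumed threshold between the experiment's
    k = 3 (benefits, no added cost) and k = 7 (load salient) conditions."""
    strategy = "diverse" if set_size <= diversity_cap else "close"
    return {"size": set_size, "strategy": strategy}

# Mapping the experiment's three set sizes through the policy:
for k in (1, 3, 7):
    print(k, presentation_policy(k)["strategy"])  # 1 diverse / 3 diverse / 7 close
```

A deployed generator would presumably learn or calibrate the cap per user and domain rather than hard-coding it, but the structure (diversity gated on set size) follows directly from the reported interaction.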

Load-bearing premise

Self-reported measures collected in a controlled online experiment accurately reflect real-world psychological responses and generalize beyond the specific decision scenarios tested.

What would settle it

A field study that tracks actual follow-through behavior and physiological indicators of cognitive effort when users receive diversified versus non-diversified recourse recommendations in a deployed AI system.

Figures

Figures reproduced from arXiv: 2605.11793 by Naomi Yamashita, Takeshi Kurashima, Tomu Tominaga.

Figure 1: Conceptual difference in counterfactual sample selection.
Figure 2: Overview of the experimental procedure. Participants per condition: Close — 139 (k = 1), 152 (k = 3), 168 (k = 7); Diverse — 126 (k = 3), 165 (k = 7).
Figure 3: Psychological benefits (top row: (A)–(D)) and psychological costs (bottom row: (E)–(H)), reported with two-way ANOVAs.
Original abstract

Algorithmic recourse provides counterfactual action plans that help people overturn unfavorable AI decisions. While diverse recourse sets may improve transparency and motivation, they may also impose cognitive load and negative emotions by increasing counterfactual reasoning demands. To examine this trade-off, we conducted a between-subjects controlled experiment (N=750) that manipulated recourse-set diversity and size, and evaluated these effects on psychological benefits and costs. Results show that diversification enhances psychological benefits (e.g., willingness to act) for small sets without incurring additional psychological costs, whereas for large sets, it makes cognitive load more salient. These findings suggest that naively diversifying recourse can burden decision subjects, underscoring the need for new diversification methods that incorporate human cognition and psychology to mitigate such costs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper reports results from a between-subjects experiment (N=750) that manipulates recourse-set size and diversity in hypothetical algorithmic decision scenarios. It claims an interaction effect: diversification increases psychological benefits such as willingness to act for small sets without raising costs, while for large sets it makes cognitive load more salient. The authors conclude that naive diversification can impose psychological burdens and call for cognition-aware diversification methods.

Significance. If the reported interaction holds under scrutiny, the work supplies empirical evidence on user psychology in algorithmic recourse, a topic of growing interest in HCI and responsible AI. It identifies a concrete design trade-off that could inform how recourse sets are generated and presented, and it correctly flags the risk of overlooking human cognitive limits when optimizing for diversity.

major comments (3)
  1. [Method/Results] Method and Results sections: The central claim rests on self-reported Likert-scale measures of willingness to act, cognitive load, and negative emotions collected in low-stakes hypothetical scenarios. No behavioral outcome (e.g., actual recourse uptake or follow-through) or validation against real-world responses is reported; this directly undermines the assertion that diversification 'enhances psychological benefits without incurring additional psychological costs' for small sets, as demand effects and low ecological validity remain unaddressed.
  2. [Results] Results section: The abstract and summary describe a size-by-diversity interaction but provide no effect sizes, confidence intervals, exact statistical tests, or power analysis. Without these, it is impossible to evaluate whether the observed patterns are practically meaningful or whether the 'makes cognitive load more salient' finding for large sets is robust.
  3. [Discussion] Discussion: The recommendation for 'new diversification methods that incorporate human cognition' is presented as a direct implication, yet the experiment tests only existing diversity manipulations and does not evaluate any proposed cognition-aware method; the leap from observed costs to prescriptive design guidance therefore lacks supporting data.
minor comments (2)
  1. [Abstract] Abstract: The phrasing 'makes cognitive load more salient' is interpretive; replace with a direct description of the measured variable and the direction of the effect.
  2. [Method] The manuscript should include the exact wording of the decision scenarios, the full list of Likert items, and any pre-registration or exclusion criteria to allow replication.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment point by point below, with revisions made where feasible to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Method/Results] Method and Results sections: The central claim rests on self-reported Likert-scale measures of willingness to act, cognitive load, and negative emotions collected in low-stakes hypothetical scenarios. No behavioral outcome (e.g., actual recourse uptake or follow-through) or validation against real-world responses is reported; this directly undermines the assertion that diversification 'enhances psychological benefits without incurring additional psychological costs' for small sets, as demand effects and low ecological validity remain unaddressed.

    Authors: We agree that reliance on self-reported measures in hypothetical scenarios limits ecological validity and leaves open the possibility of demand effects. This controlled design was chosen to isolate the causal effects of set size and diversity while minimizing confounds, which is standard for initial psychological investigations in HCI. We have added a dedicated Limitations subsection in the Discussion that explicitly discusses the absence of behavioral outcomes, potential demand effects, and the need for future studies involving real decision contexts and actual recourse uptake. We cannot, however, incorporate behavioral data into the existing experiment without new data collection. revision: partial

  2. Referee: [Results] Results section: The abstract and summary describe a size-by-diversity interaction but provide no effect sizes, confidence intervals, exact statistical tests, or power analysis. Without these, it is impossible to evaluate whether the observed patterns are practically meaningful or whether the 'makes cognitive load more salient' finding for large sets is robust.

    Authors: We have revised the Results section to report effect sizes (partial eta-squared for the interaction and Cohen's d for follow-up comparisons), 95% confidence intervals around key means and differences, the exact F-statistics, degrees of freedom, and p-values for all tests, and a post-hoc power analysis based on the observed effects. These details strengthen the evaluation of practical significance and robustness and are now integrated into the main text. revision: yes
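For readers checking such a revision, the two effect sizes named here are straightforward to recompute from group summaries. A minimal stdlib sketch with illustrative numbers (not the paper's data):

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_sd = math.sqrt(((na - 1) * variance(group_a) +
                           (nb - 1) * variance(group_b)) / (na + nb - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd

def partial_eta_squared(ss_effect, ss_error):
    """ANOVA effect size: share of variance attributable to one term,
    ignoring all other model terms."""
    return ss_effect / (ss_effect + ss_error)

# Illustrative values only:
print(cohens_d([5, 6, 7], [4, 5, 6]))   # 1.0 with these toy groups
print(partial_eta_squared(10.0, 30.0))  # 0.25
```

With sums of squares in hand (as the revised Results section promises), partial eta-squared for the size-by-diversity interaction is a one-line check of practical significance.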

  3. Referee: [Discussion] Discussion: The recommendation for 'new diversification methods that incorporate human cognition' is presented as a direct implication, yet the experiment tests only existing diversity manipulations and does not evaluate any proposed cognition-aware method; the leap from observed costs to prescriptive design guidance therefore lacks supporting data.

    Authors: We agree that the experiment evaluates only existing diversity manipulations and does not test any new cognition-aware methods. The recommendation is a forward-looking implication drawn from the observed trade-offs rather than a claim of direct empirical support for specific new techniques. We have revised the Discussion and Conclusion to frame the suggestion explicitly as motivation for future research, clarifying that our data highlight the need for such methods without overstating what the current study demonstrates. revision: yes

standing simulated objections not resolved
  • We cannot provide behavioral outcomes, real-world validation, or actual recourse uptake data, as the study was designed and executed as a controlled hypothetical experiment.

Circularity Check

0 steps flagged

No circularity: direct report of between-subjects experiment results

full rationale

The paper describes a controlled experiment (N=750) that manipulates recourse-set diversity and size as independent variables and measures self-reported psychological outcomes (willingness to act, cognitive load, negative emotions) as dependent variables. All central claims are presented as empirical findings from this design, with no equations, fitted models, derivations, or predictions that reduce by construction to the inputs. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The work is self-contained as an empirical report.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the validity of questionnaire-based psychological measures and the assumption that the online experiment setting approximates real AI decision contexts; no free parameters, new entities, or ad-hoc axioms beyond standard experimental psychology practices are introduced.

axioms (2)
  • domain assumption Self-reported psychological states in an online experiment accurately reflect participants' true cognitive and emotional responses
    The study relies on these reports to quantify benefits and costs.
  • standard math The between-subjects randomization sufficiently controls for individual differences
    Standard assumption for interpreting group differences in such designs.

pith-pipeline@v0.9.0 · 5420 in / 1377 out tokens · 128895 ms · 2026-05-13T05:40:23.546547+00:00 · methodology

