pith. machine review for the scientific record

arxiv: 2605.11793 · v1 · submitted 2026-05-12 · 💻 cs.HC

Recognition: no theorem link

Psychological Benefits and Costs of Diversifying Algorithmic Recourse


Pith reviewed 2026-05-13 05:40 UTC · model grok-4.3

classification 💻 cs.HC
keywords algorithmic recourse · diversification · psychological benefits · cognitive load · counterfactual explanations · AI decision making · user motivation · human-AI interaction

The pith

Diversifying algorithmic recourse sets boosts willingness to act on small sets without raising psychological costs, but makes cognitive load more noticeable on large sets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the psychological trade-off when algorithmic recourse systems offer multiple counterfactual action plans instead of one. Researchers ran a controlled online experiment with 750 participants who saw recourse sets that varied in both diversity and total size, then measured self-reported willingness to act, cognitive load, and negative emotions. Results indicate that adding diversity to small sets increases motivation to follow the plans without extra mental burden, while the same diversification in large sets makes the required thinking feel heavier. This pattern suggests that blanket diversification strategies can create net costs for users rather than uniform gains. The work therefore calls for new recourse-generation methods that respect limits on human cognition and attention.

Core claim

Diversification of recourse sets improves psychological benefits such as willingness to act for small sets without incurring additional psychological costs, whereas for large sets it makes cognitive load more salient.

What carries the argument

Between-subjects experiment that independently manipulates recourse-set diversity and set size, then collects self-reported measures of willingness to act, cognitive load, and negative emotions.
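The two-factor machinery can be made concrete. For a balanced design, the interaction test this argument rests on reduces to a few sums of squares; below is a stdlib-only sketch on toy scores (the study's actual cells are unbalanced and incomplete, so the real analysis is more involved than this):

```python
from statistics import mean

def two_way_anova(cells):
    """Balanced two-way ANOVA from {(a_level, b_level): [scores]}.
    Returns F statistics for factor A, factor B, and the A x B
    interaction. A textbook computation, not the paper's analysis code."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # per-cell sample size (balanced)
    grand = mean(y for ys in cells.values() for y in ys)
    a_mean = {a: mean(y for (ai, _), ys in cells.items() if ai == a for y in ys)
              for a in a_levels}
    b_mean = {b: mean(y for (_, bj), ys in cells.items() if bj == b for y in ys)
              for b in b_levels}
    cell_mean = {k: mean(v) for k, v in cells.items()}
    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum((cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
                    for a in a_levels for b in b_levels)
    ss_err = sum((y - cell_mean[k]) ** 2 for k, ys in cells.items() for y in ys)
    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_err = len(a_levels) * len(b_levels) * (n - 1)
    mse = ss_err / df_err
    return {"F_A": ss_a / df_a / mse,
            "F_B": ss_b / df_b / mse,
            "F_AxB": ss_ab / (df_a * df_b) / mse}

# Toy pattern with a pure crossover interaction and no main effects:
toy = {("close", 3): [2, 4], ("close", 7): [0, 2],
       ("diverse", 3): [0, 2], ("diverse", 7): [2, 4]}
print(two_way_anova(toy))  # F_A = F_B = 0.0, F_AxB = 4.0
```

The crossover cells illustrate why main effects alone can miss the paper's finding: averaged over sizes, diversification looks neutral, and only the interaction term captures "helps when small, costs when large."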

If this is right

  • Diversifying small recourse sets can raise users' motivation to act on the provided plans.
  • Diversifying large recourse sets increases the salience of cognitive load for decision subjects.
  • Naive diversification without regard to set size can produce net psychological costs rather than benefits.
  • Recourse algorithms need new diversification methods that incorporate human cognitive limits.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The observed size-by-diversity interaction may appear in other AI transparency tools that present multiple explanations or options.
  • Deployment studies could measure whether the reported willingness-to-act difference predicts real behavioral change in high-stakes domains such as credit or hiring.
  • Recourse generators could incorporate lightweight cognitive models to cap diversity once set size grows large enough to trigger load.
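The last bullet can be sketched as a size-aware presentation policy. The cap value below is an illustrative assumption sitting between the paper's k = 3 and k = 7 conditions, not a number the study reports:

```python
def presentation_policy(set_size: int, diversity_cap: int = 4) -> dict:
    """Hypothetical cognition-aware rule: diversify a recourse set only
    while it is small enough that extra variety should not feel heavy.
    diversity_cap = 4 is an assumed threshold between the experiment's
    k = 3 (benefits, no added cost) and k = 7 (load salient) conditions."""
    strategy = "diverse" if set_size <= diversity_cap else "close"
    return {"size": set_size, "strategy": strategy}

# Mapping the experiment's three set sizes through the policy:
for k in (1, 3, 7):
    print(k, presentation_policy(k)["strategy"])  # 1 diverse / 3 diverse / 7 close
```

A deployed generator would presumably learn or calibrate the cap per user and domain rather than hard-coding it, but the structure (diversity gated on set size) follows directly from the reported interaction.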

Load-bearing premise

Self-reported measures collected in a controlled online experiment accurately reflect real-world psychological responses and generalize beyond the specific decision scenarios tested.

What would settle it

A field study that tracks actual follow-through behavior and physiological indicators of cognitive effort when users receive diversified versus non-diversified recourse recommendations in a deployed AI system.

Figures

Figures reproduced from arXiv: 2605.11793 by Naomi Yamashita, Takeshi Kurashima, Tomu Tominaga.

Figure 1: Conceptual difference in counterfactual sample selection.
Figure 2: Overview of the experimental procedure. Participants per condition: Close — 139 (k = 1), 152 (k = 3), 168 (k = 7); Diverse — 126 (k = 3), 165 (k = 7).
Figure 3: Psychological benefits (top row: (A)–(D)) and psychological costs (bottom row: (E)–(H)), reported with two-way ANOVAs.
Original abstract

Algorithmic recourse provides counterfactual action plans that help people overturn unfavorable AI decisions. While diverse recourse sets may improve transparency and motivation, they may also impose cognitive load and negative emotions by increasing counterfactual reasoning demands. To examine this trade-off, we conducted a between-subjects controlled experiment (N=750) that manipulated recourse-set diversity and size, and evaluated these effects on psychological benefits and costs. Results show that diversification enhances psychological benefits (e.g., willingness to act) for small sets without incurring additional psychological costs, whereas for large sets, it makes cognitive load more salient. These findings suggest that naively diversifying recourse can burden decision subjects, underscoring the need for new diversification methods that incorporate human cognition and psychology to mitigate such costs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper reports results from a between-subjects experiment (N=750) that manipulates recourse-set size and diversity in hypothetical algorithmic decision scenarios. It claims an interaction effect: diversification increases psychological benefits such as willingness to act for small sets without raising costs, while for large sets it makes cognitive load more salient. The authors conclude that naive diversification can impose psychological burdens and call for cognition-aware diversification methods.

Significance. If the reported interaction holds under scrutiny, the work supplies empirical evidence on user psychology in algorithmic recourse, a topic of growing interest in HCI and responsible AI. It identifies a concrete design trade-off that could inform how recourse sets are generated and presented, and it correctly flags the risk of overlooking human cognitive limits when optimizing for diversity.

major comments (3)
  1. [Method/Results] Method and Results sections: The central claim rests on self-reported Likert-scale measures of willingness to act, cognitive load, and negative emotions collected in low-stakes hypothetical scenarios. No behavioral outcome (e.g., actual recourse uptake or follow-through) or validation against real-world responses is reported; this directly undermines the assertion that diversification 'enhances psychological benefits without incurring additional psychological costs' for small sets, as demand effects and low ecological validity remain unaddressed.
  2. [Results] Results section: The abstract and summary describe a size-by-diversity interaction but provide no effect sizes, confidence intervals, exact statistical tests, or power analysis. Without these, it is impossible to evaluate whether the observed patterns are practically meaningful or whether the 'makes cognitive load more salient' finding for large sets is robust.
  3. [Discussion] Discussion: The recommendation for 'new diversification methods that incorporate human cognition' is presented as a direct implication, yet the experiment tests only existing diversity manipulations and does not evaluate any proposed cognition-aware method; the leap from observed costs to prescriptive design guidance therefore lacks supporting data.
minor comments (2)
  1. [Abstract] Abstract: The phrasing 'makes cognitive load more salient' is interpretive; replace with a direct description of the measured variable and the direction of the effect.
  2. [Method] The manuscript should include the exact wording of the decision scenarios, the full list of Likert items, and any pre-registration or exclusion criteria to allow replication.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment point by point below, with revisions made where feasible to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Method/Results] Method and Results sections: The central claim rests on self-reported Likert-scale measures of willingness to act, cognitive load, and negative emotions collected in low-stakes hypothetical scenarios. No behavioral outcome (e.g., actual recourse uptake or follow-through) or validation against real-world responses is reported; this directly undermines the assertion that diversification 'enhances psychological benefits without incurring additional psychological costs' for small sets, as demand effects and low ecological validity remain unaddressed.

    Authors: We agree that reliance on self-reported measures in hypothetical scenarios limits ecological validity and leaves open the possibility of demand effects. This controlled design was chosen to isolate the causal effects of set size and diversity while minimizing confounds, which is standard for initial psychological investigations in HCI. We have added a dedicated Limitations subsection in the Discussion that explicitly discusses the absence of behavioral outcomes, potential demand effects, and the need for future studies involving real decision contexts and actual recourse uptake. We cannot, however, incorporate behavioral data into the existing experiment without new data collection. revision: partial

  2. Referee: [Results] Results section: The abstract and summary describe a size-by-diversity interaction but provide no effect sizes, confidence intervals, exact statistical tests, or power analysis. Without these, it is impossible to evaluate whether the observed patterns are practically meaningful or whether the 'makes cognitive load more salient' finding for large sets is robust.

    Authors: We have revised the Results section to report effect sizes (partial eta-squared for the interaction and Cohen's d for follow-up comparisons), 95% confidence intervals around key means and differences, the exact F-statistics, degrees of freedom, and p-values for all tests, and a post-hoc power analysis based on the observed effects. These details strengthen the evaluation of practical significance and robustness and are now integrated into the main text. revision: yes
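For readers checking such a revision, the two effect sizes named here are straightforward to recompute from group summaries. A minimal stdlib sketch with illustrative numbers (not the paper's data):

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_sd = math.sqrt(((na - 1) * variance(group_a) +
                           (nb - 1) * variance(group_b)) / (na + nb - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd

def partial_eta_squared(ss_effect, ss_error):
    """ANOVA effect size: share of variance attributable to one term,
    ignoring all other model terms."""
    return ss_effect / (ss_effect + ss_error)

# Illustrative values only:
print(cohens_d([5, 6, 7], [4, 5, 6]))   # 1.0 with these toy groups
print(partial_eta_squared(10.0, 30.0))  # 0.25
```

With sums of squares in hand (as the revised Results section promises), partial eta-squared for the size-by-diversity interaction is a one-line check of practical significance.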

  3. Referee: [Discussion] Discussion: The recommendation for 'new diversification methods that incorporate human cognition' is presented as a direct implication, yet the experiment tests only existing diversity manipulations and does not evaluate any proposed cognition-aware method; the leap from observed costs to prescriptive design guidance therefore lacks supporting data.

    Authors: We agree that the experiment evaluates only existing diversity manipulations and does not test any new cognition-aware methods. The recommendation is a forward-looking implication drawn from the observed trade-offs rather than a claim of direct empirical support for specific new techniques. We have revised the Discussion and Conclusion to frame the suggestion explicitly as motivation for future research, clarifying that our data highlight the need for such methods without overstating what the current study demonstrates. revision: yes

standing simulated objections not resolved
  • We cannot provide behavioral outcomes, real-world validation, or actual recourse uptake data, as the study was designed and executed as a controlled hypothetical experiment.

Circularity Check

0 steps flagged

No circularity: direct report of between-subjects experiment results

full rationale

The paper describes a controlled experiment (N=750) that manipulates recourse-set diversity and size as independent variables and measures self-reported psychological outcomes (willingness to act, cognitive load, negative emotions) as dependent variables. All central claims are presented as empirical findings from this design, with no equations, fitted models, derivations, or predictions that reduce by construction to the inputs. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The work is self-contained as an empirical report.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the validity of questionnaire-based psychological measures and the assumption that the online experiment setting approximates real AI decision contexts; no free parameters, new entities, or ad-hoc axioms beyond standard experimental psychology practices are introduced.

axioms (2)
  • domain assumption Self-reported psychological states in an online experiment accurately reflect participants' true cognitive and emotional responses
    The study relies on these reports to quantify benefits and costs.
  • standard math The between-subjects randomization sufficiently controls for individual differences
    Standard assumption for interpreting group differences in such designs.

pith-pipeline@v0.9.0 · 5420 in / 1377 out tokens · 128895 ms · 2026-05-13T05:40:23.546547+00:00 · methodology

