Psychological Benefits and Costs of Diversifying Algorithmic Recourse
Pith reviewed 2026-05-13 05:40 UTC · model grok-4.3
The pith
Diversifying algorithmic recourse sets boosts willingness to act on small sets without raising psychological costs, but makes cognitive load more noticeable on large sets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Diversification of recourse sets improves psychological benefits such as willingness to act for small sets without incurring additional psychological costs, whereas for large sets it makes cognitive load more salient.
What carries the argument
A between-subjects experiment that independently manipulates recourse-set diversity and set size, then collects self-reported measures of willingness to act, cognitive load, and negative emotions.
If this is right
- Diversifying small recourse sets can raise users' motivation to act on the provided plans.
- Diversifying large recourse sets increases the salience of cognitive load for decision subjects.
- Naive diversification without regard to set size can produce net psychological costs rather than benefits.
- Recourse algorithms need new diversification methods that incorporate human cognitive limits.
Where Pith is reading between the lines
- The observed size-by-diversity interaction may appear in other AI transparency tools that present multiple explanations or options.
- Deployment studies could measure whether the reported willingness-to-act difference predicts real behavioral change in high-stakes domains such as credit or hiring.
- Recourse generators could incorporate lightweight cognitive models to cap diversity once set size grows large enough to trigger load.
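The last point above can be sketched in code. This is a minimal illustration, not a method from the paper: it assumes each action plan is encoded as a numeric feature-change vector, that the candidate list is sorted so the first entry is the cheapest plan, and that a distance cap is an acceptable stand-in for a cognitive limit. The names `select_recourse` and `spread_cap`, and the parameters `small_set_limit` and `cap`, are hypothetical.

```python
from math import dist


def select_recourse(candidates, k, max_spread=None):
    """Greedy farthest-point selection of up to k action plans.

    candidates: list of feature-change vectors (hypothetical encoding),
                assumed sorted so candidates[0] is the cheapest plan.
    max_spread: optional cap on distance from the cheapest plan; plans
                farther away than this are not eligible.
    """
    anchor = candidates[0]
    pool = [c for c in candidates[1:]
            if max_spread is None or dist(anchor, c) <= max_spread]
    chosen = [anchor]
    while pool and len(chosen) < k:
        # Pick the plan whose nearest already-chosen plan is farthest away.
        best = max(pool, key=lambda c: min(dist(c, s) for s in chosen))
        chosen.append(best)
        pool.remove(best)
    return chosen


def spread_cap(k, small_set_limit=3, cap=2.0):
    """Hypothetical rule: no cap for small sets, a fixed cap for larger ones."""
    return None if k <= small_set_limit else cap


# Toy candidate plans (assumed values, for illustration only).
plans = [(0, 0), (1, 0), (0, 1), (5, 5), (1, 1), (2, 0)]
small = select_recourse(plans, 3, spread_cap(3))
large = select_recourse(plans, 4, spread_cap(4))
```

With these toy inputs the small set is free to include the distant plan `(5, 5)`, while the larger set is restricted to plans near the cheapest one. Whether a distance cap actually reduces cognitive load is an open empirical question that this sketch does not address.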
Load-bearing premise
Self-reported measures collected in a controlled online experiment accurately reflect real-world psychological responses and generalize beyond the specific decision scenarios tested.
What would settle it
A field study that tracks actual follow-through behavior and physiological indicators of cognitive effort when users receive diversified versus non-diversified recourse recommendations in a deployed AI system.
Original abstract
Algorithmic recourse provides counterfactual action plans that help people overturn unfavorable AI decisions. While diverse recourse sets may improve transparency and motivation, they may also impose cognitive load and negative emotions by increasing counterfactual reasoning demands. To examine this trade-off, we conducted a between-subjects controlled experiment (N=750) that manipulated recourse-set diversity and size, and evaluated these effects on psychological benefits and costs. Results show that diversification enhances psychological benefits (e.g., willingness to act) for small sets without incurring additional psychological costs, whereas for large sets, it makes cognitive load more salient. These findings suggest that naively diversifying recourse can burden decision subjects, underscoring the need for new diversification methods that incorporate human cognition and psychology to mitigate such costs.
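The size-by-diversity pattern described in the abstract is an interaction, which in a 2x2 layout can be read as a difference of differences between cell means. The sketch below shows only that arithmetic. The ratings are made up for illustration and are not data from the paper, and the function name `interaction_contrast` and the cell labels are assumptions.

```python
from statistics import mean


def interaction_contrast(cells):
    """Difference of differences for a 2x2 (size x diversity) design.

    cells maps (size, diversity) -> list of scores for that group.
    Returns (diversity effect in small sets,
             diversity effect in large sets,
             large-set effect minus small-set effect).
    """
    m = {key: mean(scores) for key, scores in cells.items()}
    effect_small = m[("small", "diverse")] - m[("small", "close")]
    effect_large = m[("large", "diverse")] - m[("large", "close")]
    return effect_small, effect_large, effect_large - effect_small


# Made-up 7-point willingness-to-act ratings, NOT data from the paper.
toy = {
    ("small", "close"): [3, 4, 4, 5],
    ("small", "diverse"): [5, 5, 6, 4],
    ("large", "close"): [4, 4, 5, 3],
    ("large", "diverse"): [4, 5, 4, 3],
}

effect_small, effect_large, contrast = interaction_contrast(toy)
```

A nonzero contrast means the effect of diversification depends on set size. Testing whether such a contrast is statistically reliable requires an inferential test such as the interaction term of a two-way ANOVA, which this sketch omits.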
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports results from a between-subjects experiment (N=750) that manipulates recourse-set size and diversity in hypothetical algorithmic decision scenarios. It claims an interaction effect: diversification increases psychological benefits such as willingness to act for small sets without raising costs, while for large sets it makes cognitive load more salient. The authors conclude that naive diversification can impose psychological burdens and call for cognition-aware diversification methods.
Significance. If the reported interaction holds under scrutiny, the work supplies empirical evidence on user psychology in algorithmic recourse, a topic of growing interest in HCI and responsible AI. It identifies a concrete design trade-off that could inform how recourse sets are generated and presented, and it correctly flags the risk of overlooking human cognitive limits when optimizing for diversity.
major comments (3)
- [Method/Results] Method and Results sections: The central claim rests on self-reported Likert-scale measures of willingness to act, cognitive load, and negative emotions collected in low-stakes hypothetical scenarios. No behavioral outcome (e.g., actual recourse uptake or follow-through) or validation against real-world responses is reported; this directly undermines the assertion that diversification 'enhances psychological benefits without incurring additional psychological costs' for small sets, as demand effects and low ecological validity remain unaddressed.
- [Results] Results section: The abstract and summary describe a size-by-diversity interaction but provide no effect sizes, confidence intervals, exact statistical tests, or power analysis. Without these, it is impossible to evaluate whether the observed patterns are practically meaningful or whether the 'makes cognitive load more salient' finding for large sets is robust.
- [Discussion] Discussion: The recommendation for 'new diversification methods that incorporate human cognition' is presented as a direct implication, yet the experiment tests only existing diversity manipulations and does not evaluate any proposed cognition-aware method; the leap from observed costs to prescriptive design guidance therefore lacks supporting data.
minor comments (2)
- [Abstract] Abstract: The phrasing 'makes cognitive load more salient' is interpretive; replace with a direct description of the measured variable and the direction of the effect.
- [Method] The manuscript should include the exact wording of the decision scenarios, the full list of Likert items, and any pre-registration or exclusion criteria to allow replication.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We address each major comment point by point below, with revisions made where feasible to strengthen the manuscript.
- Referee: [Method/Results] Method and Results sections: The central claim rests on self-reported Likert-scale measures of willingness to act, cognitive load, and negative emotions collected in low-stakes hypothetical scenarios. No behavioral outcome (e.g., actual recourse uptake or follow-through) or validation against real-world responses is reported; this directly undermines the assertion that diversification 'enhances psychological benefits without incurring additional psychological costs' for small sets, as demand effects and low ecological validity remain unaddressed.
  Authors: We agree that reliance on self-reported measures in hypothetical scenarios limits ecological validity and leaves open the possibility of demand effects. This controlled design was chosen to isolate the causal effects of set size and diversity while minimizing confounds, which is standard for initial psychological investigations in HCI. We have added a dedicated Limitations subsection in the Discussion that explicitly discusses the absence of behavioral outcomes, potential demand effects, and the need for future studies involving real decision contexts and actual recourse uptake. We cannot, however, incorporate behavioral data into the existing experiment without new data collection.
  Revision: partial
- Referee: [Results] Results section: The abstract and summary describe a size-by-diversity interaction but provide no effect sizes, confidence intervals, exact statistical tests, or power analysis. Without these, it is impossible to evaluate whether the observed patterns are practically meaningful or whether the 'makes cognitive load more salient' finding for large sets is robust.
  Authors: We have revised the Results section to report effect sizes (partial eta-squared for the interaction and Cohen's d for follow-up comparisons), 95% confidence intervals around key means and differences, the exact F-statistics, degrees of freedom, and p-values for all tests, and a post-hoc power analysis based on the observed effects. These details strengthen the evaluation of practical significance and robustness and are now integrated into the main text.
  Revision: yes
- Referee: [Discussion] Discussion: The recommendation for 'new diversification methods that incorporate human cognition' is presented as a direct implication, yet the experiment tests only existing diversity manipulations and does not evaluate any proposed cognition-aware method; the leap from observed costs to prescriptive design guidance therefore lacks supporting data.
  Authors: We agree that the experiment evaluates only existing diversity manipulations and does not test any new cognition-aware methods. The recommendation is a forward-looking implication drawn from the observed trade-offs rather than a claim of direct empirical support for specific new techniques. We have revised the Discussion and Conclusion to frame the suggestion explicitly as motivation for future research, clarifying that our data highlight the need for such methods without overstating what the current study demonstrates.
  Revision: yes
- We cannot provide behavioral outcomes, real-world validation, or actual recourse uptake data, as the study was designed and executed as a controlled hypothetical experiment.
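The effect-size statistics named in the second response (Cohen's d and partial eta-squared) follow standard textbook formulas, sketched below. The example ratings and sums of squares are made up and are not values from the paper. The function names are illustrative, and the pooled-standard-deviation form of Cohen's d is an assumption about which variant would be reported.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def partial_eta_squared(ss_effect, ss_error):
    """Partial eta-squared from the effect and error sums of squares."""
    return ss_effect / (ss_effect + ss_error)


# Made-up ratings and sums of squares, NOT values from the paper.
d = cohens_d([5, 5, 6, 4], [3, 4, 4, 5])
eta_p2 = partial_eta_squared(ss_effect=2.0, ss_error=8.0)
```

Reporting these alongside confidence intervals lets a reader judge practical significance separately from the p-value, which is the gap the referee's comment points to.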
Circularity Check
No circularity: direct report of between-subjects experiment results
The paper describes a controlled experiment (N=750) that manipulates recourse-set diversity and size as independent variables and measures self-reported psychological outcomes (willingness to act, cognitive load, negative emotions) as dependent variables. All central claims are presented as empirical findings from this design, with no equations, fitted models, derivations, or predictions that reduce by construction to the inputs. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The work is self-contained as an empirical report.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Self-reported psychological states in an online experiment accurately reflect participants' true cognitive and emotional responses
- standard math: The between-subjects randomization sufficiently controls for individual differences