pith. machine review for the scientific record.

arxiv: 2604.23896 · v1 · submitted 2026-04-26 · 💻 cs.HC

Recognition: unknown

From Trust to Appropriate Reliance: Measurement Constructs in Human-AI Decision-Making

Konstantinos Papangelis, Muhammad Raees

Pith reviewed 2026-05-08 05:27 UTC · model grok-4.3

classification 💻 cs.HC
keywords human-AI decision-making · appropriate reliance · trust in AI · measurement constructs · AI advice · empirical studies · objective metrics

The pith

Studies of human-AI decision-making use fragmented measures of appropriate reliance that differ from trust, and three distinct views help organize them.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews empirical work on how people decide whether to follow AI advice. It finds that while trust has been the main lens, recent evidence shows trust does not always lead to appropriate reliance. The authors sort existing measures into three views called Traditional, Appropriateness, and Dominance. They argue that without agreed metrics, it is hard to compare results across experiments. If researchers adopt common ways to measure reliance, studies could build on each other more effectively.

Core claim

Our analysis of literature shows that constructs for human-AI appropriate reliance are still fragmented in research. We present three views on appropriate reliance, namely Traditional, Appropriateness, and Dominance, as discussed in research. Using these views, we evaluate objective metrics reported in studies and argue for their consensus to facilitate the comparison across empirical research. We also discuss how studies employ objective metrics and examine their validity in application contexts. Our work contributes to the critical body of research on exploring objective metrics for assessing humans' appropriate reliance on AI advice.

What carries the argument

The three views on appropriate reliance—Traditional, Appropriateness, and Dominance—that classify how studies define and measure whether users follow AI advice correctly rather than blindly or not at all.

If this is right

  • Objective metrics reported in studies can be evaluated and compared by mapping them to the three views.
  • Consensus on a shared set of metrics would allow direct comparison of findings across different empirical studies.
  • Studies should explicitly differentiate appropriate reliance from trust and from mere reliance to clarify what is being assessed.
  • Examining how objective metrics perform in specific application contexts reveals their practical validity and limits.
  • Organizing measures under the three views highlights gaps where new metrics may be needed.
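As a concrete illustration of the first bullet, metrics reported in a study could be bucketed by view with a simple lookup. This is a hypothetical sketch, not the paper's own mapping: the metric names beyond those listed in its Figure 2 are invented here, and which view a given metric falls under can shift with each study's operationalization.

```python
# Hypothetical metric-to-view mapping; the paper's actual assignments
# are not reproduced here.
VIEW_OF_METRIC = {
    "accuracy": "Traditional",           # overall decision quality
    "agreement": "Traditional",          # how often the human follows the AI
    "switch_rate": "Traditional",
    "over_reliance": "Appropriateness",  # following wrong advice
    "under_reliance": "Appropriateness", # rejecting correct advice
    "RAIR": "Appropriateness",
    "RST": "Appropriateness",
    "decision_weight": "Dominance",      # how much the AI drives the outcome
}

def group_by_view(metrics):
    """Bucket a study's reported metrics by view ('?' when unmapped)."""
    grouped = {}
    for m in metrics:
        grouped.setdefault(VIEW_OF_METRIC.get(m, "?"), []).append(m)
    return grouped
```

Unmapped metrics surfacing under `"?"` is exactly the kind of gap the third bullet points at.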

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Convergence on one dominant view could simplify future experiments while still allowing context-specific adjustments.
  • The views could be tested in new studies to see which one best predicts whether users calibrate their reliance in real deployments.
  • Linking the views to established models of human judgment might explain why trust alone fails to ensure appropriate use of AI.

Load-bearing premise

The reviewed empirical studies are representative of the full human-AI decision-making literature, and the three proposed views comprehensively capture the range of measurement approaches used, without significant gaps or overlaps.

What would settle it

A new review that identifies many studies whose reliance measures do not fit any of the three views, or that shows most studies align with only one view without the others adding explanatory power, would challenge the proposed classification.

Figures

Figures reproduced from arXiv: 2604.23896 by Konstantinos Papangelis, Muhammad Raees.

Figure 1
Figure 1: The PRISMA [58] framework was used to assess research studies. The primary search was conducted on SCOPUS, with a secondary search on the ACM Digital Library (ACM DL).
Figure 2
Figure 2: Left: The prevalence of objective metrics reported in selected studies. Accuracy is the most widely used measure. Common metrics for appropriate reliance are over-reliance, under-reliance, RAIR, RST, and Accuracy-Wid. Studies also measure users' reliance on AI through agreements or switches. Right: The prevalence of subjective metrics reported in selected studies. Studies commonly use confidence, trus…
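The objective metrics named in the Figure 2 caption are typically computed from per-trial logs. Below is a minimal sketch under one common judge-advisor formulation (initial human answer, AI advice, final answer, ground truth); these definitions are one common reading of the literature, not the paper's own, and RAIR and RST in particular are operationalized differently across the studies it reviews.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    initial: str   # human's answer before seeing AI advice
    advice: str    # AI's recommended answer
    final: str     # human's answer after seeing AI advice
    truth: str     # ground-truth answer

def reliance_metrics(trials):
    """One common operationalization of objective reliance measures:
      over-reliance : share of wrong-advice trials where the human ends
                      up agreeing with the wrong advice
      under-reliance: share of correct-advice trials where the human
                      ultimately rejects the advice
      RAIR          : among trials where the human started wrong and the
                      AI was right, share where the human adopts the advice
      RST           : among trials where the human started right and the
                      AI was wrong, share where the human keeps their answer
    """
    def share(hits):
        hits = list(hits)
        return sum(1 for h in hits if h) / len(hits) if hits else None

    over = share(t.final == t.advice for t in trials if t.advice != t.truth)
    under = share(t.final != t.advice for t in trials if t.advice == t.truth)
    rair = share(t.final == t.advice for t in trials
                 if t.initial != t.truth and t.advice == t.truth)
    rst = share(t.final == t.initial for t in trials
                if t.initial == t.truth and t.advice != t.truth)
    accuracy = share(t.final == t.truth for t in trials)
    return {"over_reliance": over, "under_reliance": under,
            "RAIR": rair, "RST": rst, "accuracy": accuracy}
```

The `None` returns for empty denominators reflect a real comparability problem the paper raises: conditional metrics such as RAIR are undefined for participants who never start out wrong.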
Original abstract

While human-AI decision-making research has primarily used trust measurements to assess the practical usage of AI systems by their end-users, recent empirical evidence suggests that trust measurements do not inform users' appropriate reliance on AI systems. While examining the human-AI decision-making literature, in this work, we review empirical studies that assess people's appropriate reliance on AI advice, differentiating measurements and constructs of appropriate reliance from trust and mere reliance. Our analysis of literature shows that constructs for human-AI appropriate reliance are still fragmented in research. We present three views on appropriate reliance, namely Traditional, Appropriateness, and Dominance, as discussed in research. Using these views, we evaluate objective metrics reported in studies and argue for their consensus to facilitate the comparison across empirical research. We also discuss how studies employ objective metrics and examine their validity in application contexts. Our work contributes to the critical body of research on exploring objective metrics for assessing humans' appropriate reliance on AI advice.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a literature review of empirical studies in human-AI decision-making. It argues that trust measurements do not adequately inform appropriate reliance on AI advice, differentiates appropriate reliance constructs from trust and mere reliance, identifies fragmentation in the literature, proposes three organizing views (Traditional, Appropriateness, and Dominance), evaluates objective metrics reported in studies through these views, advocates for consensus on such metrics to enable cross-study comparison, and discusses the validity of these metrics in application contexts.

Significance. If the taxonomy is robust and the metric evaluations accurate, the work could help standardize measurement practices in human-AI interaction research, moving beyond over-reliance on trust scales and enabling better synthesis of empirical findings. The emphasis on objective metrics and contextual validity adds practical value for designing and evaluating AI-assisted decision systems.

major comments (2)
  1. The central diagnosis of fragmentation and the call for metric consensus rest on the claim that all relevant constructs can be usefully partitioned into exactly three non-overlapping views (Traditional, Appropriateness, Dominance). The manuscript must provide explicit classification criteria, a mapping of reviewed studies' metrics to these views, and explicit discussion of potential overlaps or gaps (e.g., a metric that simultaneously meets Appropriateness and Dominance criteria). Without this, the fragmentation argument and downstream evaluation lose force.
  2. The evaluation of objective metrics and their validity in application contexts inherits the same dependency on the three-view framework. The manuscript should report the search methodology, inclusion criteria, and number of studies reviewed so that readers can assess whether the sample is representative and whether the taxonomy comprehensively covers the literature without significant omissions.
minor comments (2)
  1. The abstract could briefly note the number of studies reviewed and one concrete example of a metric validity issue to give readers an immediate sense of the empirical scope.
  2. Ensure consistent use of terminology distinguishing 'trust', 'reliance', and 'appropriate reliance' across sections, especially when citing prior work.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights important areas for strengthening the rigor and transparency of our literature review. We address each major comment below and commit to revisions that directly respond to the concerns raised.

Point-by-point responses
  1. Referee: The central diagnosis of fragmentation and the call for metric consensus rest on the claim that all relevant constructs can be usefully partitioned into exactly three non-overlapping views (Traditional, Appropriateness, and Dominance). The manuscript must provide explicit classification criteria, a mapping of reviewed studies' metrics to these views, and explicit discussion of potential overlaps or gaps (e.g., a metric that simultaneously meets Appropriateness and Dominance criteria). Without this, the fragmentation argument and downstream evaluation lose force.

    Authors: We agree that explicit criteria and a transparent mapping are necessary to substantiate the three-view framework. In the revised manuscript, we will add a dedicated subsection that provides precise, operational classification criteria for assigning metrics to the Traditional, Appropriateness, and Dominance views. We will also include a table that systematically maps the objective metrics reported in each reviewed study to these views. Finally, we will expand the discussion to explicitly address potential overlaps and gaps, including concrete examples of metrics that could satisfy criteria from more than one view and our rationale for primary classification in such cases. revision: yes

  2. Referee: The evaluation of objective metrics and their validity in application contexts inherits the same dependency on the three-view framework. The manuscript should report the search methodology, inclusion criteria, and number of studies reviewed so that readers can assess whether the sample is representative and whether the taxonomy comprehensively covers the literature without significant omissions.

    Authors: We acknowledge that the current version does not provide sufficient methodological detail for readers to evaluate the scope of the review. We will add a new 'Review Methodology' section that describes the literature search strategy (databases, keywords, and time period), the inclusion and exclusion criteria applied, the total number of studies screened and ultimately included, and any limitations regarding coverage of the broader literature. This addition will directly support assessment of the taxonomy's comprehensiveness. revision: yes

Circularity Check

0 steps flagged

Literature review with no derivations, equations, or self-referential reductions.

Full rationale

The paper is a review of external empirical studies on human-AI decision-making. It identifies fragmentation in the literature and organizes findings into three views (Traditional, Appropriateness, Dominance) drawn from the reviewed works. No equations, fitted parameters, or internal derivations exist. All claims rest on analysis and citation of independent prior studies rather than self-definition, self-citation chains, or renaming of the paper's own inputs. The taxonomy functions as an organizational lens on external data, not a construct that reduces to the paper's own definitions or assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a literature review paper. No free parameters, axioms, or invented entities are introduced because the central claim is a synthesis and categorization of existing research rather than a new model or derivation.

pith-pipeline@v0.9.0 · 5460 in / 1104 out tokens · 65473 ms · 2026-05-08T05:27:29.615377+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

96 extracted references · 66 canonical work pages · 1 internal anchor

  1. [1]

    Sajid Ali, Tamer Abuhmed, Shaker El-Sappagh, Khan Muhammad, Jose M. Alonso-Moral, Roberto Confalonieri, Riccardo Guidotti, Javier Del Ser, Natalia Díaz-Rodríguez, and Francisco Herrera. 2023. Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence.Information Fusion99 (2023), 101805. doi:10.10...

  2. [2]

    OSF Anonymous. 2026.Anonymized Material for Study: Review Protocol, Data Analysis, and Detailed Descriptions. https://osf.io/nxph2/overview? view_only=2467fe0607724f2bbf4d9ca9c36e86f0

  3. [3]

    Annette Baier. 1986. Trust and antitrust.ethics96, 2 (1986), 231–260

  4. [4]

    Gagan Bansal, Tongshuang Wu, Joyce Zhou, Raymond Fok, Besmira Nushi, Ece Kamar, Marco Tulio Ribeiro, and Daniel Weld. 2021. Does the Whole Exceed its Parts? The Effect of AI Explanations on Complementary Team Performance. InProceedings of the 2021 CHI Conference on Human 16 Raees and Papangelis Factors in Computing Systems(Yokohama, Japan)(CHI ’21). Assoc...

  5. [5]

    Astrid Bertrand, Rafik Belloum, James R. Eagan, and Winston Maxwell. 2022. How Cognitive Biases Affect XAI-assisted Decision-making: A Systematic Review. InProceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society(Oxford, United Kingdom)(AIES ’22). Association for Computing Machinery, New York, NY, USA, 78–91. doi:10.1145/3514094.3534164

  6. [6]

    Michelle Brachman, Zahra Ashktorab, Michael Desmond, Evelyn Duesterwald, Casey Dugan, Narendra Nath Joshi, Qian Pan, and Aabhas Sharma

  7. [7]

    Reliance and Automation for Human-AI Collaborative Data Labeling Conflict Resolution.Proc. ACM Hum.-Comput. Interact.6, CSCW2, Article 321 (Nov. 2022), 27 pages. doi:10.1145/3555212

  8. [8]

    Zana Buçinca, Maja Barbara Malaya, and Krzysztof Z. Gajos. 2021. To Trust or to Think: Cognitive Forcing Functions Can Reduce Overreliance on AI in AI-assisted Decision-making.Proc. ACM Hum.-Comput. Interact.5, CSCW1, Article 188 (April 2021), 21 pages. doi:10.1145/3449287

  9. [9]

    Zana Buçinca, Siddharth Swaroop, Amanda E. Paluch, Finale Doshi-Velez, and Krzysztof Z. Gajos. 2025. Contrastive Explanations That Anticipate Human Misconceptions Can Improve Human Decision-Making Skills. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI ’25). Association for Computing Machinery, New York, NY, USA, Articl...

  10. [10]

    Federico Cabitza, Andrea Campagner, Riccardo Angius, Chiara Natali, and Carlo Reverberi. 2023. AI Shall Have No Dominion: on How to Measure Technology Dominance in AI-supported Human decision-making. InProceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany)(CHI ’23). Association for Computing Machinery, New York, NY...

  11. [11]

    Emma R. Casolin and Flora D. Salim. 2024. Towards Understanding Human-AI Reliance Patterns Through Explanation Styles. InCompanion of the 2024 on ACM International Joint Conference on Pervasive and Ubiquitous Computing(Melbourne VIC, Australia)(UbiComp ’24). Association for Computing Machinery, New York, NY, USA, 861–865. doi:10.1145/3675094.3678996

  12. [12]

    Federico Maria Cau, Hanna Hauptmann, Lucio Davide Spano, and Nava Tintarev. 2023. Effects of AI and Logic-Style Explanations on Users’ Decisions Under Different Levels of Uncertainty.ACM Trans. Interact. Intell. Syst.13, 4, Article 22 (Dec. 2023), 42 pages. doi:10.1145/3588320

  13. [13]

    Federico Maria Cau and Lucio Davide Spano. 2025. The Influence of Curiosity Traits and On-Demand Explanations in AI-Assisted Decision-Making. InProceedings of the 30th International Conference on Intelligent User Interfaces (IUI ’25). Association for Computing Machinery, New York, NY, USA, 1440–1457. doi:10.1145/3708359.3712165

  14. [14]

    Valerie Chen, Q. Vera Liao, Jennifer Wortman Vaughan, and Gagan Bansal. 2023. Understanding the Role of Human Intuition on Reliance in Human-AI Decision-Making with Explanations. 7, CSCW2, Article 370 (Oct. 2023), 32 pages. doi:10.1145/3610219

  15. [15]

    Chun-Wei Chiang, Zhuoran Lu, Zhuoyan Li, and Ming Yin. 2023. Are Two Heads Better Than One in AI-Assisted Decision Making? Comparing the Behavior and Performance of Groups and Individuals in Human-AI Collaborative Recidivism Risk Assessment. InProceedings of the 2023 CHI Conference on Human Factors in Computing Systems(Hamburg, Germany)(CHI ’23). Associat...

  16. [16]

    Hyesun Choung, Prabu David, and Arun Ross. 2023. Trust in AI and Its Role in the Acceptance of AI Technologies.International Journal of Human–Computer Interaction39, 9 (2023), 1727–1739. doi:10.1080/10447318.2022.2050543

  17. [17]

    Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. 2017. Algorithmic decision making and the cost of fairness. InProceedings of the 23rd acm sigkdd international conference on knowledge discovery and data mining. 797–806

  18. [18]

    NAJ Cornelissen, RJM Van Eerdt, HK Schraffenberger, and Willem FG Haselager. 2022. Reflection machines: increasing meaningful human control over Decision Support Systems.Ethics and Information Technology24, 2 (2022), 19. doi:10.1007/s10676-022-09645-y

  19. [19]

    Nils Dahlbäck, Arne Jönsson, and Lars Ahrenberg. 1993. Wizard of Oz studies: why and how. InProceedings of the 1st international conference on Intelligent user interfaces. Association for Computing Machinery, 193–200. doi:10.1145/169891.169968

  20. [20]

    Devleena Das and Sonia Chernova. 2020. Leveraging rationales to improve human task performance. InProceedings of the 25th international conference on intelligent user interfaces. 510–518

  21. [21]

    Sander de Jong, Ville Paananen, Benjamin Tag, and Niels van Berkel. 2025. Cognitive Forcing for Better Decision-Making: Reducing Overreliance on AI Systems Through Partial Explanations. 9, 2, Article CSCW048 (May 2025), 30 pages. doi:10.1145/3710946

  22. [22]

    Dominik Dellermann, Philipp Ebel, Matthias Söllner, and Jan Marco Leimeister. 2019. Hybrid intelligence.Business & Information Systems Engineering 61, 5 (2019), 637–643. doi:10.1007/s12599-019-00595-2

  23. [23]

    Berkeley J Dietvorst, Joseph P Simmons, and Cade Massey. 2015. Algorithm aversion: people erroneously avoid algorithms after seeing them err. Journal of experimental psychology: General144, 1 (2015), 114

  24. [24]

    Finale Doshi-Velez and Been Kim. 2017. Towards a rigorous science of interpretable machine learning.arXiv preprint arXiv:1702.08608(2017)

  25. [25]

    John J. Dudley and Per Ola Kristensson. 2018. A Review of User Interface Design for Interactive Machine Learning.ACM Trans. Interact. Intell. Syst. 8, 2, Article 8 (June 2018), 37 pages. doi:10.1145/3185517

  26. [26]

    Mary T. Dzindolet, Scott A. Peterson, Regina A. Pomranky, Linda G. Pierce, and Hall P. Beck. 2003. The role of trust in automation reliance. International Journal of Human-Computer Studies58, 6 (2003), 697–718. doi:10.1016/S1071-5819(03)00038-7 Trust and Technology

  27. [27]

    Sven Eckhardt, Niklas Kühl, Mateusz Dolata, and Gerhard Schwabe. 2024. A survey of AI reliance.arXiv preprint arXiv:2408.03948(2024). doi:10.48550/arXiv.2408.03948

  28. [28]

    Susanne Gaube, Harini Suresh, Martina Raue, Alexander Merritt, Seth J Berkowitz, Eva Lermer, Joseph F Coughlin, John V Guttag, Errol Colak, and Marzyeh Ghassemi. 2021. Do as AI say: susceptibility in deployment of clinical decision-aids.NPJ digital medicine4, 1 (2021), 31. From Trust to Appropriate Reliance 17

  29. [29]

    Ben Green and Yiling Chen. 2019. The Principles and Limits of Algorithm-in-the-Loop Decision Making.Proc. ACM Hum.-Comput. Interact.3, CSCW, Article 50 (Nov. 2019), 24 pages. doi:10.1145/3359152

  30. [30]

    Ziyang Guo, Yifan Wu, Jason D. Hartline, and Jessica Hullman. 2024. A Decision Theoretic Framework for Measuring AI Reliance. InProceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency(Rio de Janeiro, Brazil)(FAccT ’24). Association for Computing Machinery, New York, NY, USA, 221–236. doi:10.1145/3630106.3658901

  31. [31]

    PA Hancock, Theresa T Kessler, Alexandra D Kaplan, Kimberly Stowers, J Christopher Brill, Deborah R Billings, Kristin E Schaefer, and James L Szalma

  32. [32]

    How and why humans trust: A meta-analysis and elaborated model.Frontiers in psychology14 (2023), 1081086. doi:10.3389/fpsyg.2023.1081086

  33. [33]

    Gaole He, Nilay Aishwarya, and Ujwal Gadiraju. 2025. Is Conversational XAI All You Need? Human-AI Decision Making With a Conversational XAI Assistant. InProceedings of the 30th International Conference on Intelligent User Interfaces (IUI ’25). Association for Computing Machinery, New York, NY, USA, 907–924. doi:10.1145/3708359.3712133

  34. [34]

    Gaole He, Abri Bharos, and Ujwal Gadiraju. 2024. To Err Is AI! Debugging as an Intervention to Facilitate Appropriate Reliance on AI Systems. In Proceedings of the 35th ACM Conference on Hypertext and Social Media(Poznan, Poland)(HT ’24). Association for Computing Machinery, New York, NY, USA, 98–105. doi:10.1145/3648188.3675130

  35. [35]

    Gaole He, Stefan Buijsman, and Ujwal Gadiraju. 2023. How Stated Accuracy of an AI System and Analogies to Explain Accuracy Affect Human Reliance on the System.Proc. ACM Hum.-Comput. Interact.7, CSCW2, Article 276 (Oct. 2023), 29 pages. doi:10.1145/3610067

  36. [36]

    Gaole He, Lucie Kuiper, and Ujwal Gadiraju. 2023. Knowing About Knowing: An Illusion of Human Competence Can Hinder Appropriate Reliance on AI Systems. InProceedings of the 2023 CHI Conference on Human Factors in Computing Systems(Hamburg, Germany)(CHI ’23). Association for Computing Machinery, New York, NY, USA, Article 113, 18 pages. doi:10.1145/3544548.3581025

  37. [37]

    Maren Hinrichs, Thi Bich Diep Bui, and Stefan Schneegass. 2024. Exploring the Effects of User Input and Decision Criteria Control on Trust in a Decision Support Tool for Spare Parts Inventory Management. InProceedings of the International Conference on Mobile and Ubiquitous Multimedia. 313–323

  38. [38]

    Kevin Anthony Hoff and Masooda Bashir. 2015. Trust in Automation: Integrating Empirical Evidence on Factors That Influence Trust.Human Factors57, 3 (2015), 407–434. doi:10.1177/0018720814547570 PMID: 25875432

  39. [39]

    Angel Hsing-Chi Hwang, Q Vera Liao, Su Lin Blodgett, Alexandra Olteanu, and Adam Trischler. 2025. ’It was 80% me, 20% AI’: Seeking Authenticity in Co-Writing with Large Language Models.Proceedings of the ACM on Human-Computer Interaction9, 2 (2025), 1–41

  40. [40]

    Daniel Kahneman. 2011.Thinking, fast and slow. macmillan

  41. [41]

    Daniel Kahneman and Gary Klein. 2009. Conditions for intuitive expertise: a failure to disagree.American psychologist64, 6 (2009), 515

  42. [42]

    Patricia K. Kahr, Gerrit Rooks, Martijn C. Willemsen, and Chris C. P. Snijders. 2024. Understanding Trust and Reliance Development in AI Advice: Assessing Model Accuracy, Model Explanations, and Experiences from Previous Interactions.ACM Trans. Interact. Intell. Syst.14, 4, Article 29 (Dec. 2024), 30 pages. doi:10.1145/3686164

  43. [43]

    Sunnie S. Y. Kim, Q. Vera Liao, Mihaela Vorvoreanu, Stephanie Ballard, and Jennifer Wortman Vaughan. 2024. "I’m Not Sure, But... ": Examining the Impact of Large Language Models’ Uncertainty Expression on User Reliance and Trust. InProceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency(Rio de Janeiro, Brazil)(FAccT ’24). Asso...

  44. [44]

    Wouter Kool and Matthew Botvinick. 2018. Mental labour.Nature human behaviour2, 12 (2018), 899–908

  45. [45]

    Vivian Lai, Chacha Chen, Alison Smith-Renner, Q. Vera Liao, and Chenhao Tan. 2023. Towards a Science of Human-AI Decision Making: An Overview of Design Space in Empirical Human-Subject Studies. InProceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA)(FAccT ’23). Association for Computing Machinery, New York...

  46. [46]

    Nancy K Lankton, D Harrison McKnight, and John Tripp. 2015. Technology, humanness, and trust: Rethinking trust in technology.Journal of the association for information systems16, 10 (2015), 1. doi:10.17705/1jais.00411

  47. [47]

    John D. Lee and Katrina A. See. 2004. Trust in Automation: Designing for Appropriate Reliance.Human Factors46, 1 (2004), 50–80. doi:10.1518/hfes. 46.1.50_30392 PMID: 15151155

  48. [48]

    Benedikt Leichtmann, Christina Humer, Andreas Hinterreiter, Marc Streit, and Martina Mara. 2023. Effects of Explainable Artificial Intelligence on trust and human behavior in a high-risk decision task.Computers in Human Behavior139 (2023), 107539

  49. [49]

    Jennifer M. Logg, Julia A. Minson, and Don A. Moore. 2019. Algorithm appreciation: People prefer algorithmic to human judgment.Organizational Behavior and Human Decision Processes151 (2019), 90–103. doi:10.1016/j.obhdp.2018.12.005

  50. [50]

    Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. 2024. Explainable Artificial Intelligence...

  51. [51]

    Zhuoran Lu, Dakuo Wang, and Ming Yin. 2024. Does More Advice Help? The Effects of Second Opinions in AI-Assisted Decision Making.Proc. ACM Hum.-Comput. Interact.8, CSCW1, Article 217 (April 2024), 31 pages. doi:10.1145/3653708

  52. [52]

    Shuai Ma, Qiaoyi Chen, Xinru Wang, Chengbo Zheng, Zhenhui Peng, Ming Yin, and Xiaojuan Ma. 2025. Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI ’25). Association for Computing Machinery, New York, NY, ...

  53. [53]

    Shuai Ma, Ying Lei, Xinru Wang, Chengbo Zheng, Chuhan Shi, Ming Yin, and Xiaojuan Ma. 2023. Who Should I Trust: AI or Myself? Leveraging Human and AI Correctness Likelihood to Promote Appropriate Trust in AI-Assisted Decision-Making. InProceedings of the 2023 CHI Conference on Human Factors in Computing Systems(Hamburg, Germany)(CHI ’23). Association for ...

  54. [54]

    Shuai Ma, Xinru Wang, Ying Lei, Chuhan Shi, Ming Yin, and Xiaojuan Ma. 2024. “Are You Really Sure?” Understanding the Effects of Human Self-Confidence Calibration in AI-Assisted Decision Making. InProceedings of the 2024 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA)(CHI ’24). Association for Computing Machinery, New York, NY, US...

  55. [55]

    Scott Mayer McKinney, Marcin Sieniek, Varun Godbole, Jonathan Godwin, Natasha Antropova, Hutan Ashrafian, Trevor Back, Mary Chesus, Greg S Corrado, Ara Darzi, et al. 2020. International evaluation of an AI system for breast cancer screening.Nature577, 7788 (2020), 89–94

  56. [56]

    Siddharth Mehrotra, Chadha Degachi, Oleksandra Vereschak, Catholijn M. Jonker, and Myrthe L. Tielman. 2024. A Systematic Review on Fostering Appropriate Trust in Human-AI Interaction: Trends, Opportunities and Challenges.ACM J. Responsib. Comput.1, 4, Article 26 (Nov. 2024), 45 pages. doi:10.1145/3696449

  57. [57]

    Tim Miller. 2023. Explainable AI is Dead, Long Live Explainable AI! Hypothesis-driven Decision Support using Evaluative AI. InProceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency(Chicago, IL, USA)(FAccT ’23). Association for Computing Machinery, New York, NY, USA, 333–342. doi:10.1145/3593013.3594001

  58. [58]

    Katelyn Morrison, Donghoon Shin, Kenneth Holstein, and Adam Perer. 2023. Evaluating the Impact of Human Explanation Strategies on Human-AI Visual Decision-Making.Proc. ACM Hum.-Comput. Interact.7, CSCW1, Article 48 (April 2023), 37 pages. doi:10.1145/3579481

  59. [59]

    Katelyn Morrison, Philipp Spitzer, Violet Turri, Michelle Feng, Niklas Kühl, and Adam Perer. 2024. The Impact of Imperfect XAI on Human-AI Decision-Making.Proc. ACM Hum.-Comput. Interact.8, CSCW1, Article 183 (April 2024), 39 pages. doi:10.1145/3641022

  60. [60]

    Matthew J Page, Joanne E McKenzie, Patrick M Bossuyt, Isabelle Boutron, Tammy C Hoffmann, Cynthia D Mulrow, Larissa Shamseer, Jennifer M Tetzlaff, Elie A Akl, Sue E Brennan, Roger Chou, Julie Glanville, Jeremy M Grimshaw, Asbjørn Hróbjartsson, Manoj M Lalu, Tianjing Li, Elizabeth W Loder, Evan Mayo-Wilson, Steve McDonald, Luke A McGuinness, Lesley A Stewa...

  61. [61]

    Raja Parasuraman and Dietrich H Manzey. 2010. Complacency and bias in human use of automation: An attentional integration.Human factors52, 3 (2010), 381–410. doi:10.1177/0018720810376055

  62. [62]

    Saumya Pareek, Niels van Berkel, Eduardo Velloso, and Jorge Goncalves. 2024. Effect of Explanation Conceptualisations on Trust in AI-assisted Credibility Assessment.Proc. ACM Hum.-Comput. Interact.8, CSCW2, Article 383 (Nov. 2024), 31 pages. doi:10.1145/3686922

  63. [63]

    Alison Parkes. 2017. The effect of individual and task characteristics on decision aid reliance.Behaviour & Information Technology36, 2 (2017), 165–177

  64. [64]

    Muhammad Raees, Vassilis-Javed Khan, Ioanna Lykourentzou, and Konstantinos Papangelis. 2026. Do People Appropriately Rely on AI-Advice? An Analytical Review of HCI Research on Human-AI Decision-Making. InProceedings of the 2026 CHI Conference on Human Factors in Computing Systems(Barcelona, Spain)(CHI ’26). Association for Computing Machinery, New York, N...

  65. [65]

    Muhammad Raees, Inge Meijerink, Ioanna Lykourentzou, Vassilis-Javed Khan, and Konstantinos Papangelis. 2024. From explainable to interactive AI: A literature review on current trends in human-AI interaction.International Journal of Human-Computer Studies189, September 2024 (2024), 103301. doi:10.1016/j.ijhcs.2024.103301

  66. [66]

    Muhammad Raees and Konstantinos Papangelis. 2026. Trust to Reliance: Measurement Constructs for Human-AI Appropriate Reliance. InProceedings of the Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA ’26). Association for Computing Machinery, New York, NY, USA, Article 695, 7 pages. doi:10.1145/3772363.3798835

  67. [67]

    Gonzalo Ramos, Christopher Meek, Patrice Simard, Jina Suh, and Soroush Ghorashi. 2020. Interactive machine teaching: a human-centered approach to building machine-learned models.Human–Computer Interaction35, 5-6 (2020), 413–451

  68. [68]

    Charvi Rastogi, Yunfeng Zhang, Dennis Wei, Kush R. Varshney, Amit Dhurandhar, and Richard Tomsett. 2022. Deciding Fast and Slow: The Role of Cognitive Biases in AI-assisted Decision-making.Proc. ACM Hum.-Comput. Interact.6, CSCW1, Article 83 (April 2022), 22 pages. doi:10.1145/3512930

  69. [69]

    Giuseppe Romeo and Daniela Conti. 2026. Exploring automation bias in human–AI collaboration: a review and implications for explainable AI.AI & SOCIETY41, 1 (2026), 259–278

  70. [70]

    Sara Salimzadeh, Gaole He, and Ujwal Gadiraju. 2023. A Missing Piece in the Puzzle: Considering the Role of Task Complexity in Human-AI Decision Making. InProceedings of the 31st ACM Conference on User Modeling, Adaptation and Personalization(Limassol, Cyprus)(UMAP ’23). Association for Computing Machinery, New York, NY, USA, 215–227. doi:10.1145/3565472.3592959

  71. [71]

    Sara Salimzadeh, Gaole He, and Ujwal Gadiraju. 2024. Dealing with Uncertainty: Understanding the Impact of Prognostic Versus Diagnostic Tasks on Trust and Reliance in Human-AI Decision Making. InProceedings of the 2024 CHI Conference on Human Factors in Computing Systems(Honolulu, HI, USA)(CHI ’24). Association for Computing Machinery, New York, NY, USA, ...

  72. [72]

    Nicolas Scharowski, Sebastian AC Perrig, Melanie Svab, Klaus Opwis, and Florian Brühlmann. 2023. Exploring the effects of human-centered AI explanations on trust and reliance.Frontiers in Computer Science5 (2023), 1151150. doi:10.3389/fcomp.2023.1151150

  73. [73]

    Nicolas Scharowski, Sebastian AC Perrig, Nick von Felten, and Florian Brühlmann. 2022. Trust and reliance in XAI–Distinguishing between attitudinal and behavioral measures.arXiv preprint arXiv:2203.12318(2022)

  74. [74]

    Max Schemmer, Andrea Bartos, Philipp Spitzer, Patrick Hemmer, Niklas Kühl, Jonas Liebschner, and Gerhard Satzger. 2023. Towards effective human-AI decision-making: The role of human learning in appropriate reliance on AI advice. InForty-Fourth International Conference on Information From Trust to Appropriate Reliance 19 Systems(Hyderabad, India)(ICIS 2023...

  75. [75]

    Max Schemmer, Niklas Kuehl, Carina Benz, Andrea Bartos, and Gerhard Satzger. 2023. Appropriate Reliance on AI Advice: Conceptualization and the Effect of Explanations. InProceedings of the 28th International Conference on Intelligent User Interfaces(Sydney, NSW, Australia)(IUI ’23). Association for Computing Machinery, New York, NY, USA, 410–422. doi:10.1...

  76. [76]

    Nadine Schlicker and Markus Langer. 2021. Towards warranted trust: A model on the relation between actual and perceived system trustworthiness. InProceedings of mensch und computer 2021. 325–329

  77. [77]

    Jakob Schoeffer, Maria De-Arteaga, and Niklas Kühl. 2024. Explanations, Fairness, and Appropriate Reliance in Human-AI Decision-Making. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems(Honolulu, HI, USA)(CHI ’24). Association for Computing Machinery, New York, NY, USA, Article 836, 18 pages. doi:10.1145/3613904.3642621

  78. [78]

    Jakob Schoeffer, Johannes Jakubik, Michael Vössing, Niklas Kühl, and Gerhard Satzger. 2025. AI reliance and decision quality: Fundamentals, interdependence, and the effects of interventions.Journal of Artificial Intelligence Research82 (2025), 471–501. doi:10.1613/jair.1.15873

  79. [79]

    Phoebe Sengers, Kirsten Boehner, Shay David, and Joseph’Jofish’ Kaye. 2005. Reflective design. InProceedings of the 4th decennial conference on Critical computing: between sense and sensibility. 49–58

  80. [80]

    Keng Siau and Weiyu Wang. 2018. Building trust in artificial intelligence, machine learning, and robotics.Cutter business technology journal31, 2 (2018), 47–53

Showing first 80 references.