pith. machine review for the scientific record.

arxiv: 2604.23896 · v1 · submitted 2026-04-26 · 💻 cs.HC

Recognition: unknown

From Trust to Appropriate Reliance: Measurement Constructs in Human-AI Decision-Making

Konstantinos Papangelis, Muhammad Raees

Pith reviewed 2026-05-08 05:27 UTC · model grok-4.3

classification 💻 cs.HC
keywords human-AI decision-making · appropriate reliance · trust in AI · measurement constructs · AI advice · empirical studies · objective metrics

The pith

Studies of human-AI decision-making use fragmented measures of appropriate reliance that differ from trust, and three distinct views help organize them.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews empirical work on how people decide whether to follow AI advice. It finds that while trust has been the main lens, recent evidence shows trust does not always lead to appropriate reliance. The authors sort existing measures into three views called Traditional, Appropriateness, and Dominance. They argue that without agreed metrics, it is hard to compare results across experiments. If researchers adopt common ways to measure reliance, studies could build on each other more effectively.

Core claim

Our analysis of literature shows that constructs for human-AI appropriate reliance are still fragmented in research. We present three views on appropriate reliance, namely Traditional, Appropriateness, and Dominance, as discussed in research. Using these views, we evaluate objective metrics reported in studies and argue for their consensus to facilitate the comparison across empirical research. We also discuss how studies employ objective metrics and examine their validity in application contexts. Our work contributes to the critical body of research on exploring objective metrics for assessing humans' appropriate reliance on AI advice.

What carries the argument

The three views on appropriate reliance—Traditional, Appropriateness, and Dominance—that classify how studies define and measure whether users follow AI advice correctly rather than blindly or not at all.

If this is right

  • Objective metrics reported in studies can be evaluated and compared by mapping them to the three views.
  • Consensus on a shared set of metrics would allow direct comparison of findings across different empirical studies.
  • Studies should explicitly differentiate appropriate reliance from trust and from mere reliance to clarify what is being assessed.
  • Examining how objective metrics perform in specific application contexts reveals their practical validity and limits.
  • Organizing measures under the three views highlights gaps where new metrics may be needed.
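As a concrete illustration of the first bullet, metrics reported in a study could be bucketed by view with a simple lookup. This is a hypothetical sketch, not the paper's own mapping: the metric names beyond those listed in its Figure 2 are invented here, and which view a given metric falls under can shift with each study's operationalization.

```python
# Hypothetical metric-to-view mapping; the paper's actual assignments
# are not reproduced here.
VIEW_OF_METRIC = {
    "accuracy": "Traditional",           # overall decision quality
    "agreement": "Traditional",          # how often the human follows the AI
    "switch_rate": "Traditional",
    "over_reliance": "Appropriateness",  # following wrong advice
    "under_reliance": "Appropriateness", # rejecting correct advice
    "RAIR": "Appropriateness",
    "RST": "Appropriateness",
    "decision_weight": "Dominance",      # how much the AI drives the outcome
}

def group_by_view(metrics):
    """Bucket a study's reported metrics by view ('?' when unmapped)."""
    grouped = {}
    for m in metrics:
        grouped.setdefault(VIEW_OF_METRIC.get(m, "?"), []).append(m)
    return grouped
```

Unmapped metrics surfacing under `"?"` is exactly the kind of gap the third bullet points at.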

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Convergence on one dominant view could simplify future experiments while still allowing context-specific adjustments.
  • The views could be tested in new studies to see which one best predicts whether users calibrate their reliance in real deployments.
  • Linking the views to established models of human judgment might explain why trust alone fails to ensure appropriate use of AI.

Load-bearing premise

The reviewed empirical studies are representative of the full human-AI decision-making literature, and the three proposed views comprehensively capture the range of measurement approaches used, without significant gaps or overlaps.

What would settle it

A new review that identifies many studies whose reliance measures do not fit any of the three views, or that shows most studies align with only one view without the others adding explanatory power, would challenge the proposed classification.

Figures

Figures reproduced from arXiv: 2604.23896 by Konstantinos Papangelis, Muhammad Raees.

Figure 1
Figure 1: The PRISMA [58] framework was used to assess research studies. The primary search was conducted on SCOPUS, with a secondary search on the ACM Digital Library (ACM DL).
Figure 2
Figure 2: Left: The prevalence of objective metrics reported in selected studies. Accuracy is the most widely used measure. Common metrics for appropriate reliance are over-reliance, under-reliance, RAIR, RST, and Accuracy-Wid. Studies also measure users' reliance on AI through agreements or switches. Right: The prevalence of subjective metrics reported in selected studies. Studies commonly use confidence, trus…
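The objective metrics named in the Figure 2 caption are typically computed from per-trial logs. Below is a minimal sketch under one common judge-advisor formulation (initial human answer, AI advice, final answer, ground truth); these definitions are one common reading of the literature, not the paper's own, and RAIR and RST in particular are operationalized differently across the studies it reviews.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    initial: str   # human's answer before seeing AI advice
    advice: str    # AI's recommended answer
    final: str     # human's answer after seeing AI advice
    truth: str     # ground-truth answer

def reliance_metrics(trials):
    """One common operationalization of objective reliance measures:
      over-reliance : share of wrong-advice trials where the human ends
                      up agreeing with the wrong advice
      under-reliance: share of correct-advice trials where the human
                      ultimately rejects the advice
      RAIR          : among trials where the human started wrong and the
                      AI was right, share where the human adopts the advice
      RST           : among trials where the human started right and the
                      AI was wrong, share where the human keeps their answer
    """
    def share(hits):
        hits = list(hits)
        return sum(1 for h in hits if h) / len(hits) if hits else None

    over = share(t.final == t.advice for t in trials if t.advice != t.truth)
    under = share(t.final != t.advice for t in trials if t.advice == t.truth)
    rair = share(t.final == t.advice for t in trials
                 if t.initial != t.truth and t.advice == t.truth)
    rst = share(t.final == t.initial for t in trials
                if t.initial == t.truth and t.advice != t.truth)
    accuracy = share(t.final == t.truth for t in trials)
    return {"over_reliance": over, "under_reliance": under,
            "RAIR": rair, "RST": rst, "accuracy": accuracy}
```

The `None` returns for empty denominators reflect a real comparability problem the paper raises: conditional metrics such as RAIR are undefined for participants who never start out wrong.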
Original abstract

While human-AI decision-making research has primarily used trust measurements to assess the practical usage of AI systems by their end-users, recent empirical evidence suggests that trust measurements do not inform users' appropriate reliance on AI systems. While examining the human-AI decision-making literature, in this work, we review empirical studies that assess people's appropriate reliance on AI advice, differentiating measurements and constructs of appropriate reliance from trust and mere reliance. Our analysis of literature shows that constructs for human-AI appropriate reliance are still fragmented in research. We present three views on appropriate reliance, namely Traditional, Appropriateness, and Dominance, as discussed in research. Using these views, we evaluate objective metrics reported in studies and argue for their consensus to facilitate the comparison across empirical research. We also discuss how studies employ objective metrics and examine their validity in application contexts. Our work contributes to the critical body of research on exploring objective metrics for assessing humans' appropriate reliance on AI advice.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a literature review of empirical studies in human-AI decision-making. It argues that trust measurements do not adequately inform appropriate reliance on AI advice, differentiates appropriate reliance constructs from trust and mere reliance, identifies fragmentation in the literature, proposes three organizing views (Traditional, Appropriateness, and Dominance), evaluates objective metrics reported in studies through these views, advocates for consensus on such metrics to enable cross-study comparison, and discusses the validity of these metrics in application contexts.

Significance. If the taxonomy is robust and the metric evaluations accurate, the work could help standardize measurement practices in human-AI interaction research, moving beyond over-reliance on trust scales and enabling better synthesis of empirical findings. The emphasis on objective metrics and contextual validity adds practical value for designing and evaluating AI-assisted decision systems.

major comments (2)
  1. The central diagnosis of fragmentation and the call for metric consensus rest on the claim that all relevant constructs can be usefully partitioned into exactly three non-overlapping views (Traditional, Appropriateness, Dominance). The manuscript must provide explicit classification criteria, a mapping of reviewed studies' metrics to these views, and explicit discussion of potential overlaps or gaps (e.g., a metric that simultaneously meets Appropriateness and Dominance criteria). Without this, the fragmentation argument and downstream evaluation lose force.
  2. The evaluation of objective metrics and their validity in application contexts inherits the same dependency on the three-view framework. The manuscript should report the search methodology, inclusion criteria, and number of studies reviewed so that readers can assess whether the sample is representative and whether the taxonomy comprehensively covers the literature without significant omissions.
minor comments (2)
  1. The abstract could briefly note the number of studies reviewed and one concrete example of a metric validity issue to give readers an immediate sense of the empirical scope.
  2. Ensure consistent use of terminology distinguishing 'trust', 'reliance', and 'appropriate reliance' across sections, especially when citing prior work.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights important areas for strengthening the rigor and transparency of our literature review. We address each major comment below and commit to revisions that directly respond to the concerns raised.

Point-by-point responses
  1. Referee: The central diagnosis of fragmentation and the call for metric consensus rest on the claim that all relevant constructs can be usefully partitioned into exactly three non-overlapping views (Traditional, Appropriateness, and Dominance). The manuscript must provide explicit classification criteria, a mapping of reviewed studies' metrics to these views, and explicit discussion of potential overlaps or gaps (e.g., a metric that simultaneously meets Appropriateness and Dominance criteria). Without this, the fragmentation argument and downstream evaluation lose force.

    Authors: We agree that explicit criteria and a transparent mapping are necessary to substantiate the three-view framework. In the revised manuscript, we will add a dedicated subsection that provides precise, operational classification criteria for assigning metrics to the Traditional, Appropriateness, and Dominance views. We will also include a table that systematically maps the objective metrics reported in each reviewed study to these views. Finally, we will expand the discussion to explicitly address potential overlaps and gaps, including concrete examples of metrics that could satisfy criteria from more than one view and our rationale for primary classification in such cases. revision: yes

  2. Referee: The evaluation of objective metrics and their validity in application contexts inherits the same dependency on the three-view framework. The manuscript should report the search methodology, inclusion criteria, and number of studies reviewed so that readers can assess whether the sample is representative and whether the taxonomy comprehensively covers the literature without significant omissions.

    Authors: We acknowledge that the current version does not provide sufficient methodological detail for readers to evaluate the scope of the review. We will add a new 'Review Methodology' section that describes the literature search strategy (databases, keywords, and time period), the inclusion and exclusion criteria applied, the total number of studies screened and ultimately included, and any limitations regarding coverage of the broader literature. This addition will directly support assessment of the taxonomy's comprehensiveness. revision: yes

Circularity Check

0 steps flagged

Literature review with no derivations, equations, or self-referential reductions.

Full rationale

The paper is a review of external empirical studies on human-AI decision-making. It identifies fragmentation in the literature and organizes findings into three views (Traditional, Appropriateness, Dominance) drawn from the reviewed works. No equations, fitted parameters, or internal derivations exist. All claims rest on analysis and citation of independent prior studies rather than self-definition, self-citation chains, or renaming of the paper's own inputs. The taxonomy functions as an organizational lens on external data, not a construct that reduces to the paper's own definitions or assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a literature review paper. No free parameters, axioms, or invented entities are introduced because the central claim is a synthesis and categorization of existing research rather than a new model or derivation.

pith-pipeline@v0.9.0 · 5460 in / 1104 out tokens · 65473 ms · 2026-05-08T05:27:29.615377+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

96 extracted references · 66 canonical work pages · 1 internal anchor

  1. [1]

    Sajid Ali, Tamer Abuhmed, Shaker El-Sappagh, Khan Muhammad, Jose M. Alonso-Moral, Roberto Confalonieri, Riccardo Guidotti, Javier Del Ser, Natalia Díaz-Rodríguez, and Francisco Herrera. 2023. Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence.Information Fusion99 (2023), 101805. doi:10.10...

  2. [2]

    OSF Anonymous. 2026.Anonymized Material for Study: Review Protocol, Data Analysis, and Detailed Descriptions. https://osf.io/nxph2/overview? view_only=2467fe0607724f2bbf4d9ca9c36e86f0

  3. [3]

    Annette Baier. 1986. Trust and antitrust.ethics96, 2 (1986), 231–260

  4. [4]

    Gagan Bansal, Tongshuang Wu, Joyce Zhou, Raymond Fok, Besmira Nushi, Ece Kamar, Marco Tulio Ribeiro, and Daniel Weld. 2021. Does the Whole Exceed its Parts? The Effect of AI Explanations on Complementary Team Performance. InProceedings of the 2021 CHI Conference on Human 16 Raees and Papangelis Factors in Computing Systems(Yokohama, Japan)(CHI ’21). Assoc...

  5. [5]

    Astrid Bertrand, Rafik Belloum, James R. Eagan, and Winston Maxwell. 2022. How Cognitive Biases Affect XAI-assisted Decision-making: A Systematic Review. InProceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society(Oxford, United Kingdom)(AIES ’22). Association for Computing Machinery, New York, NY, USA, 78–91. doi:10.1145/3514094.3534164

  6. [6]

    Michelle Brachman, Zahra Ashktorab, Michael Desmond, Evelyn Duesterwald, Casey Dugan, Narendra Nath Joshi, Qian Pan, and Aabhas Sharma

  7. [7]

    Reliance and Automation for Human-AI Collaborative Data Labeling Conflict Resolution.Proc. ACM Hum.-Comput. Interact.6, CSCW2, Article 321 (Nov. 2022), 27 pages. doi:10.1145/3555212

  8. [8]

    Zana Buçinca, Maja Barbara Malaya, and Krzysztof Z. Gajos. 2021. To Trust or to Think: Cognitive Forcing Functions Can Reduce Overreliance on AI in AI-assisted Decision-making.Proc. ACM Hum.-Comput. Interact.5, CSCW1, Article 188 (April 2021), 21 pages. doi:10.1145/3449287

  9. [9]

    Zana Buçinca, Siddharth Swaroop, Amanda E. Paluch, Finale Doshi-Velez, and Krzysztof Z. Gajos. 2025. Contrastive Explanations That Anticipate Human Misconceptions Can Improve Human Decision-Making Skills. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI ’25). Association for Computing Machinery, New York, NY, USA, Articl...

  10. [10]

    Federico Cabitza, Andrea Campagner, Riccardo Angius, Chiara Natali, and Carlo Reverberi. 2023. AI Shall Have No Dominion: on How to Measure Technology Dominance in AI-supported Human decision-making. InProceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany)(CHI ’23). Association for Computing Machinery, New York, NY...

  11. [11]

    Emma R. Casolin and Flora D. Salim. 2024. Towards Understanding Human-AI Reliance Patterns Through Explanation Styles. InCompanion of the 2024 on ACM International Joint Conference on Pervasive and Ubiquitous Computing(Melbourne VIC, Australia)(UbiComp ’24). Association for Computing Machinery, New York, NY, USA, 861–865. doi:10.1145/3675094.3678996

  12. [12]

    Federico Maria Cau, Hanna Hauptmann, Lucio Davide Spano, and Nava Tintarev. 2023. Effects of AI and Logic-Style Explanations on Users’ Decisions Under Different Levels of Uncertainty.ACM Trans. Interact. Intell. Syst.13, 4, Article 22 (Dec. 2023), 42 pages. doi:10.1145/3588320

  13. [13]

    Federico Maria Cau and Lucio Davide Spano. 2025. The Influence of Curiosity Traits and On-Demand Explanations in AI-Assisted Decision-Making. InProceedings of the 30th International Conference on Intelligent User Interfaces (IUI ’25). Association for Computing Machinery, New York, NY, USA, 1440–1457. doi:10.1145/3708359.3712165

  14. [14]

    Valerie Chen, Q. Vera Liao, Jennifer Wortman Vaughan, and Gagan Bansal. 2023. Understanding the Role of Human Intuition on Reliance in Human-AI Decision-Making with Explanations. 7, CSCW2, Article 370 (Oct. 2023), 32 pages. doi:10.1145/3610219

  15. [15]

    Chun-Wei Chiang, Zhuoran Lu, Zhuoyan Li, and Ming Yin. 2023. Are Two Heads Better Than One in AI-Assisted Decision Making? Comparing the Behavior and Performance of Groups and Individuals in Human-AI Collaborative Recidivism Risk Assessment. InProceedings of the 2023 CHI Conference on Human Factors in Computing Systems(Hamburg, Germany)(CHI ’23). Associat...

  16. [16]

    Hyesun Choung, Prabu David, and Arun Ross. 2023. Trust in AI and Its Role in the Acceptance of AI Technologies.International Journal of Human–Computer Interaction39, 9 (2023), 1727–1739. doi:10.1080/10447318.2022.2050543

  17. [17]

    Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. 2017. Algorithmic decision making and the cost of fairness. InProceedings of the 23rd acm sigkdd international conference on knowledge discovery and data mining. 797–806

  18. [18]

    NAJ Cornelissen, RJM Van Eerdt, HK Schraffenberger, and Willem FG Haselager. 2022. Reflection machines: increasing meaningful human control over Decision Support Systems.Ethics and Information Technology24, 2 (2022), 19. doi:10.1007/s10676-022-09645-y

  19. [19]

    Nils Dahlbäck, Arne Jönsson, and Lars Ahrenberg. 1993. Wizard of Oz studies: why and how. InProceedings of the 1st international conference on Intelligent user interfaces. Association for Computing Machinery, 193–200. doi:10.1145/169891.169968

  20. [20]

    Devleena Das and Sonia Chernova. 2020. Leveraging rationales to improve human task performance. InProceedings of the 25th international conference on intelligent user interfaces. 510–518

  21. [21]

    Sander de Jong, Ville Paananen, Benjamin Tag, and Niels van Berkel. 2025. Cognitive Forcing for Better Decision-Making: Reducing Overreliance on AI Systems Through Partial Explanations. 9, 2, Article CSCW048 (May 2025), 30 pages. doi:10.1145/3710946

  22. [22]

    Dominik Dellermann, Philipp Ebel, Matthias Söllner, and Jan Marco Leimeister. 2019. Hybrid intelligence.Business & Information Systems Engineering 61, 5 (2019), 637–643. doi:10.1007/s12599-019-00595-2

  23. [23]

    Berkeley J Dietvorst, Joseph P Simmons, and Cade Massey. 2015. Algorithm aversion: people erroneously avoid algorithms after seeing them err. Journal of experimental psychology: General144, 1 (2015), 114

  24. [24]

    Finale Doshi-Velez and Been Kim. 2017. Towards a rigorous science of interpretable machine learning.arXiv preprint arXiv:1702.08608(2017)

  25. [25]

    John J. Dudley and Per Ola Kristensson. 2018. A Review of User Interface Design for Interactive Machine Learning.ACM Trans. Interact. Intell. Syst. 8, 2, Article 8 (June 2018), 37 pages. doi:10.1145/3185517

  26. [26]

    Mary T. Dzindolet, Scott A. Peterson, Regina A. Pomranky, Linda G. Pierce, and Hall P. Beck. 2003. The role of trust in automation reliance. International Journal of Human-Computer Studies58, 6 (2003), 697–718. doi:10.1016/S1071-5819(03)00038-7 Trust and Technology

  27. [27]

    Sven Eckhardt, Niklas Kühl, Mateusz Dolata, and Gerhard Schwabe. 2024. A survey of AI reliance.arXiv preprint arXiv:2408.03948(2024). doi:10.48550/arXiv.2408.03948

  28. [28]

    Susanne Gaube, Harini Suresh, Martina Raue, Alexander Merritt, Seth J Berkowitz, Eva Lermer, Joseph F Coughlin, John V Guttag, Errol Colak, and Marzyeh Ghassemi. 2021. Do as AI say: susceptibility in deployment of clinical decision-aids.NPJ digital medicine4, 1 (2021), 31. From Trust to Appropriate Reliance 17

  29. [29]

    Ben Green and Yiling Chen. 2019. The Principles and Limits of Algorithm-in-the-Loop Decision Making.Proc. ACM Hum.-Comput. Interact.3, CSCW, Article 50 (Nov. 2019), 24 pages. doi:10.1145/3359152

  30. [30]

    Ziyang Guo, Yifan Wu, Jason D. Hartline, and Jessica Hullman. 2024. A Decision Theoretic Framework for Measuring AI Reliance. InProceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency(Rio de Janeiro, Brazil)(FAccT ’24). Association for Computing Machinery, New York, NY, USA, 221–236. doi:10.1145/3630106.3658901

  31. [31]

    PA Hancock, Theresa T Kessler, Alexandra D Kaplan, Kimberly Stowers, J Christopher Brill, Deborah R Billings, Kristin E Schaefer, and James L Szalma

  32. [32]

    How and why humans trust: A meta-analysis and elaborated model.Frontiers in psychology14 (2023), 1081086. doi:10.3389/fpsyg.2023.1081086

  33. [33]

    Gaole He, Nilay Aishwarya, and Ujwal Gadiraju. 2025. Is Conversational XAI All You Need? Human-AI Decision Making With a Conversational XAI Assistant. InProceedings of the 30th International Conference on Intelligent User Interfaces (IUI ’25). Association for Computing Machinery, New York, NY, USA, 907–924. doi:10.1145/3708359.3712133

  34. [34]

    Gaole He, Abri Bharos, and Ujwal Gadiraju. 2024. To Err Is AI! Debugging as an Intervention to Facilitate Appropriate Reliance on AI Systems. In Proceedings of the 35th ACM Conference on Hypertext and Social Media(Poznan, Poland)(HT ’24). Association for Computing Machinery, New York, NY, USA, 98–105. doi:10.1145/3648188.3675130

  35. [35]

    Gaole He, Stefan Buijsman, and Ujwal Gadiraju. 2023. How Stated Accuracy of an AI System and Analogies to Explain Accuracy Affect Human Reliance on the System.Proc. ACM Hum.-Comput. Interact.7, CSCW2, Article 276 (Oct. 2023), 29 pages. doi:10.1145/3610067

  36. [36]

    Gaole He, Lucie Kuiper, and Ujwal Gadiraju. 2023. Knowing About Knowing: An Illusion of Human Competence Can Hinder Appropriate Reliance on AI Systems. InProceedings of the 2023 CHI Conference on Human Factors in Computing Systems(Hamburg, Germany)(CHI ’23). Association for Computing Machinery, New York, NY, USA, Article 113, 18 pages. doi:10.1145/3544548.3581025

  37. [37]

    Maren Hinrichs, Thi Bich Diep Bui, and Stefan Schneegass. 2024. Exploring the Effects of User Input and Decision Criteria Control on Trust in a Decision Support Tool for Spare Parts Inventory Management. InProceedings of the International Conference on Mobile and Ubiquitous Multimedia. 313–323

  38. [38]

    Kevin Anthony Hoff and Masooda Bashir. 2015. Trust in Automation: Integrating Empirical Evidence on Factors That Influence Trust.Human Factors57, 3 (2015), 407–434. doi:10.1177/0018720814547570 PMID: 25875432

  39. [39]

    Angel Hsing-Chi Hwang, Q Vera Liao, Su Lin Blodgett, Alexandra Olteanu, and Adam Trischler. 2025. ’It was 80% me, 20% AI’: Seeking Authenticity in Co-Writing with Large Language Models.Proceedings of the ACM on Human-Computer Interaction9, 2 (2025), 1–41

  40. [40]

    Daniel Kahneman. 2011.Thinking, fast and slow. macmillan

  41. [41]

    Daniel Kahneman and Gary Klein. 2009. Conditions for intuitive expertise: a failure to disagree.American psychologist64, 6 (2009), 515

  42. [42]

    Patricia K. Kahr, Gerrit Rooks, Martijn C. Willemsen, and Chris C. P. Snijders. 2024. Understanding Trust and Reliance Development in AI Advice: Assessing Model Accuracy, Model Explanations, and Experiences from Previous Interactions.ACM Trans. Interact. Intell. Syst.14, 4, Article 29 (Dec. 2024), 30 pages. doi:10.1145/3686164

  43. [43]

    Sunnie S. Y. Kim, Q. Vera Liao, Mihaela Vorvoreanu, Stephanie Ballard, and Jennifer Wortman Vaughan. 2024. "I’m Not Sure, But... ": Examining the Impact of Large Language Models’ Uncertainty Expression on User Reliance and Trust. InProceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency(Rio de Janeiro, Brazil)(FAccT ’24). Asso...

  44. [44]

    Wouter Kool and Matthew Botvinick. 2018. Mental labour.Nature human behaviour2, 12 (2018), 899–908

  45. [45]

    Vivian Lai, Chacha Chen, Alison Smith-Renner, Q. Vera Liao, and Chenhao Tan. 2023. Towards a Science of Human-AI Decision Making: An Overview of Design Space in Empirical Human-Subject Studies. InProceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA)(FAccT ’23). Association for Computing Machinery, New York...

  46. [46]

    Nancy K Lankton, D Harrison McKnight, and John Tripp. 2015. Technology, humanness, and trust: Rethinking trust in technology.Journal of the association for information systems16, 10 (2015), 1. doi:10.17705/1jais.00411

  47. [47]

    John D. Lee and Katrina A. See. 2004. Trust in Automation: Designing for Appropriate Reliance.Human Factors46, 1 (2004), 50–80. doi:10.1518/hfes. 46.1.50_30392 PMID: 15151155

  48. [48]

    Benedikt Leichtmann, Christina Humer, Andreas Hinterreiter, Marc Streit, and Martina Mara. 2023. Effects of Explainable Artificial Intelligence on trust and human behavior in a high-risk decision task.Computers in Human Behavior139 (2023), 107539

  49. [49]

    Jennifer M. Logg, Julia A. Minson, and Don A. Moore. 2019. Algorithm appreciation: People prefer algorithmic to human judgment.Organizational Behavior and Human Decision Processes151 (2019), 90–103. doi:10.1016/j.obhdp.2018.12.005

  50. [50]

    Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. 2024. Explainable Artificial Intelligence...

  51. [51]

    Zhuoran Lu, Dakuo Wang, and Ming Yin. 2024. Does More Advice Help? The Effects of Second Opinions in AI-Assisted Decision Making.Proc. ACM Hum.-Comput. Interact.8, CSCW1, Article 217 (April 2024), 31 pages. doi:10.1145/3653708

  52. [52]

    Shuai Ma, Qiaoyi Chen, Xinru Wang, Chengbo Zheng, Zhenhui Peng, Ming Yin, and Xiaojuan Ma. 2025. Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI ’25). Association for Computing Machinery, New York, NY, ...

  53. [53]

    Shuai Ma, Ying Lei, Xinru Wang, Chengbo Zheng, Chuhan Shi, Ming Yin, and Xiaojuan Ma. 2023. Who Should I Trust: AI or Myself? Leveraging Human and AI Correctness Likelihood to Promote Appropriate Trust in AI-Assisted Decision-Making. InProceedings of the 2023 CHI Conference on Human Factors in Computing Systems(Hamburg, Germany)(CHI ’23). Association for ...

  54. [54]

    Shuai Ma, Xinru Wang, Ying Lei, Chuhan Shi, Ming Yin, and Xiaojuan Ma. 2024. “Are You Really Sure?” Understanding the Effects of Human Self-Confidence Calibration in AI-Assisted Decision Making. InProceedings of the 2024 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA)(CHI ’24). Association for Computing Machinery, New York, NY, US...

  55. [55]

    Scott Mayer McKinney, Marcin Sieniek, Varun Godbole, Jonathan Godwin, Natasha Antropova, Hutan Ashrafian, Trevor Back, Mary Chesus, Greg S Corrado, Ara Darzi, et al. 2020. International evaluation of an AI system for breast cancer screening.Nature577, 7788 (2020), 89–94

  56. [56]

    Siddharth Mehrotra, Chadha Degachi, Oleksandra Vereschak, Catholijn M. Jonker, and Myrthe L. Tielman. 2024. A Systematic Review on Fostering Appropriate Trust in Human-AI Interaction: Trends, Opportunities and Challenges.ACM J. Responsib. Comput.1, 4, Article 26 (Nov. 2024), 45 pages. doi:10.1145/3696449

  57. [57]

    Tim Miller. 2023. Explainable AI is Dead, Long Live Explainable AI! Hypothesis-driven Decision Support using Evaluative AI. InProceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency(Chicago, IL, USA)(FAccT ’23). Association for Computing Machinery, New York, NY, USA, 333–342. doi:10.1145/3593013.3594001

  58. [58]

    Katelyn Morrison, Donghoon Shin, Kenneth Holstein, and Adam Perer. 2023. Evaluating the Impact of Human Explanation Strategies on Human-AI Visual Decision-Making.Proc. ACM Hum.-Comput. Interact.7, CSCW1, Article 48 (April 2023), 37 pages. doi:10.1145/3579481

  59. [59]

    Katelyn Morrison, Philipp Spitzer, Violet Turri, Michelle Feng, Niklas Kühl, and Adam Perer. 2024. The Impact of Imperfect XAI on Human-AI Decision-Making.Proc. ACM Hum.-Comput. Interact.8, CSCW1, Article 183 (April 2024), 39 pages. doi:10.1145/3641022

  60. [60]

    Matthew J Page, Joanne E McKenzie, Patrick M Bossuyt, Isabelle Boutron, Tammy C Hoffmann, Cynthia D Mulrow, Larissa Shamseer, Jennifer M Tetzlaff, Elie A Akl, Sue E Brennan, Roger Chou, Julie Glanville, Jeremy M Grimshaw, Asbjørn Hróbjartsson, Manoj M Lalu, Tianjing Li, Elizabeth W Loder, Evan Mayo-Wilson, Steve McDonald, Luke A McGuinness, Lesley A Stewa...

  61. [61]

    Raja Parasuraman and Dietrich H Manzey. 2010. Complacency and bias in human use of automation: An attentional integration.Human factors52, 3 (2010), 381–410. doi:10.1177/0018720810376055

  62. [62]

    Saumya Pareek, Niels van Berkel, Eduardo Velloso, and Jorge Goncalves. 2024. Effect of Explanation Conceptualisations on Trust in AI-assisted Credibility Assessment.Proc. ACM Hum.-Comput. Interact.8, CSCW2, Article 383 (Nov. 2024), 31 pages. doi:10.1145/3686922

  63. [63]

    Alison Parkes. 2017. The effect of individual and task characteristics on decision aid reliance.Behaviour & Information Technology36, 2 (2017), 165–177

  64. [64]

    Muhammad Raees, Vassilis-Javed Khan, Ioanna Lykourentzou, and Konstantinos Papangelis. 2026. Do People Appropriately Rely on AI-Advice? An Analytical Review of HCI Research on Human-AI Decision-Making. InProceedings of the 2026 CHI Conference on Human Factors in Computing Systems(Barcelona, Spain)(CHI ’26). Association for Computing Machinery, New York, N...

  65. [65]

    Muhammad Raees, Inge Meijerink, Ioanna Lykourentzou, Vassilis-Javed Khan, and Konstantinos Papangelis. 2024. From explainable to interactive AI: A literature review on current trends in human-AI interaction.International Journal of Human-Computer Studies189, September 2024 (2024), 103301. doi:10.1016/j.ijhcs.2024.103301

  66. [66]

    Muhammad Raees and Konstantinos Papangelis. 2026. Trust to Reliance: Measurement Constructs for Human-AI Appropriate Reliance. InProceedings of the Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA ’26). Association for Computing Machinery, New York, NY, USA, Article 695, 7 pages. doi:10.1145/3772363.3798835

  67. [67]

    Gonzalo Ramos, Christopher Meek, Patrice Simard, Jina Suh, and Soroush Ghorashi. 2020. Interactive machine teaching: a human-centered approach to building machine-learned models.Human–Computer Interaction35, 5-6 (2020), 413–451

  68. [68]

    Charvi Rastogi, Yunfeng Zhang, Dennis Wei, Kush R. Varshney, Amit Dhurandhar, and Richard Tomsett. 2022. Deciding Fast and Slow: The Role of Cognitive Biases in AI-assisted Decision-making.Proc. ACM Hum.-Comput. Interact.6, CSCW1, Article 83 (April 2022), 22 pages. doi:10.1145/3512930

  69. [69]

    Giuseppe Romeo and Daniela Conti. 2026. Exploring automation bias in human–AI collaboration: a review and implications for explainable AI.AI & SOCIETY41, 1 (2026), 259–278

  70. [70]

    Sara Salimzadeh, Gaole He, and Ujwal Gadiraju. 2023. A Missing Piece in the Puzzle: Considering the Role of Task Complexity in Human-AI Decision Making. InProceedings of the 31st ACM Conference on User Modeling, Adaptation and Personalization(Limassol, Cyprus)(UMAP ’23). Association for Computing Machinery, New York, NY, USA, 215–227. doi:10.1145/3565472.3592959

  71. [71]

    Sara Salimzadeh, Gaole He, and Ujwal Gadiraju. 2024. Dealing with Uncertainty: Understanding the Impact of Prognostic Versus Diagnostic Tasks on Trust and Reliance in Human-AI Decision Making. InProceedings of the 2024 CHI Conference on Human Factors in Computing Systems(Honolulu, HI, USA)(CHI ’24). Association for Computing Machinery, New York, NY, USA, ...

  72. [72]

    Nicolas Scharowski, Sebastian AC Perrig, Melanie Svab, Klaus Opwis, and Florian Brühlmann. 2023. Exploring the effects of human-centered AI explanations on trust and reliance.Frontiers in Computer Science5 (2023), 1151150. doi:10.3389/fcomp.2023.1151150

  73. [73]

    Nicolas Scharowski, Sebastian AC Perrig, Nick von Felten, and Florian Brühlmann. 2022. Trust and reliance in XAI–Distinguishing between attitudinal and behavioral measures.arXiv preprint arXiv:2203.12318(2022)

  74. [74]

    Max Schemmer, Andrea Bartos, Philipp Spitzer, Patrick Hemmer, Niklas Kühl, Jonas Liebschner, and Gerhard Satzger. 2023. Towards effective human-AI decision-making: The role of human learning in appropriate reliance on AI advice. InForty-Fourth International Conference on Information From Trust to Appropriate Reliance 19 Systems(Hyderabad, India)(ICIS 2023...

  75. [75]

    Max Schemmer, Niklas Kuehl, Carina Benz, Andrea Bartos, and Gerhard Satzger. 2023. Appropriate Reliance on AI Advice: Conceptualization and the Effect of Explanations. InProceedings of the 28th International Conference on Intelligent User Interfaces(Sydney, NSW, Australia)(IUI ’23). Association for Computing Machinery, New York, NY, USA, 410–422. doi:10.1...

  76. [76]

    Nadine Schlicker and Markus Langer. 2021. Towards warranted trust: A model on the relation between actual and perceived system trustworthiness. InProceedings of mensch und computer 2021. 325–329

  77. [77]

    Jakob Schoeffer, Maria De-Arteaga, and Niklas Kühl. 2024. Explanations, Fairness, and Appropriate Reliance in Human-AI Decision-Making. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems(Honolulu, HI, USA)(CHI ’24). Association for Computing Machinery, New York, NY, USA, Article 836, 18 pages. doi:10.1145/3613904.3642621

  78. [78]

    Jakob Schoeffer, Johannes Jakubik, Michael Vössing, Niklas Kühl, and Gerhard Satzger. 2025. AI reliance and decision quality: Fundamentals, interdependence, and the effects of interventions.Journal of Artificial Intelligence Research82 (2025), 471–501. doi:10.1613/jair.1.15873

  79. [79]

    Phoebe Sengers, Kirsten Boehner, Shay David, and Joseph’Jofish’ Kaye. 2005. Reflective design. InProceedings of the 4th decennial conference on Critical computing: between sense and sensibility. 49–58

  80. [80]

    Keng Siau and Weiyu Wang. 2018. Building trust in artificial intelligence, machine learning, and robotics.Cutter business technology journal31, 2 (2018), 47–53

Showing first 80 references.