AI, Meet Human: Learning Paradigms for Hybrid Decision Making Systems

Clara Punzi; Dino Pedreschi; Fosca Giannotti; Mattia Setzu; Roberto Pellungrini

arXiv: 2402.06287 · v3 · submitted 2024-02-09 · 💻 cs.LG · cs.AI · cs.HC


Pith reviewed 2026-05-24 04:09 UTC · model grok-4.3

classification 💻 cs.LG cs.AI cs.HC

keywords: hybrid decision making, human-machine interaction, machine learning, taxonomy, survey, human-AI collaboration

The pith

A taxonomy organizes the ways humans and machine learning systems interact in decisions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper surveys scattered techniques from computer science on how people train, use, and collaborate with machine learning models. It groups these techniques into one taxonomy of Hybrid Decision Making Systems that supplies both a conceptual map and technical distinctions. A sympathetic reader would care because clearer categories could help designers and researchers compare approaches that currently sit in separate literatures. The survey treats the taxonomy as a framework for modeling human-machine interaction rather than as a new algorithm or experiment.

Core claim

The authors propose a taxonomy of Hybrid Decision Making Systems that supplies both a conceptual and technical framework for understanding how current computer science literature models interaction between humans and machines.

What carries the argument

The taxonomy of Hybrid Decision Making Systems, which classifies literature on human-ML interaction into coherent categories.

If this is right

Researchers gain a shared language for comparing different human-ML collaboration techniques.
System designers can select interaction paradigms with explicit awareness of their technical and conceptual differences.
Future surveys can build on the same categories rather than starting from scattered classifications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The taxonomy could serve as a checklist when new hybrid systems are proposed, to check which interaction modes are already covered.
If the taxonomy holds, it might reveal under-explored combinations of human and machine roles that current papers overlook.

Load-bearing premise

The varied techniques in the computer science literature on human-ML interaction can be organized into one coherent taxonomy without substantial loss of important distinctions or coverage gaps.

What would settle it

A systematic review that identifies multiple high-impact human-ML interaction methods or goals that fall outside every category in the proposed taxonomy.

Figures

Figures reproduced from arXiv: 2402.06287 by Clara Punzi, Dino Pedreschi, Fosca Giannotti, Mattia Setzu, Roberto Pellungrini.

**Figure 2.** Overview of the joint learning architecture for the Single-Expert Learning to Defer (L2D-SE) setting, …

**Figure 3.** An example of Question Answering machine with a hard reasoning language. The agent maps the …

**Figure 4.** An example of a Question Answering hybrid system with a soft reasoning language. The …

**Figure 5.** An example of a Question Answering hybrid system employing a soft reasoning language and leveraging …

Original abstract

Everyday we increasingly rely on machine learning models to automate and support high-stake tasks and decisions. This growing presence means that humans are now constantly interacting with machine learning-based systems, training and using models everyday. Several different techniques in computer science literature account for the human interaction with machine learning systems, but their classification is sparse and the goals varied. This survey proposes a taxonomy of Hybrid Decision Making Systems, providing both a conceptual and technical framework for understanding how current computer science literature models interaction between humans and machines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This is a literature survey proposing a taxonomy for hybrid human-AI decision systems with no new empirical results or derivations.

The main point is that this paper is a survey proposing a taxonomy of Hybrid Decision Making Systems to better understand human-machine interaction in the computer science literature. It does not present new experiments, derivations, or findings beyond the organization of existing work.

The paper does well to highlight the increasing reliance on machine learning for high-stakes decisions and the resulting constant interaction between humans and these systems. It notes that several techniques exist but that their classification is sparse and their goals varied, which is a fair observation. Proposing both a conceptual and a technical framework is a logical step for a survey that aims to bring some order to the area.

The soft spots are mainly around the execution of the taxonomy. Its usefulness hinges on whether the classification criteria are clear, the coverage is broad enough, and important distinctions are preserved. The abstract provides no evidence on these points, so the value is not yet clear. This corresponds to the paper's weakest assumption: that varied techniques can be organized into one taxonomy without loss of important distinctions. A detailed breakdown with examples from the literature in the full paper would address this. There are no concerns with math, data, or fitting, since none are present.

This paper is for researchers and practitioners in human-AI collaboration and hybrid systems who want an overview to navigate the field. A reader interested in new algorithms or empirical studies will not find them here. It deserves a serious referee because surveys that offer a coherent taxonomy can help a subfield coordinate, and the modest claim makes it suitable for review rather than desk rejection.

Referee Report

0 major / 2 minor

Summary. The manuscript is a survey of techniques in the computer science literature for human interaction with machine learning systems. It proposes a taxonomy of Hybrid Decision Making Systems intended to supply both a conceptual and a technical framework for organizing how these interactions are modeled.

Significance. If the taxonomy organizes the literature coherently and with adequate coverage, it could serve as a useful reference point for researchers working on human-AI collaboration and hybrid systems. The paper contains no original empirical results, derivations, or machine-checked proofs; its contribution is therefore entirely synthetic.

minor comments (2)

[Abstract] The statement that existing classifications are 'sparse' is not accompanied by any enumeration of prior taxonomies or explicit comparison, leaving the novelty of the proposed framework difficult to gauge from the opening paragraph.
The manuscript would benefit from an explicit statement of the literature search strategy, inclusion criteria, and total number of works reviewed so that readers can assess coverage.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their review and recommendation of minor revision. No specific major comments were provided in the report, so we have no individual points to address here. We remain ready to incorporate any minor changes identified during the revision process.

Circularity Check

0 steps flagged

No significant circularity: survey taxonomy with no derivations

This paper is a literature survey that proposes a conceptual and technical taxonomy for organizing existing work on hybrid human-ML decision systems. It contains no mathematical derivations, fitted parameters, predictions, or load-bearing self-citations. The central claim is the usefulness of the proposed framework for classifying prior techniques; no step reduces by construction to its own inputs or to a self-referential chain. The paper is self-contained as an organizational review against external literature.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a survey paper; it introduces no free parameters, axioms, or invented entities beyond the organizational taxonomy itself.

[1] [1]

Albright

A. Albright. 2019. If you give a judge a risk score: evidence from Kentucky bail decisions. Law, Economics, and Business Fellows’ Discussion Paper Series, 85

work page 2019

[2] [2]

automation bias

S. Alon-Barkat and M. Busuioc. 2023. Human–AI interactions in public sector decision making:“automation bias” and “selective adherence” to algorithmic advice.Journal of Public Administration Research and Theory , 33, 1

work page 2023

[3] [3]

J. V. Alves, D. Leitão, S. M. Jesus, M. O. P. Sampaio, J. Liébana, P. Saleiro, M. A. T. Figueiredo, and P. Bizarro. 2024. Cost-sensitive learning to defer to multiple experts with workload constraints. Trans. Mach. Learn. Res

work page 2024

[4] [4]

Amershi, M

S. Amershi, M. Cakmak, W. B. Knox, and T. Kulesza. 2014. Power to the people: the role of humans in interactive machine learning. AI Mag., 35, 4

work page 2014

[5] [5]

Ando and A

S. Ando and A. Yamamoto. 2023. Anomaly detection via few-shot learning on normality. In Machine Learning and Knowledge Discovery in Databases . M.-R. Amini, S. Canu, A. Fischer, T. Guns, P. Kralj Novak, and G. Tsoumakas, (Eds.) Springer International Publishing, Cham, 275–290

work page 2023

[6] [6]

A. N. Angelopoulos, S. Bates, A. Fisch, L. Lei, and T. Schuster. 2022. Conformal risk control

work page 2022

[7] [7]

Asif and F

A. Asif and F. u. Amir Afsar Minhas. 2020. Generalized neural framework for learning with rejection. In 2020 International Joint Conference on Neural Networks (IJCNN)

work page 2020

[8] [8]

Awasthi, A

P. Awasthi, A. Mao, M. Mohri, and Y. Zhong. 2022. H-consistency bounds for surrogate loss minimizers. In International Conference on Machine Learning . PMLR, 1117–1174

work page 2022

[9] [9]

Babuta and M

A. Babuta and M. Oswald. 2021. Machine learning predictive algorithms and the policing of future crimes: governance and oversight. In Predictive Policing and Artificial Intelligence

work page 2021

[10] [10]

Bainbridge

L. Bainbridge. 1983. Ironies of automation. Autom., 19, 6

work page 1983

[11] [11]

Bansal, B

G. Bansal, B. Nushi, E. Kamar, E. Horvitz, and D. S. Weld. 2021. Is the most accurate AI the best teammate? Optimizing AI for teamwork. In AAAI Conference on Artificial Intelligence

work page 2021

[12] [12]

Bansal, B

G. Bansal, B. Nushi, E. Kamar, W. S. Lasecki, D. S. Weld, and E. Horvitz. 2019. Beyond accuracy: the role of mental models in human-ai team performance. In HCOMP

work page 2019

[13] [13]

Bansal and D

G. Bansal and D. S. Weld. 2018. A coverage-based utility model for identifying unknown unknowns. In AAAI

work page 2018

[14] [14]

P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. 2006. Convexity, classification, and risk bounds. Journal of the American Statistical Association

work page 2006

[15] [15]

P. L. Bartlett and M. H. Wegkamp. 2008. Classification with a reject option using a hinge loss. JMLR

work page 2008

[16] [16]

Battiti and A

R. Battiti and A. Colla. 1994. Democracy in neural nets: voting schemes for classification. Neural Networks

work page 1994

[17] [17]

Bontempelli, F

A. Bontempelli, F. Giunchiglia, A. Passerini, and S. Teso. 2022. Human-in-the-loop handling of knowledge drift. Data Min. Knowl. Discov., 36, 5

work page 2022

[18] [18]

Bostrom, X

K. Bostrom, X. Zhao, S. Chaudhuri, and G. Durrett. 2021. Flexible generation of natural language deductions. In EMNLP

work page 2021

[19] [19]

Brinkrolf and B

J. Brinkrolf and B. Hammer. 2018. Interpretable machine learning with reject option. at - Automatisierungstechnik

work page 2018

[20] [20]

Brinkrolf and B

J. Brinkrolf and B. Hammer. 2017. Probabilistic extension and reject options for pairwise lvq. In International Workshop on Self-Organizing Maps and Learning Vector Quantization, Clustering and Data Visualization

work page 2017

[21] [21]

Bucila, R

C. Bucila, R. Caruana, and A. Niculescu-Mizil. 2006. Model compression. In SIGKDD 2006

work page 2006

[22] [22]

Buçinca, M

Z. Buçinca, M. B. Malaya, and K. Z. Gajos. 2021. To trust or to think: cognitive forcing functions can reduce overreliance on AI in ai-assisted decision-making. Proc. ACM Hum. Comput. Interact. , 5, CSCW1

work page 2021

[23] [23]

Y. Cao, H. Mozannar, L. Feng, H. Wei, and B. An. 2024. In defense of softmax parametrization for calibrated and consistent learning to defer. Advances in Neural Information Processing Systems , 36

work page 2024

[24] [24]

Cecotti and S

H. Cecotti and S. Vajda. 2013. Rejection schemes in multi-class classification - application to handwritten character recognition. In ICDAR. IEEE Computer Society, 445–449

work page 2013

[25] [25]

Charusaie, H

M.-A. Charusaie, H. Mozannar, D. Sontag, and S. Samadi. 2022. Sample efficient learning of predictors that comple- ment humans. In Proc. of the 39th International Conference on Machine Learning

work page 2022

[26] [26]

J. Chen, W. Shi, Z. Fu, S. Cheng, L. Li, and Y. Xiao. 2023. Say what you mean! Large language models speak too positively about negative commonsense knowledge. In ACL (1)

work page 2023

[27] [27]

C. Chow. 1970. On optimum recognition error and reject tradeoff. IEEE Transactions on Information Theory , 16, 1

work page 1970

[28] [28]

M. R. Ciosici, J. Cecil, D. Lee, A. Hedges, M. Freedman, and R. M. Weischedel. 2021. Perhaps ptlms should go to school - A task to assess open book and closed book QA. In EMNLP (1)

work page 2021

[29] [29]

Clark, O

P. Clark, O. Tafjord, and K. Richardson. 2020. Transformers as soft reasoners over language. In IJCAI

work page 2020

[30] [30]

Coenen, A

L. Coenen, A. K. A. Abdullah, and T. Guns. 2020. Probability of default estimation, with a reject option. 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA)

work page 2020

[31] [31]

D. Cohn, L. Atlas, and R. Ladner. 2016. Improving generalization with active learning. Machine Learning

work page 2016

[32] [32]

D. A. Cohn, L. E. Atlas, and R. E. Ladner. 1994. Improving generalization with active learning. Mach. Learn., 15, 2

work page 1994

[33] [33]

S. Coles. 2001. An introduction to statistical modeling of extreme values . Springer Series in Statistics . London. J. ACM, Vol. 1, No. 1, Article . Publication date: March 2024. 32 Punzi, Pellungrini, Setzu, et al

work page 2001

[34] [34]

Cortes, G

C. Cortes, G. DeSalvo, and M. Mohri. 2016. Learning with rejection. In International Conference on Algorithmic Learning Theory. Springer

work page 2016

[35] [35]

C. Dalitz. 2009. Reject options and confidence measures for knn classifiers. Schriftenreihe des Fachbereichs Elek- trotechnik und Informatik Hochschule Niederrhein , 8, 2009, 16–38

work page 2009

[36] [36]

A. P. Dawid. [n. d.] The well-calibrated bayesian. Journal of the American Statistical Association

work page

[37] [37]

A. De, P. Koley, N. Ganguly, and M. Gomez-Rodriguez. 2020. Regression under human assistance. InAAAI Conference on Artificial Intelligence

work page 2020

[38] [38]

A. De, N. Okati, A. Zarezade, and M. Gomez-Rodriguez. 2020. Classification under human assistance. In AAAI Conference on Artificial Intelligence

work page 2020

[39] C. De Stefano, C. Sansone, and M. Vento. 2000. To reject or not to reject: that is the question-an answer in case of neural classifiers. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 30, 1

[40] B. J. Dietvorst, J. P. Simmons, and C. Massey. 2015. Algorithm aversion: people erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology: General, 144, 1

[41] B. J. Dietvorst, J. P. Simmons, and C. Massey. 2018. Overcoming algorithm aversion: people will use imperfect algorithms if they can (even slightly) modify them. Management science, 64, 3

[42] K. Ellis, C. Wong, M. I. Nye, M. Sablé-Meyer, L. Morales, L. B. Hewitt, L. Cary, A. Solar-Lezama, and J. B. Tenenbaum

[43] Dreamcoder: bootstrapping inductive program synthesis with wake-sleep library learning. In PLDI

[44] B. Englich, T. Mussweiler, and F. Strack. 2006. Playing dice with criminal sentences: the influence of irrelevant anchors on experts’ judicial decision making. Personality and Social Psychology Bulletin, 32, 2

[45] A. G. Ferguson. 2020. High-tech surveillance amplifies police bias and overreach. The Conversation, 12

[46] L. Fischer, B. Hammer, and H. Wersing. 2015. Efficient rejection strategies for prototype-based classification. Neurocomputing, 169, (Apr. 2015)

[47] L. Fischer, B. Hammer, and H. Wersing. 2014. Local rejection strategies for learning vector quantization. In ICANN ’14. Springer

[48] V. Franc and D. Prusa. 2019. On discriminative learning of prediction uncertainty. In ICML

[49] R. Gao, M. Saar-Tsechansky, M. De-Arteaga, L. Han, M. K. Lee, and M. Lease. 2021. Human-ai collaboration with bandit feedback. In International Joint Conference on Artificial Intelligence

[50] Y. Geifman and R. El-Yaniv. 2017. Selective classification for deep neural networks. In Advances in Neural Information Processing Systems. Vol. 30

[51] Y. Geifman and R. El-Yaniv. 2019. Selectivenet: a deep neural network with an integrated reject option. In International Conference on Machine Learning

[52] T. Gillespie. 2018. Custodians of the Internet: Platforms, content moderation, and the hidden decisions that shape social media

[53] J. W. Goodell, S. Kumar, W. M. Lim, and D. Pattnaik. 2021. Artificial intelligence and machine learning in finance: identifying foundations, themes, and research clusters from bibliometric analysis. JBEF, 32

[54] I. Goodfellow, Y. Bengio, and A. Courville. 2016. Deep Learning. http://www.deeplearningbook.org

[55] Y. Grandvalet, A. Rakotomamonjy, J. Keshet, and S. Canu. 2008. Support vector machines with a reject option. In Advances in Neural Information Processing Systems. Vol. 21

[56] N. Grgic-Hlaca, E. M. Redmiles, K. P. Gummadi, and A. Weller. 2018. Human perceptions of fairness in algorithmic decision making: A case study of criminal risk prediction. In Proc. of the 2018 WWW

[57] N. Grgić-Hlača, G. Lima, A. Weller, and E. M. Redmiles. 2022. Dimensions of diversity in human perceptions of algorithmic fairness. In Equity and Access in Algorithms, Mechanisms, and Optimization

[58] R. Guidotti, A. Monreale, F. Giannotti, D. Pedreschi, S. Ruggieri, and F. Turini. 2019. Factual and counterfactual explanations for black box decision making. IEEE Intell. Syst., 34, 6

[59] R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Giannotti, and D. Pedreschi. 2019. A survey of methods for explaining black box models. ACM Comput. Surv., 51, 5

[60] L. Guo, E. M. Daly, O. Alkan, M. Mattetti, O. Cornec, and B. P. Knijnenburg. 2022. Building trust in interactive machine learning via user contributed interpretable rules. In IUI 2022

[61] L. Györfi, Z. Györfi, and I. Vajda. 1979. Bayesian decision with rejection. Problems of Control and Information Theory

[62] L. K. Hansen, C. Liisberg, and P. Salamon. 1997. The Error-Reject Tradeoff. Open Systems & Information Dynamics, 4, 2, (Apr. 1997)

[63] P. Hemmer, S. Schellhammer, M. Vössing, J. Jakubik, and G. Satzger. 2022. Forming effective human-AI teams: building machine learning models that complement the capabilities of multiple experts. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence. (July 2022)

[64] P. Hemmer, L. Thede, M. Vössing, J. Jakubik, and N. Kühl. 2023. Learning to defer with limited expert predictions. Proceedings of the AAAI Conference on Artificial Intelligence, 37, 5, (June 2023).

[65] P. Hemmer, M. Westphal, M. Schemmer, S. Vetter, M. Vössing, and G. Satzger. 2023. Human-ai collaboration: the effect of ai delegation on human task performance and task satisfaction. In Proceedings of the 28th International Conference on Intelligent User Interfaces

[66] K. Hendrickx, L. Perini, D. Van der Plas, W. Meert, and J. Davis. 2024. Machine learning with a reject option: a survey. Machine Learning, 113, 5

[67] V. J. Hodge and J. Austin. 2014. A Survey of Outlier Detection Methodologies. Artificial Intelligence Review

[68] J. D. Hwang, C. Bhagavatula, R. L. Bras, J. Da, K. Sakaguchi, A. Bosselut, and Y. Choi. 2021. (comet-) atomic 2020: on symbolic and neural commonsense knowledge graphs. In AAAI

[69] J. Johnson, B. Hariharan, L. van der Maaten, L. Fei-Fei, C. L. Zitnick, and R. B. Girshick. 2017. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. In CVPR

[70] E. Jones, S. Sagawa, P. W. Koh, A. Kumar, and P. Liang. 2021. Selective classification can magnify disparities across groups. In ICLR. OpenReview.net

[71] E. Jones, S. Sagawa, P. W. Koh, A. Kumar, and P. Liang. 2021. Selective classification can magnify disparities across groups. In ICLR 2021

[72] P. R. M. Júnior, R. M. de Souza, R. de Oliveira Werneck, B. V. Stein, D. V. Pazinato, W. R. de Almeida, O. A. B. Penatti, R. da Silva Torres, and A. Rocha. 2017. Nearest neighbors distance ratio open-set classifier. Machine Learning, 106

[73] N. Kassner, O. Tafjord, H. Schütze, and P. Clark. 2021. Beliefbank: adding memory to a pre-trained language model for a systematic notion of belief. In EMNLP 21

[74] S. Kaufman, S. Rosset, C. Perlich, and O. Stitelman. 2012. Leakage in data mining: formulation, detection, and avoidance. ACM Trans. Knowl. Discov. Data, 6, 4

[75] H. Kempt, J.-C. Heilinger, and S. K. Nagel. 2022. “I’m afraid I can’t let you do that, Doctor”: meaningful disagreements with AI in medical contexts. AI & society

[76] G. Kerrigan, P. Smyth, and M. Steyvers. 2021. Combining human predictions with model probabilities via confusion matrices and calibration. In Advances in Neural Information Processing Systems. Vol. 34

[77] V. Keswani, M. Lease, and K. Kenthapadi. 2021. Towards unbiased and accurate deferral to multiple experts. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society

[78] P. W. Koh and P. Liang. 2017. Understanding black-box predictions via influence functions. In International conference on machine learning. PMLR, 1885–1894

[79] R. Koulu. 2020. Proceduralizing control and discretion: human oversight in artificial intelligence policy. Maastricht Journal of European and Comparative Law, 27, 6

[80] I. Lage and F. Doshi-Velez. 2020. Learning interpretable concept-based models with human feedback. Presented at the International Conference on Machine Learning: Workshop on Human Interpretability in Machine Learning, 1, 1–11