arxiv: 2402.06287 · v3 · submitted 2024-02-09 · 💻 cs.LG · cs.AI · cs.HC

AI, Meet Human: Learning Paradigms for Hybrid Decision Making Systems

Pith reviewed 2026-05-24 04:09 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.HC
keywords hybrid decision making · human-machine interaction · machine learning · taxonomy · survey · human-AI collaboration

The pith

A taxonomy organizes the ways humans and machine learning systems interact in decisions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper surveys scattered techniques from computer science on how people train, use, and collaborate with machine learning models. It groups these techniques into one taxonomy of Hybrid Decision Making Systems that supplies both a conceptual map and technical distinctions. A sympathetic reader would care because clearer categories could help designers and researchers compare approaches that currently sit in separate literatures. The survey treats the taxonomy as a framework for modeling human-machine interaction rather than as a new algorithm or experiment.

Core claim

The authors propose a taxonomy of Hybrid Decision Making Systems that supplies both a conceptual and technical framework for understanding how current computer science literature models interaction between humans and machines.

What carries the argument

The taxonomy of Hybrid Decision Making Systems, which classifies literature on human-ML interaction into coherent categories.

If this is right

  • Researchers gain a shared language for comparing different human-ML collaboration techniques.
  • System designers can select interaction paradigms with explicit awareness of their technical and conceptual differences.
  • Future surveys can build on the same categories rather than starting from scattered classifications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The taxonomy could serve as a checklist when new hybrid systems are proposed, to check which interaction modes are already covered.
  • If the taxonomy holds, it might reveal under-explored combinations of human and machine roles that current papers overlook.

Load-bearing premise

The varied techniques in the computer science literature on human-ML interaction can be organized into one coherent taxonomy without substantial loss of important distinctions or coverage gaps.

What would settle it

A systematic review that identifies multiple high-impact human-ML interaction methods or goals that fall outside every category in the proposed taxonomy.

Figures

Figures reproduced from arXiv: 2402.06287 by Clara Punzi, Dino Pedreschi, Fosca Giannotti, Mattia Setzu, Roberto Pellungrini.

Figure 1. Paradigms of hybrid systems, where human (circle) and machine (rectangle) steps alternate to form …
Figure 2. Overview of the joint learning architecture for the Single-Expert Learning to Defer (L2D-SE) setting, …
Figure 3. An example of a Question Answering machine with a hard reasoning language. The agent maps the …
Figure 4. An example of a Question Answering hybrid system with a soft reasoning language. The …
Figure 5. An example of a Question Answering hybrid system employing a soft reasoning language and leveraging …
Original abstract

Every day we increasingly rely on machine learning models to automate and support high-stakes tasks and decisions. This growing presence means that humans are now constantly interacting with machine learning-based systems, training and using models every day. Several different techniques in the computer science literature account for human interaction with machine learning systems, but their classification is sparse and their goals varied. This survey proposes a taxonomy of Hybrid Decision Making Systems, providing both a conceptual and technical framework for understanding how current computer science literature models interaction between humans and machines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript is a survey of techniques in the computer science literature for human interaction with machine learning systems. It proposes a taxonomy of Hybrid Decision Making Systems intended to supply both a conceptual and a technical framework for organizing how these interactions are modeled.

Significance. If the taxonomy organizes the literature coherently and with adequate coverage, it could serve as a useful reference point for researchers working on human-AI collaboration and hybrid systems. The paper contains no original empirical results, derivations, or machine-checked proofs; its contribution is therefore entirely synthetic.

minor comments (2)
  1. [Abstract] The statement that existing classifications are 'sparse' is not accompanied by any enumeration of prior taxonomies or explicit comparison, leaving the novelty of the proposed framework difficult to gauge from the opening paragraph.
  2. The manuscript would benefit from an explicit statement of the literature search strategy, inclusion criteria, and total number of works reviewed so that readers can assess coverage.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their review and recommendation of minor revision. No specific major comments were provided in the report, so we have no individual points to address here. We remain ready to incorporate any minor changes identified during the revision process.

Circularity Check

0 steps flagged

No significant circularity: survey taxonomy with no derivations

full rationale

This paper is a literature survey that proposes a conceptual and technical taxonomy for organizing existing work on hybrid human-ML decision systems. It contains no mathematical derivations, fitted parameters, predictions, or load-bearing self-citations. The central claim is the usefulness of the proposed framework for classifying prior techniques; no step reduces by construction to its own inputs or to a self-referential chain. The paper is self-contained as an organizational review against external literature.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a survey paper; it introduces no free parameters, axioms, or invented entities beyond the organizational taxonomy itself.


