Optimized Deferral for Imbalanced Settings
Pith reviewed 2026-05-07 07:38 UTC · model grok-4.3
The pith
Casting deferral optimization as cost-sensitive learning over input-expert pairs yields margin-based algorithms that handle expert imbalance better than existing deferral baselines.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We cast the deferral loss optimization as a novel cost-sensitive learning problem over the input-expert domain. We derive new margin-based loss functions and guarantees tailored to this setting, and develop novel algorithms for cost-sensitive learning. Leveraging these results, we design principled deferral algorithms, MILD (Margin-based Imbalanced Learning to Defer), specifically suited for expert imbalance settings.
What carries the argument
The cost-sensitive learning formulation over the input-expert domain, which turns deferral into a weighted classification problem whose weights encode the expert imbalance, enabling margin-based losses with provable guarantees.
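As a hedged illustration of this reduction, the sketch below frames routing as cost-sensitive classification over the options {keep the prediction, defer to expert 1, ..., defer to expert n}. The cost construction, the effective-number reweighting, and every name in it are our assumptions for exposition; this is not the paper's MILD loss.

```python
import numpy as np

# Minimal sketch of deferral as weighted (cost-sensitive) classification.
# All modeling choices here are illustrative assumptions, not the paper's
# actual MILD losses or weighting scheme.

def deferral_costs(pred_correct, expert_correct, expert_fees):
    """Per-example cost of each routing option.

    pred_correct:   (m,) bool   base predictor correct on each example
    expert_correct: (m, n) bool each expert correct on each example
    expert_fees:    (n,) float  extra cost of consulting each expert
    Returns an (m, n + 1) cost matrix; column 0 is "keep the prediction".
    """
    m, n = expert_correct.shape
    costs = np.empty((m, n + 1))
    costs[:, 0] = 1.0 - pred_correct
    costs[:, 1:] = (1.0 - expert_correct) + np.asarray(expert_fees)[None, :]
    return costs

def option_weights(option_counts, beta=0.999):
    """Upweight rarely chosen options via effective numbers -- one plausible
    way to encode expert imbalance in the weights (our choice)."""
    counts = np.maximum(np.asarray(option_counts, float), 1.0)
    eff = (1.0 - beta ** counts) / (1.0 - beta)
    w = 1.0 / eff
    return w * len(w) / w.sum()   # normalize to mean 1

def weighted_risk(scores, costs, weights):
    """Expected weighted cost of a softmax routing policy.

    scores: (m, n + 1) routing scores; higher means more likely to pick.
    """
    z = scores - scores.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return float(np.mean((p * costs * weights[None, :]).sum(axis=1)))
```

Minimizing weighted_risk over the scoring function is then ordinary weighted classification; margin-based surrogates of the kind the paper derives would slot in where the softmax appears.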
If this is right
- MILD outperforms existing deferral baselines on image classification tasks that exhibit expert imbalance.
- MILD improves routing accuracy and efficiency when directing queries to collections of LLMs.
- The new margin-based losses supply generalization bounds specific to the imbalanced deferral setting.
- The cost-sensitive algorithms developed for the input-expert domain can be reused for other routing or selection tasks with skewed expert usage.
Where Pith is reading between the lines
- The same input-expert cost-sensitive view could be applied to deferral pipelines that involve more than one deferral stage.
- Similar cost-sensitive reductions might help balance load across heterogeneous models in distributed inference systems.
- Empirical tests that vary the degree of imbalance while holding other factors fixed would clarify how MILD scales with skew severity.
- The approach suggests examining whether cost-sensitive losses can also mitigate imbalance when the experts themselves are being trained rather than fixed.
Load-bearing premise
That treating deferral as cost-sensitive classification in the input-expert product space produces loss functions and algorithms whose theoretical guarantees translate into measurable gains on real imbalanced data.
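Written out under one plausible reading of the two-stage setting (our notation; the paper's definitions may differ), the premise is that minimizing the following loss benefits from cost-sensitive machinery:

```latex
% Sketch in our notation, not necessarily the paper's. Fix a predictor h,
% experts g_1, ..., g_n with consultation costs \beta_e \ge 0, and a routing
% function r : X -> {0, 1, ..., n}. The deferral loss is
\[
  \mathcal{L}(r) \;=\; \mathbb{E}_{(x,y)}\!\left[\, c_{r(x)}(x, y) \,\right],
  \qquad
  c_0(x, y) = \mathbf{1}_{h(x) \neq y},
  \quad
  c_e(x, y) = \mathbf{1}_{g_e(x) \neq y} + \beta_e \quad (e \ge 1).
\]
% This is cost-sensitive multi-class classification over the input-expert
% domain; imbalance enters by reweighting the option costs, e.g.
% c_e \mapsto w_e c_e with larger w_e for rarely used experts.
```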
What would settle it
Train MILD and standard two-stage deferral baselines on a controlled dataset with known expert imbalance, then measure whether MILD produces a statistically significant drop in overall error rate or deferral cost; failure to do so would refute the central claim.
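A minimal analysis sketch for that test, assuming per-seed held-out error rates are already collected (the numbers below are made-up placeholders, not results from the paper):

```python
# Hypothetical significance check across seeds, not the paper's protocol.
import numpy as np
from scipy import stats

def significant_improvement(baseline_err, mild_err, alpha=0.05):
    """Paired one-sided t-test: does MILD lower the held-out error rate?"""
    baseline_err = np.asarray(baseline_err, float)
    mild_err = np.asarray(mild_err, float)
    t, p = stats.ttest_rel(baseline_err, mild_err, alternative="greater")
    return p < alpha, float(np.mean(baseline_err - mild_err)), float(p)

# Placeholder numbers for five seeds; replace with real measurements.
ok, gap, p = significant_improvement(
    baseline_err=[0.212, 0.205, 0.219, 0.208, 0.214],
    mild_err=[0.186, 0.190, 0.181, 0.188, 0.184],
)
print(f"significant={ok}, mean error gap={gap:.3f}, p={p:.4f}")
```

The same harness can be swept over imbalance ratios to probe how the gap scales with skew severity, as suggested above.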
Original abstract
Learning algorithms can be significantly improved by routing complex or uncertain inputs to specialized experts, balancing accuracy with computational cost. This approach, known as learning to defer, is essential in domains like natural language generation, medical diagnosis, and computer vision, where an effective deferral can reduce errors at low extra resource consumption. However, the two-stage learning to defer setting, which leverages existing predictors such as a collection of LLMs or other classifiers, often faces challenges due to an expert imbalance problem. This imbalance can lead to suboptimal performance, with deferral algorithms favoring the majority expert. We present a comprehensive study of two-stage learning to defer in expert imbalance settings. We cast the deferral loss optimization as a novel cost-sensitive learning problem over the input-expert domain. We derive new margin-based loss functions and guarantees tailored to this setting, and develop novel algorithms for cost-sensitive learning. Leveraging these results, we design principled deferral algorithms, MILD (Margin-based Imbalanced Learning to Defer), specifically suited for expert imbalance settings. Extensive experiments demonstrate the effectiveness of our approach, showing clear improvements over existing baselines on both image classification and real-world Large Language Model (LLM) routing tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript examines two-stage learning to defer under expert imbalance, where policies tend to favor majority experts. It reformulates deferral optimization as cost-sensitive learning over the joint input-expert domain, derives new margin-based surrogate losses together with generalization guarantees, develops supporting algorithms for cost-sensitive learning, and introduces the MILD algorithm. Experiments on image classification and LLM routing tasks are reported to show improvements over existing baselines.
Significance. If the derived margin-based losses and associated guarantees are valid, the work supplies a principled, cost-sensitive treatment of expert imbalance that is directly relevant to practical deferral settings such as LLM routing. The modeling choice of operating in the input-expert space is coherent with existing cost-sensitive techniques, and the empirical evaluation on both vision and language tasks provides concrete evidence of utility. The derivation of tailored losses and the focus on imbalance constitute the primary contributions.
Minor comments (4)
- [§3.1] §3.1, Definition 1: the cost matrix C(x, e) is introduced without an explicit statement of how the imbalance ratios are encoded; a short paragraph clarifying the mapping from observed expert frequencies to the cost entries would improve readability (one possible mapping is sketched after this list).
- [§4.2] §4.2, Theorem 2: the generalization bound is stated in terms of the Rademacher complexity of the joint hypothesis class; it would be helpful to include a brief comparison (one sentence) to the corresponding bound for standard cost-sensitive classification to highlight the novelty of the input-expert formulation.
- [Figure 3] Figure 3 and Table 2: axis labels and legend entries use inconsistent abbreviations (e.g., “MILD” vs. “Mild”); uniform notation across all figures and tables is needed.
- [§5.3] §5.3: the LLM routing experiments report accuracy and deferral rate but do not include a statistical significance test across the five random seeds; adding p-values or confidence intervals would strengthen the empirical claims.
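On the first comment, a hypothetical example of the kind of mapping being requested, from observed expert deferral frequencies to cost entries (inverse-frequency scaling is our assumption, not the paper's Definition 1):

```python
import numpy as np

# Hypothetical encoding of imbalance ratios into cost entries: scale each
# expert's misrouting cost by inverse observed frequency, so errors on
# rarely used experts cost more. The paper's Definition 1 may differ.

def cost_entries(base_cost, expert_counts):
    freqs = np.asarray(expert_counts, float)
    freqs = freqs / freqs.sum()
    scale = 1.0 / freqs
    scale = scale / scale.mean()   # normalize so the average scale is 1
    return base_cost * scale       # one cost entry per expert

print(cost_entries(1.0, expert_counts=[900, 80, 20]))
# -> small cost for the majority expert, large costs for minority experts
```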
Simulated Author's Rebuttal
We thank the referee for their careful reading of the manuscript and for the positive evaluation, including the accurate summary of our contributions and the recommendation for minor revision. The referee correctly identifies the core technical approach: reformulating two-stage deferral under expert imbalance as cost-sensitive learning over the input-expert domain, deriving margin-based surrogate losses with generalization guarantees, and introducing the MILD algorithm. We will incorporate any minor suggestions in the revised version.
Circularity Check
No significant circularity; derivation self-contained via standard cost-sensitive modeling
Full rationale
The paper casts deferral loss optimization as a cost-sensitive learning problem over the input-expert domain, then derives new margin-based losses, guarantees, and the MILD algorithm from that formulation. This modeling step is a coherent extension of existing imbalance-handling techniques rather than a self-definitional loop or a fitted parameter renamed as a prediction. The abstract explicitly presents the losses and algorithms as derived results, with no indication that they reduce by construction to the input data or to prior self-citations that bear the central claim. Experiments on image classification and LLM routing tasks supply external validation outside the derivation. No load-bearing equation or uniqueness theorem is shown to collapse to its own inputs.