
arxiv: 2605.19830 · v1 · pith:JOF4YOA5 · submitted 2026-05-19 · 💻 cs.LG · math.ST · stat.TH

Set-Valued Policy Learning

Pith reviewed 2026-05-20 06:57 UTC · model grok-4.3

classification 💻 cs.LG · math.ST · stat.TH
keywords set-valued policies · treatment policies · conformal prediction · learning to defer · multiple treatments · uncertainty quantification · causal inference · policy learning

The pith

Set-valued policies output sets of plausible treatments rather than single recommendations to reflect decision uncertainty in multiple-treatment settings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes shifting from conventional point-valued treatment policies to set-valued ones that output collections of plausible interventions for each patient. This change allows the size of the output set to serve as a built-in measure of how ambiguous the optimal choice appears under the estimated model. The authors extend learning-to-defer methods to multiple treatments using a greatest lower bound construction and add conformal policy learning that relies on randomness injection to achieve marginal coverage guarantees. These guarantees hold without placing assumptions on the form of the unknown optimal treatment rule. The approach is illustrated on synthetic data and a real IVF application, where the resulting policies balance clinical performance with explicit reliability information.

Core claim

We propose a set-valued policy learning paradigm for the multiple-treatment setting, in which policies output a set of plausible treatments rather than a single recommendation. This formulation enables intrinsic uncertainty quantification, with the size of the predicted set reflecting the degree of decision ambiguity. We extend the learning-to-defer framework to multiple treatments via a novel greatest Lower Bound method, and introduce conformal policy learning, which bridges the gap between unobserved ground-truth optimal treatments and estimated optimal treatment rules. Drawing on insights from the noisy-label literature, we develop a randomness-injection approach that guarantees marginal coverage without requiring assumptions on underlying black-box optimal treatment rules.

What carries the argument

The set-valued policy that maps covariates to a collection of treatments whose size signals decision ambiguity, together with the randomness-injection technique that produces marginal coverage without assumptions on the black-box optimal rule.
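Below is a minimal sketch of generic split-conformal calibration for a set-valued policy. It illustrates the standard technique and is not the authors' implementation. It assumes a hypothetical score matrix whose entry (i, k) measures how plausible it is that treatment k is optimal for unit i. It also assumes calibration labels supplied by an estimated optimal treatment rule. The nonconformity score 1 - score is one common choice and not necessarily the paper's.

```python
import numpy as np


def conformal_threshold(cal_scores, cal_labels, alpha):
    """Split-conformal threshold computed on a held-out calibration fold.

    cal_scores: (n, K) array; cal_scores[i, k] is a hypothetical plausibility
        score that treatment k is optimal for calibration unit i.
    cal_labels: (n,) int array of labels from an estimated optimal treatment rule.
    """
    n = len(cal_labels)
    # Nonconformity of each calibration unit: 1 minus the score of its label.
    nonconf = 1.0 - cal_scores[np.arange(n), cal_labels]
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # calibration set too small: every treatment is kept
    return np.sort(nonconf)[k - 1]


def set_valued_policy(scores, q_hat):
    """For each row, return the set of treatments with nonconformity <= q_hat."""
    return [set(np.flatnonzero(1.0 - row <= q_hat).tolist()) for row in scores]


# Toy usage with K = 3 treatments and a (very small) calibration fold.
cal_scores = np.array([[0.80, 0.10, 0.10],
                       [0.50, 0.30, 0.20],
                       [0.20, 0.70, 0.10],
                       [0.10, 0.20, 0.70]])
cal_labels = np.array([0, 0, 1, 2])
new_scores = np.array([[0.90, 0.05, 0.05],    # one clearly preferred treatment
                       [0.75, 0.72, 0.10]])   # two near-tied treatments
q_hat = conformal_threshold(cal_scores, cal_labels, alpha=0.5)
print(set_valued_policy(new_scores, q_hat))   # [{0}, {0, 1}]
```

Suppose calibration and test units are exchangeable. Then this standard argument guarantees that the sets contain the *calibration-label source* (here, the estimated rule's output) at rate at least 1 - alpha. Extending that guarantee to the unobserved true optimal treatment is the gap the paper's randomness-injection step is meant to close.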

If this is right

  • When estimation uncertainty is high the policy naturally returns larger sets, giving clinicians explicit latitude to choose among options.
  • The methods produce policies that remain actionable while automatically trading off performance against reliability in settings such as IVF.
  • Conformal policy learning supplies coverage guarantees that survive model misspecification or finite-sample effects that normally plague point-valued rules.
  • The framework extends directly to any multi-action causal decision problem where only noisy estimates of optimality are available.
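The sketch below makes the first point concrete, using hypothetical scores and a hypothetical threshold. A single calibrated threshold yields a singleton set for a patient with one clearly dominant treatment, and a full set when the estimates are nearly tied. `mean_cardinality` assumes that "cardinality" in the figures means average set size.

```python
import numpy as np


def prediction_sets(scores, q_hat):
    # Keep every treatment whose nonconformity (1 - score) is within the threshold.
    return [np.flatnonzero(1.0 - row <= q_hat) for row in scores]


def mean_cardinality(sets):
    # Average number of treatments per set.
    return float(np.mean([len(s) for s in sets]))


scores = np.array([
    [0.90, 0.05, 0.05],  # low ambiguity: one treatment dominates
    [0.34, 0.33, 0.33],  # high ambiguity: estimates nearly tied
])
q_hat = 0.7  # hypothetical threshold, e.g. from a calibration step
sets = prediction_sets(scores, q_hat)
print([s.tolist() for s in sets])   # [[0], [0, 1, 2]]
print(mean_cardinality(sets))       # 2.0
```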

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Larger sets could serve as a signal to defer the final choice to a human expert or to collect additional patient data.
  • The same construction might be applied to sequential decision problems by treating each time step as a multi-treatment choice.
  • Empirical tests could check whether set sizes correlate with actual clinical disagreement among physicians on the same cases.
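The first extension above can be sketched as a tiny wrapper: act automatically only when the set is a singleton, and otherwise defer. `Decision` and `act_or_defer` are hypothetical names. The singleton rule is just one possible deferral policy.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Decision:
    treatment: Optional[int]   # treatment applied automatically, or None if deferred
    options: FrozenSet[int]    # plausible treatments shown to the clinician


def act_or_defer(treatment_set: Iterable[int]) -> Decision:
    """Act when the set is a singleton; otherwise defer with the set as options."""
    options = frozenset(treatment_set)
    if len(options) == 1:
        (only,) = options
        return Decision(treatment=only, options=options)
    # Empty or multi-treatment sets signal ambiguity: hand the choice to an expert.
    return Decision(treatment=None, options=options)


print(act_or_defer({2}))      # Decision(treatment=2, options=frozenset({2}))
print(act_or_defer({0, 1}))   # Decision(treatment=None, options=frozenset({0, 1}))
```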

Load-bearing premise

The randomness-injection approach guarantees marginal coverage without requiring assumptions on underlying black-box optimal treatment rules.
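This page does not spell out the exact injection mechanism. One plausible reading, given the noisy-label references and the "randomness level r" in the figures, is this: replace each estimated label with a uniformly drawn treatment with probability r. The sketch below implements only that assumption, and `inject_uniform_noise` is a hypothetical helper.

```python
import numpy as np


def inject_uniform_noise(labels, n_treatments, r, rng):
    """Hypothetical randomness injection: with probability r, replace each
    estimated optimal-treatment label by a treatment drawn uniformly from
    {0, ..., n_treatments - 1}; otherwise keep it."""
    labels = np.asarray(labels)
    replace = rng.random(labels.shape[0]) < r
    uniform_draws = rng.integers(0, n_treatments, size=labels.shape[0])
    return np.where(replace, uniform_draws, labels)


rng = np.random.default_rng(0)
estimated = np.array([0, 2, 1, 1, 0])   # labels from a hypothetical rule estimator
print(inject_uniform_noise(estimated, n_treatments=3, r=0.3, rng=rng))
```

A uniform draw can coincide with the original label. The expected fraction of labels that actually change is therefore r(1 - 1/K), not r.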

What would settle it

A controlled simulation in which the randomness-injection procedure is applied to data with fully known optimal treatment assignments and the resulting sets fail to contain the true optimal treatment at the promised marginal rate would falsify the coverage claim.
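Such a check is cheap to run on toy data. The sketch below assumes a simple generative model with a known optimal treatment and informative scores. It calibrates either on the true labels or on labels corrupted by uniform noise; the noisy labels are a stand-in for imperfect estimated rules, not the paper's estimator. It then measures how often the resulting sets contain the true optimal treatment.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def simulate(n, n_treatments, signal, rng):
    """Toy data: a known optimal treatment and informative per-treatment scores."""
    a_star = rng.integers(0, n_treatments, size=n)
    logits = rng.normal(size=(n, n_treatments))
    logits[np.arange(n), a_star] += signal   # the optimal arm tends to score higher
    return softmax(logits), a_star


def conformal_sets(cal_scores, cal_labels, test_scores, alpha):
    """Split-conformal membership matrix (True = treatment is in the set)."""
    n = len(cal_labels)
    nonconf = 1.0 - cal_scores[np.arange(n), cal_labels]
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q_hat = np.inf if k > n else np.sort(nonconf)[k - 1]
    return (1.0 - test_scores) <= q_hat


def coverage(membership, true_labels):
    """Fraction of test units whose true optimal treatment lies in the set."""
    return float(membership[np.arange(len(true_labels)), true_labels].mean())


rng = np.random.default_rng(42)
alpha, K, eps = 0.1, 3, 0.3
cal_scores, cal_star = simulate(20_000, K, 1.5, rng)
test_scores, test_star = simulate(20_000, K, 1.5, rng)

# Stand-in for estimated labels: with probability eps, a uniformly random treatment.
flip = rng.random(len(cal_star)) < eps
cal_noisy = np.where(flip, rng.integers(0, K, size=len(cal_star)), cal_star)

cov_clean = coverage(conformal_sets(cal_scores, cal_star, test_scores, alpha), test_star)
cov_noisy = coverage(conformal_sets(cal_scores, cal_noisy, test_scores, alpha), test_star)
print(f"target >= {1 - alpha:.2f}; clean labels: {cov_clean:.3f}; noisy labels: {cov_noisy:.3f}")
```

Under uniform label noise and an informative score, calibrating on noisy labels tends to be conservative, producing larger sets; this is in line with the label-noise conformal results in [12]. Under a different noise structure the check could fail, and that is exactly the failure the proposed test is designed to expose.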

Figures

Figures reproduced from arXiv: 2605.19830 by Antoine Chambaz, Gaëlle Dormion, Julie Josse, Laura Fuentes-Vicente, Mathieu Even, Uri Shalit.

Figure 1
(a) Set-policy values for Y (x-axis) and ξ (y-axis) across two decision strategies, δ_lower (points) and δ_unif (triangles), for α = 0.1. (b) Rows: individual observations; columns: treatment levels included in the conformal set-valued policies, for α = 0.1.
Figure 2
Distribution of optimal treatment assignments by feature values.
Figure 3
Mean cardinality for varying levels α. Results compare conformal set-valued policy learning across different randomness levels r, GLB (green), and Oracular conformal prediction (blue). Columns indicate the training sample size. Remarks from …
Figure 4
Marginal coverage gap $t \mapsto \mathbb{E}[\Delta(t)]$ for varying levels α. Results compare conformal set-valued policy learning across different randomness levels (r), GLB (green), and Oracular conformal prediction (blue). Columns indicate the training sample size (panels: 6000, 12000, 18000).
Figure 5
Uniform set-policy value for varying levels …
Figure 6
Set-policy value for follicular yield (Y) across varying levels α, for two decision strategies: δ_unif (bottom) and δ_lower (top). Results are shown for conformal set-valued policy learning across different randomness levels (r) and GLB (green). The gray dashed line represents the policy value (for Y) achieved by the noisy label generation technique (MACF) alone. No error bars for the choice strategy …
Figure 7
Set-policy value for estradiol (ξ) across varying levels α, for two decision strategies: δ_unif (bottom) and δ_lower (top). Results are shown for conformal set-valued policy learning across different randomness levels (r) and GLB (green). The gray dashed line represents the policy value (for ξ) achieved by the noisy label generation technique (MACF). No error bars for the choice strategy δ_lower since …
Figure 8
Mean cardinality for varying levels α. Results are shown for conformal set-valued policy learning across different randomness levels r and GLB (green).
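Several captions report a uniform set-policy value. We read δ_unif as a clinician picking one treatment uniformly at random from each set, which is an assumption on our part. Under that reading, a plug-in estimate averages the estimated outcomes within each set and then across patients. `mu_hat` stands in for a hypothetical outcome model, and δ_lower is not reproduced because its definition is not given here.

```python
import numpy as np


def uniform_set_policy_value(mu_hat, sets):
    """Plug-in value when one treatment is drawn uniformly from each set.

    mu_hat: (n, K) array of estimated mean outcomes mu_hat[i, a] (hypothetical model).
    sets:   length-n list of non-empty lists of treatment indices.
    """
    per_patient = [mu_hat[i, list(s)].mean() for i, s in enumerate(sets)]
    return float(np.mean(per_patient))


mu_hat = np.array([[1.0, 3.0, 5.0],
                   [2.0, 2.0, 8.0]])
print(uniform_set_policy_value(mu_hat, [[0, 1], [2]]))  # 5.0
```

With singleton sets this reduces to the ordinary plug-in value of a point-valued policy. That makes it easy to compare set-valued and point-valued rules on the same scale.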
Original abstract

Conventional treatment policies map patient covariates to a single recommended intervention in order to maximize expected clinical outcomes. Although a rich body of causal inference methods has been developed to estimate such policies, point-valued recommendations can be highly sensitive to estimation uncertainty, model specification, and finite-sample variability, while typically providing little guidance about how confident one should be in the recommended action. In this work, we propose a set-valued policy learning paradigm for the multiple-treatment setting, in which policies output a set of plausible treatments rather than a single recommendation. This formulation enables intrinsic uncertainty quantification, with the size of the predicted set reflecting the degree of decision ambiguity. We extend the learning-to-defer framework to multiple treatments via a novel greatest Lower Bound method, and introduce conformal policy learning, which bridges the gap between unobserved ground-truth optimal treatments and estimated optimal treatment rules. Drawing on insights from the noisy-label literature, we develop a randomness-injection approach that guarantees marginal coverage without requiring assumptions on underlying black-box optimal treatment rules. Through experiments on synthetic data and a real-world application to In-Vitro Fertilization (IVF), we demonstrate that our methods produce robust and actionable policies that naturally incorporate clinical considerations while effectively balancing performance and reliability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes a set-valued policy learning paradigm for the multiple-treatment setting, in which policies output sets of plausible treatments to quantify decision uncertainty rather than single recommendations. It extends the learning-to-defer framework via a novel greatest lower bound method and introduces conformal policy learning that employs a randomness-injection technique to deliver marginal coverage guarantees without strong assumptions on black-box optimal treatment rules. The approach is evaluated on synthetic data and a real-world IVF application.

Significance. If the coverage guarantees hold under the stated conditions, the work offers a useful advance in robust policy learning for causal inference and clinical applications by naturally incorporating uncertainty through set size. The randomness-injection method, drawing from noisy-label ideas, provides a creative bridge between estimated rules and unobserved ground truth, and the IVF experiment illustrates practical utility in balancing performance with reliability.

major comments (1)
  1. [Conformal policy learning and randomness-injection] Conformal policy learning section: the randomness-injection approach is presented as guaranteeing marginal coverage without assumptions on the black-box optimal treatment rules. However, conformity scores are constructed from the estimated rules; the manuscript must explicitly address whether rule estimation and the subsequent injection/calibration step use disjoint data (or otherwise preserve exchangeability), as overlap would undermine the unconditional coverage claim that is load-bearing for the central contribution.
minor comments (1)
  1. [Abstract] Abstract and method descriptions: the phrase 'greatest Lower Bound' appears with inconsistent capitalization; align with the formal definition and notation used in the main text for clarity.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and for identifying this important clarification needed for the conformal policy learning procedure. We address the concern below and have revised the manuscript to make the data-splitting and exchangeability arguments fully explicit.

Point-by-point responses
  1. Referee: [Conformal policy learning and randomness-injection] Conformal policy learning section: the randomness-injection approach is presented as guaranteeing marginal coverage without assumptions on the black-box optimal treatment rules. However, conformity scores are constructed from the estimated rules; the manuscript must explicitly address whether rule estimation and the subsequent injection/calibration step use disjoint data (or otherwise preserve exchangeability), as overlap would undermine the unconditional coverage claim that is load-bearing for the central contribution.

    Authors: We agree that explicit treatment of exchangeability is essential for the unconditional marginal coverage claim. In the revised manuscript we now state that the procedure employs sample splitting: the black-box optimal treatment rule is estimated on a dedicated training fold, while the randomness-injection step and the subsequent calibration of conformity scores are performed on a completely disjoint calibration fold. Because the calibration observations are exchangeable with future test points and independent of the rule estimator, the standard conformal argument applies directly and yields the stated marginal coverage guarantee without further assumptions on the underlying rule. We have added a new paragraph in Section 4.2 together with a diagram (Figure 3) that illustrates the three-way split (training / calibration / test) and the corresponding exchangeability statement. revision: yes
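The three-way split described in the response can be sketched as a random index partition. The fractions below are hypothetical. In this sketch, any estimator of the optimal treatment rule is fit on `train` only. Randomness injection and calibration use `cal`, and evaluation uses `test`, which keeps calibration and test units exchangeable and independent of the fitted rule.

```python
import numpy as np


def three_way_split(n, frac_train, frac_cal, rng):
    """Randomly partition indices 0..n-1 into disjoint train / calibration / test folds."""
    perm = rng.permutation(n)
    n_train = int(frac_train * n)
    n_cal = int(frac_cal * n)
    train = perm[:n_train]                  # fit the optimal-treatment-rule estimator here
    cal = perm[n_train:n_train + n_cal]     # randomness injection + conformal calibration
    test = perm[n_train + n_cal:]           # evaluate coverage and policy value
    return train, cal, test


rng = np.random.default_rng(0)
train, cal, test = three_way_split(1000, frac_train=0.5, frac_cal=0.25, rng=rng)
print(len(train), len(cal), len(test))  # 500 250 250
```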

Circularity Check

0 steps flagged

No significant circularity in derivation chain.

Full rationale

The paper's central contribution is a randomness-injection method for marginal coverage in set-valued policies, presented as drawing from noisy-label literature and requiring no assumptions on black-box optimal rules. No equations or steps in the provided abstract reduce a claimed prediction or guarantee to a fitted input or self-citation by construction. The derivation treats the black-box as fixed and invokes external insights, keeping the core claim independent rather than self-referential. This is the common honest non-finding for papers whose guarantees rest on stated external assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review prevents full enumeration; the central claims appear to rest on standard causal assumptions plus the novel randomness-injection mechanism for coverage.

pith-pipeline@v0.9.0 · 5760 in / 1017 out tokens · 33701 ms · 2026-05-20T06:57:44.930140+00:00 · methodology



Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · 1 internal anchor

  1. [1]

    Ahmed M Alaa, Zaid Ahmad, and Mark van der Laan. Conformal meta-learners for predictive inference of individual treatment effects. Advances in Neural Information Processing Systems, 36:47682–47703, 2023

  2. [2]

    Susan Athey, Julie Tibshirani, and Stefan Wager. Generalized random forests. Ann. Statist., 47(2):1148–1178, 2019. ISSN 0090-5364, 2168-8966. doi: 10.1214/18-AOS1709. URL https://doi.org/10.1214/18-AOS1709

  3. [3]

    Varun Babbar, Umang Bhatt, and Adrian Weller. On the utility of prediction sets in human-AI teams, 2022. URL https://arxiv.org/abs/2205.01411

  4. [4]

    Eli Ben-Michael, D James Greiner, Melody Huang, Kosuke Imai, Zhichao Jiang, and Sooahn Shin. Does AI help humans make better decisions? A statistical evaluation framework for experimental and observational studies. Proceedings of the National Academy of Sciences, 122(38):e2505106122, 2025

  5. [5]

    Maxime Cauchois, Suyash Gupta, Alnur Ali, and John Duchi. Predictive inference with weak supervision. J. Mach. Learn. Res., 25: Paper No. [118], 45, 2024. ISSN 1532-4435, 1533-7928

  6. [6]

    Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794, New York, NY, USA, 2016. Association for Computing Machinery. doi: 10.1145/2939672.2939785

  7. [7]

    Chen Cheng, Hilal Asi, and John Duchi. How many labelers do you have? A closer look at gold-standard labels. arXiv preprint arXiv:2206.12041, 2022

  8. [8]

    Victor Chernozhukov, Kaspar Wüthrich, and Yinchu Zhu. Toward personalized inference on individual treatment effects. Proceedings of the National Academy of Sciences, 120(7):e2300458120, 2023

  9. [9]

    Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvári, and Dale Schuurmans. CoinDICE: Off-policy confidence interval estimation. Advances in Neural Information Processing Systems, 33:9398–9411, 2020

  10. [10]

    Giovanni De Toni, Nastaran Okati, Suhas Thejaswi, Eleni Straitouri, and Manuel Rodriguez. Towards human-AI complementarity with prediction sets. Advances in Neural Information Processing Systems, 37:31380–31409, 2024

  11. [11]

    Miroslav Dudík, John Langford, and Lihong Li. Doubly robust policy evaluation and learning. arXiv preprint arXiv:1103.4601, 2011

  12. [12]

    Bat-Sheva Einbinder, Shai Feldman, Stephen Bates, Anastasios N. Angelopoulos, Asaf Gendler, and Yaniv Romano. Label noise robustness of conformal prediction. J. Mach. Learn. Res., 25: Paper No. [328], 66, 2024. ISSN 1532-4435, 1533-7928

  13. [13]

    Laura Fuentes-Vicente, Mathieu Even, Gaëlle Dormion, Julie Josse, and Antoine Chambaz. Policy learning under constraint: Maximizing a primary outcome while controlling an adverse event. arXiv preprint arXiv:2601.22717, 2026

  14. [14]

    Marah Ghoummaid and Uri Shalit. When to act and when to ask: policy learning with deferral under hidden confounding. Advances in Neural Information Processing Systems, 37:56108–56135, 2024

  15. [15]

    Jessica Hullman, Yifan Wu, Dawei Xie, Ziyang Guo, and Andrew Gelman. Conformal prediction and human decision making. arXiv preprint arXiv:2503.11709, 2025

  16. [16]

    Kosuke Imai, Zhichao Jiang, D James Greiner, Ryan Halen, and Sooahn Shin. Experimental evaluation of algorithm-assisted human decision-making: Application to pretrial public safety assessment. Journal of the Royal Statistical Society Series A: Statistics in Society, 186(2):167–189, 2023

  17. [17]

    Ying Jin, Zhimei Ren, and Emmanuel J. Candès. Sensitivity analysis of individual treatment effects: a robust conformal inference approach. Proc. Natl. Acad. Sci. USA, 120(6): Paper No. e2214889120, 13, 2023. ISSN 0027-8424, 1091-6490

  18. [18]

    Jef Jonkers, Jarne Verhaeghe, Glenn Van Wallendael, Luc Duchateau, and Sofie Van Hoecke. Conformal convolution and Monte Carlo meta-learners for predictive inference of individual treatment effects, 2025. URL https://arxiv.org/abs/2402.04906

  19. [19]

    Alexandros Karatzoglou, Alexandros Smola, Kurt Hornik, and Achim Zeileis. kernlab - an S4 package for kernel methods in R. Journal of Statistical Software, 11:1–20, 2004

  20. [20]

    Edward H. Kennedy. Towards optimal doubly robust estimation of heterogeneous causal effects. Electron. J. Stat., 17(2):3008–3049, 2023. doi: 10.1214/23-ejs2157

  21. [21]

    Danijel Kivaranovic, Robin Ristl, Martin Posch, and Hannes Leeb. Conformal prediction intervals for the individual treatment effect. arXiv preprint arXiv:2006.01474, 2020

  22. [22]

    Shayan Kiyani, George Pappas, Aaron Roth, and Hamed Hassani. Decision theoretic foundations for conformal prediction: Optimal uncertainty quantification for risk-averse agents. arXiv preprint arXiv:2502.02561, 2025

  23. [23]

    Sören R. Künzel, Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu. Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the National Academy of Sciences of the United States of America, 116:4156–4165, 2017

  24. [24]

    Yi Lai, Atreyi Kankanhalli, and Desmond Ong. Human-AI collaboration in healthcare: A review and research agenda. 2021

  25. [25]

    Lihua Lei and Emmanuel J. Candès. Conformal inference of counterfactuals and individual treatment effects. J. R. Stat. Soc. Ser. B. Stat. Methodol., 83(5):911–938, 2021. ISSN 1369-7412, 1467-9868

  26. [26]

    Jiannan Lu, Peng Ding, and Tirthankar Dasgupta. Treatment effects on ordinal outcomes: Causal estimands and sharp bounds. Journal of Educational and Behavioral Statistics, 43(5):540–567, 2018

  27. [27]

    Alex Luedtke and Incheoul Chung. One-step estimation of differentiable Hilbert-valued parameters. The Annals of Statistics, 52(4):1534–1563, 2024

  28. [28]

    Alexander R. Luedtke and Mark J. van der Laan. Optimal individualized treatments in resource-limited settings. Int. J. Biostat., 12(1):283–303, 2016. doi: 10.1515/ijb-2015-0007

  29. [29]

    Lina M Montoya, Mark J van der Laan, Alexander R Luedtke, Jennifer L Skeem, Jeremy R Coyle, and Maya L Petersen. The optimal dynamic treatment rule superlearner: considerations, performance, and application to criminal justice interventions. The International Journal of Biostatistics, 19(1):217–238, 2023

  30. [30]

    Hussein Mozannar and David Sontag. Consistent estimators for learning to defer to an expert. In Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 7076–7087. PMLR, 13–18 Jul 2020. URL https://proceedings.mlr.press/v119/mozannar20b.html

  31. [31]

    J. A. Nelder and R. W. M. Wedderburn. Generalized linear models. Journal of the Royal Statistical Society. Series A (General), 135(3):370–384, 1972

  32. [32]

    Andreas Nordland and Klaus K Holst. Policy learning with the polle package. arXiv preprint arXiv:2212.02335, 2022

  33. [33]

    Harris Papadopoulos, Kostas Proedrou, Volodya Vovk, and Alex Gammerman. Inductive confidence machines for regression. In Machine Learning: ECML 2002, volume 2430 of Lecture Notes in Comput. Sci., pages 345–356. Springer, Berlin, 2002. ISBN 3-540-44036-4. doi: 10.1007/3-540-36755-1_29. URL https://doi.org/10.1007/3-540-36755-1_29

  34. [34]

    Eric Polley, Erin LeDell, Chris Kennedy, and Mark van der Laan. SuperLearner: Super Learner Prediction, 2024. R package version 2.0-29. URL https://CRAN.R-project.org/package=SuperLearner

  35. [35]

    Min Qian and Susan A. Murphy. Performance guarantees for individualized treatment rules. Ann. Statist., 39(2):1180–1210, 2011. ISSN 0090-5364, 2168-8966. doi: 10.1214/10-AOS864. URL https://doi.org/10.1214/10-AOS864

  36. [36]

    Donald B. Rubin. Causal inference using potential outcomes: design, modeling, decisions. J. Amer. Statist. Assoc., 100(469):322–331, 2005. ISSN 0162-1459, 1537-274X. doi: 10.1198/016214504000001880. URL https://doi.org/10.1198/016214504000001880

  37. [37]

    Craig Saunders, Alexander Gammerman, and Volodya Vovk. Transduction with confidence and credibility. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, IJCAI '99, pages 722–726, San Francisco, CA, USA, 1999. Morgan Kaufmann Publishers Inc. ISBN 1558606130

  38. [38]

    Maresa Schröder, Dennis Frauen, Jonas Schweisthal, Konstantin Heß, Valentyn Melnychuk, and Stefan Feuerriegel. Conformal prediction for causal effects of continuous treatments, 2025. URL https://arxiv.org/abs/2407.03094

  39. [39]

    Matteo Sesia, Y. X. Rachel Wang, and Xin Tong. Adaptive conformal classification with noisy labels. J. R. Stat. Soc. Ser. B. Stat. Methodol., 87(3):796–815, 2025. ISSN 1369-7412, 1467-9868. doi: 10.1093/jrsssb/qkae114. URL https://doi.org/10.1093/jrsssb/qkae114

  40. [40]

    Glenn Shafer and Vladimir Vovk. A tutorial on conformal prediction. J. Mach. Learn. Res., 9:371–421, 2008. ISSN 1532-4435, 1533-7928

  41. [41]

    Eleni Straitouri and Manuel Gomez Rodriguez. Designing decision support systems using counterfactual prediction sets. arXiv preprint arXiv:2306.03928, 2023

  42. [42]

    Eleni Straitouri, Lequn Wang, Nastaran Okati, and Manuel Gomez Rodriguez. Improving expert predictions with conformal prediction, 2023. URL https://arxiv.org/abs/2201.12006

  43. [43]

    Hao Sun, Evan Munro, Georgy Kalashnov, Shuyang Du, and Stefan Wager. Treatment allocation under uncertain costs. arXiv preprint arXiv:2103.11066, 2021

  44. [44]

    Erik Sverdrup, Ayush Kanodia, Zhengyuan Zhou, Susan Athey, and Stefan Wager. policytree: Policy learning via doubly robust empirical welfare maximization over trees. Journal of Open Source Software, 5(50):2232, 2020

  45. [45]

    Muhammad Faaiz Taufiq, Jean-Francois Ton, Rob Cornish, Yee Whye Teh, and Arnaud Doucet. Conformal off-policy prediction in contextual bandits. Advances in Neural Information Processing Systems, 35:31512–31524, 2022

  46. [46]

    Julie Tibshirani, Susan Athey, Rina Friedberg, Vitor Hadad, David Hirshberg, Luke Miner, Erik Sverdrup, Stefan Wager, and Marvin Wright. Package ‘grf’. Comprehensive R Archive Network, 2018

  47. [47]

    Mark J. van der Laan and Sherri Rose. Targeted learning: Causal inference for observational and experimental data. Springer Series in Statistics. Springer, New York, 2011. ISBN 978-1-4419-9781-4. doi: 10.1007/978-1-4419-9782-1. URL https://doi.org/10.1007/978-1-4419-9782-1

  48. [48]

    Mark J. van der Laan and Daniel Rubin. Targeted maximum likelihood learning. Int. J. Biostat., 2: Art. 11, 40, 2006. ISSN 1557-4679. doi: 10.2202/1557-4679.1043. URL https://doi.org/10.2202/1557-4679.1043

  49. [49]

    Mark J. van der Laan, Eric C. Polley, and Alan E. Hubbard. Super learner. Stat. Appl. Genet. Mol. Biol., 6: Art. 25, 23, 2007. doi: 10.2202/1544-6115.1309

  50. [50]

    Jarne Verhaeghe, Jef Jonkers, and Sofie Van Hoecke. Conformal prediction for dose-response models with continuous treatments, 2024. URL https://arxiv.org/abs/2409.20412

  51. [51]

    Vladimir Vovk, Alexander Gammerman, and Glenn Shafer. Algorithmic learning in a random world. Springer, New York, 2005. ISBN 978-0387-00152-4; 0-387-00152-2

  52. [52]

    Stefan Wager and Susan Athey. Estimation and inference of heterogeneous treatment effects using random forests. J. Amer. Statist. Assoc., 113(523):1228–1242, 2018. doi: 10.1080/01621459.2017.1319839. URL https://doi.org/10.1080/01621459.2017.1319839

  53. [53]

    Marvin N. Wright and Andreas Ziegler. ranger: A fast implementation of random forests for high dimensional data in C++ and R. Journal of Statistical Software, 077:1–17, 2015

  54. [54]

    Bin Yu and Rebecca L Barter. Veridical data science: The practice of responsible data analysis and decision making. MIT Press, 2024

  55. [55]

    Yingying Zhang, Chengchun Shi, and Shikai Luo. Conformal off-policy prediction. In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics, volume 206 of Proceedings of Machine Learning Research, pages 2751–2768. PMLR, 25–27 Apr 2023. URL https://proceedings.mlr.press/v206/zhang23c.html

  56. [56]

    Limitations

    Conformal policy learning: the procedure involves estimation at steps 1 and 2 in Section 4.1. (a) Black-box label generation: we generated the noisy labels $\hat{A}^*_i = \mathcal{B}(\mathcal{D}_b)(X_i)$ (i.e. OTR estimators) using the double-robust Q-learning implementation via the polle package [32]. We employed a linear model [31] for the Q-model ($\mu$) while the g-model ($\pi_b$) was specifie…

  57. [57]

    Institutional review board (IRB) approvals or equivalent for research with human subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...