Strategic Decision Support for AI Agents
Pith reviewed 2026-06-27 09:56 UTC · model grok-4.3
The pith
AI agents optimally decide when to seek support by thresholding a value-of-support score to control missed-support errors while minimizing calls.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
At the population level, the optimal policy is a threshold rule on the value of support. Building on this structure, an online algorithm adaptively thresholds such a score and uses randomized exploration to control missed-support error without distributional assumptions. A calibration-on-the-fly method further reduces unnecessary support calls.
What carries the argument
The optimization problem that minimizes support usage subject to a bound on counterfactual missed-support error, whose solution is the threshold rule on the value of support.
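One way to write this problem down, as a sketch only: the symbols below ($\pi$, $H$, $v$, $\tau$, $\varepsilon$) are notation assumed here for illustration and may differ from the paper's. Let $\pi(x) \in \{0,1\}$ indicate whether the agent calls support on instance $x$, let $H \in \{0,1\}$ indicate that support would have materially improved the output, and let $v(x)$ be the value-of-support score.

```latex
\begin{aligned}
\min_{\pi}\quad & \Pr\bigl(\pi(X) = 1\bigr) \\
\text{s.t.}\quad & \Pr\bigl(\pi(X) = 0,\; H = 1\bigr) \le \varepsilon ,
\end{aligned}
\qquad
\pi_{\tau}(x) = \mathbf{1}\{\, v(x) \ge \tau \,\}
```

Under this reading, the claimed result is that some threshold policy $\pi_{\tau}$ attains the optimum. The exact definition of $v$, and any randomization needed at ties, are left to the paper.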
If this is right
- The population optimum is exactly a threshold on the value of support.
- Randomized exploration in the online algorithm achieves the target error bound without distributional assumptions.
- Calibration-on-the-fly further trims excess support calls while preserving the error guarantee.
- The same threshold structure applies uniformly to information gathering, human-AI collaboration, and tool-use problems.
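The first bullet can be illustrated on logged data. This is a minimal sketch, assuming each logged instance carries a score and a label for whether support would have helped; the function name `pick_threshold` and its inputs are hypothetical and not taken from the paper. It returns the largest threshold whose empirical missed-support rate stays within ε, which is the threshold rule with the fewest support calls on that sample.

```python
import numpy as np

def pick_threshold(scores, would_help, eps):
    """Largest tau such that acting alone when score < tau keeps the
    empirical missed-support rate at or below eps.

    Policy convention (an assumption of this sketch): call support
    when score >= tau, act alone otherwise.
    """
    scores = np.asarray(scores, dtype=float)
    would_help = np.asarray(would_help, dtype=bool)
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one logged instance")

    # Candidates in ascending order: every distinct score, then +inf
    # (never call support). The missed rate is nondecreasing in tau,
    # so the scan can stop at the first violation.
    candidates = np.append(np.unique(scores), np.inf)
    best = -np.inf  # always call support: zero missed-support error
    for tau in candidates:
        missed_rate = np.sum((scores < tau) & would_help) / n
        if missed_rate <= eps:
            best = tau
        else:
            break
    return best

# Toy log: ten instances, five of which would have benefited from support.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
toy_help = [False, False, True, False, False, True, False, True, True, True]
toy_tau = pick_threshold(toy_scores, toy_help, eps=0.15)
```

On the toy log, raising the threshold past 0.6 would leave two helpful instances unsupported (a rate of 0.2), so the scan stops there.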
Where Pith is reading between the lines
- The approach could let agents run longer in deployment before human intervention is needed, provided the value-of-support score stays stable.
- Extending the framework to settings where multiple agents can support one another would require only redefining the support action and its value.
- Real-world logs of agent decisions and outcomes could be used to test whether the learned thresholds remain effective when the environment drifts.
Load-bearing premise
A scalar value of support can be defined and scored so that thresholding it reliably controls the counterfactual missed-support error.
What would settle it
A controlled experiment in one of the modeled scenarios where the adaptive threshold rule plus randomized exploration still lets the missed-support error exceed its target bound.
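Deciding whether the error "exceeds its target bound" needs a rule that separates a real violation from sampling noise. The sketch below is one hedged option, not the paper's protocol: it assumes the per-round missed-support indicators are independent and identically distributed, and applies a one-sided Hoeffding bound. The function name `exceeds_target` and the default `delta` are hypothetical.

```python
import math

def exceeds_target(misses, rounds, eps, delta=0.05):
    """Return True if the observed missed-support rate is above eps by
    more than a one-sided Hoeffding margin at confidence 1 - delta.

    Assumes i.i.d. indicators in {0, 1}; this is an assumption of the
    sketch, and adaptive online data may need a martingale bound instead.
    """
    if rounds <= 0:
        raise ValueError("rounds must be positive")
    rate = misses / rounds
    # Hoeffding: P(rate - true_rate >= t) <= exp(-2 * rounds * t**2).
    margin = math.sqrt(math.log(1.0 / delta) / (2.0 * rounds))
    return rate - margin > eps

large_run = exceeds_target(misses=1500, rounds=10000, eps=0.1)
small_run = exceeds_target(misses=12, rounds=100, eps=0.1)
```

With 10,000 rounds a 15% observed rate is flagged against a 10% target, while 12 misses in 100 rounds is not, because the margin at that sample size is about 0.12.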
Original abstract
Traditionally, decision support studies how humans use machine learning models to make better decisions. In modern agentic systems, this division of roles is increasingly reversed: AI agents act on behalf of users, while humans and tools becomes support mechanisms around them. This role reversal brings reliability concerns to the forefront, since agentic errors can be consequential and agent behavior must remain aligned with human goals and constraints. Departing from the classical view of decision support, we revisit its two basic principles, the cost–value tradeoff of seeking support and the role of uncertainty quantification, in a setting where AI agents are the central actors. We propose a framework for strategic decision support for AI agents through an optimization problem that minimizes support usage subject to controlling a counterfactual missed-support error: the probability that the agent acts alone on instances where support would have materially improved its output. At the population level, we show that the optimal policy is a threshold rule on the value of support. Building on this structure, we develop an online algorithm that adaptively thresholds such a score and uses randomized exploration to control missed-support error without distributional assumptions. We further introduce a calibration-on-the-fly method that reduces unnecessary support calls online. We instantiate this framework across diverse scenarios, including information gathering, human–AI collaboration, and tool use, showing how each can be modeled through the same strategic decision-support lens. Experiments across these settings show that our method reliably controls the target error while substantially reducing support usage in practice.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a framework for strategic decision support in AI agent systems. It formulates an optimization problem minimizing support usage subject to a bound on counterfactual missed-support error (probability that the agent acts without support on instances where support would have improved output). It claims that the optimal policy is a threshold rule on a scalar 'value of support', develops an online algorithm that adaptively thresholds a score via randomized exploration to control the error without distributional assumptions, introduces a calibration-on-the-fly method, and instantiates the framework in information gathering, human-AI collaboration, and tool-use scenarios, with experiments showing reliable error control and reduced support usage.
Significance. If the optimality result and online guarantee hold, the work provides a principled, assumption-light method for managing support calls in agentic systems, addressing reliability concerns in a role-reversed setting. The population-level threshold structure and no-distributional-assumption online control would be notable strengths if rigorously derived, as would the unified modeling across scenarios.
major comments (2)
- [Abstract] Abstract: The central claim that 'the optimal policy is a threshold rule on the value of support' for the constrained optimization min support-usage s.t. P(missed-support) ≤ ε requires an explicit derivation showing that a scalar v(x) exists whose level sets directly bound the counterfactual error probability independently of the policy. The abstract states the result but supplies no conditions, proof sketch, or argument why the improvement from support can be summarized by one dimension in general agentic settings (joint distribution over actions, support outcomes, and responses).
- [Abstract] Abstract: The online algorithm is claimed to 'adaptively threshold such a score and use randomized exploration to control missed-support error without distributional assumptions.' This guarantee is load-bearing for the contribution, yet the abstract provides no argument or sketch showing that the control is independent of score construction rather than reducing to a fitted quantity by construction. The paper must demonstrate why randomized exploration alone suffices across the modeled scenarios.
minor comments (2)
- [Abstract] Abstract: 'humans and tools becomes support mechanisms' contains a subject-verb agreement error; 'becomes' should be 'become'.
- [Abstract] Abstract: The phrase 'calibration-on-the-fly method that reduces unnecessary support calls online' is introduced without indicating how it interacts with the main threshold algorithm or whether it preserves the error guarantee.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. Both major comments concern the abstract's presentation of the core results. The full manuscript contains the derivations (Section 3 for the threshold policy and Section 4 for the online algorithm), but we agree the abstract can be strengthened with brief sketches and conditions. We will revise the abstract accordingly while preserving its length. No standing objections remain after these clarifications.
Point-by-point responses
- Referee: [Abstract] Abstract: The central claim that 'the optimal policy is a threshold rule on the value of support' for the constrained optimization min support-usage s.t. P(missed-support) ≤ ε requires an explicit derivation showing that a scalar v(x) exists whose level sets directly bound the counterfactual error probability independently of the policy. The abstract states the result but supplies no conditions, proof sketch, or argument why the improvement from support can be summarized by one dimension in general agentic settings (joint distribution over actions, support outcomes, and responses).
  Authors: The manuscript derives this result in Section 3. We define the scalar value of support as v(x) = E[output improvement from support | x] minus any per-call cost, which is a one-dimensional summary of the relevant conditional expectation. Because the missed-support indicator is monotone in v(x), the population-level optimization admits a threshold policy on v(x) whose level sets directly control the counterfactual error probability independently of the specific policy form. The joint distribution is handled by taking the expectation over the relevant marginal. We will add a one-sentence sketch and the monotonicity condition to the abstract in revision.
  revision: yes
- Referee: [Abstract] Abstract: The online algorithm is claimed to 'adaptively threshold such a score and use randomized exploration to control missed-support error without distributional assumptions.' This guarantee is load-bearing for the contribution, yet the abstract provides no argument or sketch showing that the control is independent of score construction rather than reducing to a fitted quantity by construction. The paper must demonstrate why randomized exploration alone suffices across the modeled scenarios.
  Authors: Section 4 proves the guarantee via a distribution-free argument: randomized exploration (with probability decaying as 1/t) ensures that the empirical missed-support rate is a martingale whose deviation from the target ε can be bounded by a Hoeffding-type inequality that holds for any fixed score function. The threshold is then adapted online to keep the rate below ε; the proof never relies on the score being correctly specified or on any particular data distribution, only on the ability to observe the missed-support outcome after each decision. This applies uniformly to the information-gathering, human-AI, and tool-use instantiations. We will insert a concise statement of this independence into the abstract.
  revision: yes
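The mechanism described in the second response can be sketched in code, with heavy caveats. This is not the paper's algorithm: it swaps in a generic online threshold update in the style of adaptive conformal inference, uses a fixed exploration probability instead of one decaying as 1/t, and relies on importance-weighted feedback from exploration rounds. All names (`run_gate`, `synthetic_stream`, `lr`, `explore`) and the toy data model are hypothetical.

```python
import random

def synthetic_stream(n, seed=1):
    """Toy data (an assumption): score ~ U(0, 1), and support helps
    with probability equal to the score."""
    rng = random.Random(seed)
    for _ in range(n):
        score = rng.random()
        yield score, rng.random() < score

def run_gate(stream, eps=0.1, lr=0.01, explore=0.1, tau=0.5, seed=0):
    """Online thresholding with randomized exploration.

    Call support when score >= tau. When score < tau, call support
    anyway with probability `explore` to learn whether it would have
    helped. `helped` is used by the update only on exploration rounds;
    on act-alone rounds it is read purely for simulation bookkeeping.
    """
    rng = random.Random(seed)
    rounds = calls = misses = 0
    est_sum = 0.0
    for score, helped in stream:
        rounds += 1
        act_alone = score < tau
        explored = act_alone and rng.random() < explore
        if act_alone and not explored:
            misses += int(helped)  # unobserved in a real deployment
            est = 0.0
        else:
            calls += 1
            # Importance-weighted estimate of "the threshold rule would
            # have missed here"; zero when the rule itself called support.
            est = (float(helped) / explore) if explored else 0.0
        est_sum += est
        # Lower tau (call more) when estimated misses run above eps,
        # raise it slowly (call less) otherwise.
        tau -= lr * (est - eps)
    n = max(rounds, 1)
    return {
        "tau": tau,
        "call_rate": calls / n,
        "realized_miss_rate": misses / n,
        "est_rate": est_sum / n,
    }

result = run_gate(synthetic_stream(20000))
```

By construction, the running average of the importance-weighted estimate differs from ε by exactly (initial τ minus final τ) divided by `lr` times the number of rounds, so it approaches ε whenever τ stays bounded. Whether the paper's guarantee takes this form is not stated in the text above.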
Circularity Check
No significant circularity; derivation is self-contained standard optimization result
Full rationale
The paper defines a constrained optimization problem (minimize support usage subject to bounding counterfactual missed-support error) and states that its solution is a threshold rule on a scalar 'value of support.' This is a direct, non-circular consequence of the standard Lagrange-multiplier structure for such problems once the value is defined as the conditional improvement probability; the derivation does not reduce the claimed result to a fitted parameter or self-citation. The online algorithm is then constructed on top of that structure using randomized exploration, with the error-control guarantee following from the exploration mechanism rather than from re-fitting the same quantity. No load-bearing self-citations, ansatzes smuggled via prior work, or renaming of known results are present in the provided text. The framework is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- threshold on value of support
axioms (2)
- domain assumption: A scalar value of support exists that determines whether support materially improves the agent's output.
- domain assumption: Randomized exploration controls the missed-support error without distributional assumptions.