Recognition: no theorem link
A Communication-Theoretic Framework for LLM Agents: Cost-Aware Adaptive Reliability
Pith reviewed 2026-05-12 02:44 UTC · model grok-4.3
The pith
Viewing LLM sampling as a stochastic channel unifies reliability techniques and enables a router that dominates fixed methods on the quality-cost frontier.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We observe that an LLM sampled at temperature T is a discrete stochastic channel p(y|x) in the sense of Shannon's coding theory, and use this identity as the entry point for a framework grounded in communication theory. Each reliability technique is a special case of one of six classical operators: diversity combining, hybrid retransmission, iterative generator-critic decoding, rateless sampling, structured redundant verification, and difficulty-adaptive routing. The framework yields closed-form thresholds for averaging and a contractivity criterion for refinement. A cost-aware semantic-nearest-neighbor router with one Lagrangian knob traverses the quality-cost frontier without retraining.
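As a minimal sketch of the single-knob idea (all quality and cost figures below are invented for illustration; the actual router scores techniques via semantic nearest neighbors over past tasks):

```python
# Hypothetical per-technique estimates for one task; the paper's router
# derives these from semantic-nearest-neighbor lookups, not fixed numbers.
techniques = {
    "single_shot":     {"quality": 0.60, "cost": 1.0},
    "self_refine_3":   {"quality": 0.74, "cost": 3.2},
    "majority_vote_5": {"quality": 0.78, "cost": 5.0},
}

def route(lam):
    """Pick the technique maximizing the Lagrangian score quality - lam * cost."""
    return max(techniques, key=lambda t: techniques[t]["quality"] - lam * techniques[t]["cost"])

# Sweeping the single knob lam traverses the quality-cost frontier:
print(route(0.0))    # quality-at-any-cost -> majority_vote_5
print(route(0.03))   # intermediate budget -> self_refine_3
print(route(1.0))    # cost-dominated      -> single_shot
```

Each setting of the multiplier selects a different point on the frontier, which is why a single scalar suffices to traverse it without retraining.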
What carries the argument
The mapping of LLM temperature sampling to a discrete stochastic channel p(y|x), which allows classical reliability operators to be applied directly to unify and optimize agent reliability techniques.
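A minimal sketch of this mapping, assuming toy next-token logits: the temperature-T softmax over a model's logits is exactly a categorical channel law p(y|x), nearly deterministic as T shrinks and approaching a uniform (maximally noisy) channel as T grows.

```python
import numpy as np

def channel_distribution(logits, T):
    """Temperature-T softmax: the per-token channel law p(y|x)."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                      # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = [2.0, 1.0, 0.0]              # hypothetical logits for a fixed prompt x

p_cold = channel_distribution(logits, T=0.1)   # nearly deterministic channel
p_hot = channel_distribution(logits, T=10.0)   # nearly uniform (noisy) channel

assert abs(p_cold.sum() - 1.0) < 1e-12 and p_cold[0] > 0.99
assert p_hot.max() - p_hot.min() < 0.1
```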
If this is right
- No fixed model-technique-budget choice dominates across the six channel configurations and 69 tasks.
- The router can achieve any point on the empirical quality-cost frontier by adjusting its single Lagrangian parameter.
- A noise-variance threshold determines when uniform averaging outperforms quality-weighted averaging.
- Generator-critic refinement is contractive only for models above a certain size, explaining observed transitions between 3B and 14B models.
- Per-task adaptive allocation is required to reach optimal reliability performance.
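The averaging-threshold prediction can be illustrated with a toy Monte Carlo (an illustrative simulation under assumed Gaussian answer noise and lognormal score noise, not the paper's derivation): combining n noisy scalar answers either uniformly or weighted by noisy quality scores, weighting wins when the scores are clean and loses once score noise passes a threshold.

```python
import numpy as np

rng = np.random.default_rng(0)

def mse(sigma_w, n=5, n_trials=20000):
    """MSE of uniform vs quality-weighted averaging of n noisy answers.

    Each answer y_i = truth + noise_i has true precision q_i, but the
    combiner only sees a corrupted score q_i * exp(sigma_w * eps_i).
    """
    truth, err_u, err_w = 0.0, 0.0, 0.0
    for _ in range(n_trials):
        q = rng.uniform(0.5, 2.0, n)                       # true precisions
        y = truth + rng.normal(0.0, 1.0 / np.sqrt(q))      # noisy answers
        q_hat = q * np.exp(sigma_w * rng.normal(size=n))   # noisy quality scores
        err_u += np.mean(y) ** 2
        err_w += (np.sum(q_hat * y) / np.sum(q_hat)) ** 2
    return err_u / n_trials, err_w / n_trials

u_clean, w_clean = mse(sigma_w=0.0)   # exact scores: weighting helps
u_noisy, w_noisy = mse(sigma_w=3.0)   # very noisy scores: uniform wins
assert w_clean < u_clean
assert u_noisy < w_noisy
```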
Where Pith is reading between the lines
- The framework could extend to dynamic estimation of channel parameters during agent operation for real-time adaptation.
- Similar channel models might apply to other LLM behaviors such as chain-of-thought reasoning steps.
- Integrating this router into multi-turn agent conversations could treat each turn as a cascaded channel.
Load-bearing premise
LLM sampling at temperature T can be treated as a discrete stochastic channel in the Shannon sense so that classical reliability operators apply directly.
What would settle it
Observing that the proposed router fails to trace the full Pareto frontier or achieve the stated cost and quality improvements when evaluated on a fresh set of tasks or model configurations outside the six tested.
read the original abstract
Agents built on large language models (LLMs) rely on a range of reliability techniques, including retry, majority voting, and self-consistency, that have been developed in parallel rather than within a common analytical framework. We observe that an LLM sampled at temperature $T$ is a discrete stochastic channel $p(y \mid x)$ in the sense of Shannon's coding theory, and use this identity as the entry point for such a framework grounded in communication theory. Each of these techniques is a special case of one of six classical reliability operators: diversity combining, hybrid retransmission, iterative generator-critic decoding, rateless sampling, structured redundant verification, and difficulty-adaptive routing. Within the framework we give two closed-form results: a noise-variance threshold above which uniform averaging beats quality-weighted averaging, and a contractivity criterion for generator-critic refinement, consistent with a contractive-to-divergent transition we observe between 3B- and 14B-parameter models. We further introduce a cost-aware semantic-nearest-neighbor router whose single Lagrangian knob traverses the quality-cost frontier without retraining. Across six channel configurations spanning local and cloud models on 69 hard tasks, no fixed model-technique-budget choice dominates, motivating per-task allocation. On a 300-item hard split of MMLU, GSM8K, and HumanEval, our router occupies the full empirical Pareto frontier: at matched quality, its normalized cost is ${\approx}56$\% lower than the strongest fixed technique; at matched normalized cost, it improves quality by ${\approx}7$\% ($26$\% over single-shot decoding). These results argue for consolidating these reliability techniques into a single tunable layer informed by channel coding.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a communication-theoretic framework for LLM agents by treating LLM sampling at temperature T as a discrete stochastic channel p(y|x) in the Shannon sense. It unifies techniques such as retry, majority voting, and self-consistency as special cases of six classical reliability operators (diversity combining, hybrid retransmission, iterative generator-critic decoding, rateless sampling, structured redundant verification, and difficulty-adaptive routing). The framework yields two closed-form results: a noise-variance threshold above which uniform averaging outperforms quality-weighted averaging, and a contractivity criterion for generator-critic refinement. It introduces a cost-aware semantic-nearest-neighbor router controlled by a single Lagrangian multiplier that traverses the quality-cost frontier. Empirical results on 69 hard tasks and a 300-item hard split of MMLU, GSM8K, and HumanEval claim that the router occupies the full Pareto frontier, achieving ~56% lower normalized cost at matched quality and ~7% quality improvement (~26% over single-shot) at matched cost.
Significance. If the derivations and experiments are substantiated, the work offers a unifying analytical foundation that consolidates disparate LLM reliability methods under communication theory, potentially enabling more principled and tunable designs for agents. The closed-form results provide concrete, testable predictions (including scaling behavior between model sizes) and the single-knob router demonstrates a practical mechanism for cost-aware adaptation without retraining. The reported Pareto dominance on standard benchmarks, if reproducible, would have direct implications for efficient deployment of LLM systems where quality-cost tradeoffs are critical.
major comments (2)
- [Abstract] The two closed-form results (noise-variance threshold for averaging methods and contractivity criterion for generator-critic decoding) are asserted without any mathematical expressions, derivation steps, or explicit conditions, yet these results are load-bearing for the central claim that the framework is grounded in communication theory and yields analytical insights.
- [Abstract] The primary empirical claim that the router occupies the full Pareto frontier on the 300-item hard split of MMLU/GSM8K/HumanEval (with ~56% normalized cost reduction at matched quality and ~7% quality gain at matched cost) is stated without any tables, figures, definitions of normalized cost, construction details for the hard split, identity of the strongest fixed baselines, or aggregation method across tasks, rendering the result unassessable and unverifiable from the manuscript.
minor comments (2)
- [Abstract] The precise definition of 'normalized cost' (e.g., tokens, latency, or API units) is not provided, which affects interpretation of the reported cost-quality tradeoffs.
- [Abstract] While the six reliability operators are named, the explicit mapping from common LLM techniques (such as self-consistency or majority voting) to specific operators is not detailed, reducing the clarity of the unification claim.
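One plausible reading of that mapping, assumed here for illustration rather than taken from the paper's text:

```python
# Assumed technique-to-operator mapping (illustrative; the paper's own
# table, if provided, is authoritative).
OPERATOR_OF = {
    "retry":                      "hybrid retransmission",
    "majority voting":            "diversity combining",
    "self-consistency":           "diversity combining",
    "self-refine":                "iterative generator-critic decoding",
    "adaptive best-of-n":         "rateless sampling",
    "verifier / unit-test check": "structured redundant verification",
    "model cascade / routing":    "difficulty-adaptive routing",
}
```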
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review. We address each major comment below and have revised the abstract to improve clarity and self-containment while preserving its conciseness. The full derivations and experimental details remain in the main text and appendix.
read point-by-point responses
- Referee: [Abstract] The two closed-form results (noise-variance threshold for averaging methods and contractivity criterion for generator-critic decoding) are asserted without any mathematical expressions, derivation steps, or explicit conditions, yet these results are load-bearing for the central claim that the framework is grounded in communication theory and yields analytical insights.
  Authors: We agree that the abstract would be strengthened by including the explicit expressions. The noise-variance threshold (above which uniform averaging outperforms quality-weighted averaging) and the contractivity criterion for generator-critic refinement are derived in Sections 3.2 and 4.1, respectively; the latter is consistent with the contractive-to-divergent transition we observe between 3B- and 14B-parameter models. To address the concern, we have revised the abstract to incorporate the key formulas and conditions in compact form. revision: yes
- Referee: [Abstract] The primary empirical claim that the router occupies the full Pareto frontier on the 300-item hard split of MMLU/GSM8K/HumanEval (with ~56% normalized cost reduction at matched quality and ~7% quality gain at matched cost) is stated without any tables, figures, definitions of normalized cost, construction details for the hard split, identity of the strongest fixed baselines, or aggregation method across tasks, rendering the result unassessable and unverifiable from the manuscript.
  Authors: We acknowledge that the abstract summarizes the results without supporting details. The full experimental evidence (tables and figures for the Pareto frontier, the definition of normalized cost as total token usage relative to the single-shot baseline, the hard-split construction as items failed by base models under single-shot decoding, the strongest fixed baselines of majority voting and self-consistency across budgets, and macro-average aggregation) is provided in Section 5 and the appendix. We have revised the abstract to include a brief definition of normalized cost and the hard-split selection criterion for improved verifiability. revision: yes
Circularity Check
No circularity detected in claimed derivation chain
full rationale
The abstract frames LLM sampling as a Shannon channel and maps existing techniques to classical reliability operators, then states two closed-form results and a Lagrangian-parameter router. No equations, self-citations, or derivation steps are supplied in the available text, so none can be shown to reduce to their inputs by construction. The Pareto-frontier claim is presented as an empirical outcome on a 300-item split rather than a mathematical identity or fitted prediction renamed as a result. The framework therefore remains self-contained against external communication-theory benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- Lagrangian multiplier
axioms (1)
- domain assumption: An LLM sampled at temperature T is a discrete stochastic channel p(y | x)
invented entities (1)
- cost-aware semantic-nearest-neighbor router (no independent evidence)