Harnessing Multiple Large Language Models: A Survey on LLM Ensemble
Pith reviewed 2026-05-23 02:20 UTC · model grok-4.3
The pith
LLM ensemble methods can be systematically reviewed and classified using a three-stage taxonomy based on when the combination occurs relative to inference.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims to deliver the first comprehensive taxonomy and review of LLM Ensemble, showing that methods fall into three distinct stages relative to the inference process, with each stage containing specific techniques that can be reviewed and compared through existing benchmarks and applications.
What carries the argument
The three-stage taxonomy (ensemble-before-inference, ensemble-during-inference, ensemble-after-inference) that organizes all LLM ensemble methods for review and classification.
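To make the three stages concrete, here is a minimal Python sketch under stated assumptions. The "models" are toy stand-ins with hard-coded outputs, and the function names `route_before`, `fuse_during`, and `vote_after` are hypothetical and do not come from the paper. Each function shows only one simple representative of its stage (routing, probability averaging, majority voting), not the full range of methods the survey covers.

```python
from collections import Counter

# Toy stand-ins for LLMs (assumption): each "model" exposes a full answer
# and a next-token probability distribution for one fixed prompt.
MODELS = {
    "small": {"answer": "4", "next_token": {"4": 0.6, "5": 0.4}},
    "large": {"answer": "4", "next_token": {"4": 0.9, "5": 0.1}},
    "other": {"answer": "5", "next_token": {"4": 0.3, "5": 0.7}},
}


def route_before(query: str) -> str:
    """Ensemble-before-inference: pick ONE model before any generation.

    The word-count rule is a placeholder for a learned router.
    """
    return "large" if len(query.split()) > 5 else "small"


def fuse_during(model_names):
    """Ensemble-during-inference: average the models' next-token
    distributions at a single decoding step, then take the top token."""
    vocab = set()
    for name in model_names:
        vocab.update(MODELS[name]["next_token"])
    avg = {
        tok: sum(MODELS[n]["next_token"].get(tok, 0.0) for n in model_names)
        / len(model_names)
        for tok in vocab
    }
    return max(avg, key=avg.get), avg


def vote_after(model_names):
    """Ensemble-after-inference: every model answers in full,
    then the finished answers are aggregated by majority vote."""
    answers = [MODELS[n]["answer"] for n in model_names]
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    names = list(MODELS)
    print(route_before("What is 2 + 2?"))
    print(fuse_during(names)[0])
    print(vote_after(names))
```

The sketch also hints at the cost trade-off: routing calls one model, while the other two stages call every model in the pool.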
If this is right
- Existing methods can be mapped onto the taxonomy without major omissions.
- The review identifies gaps that future work can address in each stage.
- Benchmarks provide a way to evaluate and compare ensemble approaches.
- Applications demonstrate practical uses across various domains.
Where Pith is reading between the lines
- The taxonomy could guide the development of new hybrid methods that operate across multiple stages.
- Practitioners might use the classification to choose ensemble strategies based on their computational constraints.
- Future surveys could expand the taxonomy if new methods emerge that challenge the three-stage division.
- Connections between LLM Ensemble and other multi-model techniques like mixture-of-experts may warrant further investigation.
Load-bearing premise
That the three-stage taxonomy comprehensively partitions the space of all relevant LLM ensemble methods without significant omissions or overlaps.
What would settle it
Discovery of an LLM ensemble technique that requires a fourth distinct category or cannot fit into the existing three without substantial overlap would challenge the taxonomy's completeness.
Original abstract
LLM Ensemble -- which involves the comprehensive use of multiple large language models (LLMs), each aimed at handling user queries during downstream inference, to benefit from their individual strengths -- has gained substantial attention recently. The widespread availability of LLMs, coupled with their varying strengths and out-of-the-box usability, has profoundly advanced the field of LLM Ensemble. This paper presents the first systematic review of recent developments in LLM Ensemble. First, we introduce our taxonomy of LLM Ensemble and discuss several related research problems. Then, we provide a more in-depth classification of the methods under the broad categories of "ensemble-before-inference, ensemble-during-inference, ensemble-after-inference", and review all relevant methods. Finally, we introduce related benchmarks and applications, summarize existing studies, and suggest several future research directions. A curated list of papers on LLM Ensemble is available at https://github.com/junchenzhi/Awesome-LLM-Ensemble.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to present the first systematic review of recent developments in LLM Ensemble, which uses multiple LLMs to leverage their individual strengths for downstream inference. It introduces a taxonomy partitioning methods into ensemble-before-inference, ensemble-during-inference, and ensemble-after-inference; provides in-depth classification and review of methods under these categories; discusses related research problems, benchmarks, and applications; summarizes existing studies; and suggests future directions, supported by a curated GitHub list of papers.
Significance. If the taxonomy is shown to be comprehensive, the survey would offer a useful organizing framework for the rapidly growing LLM ensemble literature in NLP, helping researchers identify patterns, gaps, and connections across methods. The public GitHub repository of curated papers is a clear strength, enhancing accessibility and reproducibility of the review.
major comments (1)
- [Taxonomy introduction and classification sections] The claim of presenting the 'first systematic review' (abstract) rests on the three-stage taxonomy being exhaustive and non-overlapping. The manuscript does not explicitly analyze or rule out hybrid methods (e.g., dynamic model selection that combines before- and during-inference stages) or orthogonal dimensions (e.g., ensembles over prompting strategies), which could produce overlaps or omissions and undermine the partition's completeness as a systematic structure.
minor comments (1)
- [Introduction] The abstract states that 'several related research problems' are discussed, but the manuscript could clarify in the introduction or taxonomy section how these problems map onto the three-stage structure.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which helps clarify the presentation of our taxonomy. We address the major comment below and will revise the manuscript to incorporate the suggested analysis.
Point-by-point responses
- Referee: [Taxonomy introduction and classification sections] The claim of presenting the 'first systematic review' (abstract) rests on the three-stage taxonomy being exhaustive and non-overlapping. The manuscript does not explicitly analyze or rule out hybrid methods (e.g., dynamic model selection that combines before- and during-inference stages) or orthogonal dimensions (e.g., ensembles over prompting strategies), which could produce overlaps or omissions and undermine the partition's completeness as a systematic structure.
  Authors: We appreciate the referee's observation. Our taxonomy partitions methods according to the primary stage (before, during, or after inference) at which the ensemble decision or aggregation occurs, which provides a clear and actionable organizing principle for the literature. We agree that the manuscript does not explicitly analyze hybrid methods or orthogonal dimensions such as prompting-strategy ensembles. To strengthen the taxonomy section, we will add a dedicated paragraph (or short subsection) that (1) acknowledges the possibility of hybrid approaches, (2) illustrates how a method that spans stages can still be classified by its dominant stage while noting the hybrid aspect, and (3) clarifies that orthogonal dimensions (e.g., prompting) are largely independent of the stage-based partition and can be applied across categories. This revision will make the boundaries of the taxonomy more transparent without altering its core structure or the claim of providing the first systematic review organized around these stages.
  Revision: yes
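The classification rule proposed in the rebuttal (file each method under its dominant stage, record any other stages it touches, and keep orthogonal dimensions separate) can be sketched as a small data structure. This is an illustrative sketch only: the class `MethodEntry`, its fields, and the three example entries are hypothetical names invented here, not methods or terminology from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    BEFORE = "ensemble-before-inference"
    DURING = "ensemble-during-inference"
    AFTER = "ensemble-after-inference"


@dataclass(frozen=True)
class MethodEntry:
    name: str
    dominant: Stage                            # the single stage used for filing
    also_touches: frozenset = frozenset()      # other stages the method spans
    orthogonal_tags: frozenset = frozenset()   # e.g. prompting; not a stage

    @property
    def is_hybrid(self) -> bool:
        # Hybrid means the method spans at least one non-dominant stage.
        return bool(self.also_touches - {self.dominant})


def file_by_dominant_stage(entries):
    """Group entries by dominant stage; each entry lands in exactly one bin."""
    bins = {stage: [] for stage in Stage}
    for entry in entries:
        bins[entry.dominant].append(entry.name)
    return bins


# Hypothetical entries, invented for illustration only.
entries = [
    MethodEntry("query-router", Stage.BEFORE),
    MethodEntry("dynamic-selector", Stage.BEFORE, frozenset({Stage.DURING})),
    MethodEntry(
        "answer-voter",
        Stage.AFTER,
        orthogonal_tags=frozenset({"prompt-ensemble"}),
    ),
]

if __name__ == "__main__":
    print(file_by_dominant_stage(entries))
    print([e.name for e in entries if e.is_hybrid])
```

Filing by a single `dominant` field keeps the partition non-overlapping by construction, which is the property the referee questions. Whether a dominant stage can always be identified remains the open issue.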
Circularity Check
No circularity: descriptive survey with proposed taxonomy
This paper is a literature review that introduces a three-stage taxonomy solely to organize existing LLM ensemble methods; no derivations, equations, fitted parameters, or predictions are present. The taxonomy is explicitly presented as an organizing framework rather than a result derived from prior claims, and the 'first systematic review' statement rests on coverage of the literature rather than any self-referential reduction. No self-citation chains, ansatzes, or uniqueness theorems are invoked as load-bearing steps. The work is self-contained as a descriptive survey.
Forward citations
Cited by 8 Pith papers
- Sampling from Your Language Model One Byte at a Time: An inference-time technique turns BPE-based LMs into byte- or character-level models, solving the prompt boundary problem while unifying vocabularies across different tokenizers.
- Rethinking Predictive Modeling for LLM Routing: When Simple kNN Beats Complex Learned Routers: A well-tuned kNN router matches or exceeds state-of-the-art learned routers on new standardized benchmarks spanning instruction, QA, reasoning, and the first multi-modal visual routing dataset, due to locality of mode...
- A Communication-Theoretic Framework for LLM Agents: Cost-Aware Adaptive Reliability: LLM reliability techniques are unified as communication channel operators, with a new cost-aware router achieving superior quality-cost tradeoffs on hard tasks.
- Rethinking LLM Ensembling from the Perspective of Mixture Models: ME reinterprets LLM ensembling as a mixture model by sampling a single model stochastically at each token step, matching the ensemble distribution while invoking only one model per step for substantial speed gains.
- Token-Level LLM Collaboration via FusionRoute: FusionRoute augments token-level expert routing with a trainable complementary logit generator to expand the policy class and recover optimal decoding under mild conditions, outperforming prior collaboration and mergi...
- SpecFed: Accelerating Federated LLM Inference with Speculative Decoding and Compressed Transmission: SpecFed accelerates federated LLM inference via speculative decoding for parallel processing and top-K compression with server-side reconstruction, achieving high fidelity with reduced communication overhead.
- Scoring, Reasoning, and Selecting the Best! Ensembling Large Language Models via a Peer-Review Process: LLM-PeerReview ensembles LLMs by scoring responses with LLM-as-Judge and selecting the best via averaging or truth inference, beating Smoothie-Global by 6.9-7.3 points on four datasets.
- LLM-Powered AI Agent Systems and Their Applications in Industry: A survey categorizing LLM-powered agent systems into software-based, physical, and hybrid types, covering industrial applications and challenges such as latency and security.
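One claim in the list above lends itself to a quick check: that sampling a single model at each token step can match the distribution of a weighted ensemble. The sketch below verifies this for one decoding step only, using exact fractions. The two toy distributions, the weights, and all function names are assumptions made for illustration. This is not the ME paper's implementation, and it says nothing about multi-token sequences or speed.

```python
import random
from fractions import Fraction

# Toy next-token distributions for two hypothetical models (assumption).
P = {
    "model_a": {"yes": Fraction(3, 4), "no": Fraction(1, 4)},
    "model_b": {"yes": Fraction(1, 4), "no": Fraction(3, 4)},
}
WEIGHTS = {"model_a": Fraction(1, 2), "model_b": Fraction(1, 2)}


def ensemble_distribution():
    """Weighted average of all models' distributions (queries every model)."""
    vocab = {tok for dist in P.values() for tok in dist}
    return {tok: sum(WEIGHTS[m] * P[m].get(tok, 0) for m in P) for tok in vocab}


def mixture_marginal():
    """Exact marginal of: draw a model by weight, then draw a token from it."""
    marginal = {}
    for model, weight in WEIGHTS.items():
        for tok, prob in P[model].items():
            marginal[tok] = marginal.get(tok, 0) + weight * prob
    return marginal


def sample_token_mixture(rng: random.Random) -> str:
    """One decoding step that invokes only the sampled model."""
    names = list(WEIGHTS)
    model = rng.choices(names, weights=[float(WEIGHTS[n]) for n in names])[0]
    toks = list(P[model])
    return rng.choices(toks, weights=[float(P[model][t]) for t in toks])[0]


if __name__ == "__main__":
    # Both routes give the same single-step token distribution.
    assert ensemble_distribution() == mixture_marginal()
    print(sample_token_mixture(random.Random(0)))
```

The equality holds because the marginal of a two-stage draw is the weighted sum of the per-model distributions, which is exactly how the averaged ensemble is defined.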