SOMA: Efficient Multi-turn LLM Serving via Small Language Model
Pith reviewed 2026-05-13 01:33 UTC · model grok-4.3
The pith
SOMA adapts smaller language models to conversation-specific regions after initial turns to cut serving costs while preserving quality.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that by estimating a local response manifold from the early turns of a conversation and distilling it into a small language model via divergence-maximizing soft prompts, anti-degeneration control, and LoRA adaptation, the surrogate can serve the remainder of the multi-turn conversation efficiently while maintaining quality, supported by a theoretical analysis of the key components and a gating mechanism that guards against drift.
What carries the argument
The local response manifold: the set of likely response directions in the current conversation context. It is mined using soft prompts that maximize semantic divergence between the large and small models' responses, then distilled into the surrogate via localized LoRA fine-tuning.
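A minimal sketch of how such divergence-maximizing soft prompts might be trained, assuming both models share a tokenizer and expose next-token distributions; the model names, the early_turns list, and the anti-degeneration weight lam are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch (assumptions noted above), not SOMA's released code:
# learn a soft prompt that pushes the small model's next-token distribution away
# from the large model's, while an anti-degeneration term keeps it close to its
# own unprompted behaviour.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

large = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf").eval()
small = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf").eval()
large.requires_grad_(False)
small.requires_grad_(False)
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

n_prompt = 16
soft_prompt = torch.nn.Parameter(0.02 * torch.randn(n_prompt, small.config.hidden_size))
opt = torch.optim.Adam([soft_prompt], lr=1e-3)
lam = 0.1  # anti-degeneration weight (assumed hyperparameter)

def next_token_logprobs(model, embeds):
    # log-probabilities of the next token given the embedded prefix
    return F.log_softmax(model(inputs_embeds=embeds).logits[:, -1, :], dim=-1)

for context in early_turns:  # early-turn contexts collected from the live session (assumed)
    ids = tok(context, return_tensors="pt").input_ids
    base = small.get_input_embeddings()(ids)
    prompted = torch.cat([soft_prompt.unsqueeze(0), base], dim=1)

    with torch.no_grad():
        p_large = next_token_logprobs(large, large.get_input_embeddings()(ids))
        p_plain = next_token_logprobs(small, base)
    p_soft = next_token_logprobs(small, prompted)

    # maximize divergence from the large model to surface least-aligned directions,
    # penalize drift from the unprompted small model to avoid degenerate prompts
    divergence = F.kl_div(p_soft, p_large, log_target=True, reduction="batchmean")
    drift = F.kl_div(p_soft, p_plain, log_target=True, reduction="batchmean")
    loss = -divergence + lam * drift
    opt.zero_grad(); loss.backward(); opt.step()
```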
If this is right
- Multi-turn sessions incur lower latency and memory costs after the initial turns because later queries are routed to the adapted small model.
- Response quality remains comparable because the surrogate is tuned specifically to the session's local semantic region.
- The gate with rollback provides a safeguard against quality drops caused by conversation drift (a minimal sketch of such a gate follows this list).
- The theoretical analysis supports the key components, such as the divergence maximization and the anti-degeneration control.
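A minimal sketch of the one-time switch gate with rollback referenced in the list above; the sentence-embedding function, the cosine-distance drift score, and the threshold tau are illustrative assumptions, not the paper's gating rule.

```python
# Illustrative gate (assumptions noted above): serve from the adapted surrogate after the
# switch, but roll back to the large model once a drift score exceeds the threshold tau.
import numpy as np

class SwitchGate:
    def __init__(self, embed, tau=0.35):
        self.embed = embed        # text -> vector embedding function (assumed available)
        self.tau = tau            # drift threshold (assumed hyperparameter)
        self.centroid = None
        self.switched = False

    def fit(self, early_turn_texts):
        # one-time switch: summarize the adapted region by the early-turn centroid
        vecs = np.stack([self.embed(t) for t in early_turn_texts])
        self.centroid = vecs.mean(axis=0)
        self.switched = True

    def drift(self, query_text):
        v = self.embed(query_text)
        cos = v @ self.centroid / (np.linalg.norm(v) * np.linalg.norm(self.centroid))
        return 1.0 - cos          # larger means further from the adapted region

    def route(self, query_text):
        if self.switched and self.drift(query_text) <= self.tau:
            return "surrogate"    # stay on the adapted small model
        self.switched = False     # rollback: route this and subsequent turns to the large model
        return "large"
```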
Where Pith is reading between the lines
- This approach could extend to other adaptive serving scenarios, such as switching between models based on query complexity in single-turn settings.
- If the manifold estimation proves robust across domains, it might reduce reliance on large models in long interactive applications like virtual assistants.
- Future work could test the framework with different small-large model pairs to see how size gap affects adaptation success.
Load-bearing premise
A stable local response manifold exists and can be reliably estimated from only the early turns of a session, allowing the surrogate model and gate to handle later turns without undetected quality loss.
What would settle it
The claim would be undermined if experiments show that, after the switch, response quality degrades significantly in a substantial fraction of sessions without the gate triggering rollback, or if the estimated manifold fails to cover the range of responses required in later turns.
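One way such an experiment could be scored, under the assumption that quality loss correlates with embedding-space distance from the early-turn region; the function and variable names here are illustrative, not from the paper.

```python
# Illustrative check (assumed names): fraction of later-turn responses whose
# embedding lies within radius r of the early-turn response embeddings.
import numpy as np

def coverage(early_embs, later_embs, r):
    early = np.stack(early_embs)                       # (n_early, d) early-turn embeddings
    hits = 0
    for v in later_embs:
        nearest = np.linalg.norm(early - v, axis=1).min()  # distance to closest early-turn point
        hits += nearest <= r
    return hits / len(later_embs)                      # 1.0 means the region covers every later turn
```

Sessions where this fraction is low while the gate never triggered rollback would be exactly the undetected-degradation failure mode described above.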
Original abstract
Large Language Models (LLMs) are increasingly deployed in multi-turn dialogue settings where preserving conversational context across turns is essential. A standard serving practice concatenates the full dialogue history at every turn, which reliably maintains coherence but incurs substantial cost in latency, memory, and API expenditure, especially when queries are routed to large proprietary models. Existing approaches often struggle to balance the trade-off between response quality and efficiency. We propose a framework that exploits the early turns of a session to estimate a local response manifold and then adapt a smaller surrogate model to this local region for the remainder of the conversation. Concretely, we learn soft prompts that maximize semantic divergence between the large and surrogate small language models' responses to surface least-aligned local directions, stabilize training with anti-degeneration control, and distill the mined cases into localized LoRA fine-tuning so the surrogate runs without prompts at inference. A simple gate enables a one-time switch with rollback on drift. We further provide a theoretical analysis for key components in SOMA. Extensive experiments show the effectiveness of SOMA. The source code is provided at: https://github.com/LabRAI/SOMA.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes SOMA, a framework for efficient multi-turn LLM serving. It estimates a local response manifold from early conversation turns by mining divergence-maximizing cases via soft prompts between a large LLM and a small surrogate, distills these into localized LoRA fine-tuning of the surrogate (removing prompts at inference), and uses a simple gate for one-time switching with rollback on detected drift. The work includes theoretical analysis of key components and reports extensive experiments demonstrating effectiveness, with code released.
Significance. If the central assumptions hold, SOMA addresses a practical deployment challenge by trading off quality and efficiency in multi-turn settings through early-turn adaptation of smaller models, potentially lowering latency, memory, and API costs. The open-sourced code and theoretical analysis are strengths that support reproducibility and deeper understanding of the components.
major comments (2)
- [Abstract and theoretical analysis] The central claim rests on the stability of the local response manifold estimated from early turns and the gate's ability to prevent undetected quality drift. However, the theoretical analysis does not provide a formal definition of the manifold, bounds on its coverage radius, or characterization of the gate's false-negative rate under gradual topic shifts or session evolution (see abstract description of the framework and theoretical analysis section).
- [Experiments] The experiments claim to show effectiveness of the efficiency-quality tradeoff, but the manuscript provides no details on baselines, evaluation metrics, error bars, session-length distributions, or how data were selected to test drift scenarios. This makes it impossible to assess whether the surrogate plus gate maintains quality without significant undetected degradation (see experiments section).
minor comments (2)
- [Abstract] The abstract could more precisely state the scope of the theoretical analysis (e.g., which components receive formal treatment) and the exact conditions under which the gate triggers rollback.
- [Method] Notation for the soft-prompt mining and anti-degeneration control could be clarified with explicit equations or pseudocode in the method description.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the theoretical foundations and experimental reporting. We address each major comment below and have revised the manuscript to incorporate additional formalization and details.
Point-by-point responses
- Referee: [Abstract and theoretical analysis] The central claim rests on the stability of the local response manifold estimated from early turns and the gate's ability to prevent undetected quality drift. However, the theoretical analysis does not provide a formal definition of the manifold, bounds on its coverage radius, or characterization of the gate's false-negative rate under gradual topic shifts or session evolution (see abstract description of the framework and theoretical analysis section).
Authors: We agree that a more rigorous formalization would strengthen the central claim. The existing theoretical analysis examines stability through the divergence-maximizing soft prompts and anti-degeneration control, but does not supply an explicit set-theoretic definition of the manifold or radius bounds. In the revision we have added: (i) a formal definition of the local response manifold as the divergence ball around early-turn response embeddings; (ii) coverage-radius bounds derived from the Lipschitz constant of the response mapping and the prompt-optimization objective; and (iii) a characterization of the gate's false-negative rate under gradual shifts, modeled via a Markovian topic-evolution process with concentration inequalities. These additions appear in the updated theoretical analysis section. revision: yes
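For concreteness, one way the divergence-ball definition and coverage-radius bound mentioned above could be written (an illustrative formalization under assumed conditions, not text from the revised manuscript):

\mathcal{M}_\rho = \{\, z : \min_{i \le k} \lVert \phi(z) - \phi(r_i) \rVert \le \rho \,\},

where r_1, \dots, r_k are the large model's responses in the first k turns and \phi is a fixed response-embedding map. If the context-to-response map f is L-Lipschitz under \phi, and a later context c_t satisfies \lVert \phi(c_t) - \phi(c_{t_0}) \rVert \le \delta for some early turn t_0 \le k, then \lVert \phi(f(c_t)) - \phi(r_{t_0}) \rVert \le L\delta, so a coverage radius \rho \ge L\delta suffices; the gate's false negatives are then the turns whose context drift exceeds \rho / L without being flagged.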
- Referee: [Experiments] The experiments claim to show effectiveness of the efficiency-quality tradeoff, but the manuscript provides no details on baselines, evaluation metrics, error bars, session-length distributions, or how data were selected to test drift scenarios. This makes it impossible to assess whether the surrogate plus gate maintains quality without significant undetected degradation (see experiments section).
Authors: We acknowledge that the experimental section lacked sufficient transparency. In the revised manuscript we have inserted: (1) explicit descriptions of all baselines (full-context LLM, prompt-only surrogates, and competing adaptation methods); (2) the complete set of metrics (response quality via automated and human ratings, latency, memory footprint, and drift-detection accuracy); (3) error bars computed over five independent runs with different random seeds; (4) session-length statistics (mean, variance, and distribution) drawn from the evaluation corpora; and (5) a precise account of drift-scenario construction, including both synthetic topic-shift dialogues and real multi-turn sessions with evolving context. These changes allow readers to evaluate the quality-stability tradeoff directly. revision: yes
Circularity Check
No significant circularity; framework steps are independent of inputs.
Full rationale
The paper defines a concrete pipeline: early-turn manifold estimation via divergence-maximizing soft prompts, anti-degeneration stabilization, LoRA distillation of the mined cases, and a one-time gate with rollback. Theoretical analysis is supplied for the key components, and effectiveness is shown via experiments. No equation or claim reduces a prediction to a fitted input by construction, nor does any load-bearing premise rest on a self-citation chain whose validity is internal to the paper. The derivation is self-contained, and effectiveness is judged against external benchmarks rather than quantities the framework itself produces.
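To make the distillation step of this pipeline concrete, a minimal sketch of how mined (context, teacher response) pairs could be folded into LoRA adapters on the surrogate; the base model, LoRA hyperparameters, and the mined_cases list are illustrative assumptions rather than SOMA's released code.

```python
# Illustrative distillation step (assumptions noted above): train low-rank adapters on the
# mined cases so the surrogate needs no soft prompt at inference time.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                 target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
surrogate = get_peft_model(base, cfg)   # only the low-rank adapters are trainable
opt = torch.optim.AdamW((p for p in surrogate.parameters() if p.requires_grad), lr=2e-4)

surrogate.train()
for context, teacher_response in mined_cases:   # cases surfaced by the soft-prompt mining (assumed)
    ids = tok(context + teacher_response, return_tensors="pt").input_ids
    # plain causal-LM loss over the concatenation; a fuller version would mask the context tokens
    loss = surrogate(input_ids=ids, labels=ids).loss
    opt.zero_grad(); loss.backward(); opt.step()

# After distillation the surrogate serves later turns directly; the gate decides per turn
# whether to keep using it or roll back to the large model.
```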
Reference graph
Works this paper leans on
- [1] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
- [2] agentlans. Multi-character dialogue dataset. Hugging Face Dataset, 2024. CC-BY-4.0 license.
- [3] Shalini Ananda. Beyond the bubble: How context-aware memory systems are changing the game in 2025. Tribe AI Applied AI Blog, April 2025.
- [4] Anthropic. Introducing Claude. Anthropic Blog, March 2023. https://www.anthropic.com/news/introducing-claude.
- [5] Anthropic. Prompt caching with Claude. https://www.anthropic.com/news/prompt-caching, August 2024. Accessed 2025-06-12.
- [6] Ge Bai, Jie Liu, Xingyuan Bu, Yancheng He, Jiaheng Liu, Zhanhui Zhou, Zhuoran Lin, Wenbo Su, Tiezheng Ge, Bo Zheng, et al. MT-Bench-101: A fine-grained benchmark for evaluating large language models in multi-turn dialogues. arXiv preprint arXiv:2402.14762, 2024.
- [7] Adarsh Prasad Behera, Jaya Prakash Champati, Roberto Morabito, Sasu Tarkoma, and James Gross. Towards efficient multi-LLM inference: Characterization and analysis of LLM routing and hierarchical techniques. arXiv preprint arXiv:2506.06579, 2025.
- [8] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.
- [9] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
- [10] Lin Chen, Jinsong Li, Xiaoyi Dong, Pan Zhang, Conghui He, Jiaqi Wang, Feng Zhao, and Dahua Lin. ShareGPT4V: Improving large multi-modal models with better captions. In European Conference on Computer Vision, pages 370–387. Springer, 2024.
- [11] Nuo Chen, Hongguang Li, Juhua Huang, Baoyuan Wang, and Jia Li. Compress to impress: Unleashing the potential of compressive memory in real-world long-term conversations. arXiv preprint arXiv:2402.11975, 2024.
- [12] Ronald R Coifman and Stéphane Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5–30, 2006.
- [13] Dujian Ding, Ankur Mallick, Chi Wang, Robert Sim, Subhabrata Mukherjee, Victor Ruhle, Laks VS Lakshmanan, and Ahmed Hassan Awadallah. Hybrid LLM: Cost-efficient and quality-aware query routing. arXiv preprint arXiv:2404.14618, 2024.
- [14] Xin Luna Dong, Seungwhan Moon, Yifan Ethan Xu, Kshitiz Malik, and Zhou Yu. Towards next-generation intelligent assistants leveraging LLM techniques. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 5792–5793, 2023.
- [15] Yao Hui Fang and Xing Ce Wang. Non-uniform point cloud upsampling via local manifold distribution. Proceedings of the ACM on Computer Graphics and Interactive Techniques, 8(1):1–15, 2025.
- [16] Bin Gao, Zhuomin He, Puru Sharma, Qingxuan Kang, Djordje Jevdjic, Junbo Deng, Xingkun Yang, Zhou Yu, and Pengfei Zuo. Cost-efficient large language model serving for multi-turn conversations with CachedAttention. In 2024 USENIX Annual Technical Conference (USENIX ATC 24), pages 111–126, 2024.
- [17] Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025.
- [18] Bernal Jiménez Gutiérrez, Yiheng Shu, Yu Gu, Michihiro Yasunaga, and Yu Su. HippoRAG: Neurobiologically inspired long-term memory for large language models. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.
- [19] He He, Derek Chen, Anusha Balakrishnan, and Percy Liang. Decoupling strategy and generation in negotiation dialogues. arXiv preprint arXiv:1808.09637, 2018.
- [20] Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. Measuring mathematical problem solving with the MATH dataset. arXiv preprint arXiv:2103.03874, 2021.
- [21] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
- [22] Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration. arXiv preprint arXiv:1904.09751, 2019.
- [23] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. LoRA: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022.
- [24] Jinwoo Jeong and Jeongseob Ahn. Accelerating LLM serving for multi-turn dialogues with efficient resource management. In Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pages 1–15, 2025.
- [25] Philippe Laban, Hiroaki Hayashi, Yingbo Zhou, and Jennifer Neville. LLMs get lost in multi-turn conversation. arXiv preprint arXiv:2505.06120, 2025.
- [26] Huayang Li, Tian Lan, Zihao Fu, Deng Cai, Lemao Liu, Nigel Collier, Taro Watanabe, and Yixuan Su. Repetition in repetition out: Towards understanding neural text degeneration from the data perspective. Advances in Neural Information Processing Systems, 36:72888–72903, 2023.
- [27] Lincan Li, Zheng Chen, and Yushun Dong. LLM as clinical graph structure refiner: Enhancing representation learning in EEG seizure diagnosis, 2026.
- [28] Yubo Li, Xiaobin Shen, Xinyu Yao, Xueying Ding, Yidi Miao, Ramayya Krishnan, and Rema Padman. Beyond single-turn: A survey on multi-turn interactions with large language models. arXiv preprint arXiv:2504.04717, 2025.
- [29] Fang Liu, Yang Liu, Lin Shi, Houkun Huang, Ruifeng Wang, Zhen Yang, Li Zhang, Zhongqi Li, and Yuchi Ma. Exploring and evaluating hallucinations in LLM-powered code generation. arXiv preprint arXiv:2404.00971, 2024.
- [30] Leland McInnes, John Healy, and James Melville. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426, 2018.
- [31] Clara Meister, Tiago Pimentel, Gian Wiher, and Ryan Cotterell. Locally typical sampling. Transactions of the Association for Computational Linguistics, 11:102–121, 2023.
- [32] Eric Melz. Enhancing LLM intelligence with ARM-RAG: Auxiliary rationale memory for retrieval augmented generation. arXiv preprint arXiv:2311.04177, 2023.
- [33] Don Moon. Accelerating multi-turn LLM serving with multi-tier caching and smarter scheduling. Medium (Byte-Sized AI), April 2025.
- [34] Giang Ngo and Nhi N. Y. Vo. Enhancing recommendation systems with hybrid manifold regularized knowledge graph. In 2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA), pages 1–8. IEEE, 2023.
- [35] Isaac Ong, Amjad Almahairi, Vincent Wu, Wei-Lin Chiang, Tianhao Wu, Joseph E. Gonzalez, M Waleed Kadous, and Ion Stoica. RouteLLM: Learning to route LLMs with preference data, 2024.
- [36] OpenAI. gpt-oss-120b & gpt-oss-20b model card, 2025.
- [37] Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Menglin Xia, Xufang Luo, Jue Zhang, Qingwei Lin, Victor Rühle, Yuqing Yang, Chin-Yew Lin, et al. LLMLingua-2: Data distillation for efficient and faithful task-agnostic prompt compression. arXiv preprint arXiv:2403.12968, 2024.
- [38] Soya Park and Chinmay Kulkarni. Thinking assistants: LLM-based conversational assistants that help users think by asking rather than answering. arXiv preprint arXiv:2312.06024, 2023.
- [39] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9, 2019.
- [40] Sam T Roweis and Lawrence K Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.
- [41] Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems, 36:68539–68551, 2023.
- [42] Jay Shah, Ganesh Bikshandi, Ying Zhang, Vijay Thakkar, Pradeep Ramani, and Tri Dao. FlashAttention-3: Fast and accurate attention with asynchrony and low-precision. Advances in Neural Information Processing Systems, 37:68658–68685, 2024.
- [43] Tal Shnitzer, Anthony Ou, Mírian Silva, Kate Soule, Yuekai Sun, Justin Solomon, Neil Thompson, and Mikhail Yurochkin. Large language model routing with benchmark datasets. arXiv preprint arXiv:2309.15789, 2023.
- [44] Andreas Stolcke, Klaus Ries, Noah Coccaro, Elizabeth Shriberg, Rebecca Bates, Daniel Jurafsky, Paul Taylor, Rachel Martin, Carol Van Ess-Dykema, and Marie Meteer. Dialogue act modeling for automatic tagging and recognition of conversational speech. Computational Linguistics, 26(3):339–373, 2000.
- [45] Zhiyu Sun, Ethan Rooke, Jerome Charton, Yusen He, Jia Lu, and Stephen Baek. ZerNet: Convolutional neural networks on arbitrary surfaces via Zernike local tangent space estimation. In Computer Graphics Forum, volume 39, pages 204–216. Wiley Online Library, 2020.
- [46] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
- [47] Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11), 2008.
- [48] Qingyue Wang, Yanhe Fu, Yanan Cao, Shuai Wang, Zhiliang Tian, and Liang Ding. Recursively summarizing enables long-term dialogue memory in large language models. Neurocomputing, 639:130193, 2025.
- [49] Sean Welleck, Ilia Kulikov, Stephen Roller, Emily Dinan, Kyunghyun Cho, and Jason Weston. Neural text generation with unlikelihood training. arXiv preprint arXiv:1908.04319, 2019.
- [50] Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, and Mike Lewis. Efficient streaming language models with attention sinks. arXiv preprint arXiv:2309.17453, 2024.
- [51] Yuwen Xiong, Mengye Ren, and Raquel Urtasun. LoCo: Local contrastive representation learning. Advances in Neural Information Processing Systems, 33:11142–11153, 2020.
- [52] Guojun Yan, Jiahuan Pei, Pengjie Ren, Zhaochun Ren, Xin Xin, Huasheng Liang, Maarten De Rijke, and Zhumin Chen. ReMeDi: Resources for multi-domain, multi-service, medical dialogues. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 3013–3024, 2022.
- [53] Zihao Yi, Jiarui Ouyang, Yuwen Liu, Tianhao Liao, Zhe Xu, and Ying Shen. A survey on recent advances in LLM-based multi-turn dialogue systems. arXiv preprint arXiv:2402.18013, 2024.
- [54] Dahai Yu, Lin Jiang, Rongchao Xu, and Guang Wang. HealthMamba: An uncertainty-aware spatiotemporal graph state space model for effective and reliable healthcare facility visit prediction, 2026.
- [55] Zhaoyang Zeng, Daniel McDuff, Yale Song, et al. Contrastive learning of global and local video representations. Advances in Neural Information Processing Systems, 34:7025–7040, 2021.
- [56] Tianhao Zhang, Jie Yang, Deli Zhao, and Xinliang Ge. Linear local tangent space alignment and application to face recognition. Neurocomputing, 70(7-9):1547–1553, 2007.
- [57] Yuying Zhao, Yu Wang, Xueqi Cheng, Anne Marie Tumlin, Yunchao Liu, Damin Xia, Meng Jiang, and Tyler Derr. Amplifying your social media presence: Personalized influential content generation with LLMs. arXiv preprint arXiv:2505.01698, 2025.
- [58] Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, et al. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems, 36:46595–46623, 2023.