pith. machine review for the scientific record.

arxiv: 2604.04562 · v1 · submitted 2026-04-06 · 💻 cs.DL · cs.AI

Recognition: 1 theorem link

· Lean Theorem

Paper Espresso: From Paper Overload to Research Insight

Mingzhe Du , Luu Anh Tuan , Dong Huang , See-kiong Ng

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 19:33 UTC · model grok-4.3

classification 💻 cs.DL cs.AI
keywords arXiv analysis · LLM summarization · research trend detection · topic consolidation · AI research dynamics · paper overload · community engagement · reinforcement learning

The pith

An LLM-based platform processes 13,300 arXiv papers over 35 months and extracts trends including a surge in reinforcement learning for reasoning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a system that uses large language models to automatically discover, summarize, and analyze papers from arXiv, producing structured outputs with labels and keywords plus trend views at daily, weekly, and monthly scales. Continuous operation for 35 months yielded metadata from over 13,300 papers that the authors released publicly. Analysis of this data shows a mid-2025 increase in work on reinforcement learning for large language model reasoning, the ongoing appearance of thousands of distinct topics, and higher community upvotes for papers introducing newer topics. The work targets the practical problem of researchers struggling to keep pace with the volume of new publications.

Core claim

Paper Espresso applies large language models to generate summaries, topical labels, and keywords for incoming arXiv papers, then uses those outputs for consolidated trend analysis across multiple time granularities. After running without interruption for 35 months and handling over 13,300 papers, the collected metadata reveals a clear mid-2025 rise in reinforcement learning applied to LLM reasoning, a total of 6,673 unique topics that continue to emerge without saturation, and a measurable link in which papers on the newest topics receive roughly twice the median upvotes of other papers.

What carries the argument

LLM-driven topic consolidation that turns per-paper topical labels into aggregated trends at daily, weekly, and monthly resolutions.
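The consolidation step can be illustrated with a minimal sketch. Everything here is hypothetical — the record layout, topic strings, and bucketing rules are illustrative stand-ins for whatever the pipeline actually emits after the LLM labeling step, and the sketch omits the LLM-driven merging of near-duplicate labels:

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical per-paper records: (submission date, LLM-assigned topic label).
papers = [
    (date(2025, 6, 2), "RL for LLM reasoning"),
    (date(2025, 6, 3), "RL for LLM reasoning"),
    (date(2025, 6, 3), "diffusion models"),
    (date(2025, 7, 1), "RL for LLM reasoning"),
]

def consolidate(papers, granularity):
    """Aggregate per-paper topic labels into counts per time bucket."""
    buckets = defaultdict(Counter)
    for d, topic in papers:
        if granularity == "daily":
            key = d.isoformat()                  # e.g. "2025-06-03"
        elif granularity == "weekly":
            year, week, _ = d.isocalendar()      # ISO year and week number
            key = f"{year}-W{week:02d}"
        else:                                    # monthly
            key = f"{d.year}-{d.month:02d}"
        buckets[key][topic] += 1
    return dict(buckets)

monthly = consolidate(papers, "monthly")
# monthly["2025-06"] counts two RL-for-reasoning papers and one diffusion paper
```

Trend views then reduce to comparing these per-bucket counts across consecutive buckets at each granularity.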

If this is right

  • The public release of all structured metadata allows independent researchers to examine the same dataset for additional patterns.
  • The measured positive link between topic novelty and engagement implies that early adoption of emerging ideas tends to attract greater community attention.
  • The non-saturating count of 6,673 topics indicates that AI research continues to branch into new areas rather than converging on a fixed set.
  • The observed mid-2025 increase in reinforcement learning for reasoning points to a specific shift in research priorities during that period.
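The novelty-engagement bullet can be made concrete with a toy computation. The records and the first-seen-topic flag are invented for illustration; the paper's actual analysis runs over its released metadata:

```python
import statistics

# Hypothetical records: whether a paper's topic first appeared with it,
# and its community upvote count.
papers = [
    {"topic_first_seen": True,  "upvotes": 40},
    {"topic_first_seen": True,  "upvotes": 36},
    {"topic_first_seen": False, "upvotes": 20},
    {"topic_first_seen": False, "upvotes": 18},
    {"topic_first_seen": False, "upvotes": 22},
]

novel = [p["upvotes"] for p in papers if p["topic_first_seen"]]
rest  = [p["upvotes"] for p in papers if not p["topic_first_seen"]]
ratio = statistics.median(novel) / statistics.median(rest)
# ratio == 1.9 on this toy data; the paper reports roughly 2.0x
```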

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Extending the same processing approach to arXiv categories outside computer science could expose comparable dynamics in other fields.
  • Periodic human audits of the LLM outputs would provide a practical way to quantify and correct any model-induced skew in the detected trends.
  • The platform's ability to run continuously suggests it could serve as a live monitoring layer that updates trend views as new papers arrive.
  • If the correlation between novelty and upvotes holds in follow-up data, it could inform how researchers time the release of their work to maximize initial reception.

Load-bearing premise

That the large language models produce summaries, labels, and topic groupings that accurately reflect the actual content and real research trends in the papers without meaningful errors or systematic distortions.

What would settle it

A side-by-side comparison of the system's generated topics and summaries against independent human expert labels on a random sample of the same papers, measuring agreement and any consistent mismatches.
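One standard way to score that comparison is chance-corrected agreement. A minimal sketch, assuming one system label and one expert label per sampled paper (the labels below are invented placeholders):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical sample: system-assigned topic vs. human expert topic.
system = ["rl", "rl", "cv", "nlp", "rl", "cv"]
human  = ["rl", "rl", "cv", "rl",  "rl", "nlp"]
kappa = cohens_kappa(system, human)  # about 0.43 on this toy sample
```

Per-label confusion counts would expose the consistent mismatches the proposal asks about; kappa alone gives the headline agreement number.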

Figures

Figures reproduced from arXiv: 2604.04562 by Dong Huang, Luu Anh Tuan, Mingzhe Du, See-kiong Ng.

Figure 1. Monthly paper volume: arXiv total (red, left axis) vs. …
Figure 2. System architecture of Paper Espresso. The data ingestion layer fetches papers from the Hugging Face Daily Papers API and arXiv. The AI processing layer uses Google Gemini to generate structured summaries and trend analyses. The presentation layer provides an interactive Streamlit interface with multi-granularity browsing.
Figure 3. Bimonthly proportion (%) of the top-10 research topics from May 2023 to March 2026, smoothed with a Gaussian …
Figure 4. Community engagement distribution. …
Figure 6. Co-occurrence heatmap for the top-20 topics. …
Figure 7. Keyword evolution within three major topics. Each line shows the percentage of papers (within that topic) mentioning a …
Figure 8. AI research hype cycle derived from 35 months of topic proportion time series. Topics are classified into five lifecycle …
Figure 9. Novelty vs. engagement. Papers with more novel …
original abstract

The accelerating pace of scientific publishing makes it increasingly difficult for researchers to stay current. We present Paper Espresso, an open-source platform that automatically discovers, summarizes, and analyzes trending arXiv papers. The system uses large language models (LLMs) to generate structured summaries with topical labels and keywords, and provides multi-granularity trend analysis at daily, weekly, and monthly scales through LLM-driven topic consolidation. Over 35 months of continuous deployment, Paper Espresso has processed over 13,300 papers and publicly released all structured metadata, revealing rich dynamics in the AI research landscape: a mid-2025 surge in reinforcement learning for LLM reasoning, non-saturating topic emergence (6,673 unique topics), and a positive correlation between topic novelty and community engagement (2.0x median upvotes for the most novel papers). A live demo is available at https://huggingface.co/spaces/Elfsong/Paper_Espresso.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Paper Espresso, an open-source platform that automatically discovers, summarizes, and analyzes arXiv papers using LLMs for structured summaries, topical labels, keywords, and multi-granularity trend analysis at daily/weekly/monthly scales. Over 35 months of deployment it has processed 13,300 papers, publicly released all structured metadata, and reports three main observations on the AI research landscape: a mid-2025 surge in reinforcement learning for LLM reasoning, non-saturating emergence of 6,673 unique topics, and a positive correlation between topic novelty and community engagement (2.0x median upvotes for the most novel papers). A live demo is provided.

Significance. If the LLM-derived metadata and trends prove reliable, the work supplies a scalable, continuously operating tool for navigating scientific literature overload together with a large public dataset of structured paper metadata that can support downstream studies of research dynamics. The concrete deployment scale and open data release are clear strengths that distinguish the contribution from purely conceptual proposals.

major comments (2)
  1. [Abstract and empirical results] The three headline observations (mid-2025 RL-for-reasoning surge, 6,673 non-saturating topics, and 2.0x novelty-upvote correlation) are extracted directly from LLM-generated summaries, labels, and multi-granularity consolidations. No accuracy metrics, human evaluation, inter-annotator agreement, or error analysis on these outputs are reported, so the patterns could be artifacts of LLM biases rather than genuine landscape dynamics.
  2. [Methods / pipeline description] The multi-granularity topic consolidation step is described at a high level but lacks any validation that the consolidated topics faithfully reflect paper content across scales; without such checks the non-saturation claim and the novelty-engagement correlation rest on untested fidelity.
minor comments (2)
  1. [Demo] The live demo link is given but the manuscript would benefit from a brief description or screenshot of the interface to help readers understand the user-facing output.
  2. Ensure that the open-source repository URL, the data release DOI or persistent link, and the exact versions of the LLMs used are stated explicitly in the text and abstract.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the value of the deployed system and open data release. We agree that the lack of reported validation for the LLM outputs and topic consolidation is a substantive gap that weakens the empirical claims. We will revise the manuscript to address both points directly.

point-by-point responses
  1. Referee: [Abstract and empirical results] The three headline observations (mid-2025 RL-for-reasoning surge, 6,673 non-saturating topics, and 2.0x novelty-upvote correlation) are extracted directly from LLM-generated summaries, labels, and multi-granularity consolidations. No accuracy metrics, human evaluation, inter-annotator agreement, or error analysis on these outputs are reported, so the patterns could be artifacts of LLM biases rather than genuine landscape dynamics.

    Authors: We agree that the manuscript currently lacks any quantitative validation or error analysis of the LLM-generated structured metadata, leaving open the possibility that the reported trends are influenced by model biases. In the revised version we will add a new Validation subsection that reports: (i) manual accuracy assessment on a random sample of 200 papers (precision for topical labels, keyword relevance, and summary faithfulness), (ii) a categorized error analysis of common LLM failure modes observed during development, and (iii) an explicit discussion of how the public release of all 13,300+ metadata records enables independent community verification. These additions will make clear that the headline observations rest on verifiable outputs rather than unexamined LLM artifacts. revision: yes

  2. Referee: [Methods / pipeline description] The multi-granularity topic consolidation step is described at a high level but lacks any validation that the consolidated topics faithfully reflect paper content across scales; without such checks the non-saturation claim and the novelty-engagement correlation rest on untested fidelity.

    Authors: We concur that the multi-granularity consolidation procedure is presented at too high a level and that no fidelity checks are provided, which undermines confidence in the non-saturation and novelty-engagement results. The revised Methods section will expand the description to include the exact consolidation prompts and will add a validation experiment: for three distinct time windows we will compare LLM-consolidated topics against both LDA-derived topics and a small human-annotated reference set, reporting quantitative metrics such as normalized mutual information and topic coherence. These results will be presented alongside the original claims to demonstrate that the consolidation step preserves content fidelity across scales. revision: yes
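The normalized mutual information proposed in this response can be computed directly from label counts. A minimal sketch using the geometric-mean normalization (the topic labels are illustrative placeholders, not outputs of either system):

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings of the same papers."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum((c / n) * math.log((c * n) / (ca[a] * cb[b]))
             for (a, b), c in joint.items())
    ha = -sum((c / n) * math.log(c / n) for c in ca.values())
    hb = -sum((c / n) * math.log(c / n) for c in cb.values())
    return mi / math.sqrt(ha * hb)

# Hypothetical comparison: LLM-consolidated topics vs. LDA topics
# assigned to the same six papers.
llm_topics = ["A", "A", "B", "B", "C", "C"]
lda_topics = ["x", "x", "y", "y", "z", "y"]
score = nmi(llm_topics, lda_topics)  # 1.0 only for identical partitions
```

Libraries such as scikit-learn provide the same quantity with selectable averaging; the hand-rolled version above just makes the computation explicit.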

Circularity Check

0 steps flagged

No circularity: purely observational claims from deployed LLM pipeline

full rationale

The manuscript describes an LLM-based pipeline for ingesting arXiv papers, generating structured summaries/labels/keywords, and performing multi-granularity topic consolidation. It then reports three direct empirical observations (RL surge, 6,673 topics, novelty-upvote correlation) extracted from the 13,300 processed papers. No equations, fitted parameters, model predictions, or self-referential derivations appear anywhere in the text. The reported quantities are literal outputs of the described system run on external data; they are not constructed by re-using fitted values or by self-citation chains. Absence of human validation for the LLM outputs is a separate validity/bias concern, not a circularity reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are introduced; the work relies on standard LLM capabilities for summarization and topic consolidation.

pith-pipeline@v0.9.0 · 5464 in / 1147 out tokens · 63766 ms · 2026-05-10T19:33:06.514661+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
