arxiv: 2510.27176 · v5 · submitted 2025-10-31 · 💻 cs.AI · cs.CL · cs.DC

Glia: A Human-Inspired AI for Automated Systems Design and Optimization

Pith reviewed 2026-05-18 03:24 UTC · model grok-4.3

classification 💻 cs.AI · cs.CL · cs.DC
keywords AI systems design · multi-agent LLMs · automated optimization · LLM inference · GPU clusters · request routing · auto-scaling · interpretable algorithms

The pith

Glia uses a multi-agent LLM setup to design interpretable algorithms for computer systems that match human expert performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Glia, an AI architecture for designing mechanisms in networked systems. It uses large language models in a workflow where agents specialize in reasoning, experimentation, and analysis. The agents collaborate through an evaluation framework that ties abstract ideas to empirical test results. Applied to a distributed GPU cluster serving large language models, Glia produced new algorithms for request routing, scheduling, and auto-scaling. According to the paper, these algorithms perform at human-expert levels, were produced in significantly less time, and yielded novel insights into workload behavior.

Core claim

By organizing large language models into a human-inspired multi-agent system with dedicated roles for reasoning, experimentation, and analysis that interact via an evaluation framework, it is possible to generate creative, high-performing, and interpretable designs for complex systems problems such as managing distributed GPU clusters for LLM inference, achieving performance on par with human experts while requiring significantly less time.

What carries the argument

The multi-agent LLM workflow in which specialized agents for reasoning, experimentation, and analysis collaborate through an evaluation framework to ground abstract reasoning in empirical feedback.

If this is right

  • Glia can produce system designs that are understandable by humans rather than opaque policies.
  • It can discover novel insights into workload behavior during the design process.
  • Such AI assistance could speed up the development of algorithms for request routing, scheduling, and auto-scaling in similar systems.
  • The approach suggests AI can handle creative aspects of systems design traditionally done by experts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If this method generalizes, it could be used to design systems for other domains like network protocols or database optimization.
  • The interpretability of the outputs might enable iterative improvement where humans refine the AI-generated ideas.
  • Combining this with traditional optimization tools could lead to hybrid design processes that leverage both reasoning and numerical search.
  • Success here raises the possibility of AI autonomously handling more of the systems research pipeline beyond just design.

Load-bearing premise

That the structured collaboration of LLM agents through empirical evaluation will reliably lead to creative and high-performing system designs without frequent errors in reasoning or experimentation.

What would settle it

Running Glia on the same GPU cluster task multiple times and checking whether the generated algorithms consistently achieve performance metrics close to or better than published human-expert baselines. The claim would fail if the algorithms fall short of those baselines or cannot be interpreted.
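
One way to make this test concrete is sketched below. It is an illustration, not the paper's protocol: the function name, the 5% tolerance, and the example numbers are all assumptions, and the metric is taken to be lower-is-better (for example, mean request completion time).

```python
from statistics import mean
from typing import Dict, List, Union


def consistency_report(
    run_scores: List[float],
    baseline: float,
    tolerance: float = 0.05,  # hypothetical: allow up to 5% worse than baseline
) -> Dict[str, Union[float, bool]]:
    """Summarize how repeated runs compare with a human-expert baseline.

    Scores are lower-is-better. A run "passes" if it is no more than
    `tolerance` worse than the baseline.
    """
    threshold = baseline * (1.0 + tolerance)
    passed = [score <= threshold for score in run_scores]
    return {
        "mean": mean(run_scores),
        "worst": max(run_scores),
        "pass_rate": sum(passed) / len(passed),
        "all_pass": all(passed),
    }


# Made-up numbers: three independent runs against a baseline of 40.0 s.
report = consistency_report([40.0, 41.0, 43.0], baseline=40.0)
print(report)
```

A single failing run does not refute the claim by itself, but a low pass rate across many runs would.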

Figures

Figures reproduced from arXiv: 2510.27176 by Ali ParandehGheibi, Arash Nasr-Esfahany, Hari Balakrishnan, Joseph Chandler, Kimia Noorbakhsh, Mohammad Alizadeh, Pantea Karimi, Pouya Hamadanian.

Figure 1
Figure 1: Illustrative pipeline of request routing for LLM inference. … patterns in order to best satisfy specified service-level objectives (SLOs). Typical SLOs include the mean time to first token (TTFT), which captures latency; the mean time per output token (TPOT), which is a measure of throughput; and the mean end-to-end request completion time, which reflects overall responsiveness. Request routing, illustrated…
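
The three SLOs named in this excerpt can be computed from per-request timestamps. The sketch below is a minimal illustration, assuming a hypothetical `RequestTrace` record. The field names are not from the paper, and TPOT is computed over the tokens after the first, which is one common convention rather than a definition the source states.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class RequestTrace:
    # Hypothetical record; all times are in seconds.
    arrival: float        # request reaches the router
    first_token: float    # first output token is emitted
    completion: float     # last output token is emitted
    output_tokens: int    # number of generated tokens (>= 1)


def ttft(r: RequestTrace) -> float:
    """Time to first token: delay before any output appears."""
    return r.first_token - r.arrival


def tpot(r: RequestTrace) -> float:
    """Time per output token, averaged over tokens after the first."""
    if r.output_tokens <= 1:
        return 0.0
    return (r.completion - r.first_token) / (r.output_tokens - 1)


def completion_time(r: RequestTrace) -> float:
    """End-to-end request completion time."""
    return r.completion - r.arrival


def summarize(traces: List[RequestTrace]) -> Dict[str, float]:
    """Mean of each SLO metric over a set of requests."""
    return {
        "mean_ttft": mean(ttft(r) for r in traces),
        "mean_tpot": mean(tpot(r) for r in traces),
        "mean_e2e": mean(completion_time(r) for r in traces),
    }


# Made-up traces for illustration only.
traces = [
    RequestTrace(arrival=0.0, first_token=0.5, completion=2.5, output_tokens=5),
    RequestTrace(arrival=1.0, first_token=2.0, completion=4.0, output_tokens=3),
]
print(summarize(traces))
```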
Figure 2
Figure 2: Distribution of mean request completion times for 100 programs generated by directly prompting the LLM. … for generating efficient request routing algorithms. 3.2 Black-box LLM-in-the-loop Search. A more sophisticated approach places LLMs within a black-box search loop. In this setting, one or more LLMs generate or modify code candidates, an evaluator executes each candidate on a benchmark and returns a perfo…
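
The loop described in this excerpt can be sketched in a few lines. Here `propose` is a hypothetical stand-in for the LLM call and `evaluate` for the benchmark; a candidate is reduced to a single number so the example runs without any model. Only the scalar score flows back to the generator, which is what makes the search black-box.

```python
import random
from typing import List, Tuple

# Stand-in for a program. A real loop would carry source code instead.
Candidate = float


def propose(parent: Candidate, rng: random.Random) -> Candidate:
    """Hypothetical stand-in for an LLM that modifies a candidate."""
    return parent + rng.uniform(-1.0, 1.0)


def evaluate(candidate: Candidate) -> float:
    """Hypothetical benchmark score; lower is better, minimum at 3.0."""
    return (candidate - 3.0) ** 2


def black_box_search(
    initial: Candidate, budget: int, seed: int = 0
) -> Tuple[Candidate, float, List[float]]:
    """Greedy generate-and-evaluate loop with a fixed evaluation budget."""
    rng = random.Random(seed)
    best, best_score = initial, evaluate(initial)
    history = [best_score]  # best score seen after each evaluation
    for _ in range(budget):
        child = propose(best, rng)
        score = evaluate(child)  # the generator sees nothing but this number
        if score < best_score:
            best, best_score = child, score
        history.append(best_score)
    return best, best_score, history


best, best_score, history = black_box_search(initial=0.0, budget=50)
print(f"best candidate {best:.3f} with score {best_score:.4f}")
```

Frameworks such as FunSearch and OpenEvolve, which the paper uses as comparison points, are more elaborate than this greedy loop; the sketch only shows the feedback structure.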
Figure 3
Figure 3: Performance of SCG and MCG Glia against other algorithms and baselines. … a new SCG after each previous run completes. Both MCG versions achieve the lowest average RT, outperforming SCG, traditional routing heuristics (Round-Robin, LLQ, LOR), and state-of-the-art LLM-based design frameworks (EoH, FunSearch, OpenEvolve). This multi-context scaling enables Glia to effectively utilize larger simulation budgets …
Figure 4
Figure 4 shows the total GPU×hours saved when applying Glia across different layers of the stack. The Glia-discovered autoscaler alone reduces GPU cost by 13% compared to an off-the-shelf autoscaler, while the full Glia-optimized stack (router, batch scheduler, and autoscaler) cuts total GPU×hours by 40% for this variable workload, compared to standard serving systems (vLLM batch scheduler, LLQ router, and throughp…
Figure 6
Figure 6: Trade-off between tail (P90) Time to First Token (TTFT) and request throughput for the expert-designed algorithm and Glia-designed algorithm. The expert algorithm was tailored to a different problem setup, and struggles in this prefill-heavy workload.
Figure 7
Figure 7: Comparison of Glia variants with baselines and prior methods (lower is better). SCG has the steepest early gains thanks to coherent and continuous white-box reasoning. The two variants of MCG, 4-way parallel (MCG-Par4) and sequential (MCG-Seq), extend the gains and outperform other methods by finding better algorithms more quickly. Shades show 90% confidence intervals.
Figure 8
Figure 8: Comparing Glia variants (lower is better). … simulations) but later slows down, taking longer to match the performance of 4-way Parallel Glia. 6 Conclusion. We are progressing toward our primary goal: developing Glia into an AI capable of PhD-level systems design and optimization for real-world problems. While this paper’s focus is on AI inference (covering both large language models and traditional AI worklo…
Figure 9
Figure 9: Python code for the Head-Room Allocator (HRA) request routing algorithm discovered by Glia. """Head-Room Admission (HRA) global scheduler. This scheduler mitigates vLLM pre-emptions by keeping a small KV-cache head-room on every replica *at admission time*. For each incoming request we pessimistically reserve additional blocks to account for the (unknown) decode phase and admit the request only if the tar…
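
The admission rule in this docstring can be illustrated as follows. This is a sketch under stated assumptions, not the code from Figure 9: the `Replica` class, the block size, the decode reserve, the head-room size, and the tie-break toward the replica with the most free blocks are all hypothetical.

```python
from dataclasses import dataclass
from math import ceil
from typing import List, Optional


@dataclass
class Replica:
    # Hypothetical view of one serving replica's KV-cache, in blocks.
    total_blocks: int
    used_blocks: int = 0

    def free_blocks(self) -> int:
        return self.total_blocks - self.used_blocks


def blocks_needed(
    prompt_tokens: int,
    block_size: int = 16,              # assumed tokens per KV-cache block
    decode_reserve_tokens: int = 256,  # assumed pessimistic decode length
) -> int:
    """Blocks for the prompt plus a pessimistic reserve for decoding."""
    return ceil((prompt_tokens + decode_reserve_tokens) / block_size)


def admit(
    replicas: List[Replica],
    prompt_tokens: int,
    headroom_blocks: int = 10,  # assumed head-room kept free per replica
) -> Optional[int]:
    """Return the index of the chosen replica, or None to keep queueing.

    A replica is eligible only if it would still have `headroom_blocks`
    free after the reservation. Among eligible replicas, pick the one
    with the most free blocks (an assumption made for this sketch).
    """
    need = blocks_needed(prompt_tokens)
    best: Optional[int] = None
    for i, replica in enumerate(replicas):
        if replica.free_blocks() - need < headroom_blocks:
            continue
        if best is None or replica.free_blocks() > replicas[best].free_blocks():
            best = i
    if best is not None:
        replicas[best].used_blocks += need  # reserve at admission time
    return best


replicas = [Replica(total_blocks=100, used_blocks=90), Replica(total_blocks=100, used_blocks=20)]
print(admit(replicas, prompt_tokens=256))
```

Holding a request back when no replica has head-room trades some queueing delay for fewer pre-emptions, which is the effect the docstring attributes to the scheduler.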
Figure 10
Figure 10: Code generated by FunSearch. class CustomGlobalScheduler(BaseGlobalScheduler): # type: ignore[name-defined] """Latency-oriented, eviction-aware global scheduler. Key features ------------- 1. Decode length prediction per *prefill* bucket (small / mid / large) with an online exponential moving average; gives markedly better memory-footprint forecasts than a single global estimate. 2. Looks ahead and keeps …
Figure 11
Figure 11: Prompt for using an LLM as-is for the request-routing problem.
Figure 12
Figure 12: Base prompt for our FunSearch evaluation.
Figure 13
Figure 13: System prompt used for our OpenEvolve evaluation.
Figure 14
Figure 14: The user’s prompt to Glia for the LLM request-routing problem.
Original abstract

Can AI autonomously design mechanisms for computer systems on par with the creativity and reasoning of human experts? We present Glia, an AI architecture for networked systems design that uses large language models (LLMs) in a human-inspired multi-agent workflow. Each agent specializes in reasoning, experimentation, and analysis, collaborating through an evaluation framework that grounds abstract reasoning in empirical feedback. Unlike prior ML-for-systems methods that optimize black-box policies, Glia generates interpretable designs and exposes its reasoning. When applied to a distributed GPU cluster for LLM inference, it produces new algorithms for request routing, scheduling, and auto-scaling that perform at human-expert levels in significantly less time, while yielding novel insights into workload behavior. Our results suggest that by combining reasoning LLMs with structured experimentation, an AI can produce creative and understandable designs for complex systems problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces Glia, a multi-agent LLM architecture for automated design of networked systems. Specialized agents handle reasoning, experimentation, and analysis, collaborating via an evaluation framework that incorporates empirical feedback. Applied to request routing, scheduling, and auto-scaling on a distributed GPU cluster for LLM inference, the system is claimed to generate interpretable algorithms that match human-expert performance in less time while revealing novel workload insights.

Significance. If the performance claims are substantiated with rigorous quantitative evidence, the work would be significant as one of the first demonstrations that structured multi-agent LLM workflows can autonomously produce creative, interpretable system designs competitive with human experts, moving beyond black-box policy optimization.

major comments (2)
  1. Abstract: The claim that the generated algorithms 'perform at human-expert levels' is unsupported by any quantitative metrics, baselines, error bars, or description of the comparison protocol against documented human experts. This is load-bearing for the central result.
  2. Evaluation Framework section: No details are provided on test workload durations, variance reporting, number of replications, or statistical controls used by the experimentation and analysis agents to accept or revise designs. Without these, it is impossible to confirm that empirical feedback, rather than LLM narrative, drives the final outputs.
minor comments (1)
  1. Abstract: The phrase 'in significantly less time' would be clearer with a specific time comparison or factor relative to human design processes.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important areas for strengthening the presentation of our quantitative results and methodological transparency. We address each major comment below and have revised the manuscript to incorporate the suggested improvements.

Point-by-point responses
  1. Referee: Abstract: The claim that the generated algorithms 'perform at human-expert levels' is unsupported by any quantitative metrics, baselines, error bars, or description of the comparison protocol against documented human experts. This is load-bearing for the central result.

    Authors: We agree that the abstract would benefit from more explicit quantitative support for this central claim. The manuscript body reports performance comparisons using metrics such as latency, throughput, and resource efficiency against both standard baselines and human-expert-designed policies, including results from multiple evaluation runs. To address the concern directly, we have revised the abstract to reference these key metrics, note the use of error bars from replications, and briefly describe the comparison protocol (including how human-expert algorithms were sourced and evaluated under identical conditions). This change ensures the claim is better grounded without altering the underlying results. revision: yes

  2. Referee: Evaluation Framework section: No details are provided on test workload durations, variance reporting, number of replications, or statistical controls used by the experimentation and analysis agents to accept or revise designs. Without these, it is impossible to confirm that empirical feedback, rather than LLM narrative, drives the final outputs.

    Authors: We acknowledge that the Evaluation Framework section requires greater specificity on these operational details to demonstrate the role of empirical feedback. We have expanded this section to specify test workload durations, how variance is reported across runs, the number of replications performed for each candidate design, and the statistical controls (such as significance thresholds) applied by the analysis agent when deciding whether to accept, reject, or iterate on a design. These additions clarify the data-driven nature of the workflow. revision: yes
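
The rebuttal does not state the thresholds the analysis agent uses. As a purely illustrative sketch, an accept, reject, or iterate rule over replicated runs might look like the following. The function name, the normal-approximation quantile `z=1.645`, and the three outcome labels are assumptions, and the metric is taken to be lower-is-better.

```python
from math import sqrt
from statistics import mean, variance
from typing import List


def decide(incumbent: List[float], candidate: List[float], z: float = 1.645) -> str:
    """Compare replicated scores of a candidate design against the incumbent.

    Each list needs at least two replications so a sample variance exists.
    `z` is a normal-approximation quantile (assumed here); with very few
    replications a Student-t quantile would be the more careful choice.
    """
    diff = mean(incumbent) - mean(candidate)  # positive: candidate is better
    std_err = sqrt(
        variance(incumbent) / len(incumbent)
        + variance(candidate) / len(candidate)
    )
    if diff > z * std_err:
        return "accept"
    if diff < -z * std_err:
        return "reject"
    return "iterate"  # difference is within noise; gather more evidence


# Made-up replication results (seconds).
print(decide([50.0, 51.0, 49.0, 50.0], [40.0, 41.0, 39.0, 40.0]))
```

A rule of this kind makes the decision depend on measured variance rather than on the agent's narrative, which is the property the referee asks the authors to demonstrate.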

Circularity Check

0 steps flagged

No circularity: empirical workflow is self-contained

Full rationale

The paper describes a multi-agent LLM workflow that generates system designs and validates them through an evaluation framework grounded in empirical measurements on a GPU cluster. No equations, fitted parameters, or uniqueness theorems are invoked that reduce the performance claims to self-definition or prior self-citations. The central result—that generated routing/scheduling/auto-scaling algorithms reach human-expert levels—is presented as an outcome of the experimentation loop rather than a definitional or post-hoc fit. Absent any load-bearing self-citation chain or ansatz smuggled via prior work, the derivation remains independent of its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The architecture rests on the assumption that current LLMs possess sufficient specialized reasoning and experimentation capabilities; no new physical entities or free parameters are introduced in the abstract.

axioms (1)
  • domain assumption Large language models can be effectively specialized into roles for reasoning, experimentation, and analysis that collaborate productively through an evaluation framework.
    The entire Glia architecture depends on this capability of LLMs.

pith-pipeline@v0.9.0 · 5709 in / 1387 out tokens · 35585 ms · 2026-05-18T03:24:29.036727+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?

    cs.AI 2026-05 unverdicted novelty 8.0

    VibeServe demonstrates that AI agents can synthesize bespoke LLM serving systems end-to-end, remaining competitive with vLLM in standard settings while outperforming it in six non-standard scenarios involving unusual ...

  2. IteRate: Autonomous AI Synthesis of In-Kernel eBPF Wi-Fi Rate Control Algorithms

    cs.NI 2026-05 unverdicted novelty 8.0

    An AI-driven closed-loop system autonomously creates in-kernel eBPF Wi-Fi rate controllers that outperform the Minstrel algorithm by 21% in web-page load time and peak throughput on a 58-node testbed.

  3. SemaTune: Semantic-Aware Online OS Tuning with Large Language Models

    cs.OS 2026-05 unverdicted novelty 7.0

    SemaTune uses LLM guidance with semantic context to tune up to 41 Linux OS parameters, delivering 72.5% performance gains over defaults and 153.3% over non-LLM baselines on 13 workloads while avoiding degraded states.

  4. Agent-Aided Design for Dynamic CAD Models

    cs.AI 2026-04 unverdicted novelty 6.0

    AADvark extends agent-aided CAD design to dynamic 3D assemblies with movable parts by integrating constraint solvers and visual feedback to create a verification signal for the agent.

  5. AI-Driven Research for Databases

    cs.DB 2026-04 unverdicted novelty 6.0

    Co-evolving LLM-generated solutions with their evaluators enables discovery of novel database algorithms that outperform state-of-the-art baselines, including a query rewrite policy with up to 6.8x lower latency.

  6. Assistants, Not Architects: The Role of LLMs in Networked Systems Design

    cs.NI 2026-04 unverdicted novelty 5.0

    LLMs fail at architectural reasoning for networked systems, but Kepler uses structured constraints and SMT-based optimization to synthesize feasible designs with explanations.

Reference graph

Works this paper leans on

115 extracted references · 115 canonical work pages · cited by 6 Pith papers · 9 internal anchors

  1. [1]

    Vidur: A large-scale simulation framework for llm inference

    Amey Agrawal, Nitin Kedia, Jayashree Mohan, Ashish Panwar, Nipun Kwatra, Bhargav S Gulavani, Ramachandran Ramjee, and Alexey Tumanov. Vidur: A large-scale simulation framework for llm inference. Proceedings of Machine Learning and Systems, 6:351–366, 2024

  2. [2]

    Gulavani, Alexey Tumanov, and Ramachandran Ramjee

    Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, and Ramachandran Ramjee. T aming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve. InOSDI, 2024

  3. [3]

    RLWS: A Reinforcement Learning based GPU Warp Scheduler

    Jayvant Anantpur, Nagendra Gulur Dwarakanath, Shivaram Kalyanakrishnan, Shalabh Bhatnagar, and R. Govindarajan. RL WS: A Reinforcement Learning based GPU W arp Scheduler.arXiv preprint arXiv:1712.04303, 2017

  4. [4]

    GPU Kernel Scientist: An LLM-driven framework for iterative kernel optimization.arXiv preprint arXiv:2506.20807, 2025

    Martin Andrews and Sam Witteveen. Gpu kernel sci- entist: An llm-driven framework for iterative kernel optimization.arXiv preprint arXiv:2506.20807, 2025

  5. [5]

    An AI system to help scientists write expert-level empirical software

    Eser A ygün, Anastasiya Belyaeva, Gheorghe Comanici, Marc Coram, Hao Cui, Jake Garrison, Renee Johnston Anton Kast, Cory Y McLean, Peter Norgaard, Zahra Shamsi, et al. An ai system to help scientists write expert-level empirical software.arXiv preprint arXiv:2509.06503, 2025

  6. [6]

    Current and future use of large language models for knowledge work, 2025

    Michelle Brachman, Amina El-Ashry, Casey Dugan, and W erner Geyer. Current and future use of large language models for knowledge work, 2025

  7. [7]

    Shin, Jiaqi Zheng, Xin Jin, Xia Zhou, Ben Y

    Jie Chen, Kang G. Shin, Jiaqi Zheng, Xin Jin, Xia Zhou, Ben Y . Zhao, and Haitao Zheng. Auto: Scaling deep reinforcement learning for datacenter-scale traffic optimization. InACM SIGCOMM W orkshop on APNet, 2018

  8. [8]

    Barbarians at the gate: How ai is upending systems research, 2025

    Audrey Cheng, Shu Liu, Melissa Pan, Zhifei Li, Bowen W ang, Alex Krentsel, Tian Xia, Mert Cemri, Jongseok Park, Shuo Y ang, Jeff Chen, Lakshya Agrawal, Aditya Desai, Jiarong Xing, Koushik Sen, Matei Zaharia, and Ion Stoica. Barbarians at the gate: How ai is upending systems research, 2025

  9. [9]

    Training Verifiers to Solve Math Word Problems

    Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias 14 Plappert, Jerry T worek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems.arXiv preprint arXiv:2110.14168, 2021

  10. [10]

    Addressing the ml domain adaptation problem for networking: Realistic and controllable training data generation with netreplica, 2025

    Jaber Daneshamooz, Jessica Nguyen, William Chen, Sanjay Chandrasekaran, Satyandra Guthula, Ankit Gupta, Arpit Gupta, and W alter Willinger. Addressing the ml domain adaptation problem for networking: Realistic and controllable training data generation with netreplica, 2025

  11. [11]

    DeepMind. Advanced version of gemini with deepthink officially achieves gold-medal standard at the international mathematical olympiad.https: //deepmind.google/discover/blog/advanced -version-of-gemini-with-deep-think-offic ially-achieves-gold-medal-standard-at-the -international-mathematical-olympiad/, 2024. Accessed: 2025-10-17

  12. [12]

    PCC vivace: Online-Learning congestion control

    Mo Dong, T ong Meng, Doron Zarchy, Engin Arslan, Y ossi Gilad, Brighten Godfrey, and Michael Schapira. PCC vivace: Online-Learning congestion control. In 15th USENIX Symposium on Networked Systems De- sign and Implementation (NSDI 18), pages 343–356, Renton, W A, April 2018. USENIX Association

  13. [13]

    Brighten Godfrey, and Michael Schapira

    Mo Dong, T ong Meng, Doron Zarchy, Engin Arslan, Y ossi Gilad, P . Brighten Godfrey, and Michael Schapira. PCC Vivace: Online-Learning Congestion Control. InNSDI, pages 343–356, 2018

  14. [14]

    Man-made heuristics are dead

    Rohit Dwivedula, Divyanshu Saxena, Aditya Akella, Swarat Chaudhuri, and Daehyeok Kim. Man-made heuristics are dead. long live code generators!arXiv preprint arXiv:2510.08803, 2025

  15. [15]

    Codemonkeys: Scaling test-time compute for software engineering, 2025

    Ryan Ehrlich, Bradley Brown, Jordan Juravsky, Ronald Clark, Christopher Ré, and Azalia Mirhoseini. Codemonkeys: Scaling test-time compute for software engineering, 2025

  16. [16]

    Towards an AI co-scientist

    Juraj Gottweis, W ei-Hung W eng, Alexander Daryin, T ao Tu, Anil Palepu, Petar Sirkovic, Artiom Myaskovsky, Felix W eissenberger, Keran Rong, Ryutaro T anno, et al. T owards an ai co-scientist. arXiv preprint arXiv:2502.18864, 2025

  17. [17]

    Principles of good design

    Harvard Extension School. Principles of good design. https://cscie2x.dce.harvard.edu/hw/ch01s 06.html. Accessed: 2025-10-17

  18. [18]

    Zhiyuan He, Aashish Gottipati, Lili Qiu, Xufang Luo, Kenuo Xu, Y uqing Y ang, and Francis Y . Y an. Designing Network Algorithms via Large Language Models. InHotNets, page 205–212, New Y ork, NY , USA, 2024. Association for Computing Machinery

  19. [19]

    Zhiyuan He, Aashish Gottipati, Lili Qiu, Y uqing Y ang, and Francis Y . Y an. Congestion control system optimization with large language models, 2025

  20. [20]

    Calm: Co-evolution of algorithms and language model for automatic heuristic design

    Ziyao Huang, W eiwei Wu, Kui Wu, Jianping W ang, and W ei-Bin Lee. Calm: Co-evolution of algorithms and language model for automatic heuristic design. arXiv preprint arXiv:2505.12285, 2025

  21. [21]

    Rotman, P

    Nathan Jay, Noga H. Rotman, P . Brighten Godfrey, Michael Schapira, and A viv T amar. Internet conges- tion control via deep reinforcement learning, 2019

  22. [22]

    Brighten Godfrey, and Michael Schapira

    Nathan Jay, Y air Rotman, P . Brighten Godfrey, and Michael Schapira. An End-to-End Deep Reinforcement Learning Framework for Internet Congestion Control. InICML, 2019

  23. [23]

    T owards safer heuristics with xplain

    Pantea Karimi, Solal Pirelli, Siva Kesava Reddy Kakarla, Ryan Beckett, Santiago Segarra, Beibin Li, Pooria Namyar, and Behnaz Arzani. T owards safer heuristics with xplain. InProceedings of the 23rd ACM W orkshop on Hot T opics in Networks, pages 68–76, 2024

  24. [24]

    Robust heuristic algorithm design with llms, 2025

    Pantea Karimi, Dany Rouhana, Pooria Namyar, Siva Kesava Reddy Kakarla, V enkat Arun, and Behnaz Arzani. Robust heuristic algorithm design with llms, 2025

  25. [25]

    Adaptive neural signal detection for massive mimo.IEEE Transactions on Wireless Communications, 19(8):5635–5648, 2020

    Mehrdad Khani, Mohammad Alizadeh, Jakob Hoydis, and Phil Fleming. Adaptive neural signal detection for massive mimo.IEEE Transactions on Wireless Communications, 19(8):5635–5648, 2020

  26. [26]

    Efficient Memory Management for Large Language Model Serving with PagedAttention

    W oosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Y u, Joseph Gonzalez, Hao Zhang, and Ion Stoica. Efficient Memory Management for Large Language Model Serving with PagedAttention. InSOSP, SOSP ’23, page 611–626, New Y ork, NY , USA, 2023. Association for Computing Machinery

  27. [27]

    ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution

    Robert Tjarko Lange, Y uki Imajuku, and Edoardo Cetin. Shinkaevolve: T owards open-ended and 15 sample-efficient program evolution.arXiv preprint arXiv:2509.19349, 2025

  28. [28]

    Heller, David Schuurmans, Geoffrey J

    Nikolay Lazic, Craig Boutilier, Thomas Lu, Eric W ong, Binz Roy, Marcin Minka, Ben J. Heller, David Schuurmans, Geoffrey J. Gordon, Olivier Duchesnay, Marc L. Bellemare, Albin Cassirer, et al. Data center cooling using model-predictive control. InAdvances in Neural Information Processing Systems (NeurIPS) W orkshop, 2018. Describes learning-assisted contr...

  29. [29]

    Llm inference serving: Survey of recent advances and opportunities, 2024

    Baolin Li, Y ankai Jiang, Vijay Gadepally, and Devesh Tiwari. Llm inference serving: Survey of recent advances and opportunities, 2024

  30. [30]

    Reparo: Loss-resilient generative codec for video conferencing.arXiv preprint arXiv:2305.14135, 2023

    Tianhong Li, Vibhaalakshmi Sivaraman, Pantea Karimi, Lijie Fan, Mohammad Alizadeh, and Dina Katabi. Reparo: Loss-resilient generative codec for video conferencing.arXiv preprint arXiv:2305.14135, 2023

  31. [31]

    Mankowitz, Esme Sutherland Robson, Pushmeet Kohli, Nando de Freitas, Koray Kavukcuoglu, and Oriol Vinyals

    Y ujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, Rémi Leblond, T om Eccles, James Keeling, Felix Gimeno, Agustin Dal Lago, Thomas Hubert, Peter Choy, Cyprien de Masson d’Autume, Igor Babuschkin, Xinyun Chen, Po-Sen Huang, Johannes W elbl, Sven Gowal, Alexey Cherepanov, James Molloy, Daniel J. Mankowitz, Esme Sutherland Robson, Pu...

  32. [32]

    Neu- rocuts: Neural decision trees for packet classification

    Eric Liang, Hang Zhu, Xin Jin, and Ion Stoica. Neu- rocuts: Neural decision trees for packet classification. InSIGCOMM, pages 1–15, 2019

  33. [33]

    Evolution of heuristics: T owards efficient automatic algorithm design using large language model

    Fei Liu, Xialiang T ong, Mingxuan Y uan, Xi Lin, Fu Luo, Zhenkun W ang, Zhichao Lu, and Qingfu Zhang. Evolution of heuristics: T owards efficient automatic algorithm design using large language model. InICML, ICML ’24. JMLR.org, 2024

  34. [34]

    arXiv preprint arXiv:2504.19636 (2025)

    Fei Liu, Qingfu Zhang, Jialong Shi, Xialiang T ong, Kun Mao, and Mingxuan Y uan. Fitness landscape of large language model-assisted automated algorithm search.arXiv preprint arXiv:2504.19636, 2025

  35. [35]

    Fine-tuning Large Language Model for Automated Algorithm Design

    Fei Liu, Rui Zhang, Xi Lin, Zhichao Lu, and Qingfu Zhang. Fine-tuning large language model for automated algorithm design.arXiv preprint arXiv:2507.10614, 2025

  36. [36]

    Llm4ad: A platform for algorithm design with large language model

    Fei Liu, Rui Zhang, Zhuoliang Xie, Rui Sun, Kai Li, Xi Lin, Zhenkun W ang, Zhichao Lu, and Qingfu Zhang. Llm4ad: A platform for algorithm design with large language model.arXiv preprint arXiv:2412.17287, 2024

  37. [37]

    CoRR , volume =

    Gang Liu, Yihan Zhu, Jie Chen, and Meng Jiang. Scientific algorithm discovery by augmenting alphaevolve with deep research.arXiv preprint arXiv:2510.06056, 2025

  38. [38]

    Liu, Kevin Lin, John Hewitt, Ashwin Paran- jape, Michele Bevilacqua, Fabio Petroni, and Percy Liang

    Nelson F . Liu, Kevin Lin, John Hewitt, Ashwin Paran- jape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. Lost in the middle: How language models use long contexts.Transactions of the Association for Computational Linguistics, 12:157–173, 2024

  39. [39]

    Alphago moment for model architecture discovery.arXiv preprint arXiv:2507.18074, 2025

    Yixiu Liu, Y ang Nan, W eixian Xu, Xiangkun Hu, Lyumanshan Y e, Zhen Qin, and Pengfei Liu. Alphago moment for model architecture discovery.arXiv preprint arXiv:2507.18074, 2025

  40. [40]

    GitHub - llm-d/llm-d: llm-d en- ables high-performance distributed LLM inference on Kubernetes.https://github.com/llm-d/llm-d,

    llm-d Community. GitHub - llm-d/llm-d: llm-d en- ables high-performance distributed LLM inference on Kubernetes.https://github.com/llm-d/llm-d,

  41. [42]

    MetaMuse: Algorithm Generation via Creative Ideation

    Ruiying Ma, Chieh-Jan Mike Liang, Y anjie Gao, and Francis Y Y an. Algorithm generation via creative ideation.arXiv preprint arXiv:2510.03851, 2025

  42. [43]

    Resource management with deep reinforcement learning

    Hongzi Mao, Mohammad Alizadeh, Ishai Menache, and Srikanth Kandula. Resource management with deep reinforcement learning. InHotNets, pages 50–56, 2016

  43. [44]

    Resource management with deep reinforcement learning

    Hongzi Mao, Mohammad Alizadeh, Ishai Menache, and Srikanth Kandula. Resource management with deep reinforcement learning. InHotNets, 2016

  44. [45]

    Real-world video adaptation with reinforcement learning, 2020

    Hongzi Mao, Shannon Chen, Drew Dimmery, Shaun Singh, Drew Blaisdell, Y uandong Tian, Mohammad Alizadeh, and Eytan Bakshy. Real-world video adaptation with reinforcement learning, 2020

  45. [46]

    Neural adaptive video streaming with pensieve

    Hongzi Mao, Ravi Netravali, and Mohammad Alizadeh. Neural adaptive video streaming with pensieve. InSIGCOMM, pages 197–210, 2017. 16

  46. [47]

    Learning scheduling algorithms for data pro- cessing clusters

    Hongzi Mao, Malte Schwarzkopf, Shaileshh Bojja V enkatakrishnan, Zili Meng, and Mohammad Al- izadeh. Learning scheduling algorithms for data pro- cessing clusters. InSIGCOMM, pages 270–288. 2019

  47. [48]

    Learning scheduling algorithms for data pro- cessing clusters

    Hongzi Mao, Malte Schwarzkopf, Shaileshh Bojja V enkatakrishnan, Zili Meng, and Mohammad Al- izadeh. Learning scheduling algorithms for data pro- cessing clusters. InSIGCOMM, pages 270–288, 2019

  48. [49]

    Bao: Making learned query optimization practical

    Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime T atbul, Mohammad Alizadeh, and Tim Kraska. Bao: Making learned query optimization practical. In SIGMOD, pages 1275–1288, 2021

  49. [50]

    Neo: A learned query optimizer

    Ryan Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, and Nesime Tatbul. Neo: A learned query optimizer. Proc. VLDB Endow., 12(11):1705–1718, July 2019

  50. [51]

    Interpreting deep learning-based networking systems

    Zili Meng, Minhu Wang, Jiasong Bai, Mingwei Xu, Hongzi Mao, and Hongxin Hu. Interpreting deep learning-based networking systems. In Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, SIGCOMM ’20, page 154–171, New York, N...

  51. [52]

    Study finds ChatGPT boosts worker productivity in writing tasks

    MIT News Office. Study finds ChatGPT boosts worker productivity in writing tasks. MIT News, 2023. Accessed: 2025-10-17

  52. [53]

    Reinforced generation of combinatorial structures: Hardness of approximation

    Ansh Nagda, Prabhakar Raghavan, and Abhradeep Thakurta. Reinforced generation of combinatorial structures: Applications to complexity theory. arXiv preprint arXiv:2509.18057, 2025

  53. [55]

    AlphaEvolve: A coding agent for scientific and algorithmic discovery

    Alexander Novikov, Ngân Vu, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco JR Ruiz, Abbas Mehrabian, et al. AlphaEvolve: A coding agent for scientific and algorithmic discovery, 2025. URL: https://arxiv.org/abs/2506.13131

  54. [56]

    GitHub - ai-dynamo/dynamo: A Datacenter Scale Distributed Inference Serving Framework

    NVIDIA. GitHub - ai-dynamo/dynamo: A Datacenter Scale Distributed Inference Serving Framework. https://github.com/ai-dynamo/dynamo, 2025. [Accessed 10-10-2025]

  55. [57]

    OpenAI o3 and o4-mini System Card

    OpenAI. OpenAI o3 and o4-mini System Card. Technical report, OpenAI, April 2025

  56. [58]

    Splitwise: Efficient generative llm inference using phase splitting

    Pratyush Patel, Esha Choukse, Chaojie Zhang, Aashaka Shah, Íñigo Goiri, Saeed Maleki, and Ricardo Bianchini. Splitwise: Efficient generative llm inference using phase splitting. In 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pages 118–132, 2024

  57. [59]

    Algotune: Can language models speed up general-purpose numerical programs?

    Ori Press, Brandon Amos, Haoyu Zhao, Yikai Wu, Samuel K Ainsworth, Dominik Krupke, Patrick Kidger, Touqir Sajed, Bartolomeo Stellato, Jisun Park, et al. Algotune: Can language models speed up general-purpose numerical programs? arXiv preprint arXiv:2507.15887, 2025

  58. [60]

    Effective context engineering for AI agents

    Prithvi Rajasekaran, Ethan Dixon, Carly Ryan, and Jeremy Hadfield. Effective context engineering for AI agents, September 2025. With contributions from Rafi Ayub, Hannah Moran, Cal Rueb, and Connor Jennings. Published online September 29, 2025

  59. [61]

    Mathematical discoveries from program search with large language models

    Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M Pawan Kumar, Emilien Dupont, Francisco JR Ruiz, Jordan S Ellenberg, Pengming Wang, Omar Fawzi, et al. Mathematical discoveries from program search with large language models. Nature, 625(7995):468–475, 2024

  60. [62]

    Iroko: A framework to prototype reinforcement learning for data center traffic control

    Fabian Ruffy, Michael Przystupa, and Ivan Beschastnikh. Iroko: A framework to prototype reinforcement learning for data center traffic control. arXiv preprint arXiv:1812.09975, 2018

  61. [63]

    DeepConfig: Automating Data Center Network Topologies Management with Machine Learning

    Saim Salman, Christopher Streiffer, Huan Chen, Theophilus Benson, and Asim Kadav. Deepconf: Automating data center network topologies and routing with deep reinforcement learning. arXiv preprint arXiv:1712.03890, 2018

  62. [64]

    Scaling distributed machine learning with In-Network aggregation

    Amedeo Sapio, Marco Canini, Chen-Yu Ho, Jacob Nelson, Panos Kalnis, Changhoon Kim, Arvind Krishnamurthy, Masoud Moshref, Dan Ports, and Peter Richtarik. Scaling distributed machine learning with In-Network aggregation. In 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21), pages 785–808. USENIX Association, April 2021

  63. [65]

    ShareGPT Datasets at Hugging Face

    ShareGPT Datasets at Hugging Face. https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered,

  64. [66]

    [Accessed 10-10-2025]

  65. [67]

    OpenEvolve: an open-source evolutionary coding agent

    Asankhaya Sharma. OpenEvolve: an open-source evolutionary coding agent, 2025

  66. [68]

    Automated high-level code optimization for warehouse performance

    Alexander Shypula, Aman Madaan, Yimeng Zeng, Uri Alon, Jacob Gardner, Milad Hashemi, Graham Neubig, Parthasarathy Ranganathan, Osbert Bastani, and Amir Yazdanbakhsh. Automated high-level code optimization for warehouse performance. IEEE Micro, 2025

  67. [69]

    Operating System Concepts

    Abraham Silberschatz, Peter B. Galvin, and Greg Gagne. Operating System Concepts. Wiley Publishing, 10th edition, 2018

  68. [70]

    Gemino: Practical and robust neural compression for video conferencing

    Vibhaalakshmi Sivaraman, Pantea Karimi, Vedantha Venkatapathy, Mehrdad Khani, Sadjad Fouladi, Mohammad Alizadeh, Frédo Durand, and Vivienne Sze. Gemino: Practical and robust neural compression for video conferencing. In NSDI, 2024

  69. [71]

    Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

    Charlie Snell, Jaehoon Lee, Kelvin Xu, and Aviral Kumar. Scaling llm test-time compute optimally can be more effective than scaling model parameters. arXiv preprint arXiv:2408.03314, 2024

  70. [72]

    Automatically discovering heuristics in a complex sat solver with large language models

    Yiwen Sun, Furong Ye, Zhihan Chen, Ke Wei, and Shaowei Cai. Automatically discovering heuristics in a complex sat solver with large language models. arXiv preprint arXiv:2507.22876, 2025

  71. [73]

    Prompt-aware scheduling for low-latency llm serving

    Yiheng Tao, Yihe Zhang, Matthew T. Dearing, Xin Wang, Yuping Fan, and Zhiling Lan. Prompt-aware scheduling for low-latency llm serving, 2025

  72. [74]

    Aibrix: Towards scalable, cost-effective large language model inference infrastructure

    The AIBrix Team, Jiaxin Shan, Varun Gupta, Le Xu, Haiyang Shi, Jingyuan Zhang, Ning Wang, Linhui Xu, Rong Kang, Tongping Liu, Yifei Zhang, Yiqing Zhu, Shuowei Jin, Gangmuk Lim, Binbin Chen, Zuzhi Chen, Xiao Liu, Xin Chen, Kante Yin, Chak-Pong Chung, Chenyu Jiang, Yicheng Lu, Jianjun Chen, Caixue Lin, Wu Xiang, Rui Shi, and Liguang Xie. Aibrix: Towards...

  73. [75]

    Driving cache replacement with ml-based lecar

    Giuseppe Vietri, Liana V. Rodriguez, Wendy A. Martinez, Steven Lyons, Jason Liu, Raju Rangaswami, Ming Zhao, and Giri Narasimhan. Driving cache replacement with ml-based lecar. In USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage), 2018

  74. [76]

    vllm production stack: reference stack for production vllm deployment

    vllm-project. vllm production stack: reference stack for production vllm deployment. https://github.com/vllm-project/production-stack, 2025

  75. [77]

    Improving parallel program performance with llm optimizers via agent-system interfaces

    Anjiang Wei, Allen Nie, Thiago SFX Teixeira, Rohan Yadav, Wonchan Lee, Ke Wang, and Alex Aiken. Improving parallel program performance with llm optimizers via agent-system interfaces. arXiv preprint arXiv:2410.15625, 2024

  76. [78]

    Astra: A multi-agent system for GPU kernel performance optimization

    Anjiang Wei, Tianran Sun, Yogesh Seenichamy, Hang Song, Anne Ouyang, Azalia Mirhoseini, Ke Wang, and Alex Aiken. Astra: A multi-agent system for gpu kernel performance optimization. arXiv preprint arXiv:2509.07506, 2025

  77. [79]

    Problems in the design of systems

    David Wheeler. Problems in the design of systems. https://www.doc.ic.ac.uk/~dcw/PSD/article13/. Accessed: 2025-10-17

  78. [80]

    Edsger W. Dijkstra – Wikiquote

    Wikiquote contributors. Edsger W. Dijkstra – Wikiquote. https://en.wikiquote.org/wiki/Edsger_W._Dijkstra#:~:text=native%20tongue%20is%20the%20most,asset%20of%20a%20competent%20programmer, 2025. Accessed: 2025-10-17

  79. [81]

    TCP ex Machina: Computer-Generated Congestion Control

    Keith Winstein and Hari Balakrishnan. TCP ex Machina: Computer-Generated Congestion Control. In SIGCOMM, pages 123–134, 2013

  80. [82]

    Sr-scientist: Scientific equation discovery with agentic ai

    Shijie Xia, Yuhan Sun, and Pengfei Liu. Sr-scientist: Scientific equation discovery with agentic ai. arXiv preprint arXiv:2510.11661, 2025

Showing first 80 references.