Parallax: Efficient LLM Inference Service over Decentralized Environment
4 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.DC (4)
years: 2026 (4)
verdicts: UNVERDICTED (4)

representative citing papers
- Autopoiesis: A Self-Evolving System Paradigm for LLM Serving Under Runtime Dynamics
  Autopoiesis uses LLM-driven program synthesis to evolve serving policies online during deployment, delivering gains of up to 53% (34% on average) over prior LLM serving systems under runtime dynamics.
- HexiSeq: Accommodating Long Context Training of LLMs over Heterogeneous Hardware
  HexiSeq optimizes sequence and head partitioning across mixed GPUs to improve long-context LLM training throughput by up to 1.72x in simulations.
- Distributed Generative Inference of LLM at Internet Scales with Multi-Dimensional Communication Optimization
  BloomBee is a distributed LLM inference system that achieves up to 1.76x higher throughput and 43.2% lower latency than prior decentralized systems by optimizing communication across multiple dimensions in low-bandwidth internet settings.
- Cloud-native and Distributed Systems for Efficient and Scalable Large Language Models -- A Research Agenda
  This research agenda argues that cloud-native architectures, microservices, autoscaling, and emerging trends such as serverless inference and federated learning are required to make large language models efficient and scalable.