Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads

Abhishek Singh; Amey Agrawal; Atul Katiyar; Bhargav Gulavani; Chen Chen; Cheng Xu; Dharma Shukla; Eddie Ailijiang; Hasibur Rahman; Karthik Elangovan

arxiv: 2202.07848 · v2 · pith:QVNGSZMFnew · submitted 2022-02-16 · 💻 cs.DC · cs.AI


Dharma Shukla, Muthian Sivathanu, Srinidhi Viswanatha, Bhargav Gulavani, Rimma Nehme, Amey Agrawal, Chen Chen, Nipun Kwatra,


Ramachandran Ramjee, Pankaj Sharma, Atul Katiyar, Vipul Modi, Vaibhav Sharma, Abhishek Singh, Shreshth Singhal, Kaustubh Welankar, Lu Xun, Ravi Anupindi, Karthik Elangovan, Hasibur Rahman, Zhou Lin, Rahul Seetharaman, Cheng Xu, Eddie Ailijiang, Suresh Krishnappa, Mark Russinovich (Microsoft)



keywords singularity, workloads, deep, learning, accelerators, across, approach, data


Abstract

Lowering costs by driving high utilization across deep learning workloads is a crucial lever for cloud providers. We present Singularity, Microsoft's globally distributed scheduling service for highly-efficient and reliable execution of deep learning training and inference workloads. At the heart of Singularity is a novel, workload-aware scheduler that can transparently preempt and elastically scale deep learning workloads to drive high utilization without impacting their correctness or performance, across a global fleet of AI accelerators (e.g., GPUs, FPGAs). All jobs in Singularity are preemptable, migratable, and dynamically resizable (elastic) by default: a live job can be dynamically and transparently (a) preempted and migrated to a different set of nodes, cluster, data center or a region and resumed exactly from the point where the execution was preempted, and (b) resized (i.e., elastically scaled-up/down) on a varying set of accelerators of a given type. Our mechanisms are transparent in that they do not require the user to make any changes to their code or require using any custom libraries that may limit flexibility. Additionally, our approach significantly improves the reliability of deep learning workloads. We show that the resulting efficiency and reliability gains with Singularity are achieved with negligible impact on the steady-state performance. Finally, our design approach is agnostic of DNN architectures and handles a variety of parallelism strategies (e.g., data/pipeline/model parallelism).



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Power-Flexible AI Data Centers: A New Paradigm for Grid-Responsive Compute
cs.DC 2026-06 unverdicted novelty 5.0

AI data centers can act as flexible grid resources via software-based power control, with experiments on a 130 kW GPU cluster showing rapid load reduction, curtailment, and distributed shifting.

FlexNPU: Transparent NPU Virtualization for Dynamic LLM Prefill-Decode Co-location
cs.DC 2026-06 unverdicted novelty 5.0

FlexNPU is a transparent virtualization system for Ascend NPUs that supports dynamic prefill-decode co-location in LLM serving and reports throughput gains plus large TTFT reductions versus static baselines.