Strait cuts high-priority deadline violations in ML inference serving by 1-11 percentage points through contention modeling and priority scheduling under high GPU load.
IEEE Transactions on Parallel and Distributed Systems 33(4), 805–817 (2022)
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
background 1
citation-polarity summary
verdicts
UNVERDICTED 2
roles
background 1
polarities
background 1
representative citing papers
MuMFiM is a new open-source two-scale modeling framework achieving 1000x GPU microscale speedup and near-optimal strong/weak scaling to 128 nodes on heterogeneous hardware, demonstrated on a human spine ligament.
citing papers explorer
- Strait: Perceiving Priority and Interference in ML Inference Serving
  Strait cuts high-priority deadline violations in ML inference serving by 1-11 percentage points through contention modeling and priority scheduling under high GPU load.
- A new open source framework for multiscale modeling of fibrous materials on heterogeneous supercomputers
  MuMFiM is a new open-source two-scale modeling framework achieving 1000x GPU microscale speedup and near-optimal strong/weak scaling to 128 nodes on heterogeneous hardware, demonstrated on a human spine ligament.