arxiv: 2504.08795 · v1 · pith:RNX7N26Znew · submitted 2025-04-08 · 💻 cs.DC

DARIS: An Oversubscribed Spatio-Temporal Scheduler for Real-Time DNN Inference on GPUs

classification 💻 cs.DC
keywords: DARIS, GPUs, tasks, batching, high-priority, improves, low-priority, real-time

The widespread use of Deep Neural Networks (DNNs) is limited by high computational demands, especially in constrained environments. GPUs, though effective accelerators, often face underutilization and rely on coarse-grained scheduling. This paper introduces DARIS, a priority-based real-time DNN scheduler for GPUs, utilizing NVIDIA's MPS and CUDA streaming for spatial sharing, and a synchronization-based staging method for temporal partitioning. In particular, DARIS improves GPU utilization and uniquely analyzes GPU concurrency by oversubscribing computing resources. It also supports zero-delay DNN migration between GPU partitions. Experiments show DARIS improves throughput by 15% and 11.5% over batching and state-of-the-art schedulers, respectively, even without batching. All high-priority tasks meet deadlines, with low-priority tasks having under 2% deadline miss rate. High-priority response times are 33% better than those of low-priority tasks.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Edge-Inference Governors Need Memory-Clock State

    cs.PF 2026-06 unverdicted novelty 5.0

    EMC state is required in latency models for edge inference governors; EMC-blind CPU/GPU fits miss 25-28% deadlines while EMC-aware refits limit misses to 1.3% and identify feasible energy points across vision and LLM ...