RTGPU: Real-Time GPU Scheduling of Hard Deadline Parallel Tasks with Fine-Grain Utilization
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.DC (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
GPUOS: A GPU Operating System Primitive for Transparent Operation Fusion
GPUOS delivers up to 15.3x speedup over standard PyTorch by running a single persistent kernel that receives tasks from a host queue and injects JIT-compiled operators at runtime via NVRTC and device function pointers.
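The summary above describes a persistent loop that pulls tasks from a host queue and dispatches them through a table of operators that can grow at runtime. The following is a minimal CPU-side sketch of that pattern only, not GPUOS code: a Python thread stands in for the persistent kernel, a dictionary stands in for the device function pointer table, and plain lambdas stand in for NVRTC-compiled operators. The names `PersistentWorker`, `register`, `submit`, and `op_table` are hypothetical and chosen for illustration.

```python
import queue
import threading


class PersistentWorker:
    """CPU stand-in for a persistent kernel: one long-lived loop drains a task queue."""

    def __init__(self):
        self.tasks = queue.Queue()   # stands in for the host-to-device task queue
        self.op_table = {}           # stands in for a table of device function pointers
        self.launches = 0            # how many times the loop was started
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        # The loop is started once and then reused for every task.
        self.launches += 1
        self._thread.start()

    def register(self, name, fn):
        # Analogue of injecting a newly compiled operator while the loop is running.
        self.op_table[name] = fn

    def submit(self, name, *args):
        # Enqueue a task and return a one-slot queue that will hold its result.
        result = queue.Queue(maxsize=1)
        self.tasks.put((name, args, result))
        return result

    def stop(self):
        # A None sentinel tells the loop to exit.
        self.tasks.put(None)
        self._thread.join()

    def _run(self):
        while True:
            task = self.tasks.get()
            if task is None:
                break
            name, args, result = task
            # Indirect call through the operator table.
            result.put(self.op_table[name](*args))


worker = PersistentWorker()
worker.start()

worker.register("add", lambda a, b: [x + y for x, y in zip(a, b)])
r1 = worker.submit("add", [1, 2], [3, 4]).get(timeout=5)

# A second operator is registered after the loop has already started.
worker.register("scale", lambda a, k: [x * k for x in a])
r2 = worker.submit("scale", r1, 2).get(timeout=5)

worker.stop()
```

In this sketch both operators run inside the same long-lived loop, so `worker.launches` stays at 1 however many tasks are submitted. That is the property a persistent kernel is meant to provide in place of one kernel launch per operator.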