WebGPU dispatch overhead for batch-1 LLM inference is 24-71 microseconds per operation, depending on the backend, and it dominates performance. A sequential-dispatch measurement method reveals that naive benchmarks overestimate this overhead by about 20 times.
Accessed February 2026
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1). Years: 2026 (1). Verdicts: unverdicted (1).
Representative citing paper:
Characterizing WebGPU Dispatch Overhead for LLM Inference Across Four GPU Vendors, Three Backends, and Three Browsers