pith. sign in

arxiv: 2606.00486 · v2 · pith:7SMMFO2Vnew · submitted 2026-05-30 · 💻 cs.AR · cs.PF

Dead on Arrival: Characterizing and Protecting Against Dead-Entry TLB Misses in GPU Microarchitectures

classification 💻 cs.AR cs.PF
keywords workloadsmissesdead-entryevictedimmediatelymechanismmemorypage
0
0 comments X
read the original abstract

GPU workloads with large memory footprints frequently suffer from redundant L2 TLB misses in which a recently evicted translation is immediately re-walked at full page-walk cost. We characterize these dead-entry misses across 24 GPU workloads, finding they account for up to 99% of L2 TLB misses in the most TLB-sensitive applications, yet their performance impact varies widely depending on memory access structure. Workloads where warps share the same virtual page suffer from burst amplification, where a single eviction stalls many warps simultaneously waiting for one translation to return. In contrast, workloads where each warp accesses a distinct set of pages face a capacity-overflow problem that no replacement policy can resolve, a distinction validated by huge page experiments. Building on this two-class taxonomy, we design DEPOT (Dead-Entry PrOTection), a 1 KB Bloom filter mechanism that prevents recently evicted translations from being displaced immediately upon reinstallation, delivering up to 72% IPC improvement on interference-driven workloads with zero overhead on others, and composing with the state-of-the-art TLB prefetching and compaction mechanism, for 2 to 7% additional gain.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.