
arxiv: 2508.02007 · v2 · pith:X643CASJnew · submitted 2025-08-04 · 💻 cs.AR · cs.OS

Revelator: Rapid Data Fetching via System-Software-Guided Hash-based Speculative Address Translation

classification 💻 cs.AR cs.OS
keywords address, revelator, translation, va-to-pa, data, hardware, mappings, memory

Address translation is a major performance bottleneck in modern computing systems. Predicting the physical address (PA) of requested data before address translation completes can hide this latency, but accurate virtual address (VA)-to-PA prediction is difficult because conventional operating systems make VA-to-PA mappings unpredictable. Prior work improves predictability but relies on large pages or VA-to-PA contiguity, or stores speculation metadata in costly hardware structures. We introduce Revelator, a hardware-OS cooperative technique that uses hashing to enable accurate speculative address translation with small system modifications. Revelator employs a tiered hash-based memory allocation policy for both program data and last-level page table entries (PTEs), creating predictable VA-to-PA and VA-to-PTE mappings. After an L2 TLB miss, a lightweight hardware speculation engine uses the OS hash functions to predict these mappings and prefetch the corresponding cache blocks before translation completes, hiding address translation latency and accelerating page table walks (PTWs). Revelator does not rely on large pages or VA-to-PA contiguity and requires only small OS and hardware changes. Across 11 data-intensive workloads, Revelator improves performance by 15.3% on average over the state-of-the-art speculative address translation technique under high memory fragmentation. In virtualized environments, it predicts both guest and host physical addresses, providing a 13.6% average speedup over Nested Paging. In 16-core systems, Revelator achieves 1.40x (1.50x) speedup over Transparent Huge Pages across 30 server workload mixes from Google under medium (high) memory fragmentation. RTL synthesis shows only 0.02% area and 0.03% power overheads on a high-end server-grade CPU. Revelator is freely available at https://github.com/CMU-SAFARI/Virtuoso.
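The tiered hash-based allocation and the matching speculation step described in the abstract can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the paper's implementation: the number of tiers, the multiplicative hash constants, the memory size, and the names `tier_hash`, `ToyAllocator`, and `speculate` are all hypothetical and chosen only for illustration. The idea it shows is that if the OS places a page at a frame derived from a hash of its virtual page number, hardware can recompute the same hashes after a TLB miss and guess the physical address before the page table walk finishes.

```python
from typing import Dict, List, Set

PAGE_SHIFT = 12     # 4 KiB pages
FRAME_BITS = 10     # toy physical memory of 1024 frames (assumption)
NUM_TIERS = 3       # number of hash tiers tried in order (assumption)

# Hypothetical odd multipliers, one per tier; not taken from the paper.
_TIER_CONSTS = (0x9E3779B1, 0x85EBCA6B, 0xC2B2AE35)


def tier_hash(vpn: int, tier: int) -> int:
    """Map a virtual page number to a candidate frame number for one tier."""
    mixed = (vpn * _TIER_CONSTS[tier]) & 0xFFFFFFFF
    return mixed >> (32 - FRAME_BITS)  # keep the top FRAME_BITS bits


class ToyAllocator:
    """OS side: prefer the hashed frames, fall back to any free frame."""

    def __init__(self) -> None:
        self.free: Set[int] = set(range(1 << FRAME_BITS))
        self.page_table: Dict[int, int] = {}

    def allocate(self, vpn: int) -> int:
        for tier in range(NUM_TIERS):
            pfn = tier_hash(vpn, tier)
            if pfn in self.free:
                break
        else:
            # Every hashed frame is taken: the mapping becomes unpredictable.
            if not self.free:
                raise MemoryError("no free frames")
            pfn = min(self.free)
        self.free.remove(pfn)
        self.page_table[vpn] = pfn
        return pfn


def speculate(vaddr: int) -> List[int]:
    """Hardware side: candidate physical addresses to prefetch on a TLB miss."""
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    return [(tier_hash(vpn, t) << PAGE_SHIFT) | offset for t in range(NUM_TIERS)]
```

In this model a speculation is correct whenever the allocator found a free frame in one of the tiers, and it is wrong only for pages that fell through to the fallback path. The real page table still has to be walked to confirm the guess; the sketch models only the prediction, not the prefetch or the verification.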

