A two level neural approach combining off-chip prediction with adaptive prefetch filtering
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
citing papers explorer
- Enhancing Instruction Prefetching via Cache and TLB Management
  IP-CaT jointly optimizes TLB and cache management for L1I prefetching via a translation prefetch buffer and trimodal replacement policy, yielding 8.7% geomean speedup over EPI across 105 server workloads.
- TLX: Hardware-Native, Evolvable MIMW GPU Compiler for Large-scale Production Environments
  TLX is a Triton extension that exposes multi-warp, asynchronous, and cluster-level controls for modern GPUs, delivering competitive performance with low programmer effort and production deployment.
- Affinity Tailor: Dynamic Locality-Aware Scheduling at Scale
  Affinity Tailor improves per-CPU throughput by 12% on chiplet systems and 3% on non-chiplet systems over Linux CFS by using dynamic compact affinity hints derived from online demand estimates.