Scalable GPU graph traversal
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
citing papers explorer
- Scalable Concurrent Queues for GPU
  Introduces three linearizable GPU concurrent queues: an adapted wait-free queue using segments, a bounded lock-free queue with wave-batched paths, and a bounded wait-free queue using 64-bit CAS operations.
- TTP: A Hardware-Efficient Design for Precise Prefetching in Ray Tracing
  TTP is a hardware prefetcher for ray tracing that leverages traversal stack addresses during DFS to prefetch BVH nodes, achieving 1.48x average speedup and 98.92% L1 accuracy in cycle-level simulations.