RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing

arXiv: 1912.12953 · v1 · submitted 2019-12-30 · cs.DC · cs.AR

Liu Ke, Udit Gupta, Carole-Jean Wu, Benjamin Youngjae Cho, Mark Hempstead, Brandon Reagen, Xuan Zhang, David Brooks, Vikas Chandra, Utku Diril, Amin Firoozshahian, Kim Hazelwood, Bill Jia, Hsien-Hsin S. Lee, Meng Li, Bert Maher, Dheevatsa Mudigere, Maxim Naumov, Martin Schatz, Mikhail Smelyanskiy, Xiaodong Wang

Keywords: recommendation, memory, RecNMP, embedding, models, personalized, accelerate, inference

Abstract

Personalized recommendation systems leverage deep learning models and account for the majority of data center AI cycles. Their performance is dominated by memory-bound sparse embedding operations, whose unique, irregular memory access patterns make them fundamentally challenging to accelerate. This paper proposes a lightweight, commodity-DRAM-compliant near-memory processing solution to accelerate personalized recommendation inference. An in-depth characterization of production-grade recommendation models shows that embedding operations with high model-, operator-, and data-level parallelism saturate memory bandwidth, limiting recommendation inference performance. We propose RecNMP, which provides a scalable solution for improving system throughput and supports a broad range of sparse embedding models. RecNMP is specifically tailored to production environments with heavy co-location of operators on a single server. We also study several hardware/software co-optimization techniques, such as memory-side caching, table-aware packet scheduling, and hot-entry profiling, which yield up to a 9.8x memory latency speedup over a highly optimized baseline. Overall, RecNMP offers a 4.2x throughput improvement and 45.8% memory energy savings.
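
The abstract attributes the bottleneck to sparse embedding operations. As a minimal sketch, assuming a sum-pooling lookup in the style of Caffe2's SparseLengthsSum operator (the paper's own hardware kernels are not reproduced here), the Python below shows why such operations are memory-bound: every pooled output requires gathering several data-dependent table rows, while the arithmetic per row is only an element-wise add. The helpers `sparse_lengths_sum`, `hot_entries`, and `traffic_bytes` are hypothetical names for illustration. `hot_entries` is a plain frequency count standing in for the hot-entry profiling the abstract mentions, not the authors' profiler, and `traffic_bytes` assumes fp32 rows.

```python
from collections import Counter


def sparse_lengths_sum(table, indices, lengths):
    """Sum-pool embedding rows per sample (SparseLengthsSum-style sketch).

    table:   list of embedding rows, each a list of floats
    indices: flat list of row ids for every sample in the batch
    lengths: number of ids belonging to each sample, in order
    """
    if sum(lengths) != len(indices):
        raise ValueError("lengths must sum to len(indices)")
    dim = len(table[0])
    pooled = []
    pos = 0
    for n in lengths:
        acc = [0.0] * dim
        for idx in indices[pos:pos + n]:
            row = table[idx]  # irregular, data-dependent gather from memory
            for d in range(dim):
                acc[d] += row[d]  # one add per element: little compute per byte
        pooled.append(acc)
        pos += n
    return pooled


def hot_entries(indices, k):
    """Hypothetical profiling helper: the k most frequently accessed row ids."""
    return [idx for idx, _ in Counter(indices).most_common(k)]


def traffic_bytes(lengths, dim, bytes_per_elem=4):
    """Bytes gathered from the table vs. bytes of pooled output (fp32 assumed)."""
    gathered = sum(lengths) * dim * bytes_per_elem
    returned = len(lengths) * dim * bytes_per_elem
    return gathered, returned


if __name__ == "__main__":
    table = [[float(r), float(10 * r)] for r in range(6)]  # 6 rows, dim 2
    indices = [0, 2, 2, 5, 1, 2]
    lengths = [3, 1, 2]  # sample 0 -> rows 0,2,2; sample 1 -> 5; sample 2 -> 1,2
    print(sparse_lengths_sum(table, indices, lengths))  # [[4.0, 40.0], [5.0, 50.0], [3.0, 30.0]]
    print(hot_entries(indices, 1))                      # [2]
    print(traffic_bytes(lengths, dim=2))                # (48, 24)
```

The gap between gathered and returned bytes grows with the number of lookups per sample, which is the general intuition behind pooling near memory and sending back only the reduced vectors; the toy sizes here are illustrative, not measurements from the paper.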
