Deep Reinforcement Learning based Recommendation with Explicit User-Item Interactions Modeling

Feng Liu; Haokun Chen; Huifeng Guo; Ruiming Tang; Weinan Zhang; Xutao Li; Yunming Ye; Yuzhou Zhang

arxiv: 1810.12027 · v3 · pith:W6D55MY4new · submitted 2018-10-29 · 💻 cs.IR

Feng Liu, Ruiming Tang, Xutao Li, Weinan Zhang, Yunming Ye, Haokun Chen, Huifeng Guo, Yuzhou Zhang

classification 💻 cs.IR

keywords: recommendation, interactions, learning, reinforcement, users, deep, dynamic, factorization

Recommendation is crucial in both academia and industry, and various techniques are proposed such as content-based collaborative filtering, matrix factorization, logistic regression, factorization machines, neural networks and multi-armed bandits. However, most of the previous studies suffer from two limitations: (1) considering the recommendation as a static procedure and ignoring the dynamic interactive nature between users and the recommender systems, (2) focusing on the immediate feedback of recommended items and neglecting the long-term rewards. To address the two limitations, in this paper we propose a novel recommendation framework based on deep reinforcement learning, called DRR. The DRR framework treats recommendation as a sequential decision making procedure and adopts an "Actor-Critic" reinforcement learning scheme to model the interactions between the users and recommender systems, which can consider both the dynamic adaptation and long-term rewards. Furthermore, a state representation module is incorporated into DRR, which can explicitly capture the interactions between items and users. Three instantiation structures are developed. Extensive experiments on four real-world datasets are conducted under both the offline and online evaluation settings. The experimental results demonstrate the proposed DRR method indeed outperforms the state-of-the-art competitors.
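The data flow described in the abstract (a state representation feeding an Actor that proposes an action and a Critic that scores it) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the element-wise user-item product used as the "explicit interaction", the linear actor and critic, the random untrained weights, the inner-product ranking of candidates, and all function and variable names. No training loop is shown.

```python
import random


def build_state(user_vec, item_vecs):
    """Concatenate the user embedding with user-item element-wise products.

    The element-wise product is an assumed form of explicit interaction.
    """
    state = list(user_vec)
    for item_vec in item_vecs:
        state.extend(u * i for u, i in zip(user_vec, item_vec))
    return state


def actor(state, weights):
    """Linear actor (hypothetical): map the state to an action vector."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def critic(state, action, weights):
    """Linear critic (hypothetical): estimate Q(state, action) as a scalar."""
    return sum(w * x for w, x in zip(weights, state + action))


def recommend(action, candidates):
    """Return the candidate id whose embedding best matches the action."""
    def score(item_id):
        return sum(a * e for a, e in zip(action, candidates[item_id]))
    return max(candidates, key=score)


rng = random.Random(0)  # fixed seed so the untrained weights are reproducible
dim = 2
user = [0.5, -1.0]
history = [[1.0, 0.0], [0.0, 1.0]]  # embeddings of recently consumed items
candidates = {"item_a": [0.1, 0.9], "item_b": [0.8, 0.0]}

state = build_state(user, history)
actor_w = [[rng.uniform(-1, 1) for _ in state] for _ in range(dim)]
critic_w = [rng.uniform(-1, 1) for _ in range(len(state) + dim)]

action = actor(state, actor_w)
print("recommended:", recommend(action, candidates))
print("Q estimate:", critic(state, action, critic_w))
```

In an actual Actor-Critic setup, the critic's estimate would be trained from observed rewards and used to update the actor, which is how long-term reward enters the recommendation policy; that learning step is deliberately left out of this sketch.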

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

From Bootstrapping to Sequence Modeling: A Unified Generative Framework for Personalized Landing-Page Modeling
cs.IR · 2026-06 · unverdicted · novelty 5.0

GLAN replaces CQL bootstrapping with Decision Transformer sequence modeling for PLPM, using global inter-day (L-RTG) and local session (HRM) modules to achieve +0.158% DAU and +0.108% LT gains in Kuaishou online tests.

Time-Constrained Recommendations: Reinforcement Learning Strategies for E-Commerce
cs.LG · 2025-12 · unverdicted · novelty 4.0

Reinforcement learning policies for time-constrained slate recommendations improve engagement over contextual bandits in e-commerce settings.

Breaking the Filter Bubble: A Semantic Pareto-DQN Framework for Multi-Objective Recommendation
cs.AI · 2026-06 · unverdicted · novelty 3.0

Introduces semantic Pareto-DQN for multi-objective recommendation that sustains trajectory variance to improve diversity and fairness on MovieLens with limited engagement loss.