Deep Reinforcement Learning based Recommendation with Explicit User-Item Interactions Modeling
Recommendation is crucial in both academia and industry, and various techniques have been proposed, such as content-based collaborative filtering, matrix factorization, logistic regression, factorization machines, neural networks and multi-armed bandits. However, most previous studies suffer from two limitations: (1) they treat recommendation as a static procedure and ignore the dynamic, interactive nature of the relationship between users and recommender systems; (2) they focus on the immediate feedback of recommended items and neglect long-term rewards. To address these two limitations, we propose a novel recommendation framework based on deep reinforcement learning, called DRR. The DRR framework treats recommendation as a sequential decision-making procedure and adopts an "Actor-Critic" reinforcement learning scheme to model the interactions between users and recommender systems, which accounts for both dynamic adaptation and long-term rewards. Furthermore, a state representation module is incorporated into DRR, which explicitly captures the interactions between items and users. Three instantiation structures are developed. Extensive experiments on four real-world datasets are conducted under both offline and online evaluation settings. The experimental results demonstrate that the proposed DRR method outperforms the state-of-the-art competitors.
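To make the abstract's description concrete, the following is a minimal sketch of the three pieces it names: a state representation built from user and item embeddings, an actor that maps the state to an action, and a critic that scores a state-action pair. It is a toy illustration under stated assumptions, not the paper's architecture: the embedding size, the linear actor and critic, the pooling used in `build_state`, the dot-product ranking in `recommend`, and all function names are hypothetical, and the weights are random rather than trained.

```python
import random

random.seed(0)
DIM = 4  # hypothetical embedding size


def rand_vec(n):
    """Random vector standing in for a learned embedding or weight row."""
    return [random.uniform(-1.0, 1.0) for _ in range(n)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_state(user, recent_items):
    """Toy state: user vector, user-item element-wise product, mean item vector.

    Assumption: one simple way to expose user-item interactions explicitly.
    """
    mean_item = [sum(col) / len(recent_items) for col in zip(*recent_items)]
    interaction = [u * m for u, m in zip(user, mean_item)]
    return user + interaction + mean_item


def actor(state, weights):
    """Linear actor (assumed): maps the state to a vector in item-embedding space."""
    return [dot(row, state) for row in weights]


def critic(state, action, weights):
    """Linear critic (assumed): scalar value estimate for a (state, action) pair."""
    return dot(weights, state + action)


def recommend(action, candidates, k=2):
    """Rank candidate item ids by dot-product score against the action vector."""
    ranked = sorted(candidates, key=lambda i: dot(action, candidates[i]), reverse=True)
    return ranked[:k]


user = rand_vec(DIM)
items = {i: rand_vec(DIM) for i in range(6)}
recent = [items[0], items[1]]  # items the user interacted with most recently

state = build_state(user, recent)
actor_w = [rand_vec(3 * DIM) for _ in range(DIM)]
critic_w = rand_vec(3 * DIM + DIM)

action = actor(state, actor_w)
candidates = {i: v for i, v in items.items() if i not in (0, 1)}
top = recommend(action, candidates)
q_value = critic(state, action, critic_w)
```

In a full actor-critic loop, the critic's estimate would be trained from observed rewards and used to update the actor; that training step is omitted here, so the sketch shows only the forward pass from state to recommendation.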
Forward citations
Cited by 3 Pith papers
- From Bootstrapping to Sequence Modeling: A Unified Generative Framework for Personalized Landing-Page Modeling
  GLAN replaces CQL bootstrapping with Decision Transformer sequence modeling for PLPM, using global inter-day (L-RTG) and local session (HRM) modules to achieve +0.158% DAU and +0.108% LT gains in Kuaishou online tests.
- Time-Constrained Recommendations: Reinforcement Learning Strategies for E-Commerce
  Reinforcement learning policies for time-constrained slate recommendations improve engagement over contextual bandits in e-commerce settings.
- Breaking the Filter Bubble: A Semantic Pareto-DQN Framework for Multi-Objective Recommendation
  Introduces semantic Pareto-DQN for multi-objective recommendation that sustains trajectory variance to improve diversity and fairness on MovieLens with limited engagement loss.