
arXiv: 1810.12027 · v3 · pith:W6D55MY4 · submitted 2018-10-29 · 💻 cs.IR

Deep Reinforcement Learning based Recommendation with Explicit User-Item Interactions Modeling

classification 💻 cs.IR
keywords: recommendation, interactions, learning, reinforcement, users, deep, dynamic, factorization

Recommendation is crucial in both academia and industry, and various techniques have been proposed, such as content-based collaborative filtering, matrix factorization, logistic regression, factorization machines, neural networks and multi-armed bandits. However, most previous studies suffer from two limitations: (1) they treat recommendation as a static procedure and ignore the dynamic, interactive nature of the exchange between users and the recommender system, and (2) they focus on the immediate feedback of recommended items and neglect long-term rewards. To address these two limitations, in this paper we propose a novel recommendation framework based on deep reinforcement learning, called DRR. The DRR framework treats recommendation as a sequential decision-making procedure and adopts an "Actor-Critic" reinforcement learning scheme to model the interactions between users and the recommender system, which can account for both dynamic adaptation and long-term rewards. Furthermore, a state representation module is incorporated into DRR, which explicitly captures the interactions between items and users. Three instantiation structures are developed. Extensive experiments on four real-world datasets are conducted under both offline and online evaluation settings. The experimental results demonstrate that the proposed DRR method outperforms state-of-the-art competitors.
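The abstract describes recommendation as a sequential decision process: a state built from explicit user-item interactions, an actor that proposes an action, a critic that values it, and a long-term (discounted) reward instead of immediate feedback. The sketch below illustrates that loop under loudly stated assumptions. The state module (user embedding, averaged element-wise user-item products, averaged item embeddings), the linear actor and critic, scoring candidates by inner product with the action vector, the DDPG-style deterministic policy-gradient update, `EMB_DIM`, and every function and class name are hypothetical illustrations, not the DRR authors' code or exact architecture. Deep networks, replay buffers and target networks are omitted.

```python
import numpy as np

EMB_DIM = 8  # hypothetical embedding size, not taken from the paper


def state_representation(user_emb, item_embs):
    """Hypothetical state module: user embedding, averaged element-wise
    user-item interactions, and averaged recent-item embeddings."""
    interaction = (user_emb * item_embs).mean(axis=0)  # explicit user-item terms
    items_mean = item_embs.mean(axis=0)
    return np.concatenate([user_emb, interaction, items_mean])


class LinearActor:
    """Deterministic policy a = tanh(W s); a is used as a ranking vector."""

    def __init__(self, state_dim, action_dim, rng):
        self.W = rng.normal(scale=0.1, size=(action_dim, state_dim))

    def act(self, state):
        return np.tanh(self.W @ state)

    def update(self, state, dq_da, lr=1e-3):
        # Deterministic policy gradient ascent: dQ/dW = (dQ/da * (1 - a^2)) outer s.
        a = self.act(state)
        self.W += lr * np.outer(dq_da * (1.0 - a ** 2), state)


class LinearCritic:
    """Q(s, a) = w . [s; a], a linear stand-in for a critic network."""

    def __init__(self, state_dim, action_dim, rng):
        self.state_dim = state_dim
        self.w = rng.normal(scale=0.1, size=state_dim + action_dim)

    def q_value(self, state, action):
        return float(self.w @ np.concatenate([state, action]))

    def grad_action(self):
        # For a linear critic, dQ/da is the action part of w.
        return self.w[self.state_dim:]

    def update(self, state, action, target, lr=1e-2):
        # One semi-gradient step on the squared TD error (target - Q)^2 / 2.
        x = np.concatenate([state, action])
        self.w += lr * (target - self.q_value(state, action)) * x


def recommend(action, candidate_embs):
    """Pick the candidate with the largest inner product with the action."""
    return int(np.argmax(candidate_embs @ action))


def discounted_return(rewards, gamma=0.9):
    """Long-term reward: r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Usage with random (fabricated-for-illustration) embeddings.
rng = np.random.default_rng(0)
user = rng.normal(size=EMB_DIM)
history = rng.normal(size=(5, EMB_DIM))      # 5 recently engaged items
candidates = rng.normal(size=(20, EMB_DIM))  # 20 candidate items

state = state_representation(user, history)
actor = LinearActor(state.size, EMB_DIM, rng)
critic = LinearCritic(state.size, EMB_DIM, rng)

action = actor.act(state)
item = recommend(action, candidates)

# Assume the user clicked (reward 1) and the history slides forward by one item.
reward, gamma = 1.0, 0.9
next_history = np.vstack([history[1:], candidates[item]])
next_state = state_representation(user, next_history)
target = reward + gamma * critic.q_value(next_state, actor.act(next_state))

q_before = critic.q_value(state, action)
critic.update(state, action, target)
q_after = critic.q_value(state, action)

q_pi_before = critic.q_value(state, actor.act(state))
actor.update(state, critic.grad_action())
q_pi_after = critic.q_value(state, actor.act(state))
```

The critic is trained toward a bootstrapped target `reward + gamma * Q(next_state, next_action)`, which is how a long-term reward enters the update; the actor is then nudged in the direction that increases the critic's value. A faithful reproduction would need the paper's own state-module instantiations and training details.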

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. From Bootstrapping to Sequence Modeling: A Unified Generative Framework for Personalized Landing-Page Modeling

    cs.IR 2026-06 unverdicted novelty 5.0

    GLAN replaces CQL bootstrapping with Decision Transformer sequence modeling for personalized landing-page modeling (PLPM), using global inter-day (L-RTG) and local session (HRM) modules to achieve +0.158% DAU and +0.108% LT gains in Kuaishou online tests.

  2. Time-Constrained Recommendations: Reinforcement Learning Strategies for E-Commerce

    cs.LG 2025-12 unverdicted novelty 4.0

    Reinforcement learning policies for time-constrained slate recommendations improve engagement over contextual bandits in e-commerce settings.

  3. Breaking the Filter Bubble: A Semantic Pareto-DQN Framework for Multi-Objective Recommendation

    cs.AI 2026-06 unverdicted novelty 3.0

    Introduces semantic Pareto-DQN for multi-objective recommendation that sustains trajectory variance to improve diversity and fairness on MovieLens with limited engagement loss.