Reason4rec: Large language models for recommendation with deliberative user preference alignment
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
roles: background 4
citation-polarity summary
polarities: background 4
citing papers explorer
- User Simulator-Guided Multi-Turn Preference Optimization for Reasoning LLM-based Conversational Recommendation
  SMTPO uses multi-task SFT to improve simulator feedback quality and RL with fine-grained rewards to optimize multi-turn preference reasoning in LLM-based conversational recommendation.
- Intuition-Guided Latent Reasoning for LLM-Based Recommendation
  IntuRec anchors LLM latent reasoning for recommendation by deriving an intuition embedding from top-K candidates via self- and cross-attention to initialize more accurate trajectories.
- RRCM: Ranking-Driven Retrieval over Collaborative and Meta Memories for LLM Recommendation
  RRCM trains an LLM to dynamically retrieve from collaborative and meta memories using group relative policy optimization driven by final top-k recommendation quality.
- Factorized Latent Reasoning for LLM-based Recommendation
  FLR factorizes latent reasoning into multiple preference factors using multi-factor attention and regularizations, outperforming baselines on recommendation benchmarks while adding robustness and interpretability.
- A Survey on Generative Recommendation: Data, Model, and Tasks
  This survey organizes generative recommendation into data, model, and task dimensions, identifying five advantages including world knowledge integration and creative generation while noting challenges in benchmarks and efficiency.
- Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models
  The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.