OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment
Pith reviewed 2026-05-12 18:24 UTC · model grok-4.3
The pith
OneRec replaces the retrieve-and-rank pipeline with a single end-to-end generative model that outperforms existing recommender systems in live use.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
OneRec is presented as the first end-to-end generative recommender to surpass complex retrieve-and-rank systems in real-world use. It rests on three components: an encoder-decoder structure with sparse MoE that encodes historical behaviors and decodes candidate videos; a session-wise generation method that produces coherent lists without hand-crafted rules for combining point-by-point outputs; and an Iterative Preference Alignment module that trains via DPO on samples selected by a reward model simulating user preferences under single-display constraints. The reported payoff after deployment is a 1.6% increase in watch-time.
What carries the argument
The Iterative Preference Alignment module, which adapts Direct Preference Optimization to recommendation: a trained reward model scores candidates sampled from the generative model for each user request and supplies the positive and negative samples, enabling preference tuning without simultaneous paired displays.
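The pair-construction step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `build_preference_pair`, `toy_reward`, and the "best versus worst" selection rule are hypothetical, and the toy reward stands in for a learned reward model.

```python
from typing import Callable, Sequence, Tuple

Session = Tuple[int, ...]  # a candidate session: an ordered tuple of item ids


def build_preference_pair(
    candidates: Sequence[Session],
    reward_fn: Callable[[Session], float],
) -> Tuple[Session, Session]:
    """Return (chosen, rejected) for one user request.

    Assumption: the highest-scoring candidate is treated as the positive
    sample and the lowest-scoring one as the negative sample.
    """
    if len(candidates) < 2:
        raise ValueError("need at least two candidates to form a pair")
    ranked = sorted(candidates, key=reward_fn)  # ascending by reward
    return ranked[-1], ranked[0]


def toy_reward(session: Session) -> float:
    # Hypothetical stand-in for a learned reward model: prefers low item ids.
    return -sum(session) / len(session)


if __name__ == "__main__":
    sampled = [(1, 2), (5, 6), (3, 4)]  # pretend these came from the generator
    chosen, rejected = build_preference_pair(sampled, toy_reward)
    print("chosen:", chosen, "rejected:", rejected)
```

Because both samples come from the model and are only scored offline, no user ever has to see two lists for the same request, which is the constraint the module is designed around.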
Load-bearing premise
The separately trained reward model can faithfully simulate user preferences and supply unbiased training signals for direct preference optimization in a single-display setting.
What would settle it
A live A/B test on the same user traffic that directly compares the generative OneRec outputs against the existing retrieve-and-rank baseline and measures whether watch-time and engagement metrics improve or stay flat.
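One way to make "improve or stay flat" concrete is a two-sample comparison of per-user watch-time between the two arms. The sketch below is illustrative only: `compare_arms`, the per-user aggregation, and the normal approximation for the p-value are assumptions, not the paper's evaluation protocol.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def compare_arms(
    control: Sequence[float], treatment: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (relative_lift, z_statistic, two_sided_p_value).

    Uses a Welch-style standard error and a normal approximation, which is
    reasonable only when both arms contain many users.
    """
    mean_c, mean_t = mean(control), mean(treatment)
    std_err = math.sqrt(
        variance(control) / len(control) + variance(treatment) / len(treatment)
    )
    if std_err == 0.0:
        raise ValueError("both arms have zero variance; test is undefined")
    z = (mean_t - mean_c) / std_err
    # Two-sided p-value under N(0, 1): 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return (mean_t - mean_c) / mean_c, z, p_value


if __name__ == "__main__":
    control_minutes = [10.0, 12.0, 11.0, 9.0]    # hypothetical per-user watch-time
    treatment_minutes = [11.0, 13.0, 12.0, 10.0]
    lift, z, p = compare_arms(control_minutes, treatment_minutes)
    print(f"lift={lift:.3%} z={z:.2f} p={p:.3f}")
```

A lift near 1.6% would only count as settled if the p-value stays small on the same traffic for both arms and the effect persists over the test duration.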
Original abstract
Recently, generative retrieval-based recommendation systems have emerged as a promising paradigm. However, most modern recommender systems adopt a retrieve-and-rank strategy, where the generative model functions only as a selector during the retrieval stage. In this paper, we propose OneRec, which replaces the cascaded learning framework with a unified generative model. To the best of our knowledge, this is the first end-to-end generative model that significantly surpasses current complex and well-designed recommender systems in real-world scenarios. Specifically, OneRec includes: 1) an encoder-decoder structure, which encodes the user's historical behavior sequences and gradually decodes the videos that the user may be interested in. We adopt sparse Mixture-of-Experts (MoE) to scale model capacity without proportionally increasing computational FLOPs. 2) a session-wise generation approach. In contrast to traditional next-item prediction, we propose a session-wise generation, which is more elegant and contextually coherent than point-by-point generation that relies on hand-crafted rules to properly combine the generated results. 3) an Iterative Preference Alignment module combined with Direct Preference Optimization (DPO) to enhance the quality of the generated results. Unlike DPO in NLP, a recommendation system typically has only one opportunity to display results for each user's browsing request, making it impossible to obtain positive and negative samples simultaneously. To address this limitation, we design a reward model to simulate user generation and customize the sampling strategy. Extensive experiments have demonstrated that a limited number of DPO samples can align user interest preferences and significantly improve the quality of generated results. We deployed OneRec in the main scene of Kuaishou, achieving a 1.6% increase in watch-time, which is a substantial improvement.
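The abstract's point that sparse MoE adds capacity without a proportional FLOP increase can be illustrated with top-k gating, a common routing scheme. This is a sketch under assumptions: the abstract does not specify OneRec's router, and `top_k_route` and its renormalised softmax weights are hypothetical choices for illustration.

```python
import math
from typing import List, Tuple


def top_k_route(gate_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k experts with the largest gate logits for one token.

    Returns (expert_index, weight) pairs whose weights sum to 1. Only these
    k experts would be evaluated, so per-token compute scales with k rather
    than with the total number of experts.
    """
    if not 1 <= k <= len(gate_logits):
        raise ValueError("k must be between 1 and the number of experts")
    top = sorted(
        range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True
    )[:k]
    peak = max(gate_logits[i] for i in top)  # subtract max for stability
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


if __name__ == "__main__":
    logits = [0.1, 2.0, -1.0, 2.0]  # hypothetical router outputs for 4 experts
    routes = top_k_route(logits, k=2)
    print(routes)
    print("fraction of experts active:", len(routes) / len(logits))
```

With 4 experts and k = 2, half of the expert parameters are touched per token; raising the expert count while holding k fixed grows capacity but leaves the per-token expert cost unchanged.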
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents OneRec, a unified generative recommender that replaces the traditional retrieve-and-rank cascade with a single encoder-decoder model. It encodes user historical behavior sequences, employs sparse Mixture-of-Experts (MoE) for capacity scaling, performs session-wise generation rather than pointwise next-item prediction, and applies Iterative Preference Alignment via Direct Preference Optimization (DPO) using a reward model to synthesize positive/negative pairs in single-display settings. The central claim is that this is the first end-to-end generative model to significantly outperform complex production recommender systems, evidenced by a 1.6% watch-time lift in an online A/B test on Kuaishou.
Significance. If the empirical results hold under rigorous scrutiny, the work would constitute a meaningful advance in recommender systems by showing that a single generative model can supplant established multi-stage pipelines in a live production environment. The online deployment result and the practical use of MoE plus session-wise generation are concrete strengths that go beyond typical offline-only evaluations.
major comments (2)
- [Abstract] Abstract: the claim of a 1.6% watch-time increase and outperformance over 'current complex and well-designed recommender systems' is presented without any description of the A/B test design, baselines, ablations, statistical significance testing, or error analysis. Because the central claim rests entirely on this online result, the absence of these details prevents verification of the reported gains.
- [Iterative Preference Alignment module] Iterative Preference Alignment section: the DPO procedure constructs positive/negative pairs by sampling from the generative model and scoring them with a separately trained reward model. No quantitative validation of the reward model (e.g., accuracy on held-out human preference labels) or ablation of DPO with versus without the simulator is reported. This leaves the risk that alignment merely amplifies historical biases unaddressed, which directly undermines the claim that 'a limited number of DPO samples can align user interest preferences.'
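The reward-model validation requested in the second comment could take the form of pairwise accuracy on held-out preference pairs. The sketch below assumes such pairs exist; `pairwise_accuracy`, the pair format, and the toy reward are hypothetical and are not described in the manuscript.

```python
from typing import Callable, Sequence, Tuple

Session = Tuple[int, ...]


def pairwise_accuracy(
    reward_fn: Callable[[Session], float],
    held_out_pairs: Sequence[Tuple[Session, Session]],
) -> float:
    """Fraction of (preferred, other) pairs that the reward model orders correctly.

    Ties count as errors, so a constant reward function scores 0.0.
    """
    if not held_out_pairs:
        raise ValueError("need at least one held-out pair")
    correct = sum(
        1 for preferred, other in held_out_pairs
        if reward_fn(preferred) > reward_fn(other)
    )
    return correct / len(held_out_pairs)


if __name__ == "__main__":
    def toy_reward(session: Session) -> float:
        # Hypothetical reward: longer sessions score higher.
        return float(len(session))

    pairs = [((1, 2, 3), (4,)), ((5, 6), (7, 8, 9)), ((1, 2), (3,))]
    print(pairwise_accuracy(toy_reward, pairs))
```

An accuracy close to 0.5 on such a set would indicate that the simulator carries little preference signal, which is the failure mode the comment is concerned with.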
minor comments (2)
- The distinction between 'session-wise generation' and traditional next-item prediction is described at a high level but lacks concrete implementation details on how generated items are combined or ranked within a session.
- The paper would benefit from a dedicated related-work subsection that explicitly positions the MoE encoder-decoder and the reward-model DPO adaptation against prior generative retrieval and preference-alignment work in recommendation.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address each major comment below with clarifications and note the changes incorporated in the revised manuscript.
-
Referee: [Abstract] Abstract: the claim of a 1.6% watch-time increase and outperformance over 'current complex and well-designed recommender systems' is presented without any description of the A/B test design, baselines, ablations, statistical significance testing, or error analysis. Because the central claim rests entirely on this online result, the absence of these details prevents verification of the reported gains.
Authors: We agree that the abstract benefits from additional context to support the central claim. In the revised version, we have expanded the abstract to include a concise summary of the A/B test design (including duration, traffic split, and statistical significance with p < 0.05), while retaining the full details, baselines, ablations, and error analysis in the Experiments section. A cross-reference to that section has also been added for readers seeking verification.
Revision: yes
-
Referee: [Iterative Preference Alignment module] Iterative Preference Alignment section: the DPO procedure constructs positive/negative pairs by sampling from the generative model and scoring them with a separately trained reward model. No quantitative validation of the reward model (e.g., accuracy on held-out human preference labels) or ablation of DPO with versus without the simulator is reported. This leaves the risk that alignment merely amplifies historical biases unaddressed, which directly undermines the claim that 'a limited number of DPO samples can align user interest preferences.'
Authors: We acknowledge the value of explicit validation for the reward model. The reward model is trained on large-scale implicit user feedback from the production system, with its utility demonstrated through the online A/B test gains. In the revision, we have added an ablation comparing performance with and without the Iterative Preference Alignment (DPO with simulator) to quantify its contribution. Explicit held-out human preference labels are not feasible at the required scale in this industrial setting; we have clarified this limitation in the text and reinforced that the live A/B results serve as the primary empirical validation against bias amplification concerns.
Revision: partial
Circularity Check
No circularity: empirical system description without derivations or self-referential reductions
The paper describes an end-to-end generative recommender (encoder-decoder with MoE, session-wise generation, and DPO-based Iterative Preference Alignment using a custom reward model) and reports online A/B test gains. No equations, parameter fits, or mathematical derivations appear in the provided text. Claims rest on empirical results rather than any chain that reduces by construction to inputs, fitted parameters renamed as predictions, or self-citation load-bearing uniqueness theorems. Self-citations, if present, are not invoked to justify core premises in a way that creates circularity. The reward-model simulation for DPO is a methodological choice whose validity can be externally checked via held-out labels or ablations; it does not constitute a definitional loop or fitted-input prediction. This matches the expected honest non-finding for system papers lacking formal derivations.
Axiom & Free-Parameter Ledger
free parameters (2)
- MoE expert count and routing
- DPO beta and sampling parameters
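To show where the DPO beta parameter enters, here is the standard DPO objective for a single preference pair, written over sequence log-probabilities. It is a sketch: whether OneRec uses exactly this form is not stated on this page, and `dpo_loss` and its default `beta=0.1` are illustrative assumptions.

```python
import math


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    loss = -log sigmoid(beta * [(logp_c - ref_c) - (logp_r - ref_r)])
    where logp_* come from the policy and ref_* from the frozen reference.
    """
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow.
    if margin >= 0.0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy equal to reference: the loss is log(2) regardless of beta.
    print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
    # Policy has shifted probability toward the chosen session: loss drops.
    print(dpo_loss(-0.5, -2.0, -1.0, -1.0, beta=0.1))
```

A larger beta penalises deviation from the reference model more sharply per unit of log-ratio, so it trades alignment strength against drift from the pre-trained generator.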
axioms (1)
- domain assumption: User historical behavior sequences contain sufficient signal to decode coherent future interest sessions via an encoder-decoder architecture.
invented entities (1)
- Reward model for simulating user generation (no independent evidence)
Forward citations
Cited by 31 Pith papers
-
MLPs are Efficient Distilled Generative Recommenders
SID-MLP distills autoregressive generative recommenders into efficient position-specific MLP heads for Semantic ID tasks, achieving 8.74x faster inference with matching accuracy.
-
Why Users Go There: World Knowledge-Augmented Generative Next POI Recommendation
AWARE augments generative next-POI recommendation with LLM agents that produce user-anchored narratives capturing events, culture, and trends, delivering up to 12.4% relative gains on three real datasets.
-
Expressiveness Limits of Autoregressive Semantic ID Generation in Generative Recommendation
Autoregressive semantic ID generation creates tree-induced probability correlations that prevent generative recommenders from capturing simple patterns; Latte adds latent tokens to relax these correlations.
-
Green-Red Watermarking for Recommender Systems
GREW uses a secret-key-driven green-red item partition and three ranking-integrated modules to embed verifiable watermarks in recommender systems that resist extraction attacks without data injection.
-
Objective Shaping with Hard Negatives: Windowed Partial AUC Optimization for RL-based LLM Recommenders
Beam-search negatives induce partial AUC optimization in GRPO for LLM recommenders; Windowed Partial AUC and TAWin improve Top-K alignment on four datasets.
-
ResRank: Unifying Retrieval and Listwise Reranking via End-to-End Joint Training with Residual Passage Compression
ResRank unifies retrieval and listwise reranking by compressing passages to one token each, using residual connections and cosine-similarity scoring, achieving competitive effectiveness on TREC DL and BEIR benchmarks ...
-
On the Equivalence Between Auto-Regressive Next Token Prediction and Full-Item-Vocabulary Maximum Likelihood Estimation in Generative Recommendation--A Short Note
Auto-regressive next-token prediction is strictly equivalent to full-vocabulary maximum likelihood estimation in generative recommendation under bijective item-to-token-sequence mapping.
-
DUET: Joint Exploration of User Item Profiles in Recommendation System
DUET uses a three-stage joint profile generator with RL feedback to create consistent user-item textual profiles that outperform independent generation in recommendation tasks.
-
IAT: Instance-As-Token Compression for Historical User Sequence Modeling in Industrial Recommender Systems
IAT compresses each historical interaction instance into a unified embedding token via temporal-order or user-order schemes, allowing standard sequence models to learn long-range preferences with better performance an...
-
From Passive Feeds to Guided Discovery: AI-Initiated Interaction for Vague Intent in Content Exploration
Red-Rec uses AI-initiated summaries and low-effort option selection to help users with vague intent explore more broadly and with higher serendipity than user-initiated chat while requiring less typing.
-
Conditional Memory Enhanced Item Representation for Generative Recommendation
ComeIR introduces dual-level Engram memory and memory-restoring prediction to reconstruct SID-token embeddings and restore token granularity in generative recommendation.
-
UxSID: Semantic-Aware User Interests Modeling for Ultra-Long Sequence
UxSID uses Semantic IDs and dual-level attention for semantic-group shared interest memory to efficiently model ultra-long user sequences, claiming SOTA performance and 0.337% revenue lift in advertising A/B tests.
-
CapsID: Soft-Routed Variable-Length Semantic IDs for Generative Recommendation
CapsID uses probabilistic capsule routing and confidence-based termination to generate variable-length semantic IDs, improving recall by 9.6% over strong baselines with half the latency of dual-representation systems.
-
Position-Aware Drafting for Inference Acceleration in LLM-Based Generative List-Wise Recommendation
PAD-Rec augments standard draft models with item-position and step-position embeddings plus learnable gates, delivering up to 3.1x wall-clock speedup and 5% average gain over strong speculative-decoding baselines on f...
-
From Local Indices to Global Identifiers: Generative Reranking for Recommender Systems via Global Action Space
GloRank reformulates list-wise reranking as token generation over a global item identifier space, using supervised pre-training followed by reinforcement learning to maximize list-wise utility and outperforming baseli...
-
Modeling Behavioral Intensity and Transitions for Generative Recommendation
BITRec improves generative multi-behavior recommendation by modeling behavioral intensity via separated pathways and transitions via learnable relation matrices, reporting 15-23% gains on large retail datasets.
-
Birds of a Feather Cluster Nearby: a Proximity-Aware Geo-Codebook for Local Service Recommendation
Pro-GEO introduces a geo-centroid coordinate system and geo-rotary position encoding to model geographic proximity as rotational transformations, enabling balanced semantic-spatial modeling in local service recommendations.
-
MTServe: Efficient Serving for Generative Recommendation Models with Hierarchical Caches
MTServe achieves up to 3.1x speedup for generative recommendation model serving by using hierarchical caches with host RAM and system optimizations while keeping cache hit ratios above 98.5%.
-
IceBreaker for Conversational Agents: Breaking the First-Message Barrier with Personalized Starters
IceBreaker applies resonance-aware interest distillation and interaction-oriented starter generation with preference alignment to create cold-start conversation openers, yielding +0.184% active days and +9.425% CTR ga...
-
From Relevance to Authority: Authority-aware Generative Retrieval in Web Search Engines
AuthGR is the first generative retriever to explicitly incorporate document authority alongside relevance using multimodal scoring and progressive training, yielding efficiency gains and real-world engagement improvements.
-
UniRec: Bridging the Expressive Gap between Generative and Discriminative Recommendation via Chain-of-Attribute
UniRec bridges the expressive gap in generative recommendation by prefixing semantic ID sequences with structured attribute tokens, recovering explicit feature crossing and yielding +22.6% HR@50 gains plus online lift...
-
CRAB: Codebook Rebalancing for Bias Mitigation in Generative Recommendation
CRAB mitigates popularity bias in generative recommenders by rebalancing the semantic token codebook through splitting popular tokens and applying a tree-structured regularizer to boost representations for unpopular items.
-
MBGR: Multi-Business Prediction for Generative Recommendation at Meituan
MBGR is a new generative recommendation framework using business-aware semantic IDs, multi-business prediction, and label dynamic routing to handle multiple businesses without seesaw effects or representation confusio...
-
UxSID: Semantic-Aware User Interests Modeling for Ultra-Long Sequence
UxSID introduces semantic-group shared interest memory with Semantic IDs and dual-level attention to model ultra-long user sequences, claiming state-of-the-art results and a 0.337% revenue lift in advertising A/B tests.
-
Revisiting General Map Search via Generative Point-of-Interest Retrieval
GenPOI is a generative POI retrieval system that unifies heterogeneous contexts via LLMs, uses geo-semantic tokenization, and applies proximity constraints to achieve superior performance on large-scale map search data.
-
Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations
Bian Que is an agentic framework using a unified operational paradigm, flexible Skill Arrangement, and self-evolving mechanism to automate O&M tasks, achieving 75% alert reduction and over 50% MTTR cut in production d...
-
Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations
Bian Que deploys an agentic system with flexible skills and self-evolution on a major e-commerce search engine, cutting alerts by 75%, reaching 80% root-cause accuracy, and halving resolution time.
-
Harmonizing Generative Retrieval and Ranking in Chain-of-Recommendation
RecoChain unifies generative candidate generation via hierarchical semantic IDs and SIM-based ranking in a single Transformer to improve top-K recommendation performance.
-
Mitigating Collaborative Semantic ID Staleness in Generative Retrieval
A model-agnostic SID alignment update mitigates staleness from temporal drift in user-item interactions for generative retrievers, improving Recall@K and nDCG@K while reducing compute by 8-9x versus full retraining.
-
SID-Coord: Coordinating Semantic IDs for ID-based Ranking in Short-Video Search
SID-Coord coordinates semantic IDs with hashed item IDs via attention fusion, adaptive gating, and interest alignment, yielding +0.664% long-play rate and +0.369% playback duration gains in production search ranking.
-
RecGPT-Mobile: On-Device Large Language Models for User Intent Understanding in Taobao Feed Recommendation
RecGPT-Mobile runs a compact LLM on phones to understand evolving user intent from behaviors and improve mobile e-commerce recommendations.