arxiv: 2502.18965 · v1 · submitted 2025-02-26 · 💻 cs.IR

Recognition: 2 theorem links

OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment

Guorui Zhou, Jiaxin Deng, Kuo Cai, Lejian Ren, Qiang Luo, Qigen Hu, Shiyao Wang, Weifeng Ding

Pith reviewed 2026-05-12 18:24 UTC · model grok-4.3

classification 💻 cs.IR

keywords generative recommendation · unified retrieve and rank · direct preference optimization · mixture of experts · session-wise generation · preference alignment · end-to-end recommender

The pith

OneRec replaces the retrieve-and-rank pipeline with a single end-to-end generative model that outperforms existing recommender systems in live use.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces OneRec to collapse the conventional multi-stage recommendation process into one generative model that encodes user history and directly produces sessions of recommended items. It employs an encoder-decoder backbone with sparse mixture-of-experts layers, replaces next-item prediction with session-wise generation for better coherence, and adds an iterative preference alignment step that uses a reward model to adapt direct preference optimization to the single-display constraint of recommendations. If correct, this unification would let systems optimize retrieval and ranking jointly rather than through separate hand-tuned stages, yielding higher user engagement in production. The approach matters because current cascades require extensive engineering to balance components, while a unified generator could simplify deployment and scaling.

Core claim

OneRec is the first end-to-end generative recommender that surpasses complex retrieve-and-rank systems in real-world scenarios by using an encoder-decoder structure with sparse MoE to encode historical behaviors and decode candidate videos, a session-wise generation method that produces coherent lists without point-by-point rules, and an Iterative Preference Alignment module that trains via DPO on samples drawn through a reward model simulating user preferences under single-display constraints, delivering measurable gains such as increased watch-time after deployment.

What carries the argument

The Iterative Preference Alignment module, which adapts Direct Preference Optimization by training a reward model to score candidate sessions sampled from the generator and to select positive and negative samples for each user request, thereby enabling preference tuning without simultaneous paired displays.
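
The pair-construction step described above can be sketched as follows. This is a minimal illustration and not the paper's implementation: `sample_session`, `reward_model`, `num_candidates`, and the best-versus-worst selection rule are assumptions made for the example. The source only states that a reward model supplies positive and negative samples for each request.

```python
import random
from typing import Callable, List, Tuple

Session = Tuple[int, ...]  # a session is an ordered tuple of item ids


def build_preference_pair(
    sample_session: Callable[[], Session],
    reward_model: Callable[[Session], float],
    num_candidates: int = 8,
) -> Tuple[Session, Session]:
    """Return (chosen, rejected) for a single user request.

    Assumption: the highest-reward candidate is used as the positive
    sample and the lowest-reward candidate as the negative sample.
    """
    if num_candidates < 2:
        raise ValueError("need at least two candidates to form a pair")
    # Draw several candidate sessions from the (hypothetical) generator.
    candidates: List[Session] = [sample_session() for _ in range(num_candidates)]
    # Rank candidates by simulated user preference, lowest reward first.
    ranked = sorted(candidates, key=reward_model)
    return ranked[-1], ranked[0]


# Toy stand-ins, purely hypothetical: a random generator and a reward
# that prefers sessions with larger average item id.
rng = random.Random(0)


def toy_sampler() -> Session:
    return tuple(rng.sample(range(100), 5))


def toy_reward(session: Session) -> float:
    return sum(session) / len(session)


chosen, rejected = build_preference_pair(toy_sampler, toy_reward)
print("chosen:", chosen, "rejected:", rejected)
```

Because both samples come from the model's own outputs and are only ranked offline, no user ever needs to see two competing lists for the same request, which is the single-display constraint the module is meant to work around.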

Load-bearing premise

The separately trained reward model can faithfully simulate user preferences and supply unbiased training signals for direct preference optimization in a single-display setting.

What would settle it

A live A/B test on the same user traffic that directly compares the generative OneRec outputs against the existing retrieve-and-rank baseline and measures whether watch-time and engagement metrics improve or stay flat.
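
As a rough sketch of how such a comparison might be summarized, the snippet below computes a relative watch-time lift and a two-sided p-value from per-user measurements. The function name, the large-sample z-test with unpooled variances, and the synthetic numbers are all assumptions for illustration; the paper's actual test design is not described on this page.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance
from typing import Sequence, Tuple


def ab_lift_and_p_value(
    control: Sequence[float], treatment: Sequence[float]
) -> Tuple[float, float]:
    """Relative lift of treatment over control and a two-sided p-value.

    Uses a large-sample z-test with unpooled variances (an assumption;
    production A/B platforms often use more elaborate estimators).
    """
    mean_c, mean_t = mean(control), mean(treatment)
    std_err = sqrt(
        variance(control) / len(control) + variance(treatment) / len(treatment)
    )
    z = (mean_t - mean_c) / std_err
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return (mean_t - mean_c) / mean_c, p_value


# Synthetic per-user watch-time in seconds; NOT data from the paper.
rng = random.Random(0)
control = [rng.gauss(600.0, 200.0) for _ in range(5000)]
treatment = [rng.gauss(610.0, 200.0) for _ in range(5000)]
lift, p_value = ab_lift_and_p_value(control, treatment)
print(f"relative lift = {lift:.4f}, p = {p_value:.4f}")
```

A flat result would show up as a lift near zero with a large p-value; a claimed improvement would need both a positive lift and a p-value below the pre-registered threshold.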

Recently, generative retrieval-based recommendation systems have emerged as a promising paradigm. However, most modern recommender systems adopt a retrieve-and-rank strategy, where the generative model functions only as a selector during the retrieval stage. In this paper, we propose OneRec, which replaces the cascaded learning framework with a unified generative model. To the best of our knowledge, this is the first end-to-end generative model that significantly surpasses current complex and well-designed recommender systems in real-world scenarios. Specifically, OneRec includes: 1) an encoder-decoder structure, which encodes the user's historical behavior sequences and gradually decodes the videos that the user may be interested in. We adopt sparse Mixture-of-Experts (MoE) to scale model capacity without proportionally increasing computational FLOPs. 2) a session-wise generation approach. In contrast to traditional next-item prediction, we propose a session-wise generation, which is more elegant and contextually coherent than point-by-point generation that relies on hand-crafted rules to properly combine the generated results. 3) an Iterative Preference Alignment module combined with Direct Preference Optimization (DPO) to enhance the quality of the generated results. Unlike DPO in NLP, a recommendation system typically has only one opportunity to display results for each user's browsing request, making it impossible to obtain positive and negative samples simultaneously. To address this limitation, we design a reward model to simulate user generation and customize the sampling strategy. Extensive experiments have demonstrated that a limited number of DPO samples can align user interest preferences and significantly improve the quality of generated results. We deployed OneRec in the main scene of Kuaishou, achieving a 1.6% increase in watch-time, which is a substantial improvement.
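
To make the abstract's point about sparse MoE concrete, here is a small sketch of top-k expert routing. It is a generic illustration under stated assumptions, not OneRec's layer: the router logits are passed in directly (a real layer would compute them from the input with a learned projection), the experts are toy scaling functions, and `top_k=2` is an arbitrary choice.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_moe(
    x: Vector,
    router_logits: Sequence[float],
    experts: Sequence[Expert],
    top_k: int = 2,
) -> Vector:
    """Combine the outputs of only the top_k highest-scoring experts."""
    if len(router_logits) != len(experts):
        raise ValueError("one router logit is required per expert")
    probs = softmax(router_logits)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)  # renormalize over selected experts
    out = [0.0] * len(x)
    for i in chosen:
        y = experts[i](x)  # unselected experts are never evaluated
        weight = probs[i] / norm
        out = [acc + weight * v for acc, v in zip(out, y)]
    return out


calls: List[int] = []  # records which experts actually ran


def make_expert(index: int, scale: float) -> Expert:
    def expert(x: Vector) -> Vector:
        calls.append(index)
        return [scale * v for v in x]
    return expert


# Four hypothetical experts; expert i multiplies its input by i + 1.
experts = [make_expert(i, float(i + 1)) for i in range(4)]
output = sparse_moe([1.0, 2.0], [0.0, 2.0, 1.0, -1.0], experts, top_k=2)
print(output, sorted(calls))
```

Adding more experts grows the parameter count, but each input still pays for only `top_k` expert evaluations, which is the sense in which capacity scales without a proportional increase in FLOPs.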

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents OneRec, a unified generative recommender that replaces the traditional retrieve-and-rank cascade with a single encoder-decoder model. It encodes user historical behavior sequences, employs sparse Mixture-of-Experts (MoE) for capacity scaling, performs session-wise generation rather than pointwise next-item prediction, and applies Iterative Preference Alignment via Direct Preference Optimization (DPO) using a reward model to synthesize positive/negative pairs in single-display settings. The central claim is that this is the first end-to-end generative model to significantly outperform complex production recommender systems, evidenced by a 1.6% watch-time lift in an online A/B test on Kuaishou.

Significance. If the empirical results hold under rigorous scrutiny, the work would constitute a meaningful advance in recommender systems by showing that a single generative model can supplant established multi-stage pipelines in a live production environment. The online deployment result and the practical use of MoE plus session-wise generation are concrete strengths that go beyond typical offline-only evaluations.

major comments (2)

[Abstract] Abstract: the claim of a 1.6% watch-time increase and outperformance over 'current complex and well-designed recommender systems' is presented without any description of the A/B test design, baselines, ablations, statistical significance testing, or error analysis. Because the central claim rests entirely on this online result, the absence of these details prevents verification of the reported gains.
[Iterative Preference Alignment module] Iterative Preference Alignment section: the DPO procedure constructs positive/negative pairs by sampling from the generative model and scoring them with a separately trained reward model. No quantitative validation of the reward model (e.g., accuracy on held-out human preference labels) or ablation of DPO with versus without the simulator is reported. This leaves the risk that alignment merely amplifies historical biases unaddressed, which directly undermines the claim that 'a limited number of DPO samples can align user interest preferences.'
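
The reward-model validation requested in the second major comment could take a form like the sketch below: the fraction of held-out preference pairs that the reward model orders correctly. The function name, the tie-handling rule, and the toy data are assumptions for illustration; nothing here reflects a measurement reported by the authors.

```python
from typing import Callable, Iterable, Tuple, TypeVar

S = TypeVar("S")


def pairwise_accuracy(
    reward_model: Callable[[S], float],
    labeled_pairs: Iterable[Tuple[S, S]],
) -> float:
    """Fraction of (preferred, dispreferred) pairs ranked correctly.

    Ties are counted as half correct (an assumption of this sketch).
    """
    total = 0
    correct = 0.0
    for preferred, dispreferred in labeled_pairs:
        r_pref = reward_model(preferred)
        r_disp = reward_model(dispreferred)
        if r_pref > r_disp:
            correct += 1.0
        elif r_pref == r_disp:
            correct += 0.5
        total += 1
    if total == 0:
        raise ValueError("no labeled pairs supplied")
    return correct / total


# Toy example: a hypothetical reward that simply prefers longer sessions.
def toy_reward(session: Tuple[int, ...]) -> float:
    return float(len(session))


held_out_pairs = [
    ((1, 2, 3), (4,)),   # ranked correctly
    ((5,), (6, 7)),      # ranked incorrectly
    ((8, 9), (1, 2)),    # tie
]
accuracy = pairwise_accuracy(toy_reward, held_out_pairs)
print(f"pairwise accuracy = {accuracy:.2f}")
```

An accuracy close to 0.5 would indicate that the simulator carries little preference signal, in which case DPO pairs built from it would be close to random.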

minor comments (2)

The distinction between 'session-wise generation' and traditional next-item prediction is described at a high level but lacks concrete implementation details on how generated items are combined or ranked within a session.
The paper would benefit from a dedicated related-work subsection that explicitly positions the MoE encoder-decoder and the reward-model DPO adaptation against prior generative retrieval and preference-alignment work in recommendation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below with clarifications and note the changes incorporated in the revised manuscript.

Referee: [Abstract] Abstract: the claim of a 1.6% watch-time increase and outperformance over 'current complex and well-designed recommender systems' is presented without any description of the A/B test design, baselines, ablations, statistical significance testing, or error analysis. Because the central claim rests entirely on this online result, the absence of these details prevents verification of the reported gains.

Authors: We agree that the abstract benefits from additional context to support the central claim. In the revised version, we have expanded the abstract to include a concise summary of the A/B test design (including duration, traffic split, and statistical significance with p < 0.05), while retaining the full details, baselines, ablations, and error analysis in the Experiments section. A cross-reference to that section has also been added for readers seeking verification.

revision: yes

Referee: [Iterative Preference Alignment module] Iterative Preference Alignment section: the DPO procedure constructs positive/negative pairs by sampling from the generative model and scoring them with a separately trained reward model. No quantitative validation of the reward model (e.g., accuracy on held-out human preference labels) or ablation of DPO with versus without the simulator is reported. This leaves the risk that alignment merely amplifies historical biases unaddressed, which directly undermines the claim that 'a limited number of DPO samples can align user interest preferences.'

Authors: We acknowledge the value of explicit validation for the reward model. The reward model is trained on large-scale implicit user feedback from the production system, with its utility demonstrated through the online A/B test gains. In the revision, we have added an ablation comparing performance with and without the Iterative Preference Alignment (DPO with simulator) to quantify its contribution. For held-out human preference labels, such explicit annotations are not feasible at the required scale in this industrial setting; we have clarified this limitation in the text and reinforced that the live A/B results serve as the primary empirical validation against bias amplification concerns.

revision: partial

Circularity Check

0 steps flagged

No circularity: empirical system description without derivations or self-referential reductions

The paper describes an end-to-end generative recommender (encoder-decoder with MoE, session-wise generation, and DPO-based Iterative Preference Alignment using a custom reward model) and reports online A/B test gains. No equations, parameter fits, or mathematical derivations appear in the provided text. Claims rest on empirical results rather than any chain that reduces by construction to inputs, fitted parameters renamed as predictions, or self-citation load-bearing uniqueness theorems. Self-citations, if present, are not invoked to justify core premises in a way that creates circularity. The reward-model simulation for DPO is a methodological choice whose validity can be externally checked via held-out labels or ablations; it does not constitute a definitional loop or fitted-input prediction. This matches the expected honest non-finding for system papers lacking formal derivations.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 1 invented entity

The approach builds on standard transformer encoder-decoder and DPO techniques but introduces custom adaptations whose effectiveness depends on unstated hyperparameters and the quality of the reward model.

free parameters (2)

MoE expert count and routing
Chosen to scale capacity without proportional FLOP increase; specific values not given in abstract.
DPO beta and sampling parameters
Control preference alignment strength and negative sample generation; not specified.
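
For orientation, the standard DPO objective from Rafailov et al. shows where the beta parameter enters. This is the generic form, given here as an assumption about the starting point; the page does not state which variant or which value of beta OneRec uses. Here x is the user context, y_w and y_l are the preferred and dispreferred sessions, pi_theta is the model being tuned, and pi_ref is the frozen reference model.

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}
    \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

A larger beta penalizes deviation from the reference model more strongly, so the same reward-model pairs move the policy less; a smaller beta lets the sampled preferences dominate.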

axioms (1)

domain assumption: User historical behavior sequences contain sufficient signal to decode coherent future interest sessions via an encoder-decoder architecture.
Invoked as the basis for the core generative structure.

invented entities (1)

Reward model for simulating user generation · no independent evidence
purpose: Generate positive and negative samples for DPO because only one display opportunity exists per user request.
Introduced to adapt DPO to the single-display constraint of recommendation systems.

pith-pipeline@v0.9.0 · 5638 in / 1407 out tokens · 60974 ms · 2026-05-12T18:24:51.490405+00:00 · methodology

Forward citations

Cited by 31 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

MLPs are Efficient Distilled Generative Recommenders
cs.IR 2026-05 unverdicted novelty 7.0

SID-MLP distills autoregressive generative recommenders into efficient position-specific MLP heads for Semantic ID tasks, achieving 8.74x faster inference with matching accuracy.
Why Users Go There: World Knowledge-Augmented Generative Next POI Recommendation
cs.AI 2026-05 unverdicted novelty 7.0

AWARE augments generative next-POI recommendation with LLM agents that produce user-anchored narratives capturing events, culture, and trends, delivering up to 12.4% relative gains on three real datasets.
Expressiveness Limits of Autoregressive Semantic ID Generation in Generative Recommendation
cs.IR 2026-05 unverdicted novelty 7.0

Autoregressive semantic ID generation creates tree-induced probability correlations that prevent generative recommenders from capturing simple patterns; Latte adds latent tokens to relax these correlations.
Green-Red Watermarking for Recommender Systems
cs.IR 2026-04 unverdicted novelty 7.0

GREW uses a secret-key-driven green-red item partition and three ranking-integrated modules to embed verifiable watermarks in recommender systems that resist extraction attacks without data injection.
Objective Shaping with Hard Negatives: Windowed Partial AUC Optimization for RL-based LLM Recommenders
cs.IR 2026-04 unverdicted novelty 7.0

Beam-search negatives induce partial AUC optimization in GRPO for LLM recommenders; Windowed Partial AUC and TAWin improve Top-K alignment on four datasets.
ResRank: Unifying Retrieval and Listwise Reranking via End-to-End Joint Training with Residual Passage Compression
cs.IR 2026-04 conditional novelty 7.0

ResRank unifies retrieval and listwise reranking by compressing passages to one token each, using residual connections and cosine-similarity scoring, achieving competitive effectiveness on TREC DL and BEIR benchmarks ...
On the Equivalence Between Auto-Regressive Next Token Prediction and Full-Item-Vocabulary Maximum Likelihood Estimation in Generative Recommendation--A Short Note
cs.IR 2026-04 accept novelty 7.0

Auto-regressive next-token prediction is strictly equivalent to full-vocabulary maximum likelihood estimation in generative recommendation under bijective item-to-token-sequence mapping.
DUET: Joint Exploration of User Item Profiles in Recommendation System
cs.IR 2026-04 unverdicted novelty 7.0

DUET uses a three-stage joint profile generator with RL feedback to create consistent user-item textual profiles that outperform independent generation in recommendation tasks.
IAT: Instance-As-Token Compression for Historical User Sequence Modeling in Industrial Recommender Systems
cs.IR 2026-04 unverdicted novelty 7.0

IAT compresses each historical interaction instance into a unified embedding token via temporal-order or user-order schemes, allowing standard sequence models to learn long-range preferences with better performance an...
From Passive Feeds to Guided Discovery: AI-Initiated Interaction for Vague Intent in Content Exploration
cs.HC 2026-03 conditional novelty 7.0

Red-Rec uses AI-initiated summaries and low-effort option selection to help users with vague intent explore more broadly and with higher serendipity than user-initiated chat while requiring less typing.
Conditional Memory Enhanced Item Representation for Generative Recommendation
cs.IR 2026-05 unverdicted novelty 6.0

ComeIR introduces dual-level Engram memory and memory-restoring prediction to reconstruct SID-token embeddings and restore token granularity in generative recommendation.
UxSID: Semantic-Aware User Interests Modeling for Ultra-Long Sequence
cs.AI 2026-05 unverdicted novelty 6.0

UxSID uses Semantic IDs and dual-level attention for semantic-group shared interest memory to efficiently model ultra-long user sequences, claiming SOTA performance and 0.337% revenue lift in advertising A/B tests.
CapsID: Soft-Routed Variable-Length Semantic IDs for Generative Recommendation
cs.IR 2026-05 unverdicted novelty 6.0

CapsID uses probabilistic capsule routing and confidence-based termination to generate variable-length semantic IDs, improving recall by 9.6% over strong baselines with half the latency of dual-representation systems.
Position-Aware Drafting for Inference Acceleration in LLM-Based Generative List-Wise Recommendation
cs.IR 2026-04 unverdicted novelty 6.0

PAD-Rec augments standard draft models with item-position and step-position embeddings plus learnable gates, delivering up to 3.1x wall-clock speedup and 5% average gain over strong speculative-decoding baselines on f...
From Local Indices to Global Identifiers: Generative Reranking for Recommender Systems via Global Action Space
cs.IR 2026-04 unverdicted novelty 6.0

GloRank reformulates list-wise reranking as token generation over a global item identifier space, using supervised pre-training followed by reinforcement learning to maximize list-wise utility and outperforming baseli...
Modeling Behavioral Intensity and Transitions for Generative Recommendation
cs.IR 2026-04 unverdicted novelty 6.0

BITRec improves generative multi-behavior recommendation by modeling behavioral intensity via separated pathways and transitions via learnable relation matrices, reporting 15-23% gains on large retail datasets.
Birds of a Feather Cluster Nearby: a Proximity-Aware Geo-Codebook for Local Service Recommendation
cs.IR 2026-04 unverdicted novelty 6.0

Pro-GEO introduces a geo-centroid coordinate system and geo-rotary position encoding to model geographic proximity as rotational transformations, enabling balanced semantic-spatial modeling in local service recommendations.
MTServe: Efficient Serving for Generative Recommendation Models with Hierarchical Caches
cs.LG 2026-04 unverdicted novelty 6.0

MTServe achieves up to 3.1x speedup for generative recommendation model serving by using hierarchical caches with host RAM and system optimizations while keeping cache hit ratios above 98.5%.
IceBreaker for Conversational Agents: Breaking the First-Message Barrier with Personalized Starters
cs.CL 2026-04 unverdicted novelty 6.0

IceBreaker applies resonance-aware interest distillation and interaction-oriented starter generation with preference alignment to create cold-start conversation openers, yielding +0.184% active days and +9.425% CTR ga...
From Relevance to Authority: Authority-aware Generative Retrieval in Web Search Engines
cs.IR 2026-04 unverdicted novelty 6.0

AuthGR is the first generative retriever to explicitly incorporate document authority alongside relevance using multimodal scoring and progressive training, yielding efficiency gains and real-world engagement improvements.
UniRec: Bridging the Expressive Gap between Generative and Discriminative Recommendation via Chain-of-Attribute
cs.IR 2026-04 unverdicted novelty 6.0

UniRec bridges the expressive gap in generative recommendation by prefixing semantic ID sequences with structured attribute tokens, recovering explicit feature crossing and yielding +22.6% HR@50 gains plus online lift...
CRAB: Codebook Rebalancing for Bias Mitigation in Generative Recommendation
cs.IR 2026-04 unverdicted novelty 6.0

CRAB mitigates popularity bias in generative recommenders by rebalancing the semantic token codebook through splitting popular tokens and applying a tree-structured regularizer to boost representations for unpopular items.
MBGR: Multi-Business Prediction for Generative Recommendation at Meituan
cs.IR 2026-04 unverdicted novelty 6.0

MBGR is a new generative recommendation framework using business-aware semantic IDs, multi-business prediction, and label dynamic routing to handle multiple businesses without seesaw effects or representation confusio...
UxSID: Semantic-Aware User Interests Modeling for Ultra-Long Sequence
cs.AI 2026-05 unverdicted novelty 5.0

UxSID introduces semantic-group shared interest memory with Semantic IDs and dual-level attention to model ultra-long user sequences, claiming state-of-the-art results and a 0.337% revenue lift in advertising A/B tests.
Revisiting General Map Search via Generative Point-of-Interest Retrieval
cs.IR 2026-05 unverdicted novelty 5.0

GenPOI is a generative POI retrieval system that unifies heterogeneous contexts via LLMs, uses geo-semantic tokenization, and applies proximity constraints to achieve superior performance on large-scale map search data.
Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations
cs.AI 2026-04 unverdicted novelty 5.0

Bian Que is an agentic framework using a unified operational paradigm, flexible Skill Arrangement, and self-evolving mechanism to automate O&M tasks, achieving 75% alert reduction and over 50% MTTR cut in production d...
Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations
cs.AI 2026-04 unverdicted novelty 5.0

Bian Que deploys an agentic system with flexible skills and self-evolution on a major e-commerce search engine, cutting alerts by 75%, reaching 80% root-cause accuracy, and halving resolution time.
Harmonizing Generative Retrieval and Ranking in Chain-of-Recommendation
cs.IR 2026-04 unverdicted novelty 5.0

RecoChain unifies generative candidate generation via hierarchical semantic IDs and SIM-based ranking in a single Transformer to improve top-K recommendation performance.
Mitigating Collaborative Semantic ID Staleness in Generative Retrieval
cs.IR 2026-04 unverdicted novelty 5.0

A model-agnostic SID alignment update mitigates staleness from temporal drift in user-item interactions for generative retrievers, improving Recall@K and nDCG@K while reducing compute by 8-9x versus full retraining.
SID-Coord: Coordinating Semantic IDs for ID-based Ranking in Short-Video Search
cs.IR 2026-04 unverdicted novelty 5.0

SID-Coord coordinates semantic IDs with hashed item IDs via attention fusion, adaptive gating, and interest alignment, yielding +0.664% long-play rate and +0.369% playback duration gains in production search ranking.
RecGPT-Mobile: On-Device Large Language Models for User Intent Understanding in Taobao Feed Recommendation
cs.IR 2026-05 unverdicted novelty 4.0

RecGPT-Mobile runs a compact LLM on phones to understand evolving user intent from behaviors and improve mobile e-commerce recommendations.

Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages · cited by 29 Pith papers · 4 internal anchors

[1]

Mohammad Gheshlaghi Azar, Zhaohan Daniel Guo, Bilal Piot, Remi Munos, Mark Rowland, Michal Valko, and Daniele Calandriello. 2024. A general theoretical paradigm to understand learning from human preferences. In International Conference on Artificial Intelligence and Statistics. PMLR, 4447–4455

work page 2024
[2]

Christopher JC Burges. 2010. From ranknet to lambdarank to lambdamart: An overview. Learning 11, 23-581 (2010), 81

work page 2010
[3]

Jianxin Chang, Chenbin Zhang, Zhiyi Fu, Xiaoxue Zang, Lin Guan, Jing Lu, Yiqun Hui, Dewei Leng, Yanan Niu, Yang Song, et al. 2023. TWIN: TWo-stage interest network for lifelong user behavior modeling in CTR prediction at kuaishou. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 3785–3794

work page 2023
[4]

Yuxin Chen, Junfei Tan, An Zhang, Zhengyi Yang, Leheng Sheng, Enzhi Zhang, Xiang Wang, and Tat-Seng Chua. 2024. On Softmax Direct Preference Optimization for Recommendation. In The Thirty-eighth Annual Conference on Neural Information Processing Systems. https://openreview.net/forum?id=qp5VbGTaM0

work page 2024
[5]

Sayak Ray Chowdhury, Anush Kini, and Nagarajan Natarajan. 2024. Provably Robust DPO: Aligning Language Models with Noisy Feedback. In ICML 2024

work page 2024
[6]

Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM conference on recommender systems. 191–198

work page 2016
[7]

Damai Dai, Chengqi Deng, Chenggang Zhao, RX Xu, Huazuo Gao, Deli Chen, Jiashi Li, Wangding Zeng, Xingkai Yu, Y Wu, et al. 2024. Deepseekmoe: Towards ultimate expert specialization in mixture-of-experts language models. arXiv preprint arXiv:2401.06066 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[8]

Nicola De Cao, Gautier Izacard, Sebastian Riedel, and Fabio Petroni. 2020. Autoregressive entity retrieval. arXiv preprint arXiv:2010.00904 (2020)

work page arXiv 2020
[9]

Nan Du, Yanping Huang, Andrew M Dai, Simon Tong, Dmitry Lepikhin, Yuanzhong Xu, Maxim Krikun, Yanqi Zhou, Adams Wei Yu, Orhan Firat, et al. Glam: Efficient scaling of language models with mixture-of-experts. In International Conference on Machine Learning. PMLR, 5547–5569

work page
[11]

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. 2024. The llama 3 herd of models. arXiv preprint arXiv:2407.21783 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[12]

Hongliang Fei, Jingyuan Zhang, Xingxuan Zhou, Junhao Zhao, Xinyang Qi, and Ping Li. 2021. GemNN: gating-enhanced multi-task neural networks with feature interaction learning for CTR prediction. In Proceedings of the 44th international ACM SIGIR conference on research and development in information retrieval. 2166–2171

work page 2021
[13]

Chao Feng, Wuchao Li, Defu Lian, Zheng Liu, and Enhong Chen. 2022. Recommender forest for efficient retrieval. Advances in Neural Information Processing Systems 35 (2022), 38912–38924

work page 2022
[14]

Luke Gallagher, Ruey-Cheng Chen, Roi Blanco, and J Shane Culpepper. 2019. Joint optimization of cascade ranking models. In Proceedings of the twelfth ACM international conference on web search and data mining. 15–23

work page 2019
[15]

Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun. 2013. Optimized product quantization. IEEE transactions on pattern analysis and machine intelligence 36, 4 (2013), 744–755

work page 2013
[16]

Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. DeepFM: a factorization-machine based neural network for CTR prediction. arXiv preprint arXiv:1703.04247 (2017)

work page arXiv 2017
[17]

B Hidasi. 2015. Session-based Recommendations with Recurrent Neural Networks. arXiv preprint arXiv:1511.06939 (2015)

work page internal anchor Pith review Pith/arXiv arXiv 2015
[18]

Michael E Houle and Michael Nett. 2014. Rank-based similarity search: Reducing the dimensional dependence. IEEE transactions on pattern analysis and machine intelligence 37, 1 (2014), 136–150

work page 2014
[19]

Jiri Hron, Karl Krauth, Michael Jordan, and Niki Kilbertus. 2021. On component interactions in two-stage recommender systems. Advances in neural information processing systems 34 (2021), 2744–2757

work page 2021
[20]

Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM international conference on Information & Knowledge Management. 2333–2338

work page 2013
[21]

Xu Huang, Defu Lian, Jin Chen, Liu Zheng, Xing Xie, and Enhong Chen. 2023. Cooperative Retriever and Ranker in Deep Recommenders. In Proceedings of the ACM Web Conference 2023. 1150–1161

work page 2023
[22]

Herve Jegou, Matthijs Douze, and Cordelia Schmid. 2010. Product quantization for nearest neighbor search. IEEE transactions on pattern analysis and machine intelligence 33, 1 (2010), 117–128

work page 2010
[23]

Wang-Cheng Kang and Julian McAuley. 2018. Self-attentive sequential recommendation. In 2018 IEEE international conference on data mining (ICDM). IEEE, 197–206

work page 2018
[24]

Zhirui Kuai, Zuxu Chen, Huimu Wang, Mingming Li, Dadong Miao, Wang Binbin, Xusong Chen, Li Kuang, Yuxing Han, Jiaxing Wang, et al. 2024. Breaking the Hourglass Phenomenon of Residual Quantization: Enhancing the Upper Bound of Generative Retrieval. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track. 677–685

work page 2024
[25]

Doyup Lee, Chiheon Kim, Saehoon Kim, Minsu Cho, and Wook-Shin Han. 2022. Autoregressive image generation using residual quantization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11523–11532

work page 2022
[26]

Han Liu, Yinwei Wei, Xuemeng Song, Weili Guan, Yuan-Fang Li, and Liqiang Nie. 2024. MMGRec: Multimodal Generative Recommendation with Transformer Model. arXiv preprint arXiv:2404.16555 (2024)

work page arXiv 2024
[27]

Shichen Liu, Fei Xiao, Wenwu Ou, and Luo Si. 2017. Cascade ranking for operational e-commerce search. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1557–1565

work page 2017
[28]

Xinchen Luo, Jiangxia Cao, Tianyu Sun, Jinkai Yu, Rui Huang, Wei Yuan, Hezheng Lin, Yichen Zheng, Shiyao Wang, Qigen Hu, et al. 2024. QARM: Quantitative Alignment Multi-Modal Recommendation at Kuaishou. arXiv preprint arXiv:2411.11739 (2024)

work page arXiv 2024
[29]

Xu Ma, Pengjie Wang, Hui Zhao, Shaoguo Liu, Chuhan Zhao, Wei Lin, Kuang-Chih Lee, Jian Xu, and Bo Zheng. 2021. Towards a better tradeoff between effectiveness and efficiency in pre-ranking: A learnable feature selection based approach. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2036–2040

work page 2021
[30]

Yu Meng, Mengzhou Xia, and Danqi Chen. 2024. SimPO: Simple Preference Optimization with a Reference-Free Reward. In Advances in Neural Information Processing Systems (NeurIPS)

work page 2024
[31]

Eric Mitchell. [n. d.]. A note on dpo with noisy preferences and relationship to ipo, 2023. URL https://ericmitchell.ai/cdpo.pdf ([n. d.])

work page 2023
[32]

Marius Muja and David G Lowe. 2014. Scalable nearest neighbor algorithms for high dimensional data. IEEE transactions on pattern analysis and machine intelligence 36, 11 (2014), 2227–2240

work page 2014
[33]

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. 2022. Training language models to follow instructions with human feedback. Advances in neural information processing systems 35 (2022), 27730–27744

work page 2022
[34]

Qi Pi, Guorui Zhou, Yujing Zhang, Zhe Wang, Lejian Ren, Ying Fan, Xiaoqiang Zhu, and Kun Gai. 2020. Search-based user interest modeling with lifelong sequential behavior data for click-through rate prediction. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 2685–2692

work page 2020
[35]

Jiarui Qin, Jiachen Zhu, Bo Chen, Zhirong Liu, Weiwen Liu, Ruiming Tang, Rui Zhang, Yong Yu, and Weinan Zhang. 2022. Rankflow: Joint optimization of multi-stage cascade ranking systems as flows. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 814–824

work page 2022
[36]

Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. 2024. Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems 36 (2024)

[37]

Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan Hulikal Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Tran, Jonah Samost, et al. 2023. Recommender systems with generative retrieval. Advances in Neural Information Processing Systems 36 (2023), 10299–10315

[39]

Wentao Shi, Jiawei Chen, Fuli Feng, Jizhi Zhang, Junkang Wu, Chongming Gao, and Xiangnan He. 2023. On the theories behind hard negative sampling for recommendation. In Proceedings of the ACM Web Conference 2023. 812–822

[40]

Anshumali Shrivastava and Ping Li. 2014. Asymmetric LSH (ALSH) for sublinear time maximum inner product search (MIPS). Advances in neural information processing systems 27 (2014)

[41]

Nisan Stiennon, Long Ouyang, Jeffrey Wu, Daniel Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, and Paul F Christiano. 2020. Learning to summarize with human feedback. Advances in Neural Information Processing Systems 33 (2020), 3008–3021

[42]

Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. 2019. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In Proceedings of the 28th ACM international conference on information and knowledge management. 1441–1450

[44]

Yubao Tang, Ruqing Zhang, Jiafeng Guo, and Maarten de Rijke. 2023. Recent advances in generative information retrieval. In Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region. 294–297

[45]

Yi Tay, Vinh Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, et al. 2022. Transformer memory as a differentiable search index. Advances in Neural Information Processing Systems 35 (2022), 21831–21843

[46]

Lidan Wang, Jimmy Lin, and Donald Metzler. 2011. A cascade ranking model for efficient ranked retrieval. In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval. 105–114

[47]

Yunli Wang, Zhiqiang Wang, Jian Yang, Shiyang Wen, Dongying Kong, Han Li, and Kun Gai. 2024. Adaptive Neural Ranking Framework: Toward Maximized Business Goal for Cascade Ranking Systems. In Proceedings of the ACM on Web Conference 2024. 3798–3809

[48]

Ye Wang, Jiahao Xun, Minjie Hong, Jieming Zhu, Tao Jin, Wang Lin, Haoyuan Li, Linjun Li, Yan Xia, Zhou Zhao, et al. 2024. EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 3245–3254

[49]

Zhe Wang, Liqin Zhao, Biye Jiang, Guorui Zhou, Xiaoqiang Zhu, and Kun Gai. 2020. Cold: Towards the next generation of pre-ranking system. arXiv preprint arXiv:2007.16122 (2020)

[51]

Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton Murray, and Young Jin Kim. 2024. Contrastive preference optimization: Pushing the boundaries of llm performance in machine translation. arXiv preprint arXiv:2401.08417 (2024)

[52]

Shuyuan Xu, Wenyue Hua, and Yongfeng Zhang. 2024. Openp5: An open-source platform for developing, training, and evaluating llm-based recommender systems. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. 386–394

[53]

Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, and Marco Tagliasacchi. 2021. Soundstream: An end-to-end neural audio codec. IEEE/ACM Transactions on Audio, Speech, and Language Processing 30 (2021), 495–507

[54]

Tingting Zhang, Pengpeng Zhao, Yanchi Liu, Victor S Sheng, Jiajie Xu, Deqing Wang, Guanfeng Liu, Xiaofang Zhou, et al. 2019. Feature-level deeper self-attention network for sequential recommendation. In IJCAI. 4320–4326

[55]

Bowen Zheng, Yupeng Hou, Hongyu Lu, Yu Chen, Wayne Xin Zhao, Ming Chen, and Ji-Rong Wen. 2024. Adapting large language models by integrating collaborative semantics for recommendation. In 2024 IEEE 40th International Conference on Data Engineering (ICDE). IEEE, 1435–1448

[56]

Guorui Zhou, Na Mou, Ying Fan, Qi Pi, Weijie Bian, Chang Zhou, Xiaoqiang Zhu, and Kun Gai. 2019. Deep interest evolution network for click-through rate prediction. In Proceedings of the AAAI conference on artificial intelligence, Vol. 33. 5941–5948

[57]

Guorui Zhou, Xiaoqiang Zhu, Chenru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. 2018. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 1059–1068

[58]

Han Zhu, Xiang Li, Pengye Zhang, Guozheng Li, Jie He, Han Li, and Kun Gai. 2018. Learning tree-based deep model for recommender systems. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 1079–1088

[60]

Barret Zoph, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, Jeff Dean, Noam Shazeer, and William Fedus. 2022. Designing effective sparse expert models. arXiv preprint arXiv:2202.08906 (2022)
