pith. machine review for the scientific record.

arxiv: 2604.14878 · v1 · submitted 2026-04-16 · 💻 cs.IR · cs.AI

Recognition: unknown

GenRec: A Preference-Oriented Generative Framework for Large-Scale Recommendation

Binglei Zhao, Jiabao Gao, Junbo Qi, Kewei Xu, Lunsong Huang, Shengjie Li, Sulong Xu, Xuanhua Yang, Yanyan Zou, Yu Li

Pith reviewed 2026-05-10 09:55 UTC · model grok-4.3

classification 💻 cs.IR cs.AI
keywords generative retrieval · recommendation systems · next-token prediction · reinforcement learning · semantic IDs · user preference alignment · online A/B testing

The pith

A generative recommendation model with page-wise next-token prediction and hybrid-reward reinforcement learning delivers 9.5% more clicks and 8.7% more transactions in live A/B tests.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GenRec, a single decoder-only generative framework for large-scale recommendation that tackles inconsistent outputs from pagination, high encoding costs for long sequences, and misalignment with user preferences. It proposes supervising the model on whole interaction pages at once for stronger training signals, compressing multi-token item representations asymmetrically to cut input size in half, and applying a group-relative policy optimization method stabilized by regularization and hybrid rewards. These changes enable the generative approach to outperform the existing production pipeline in real user behavior metrics. A reader would care because this suggests generative models can handle the complexities of industrial recommendation systems more effectively than before.

Core claim

GenRec resolves three scaling challenges in generative retrieval for recommendation through a unified decoder-only model: Page-wise NTP supervises over full pages to provide denser gradients and resolve one-to-many ambiguities; an asymmetric linear Token Merger compresses semantic ID prompts while preserving decoding resolution; and GRPO-SR pairs group relative policy optimization with NLL regularization and hybrid rewards to align outputs with nuanced user satisfaction without reward hacking.
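The difference between point-wise and page-wise supervision can be illustrated with a minimal sketch. The sequence layout here is an assumption (the summary above does not specify how GenRec serialises a page), and the names `pointwise_examples` and `pagewise_example` and the token values are hypothetical.

```python
from typing import List, Tuple

SemanticID = Tuple[int, ...]  # one item = a short tuple of codebook tokens


def pointwise_examples(context: List[int], page: List[SemanticID]):
    """One training example per interacted item: the same context is paired
    with several different targets (the one-to-many ambiguity)."""
    return [(list(context), list(item)) for item in page]


def pagewise_example(context: List[int], page: List[SemanticID]):
    """A single example whose target is every item on the page, flattened in
    page order, so one forward pass is supervised on all target tokens."""
    target = [tok for item in page for tok in item]
    return list(context), target


context = [7, 7, 3]                       # hypothetical prompt tokens
page = [(1, 4, 9), (2, 5, 8), (1, 6, 0)]  # hypothetical items interacted with on one page

pw = pointwise_examples(context, page)    # 3 examples, identical context, 3 targets
pg = pagewise_example(context, page)      # 1 example, 9 supervised target tokens
```

In the point-wise form the model is asked to map one input to three conflicting answers; in the page-wise form the same information becomes one consistent target sequence with more supervised tokens per forward pass.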

What carries the argument

The combination of Page-wise NTP training objective, asymmetric Token Merger, and GRPO-SR reinforcement learning procedure in a single decoder-only architecture.

If this is right

  • Page-wise supervision yields denser gradient signals and avoids point-wise training ambiguities.
  • The Token Merger roughly halves prompt length with little loss in accuracy.
  • GRPO-SR improves policy alignment with user preferences while maintaining training stability.
  • Live deployment produces higher click counts and transaction counts than the prior system.
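As a rough illustration of the Token Merger bullet above, the sketch below compresses a prompt by concatenating adjacent token embeddings and projecting them back to the model width with one linear map. The group size of 2, the weight shape, and the function names are assumptions chosen to match the reported ~2X reduction; the paper's actual merger may differ. "Asymmetric" is taken to mean that only the prompt side is merged, while decoding still emits one Semantic ID token at a time.

```python
import random
from typing import List

Vector = List[float]


def linear(weight: List[Vector], x: Vector) -> Vector:
    """Plain matrix-vector product; weight has shape (d_out, d_in)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def merge_prompt(embeddings: List[Vector], weight: List[Vector], group: int = 2) -> List[Vector]:
    """Concatenate each run of `group` token embeddings and project back to d."""
    assert len(embeddings) % group == 0, "pad the prompt to a multiple of group"
    merged = []
    for start in range(0, len(embeddings), group):
        concat = [v for emb in embeddings[start:start + group] for v in emb]
        merged.append(linear(weight, concat))
    return merged


d, group = 4, 2                           # hypothetical model width and merge factor
rng = random.Random(0)
prompt = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(8)]            # 8 token embeddings
weight = [[rng.gauss(0, 0.1) for _ in range(d * group)] for _ in range(d)]  # shape (d, group * d)

compressed = merge_prompt(prompt, weight, group)  # 4 vectors of width d
```

The prefill cost then scales with the compressed length, while the output vocabulary and decoding steps are untouched.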

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This generative setup might eventually consolidate retrieval and ranking into one model pass for efficiency.
  • Page-wise objectives could extend to other domains with paginated or batched sequential data.
  • The hybrid reward design offers a template for preventing hacking in other reinforcement learning applications to recommendation.
  • Ablation results would clarify which component contributes most to the gains.

Load-bearing premise

The lifts in click and transaction counts are caused by the new Page-wise NTP, Token Merger, and GRPO-SR components instead of other unstated production changes or chance.

What would settle it

An online experiment that adds or removes the proposed components one at a time, with all other factors held constant, and checks whether the performance improvements appear or disappear accordingly.
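One way to lay out such an experiment is a leave-one-out design: a full arm, one arm per removed component, and the existing pipeline as baseline. The sketch below only enumerates the arms; the flag names are hypothetical and do not come from the paper.

```python
from typing import Dict, Iterable

COMPONENTS = ("page_wise_ntp", "token_merger", "grpo_sr")  # hypothetical flag names


def leave_one_out_arms(components: Iterable[str]) -> Dict[str, Dict[str, bool]]:
    """Build the full arm, one arm per disabled component, and a baseline arm."""
    components = tuple(components)
    full = {c: True for c in components}
    arms = {"full": full}
    for c in components:
        arm = dict(full)
        arm[c] = False                    # switch off exactly one component
        arms[f"without_{c}"] = arm
    arms["baseline"] = {c: False for c in components}
    return arms


arms = leave_one_out_arms(COMPONENTS)
for name, flags in arms.items():
    print(name, flags)
```

If the lift shrinks when a component is disabled and every other setting is held fixed, that component is carrying part of the gain.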

Figures

Figures reproduced from arXiv: 2604.14878 by Binglei Zhao, Jiabao Gao, Junbo Qi, Kewei Xu, Lunsong Huang, Shengjie Li, Sulong Xu, Xuanhua Yang, Yanyan Zou, Yu Li.

Figure 1: Model architecture of GenRec. High-dimensional items are quantized into Semantic IDs. To enhance efficiency, an ...
Figure 2: SFT loss curves. (a) Page-wise NTP converges faster than NTP. (b) Larger models achieve lower loss with diminishing ...
Original abstract

Generative Retrieval (GR) offers a promising paradigm for recommendation through next-token prediction (NTP). However, scaling it to large-scale industrial systems introduces three challenges: (i) within a single request, the identical model inputs may produce inconsistent outputs due to the pagination request mechanism; (ii) the prohibitive cost of encoding long user behavior sequences with multi-token item representations based on semantic IDs, and (iii) aligning the generative policy with nuanced user preference signals. We present GenRec, a preference-oriented generative framework deployed on the JD App that addresses above challenges within a single decoder-only architecture. For training objective, we propose Page-wise NTP task, which supervises over an entire interaction page rather than each interacted item individually, providing denser gradient signal and resolving the one-to-many ambiguity of point-wise training. On the prefilling side, an asymmetric linear Token Merger compresses multi-token Semantic IDs in the prompt while preserving full-resolution decoding, reducing input length by ~2X with negligible accuracy loss. To further align outputs with user satisfaction, we introduce GRPO-SR, a reinforcement learning method that pairs Group Relative Policy Optimization with NLL regularization for training stability, and employs Hybrid Rewards combining a dense reward model with a relevance gate to mitigate reward hacking. In month-long online A/B tests serving production traffic, GenRec achieves 9.5% improvement in click count and 8.7% in transaction count over the existing pipeline.
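The GRPO-SR description in the abstract can be made concrete with a small sketch. Only the group-relative standardisation follows the published GRPO recipe; the hard-threshold relevance gate, the additive NLL term, the population standard deviation, and all numbers are assumptions made for illustration, not the authors' formulation.

```python
from statistics import mean, pstdev
from typing import List


def hybrid_reward(dense_score: float, relevance: float, threshold: float = 0.5) -> float:
    """Dense reward passed through a hard relevance gate (assumed form):
    candidates judged irrelevant earn nothing, however high the dense score."""
    return dense_score if relevance >= threshold else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantage: standardise each reward within its sampled group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def regularised_loss(policy_loss: float, nll: float, weight: float = 0.1) -> float:
    """Policy loss plus a weighted NLL term on logged positives (assumed form)."""
    return policy_loss + weight * nll


dense = [0.9, 0.4, 0.8, 0.2]      # hypothetical reward-model scores for 4 samples
relevance = [0.1, 0.9, 0.7, 0.8]  # hypothetical relevance scores for the same samples
rewards = [hybrid_reward(s, r) for s, r in zip(dense, relevance)]
advantages = group_relative_advantages(rewards)
```

The first candidate has the highest dense score but fails the gate, so it ends up with the lowest advantage in the group. That is the behaviour a relevance gate is meant to produce when the dense reward model can be gamed.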

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces GenRec, a decoder-only generative retrieval framework for large-scale recommendation. It proposes Page-wise NTP to supervise entire interaction pages and resolve one-to-many ambiguity, an asymmetric linear Token Merger to compress multi-token semantic IDs in prompts by ~2X, and GRPO-SR (Group Relative Policy Optimization with NLL regularization and hybrid dense+relevance rewards) to align the policy with user satisfaction. The system is deployed on the JD App; the central empirical claim is that month-long online A/B tests on production traffic yield 9.5% higher click count and 8.7% higher transaction count versus the existing pipeline.

Significance. If the reported lifts are attributable to the three proposed components, the work would be significant for demonstrating a production-scale generative retrieval system that directly tackles pagination inconsistency, encoding cost, and preference alignment within a single architecture. The use of live A/B tests on real traffic supplies direct outcome evidence rather than relying solely on offline metrics, which is a clear strength.

major comments (1)
  1. [Online A/B tests description] The description of the online A/B tests (abstract and Experiments section) reports 9.5% and 8.7% lifts but supplies no traffic-split ratio, p-value or confidence interval, statement that the control arm was held fixed, or online ablation isolating Page-wise NTP, Token Merger, and GRPO-SR. Without these controls the observed deltas cannot be confidently attributed to the proposed methods rather than system drift or unmentioned changes.
minor comments (1)
  1. [Abstract] Abstract: 'addresses above challenges' should read 'addresses the above challenges'.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on the reporting of our online A/B tests. We agree that additional experimental details are needed to strengthen attribution of the observed lifts and will incorporate them in the revision.

Point-by-point responses
  1. Referee: [Online A/B tests description] The description of the online A/B tests (abstract and Experiments section) reports 9.5% and 8.7% lifts but supplies no traffic-split ratio, p-value or confidence interval, statement that the control arm was held fixed, or online ablation isolating Page-wise NTP, Token Merger, and GRPO-SR. Without these controls the observed deltas cannot be confidently attributed to the proposed methods rather than system drift or unmentioned changes.

    Authors: We acknowledge that the current manuscript omits several key details required for rigorous interpretation of the production A/B results. In the revised version we will expand the Experiments section (and update the abstract if space permits) to report: (1) the traffic-split ratio (50/50), (2) p-values and 95% confidence intervals for the 9.5% click-count and 8.7% transaction-count lifts, (3) an explicit statement that the control arm remained unchanged for the full month-long test window, and (4) any component-wise online ablation results we can obtain or have already run. While full isolation of each proposed component (Page-wise NTP, Token Merger, GRPO-SR) in live traffic is operationally costly, we will include the strongest available ablation evidence and note any limitations. These additions will allow readers to more confidently attribute the gains to the proposed methods.

    revision: yes
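For readers who want to see what the promised p-values and confidence intervals would involve, here is a minimal sketch using only the Python standard library. It assumes independent arms and a normal approximation, uses the delta method for the ratio of means, and runs on made-up per-user summary statistics chosen only so that the point estimate equals 9.5%. None of these numbers come from the paper, and `relative_lift_ci` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def relative_lift_ci(mean_c, var_c, n_c, mean_t, var_t, n_t, level=0.95):
    """Relative lift of treatment over control with a delta-method interval
    and a two-sided z-test on the difference in means (independent arms)."""
    lift = mean_t / mean_c - 1.0
    se_c2, se_t2 = var_c / n_c, var_t / n_t   # squared standard errors of the means
    # Delta method for the ratio of two independent sample means.
    se_ratio = math.sqrt(se_t2 / mean_c**2 + mean_t**2 * se_c2 / mean_c**4)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    z_stat = (mean_t - mean_c) / math.sqrt(se_c2 + se_t2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return lift, (lift - z_crit * se_ratio, lift + z_crit * se_ratio), p_value


# Made-up clicks-per-user statistics for a 50/50 split; illustration only.
lift, (lo, hi), p = relative_lift_ci(
    mean_c=2.00, var_c=9.0, n_c=200_000,
    mean_t=2.19, var_t=9.5, n_t=200_000,
)
print(f"lift={lift:.3%}  CI=({lo:.3%}, {hi:.3%})  p={p:.3g}")
```

A production analysis would also need to handle the randomisation unit, repeated measures over the month, and multiple metrics; this sketch covers only the basic interval.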

Circularity Check

0 steps flagged

No circularity: claims rest on external A/B tests and novel components

Full rationale

The paper motivates three engineering challenges in scaling generative retrieval, then introduces Page-wise NTP (supervising entire pages), asymmetric Token Merger (compressing semantic IDs), and GRPO-SR (RL with hybrid rewards) inside a decoder-only model. These are presented as design choices, not derived predictions. Validation comes from month-long production A/B tests measuring click and transaction lifts against the existing pipeline. No equations, fitted parameters, or self-citations are shown to reduce the central claims to inputs by construction. The performance numbers are measured externally and do not collapse into self-referential definitions or renamings of known results. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on standard assumptions of the generative retrieval paradigm and reinforcement learning stability techniques; no new free parameters, axioms, or invented entities are introduced beyond those already present in the prior literature referenced by the abstract.

axioms (1)
  • Domain assumption: next-token prediction on item sequences can capture user preference signals at scale.
    Implicit foundation of the generative retrieval approach described in the abstract.

pith-pipeline@v0.9.0 · 5588 in / 1295 out tokens · 31337 ms · 2026-05-10T09:55:12.626637+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Conditional Memory Enhanced Item Representation for Generative Recommendation

    cs.IR · 2026-05 · unverdicted · novelty 6.0

    ComeIR introduces dual-level Engram memory and memory-restoring prediction to reconstruct SID-token embeddings and restore token granularity in generative recommendation.
