Pith · machine review for the scientific record

arXiv: 2601.22925 · v2 · submitted 2026-01-30 · 💻 cs.IR · cs.AI · cs.LG

Recognition: 2 theorem links


BEAR: Towards Beam-Search-Aware Optimization for Recommendation with Large Language Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 09:22 UTC · model grok-4.3

classification 💻 cs.IR · cs.AI · cs.LG
keywords LLM recommendation · beam search · fine-tuning · training-inference gap · regularization · top-B pruning

The pith

BEAR adds a regularization term to LLM fine-tuning that keeps every token of a positive item inside the top-B candidates at each decoding step.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard supervised fine-tuning for large language models in recommendation optimizes the full probability of positive items yet leaves them vulnerable to early pruning by beam search at inference time. The inconsistency arises because beam search discards sequences whose prefix probabilities fall outside the top-B candidates even when the final item probability is high. BEAR enforces the necessary condition that each token of a positive item must rank in the top-B at every step, without simulating full beam search during training. This change incurs almost no extra cost while raising the chance that correct items survive pruning. Experiments on four datasets show consistent gains over strong baselines that use plain supervised fine-tuning.

Core claim

BEAR enforces a relaxed necessary condition: each token in a positive item must rank within the top-B candidate tokens at each decoding step. This objective mitigates the risk of incorrect pruning while adding negligible computational overhead compared with standard supervised fine-tuning.

What carries the argument

A beam-search-aware regularization term added to the supervised fine-tuning loss that penalizes any step where a positive-item token falls outside the top-B candidates.
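The mechanism can be pictured with a minimal sketch (illustrative only; the paper's exact loss, weighting, and margin handling may differ): at each decoding step, penalize the positive token whenever its logit falls below the B-th largest candidate logit, which is the beam-search pruning threshold.

```python
def bear_style_penalty(logits_per_step, positive_ids, B, margin=0.0):
    """Hinge-style stand-in for a beam-search-aware regularizer.

    logits_per_step: per-step candidate logits (one list per decoding step)
    positive_ids: index of the positive item's token at each step
    B: beam size; the B-th largest logit acts as the pruning threshold
    """
    penalty = 0.0
    for logits, pos in zip(logits_per_step, positive_ids):
        threshold = sorted(logits, reverse=True)[B - 1]  # B-th largest logit
        # nonzero only when the positive token sits outside the top-B
        penalty += max(0.0, threshold - logits[pos] + margin)
    return penalty
```

In training, a term like this would be added with some weight to the usual SFT cross-entropy; a smooth surrogate (e.g. softplus instead of the hinge) would keep it differentiable everywhere.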

Load-bearing premise

Keeping positive-item tokens inside the top-B at every training step is sufficient to stop them from being pruned by beam search at inference and does not degrade other ranking metrics.

What would settle it

Run inference beam search on held-out positive items and count how many that satisfied the top-B token condition during BEAR training are still discarded, then compare that count with the same items under plain supervised fine-tuning.
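One way to run that check, sketched on a toy table of per-step token probabilities (a hypothetical stand-in for an LLM, which would condition each step on the prefix): decode with beam search and test whether the positive sequence survives. The toy table also shows why the per-step top-B condition is necessary but not sufficient: token 1 ranks in the top-2 at both steps, yet the sequence (1, 1) is pruned on cumulative score.

```python
def beam_search(step_probs, B):
    """Plain beam search over a fixed per-step token-probability table.
    Sequence scores are products of per-step probabilities."""
    beams = [((), 1.0)]
    for probs in step_probs:
        expanded = [(seq + (tok,), score * p)
                    for seq, score in beams
                    for tok, p in enumerate(probs)]
        expanded.sort(key=lambda item: item[1], reverse=True)
        beams = expanded[:B]  # greedy pruning: keep only the top-B prefixes
    return [seq for seq, _ in beams]

def survives(step_probs, target, B):
    """Does the target item survive beam-search pruning?"""
    return tuple(target) in beam_search(step_probs, B)

# Token 1 is in the top-2 at every step, yet (1, 1) is pruned at B = 2
# because its cumulative score (0.09) loses to (0, 0) and (0, 1).
table = [[0.5, 0.3, 0.2], [0.5, 0.3, 0.2]]
```

The proposed experiment would run this survival test on held-out positives for both BEAR- and SFT-trained models and compare the pruned counts.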

Figures

Figures reproduced from arXiv: 2601.22925 by Bohao Wang, Canghong Jin, Can Wang, Jiawei Chen, Jingbang Chen, Shengjia Zhang, Weiqin Yang, Zhenxiang Xu.

Figure 1. Illustration of beam search applied to SFT-trained LLMs. view at source ↗
Figure 2. Instruction prompt template. view at source ↗
Figure 5. Running time comparison of three training objectives. view at source ↗
Figure 6. Pruning Rate (PR@K) comparison between SFT and BEAR. During beam search with beam size B, PR@K measures the proportion of incorrectly pruned positive items with high overall probabilities (i.e., top-K ranked among candidates). A lower PR@K indicates better training-inference alignment. Results show that BEAR significantly reduces PR@K compared to SFT. view at source ↗
Figure 7. Performance of applying BEAR to S-DPO [10]. In addition to the original S-DPO loss, we apply the positive BEAR regularization to the positive items and the negative BEAR regularization to the sampled negative items. Results show that BEAR significantly enhances the performance of S-DPO across all datasets. view at source ↗
Figure 9. Training efficiency comparison between BEAR and … view at source ↗
read the original abstract

Recent years have seen a rapid surge in research leveraging Large Language Models (LLMs) for recommendation. These methods typically employ supervised fine-tuning (SFT) to adapt LLMs to recommendation scenarios, and utilize beam search during inference to efficiently retrieve $B$ top-ranked recommended items. However, we identify a critical training-inference inconsistency: while SFT optimizes the overall probability of positive items, it does not guarantee that such items will be retrieved by beam search even if they possess high overall probabilities. Due to the greedy pruning mechanism, beam search can prematurely discard a positive item once its prefix probability is insufficient. To address this inconsistency, we propose BEAR (Beam-SEarch-Aware Regularization), a novel fine-tuning objective that explicitly accounts for beam search behavior during training. Rather than directly simulating beam search for each instance during training, which is computationally prohibitive, BEAR enforces a relaxed necessary condition: each token in a positive item must rank within the top-$B$ candidate tokens at each decoding step. This objective effectively mitigates the risk of incorrect pruning while incurring negligible computational overhead compared to standard SFT. Extensive experiments across four real-world datasets demonstrate that BEAR significantly outperforms strong baselines. Code is available at https://github.com/Tiny-Snow/BEAR-SIGIR-2026 .

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper identifies a training-inference mismatch in LLM-based recommendation: standard SFT optimizes the overall probability of positive items but does not prevent their premature pruning during beam search at inference when prefix probabilities fall below competing paths. It introduces BEAR, which augments the fine-tuning loss with a regularization term enforcing that each token of a positive item ranks among the top-B candidates at every decoding step. This is presented as a relaxed necessary condition that mitigates incorrect pruning at negligible extra cost. Experiments across four real-world datasets report that BEAR outperforms strong baselines.

Significance. If the local top-B condition proves sufficient to preserve positive sequences under actual beam search and does not degrade other ranking metrics, BEAR would offer a lightweight, practical alignment between training and inference for generative recommenders. The availability of code and the multi-dataset evaluation are positive factors that would strengthen adoption if the empirical claims are robustly supported.

major comments (2)
  1. [§3] §3 (Method), the definition of the BEAR objective: the manuscript asserts that enforcing per-token top-B ranking at each step mitigates the risk of incorrect pruning, yet provides no derivation or counter-example analysis demonstrating that this local condition guarantees survival of the full sequence under global beam search (where pruning decisions depend on cumulative log-probabilities across competing beams). The skeptic concern that a sequence can satisfy the per-step condition while still being outranked cumulatively is not addressed.
  2. [Experiments] Experiments section: results claim significant outperformance, but the reported tables lack full details on exact metric values for all baselines, statistical significance tests (e.g., paired t-tests or Wilcoxon), and ablation isolating BEAR from standard SFT. Without these, the central empirical claim that the relaxed condition improves retrieval quality remains only moderately supported.
minor comments (2)
  1. [Abstract] Abstract and §1: the phrase 'significantly outperforms' should be accompanied by concrete improvement magnitudes or primary metrics to allow readers to gauge practical impact without reading the full results.
  2. [§3] Notation in §3: the exact formulation of the regularization term (e.g., how the top-B ranking is turned into a loss) could be clarified with a short pseudocode snippet or explicit equation reference to avoid ambiguity in implementation.
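For the record, one plausible shape such a formulation could take (our guess at the requested pseudocode, not the paper's actual equation): smooth the top-B ranking constraint with a softplus so the penalty is differentiable.

```python
import math

def soft_top_b_loss(logits, pos_id, B, tau=1.0):
    """Softplus surrogate for 'the positive token ranks in the top-B':
    near zero when the positive logit clears the B-th largest candidate
    logit, growing roughly linearly as it falls below that threshold.
    Hypothetical sketch; the paper's loss may be defined differently."""
    threshold = sorted(logits, reverse=True)[B - 1]
    gap = (threshold - logits[pos_id]) / tau
    return tau * math.log1p(math.exp(gap))  # softplus(gap)
```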

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, clarifying our approach where appropriate and outlining planned revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: [§3] §3 (Method), the definition of the BEAR objective: the manuscript asserts that enforcing per-token top-B ranking at each step mitigates the risk of incorrect pruning, yet provides no derivation or counter-example analysis demonstrating that this local condition guarantees survival of the full sequence under global beam search (where pruning decisions depend on cumulative log-probabilities across competing beams). The skeptic concern that a sequence can satisfy the per-step condition while still being outranked cumulatively is not addressed.

    Authors: We thank the referee for highlighting this distinction. The BEAR objective enforces a relaxed necessary condition: if any token of a positive item falls outside the top-B candidates at its decoding step, the corresponding path is immediately pruned by beam search, independent of later cumulative scores. This directly targets the premature pruning issue identified in the paper. We agree that the condition is not sufficient to guarantee the sequence will survive global beam search, as cumulative log-probabilities across competing beams ultimately determine retention. In the revised manuscript we will expand §3 to explicitly state this limitation, include a brief counter-example showing a case where per-step top-B ranking holds yet the full sequence is outranked cumulatively, and explain why the local condition nonetheless provides practical mitigation at negligible cost. No change to the core objective is required. revision: partial

  2. Referee: Experiments section: results claim significant outperformance, but the reported tables lack full details on exact metric values for all baselines, statistical significance tests (e.g., paired t-tests or Wilcoxon), and ablation isolating BEAR from standard SFT. Without these, the central empirical claim that the relaxed condition improves retrieval quality remains only moderately supported.

    Authors: We agree that additional experimental details will strengthen the empirical claims. In the revised version we will expand the tables to report exact metric values for every baseline, add statistical significance results using paired t-tests (with p-values) computed over multiple random seeds, and include a dedicated ablation study that isolates the BEAR regularization term from standard SFT. These updates will be placed in the Experiments section and will directly address the concern about moderate support for the central claim. revision: yes

Circularity Check

0 steps flagged

No circularity: BEAR objective is constructed directly from beam-search mechanics

full rationale

The paper's central derivation introduces BEAR as a regularization term that enforces each positive-item token to lie in the top-B candidates at its decoding step. This condition is defined explicitly from the standard beam-search pruning rule (select B highest cumulative-score paths) and does not reduce to any fitted parameter, self-citation chain, or quantity defined inside the paper itself. The objective is therefore an independent modeling choice rather than a tautological restatement of its inputs. No load-bearing step collapses by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The method rests on the standard domain assumption that beam search prunes on prefix probabilities and that the top-B condition is a useful proxy; no free parameters are introduced and no new entities are postulated.

axioms (1)
  • domain assumption Beam search decoding prunes candidates based on cumulative prefix probability
    Invoked when describing the pruning risk in the abstract and motivation sections.

pith-pipeline@v0.9.0 · 5557 in / 1100 out tokens · 26240 ms · 2026-05-16T09:22:49.445962+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

82 extracted references · 82 canonical work pages · 10 internal anchors

  1. [1]

    Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023).

  2. [2]

    Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, et al. 2023. Qwen technical report. arXiv preprint arXiv:2309.16609 (2023).

  3. [3]

    Keqin Bao, Jizhi Zhang, Wenjie Wang, Yang Zhang, Zhengyi Yang, Yanchen Luo, Chong Chen, Fuli Feng, and Qi Tian. 2025. A bi-step grounding paradigm for large language models in recommendation systems. ACM Transactions on Recommender Systems 3, 4 (2025), 1–27.

  4. [4]

    Keqin Bao, Jizhi Zhang, Yang Zhang, Xinyue Huo, Chong Chen, and Fuli Feng. 2024. Decoding matters: Addressing amplification bias and homogeneity issue in recommendations for large language models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 10540–10552.

  6. [6]

    Keqin Bao, Jizhi Zhang, Yang Zhang, Wenjie Wang, Fuli Feng, and Xiangnan He. 2023. Tallrec: An effective and efficient tuning framework to align large language model with recommendation. In Proceedings of the 17th ACM Conference on Recommender Systems. 1007–1014.

  7. [7]

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020), 1877–1901.

  8. [8]

    Guohao Cai, Jieming Zhu, Quanyu Dai, Zhenhua Dong, Xiuqiang He, Ruiming Tang, and Rui Zhang. 2022. Reloop: A self-correction continual learning loop for recommender systems. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2692–2697.

  9. [9]

    Jianxin Chang, Chen Gao, Yu Zheng, Yiqun Hui, Yanan Niu, Yang Song, Depeng Jin, and Yong Li. 2021. Sequential recommendation with graph neural networks. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 378–387.

  10. [10]

    Xu Chen, Hongteng Xu, Yongfeng Zhang, Jiaxi Tang, Yixin Cao, Zheng Qin, and Hongyuan Zha. 2018. Sequential recommendation with user memory networks. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. 108–116.

  11. [11]

    Yuxin Chen, Junfei Tan, An Zhang, Zhengyi Yang, Leheng Sheng, Enzhi Zhang, Xiang Wang, and Tat-Seng Chua. 2024. On softmax direct preference optimization for recommendation. Advances in Neural Information Processing Systems 37 (2024), 27463–27489.

  12. [12]

    Yu Cui, Feng Liu, Pengbo Wang, Bohao Wang, Heng Tang, Yi Wan, Jun Wang, and Jiawei Chen. 2024. Distillation matters: empowering sequential recommenders to match the performance of large language models. In Proceedings of the 18th ACM Conference on Recommender Systems. 507–517.

  13. [13]

    Zeyu Cui, Jianxin Ma, Chang Zhou, Jingren Zhou, and Hongxia Yang. 2022. M6-rec: Generative pretrained language models are open-ended recommender systems. arXiv preprint arXiv:2205.08084 (2022).

  14. [14]

    Sunhao Dai, Ninglu Shao, Haiyuan Zhao, Weijie Yu, Zihua Si, Chen Xu, Zhongxiang Sun, Xiao Zhang, and Jun Xu. 2023. Uncovering ChatGPT's capabilities in recommender systems. In Proceedings of the 17th ACM Conference on Recommender Systems. 1126–1132.

  15. [15]

    Markus Freitag and Yaser Al-Onaizan. 2017. Beam Search Strategies for Neural Machine Translation. In Proceedings of the First Workshop on Neural Machine Translation. 56–60.

  16. [16]

    Chongming Gao, Ruijun Chen, Shuai Yuan, Kexin Huang, Yuanqing Yu, and Xiangnan He. 2025. Sprec: Self-play to debias llm-based recommendation. In Proceedings of the ACM on Web Conference 2025. 5075–5084

  17. [17]

    Yunfan Gao, Tao Sheng, Youlin Xiang, Yun Xiong, Haofen Wang, and Jiawei Zhang. 2023. Chat-rec: Towards interactive and explainable LLMs-augmented recommender system. arXiv preprint arXiv:2303.14524 (2023).

  18. [18]

    Shijie Geng, Shuchang Liu, Zuohui Fu, Yingqiang Ge, and Yongfeng Zhang. 2022. Recommendation as language processing (RLP): A unified pretrain, personalized prompt & predict paradigm (P5). In Proceedings of the 16th ACM Conference on Recommender Systems. 299–315.

  19. [19]

    Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozière, David Lopez-Paz, and Gabriel Synnaeve. 2024. Better & faster large language models via multi-token prediction. arXiv preprint arXiv:2404.19737 (2024).

  20. [20]

    Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. 2024. The llama 3 herd of models. arXiv preprint arXiv:2407.21783 (2024).

  21. [21]

    Ruining He and Julian McAuley. 2016. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proceedings of the 25th International Conference on World Wide Web. 507–517.

  22. [22]

    Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2015. Session-based recommendations with recurrent neural networks. arXiv preprint arXiv:1511.06939 (2015).

  24. [24]

    Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997), 1735–1780.

  25. [25]

    Yupeng Hou, Junjie Zhang, Zihan Lin, Hongyu Lu, Ruobing Xie, Julian McAuley, and Wayne Xin Zhao. 2024. Large language models are zero-shot rankers for recommender systems. In European Conference on Information Retrieval. Springer, 364–381.

  26. [26]

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. 2022. Lora: Low-rank adaptation of large language models. ICLR 1, 2 (2022), 3.

  27. [27]

    Wang-Cheng Kang and Julian McAuley. 2018. Self-attentive sequential recommendation. In 2018 IEEE International Conference on Data Mining (ICDM). IEEE, 197–206.

  28. [28]

    Sein Kim, Hongseok Kang, Seungyoon Choi, Donghyun Kim, Minchul Yang, and Chanyoung Park. 2024. Large language models meet collaborative filtering: An efficient all-round LLM-based recommender system. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 1395–1406.

  29. [29]

    Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 25 (2012).

  30. [30]

    Jiacheng Li, Yujie Wang, and Julian McAuley. 2020. Time interval aware self-attention for sequential recommendation. In Proceedings of the 13th International Conference on Web Search and Data Mining. 322–330.

  31. [31]

    Xiaoxi Li, Jiajie Jin, Yujia Zhou, Yuyao Zhang, Peitian Zhang, Yutao Zhu, and Zhicheng Dou. 2025. From matching to generation: A survey on generative information retrieval. ACM Transactions on Information Systems 43, 3 (2025), 1–62.

  32. [32]

    Xinyi Li, Yongfeng Zhang, and Edward C Malthouse. 2023. Exploring fine-tuning ChatGPT for news recommendation. arXiv preprint arXiv:2311.05850 (2023).

  33. [33]

    Jiayi Liao, Xiangnan He, Ruobing Xie, Jiancan Wu, Yancheng Yuan, Xingwu Sun, Zhanhui Kang, and Xiang Wang. 2024. Rosepo: Aligning LLM-based recommenders with human values. arXiv preprint arXiv:2410.12519 (2024).

  34. [34]

    Jiayi Liao, Sihang Li, Zhengyi Yang, Jiancan Wu, Yancheng Yuan, Xiang Wang, and Xiangnan He. 2024. Llara: Large language-recommendation assistant. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1785–1795

  35. [35]

    Zijie Lin, Yang Zhang, Xiaoyan Zhao, Fengbin Zhu, Fuli Feng, and Tat-Seng Chua. 2025. IGD: Token Decisiveness Modeling via Information Gain in LLMs for Personalized Recommendation. arXiv preprint arXiv:2506.13229 (2025).

  36. [36]

    Enze Liu, Bowen Zheng, Xiaolei Wang, Wayne Xin Zhao, Jinpeng Wang, Sheng Chen, and Ji-Rong Wen. 2025. LARES: Latent Reasoning for Sequential Recommendation. arXiv preprint arXiv:2505.16865 (2025).

  37. [37]

    Junling Liu, Chao Liu, Peilin Zhou, Renjie Lv, Kang Zhou, and Yan Zhang. 2023. Is ChatGPT a good recommender? A preliminary study. arXiv preprint arXiv:2304.10149 (2023).

  39. [39]

    Qijiong Liu, Nuo Chen, Tetsuya Sakai, and Xiao-Ming Wu. 2024. Once: Boosting content-based recommendation with both open- and closed-source large language models. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining. 452–461.

  40. [40]

    Ze Liu, Jin Zhang, Chao Feng, Defu Lian, Jie Wang, and Enhong Chen. 2024. Learning Deep Tree-based Retriever for Efficient Recommendation: Theory and Method. arXiv preprint arXiv:2408.11345 (2024).

  41. [41]

    Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017).

  42. [42]

    Sijin Lu, Zhibo Man, Fangyuan Luo, and Jun Wu. 2025. Dual Debiasing in LLM-based Recommendation. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2685–2689.

  43. [43]

    Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton Van Den Hengel. 2015. Image-based recommendations on styles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 43–52.

  45. [45]

    Fei Mi, Xiaoyu Lin, and Boi Faltings. 2020. Ader: Adaptively distilled exemplar replay towards continual learning for session-based recommendation. In Proceedings of the 14th ACM Conference on Recommender Systems. 408–413.

  46. [46]

    Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. 2022. Training language models to follow instructions with human feedback. Advances in neural information processing systems 35 (2022), 27730–27744.

  47. [47]

    Weizhen Qi, Yeyun Gong, Yu Yan, Jian Jiao, Bo Shao, Ruofei Zhang, Houqiang Li, Nan Duan, and Ming Zhou. 2020. Prophetnet-ads: A looking ahead strategy for generative retrieval models in sponsored search engine. In CCF International Conference on Natural Language Processing and Chinese Computing. Springer, 305–317.

  48. [48]

    Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI blog 1, 8 (2019), 9.

  49. [49]

    Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. 2023. Direct preference optimization: Your language model is secretly a reward model. Advances in neural information processing systems 36 (2023), 53728–53741.

  50. [50]

    Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2383–2392.

  51. [51]

    Wentao Shi, Xiangnan He, Yang Zhang, Chongming Gao, Xinyue Li, Jizhi Zhang, Qifan Wang, and Fuli Feng. 2024. Large language models are learnable planners for long-term recommendation. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1893–1903.

  52. [52]

    Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. 2019. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 1441–1450.

  54. [54]

    Zhongxiang Sun, Zihua Si, Xiaoxue Zang, Kai Zheng, Yang Song, Xiao Zhang, and Jun Xu. 2024. Large language models enhanced collaborative filtering. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. 2178–2188.

  55. [55]

    Jiakai Tang, Sunhao Dai, Teng Shi, Jun Xu, Xu Chen, Wen Chen, Jian Wu, and Yuning Jiang. 2025. Think before recommend: Unleashing the latent reasoning power for sequential recommendation. arXiv preprint arXiv:2503.22675 (2025).

  56. [56]

    Jiaxi Tang and Ke Wang. 2018. Personalized top-n sequential recommendation via convolutional sequence embedding. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. 565–573.

  57. [57]

    Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. 2023. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023).

  58. [58]

    Vladimir Vapnik. 2013. The nature of statistical learning theory. Springer Science & Business Media.

  59. [59]

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017).

  60. [60]

    Ashwin Vijayakumar, Michael Cogswell, Ramprasaath Selvaraju, Qing Sun, Stefan Lee, David Crandall, and Dhruv Batra. 2018. Diverse beam search for improved description of complex scenes. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32.

  61. [61]

    Bohao Wang, Feng Liu, Jiawei Chen, Xingyu Lou, Changwang Zhang, Jun Wang, Yuegang Sun, Yan Feng, Chun Chen, and Can Wang. 2025. Msl: Not all tokens are what you need for tuning LLM as a recommender. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1912–1922.

  62. [62]

    Bohao Wang, Feng Liu, Changwang Zhang, Jiawei Chen, Yudi Wu, Sheng Zhou, Xingyu Lou, Jun Wang, Yan Feng, Chun Chen, and Can Wang. 2025. LLM4DSR: Leveraging Large Language Model for Denoising Sequential Recommendation. ACM Transactions on Information Systems (Aug. 2025). Just Accepted.

  63. [63]

    Lei Wang and Ee-Peng Lim. 2023. Zero-shot next-item recommendation using large pretrained language models. arXiv preprint arXiv:2304.03153 (2023).

  64. [64]

    Zhefan Wang, Weizhi Ma, and Min Zhang. 2024. To recommend or not: Recommendability identification in conversations with pre-trained language models. In International Conference on Database Systems for Advanced Applications. Springer, 19–35.

  65. [65]

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems 35 (2022), 24824–24837.

  66. [66]

    Wei Wei, Chao Huang, Lianghao Xia, and Chuxu Zhang. 2023. Multi-modal self-supervised learning for recommendation. In Proceedings of the ACM Web Conference 2023. 790–800.

  67. [67]

    Jiancan Wu, Xiang Wang, Xingyu Gao, Jiawei Chen, Hongcheng Fu, and Tianyu Qiu. 2024. On the effectiveness of sampled softmax loss for item recommendation. ACM Transactions on Information Systems 42, 4 (2024), 1–26.

  68. [68]

    Shiguang Wu, Zhaochun Ren, Xin Xin, Jiyuan Yang, Mengqi Zhang, Zhumin Chen, Maarten de Rijke, and Pengjie Ren. 2025. Constrained Auto-Regressive Decoding Constrains Generative Retrieval. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2429–2440.

  69. [69]

    Xue Xia, Pong Eksombatchai, Nikil Pancha, Dhruvil Deven Badani, Po-Wei Wang, Neng Gu, Saurabh Vishwas Joshi, Nazanin Farahpour, Zhiyuan Zhang, and Andrew Zhai. 2023. Transact: Transformer-based realtime user action model for recommendation at pinterest. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 5249–5259.

  70. [70]

    An Yang, Beichen Zhang, Binyuan Hui, Bofei Gao, Bowen Yu, Chengpeng Li, Dayiheng Liu, Jianhong Tu, Jingren Zhou, Junyang Lin, et al. 2024. Qwen2.5-Math technical report: Toward mathematical expert model via self-improvement. arXiv preprint arXiv:2409.12122 (2024).

  71. [71]

    Weiqin Yang, Jiawei Chen, Xin Xin, Sheng Zhou, Binbin Hu, Yan Feng, Chun Chen, and Can Wang. 2024. PSL: Rethinking and Improving Softmax Loss from Pairwise Perspective for Recommendation. Advances in Neural Information Processing Systems 37 (2024), 120974–121006.

  72. [72]

    Weiqin Yang, Jiawei Chen, Shengjia Zhang, Peng Wu, Yuegang Sun, Yan Feng, Chun Chen, and Can Wang. 2025. Breaking the Top-K Barrier: Advancing Top-K Ranking Metrics Optimization in Recommender Systems. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2. 3542–3552.

  73. [73]

    Zhengyi Yang, Xiangnan He, Jizhi Zhang, Jiancan Wu, Xin Xin, Jiawei Chen, and Xiang Wang. 2023. A generic learning framework for sequential recommendation with distribution shifts. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. 331–340.

  74. [74]

    Hansi Zeng, Chen Luo, Bowen Jin, Sheikh Muhammad Sarwar, Tianxin Wei, and Hamed Zamani. 2024. Scalable and effective generative information retrieval. In Proceedings of the ACM Web Conference 2024. 1441–1452

  75. [75]

    Hansi Zeng, Chen Luo, and Hamed Zamani. 2024. Planning ahead in generative retrieval: Guiding autoregressive generation through simultaneous decoding. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. 469–480.

  76. [76]

    Jiaqi Zhai, Lucy Liao, Xing Liu, Yueming Wang, Rui Li, Xuan Cao, Leon Gao, Zhaojie Gong, Fangda Gu, Michael He, et al. 2024. Actions speak louder than words: Trillion-parameter sequential transducers for generative recommendations. arXiv preprint arXiv:2402.17152 (2024).

  77. [77]

    Junjie Zhang, Ruobing Xie, Yupeng Hou, Xin Zhao, Leyu Lin, and Ji-Rong Wen. 2025. Recommendation as instruction following: A large language model empowered recommendation approach. ACM Transactions on Information Systems 43, 5 (2025), 1–37.

  79. [79]

    Yang Zhang, Juntao You, Yimeng Bai, Jizhi Zhang, Keqin Bao, Wenjie Wang, and Tat-Seng Chua. 2024. Causality-enhanced behavior sequence modeling in LLMs for personalized recommendation. arXiv preprint arXiv:2410.22809 (2024).

  80. [80]

    Xin Zhou, Hongyu Zhou, Yong Liu, Zhiwei Zeng, Chunyan Miao, Pengwei Wang, Yuan You, and Feijun Jiang. 2023. Bootstrap latent representations for multi-modal recommendation. In Proceedings of the ACM Web Conference 2023. 845–854.

Showing first 80 references.