Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations
Pith reviewed 2026-05-13 19:34 UTC · model grok-4.3
The pith
Reformulating recommendations as generative sequential transduction with the HSTU architecture lets models scale to 1.5 trillion parameters, with quality improving as a power law of training compute.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Generative Recommenders built on HSTU achieve up to 65.8 percent higher NDCG than baselines on public and synthetic data and run 5.3x to 15.2x faster than FlashAttention2-based transformers on long (8192-length) sequences. Scaled to 1.5 trillion parameters, they deliver 12.4 percent metric lifts in live A/B tests while exhibiting power-law quality gains with training compute up to the GPT-3/LLaMa-2 scale.
What carries the argument
HSTU, a transformer-style architecture specialized for high-cardinality non-stationary streaming recommendation data that performs the core sequential transduction step inside the generative modeling framework.
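HSTU's exact design is specified in the paper; as a rough, hypothetical sketch of the kind of mechanism it describes (pointwise SiLU attention with elementwise gating, replacing softmax attention plus a separate FFN), the following shows a simplified single-head layer. The relative position bias, multi-head structure, and other details are omitted, and the weight shapes here are illustrative assumptions:

```python
import numpy as np

def silu(x):
    # SiLU nonlinearity, used pointwise here in place of softmax attention
    return x / (1.0 + np.exp(-x))

def hstu_block(x, w_in, w_out, eps=1e-6):
    """Simplified, single-head HSTU-style layer (illustrative sketch).

    x:     (seq_len, d) input embeddings
    w_in:  (d, 4*d) projection producing U, V, Q, K after SiLU
    w_out: (d, d) output projection
    """
    n, d = x.shape
    # One projection, split into gate (U), values (V), queries (Q), keys (K)
    u, v, q, k = np.split(silu(x @ w_in), 4, axis=-1)
    # Pointwise SiLU attention, causally masked and normalized by sequence
    # length rather than softmax; relative position bias omitted.
    scores = silu(q @ k.T)
    causal = np.tril(np.ones((n, n)))
    attn = (scores * causal) / n
    av = attn @ v
    # Layer-normalize the aggregated values, then gate elementwise with U.
    av_norm = (av - av.mean(-1, keepdims=True)) / (av.std(-1, keepdims=True) + eps)
    return (av_norm * u) @ w_out
```

Because the causal mask restricts each position to its prefix, the layer can process an entire action sequence as one training example, which is what makes the transduction framing sample-efficient at long sequence lengths.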
If this is right
- Future recommendation models can be improved primarily by scaling training compute rather than by hand-engineering new feature interactions.
- Production systems can host trillion-parameter models on surfaces used by billions of daily users while still meeting latency constraints.
- Development cycles for new surfaces require fewer manual iterations because quality improves predictably with additional compute.
- Carbon cost per incremental quality gain drops because the same scaling curve applies across three orders of magnitude.
Where Pith is reading between the lines
- The observed power-law suggests recommendation systems may support the same pre-training and fine-tuning paradigm now common in language models.
- High-cardinality sequential data in other domains such as advertising or content moderation could adopt the same generative transduction framing.
- If the scaling continues, the field could converge on a small number of foundational recommendation backbones rather than many task-specific DLRMs.
Load-bearing premise
That casting recommendation as generative sequential transduction over action sequences with HSTU captures the essential user-behavior dynamics without creating artifacts absent from conventional DLRM training.
What would settle it
A head-to-head run in which an HSTU generative model trained on identical data and compute budget shows no metric advantage over a strong DLRM baseline, or larger-scale experiments that deviate from the reported power-law fit.
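The second test above has a cheap internal variant: fit the power law on the smaller-compute runs only and check whether the largest run lands on the extrapolated curve. A minimal sketch, using synthetic numbers (the compute values and exponent below are hypothetical, not the paper's reported fit):

```python
import numpy as np

def fit_power_law(compute, quality):
    """Least-squares fit of quality ~ a * compute**b in log-log space.
    Returns (a, b). The choice of compute axis (FLOPs, tokens, or
    GPU-hours) is exactly the ambiguity the referee flags."""
    b, log_a = np.polyfit(np.log(compute), np.log(quality), 1)
    return np.exp(log_a), b

# Hypothetical scaling points spanning three orders of magnitude.
compute = np.array([1e18, 1e19, 1e20, 1e21])
quality = 0.02 * compute ** 0.08   # synthetic data lying exactly on a power law

# Fit on the three smallest runs, hold out the largest as an extrapolation test.
a, b = fit_power_law(compute[:3], quality[:3])
predicted = a * compute[-1] ** b
rel_err = abs(predicted - quality[-1]) / quality[-1]
# On clean power-law data rel_err is essentially zero; a large rel_err on
# real measurements would be evidence of deviation from the reported fit.
```

The same holdout procedure applied to the paper's actual scaling points would make the "larger-scale experiments that deviate" criterion quantitative.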
Original abstract
Large-scale recommendation systems are characterized by their reliance on high cardinality, heterogeneous features and the need to handle tens of billions of user actions on a daily basis. Despite being trained on huge volume of data with thousands of features, most Deep Learning Recommendation Models (DLRMs) in industry fail to scale with compute. Inspired by success achieved by Transformers in language and vision domains, we revisit fundamental design choices in recommendation systems. We reformulate recommendation problems as sequential transduction tasks within a generative modeling framework ("Generative Recommenders"), and propose a new architecture, HSTU, designed for high cardinality, non-stationary streaming recommendation data. HSTU outperforms baselines over synthetic and public datasets by up to 65.8% in NDCG, and is 5.3x to 15.2x faster than FlashAttention2-based Transformers on 8192 length sequences. HSTU-based Generative Recommenders, with 1.5 trillion parameters, improve metrics in online A/B tests by 12.4% and have been deployed on multiple surfaces of a large internet platform with billions of users. More importantly, the model quality of Generative Recommenders empirically scales as a power-law of training compute across three orders of magnitude, up to GPT-3/LLaMa-2 scale, which reduces carbon footprint needed for future model developments, and further paves the way for the first foundational models in recommendations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reformulates recommendation systems as generative sequential transduction tasks and introduces the HSTU architecture tailored to high-cardinality, non-stationary streaming data. It reports up to 65.8% NDCG gains over baselines on synthetic and public datasets, 5.3x–15.2x faster inference than FlashAttention-2 Transformers on 8192-length sequences, deployment of a 1.5-trillion-parameter HSTU-based model yielding 12.4% metric lifts in online A/B tests on a large platform, and empirical power-law scaling of model quality with training compute across three orders of magnitude up to GPT-3/LLaMA-2 scale.
Significance. If the scaling behavior and A/B gains hold under controlled conditions, the work would demonstrate that recommendation models can follow compute-driven scaling laws analogous to those in language modeling, potentially enabling foundational recommendation models while lowering the carbon cost of future development. The reported real-world deployment on surfaces serving billions of users provides direct evidence of practical impact.
major comments (3)
- [Abstract] The claim that Generative Recommenders 'empirically scales as a power-law of training compute across three orders of magnitude' lacks an explicit definition of the compute axis (FLOPs, effective tokens, or wall-clock GPU-hours) and supplies no tabulated scaling points with error bars or side-by-side curves for compute-matched DLRM or standard Transformer baselines trained on the identical recommendation stream; without these controls the architectural contribution to the observed scaling cannot be isolated from raw capacity or data-volume effects.
- [Experiments] Synthetic and public dataset results: the reported 65.8% NDCG improvement and 5.3x–15.2x speedups are presented without ablation studies that hold model size, data volume, and training regime fixed while varying only the generative transduction reformulation against standard DLRM or Transformer baselines; this leaves open whether the gains arise from the HSTU design or from differences in training scale and data.
- [Online A/B Tests] The 12.4% metric improvement for the 1.5-trillion-parameter model is stated without naming the precise metrics (e.g., NDCG@K, CTR), the surfaces involved, or statistical significance and confidence intervals, which are load-bearing for the deployment claim.
minor comments (1)
- [Abstract] The datasets used for the 65.8% NDCG result are described only as 'synthetic and public' without explicit names or references; adding these details would improve reproducibility.
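For reference, the NDCG metric that both the headline result and the referee's comments lean on takes a particularly simple form in the usual sequential-recommendation evaluation, where each user has a single held-out ground-truth item. A minimal sketch of that single-target case (not the paper's evaluation code):

```python
import math

def ndcg_at_k(ranked_items, relevant_item, k):
    """NDCG@K for the common single-target setup in sequential
    recommendation: one held-out ground-truth item per user.
    With a single relevant item the ideal DCG is 1, so NDCG reduces
    to 1/log2(rank + 1) if the item appears in the top K, else 0."""
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item == relevant_item:
            return 1.0 / math.log2(rank + 1)
    return 0.0
```

For example, a ground-truth item ranked second in the model's list scores `1/log2(3) ≈ 0.631`, and reported NDCG numbers are averages of this quantity over the test users.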
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback on our manuscript. We address each of the major comments below and outline the revisions we plan to make to strengthen the paper.
Point-by-point responses
- Referee: [Abstract] the claim that Generative Recommenders 'empirically scales as a power-law of training compute across three orders of magnitude' lacks an explicit definition of the compute axis (FLOPs, effective tokens, or wall-clock GPU-hours) and supplies no tabulated scaling points with error bars or side-by-side curves for compute-matched DLRM or standard Transformer baselines trained on the identical recommendation stream; without these controls the architectural contribution to the observed scaling cannot be isolated from raw capacity or data-volume effects.
  Authors: We agree that clarifying the compute axis and providing a more detailed scaling analysis would strengthen the paper. In the revised version we will explicitly define the compute axis as the number of training FLOPs, and we will include a table listing the scaling points with error bars from multiple runs where available. We will also add plots comparing our model's scaling curve to compute-matched baselines where feasible; training full baselines at trillion-parameter scale on the same stream is computationally prohibitive, which is why we focused on our architecture's scaling behavior. This will help isolate the contributions. (Revision: yes)
- Referee: [Experiments] Synthetic and public dataset results: the reported 65.8% NDCG improvement and 5.3x–15.2x speedups are presented without ablation studies that hold model size, data volume, and training regime fixed while varying only the generative transduction reformulation against standard DLRM or Transformer baselines; this leaves open whether the gains arise from the HSTU design or from differences in training scale and data.
  Authors: The comparisons in the experiments section evaluate the end-to-end performance of the generative reformulation with HSTU against standard DLRM and Transformer baselines under comparable training conditions on the same datasets. To directly address the concern, we will add ablation studies in the revised manuscript that fix model size, data volume, and training regime, isolating the effect of the generative transduction approach versus traditional setups. This will clarify the source of the improvements. (Revision: yes)
- Referee: [Online A/B Tests] The 12.4% metric improvement for the 1.5-trillion-parameter model is stated without naming the precise metrics (e.g., NDCG@K, CTR), the surfaces involved, or statistical significance and confidence intervals, which are load-bearing for the deployment claim.
  Authors: We acknowledge that additional detail on the A/B tests would improve transparency. Due to the proprietary nature of the platform and business considerations, we are limited in disclosing the exact surfaces and specific metric definitions. The 12.4% improvement refers to key user engagement metrics, and the tests were conducted with sufficient statistical power to achieve significance at p < 0.01. In the revision we will provide more context on the metric types (e.g., ranking quality and click-through rates) and confidence intervals where possible without compromising confidentiality. We believe this supports the practical-impact claim. (Revision: partial)
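The rebuttal's appeal to statistical power can be made concrete with a standard two-proportion z-test, one common way engagement-rate lifts are assessed in online experiments. The counts below are hypothetical (chosen so the relative lift is 12.4%), and the paper does not disclose its actual test procedure:

```python
import math

def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Z-test for a difference in two engagement/conversion rates,
    plus a 95% CI on the absolute lift (illustrative; not the
    paper's disclosed methodology)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    # Pooled rate for the null-hypothesis standard error
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # 95% CI on the lift, using the unpooled standard error
    se_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    lift = p_b - p_a
    return z, (lift - 1.96 * se_diff, lift + 1.96 * se_diff)

# Hypothetical counts: 5.00% control rate vs 5.62% treatment rate,
# i.e. a 12.4% relative lift, on one million users per arm.
z, (ci_lo, ci_hi) = two_proportion_ztest(50_000, 1_000_000, 56_200, 1_000_000)
```

At these (assumed) traffic volumes the z-statistic is far above the 2.576 threshold for p < 0.01, which is why billion-user platforms can resolve lifts of this size; reporting the interval `(ci_lo, ci_hi)` is what the referee asks for.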
Circularity Check
No significant circularity: empirical scaling and architecture claims rest on external benchmarks
Full rationale
The paper's central claims involve reformulating recommendations as generative sequential transduction tasks and introducing the HSTU architecture, with quality improvements and power-law scaling reported from direct experiments on synthetic, public, and production A/B test data. No load-bearing steps reduce predictions to fitted inputs by construction, invoke self-citations for uniqueness theorems, or smuggle ansatzes via prior work; the scaling observation is presented as an empirical pattern measured across compute regimes rather than derived from self-referential definitions. The derivation chain is therefore self-contained against the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: recommendation problems can be reformulated as sequential transduction tasks within a generative modeling framework.
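The domain assumption above can be illustrated concretely: under the generative framing, each user's chronological action stream becomes autoregressive (context, next action) training pairs, analogous to next-token prediction in language modeling. A schematic sketch only; the paper's actual pipeline also interleaves heterogeneous features and timestamps:

```python
def transduction_pairs(actions, max_len=8192):
    """Turn one user's chronological action stream into autoregressive
    (context, next_action) training examples, mirroring next-token
    prediction (a schematic reading of the Generative Recommenders
    framing, not the paper's exact data pipeline)."""
    seq = actions[-max_len:]  # keep the most recent actions
    return [(seq[:i], seq[i]) for i in range(1, len(seq))]

# e.g. transduction_pairs(["item_12", "item_7", "item_42"])
# yields ([item_12] -> item_7) and ([item_12, item_7] -> item_42)
```

In this framing, ranking and retrieval both reduce to predicting the next element of the sequence, which is what lets a single architecture replace separately engineered DLRM stages.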
Lean theorems connected to this paper
- IndisputableMonolith.Foundation.PhiForcing.phi_equation (unclear)
  "HSTU-based Generative Recommenders, with 1.5 trillion parameters, improve metrics in online A/B tests by 12.4% and have been deployed on multiple surfaces of a large internet platform with billions of users. More importantly, the model quality of Generative Recommenders empirically scales as a power-law of training compute across three orders of magnitude, up to GPT-3/LLaMa-2 scale."
- IndisputableMonolith.Foundation.LedgerForcing.conservation_from_balance (unclear)
  "We reformulate recommendation problems as sequential transduction tasks within a generative modeling framework (Generative Recommenders), and propose a new architecture, HSTU, designed for high cardinality, non-stationary streaming recommendation data."
Forward citations
Cited by 22 Pith papers
- UniRank: Unified List-wise Reranking via Confidence-Ordered Denoising
  UniRank unifies autoregressive and non-autoregressive list-wise reranking via bidirectional modeling in a confidence-ordered iterative denoising process, outperforming baselines on datasets and online tests.
- Sample Is Feature: Beyond Item-Level, Toward Sample-Level Tokens for Unified Large Recommender Models
  SIF encodes full historical raw samples as tokens via hierarchical quantization to preserve sample context and unify sequential/non-sequential features in large recommender models.
- GenRec: A Preference-Oriented Generative Framework for Large-Scale Recommendation
  GenRec combines page-wise NTP, token compression, and GRPO-SR reinforcement learning to scale generative retrieval, delivering 9.5% click and 8.7% transaction gains in production A/B tests on the JD App.
- IAT: Instance-As-Token Compression for Historical User Sequence Modeling in Industrial Recommender Systems
  IAT compresses each historical interaction instance into a unified embedding token via temporal-order or user-order schemes, allowing standard sequence models to learn long-range preferences with better performance an...
- Next-Scale Generative Reranking: A Tree-based Generative Rerank Method at Meituan
  NSGR is a tree-structured generative reranker that progressively generates optimal lists via next-scale expansion and multi-scale neighbor loss to balance perspectives and align training signals.
- Tencent Advertising Algorithm Challenge 2025: All-Modality Generative Recommendation
  Releases TencentGR-1M and TencentGR-10M datasets with baselines for all-modality generative recommendation in advertising, including weighted evaluation for conversions.
- Conditional Memory Enhanced Item Representation for Generative Recommendation
  ComeIR introduces dual-level Engram memory and memory-restoring prediction to reconstruct SID-token embeddings and restore token granularity in generative recommendation.
- UxSID: Semantic-Aware User Interests Modeling for Ultra-Long Sequence
  UxSID uses Semantic IDs and dual-level attention for semantic-group shared interest memory to efficiently model ultra-long user sequences, claiming SOTA performance and 0.337% revenue lift in advertising A/B tests.
- An Embarrassingly Simple Graph Heuristic Reveals Shortcut-Solvable Benchmarks for Sequential Recommendation
  A simple graph heuristic without training or sequence encoders matches or outperforms trained generative recommenders on 10 of 14 sequential recommendation benchmarks by exploiting local transition and feature shortcuts.
- Bridging Textual Profiles and Latent User Embeddings for Personalization
  BLUE aligns LLM-generated textual user profiles with embedding-based recommendation objectives via reinforcement learning and next-item text supervision, yielding better zero-shot performance and cross-domain transfer...
- CapsID: Soft-Routed Variable-Length Semantic IDs for Generative Recommendation
  CapsID uses probabilistic capsule routing and confidence-based termination to generate variable-length semantic IDs, improving recall by 9.6% over strong baselines with half the latency of dual-representation systems.
- Position-Aware Drafting for Inference Acceleration in LLM-Based Generative List-Wise Recommendation
  PAD-Rec augments standard draft models with item-position and step-position embeddings plus learnable gates, delivering up to 3.1x wall-clock speedup and 5% average gain over strong speculative-decoding baselines on f...
- RoTE: Coarse-to-Fine Multi-Level Rotary Time Embedding for Sequential Recommendation
  RoTE is a multi-level rotary time embedding module that explicitly models time spans in sequential recommendation and improves NDCG@5 by up to 20.11% when added to standard backbones on public benchmarks.
- UniRec: Bridging the Expressive Gap between Generative and Discriminative Recommendation via Chain-of-Attribute
  UniRec bridges the expressive gap in generative recommendation by prefixing semantic ID sequences with structured attribute tokens, recovering explicit feature crossing and yielding +22.6% HR@50 gains plus online lift...
- MBGR: Multi-Business Prediction for Generative Recommendation at Meituan
  MBGR is a new generative recommendation framework using business-aware semantic IDs, multi-business prediction, and label dynamic routing to handle multiple businesses without seesaw effects or representation confusio...
- TwiSTAR: Think Fast, Think Slow, Then Act, Generative Recommendation with Adaptive Reasoning
  TwiSTAR learns to switch between fast SID retrieval and slow rationale-generating reasoning in generative recommendation, yielding better accuracy-latency trade-offs on three datasets.
- PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents
  PACEvolve++ uses a phase-adaptive reinforcement learning advisor to decouple hypothesis selection from execution in LLM-driven evolutionary search, delivering faster convergence than prior frameworks on load balancing...
- Harmonizing Generative Retrieval and Ranking in Chain-of-Recommendation
  RecoChain unifies generative candidate generation via hierarchical semantic IDs and SIM-based ranking in a single Transformer to improve top-K recommendation performance.
- PRAGMA: Revolut Foundation Model
  PRAGMA pre-trains a Transformer on heterogeneous banking events with a tailored self-supervised masked objective, yielding embeddings that support strong downstream performance on credit scoring, fraud detection, and ...
- A Cascaded Generative Approach for e-Commerce Recommendations
  A cascaded generative system for e-commerce recommendations using theme and keyword generation with teacher-student fine-tuning achieves a 2.7% lift in cart adds per page view.
- RecGPT-Mobile: On-Device Large Language Models for User Intent Understanding in Taobao Feed Recommendation
  RecGPT-Mobile runs a compact LLM on phones to understand evolving user intent from behaviors and improve mobile e-commerce recommendations.