IAT: Instance-As-Token Compression for Historical User Sequence Modeling in Industrial Recommender Systems
Pith reviewed 2026-05-10 17:16 UTC · model grok-4.3
The pith
Compressing each historical interaction into one token enables better sequence modeling in recommender systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that compressing all features of each historical interaction instance into a unified instance embedding in a first stage, via temporal-order or user-order schemes, and then applying standard sequence modeling to the compressed tokens in a second stage, allows the framework to significantly outperform state-of-the-art methods in modeling long-range preferences while exhibiting superior transferability.
What carries the argument
The two-stage Instance-As-Token (IAT) compression mechanism that reduces each multi-feature interaction to a single compact token for efficient sequence processing.
Load-bearing premise
That compressing all features of each historical interaction instance into a single unified instance embedding preserves sufficient information for downstream sequence modeling without critical loss.
What would settle it
Observing that a model using uncompressed raw features or alternative compression methods achieves equal or higher performance on the same industrial datasets would indicate that the IAT compression does not provide the claimed benefits.
Original abstract
Although sophisticated sequence modeling paradigms have achieved remarkable success in recommender systems, the information capacity of hand-crafted sequential features constrains the performance upper bound. To better enhance user experience by encoding historical interaction patterns, this paper presents a novel two-stage sequence modeling framework termed Instance-As-Token (IAT). The first stage of IAT compresses all features of each historical interaction instance into a unified instance embedding, which encodes the interaction characteristics in a compact yet informative token. Both temporal-order and user-order compression schemes are proposed, with the latter better aligning with the demands of downstream sequence modeling. The second stage involves the downstream task fetching fixed-length compressed instance tokens via timestamps and adopting standard sequence modeling approaches to learn long-range preference patterns. Extensive experiments demonstrate that IAT significantly outperforms state-of-the-art methods and exhibits superior in-domain and cross-domain transferability. IAT has been successfully deployed in real-world industrial recommender systems, including e-commerce advertising, shopping mall marketing, and live-streaming e-commerce, delivering substantial improvements in key business metrics.
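The review does not reproduce the paper's implementation. As a rough illustration of the two-stage shape the abstract describes, a minimal PyTorch sketch might look like the following; every module name, dimension, and architectural choice here is an assumption for illustration, not the authors' design.

```python
import torch
import torch.nn as nn

class InstanceCompressor(nn.Module):
    """Stage 1 sketch: compress all feature fields of one interaction
    instance into a single token embedding."""
    def __init__(self, num_fields: int, field_dim: int, token_dim: int):
        super().__init__()
        # Concatenate per-field embeddings, then project to one token.
        self.proj = nn.Sequential(
            nn.Linear(num_fields * field_dim, token_dim),
            nn.GELU(),
            nn.Linear(token_dim, token_dim),
        )

    def forward(self, field_embs: torch.Tensor) -> torch.Tensor:
        # field_embs: (batch, num_fields, field_dim) -> (batch, token_dim)
        return self.proj(field_embs.flatten(start_dim=1))

class IATSequenceModel(nn.Module):
    """Stage 2 sketch: a standard Transformer encoder over the
    fixed-length sequence of compressed instance tokens."""
    def __init__(self, token_dim: int, num_layers: int = 2, num_heads: int = 4):
        super().__init__()
        # token_dim must be divisible by num_heads.
        layer = nn.TransformerEncoderLayer(
            d_model=token_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, token_dim); mean-pool to a user state.
        return self.encoder(tokens).mean(dim=1)
```

In use, Stage 1 would run once per interaction to populate a token store, and Stage 2 would consume the fetched token sequence, e.g. `user_state = seq_model(torch.stack([compressor(x) for x in history], dim=1))`.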
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Instance-As-Token (IAT), a two-stage framework for historical user sequence modeling in recommender systems. Stage 1 compresses all features of each historical interaction instance into a single unified instance embedding (token) via temporal-order or user-order schemes; Stage 2 feeds the resulting fixed-length token sequence to standard sequence models for long-range preference learning. The authors claim that IAT significantly outperforms state-of-the-art methods, shows superior in-domain and cross-domain transferability, and has been deployed in industrial systems (e-commerce advertising, shopping mall marketing, live-streaming e-commerce) with substantial business-metric gains.
Significance. If the first-stage compression demonstrably preserves intra-instance feature interactions, IAT would offer a practical route to higher-capacity sequence modeling in industrial recommenders, converting variable-length, multi-feature histories into compact, fixed-length token sequences while improving both accuracy and transfer. The reported real-world deployments are a notable practical strength, provided the experimental details support them.
major comments (2)
- [Abstract / first-stage description] The central claim that the unified instance embedding 'encodes the interaction characteristics in a compact yet informative token' is load-bearing for all performance and deployment assertions, yet no reconstruction error, mutual-information, or per-feature ablation results are supplied to test whether cross-feature dependencies within a single interaction are retained. Without such evidence the observed lifts could arise from downstream architecture choices or data differences rather than the IAT compression itself (a sketch of one such probe follows this list).
- [Experiments and deployment claims] The abstract asserts 'extensive experiments' and successful industrial deployment with 'substantial improvements in key business metrics,' but the provided text supplies neither baseline details, data-split protocols, statistical significance tests, nor comparisons against richer multi-feature sequence models. These omissions prevent verification that the reported gains are robust and attributable to IAT.
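One concrete form the missing evidence could take (a hypothetical probe, not reported in the paper): freeze the Stage-1 compressor and train a small decoder to reconstruct each input field from the instance token; persistently high error on a field would suggest the token discards that feature.

```python
import torch
import torch.nn as nn

def reconstruction_probe(compressor: nn.Module, field_embs: torch.Tensor,
                         token_dim: int, steps: int = 500) -> torch.Tensor:
    """Train a linear decoder to recover the per-field inputs from the
    frozen instance token; return per-field mean squared error."""
    batch, num_fields, field_dim = field_embs.shape
    with torch.no_grad():                       # keep Stage 1 frozen
        tokens = compressor(field_embs)         # (batch, token_dim)
    decoder = nn.Linear(token_dim, num_fields * field_dim)
    opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)
    for _ in range(steps):
        recon = decoder(tokens).view(batch, num_fields, field_dim)
        loss = ((recon - field_embs) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        recon = decoder(tokens).view(batch, num_fields, field_dim)
        # High error on a field suggests the token discarded that feature.
        return ((recon - field_embs) ** 2).mean(dim=(0, 2))
```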
minor comments (1)
- [Method description] The distinction between the temporal-order and user-order compression schemes would be clearer if the paper included a short pseudocode or explicit algorithmic description of how timestamps and user-ordering are used to produce the fixed-length token sequence; a speculative sketch of what such pseudocode might look like follows.
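Since the paper's actual schemes are not given here, the following Python sketch is only a guess at what such pseudocode could look like: a token store keyed by (user, timestamp), with the fetch truncating or left-padding to a fixed length. Under a temporal-order scheme the store would be written in global event-time order and gathered per user at fetch time; a user-order scheme would instead group tokens per user at compression time so histories are read contiguously.

```python
from typing import Dict, List, Tuple

Token = List[float]  # one compressed instance embedding

def fetch_instance_tokens(token_store: Dict[Tuple[str, int], Token],
                          user_id: str, history_ts: List[int],
                          seq_len: int) -> List[Token]:
    """Fetch precomputed instance tokens by (user, timestamp), keep the
    most recent `seq_len`, and left-pad to a fixed length."""
    ordered = sorted(history_ts)[-seq_len:]      # oldest -> newest
    tokens = [token_store[(user_id, ts)] for ts in ordered]
    dim = len(tokens[0]) if tokens else 0
    padding = [[0.0] * dim] * (seq_len - len(tokens))
    return padding + tokens
```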
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below with clarifications and proposed revisions to strengthen the presentation of evidence for the IAT compression and experimental details.
Point-by-point responses
- Referee: [Abstract / first-stage description] The central claim that the unified instance embedding 'encodes the interaction characteristics in a compact yet informative token' is load-bearing for all performance and deployment assertions, yet no reconstruction error, mutual-information, or per-feature ablation results are supplied to test whether cross-feature dependencies within a single interaction are retained. Without such evidence the observed lifts could arise from downstream architecture choices or data differences rather than the IAT compression itself.
Authors: We acknowledge that explicit metrics such as reconstruction error or mutual information for the instance embeddings are not reported. The downstream performance gains across multiple datasets and the industrial results provide indirect support that the unified tokens retain key interaction characteristics, particularly under the user-order scheme, which aligns features for sequence modeling. To directly address this, we will add per-feature ablation studies and comparisons against models that retain individual features without compression in the revised manuscript (an illustrative sketch of such an ablation appears after these responses). revision: yes
- Referee: [Experiments and deployment claims] The abstract asserts 'extensive experiments' and successful industrial deployment with 'substantial improvements in key business metrics,' but the provided text supplies neither baseline details, data-split protocols, statistical significance tests, nor comparisons against richer multi-feature sequence models. These omissions prevent verification that the reported gains are robust and attributable to IAT.
Authors: The full manuscript details the experimental protocols, including baselines (e.g., DIN, DIEN, and other sequence models), chronological data splits, and statistical significance via paired t-tests. However, to improve clarity and verifiability, we will expand the experiments section with explicit comparisons to richer multi-feature sequence models and additional specifics on the industrial A/B tests, including exact business metrics and deployment settings. revision: partial
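To make the promised revisions concrete, two illustrative sketches follow; both are our assumptions about a plausible protocol, not the authors' code. First, the simplest form of a per-feature ablation: zero out one feature field at a time and measure the loss increase, where `model` is a hypothetical callable from field embeddings to per-example click logits.

```python
import torch
import torch.nn.functional as F

def per_feature_ablation(model, field_embs: torch.Tensor,
                         labels: torch.Tensor) -> dict:
    """Zero out one feature field at a time and record the increase in
    evaluation loss; a large increase suggests the instance token
    depends on that field."""
    with torch.no_grad():
        base = F.binary_cross_entropy_with_logits(model(field_embs), labels)
        deltas = {}
        for f in range(field_embs.shape[1]):
            ablated = field_embs.clone()
            ablated[:, f, :] = 0.0               # mask field f
            loss = F.binary_cross_entropy_with_logits(model(ablated), labels)
            deltas[f] = (loss - base).item()
    return deltas
```

Second, the paired significance test the authors describe is cheap to run once per-split metrics exist; the AUC values below are placeholders, not results from the paper.

```python
from scipy import stats

# Placeholder per-run AUCs (NOT from the paper): each pair shares a
# data split and random seed, hence a paired test is appropriate.
iat_auc      = [0.7421, 0.7435, 0.7418, 0.7440, 0.7429]
baseline_auc = [0.7388, 0.7401, 0.7395, 0.7399, 0.7392]

t_stat, p_value = stats.ttest_rel(iat_auc, baseline_auc)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```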
Circularity Check
No significant circularity in IAT derivation or claims
Full rationale
The paper introduces an empirical two-stage framework: stage one compresses per-interaction features into a single instance embedding via temporal-order or user-order schemes, and stage two feeds the resulting token sequence into standard sequence models for preference learning. No equations, loss functions, or performance metrics reduce by construction to the compression step itself or to fitted parameters renamed as predictions. Claims of outperformance and industrial deployment rest on experimental results rather than self-referential definitions or load-bearing self-citations. The framework is evaluated against external benchmarks, with no self-definitional, fitted-input, or ansatz-smuggling patterns detected.
Axiom & Free-Parameter Ledger
free parameters (1)
- instance embedding dimension and compression parameters
axioms (1)
- domain assumption: All features of a historical interaction instance can be compressed into a single token that retains the interaction characteristics needed for downstream preference modeling.
Forward citations
Cited by 1 Pith paper
- Sample Is Feature: Beyond Item-Level, Toward Sample-Level Tokens for Unified Large Recommender Models
SIF encodes full historical raw samples as tokens via hierarchical quantization to preserve sample context and unify sequential/non-sequential features in large recommender models.
Reference graph
Works this paper leans on
- [1] Zheng Chai, Qin Ren, Xijun Xiao, Huizhi Yang, Bo Han, Sijun Zhang, Di Chen, Hui Lu, Wenlin Zhao, Lele Yu, et al. 2025. Longer: Scaling up long sequence modeling in industrial recommenders. In Proceedings of the Nineteenth ACM Conference on Recommender Systems. 247–256.
- [2] Jianxin Chang, Chenbin Zhang, Zhiyi Fu, Xiaoxue Zang, Lin Guan, Jing Lu, Yiqun Hui, Dewei Leng, Yanan Niu, Yang Song, et al. 2023. TWIN: TWo-stage Interest Network for lifelong user behavior modeling in CTR prediction at Kuaishou. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 3785–3794.
- [4] Jiaxin Deng, Shiyao Wang, Kuo Cai, Lejian Ren, Qigen Hu, Weifeng Ding, Qiang Luo, and Guorui Zhou. 2025. OneRec: Unifying retrieve and rank with generative recommender and iterative preference alignment. arXiv preprint arXiv:2502.18965 (2025).
- [6] Lin Guan, Jia-Qi Yang, Zhishan Zhao, Beichuan Zhang, Bo Sun, Xuanyuan Luo, Jinan Ni, Xiaowen Li, Yuhang Qi, Zhifang Fan, et al. 2025. Make It Long, Keep It Fast: End-to-End 10k-Sequence Modeling at Billion Scale on Douyin. arXiv preprint arXiv:2511.06077 (2025).
- [8] Ruidong Han, Bin Yin, Shangyu Chen, He Jiang, Fei Jiang, Xiang Li, Chi Ma, Mincong Huang, Xiaoguang Li, Chunzhen Jing, et al. 2025. MTGR: Industrial-scale generative recommendation framework in Meituan. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management. 5731–5738.
- [11] Dan Hendrycks and Kevin Gimpel. 2016. Gaussian Error Linear Units (GELUs). arXiv preprint arXiv:1606.08415 (2016).
- [12] Yupeng Hou, Shanlei Mu, Wayne Xin Zhao, Yaliang Li, Bolin Ding, and Ji-Rong Wen. 2022. Towards universal sequence representation learning for recommender systems. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 585–593.
- [14] Andrew Jaegle, Felix Gimeno, Andy Brock, Oriol Vinyals, Andrew Zisserman, and Joao Carreira. 2021. Perceiver: General perception with iterative attention. In International Conference on Machine Learning. PMLR, 4651–4664.
- [15] Pengyue Jia, Yejing Wang, Zhaocheng Du, Xiangyu Zhao, Yichao Wang, Bo Chen, Wanyu Wang, Huifeng Guo, and Ruiming Tang. 2024. ERASE: Benchmarking feature selection methods for deep recommender systems. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 5194–5205.
- [17] Wang-Cheng Kang and Julian McAuley. 2018. Self-attentive sequential recommendation. In 2018 IEEE International Conference on Data Mining (ICDM). IEEE, 197–206.
- [18] Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361 (2020).
- [20] Weijiang Lai, Beihong Jin, Jiongyan Zhang, Yiyuan Zheng, Jian Dong, Jia Cheng, Jun Lei, and Xingxing Wang. 2025. Exploring Scaling Laws of CTR Model for Online Performance Improvement. In Proceedings of the Nineteenth ACM Conference on Recommender Systems. 114–123.
- [22] Zhiwei Liu, Ziwei Fan, Yu Wang, and Philip S Yu. 2021. Augmenting sequential recommendation with pseudo-prior items via reversely pre-training transformer. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1608–1612.
- [23] Xiao Lv, Jiangxia Cao, Shijie Guan, Xiaoyou Zhou, Zhiguang Qi, Yaqiang Zang, Ben Wang, and Guorui Zhou. 2025. MARM: Unlocking the Recommendation Cache Scaling-Law through Memory Augmentation and Scalable Complexity. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management. 2022–2031.
- [24] Wenhan Lyu, Devashish Tyagi, Yihang Yang, Ziwei Li, Ajay Somani, Karthikeyan Shanmugasundaram, Nikola Andrejevic, Ferdi Adeputra, Curtis Zeng, Arun K Singh, et al. 2025. DV365: Extremely Long User History Modeling at Instagram. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2. 4717–4727.
- [25] Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan Hulikal Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Tran, Jonah Samost, et al. 2023. Recommender systems with generative retrieval. Advances in Neural Information Processing Systems 36 (2023), 10299–10315.
- [28] Benedikt Schifferer, Chris Deotte, and Even Oldridge. 2020. Tutorial: Feature engineering for recommender systems. In Proceedings of the 14th ACM Conference on Recommender Systems. 754–755.
- [29] Zihua Si, Lin Guan, ZhongXiang Sun, Xiaoxue Zang, Jing Lu, Yiqun Hui, Xingchao Cao, Zeyu Yang, Yichen Zheng, Dewei Leng, et al. 2024. TWIN V2: Scaling ultra-long user behavior sequence modeling for enhanced CTR prediction at Kuaishou. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. 4890–4897.
- [30] Uriel Singer, Haggai Roitman, Yotam Eshel, Alexander Nus, Ido Guy, Or Levi, Idan Hasson, and Eliyahu Kiperwasser. 2022. Sequential modeling with multiple attributes for watchlist recommendation in e-commerce. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining. 937–946.
- [31] Xin Song, Xiaochen Li, Jinxin Hu, Hong Wen, Zulong Chen, Yu Zhang, Xiaoyi Zeng, and Jing Zhang. 2025. LREA: Low-rank efficient attention on modeling long-term user behaviors for CTR prediction. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2843–2847.
- [32] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017).
- [33] Hu Wan, Yun Huang, Shuhan Bai, Xuan Sun, Tei-Wei Kuo, and Chun Jason Xue. 2025. Rabbitail: A Tail Latency-Aware Scheduler for Deep Learning Recommendation Systems with Hierarchical Embedding Storage. In Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing. 279–287.
- [35] Shuhan Wang, Bin Shen, Xu Min, Yong He, Xiaolu Zhang, Liang Zhang, Jun Zhou, and Linjian Mo. 2024. Aligned side information fusion method for sequential recommendation. In Companion Proceedings of the ACM Web Conference 2024. 112–120.
- [36] Zhuoxing Wei, Qi Liu, and Qingchen Xie. 2025. Deep Multiple Quantization Network on Long Behavior Sequence for Click-Through Rate Prediction. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. 3090–3094.
- [38] Xue Xia, Saurabh Joshi, Kousik Rajesh, Kangnan Li, Yangyi Lu, Nikil Pancha, Dhruvil Badani, Jiajing Xu, and Pong Eksombatchai. 2025. TransAct V2: Lifelong User Action Sequence Modeling on Pinterest Recommendation. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management. 6881–6882.
- [39] Yueqi Xie, Peilin Zhou, and Sunghun Kim. 2022. Decoupled side information fusion for sequential recommendation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1611–1621.
- [41] Songpei Xu, Shijia Wang, Da Guo, Xianwen Guo, Qiang Xiao, Bin Huang, Guanlin Wu, and Chuanjiang Luo. 2025. Climber: Toward efficient scaling laws for large recommendation models. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management. 6193–6200.
- [42] Jinho Yang, Ji-Hoon Kim, and Joo-Young Kim. 2025. SCRec: A Scalable Computational Storage System with Statistical Sharding and Tensor-train Decomposition for Recommendation Models. IEEE Trans. Comput. (2025).
- [43] Jingyang Yuan, Huazuo Gao, Damai Dai, Junyu Luo, Liang Zhao, Zhengyan Zhang, Zhenda Xie, Yuxing Wei, Lean Wang, Zhiping Xiao, et al. 2025. Native sparse attention: Hardware-aligned and natively trainable sparse attention. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 23078–23097.
- [45] Zhichen Zeng, Xiaolong Liu, Mengyue Hang, Xiaoyi Liu, Qinghai Zhou, Chaofei Yang, Yiqun Liu, Yichen Ruan, Laming Chen, Yuxin Chen, et al. 2025. InterFormer: Effective Heterogeneous Interaction Learning for Click-Through Rate Prediction. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management. 6225–6233.
- [46] Jiaqi Zhai, Lucy Liao, Xing Liu, Yueming Wang, Rui Li, Xuan Cao, Leon Gao, Zhaojie Gong, Fangda Gu, Michael He, et al. 2024. Actions speak louder than words: Trillion-parameter sequential transducers for generative recommendations. arXiv preprint arXiv:2402.17152 (2024).
- [49] Tingting Zhang, Pengpeng Zhao, Yanchi Liu, Victor S Sheng, Jiajie Xu, Deqing Wang, Guanfeng Liu, Xiaofang Zhou, et al. 2019. Feature-level deeper self-attention network for sequential recommendation. In IJCAI. 4320–4326.
- [51] Guorui Zhou, Na Mou, Ying Fan, Qi Pi, Weijie Bian, Chang Zhou, Xiaoqiang Zhu, and Kun Gai. 2019. Deep interest evolution network for click-through rate prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 5941–5948.
- [52] Wen-Ji Zhou, Yuhang Zheng, Yinfu Feng, Yunan Ye, Rong Xiao, Long Chen, Xiaosong Yang, and Jun Xiao. 2024. ENCODE: Breaking the trade-off between performance and efficiency in long-term user behavior modeling. IEEE Transactions on Knowledge and Data Engineering 37, 1 (2024), 265–277.
- [53] Jie Zhu, Zhifang Fan, Xiaoxie Zhu, Yuchen Jiang, Hangyu Wang, Xintian Han, Haoran Ding, Xinmin Wang, Wenlin Zhao, Zhen Gong, et al. 2025. RankMixer: Scaling up ranking models in industrial recommenders. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management. 6309–6316.