pith. machine review for the scientific record.

arxiv: 2604.12990 · v1 · submitted 2026-04-14 · 💻 cs.IR

Recognition: unknown

Sparse Contrastive Learning for Content-Based Cold Item Recommendation

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 14:12 UTC · model grok-4.3

classification 💻 cs.IR
keywords: cold-start recommendation · content-based filtering · sparse contrastive learning · entmax · sampled softmax · knowledge distillation · item equity

The pith

A content encoder trained solely on item features with sparse contrastive loss can rank cold-start items more accurately than methods that map content into collaborative embedding spaces.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that cold-start item recommendation need not rely on any collaborative filtering signals or alignment between content and user embeddings. Instead, a content encoder is trained to produce a latent space in which item-to-item similarity directly reflects user preferences. The training uses a sparse generalization of sampled softmax based on the α-entmax family of activations, which zeros out gradients for uninformative negative samples and yields sharper relevance estimates. This regime, called SEMCo and optionally extended by knowledge distillation, is reported to improve ranking metrics over prior cold-start approaches and standard sampled softmax. The authors further note that avoiding collaborative signals can improve equity of outcomes across items.

Core claim

The authors introduce Sampled Entmax for Cold-start (SEMCo), a purely content-based training objective that replaces the standard softmax in sampled contrastive loss with the α-entmax activation. This produces sparse gradients by zeroing contributions from uninformative negatives, allowing the content encoder to learn a latent space where similarity correlates with user preferences. When combined with knowledge distillation, the resulting model outperforms both existing cold-start baselines and conventional sampled softmax on ranking accuracy. The approach is presented as avoiding the information gap inherent in aligning content features to collaborative embeddings.

What carries the argument

The Sampled Entmax for Cold-start (SEMCo) objective, a sparse generalization of sampled softmax loss that employs the α-entmax family to zero gradients on uninformative negatives during content-encoder training.
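The paper's code is not reproduced here, but the mechanism is easy to sketch. Below is a minimal PyTorch rendering using the α=2 member of the entmax family (sparsemax, which has a compact closed form) together with its Fenchel-Young loss; the encoder outputs, the similarity function, and the scale hyperparameter are illustrative assumptions, not the authors' implementation, which supports general α.

```python
import torch

def sparsemax(z):
    # Closed-form alpha=2 entmax (Martins & Astudillo, 2016): Euclidean
    # projection of z onto the probability simplex; coordinates outside
    # the support come out exactly zero.
    z_sorted, _ = torch.sort(z, dim=-1, descending=True)
    cumsum = z_sorted.cumsum(dim=-1)
    k = torch.arange(1, z.size(-1) + 1, device=z.device, dtype=z.dtype)
    support = 1.0 + k * z_sorted > cumsum        # prefix that stays nonzero
    k_z = support.sum(dim=-1, keepdim=True)      # support size k(z)
    tau = (cumsum.gather(-1, k_z - 1) - 1.0) / k_z.to(z.dtype)
    return torch.clamp(z - tau, min=0.0)

def sampled_sparsemax_loss(anchor, positive, negatives, scale=10.0):
    # anchor, positive: (B, d); negatives: (B, n, d); all L2-normalized
    # outputs of a content encoder. Index 0 of the logit vector is the
    # positive, the rest are sampled negatives.
    pos = (anchor * positive).sum(-1, keepdim=True)        # (B, 1)
    neg = torch.einsum('bd,bnd->bn', anchor, negatives)    # (B, n)
    z = scale * torch.cat([pos, neg], dim=-1)              # (B, 1 + n)
    p = sparsemax(z)
    # Fenchel-Young (sparsemax) loss with a one-hot target on index 0.
    # Its gradient w.r.t. z is exactly p - e_0, so every negative with
    # p_j == 0 receives zero gradient -- the sparsity mechanism the
    # paper attributes to alpha-entmax.
    return ((p * z).sum(-1) - 0.5 * (p * p).sum(-1) + 0.5 - z[..., 0]).mean()
```

For general α, the open-source entmax package (Peters et al.) provides a bisection-based implementation; sparsemax is used above only because it avoids that dependency.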

Load-bearing premise

That item content alone supplies enough signal for an encoder to learn a latent space whose similarities match user preferences without any interaction data or alignment to collaborative embeddings.

What would settle it

A head-to-head ranking experiment on a public cold-start dataset in which SEMCo records lower NDCG@K or recall@K than a content-to-CF alignment baseline on the same held-out cold items.
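Concretely, that experiment reduces to standard ranking metrics over each user's held-out cold items. A minimal sketch with binary relevance (conventions for the recall denominator vary across papers):

```python
import numpy as np

def ndcg_at_k(ranking, relevant, k=20):
    # Binary-relevance NDCG@k for one user: `ranking` is the model's
    # ordered list of held-out cold items, `relevant` the ground-truth set.
    gains = [1.0 if item in relevant else 0.0 for item in ranking[:k]]
    dcg = sum(g / np.log2(rank + 2) for rank, g in enumerate(gains))
    idcg = sum(1.0 / np.log2(rank + 2) for rank in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0

def recall_at_k(ranking, relevant, k=20):
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranking[:k] if item in relevant)
    return hits / len(relevant)  # some papers cap the denominator at k
```

Averaging either metric over users, with the cold-item split and the content encoder held fixed across SEMCo and the alignment baseline, is the head-to-head comparison described above.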

Figures

Figures reproduced from arXiv: 2604.12990 by Gregor Meehan, Johan Pauwels.

Figure 1. Item MDG@20 and prediction counts across the … (caption truncated; figure not reproduced, view at source ↗)
Original abstract

Item cold-start is a pervasive challenge for collaborative filtering (CF) recommender systems. Existing methods often train cold-start models by mapping auxiliary item content, such as images or text descriptions, into the embedding space of a CF model. However, such approaches can be limited by the fundamental information gap between CF signals and content features. In this work, we propose to avoid this limitation with purely content-based modeling of cold items, i.e. without alignment with CF user or item embeddings. We instead frame cold-start prediction in terms of item-item similarity, training a content encoder to project into a latent space where similarity correlates with user preferences. We define our training objective as a sparse generalization of sampled softmax loss with the $\alpha$-entmax family of activation functions, which allows for sharper estimation of item relevance by zeroing gradients for uninformative negatives. We then describe how this Sampled Entmax for Cold-start (SEMCo) training regime can be extended via knowledge distillation, and show that it outperforms existing cold-start methods and standard sampled softmax in ranking accuracy. We also discuss the advantages of purely content-based modeling, particularly in terms of equity of item outcomes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Sampled Entmax for Cold-start (SEMCo), a purely content-based approach to cold-start item recommendation. It trains a content encoder (e.g., on images or text) to embed items in a latent space where pairwise similarity correlates with user preferences, using a sparse generalization of sampled softmax loss based on the α-entmax family. This sparsity is intended to zero gradients on uninformative negatives for sharper relevance estimation. The method can be extended with knowledge distillation, and the authors report that it outperforms prior cold-start baselines and standard sampled softmax on ranking metrics while offering equity advantages for item outcomes.

Significance. If the central claim holds—that a content-only encoder can be trained to produce preference-aligned similarities without CF alignment or signals—the result would be significant for cold-start recommendation. It would demonstrate a viable alternative to embedding-alignment methods that suffer from information gaps, and the entmax-based sparsity would represent a concrete technical advance over softmax sampling for contrastive objectives in recsys. The equity discussion could also be valuable if supported by appropriate metrics.

major comments (2)
  1. [Abstract and §3] Abstract and §3 (training objective): The central claim that SEMCo is 'purely content-based' and avoids the 'fundamental information gap' by training 'without alignment with CF user or item embeddings' is load-bearing but under-specified. Positive pairs must be defined to train the contrastive loss; if these derive from observed user-item interactions or co-occurrences (standard practice), collaborative signals are present at training time even if inference uses only content. This directly undermines the premise that the method sidesteps CF signals. The manuscript must explicitly state the source of positives/negatives and whether any CF-derived labels or pre-trained embeddings are used.
  2. [§4 and results tables] §4 (experiments) and Table X (results): The abstract asserts outperformance over 'existing cold-start methods and standard sampled softmax' in ranking accuracy, but the strength of this claim depends on the experimental design. Without details on dataset sizes, number of cold items, choice of α, ablation on the entmax sparsity mechanism (e.g., fraction of zeroed gradients), statistical significance tests, and whether baselines were re-implemented with identical content encoders, it is impossible to verify that gains are attributable to the proposed loss rather than implementation differences or post-hoc tuning.
minor comments (2)
  1. [§3] Notation for the α-entmax loss should be introduced with an explicit equation (e.g., the sparse softmax definition) rather than assuming familiarity; this would improve readability for readers outside the entmax literature.
  2. [Discussion section] The equity discussion would benefit from a concrete metric (e.g., Gini coefficient on item exposure or long-tail coverage) and a comparison table rather than qualitative statements; a sketch of such a metric follows this list.
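One concrete form the suggested equity metric could take, computed over how often each cold item appears in users' top-K lists; the exposure-count input is an assumption about how one would instrument the recommender, and the Gini computation itself is standard:

```python
import numpy as np

def exposure_gini(exposure_counts):
    # Gini coefficient of item exposure: 0 = perfectly even exposure,
    # approaching 1 = exposure concentrated on a single item.
    x = np.sort(np.asarray(exposure_counts, dtype=float))
    n = len(x)
    if n == 0 or x.sum() == 0:
        return 0.0
    i = np.arange(1, n + 1)
    # G = (2 * sum_i i * x_(i)) / (n * sum x) - (n + 1) / n, x ascending
    return (2.0 * (i * x).sum()) / (n * x.sum()) - (n + 1.0) / n

print(exposure_gini([100, 100, 100, 100]))  # 0.0: perfectly even exposure
print(exposure_gini([400, 0, 0, 0]))        # 0.75: concentrated ((n-1)/n is the max)
```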

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments. We provide detailed responses to each major comment below and will revise the manuscript to address the concerns raised.

Point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (training objective): The central claim that SEMCo is 'purely content-based' and avoids the 'fundamental information gap' by training 'without alignment with CF user or item embeddings' is load-bearing but under-specified. Positive pairs must be defined to train the contrastive loss; if these derive from observed user-item interactions or co-occurrences (standard practice), collaborative signals are present at training time even if inference uses only content. This directly undermines the premise that the method sidesteps CF signals. The manuscript must explicitly state the source of positives/negatives and whether any CF-derived labels or pre-trained embeddings are used.

    Authors: We appreciate the referee pointing out the need for greater specificity in describing our training setup. To clarify: positive pairs are constructed based on co-occurrence in user interactions (i.e., items liked by the same users are treated as positives), which provides the supervisory signal for preference alignment. Negative pairs are sampled from the item pool. Importantly, however, the content encoder is trained exclusively on item content features without any pre-trained CF embeddings or direct use of user embeddings. No alignment to an existing CF latent space occurs. This is what we mean by 'purely content-based' and avoiding the information gap: at inference, recommendations for cold items rely solely on content-derived similarities, without needing CF data for those items. We will revise the abstract and Section 3 to explicitly detail the construction of positive/negative pairs and confirm that no CF-derived labels or embeddings are used beyond defining the pairs. (A minimal sketch of this pair construction follows these responses.) revision: yes

  2. Referee: [§4 and results tables] §4 (experiments) and Table X (results): The abstract asserts outperformance over 'existing cold-start methods and standard sampled softmax' in ranking accuracy, but the strength of this claim depends on the experimental design. Without details on dataset sizes, number of cold items, choice of α, ablation on the entmax sparsity mechanism (e.g., fraction of zeroed gradients), statistical significance tests, and whether baselines were re-implemented with identical content encoders, it is impossible to verify that gains are attributable to the proposed loss rather than implementation differences or post-hoc tuning.

    Authors: We agree that more experimental details are necessary to substantiate our claims. In the revised version, we will expand Section 4 to include: dataset sizes and the number/proportion of cold items; the specific value of α used and how it was selected; an ablation study quantifying the effect of the entmax sparsity (including the average fraction of zeroed gradients); statistical significance testing on the reported metrics; and explicit confirmation that all baselines were re-implemented with the identical content encoders and hyperparameter tuning procedures as SEMCo. These additions will allow readers to better assess the contribution of the sparse contrastive loss. revision: yes
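Response 1 pins down the supervision source: positives come from user co-occurrence, while the encoder itself sees only content. A minimal sketch of that pair construction, under stated assumptions (the helper name, per-user cap, and uniform negative sampling are illustrative, not from the paper):

```python
from collections import defaultdict
from itertools import combinations
import random

def cooccurrence_positives(interactions, max_pairs_per_user=10):
    # Items liked by the same user become positive pairs; the content
    # encoder trained on these pairs only ever sees item features, never
    # user or CF embeddings. `interactions` is an iterable of
    # (user_id, item_id) implicit-feedback events.
    items_by_user = defaultdict(set)
    for user, item in interactions:
        items_by_user[user].add(item)
    positives = []
    for user, items in items_by_user.items():
        pairs = list(combinations(sorted(items), 2))
        random.shuffle(pairs)
        positives.extend(pairs[:max_pairs_per_user])  # cap heavy users
    return positives

# Negatives are then drawn from the remaining item pool, e.g. uniformly:
# neg = random.choice([i for i in all_items if i not in {a, b}])
```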

Circularity Check

0 steps flagged

No significant circularity; derivation remains self-contained.

Full rationale

The paper defines its training objective explicitly as a sparse generalization of the sampled softmax loss via the α-entmax family, presented as an extension of an established contrastive objective rather than a redefinition of its own outputs. No equations or steps are shown that reduce the claimed ranking gains to a fitted parameter renamed as a prediction, nor does any load-bearing premise collapse to a self-citation chain or ansatz smuggled from prior author work. The content encoder's latent space is trained to correlate item similarity with user preferences by construction of the loss, but this is an explicit modeling choice with external empirical validation claimed against baselines, not a tautological equivalence. Minor self-citation of contrastive learning literature is present but not load-bearing for the central result.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on the premise that content features alone suffice to learn preference-aligned similarities and that the entmax sparsity mechanism improves gradient quality over standard softmax; no new entities are postulated.

free parameters (1)
  • alpha
    Sparsity parameter of the entmax family that controls how many negatives receive zero gradient; must be chosen or tuned.
axioms (1)
  • standard math: The α-entmax family provides a valid sparse generalization of softmax that can zero out gradients for uninformative negatives.
    Invoked as the activation in the sampled loss for sharper relevance estimation; stated formally below.
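For readers outside the entmax literature, the axiom can be stated explicitly. This is the standard definition (Tsallis 1988; Peters et al. 2019), not notation taken from this paper:

```latex
% alpha-entmax as an entropy-regularized argmax over the simplex Delta^d,
% with Tsallis alpha-entropy H^T_alpha:
\alpha\text{-entmax}(\mathbf{z})
  = \operatorname*{arg\,max}_{\mathbf{p} \in \Delta^{d}}
      \; \mathbf{p}^{\top}\mathbf{z} + H^{T}_{\alpha}(\mathbf{p}),
\qquad
H^{T}_{\alpha}(\mathbf{p})
  = \frac{1}{\alpha(\alpha - 1)} \sum_{j} \bigl(p_{j} - p_{j}^{\alpha}\bigr),
  \quad \alpha \neq 1.

% The solution is a thresholded power of the scores:
\bigl[\alpha\text{-entmax}(\mathbf{z})\bigr]_{j}
  = \bigl[(\alpha - 1)\, z_{j} - \tau(\mathbf{z})\bigr]_{+}^{1/(\alpha - 1)}.
```

Softmax is recovered as α → 1 and sparsemax at α = 2; the threshold τ(z) is what assigns low-scoring negatives exactly zero probability, which is the gradient-zeroing behavior this axiom underwrites.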

pith-pipeline@v0.9.0 · 5497 in / 1278 out tokens · 35131 ms · 2026-05-10T14:12:01.325002+00:00 · methodology

