pith. machine review for the scientific record.

arxiv: 2604.12990 · v1 · submitted 2026-04-14 · 💻 cs.IR

Recognition: unknown

Sparse Contrastive Learning for Content-Based Cold Item Recommendation

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 14:12 UTC · model grok-4.3

classification 💻 cs.IR
keywords: cold-start recommendation · content-based filtering · sparse contrastive learning · entmax · sampled softmax · knowledge distillation · item equity

The pith

A content encoder trained solely on item features with sparse contrastive loss can rank cold-start items more accurately than methods that map content into collaborative embedding spaces.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that cold-start item recommendation need not rely on any collaborative filtering signals or alignment between content and user embeddings. Instead, a content encoder is trained to produce a latent space in which item-to-item similarity directly reflects user preferences. The training uses a sparse generalization of sampled softmax based on the α-entmax family of activations, which zeros out gradients for uninformative negative samples and yields sharper relevance estimates. This regime, called SEMCo and optionally extended by knowledge distillation, is reported to improve ranking metrics over prior cold-start approaches and standard sampled softmax. The authors further note that avoiding collaborative signals can improve equity of outcomes across items.

Core claim

The authors introduce Sampled Entmax for Cold-start (SEMCo), a purely content-based training objective that replaces the standard softmax in sampled contrastive loss with the α-entmax activation. This produces sparse gradients by zeroing contributions from uninformative negatives, allowing the content encoder to learn a latent space where similarity correlates with user preferences. When combined with knowledge distillation, the resulting model outperforms both existing cold-start baselines and conventional sampled softmax on ranking accuracy. The approach is presented as avoiding the information gap inherent in aligning content features to collaborative embeddings.

What carries the argument

The Sampled Entmax for Cold-start (SEMCo) objective, a sparse generalization of sampled softmax loss that employs the α-entmax family to zero gradients on uninformative negatives during content-encoder training.
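The paper's code is not reproduced here, but the mechanism is easy to sketch. Below is a minimal PyTorch rendering using the α=2 member of the entmax family (sparsemax, which has a compact closed form) together with its Fenchel-Young loss; the encoder outputs, the similarity function, and the scale hyperparameter are illustrative assumptions, not the authors' implementation, which supports general α.

```python
import torch

def sparsemax(z):
    # Closed-form alpha=2 entmax (Martins & Astudillo, 2016): Euclidean
    # projection of z onto the probability simplex; coordinates outside
    # the support come out exactly zero.
    z_sorted, _ = torch.sort(z, dim=-1, descending=True)
    cumsum = z_sorted.cumsum(dim=-1)
    k = torch.arange(1, z.size(-1) + 1, device=z.device, dtype=z.dtype)
    support = 1.0 + k * z_sorted > cumsum        # prefix that stays nonzero
    k_z = support.sum(dim=-1, keepdim=True)      # support size k(z)
    tau = (cumsum.gather(-1, k_z - 1) - 1.0) / k_z.to(z.dtype)
    return torch.clamp(z - tau, min=0.0)

def sampled_sparsemax_loss(anchor, positive, negatives, scale=10.0):
    # anchor, positive: (B, d); negatives: (B, n, d); all L2-normalized
    # outputs of a content encoder. Index 0 of the logit vector is the
    # positive, the rest are sampled negatives.
    pos = (anchor * positive).sum(-1, keepdim=True)        # (B, 1)
    neg = torch.einsum('bd,bnd->bn', anchor, negatives)    # (B, n)
    z = scale * torch.cat([pos, neg], dim=-1)              # (B, 1 + n)
    p = sparsemax(z)
    # Fenchel-Young (sparsemax) loss with a one-hot target on index 0.
    # Its gradient w.r.t. z is exactly p - e_0, so every negative with
    # p_j == 0 receives zero gradient -- the sparsity mechanism the
    # paper attributes to alpha-entmax.
    return ((p * z).sum(-1) - 0.5 * (p * p).sum(-1) + 0.5 - z[..., 0]).mean()
```

For general α, the open-source entmax package (Peters et al.) provides a bisection-based implementation; sparsemax is used above only because it avoids that dependency.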

Load-bearing premise

That item content alone supplies enough signal for an encoder to learn a latent space whose similarities match user preferences without any interaction data or alignment to collaborative embeddings.

What would settle it

A head-to-head ranking experiment on a public cold-start dataset in which SEMCo records lower NDCG@K or recall@K than a content-to-CF alignment baseline on the same held-out cold items.
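Concretely, that experiment reduces to standard ranking metrics over each user's held-out cold items. A minimal sketch with binary relevance (conventions for the recall denominator vary across papers):

```python
import numpy as np

def ndcg_at_k(ranking, relevant, k=20):
    # Binary-relevance NDCG@k for one user: `ranking` is the model's
    # ordered list of held-out cold items, `relevant` the ground-truth set.
    gains = [1.0 if item in relevant else 0.0 for item in ranking[:k]]
    dcg = sum(g / np.log2(rank + 2) for rank, g in enumerate(gains))
    idcg = sum(1.0 / np.log2(rank + 2) for rank in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0

def recall_at_k(ranking, relevant, k=20):
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranking[:k] if item in relevant)
    return hits / len(relevant)  # some papers cap the denominator at k
```

Averaging either metric over users, with the cold-item split and the content encoder held fixed across SEMCo and the alignment baseline, is the head-to-head comparison described above.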

Figures

Figures reproduced from arXiv: 2604.12990 by Gregor Meehan, Johan Pauwels.

Figure 1. Item MDG@20 and prediction counts across the … (caption truncated; figure not reproduced, view at source ↗)
Original abstract

Item cold-start is a pervasive challenge for collaborative filtering (CF) recommender systems. Existing methods often train cold-start models by mapping auxiliary item content, such as images or text descriptions, into the embedding space of a CF model. However, such approaches can be limited by the fundamental information gap between CF signals and content features. In this work, we propose to avoid this limitation with purely content-based modeling of cold items, i.e. without alignment with CF user or item embeddings. We instead frame cold-start prediction in terms of item-item similarity, training a content encoder to project into a latent space where similarity correlates with user preferences. We define our training objective as a sparse generalization of sampled softmax loss with the $\alpha$-entmax family of activation functions, which allows for sharper estimation of item relevance by zeroing gradients for uninformative negatives. We then describe how this Sampled Entmax for Cold-start (SEMCo) training regime can be extended via knowledge distillation, and show that it outperforms existing cold-start methods and standard sampled softmax in ranking accuracy. We also discuss the advantages of purely content-based modeling, particularly in terms of equity of item outcomes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Sampled Entmax for Cold-start (SEMCo), a purely content-based approach to cold-start item recommendation. It trains a content encoder (e.g., on images or text) to embed items in a latent space where pairwise similarity correlates with user preferences, using a sparse generalization of sampled softmax loss based on the α-entmax family. This sparsity is intended to zero gradients on uninformative negatives for sharper relevance estimation. The method can be extended with knowledge distillation, and the authors report that it outperforms prior cold-start baselines and standard sampled softmax on ranking metrics while offering equity advantages for item outcomes.

Significance. If the central claim holds—that a content-only encoder can be trained to produce preference-aligned similarities without CF alignment or signals—the result would be significant for cold-start recommendation. It would demonstrate a viable alternative to embedding-alignment methods that suffer from information gaps, and the entmax-based sparsity would represent a concrete technical advance over softmax sampling for contrastive objectives in recsys. The equity discussion could also be valuable if supported by appropriate metrics.

major comments (2)
  1. [Abstract and §3] Abstract and §3 (training objective): The central claim that SEMCo is 'purely content-based' and avoids the 'fundamental information gap' by training 'without alignment with CF user or item embeddings' is load-bearing but under-specified. Positive pairs must be defined to train the contrastive loss; if these derive from observed user-item interactions or co-occurrences (standard practice), collaborative signals are present at training time even if inference uses only content. This directly undermines the premise that the method sidesteps CF signals. The manuscript must explicitly state the source of positives/negatives and whether any CF-derived labels or pre-trained embeddings are used.
  2. [§4 and results tables] §4 (experiments) and Table X (results): The abstract asserts outperformance over 'existing cold-start methods and standard sampled softmax' in ranking accuracy, but the strength of this claim depends on the experimental design. Without details on dataset sizes, number of cold items, choice of α, ablation on the entmax sparsity mechanism (e.g., fraction of zeroed gradients), statistical significance tests, and whether baselines were re-implemented with identical content encoders, it is impossible to verify that gains are attributable to the proposed loss rather than implementation differences or post-hoc tuning.
minor comments (2)
  1. [§3] Notation for the α-entmax loss should be introduced with an explicit equation (e.g., the sparse softmax definition) rather than assuming familiarity; this would improve readability for readers outside the entmax literature.
  2. [Discussion section] The equity discussion would benefit from a concrete metric (e.g., Gini coefficient on item exposure or long-tail coverage) and a comparison table rather than qualitative statements; a sketch of such a metric follows this list.
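One concrete form the suggested equity metric could take, computed over how often each cold item appears in users' top-K lists; the exposure-count input is an assumption about how one would instrument the recommender, and the Gini computation itself is standard:

```python
import numpy as np

def exposure_gini(exposure_counts):
    # Gini coefficient of item exposure: 0 = perfectly even exposure,
    # approaching 1 = exposure concentrated on a single item.
    x = np.sort(np.asarray(exposure_counts, dtype=float))
    n = len(x)
    if n == 0 or x.sum() == 0:
        return 0.0
    i = np.arange(1, n + 1)
    # G = (2 * sum_i i * x_(i)) / (n * sum x) - (n + 1) / n, x ascending
    return (2.0 * (i * x).sum()) / (n * x.sum()) - (n + 1.0) / n

print(exposure_gini([100, 100, 100, 100]))  # 0.0: perfectly even exposure
print(exposure_gini([400, 0, 0, 0]))        # 0.75: concentrated ((n-1)/n is the max)
```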

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments. We provide detailed responses to each major comment below and will revise the manuscript to address the concerns raised.

Point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (training objective): The central claim that SEMCo is 'purely content-based' and avoids the 'fundamental information gap' by training 'without alignment with CF user or item embeddings' is load-bearing but under-specified. Positive pairs must be defined to train the contrastive loss; if these derive from observed user-item interactions or co-occurrences (standard practice), collaborative signals are present at training time even if inference uses only content. This directly undermines the premise that the method sidesteps CF signals. The manuscript must explicitly state the source of positives/negatives and whether any CF-derived labels or pre-trained embeddings are used.

    Authors: We appreciate the referee pointing out the need for greater specificity in describing our training setup. To clarify: positive pairs are constructed based on co-occurrence in user interactions (i.e., items liked by the same users are treated as positives), which provides the supervisory signal for preference alignment. Negative pairs are sampled from the item pool. Importantly, however, the content encoder is trained exclusively on item content features without any pre-trained CF embeddings or direct use of user embeddings. No alignment to an existing CF latent space occurs. This is what we mean by 'purely content-based' and avoiding the information gap: at inference, recommendations for cold items rely solely on content-derived similarities, without needing CF data for those items. We will revise the abstract and Section 3 to explicitly detail the construction of positive/negative pairs and confirm that no CF-derived labels or embeddings are used beyond defining the pairs. (A minimal sketch of this pair construction follows these responses.) revision: yes

  2. Referee: [§4 and results tables] §4 (experiments) and Table X (results): The abstract asserts outperformance over 'existing cold-start methods and standard sampled softmax' in ranking accuracy, but the strength of this claim depends on the experimental design. Without details on dataset sizes, number of cold items, choice of α, ablation on the entmax sparsity mechanism (e.g., fraction of zeroed gradients), statistical significance tests, and whether baselines were re-implemented with identical content encoders, it is impossible to verify that gains are attributable to the proposed loss rather than implementation differences or post-hoc tuning.

    Authors: We agree that more experimental details are necessary to substantiate our claims. In the revised version, we will expand Section 4 to include: dataset sizes and the number/proportion of cold items; the specific value of α used and how it was selected; an ablation study quantifying the effect of the entmax sparsity (including the average fraction of zeroed gradients); statistical significance testing on the reported metrics; and explicit confirmation that all baselines were re-implemented with the identical content encoders and hyperparameter tuning procedures as SEMCo. These additions will allow readers to better assess the contribution of the sparse contrastive loss. revision: yes
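Response 1 pins down the supervision source: positives come from user co-occurrence, while the encoder itself sees only content. A minimal sketch of that pair construction, under stated assumptions (the helper name, per-user cap, and uniform negative sampling are illustrative, not from the paper):

```python
from collections import defaultdict
from itertools import combinations
import random

def cooccurrence_positives(interactions, max_pairs_per_user=10):
    # Items liked by the same user become positive pairs; the content
    # encoder trained on these pairs only ever sees item features, never
    # user or CF embeddings. `interactions` is an iterable of
    # (user_id, item_id) implicit-feedback events.
    items_by_user = defaultdict(set)
    for user, item in interactions:
        items_by_user[user].add(item)
    positives = []
    for user, items in items_by_user.items():
        pairs = list(combinations(sorted(items), 2))
        random.shuffle(pairs)
        positives.extend(pairs[:max_pairs_per_user])  # cap heavy users
    return positives

# Negatives are then drawn from the remaining item pool, e.g. uniformly:
# neg = random.choice([i for i in all_items if i not in {a, b}])
```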

Circularity Check

0 steps flagged

No significant circularity; derivation remains self-contained.

Full rationale

The paper defines its training objective explicitly as a sparse generalization of the sampled softmax loss via the α-entmax family, presented as an extension of an established contrastive objective rather than a redefinition of its own outputs. No equations or steps are shown that reduce the claimed ranking gains to a fitted parameter renamed as a prediction, nor does any load-bearing premise collapse to a self-citation chain or ansatz smuggled from prior author work. The content encoder's latent space is trained to correlate item similarity with user preferences by construction of the loss, but this is an explicit modeling choice with external empirical validation claimed against baselines, not a tautological equivalence. Minor self-citation of contrastive learning literature is present but not load-bearing for the central result.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on the premise that content features alone suffice to learn preference-aligned similarities and that the entmax sparsity mechanism improves gradient quality over standard softmax; no new entities are postulated.

free parameters (1)
  • alpha
    Sparsity parameter of the entmax family that controls how many negatives receive zero gradient; must be chosen or tuned.
axioms (1)
  • standard math: The α-entmax family provides a valid sparse generalization of softmax that can zero out gradients for uninformative negatives.
    Invoked as the activation in the sampled loss for sharper relevance estimation; stated formally below.
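For readers outside the entmax literature, the axiom can be stated explicitly. This is the standard definition (Tsallis 1988; Peters et al. 2019), not notation taken from this paper:

```latex
% alpha-entmax as an entropy-regularized argmax over the simplex Delta^d,
% with Tsallis alpha-entropy H^T_alpha:
\alpha\text{-entmax}(\mathbf{z})
  = \operatorname*{arg\,max}_{\mathbf{p} \in \Delta^{d}}
      \; \mathbf{p}^{\top}\mathbf{z} + H^{T}_{\alpha}(\mathbf{p}),
\qquad
H^{T}_{\alpha}(\mathbf{p})
  = \frac{1}{\alpha(\alpha - 1)} \sum_{j} \bigl(p_{j} - p_{j}^{\alpha}\bigr),
  \quad \alpha \neq 1.

% The solution is a thresholded power of the scores:
\bigl[\alpha\text{-entmax}(\mathbf{z})\bigr]_{j}
  = \bigl[(\alpha - 1)\, z_{j} - \tau(\mathbf{z})\bigr]_{+}^{1/(\alpha - 1)}.
```

Softmax is recovered as α → 1 and sparsemax at α = 2; the threshold τ(z) is what assigns low-scoring negatives exactly zero probability, which is the gradient-zeroing behavior this axiom underwrites.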

pith-pipeline@v0.9.0 · 5497 in / 1278 out tokens · 35131 ms · 2026-05-10T14:12:01.325002+00:00 · methodology

