arxiv: 2604.23077 · v1 · submitted 2026-04-25 · 💻 cs.IR

Recognition: unknown

Adopting State-of-the-Art Pretrained Audio Representations for Music Recommender Systems

Anna Aljanaki, Yan-Martin Tamm


Pith reviewed 2026-05-08 07:13 UTC · model grok-4.3

classification 💻 cs.IR

keywords: pretrained audio representations, music recommender systems, transfer learning, cold-start recommendation, MIR tasks, performance evaluation, audio models


The pith

Pretrained audio models for music perform differently in recommendation systems than in standard MIR tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests whether nine state-of-the-art pretrained audio models from music information retrieval (MIR) can be used directly as backends for music recommender systems. It evaluates them with five recommendation methods, including K-nearest neighbors and the sequential model BERT4Rec, in both hot (warm) and cold-start settings. The results show clear differences between how well these models support recommendation and how well they perform on established tasks such as auto-tagging and genre classification. The work suggests that features learned for classification capture aspects of music that are less directly useful for predicting user preferences from listening data. This matters because it points to a need to understand task-specific requirements when transferring audio representations between music applications.
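The summary does not say how the cold-start split was constructed (the referee's minor comments ask for exactly this). Because an audio embedding exists for a track before anyone has played it, a common protocol for audio-based recommenders is item cold-start: hold out a random subset of tracks entirely, so that a model can reach them at test time only through their audio representation. The sketch below assumes that protocol, with a hypothetical function name and a hypothetical 20% default; it is not the authors' split, and a user cold-start protocol would instead hold out whole users.

```python
import random


def item_cold_start_split(interactions, cold_fraction=0.2, seed=0):
    """Hypothetical item cold-start split: no test track appears in training.

    interactions: list of (user_id, track_id) pairs.
    cold_fraction: share of distinct tracks held out as cold (assumed value).
    Returns (train, test) lists of interactions.
    """
    tracks = sorted({track for _, track in interactions})
    rng = random.Random(seed)
    rng.shuffle(tracks)
    cold = set(tracks[: int(len(tracks) * cold_fraction)])
    # Every interaction with a cold track goes to test; the rest stay in training.
    train = [(u, t) for u, t in interactions if t not in cold]
    test = [(u, t) for u, t in interactions if t in cold]
    return train, test
```

A hot (warm) split would instead hold out interactions at random, or the last few per user, so that every test track has already been seen during training.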

Core claim

Our evaluation of nine pretrained backend models (MusicFM, Music2Vec, MERT, EncodecMAE, Jukebox, MusiCNN, MULE, MuQ and MuQ-MuLan) across five recommendation approaches (KNN, Shallow Neural Network, Contrastive Multi-Modal projection, Hybrid model, and BERT4Rec) demonstrates a significant disparity between their effectiveness in traditional MIR tasks and their effectiveness in both hot and cold music recommendation. This indicates that the valuable aspects of musical information captured by these backend models may differ depending on the task, establishing a foundation for further exploration of pretrained audio representations to enhance music recommendation systems.

What carries the argument

Transfer evaluation of nine pretrained audio models to five music recommendation approaches for hot and cold-start scenarios.
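The simplest of the five pipelines can be sketched to show what plugging in a frozen backend means. This is one common KNN variant, assumed here rather than taken from the paper. Each track is represented by a fixed vector from a pretrained model (for example, frame embeddings averaged over time). A user is represented by the mean vector of the tracks they played, and the k unheard tracks closest to that profile by cosine similarity are recommended. No backend parameter is updated, so recommendation quality depends entirely on what the embedding already encodes. The function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_recommend(history, embeddings, k=10):
    """Return the k unheard tracks nearest (by cosine) to the user's mean embedding.

    history: non-empty list of track ids the user has played.
    embeddings: dict mapping track id -> frozen backend vector (list of floats).
    """
    dim = len(next(iter(embeddings.values())))
    profile = [0.0] * dim
    for track in history:
        for d, value in enumerate(embeddings[track]):
            profile[d] += value / len(history)
    heard = set(history)
    scored = [(cosine(profile, vec), track)
              for track, vec in embeddings.items() if track not in heard]
    # Highest similarity first; ties broken by track id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [track for _, track in scored[:k]]
```

Because the user profile lives in the embedding space, the same function handles cold-start tracks unchanged: a track with no listening history is still ranked by its audio vector.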

If this is right

Pretrained models capture musical information whose value depends on whether the goal is classification or personalized recommendation.
Cold-start scenarios particularly expose limitations in direct transfer of these audio representations.
Hybrid approaches may better leverage pretrained audio features alongside other signals.
Task-specific adaptations are likely needed to improve transfer from MIR backends to recommender systems.
The findings support developing recommendation-focused objectives for future audio pretraining.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Music recommenders could benefit from pretraining that directly incorporates user listening patterns rather than audio alone.
The disparities may guide creation of new audio features tuned to preference prediction instead of tagging.
Testing on additional datasets would clarify whether the gaps are consistent or tied to the chosen collections.
Integrating these representations with collaborative filtering could help close the performance difference in cold starts.

Load-bearing premise

The five chosen recommendation approaches and the particular datasets used are sufficient to reveal general differences in what the pretrained models capture.

What would settle it

If applying the same models after task-specific fine-tuning or on larger recommendation datasets removes the observed performance gap with MIR tasks, the claim of task-dependent differences would not hold.

Figures

Figures reproduced from arXiv: 2604.23077 by Anna Aljanaki, Yan-Martin Tamm.

**Figure 1.** Number of MRS papers using different types of input data to represent audio files per year.

**Figure 2.** A count plot showing the genre distribution in the Music4all dataset.

Original abstract

Over the years, the Music Information Retrieval (MIR) research community has released various models pretrained on large amounts of music data. Transfer learning has demonstrated the effectiveness of pretrained backend models for a broad spectrum of downstream tasks, including auto-tagging and genre classification. However, MIR papers generally do not explore the effectiveness of pretrained models for Music Recommender Systems (MRS). In addition, the Recommender Systems community tends to favour traditional end-to-end neural network training. Our research addresses this gap and evaluates the performance of nine pretrained backend models (MusicFM, Music2Vec, MERT, EncodecMAE, Jukebox, MusiCNN, MULE, MuQ and MuQ-MuLan) in the context of MRS. We assess them using five recommendation approaches: K-Nearest Neighbours (KNN), a Shallow Neural Network, Contrastive Multi-Modal projection, a Hybrid model, and BERT4Rec, for both the hot and cold-start scenarios. Our findings suggest that pretrained audio representations exhibit a significant performance disparity between traditional MIR tasks and both hot and cold music recommendation, indicating that the valuable aspects of musical information captured by backend models may differ depending on the task. This study establishes a foundation for further exploration of pretrained audio representations to enhance music recommendation systems.
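The page does not describe the Contrastive Multi-Modal projection in detail. One standard design, assumed here, trains a projection of each track's frozen audio embedding so that it lands close to the same track's collaborative-filtering embedding. It uses an InfoNCE loss in which the other tracks in the batch act as negatives. The sketch computes only that loss, in one direction (audio to collaborative), with a hypothetical temperature of 0.1. The projection network and optimiser are omitted. Once trained, such a projection can place cold-start tracks in the collaborative space from audio alone.

```python
import math


def info_nce(audio_vecs, cf_vecs, temperature=0.1):
    """One-directional InfoNCE loss with in-batch negatives.

    audio_vecs[i]: projected audio embedding of track i (projection assumed).
    cf_vecs[i]: collaborative-filtering embedding of the same track i.
    Every other track in the batch serves as a negative for track i.
    """
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    audio = [normalize(v) for v in audio_vecs]
    collab = [normalize(v) for v in cf_vecs]
    total = 0.0
    for i, a in enumerate(audio):
        # Cosine similarity to every CF vector in the batch, scaled by temperature.
        logits = [sum(x * y for x, y in zip(a, c)) / temperature for c in collab]
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Cross-entropy with the matching track i as the correct class.
        total += log_sum_exp - logits[i]
    return total / len(audio)
```

A symmetric version adds the same term with the two lists swapped. In practice, the loss would be computed with an autodiff library so that the projection can actually be trained.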

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a practical comparison of nine pretrained audio models on music recommendation, but its claim about task-dependent information capture rests on comparisons with MIR results reported in the literature on other datasets rather than on matched tests.


The main contribution is a straightforward empirical check: nine recent pretrained audio backbones (MusicFM, MERT, Jukebox and the rest) are plugged into five standard recommendation pipelines, run on both hot and cold-start music data, and compared. To the editor's knowledge, a comparison of this breadth has not been reported before, so the work fills a narrow but real gap between the MIR and recommender-systems communities. The experiments are simple, use off-the-shelf models, and cover multiple recommenders. This makes the findings easy to interpret for anyone who wants to know which frozen audio features are worth trying in a production MRS pipeline. No new theory or architecture is claimed, which keeps the paper focused and short.

Referee Report

2 major / 2 minor

Summary. The paper evaluates nine pretrained audio representation models (MusicFM, Music2Vec, MERT, EncodecMAE, Jukebox, MusiCNN, MULE, MuQ, MuQ-MuLan) as backends for music recommender systems. Using five approaches (KNN, Shallow Neural Network, Contrastive Multi-Modal projection, Hybrid model, BERT4Rec) it tests performance in hot and cold-start recommendation scenarios and reports a significant disparity relative to the models' published results on traditional MIR tasks such as auto-tagging and genre classification, concluding that the musical information captured by these representations is task-dependent.

Significance. If the disparity claim can be supported under matched conditions, the work would usefully bridge MIR and recommender-systems research by showing that off-the-shelf pretrained audio encoders are not equally transferable to recommendation. The breadth of nine backends and five pipelines provides a practical starting point for practitioners selecting representations for MRS, even if the causal interpretation requires further controls.

major comments (2)

[Abstract / Results] Abstract and Results sections: the central claim of performance disparity (and the inference that 'valuable aspects of musical information captured by backend models may differ depending on the task') rests on comparing the new MRS results to MIR numbers taken from the literature. Because those MIR evaluations use different datasets, label sets, and protocols (e.g., tagging on MTG-Jamendo versus the paper's user-item data), the gap could arise from domain shift or label mismatch rather than from fundamentally different information being captured. A direct test—running the same frozen representations on MIR-style labels for the identical tracks—would be needed to substantiate the task-dependence conclusion.
[Experiments] Experimental design: the five recommendation pipelines and nine backends supply useful internal variation, yet the manuscript does not report whether the same audio tracks were also evaluated on any MIR proxy task under identical conditions. Without this matched comparison, the weakest assumption—that the chosen datasets and pipelines are sufficient to reveal general differences in captured information—remains under-supported.
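Here is a minimal version of the matched test the referee asks for, under stated assumptions. If genre labels exist for the same tracks (Figure 2 plots a genre distribution for the Music4all dataset), each frozen backend can be scored on both a genre probe and on recommendation, and the two per-backend rankings can then be compared. The sketch uses a nearest-centroid classifier as a stand-in for the usual trained linear probe, and Spearman's rank correlation (assuming no ties) for the comparison. Both function names are hypothetical. A low or negative correlation across the nine backends, measured on identical tracks, would support task dependence more directly than comparisons with literature numbers.

```python
import math
from collections import defaultdict


def nearest_centroid_accuracy(train, test):
    """Genre-probe accuracy for one frozen backend.

    train, test: lists of (embedding, genre_label) pairs for the same tracks
    used in the recommendation experiments. Nearest-centroid classification
    stands in for a trained linear probe.
    """
    sums = {}
    counts = defaultdict(int)
    for vec, label in train:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for d, value in enumerate(vec):
            acc[d] += value
        counts[label] += 1
    centroids = {label: [s / counts[label] for s in acc] for label, acc in sums.items()}
    correct = 0
    for vec, label in test:
        predicted = min(centroids, key=lambda lab: math.dist(vec, centroids[lab]))
        correct += predicted == label
    return correct / len(test)


def spearman(xs, ys):
    """Spearman rank correlation between two score lists, assuming no ties."""
    def ranks(values):
        order = sorted(range(len(values)), key=values.__getitem__)
        r = [0] * len(values)
        for rank, idx in enumerate(order):
            r[idx] = rank
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))
```

Calling `nearest_centroid_accuracy` once per backend gives nine probe scores. Passing these to `spearman` together with the nine recommendation scores (for example, NDCG) measures whether the two tasks rank the backends in the same order.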

minor comments (2)

[Abstract] Abstract: quantitative results, error bars, statistical tests, and basic details on data splits or hyper-parameters are absent; a short summary of key metrics would improve readability.
[Methods] Methods: reproducibility would benefit from explicit statements of the exact evaluation metrics, negative-sampling strategy for the contrastive and hybrid models, and how cold-start users/items were defined.
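The page does not name the evaluation metrics. Two metrics common in top-k music recommendation are Recall@k and NDCG@k with binary relevance; they are shown here as an assumption, not as the paper's choice. Some papers divide Recall@k by min(k, number of relevant tracks) rather than by the number of relevant tracks. That convention changes the absolute numbers, which is one more reason to state the metric explicitly.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Share of the user's held-out tracks that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for track in ranked[:k] if track in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k; a hit at 1-based rank r contributes 1 / log2(r + 1)."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, track in enumerate(ranked[:k]) if track in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg else 0.0
```

Consider a user whose held-out tracks are {"a", "c"} and whose ranked list is ["a", "b", "c", "d"]. Recall@2 is 0.5, and NDCG@3 is 1.5 / (1 + 1/log2 3), about 0.92.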

Simulated Authors' Rebuttal

2 responses · 1 unresolved

We thank the referee for the careful reading and valuable feedback on our work evaluating pretrained audio models for music recommendation. We address the two major comments point by point below, focusing on the comparison to MIR benchmarks. We agree that additional caveats are needed and will revise the manuscript accordingly while preserving the core empirical contribution.

Point-by-point responses

Referee: [Abstract / Results] Abstract and Results sections: the central claim of performance disparity (and the inference that 'valuable aspects of musical information captured by backend models may differ depending on the task') rests on comparing the new MRS results to MIR numbers taken from the literature. Because those MIR evaluations use different datasets, label sets, and protocols (e.g., tagging on MTG-Jamendo versus the paper's user-item data), the gap could arise from domain shift or label mismatch rather than from fundamentally different information being captured. A direct test—running the same frozen representations on MIR-style labels for the identical tracks—would be needed to substantiate the task-dependence conclusion.

Authors: We acknowledge that direct comparison to published MIR results on different datasets introduces potential confounds such as domain shift and label mismatch. Our intent was to highlight a practical observation: models that achieve strong results on standard MIR benchmarks show substantially weaker performance when used as frozen backends in common MRS pipelines. To address the concern, we will revise the abstract and results to explicitly note that MIR figures are taken from the literature (with citations) and add a dedicated limitations paragraph discussing possible dataset and protocol differences. We will also soften the causal language around 'task-dependence' to 'suggestive of differing utility across tasks.' (Revision: partial.)
Referee: [Experiments] Experimental design: the five recommendation pipelines and nine backends supply useful internal variation, yet the manuscript does not report whether the same audio tracks were also evaluated on any MIR proxy task under identical conditions. Without this matched comparison, the weakest assumption—that the chosen datasets and pipelines are sufficient to reveal general differences in captured information—remains under-supported.

Authors: We agree that a matched MIR proxy evaluation on the identical tracks would provide stronger support for claims about captured information. Our experimental focus was on MRS performance using the available user-item dataset and standard recommendation protocols. We will update the experimental design and discussion sections to explicitly state that no intra-dataset MIR proxy tasks were run and to qualify our conclusions as comparisons against established literature benchmarks rather than controlled matched tasks. This will include a clearer statement of the assumption and its limitations. (Revision: partial.)

Standing simulated objections (not resolved)

Conducting MIR-style proxy tasks (e.g., auto-tagging or genre classification) on the exact same tracks from the recommendation dataset. The rebuttal sets this aside on the grounds that the user-item data does not include the required MIR annotations.

Circularity Check

0 steps flagged

No circularity: purely empirical comparison of existing models

Full rationale

The paper performs a direct empirical evaluation of nine off-the-shelf pretrained audio backends (MusicFM, MERT, etc.) on five MRS pipelines (KNN, BERT4Rec, etc.) using standard hot/cold-start splits. No derivations, equations, fitted parameters, or ansatzes are introduced. The disparity claim between MIR and MRS performance rests on the authors' own MRS measurements plus citations to independent prior MIR papers; these citations are external literature values, not self-citations whose authors overlap with the present work, and they do not reduce the central result to a tautology or construction. The study is therefore self-contained against external benchmarks with no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical evaluation paper; no mathematical derivations, no new entities postulated, and no free parameters introduced beyond standard training choices for the downstream recommenders.

pith-pipeline@v0.9.0 · 5521 in / 1009 out tokens · 24970 ms · 2026-05-08T07:13:44.350177+00:00 · methodology


Reference graph

Works this paper leans on

107 extracted references

[1] Zrar Kh. Abdul and Abdulbasit K. Al-Talabani. 2022. Mel Frequency Cepstral Coefficient and its Applications: A Review. IEEE Access 10 (2022), 122136–122158. https://doi.org/10.1109/ACCESS.2022.3223444

[2] Adiyansjah, Alexander Agung Santoso Gunawan, and Derwin Suhartono. 2019. Music Recommender System Based on Genre using Convolutional Recurrent Neural Networks. Procedia Computer Science (2019).

[3] Pedro Álvarez, Francisco Javier Zarazaga-Soria, and Sandra Baldassarri. 2020. Mobile music recommendations for runners based on location and emotions: The DJ-Running system. Pervasive Mob. Comput. 67 (2020), 101242.

[4] Ivana Andjelkovic, Denis Parra, and John O’Donovan. 2016. Moodplay: Interactive Mood-based Music Discovery and Recommendation. Proceedings of the 2016 Conference on User Modeling Adaptation and Personalization (2016).

[5] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. 2011. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011).

[6] Théo Bontempelli, Benjamin Chapus, François Rigaud, Mathieu Morlon, Marin Lorant, and Guillaume Salha-Galvan. 2022. Flow Moods: Recommending Music by Moods on Deezer. Proceedings of the 16th ACM Conference on Recommender Systems (2022).

[8] Rui Cai, Chao Zhang, Lei Zhang, and Wei-Ying Ma. 2007. Scalable music recommendation by search. Proceedings of the 15th ACM international conference on Multimedia (2007).

[9] Rodrigo Castellon, Chris Donahue, and Percy Liang. 2021. Codified audio language modeling learns useful representations for music information retrieval. ArXiv abs/2107.05677 (2021).

[10] Pedro Dalla Vecchia Chaves, Bruno Laporais Pereira, and Rodrygo L. T. Santos. 2022. Efficient Online Learning to Rank for Sequential Music Recommendation. Proceedings of the ACM Web Conference 2022 (2022).
[11] Chih-Ming Chen, Ming-Feng Tsai, Jen-Yu Liu, and Yi-Hsuan Yang. 2013. Using emotional context from article for contextual music recommendation. Proceedings of the 21st ACM international conference on Multimedia (2013).

[12] K. Chen, Beici Liang, Xiaoshuang Ma, and Minwei Gu. 2020. Learning Audio Embeddings with User Listening Data for Content-Based Music Recommendation. ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2020), 3015–3019.

[13] Zhiyong Cheng and Jialie Shen. 2016. On Effective Location-Aware Music Recommendation. ACM Transactions on Information Systems (TOIS) 34 (2016), 1–32.

[14] Chung-Cheng Chiu, James Qin, Yu Zhang, Jiahui Yu, and Yonghui Wu. 2022. Self-supervised Learning with Random-projection Quantizer for Speech Recognition. In International Conference on Machine Learning.

[15] Jeong Choi, Seongwon Jang, Hyunsouk Cho, and Sehee Chung. 2022. Towards Proper Contrastive Self-supervised Learning Strategies For Music Audio Representation. arXiv:2207.04471 [cs.SD] https://arxiv.org/abs/2207.04471

[16] Keunwoo Choi, George Fazekas, and Mark Sandler. 2016. Automatic tagging using deep convolutional neural networks. arXiv preprint arXiv:1606.00298 (2016).

[17] Parag Chordia, Mark Godfrey, and Alex Rae. 2008. Extending Content-Based Recommendation: The Case of Indian Classical Music. In International Society for Music Information Retrieval Conference.

[18] Szu-Yu Chou, Li-Chia Yang, Yi-Hsuan Yang, and Jyh-Shing Roger Jang. 2017. Conditional preference nets for user and item cold start problems in music recommendation. 2017 IEEE International Conference on Multimedia and Expo (ICME) (2017), 1147–1152.

[19] Alexandre Défossez, Jade Copet, Gabriel Synnaeve, and Yossi Adi. 2022. High Fidelity Neural Audio Compression. ArXiv abs/2210.13438 (2022).

[20] Yashar Deldjoo, Markus Schedl, and Peter Knees. 2021. Content-driven Music Recommendation: Evolution, State of the Art, and Challenges. ArXiv abs/2107.11803 (2021).
[21] Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, and Ilya Sutskever. 2020. Jukebox: A Generative Model for Music. ArXiv abs/2005.00341 (2020).

[22] Yiwei Ding and Alexander Lerch. 2024. Parameter-Efficient Transfer Learning for Music Foundation Models. arXiv:2411.19371 [cs.SD] https://arxiv.org/abs/2411.19371

[23] Mohamadreza Sheikh Fathollahi and Farbod Razzazi. 2021. Music similarity measurement and recommendation system using convolutional neural networks. International Journal of Multimedia Information Retrieval 10 (2021), 43–53.

[24] Andres Ferraro, Jaehun Kim, Sergio Oramas, Andreas Ehmann, and Fabien Gouyon. 2023. Contrastive Learning for Cross-modal Artist Retrieval. arXiv:2308.06556 [cs.IR] https://arxiv.org/abs/2308.06556

[25] Anders Friberg and Anton Hedblad. 2011. A Comparison of Perceptual Ratings and Computed Audio Features. In 8th Sound and Music Computing Conference. https://doi.org/10.5281/zenodo.849857

[26] Christian Ganhör, Marta Moscati, Anna Hausberger, Shah Nawaz, and Markus Schedl. 2024. A Multimodal Single-Branch Embedding Network for Recommendation in Cold-Start and Missing Modality Scenarios. In Proceedings of the 18th ACM Conference on Recommender Systems (Bari, Italy) (RecSys ’24). Association for Computing Machinery, New York, NY, USA, 380–390. doi:10.1145/3640457.3688138

[27] Hong Gao. 2022. Automatic Recommendation of Online Music Tracks Based on Deep Learning. Mathematical Problems in Engineering (2022).

[28] Wenjuan Gong and Qingshuang Yu. 2021. A Deep Music Recommendation Method Based on Human Motion Analysis. IEEE Access 9 (2021), 26290–26300.

[29] Florian Grötschla, Luca Strässle, Luca A Lanzendörfer, and Roger Wattenhofer. 2024. Towards leveraging contrastively pretrained neural audio embeddings for recommender tasks. arXiv preprint arXiv:2409.09026 (2024).

[30] Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, and Abdel rahman Mohamed. 2021. HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units. IEEE/ACM Transactions on Audio, Speech, and Language Processing 29 (2021), 3451–3460.
[31] Marius Kaminskas, Francesco Ricci, and Markus Schedl. 2013. Location-aware music recommendation using auto-tagging and hybrid matching. Proceedings of the 7th ACM conference on Recommender systems (2013).

[32] Jaehun Kim, Julián Urbano, Cynthia C. S. Liem, and Alan Hanjalic. 2018. One deep music representation to rule them all? A comparative analysis of different representation learning strategies. Neural Computing and Applications 32 (2018), 1067–1093.

[33] Peter Knees, Ángel Faraldo, Perfecto Herrera, Richard Vogl, Sebastian Böck, Florian Hörschläger, and Mickael Le Goff. 2015. Two Data Sets for Tempo Estimation and Key Detection in Electronic Dance Music Annotated from User Corrections. In International Society for Music Information Retrieval Conference.

[34] Peter Knees, Tim Pohle, Markus Schedl, and Gerhard Widmer. 2006. Combining audio-based similarity with web-based data to accelerate automatic music playlist generation. In Multimedia Information Retrieval.

[35] Christian Koch, Ganna Krupii, and David Hausheer. 2017. Proactive Caching of Music Videos based on Audio Features, Mood, and Genre. Proceedings of the 8th ACM on Multimedia Systems Conference (2017).

[36] Junghyun Koo, Seungryeol Paik, and Kyogu Lee. 2022. End-To-End Music Remastering System Using Self-Supervised And Adversarial Training. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2022), 4608–4612.

[37] Edith Law, Kris West, Michael I. Mandel, Mert Bay, and J. S. Downie. 2009. Evaluation of Algorithms Using Games: The Case of Music Tagging. In 10th International Society for Music Information Retrieval Conference (ISMIR 2009).

[38] Jongpil Lee, Kyungyun Lee, Jiyoung Park, Jangyeon Park, and Juhan Nam. 2018. Deep Content-User Embedding Model for Music Recommendation. ArXiv abs/1807.06786 (2018).

[39] Jongpil Lee, Jiyoung Park, Keunhyoung Luke Kim, and Juhan Nam. 2018. SampleCNN: End-to-End Deep Convolutional Neural Networks Using Very Small Filters for Music Classification. Applied Sciences 8, 1 (2018). https://doi.org/10.3390/app8010150

[40] Tom L. H. Li and Antoni B. Chan. 2011. Genre Classification and the Invariance of MFCC Features to Key and Tempo. In Advances in Multimedia Modeling, Kuo-Tien Lee, Wen-Hsiang Tsai, Hong-Yuan Mark Liao, Tsuhan Chen, Jun-Wei Hsieh, and Chien-Cheng Tseng (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 317–327.
[41] Yizhi Li, Ruibin Yuan, Ge Zhang, Yi Ma, Xingran Chen, Hanzhi Yin, Chen-Li Lin, Anton Ragni, Emmanouil Benetos, N. Gyenge, Roger B. Dannenberg, Ruibo Liu, Wenhu Chen, Gus G. Xia, Yemin Shi, Wen-Fen Huang, Yi-Ting Guo, and Jie Fu. 2023. MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training. ArXiv abs/2306.00107 (2023).

[42] Yizhi Li, Ruibin Yuan, Ge Zhang, Yi Ma, Chenghua Lin, Xingran Chen, Anton Ragni, Hanzhi Yin, Zhijie Hu, Haoyu He, Emmanouil Benetos, Norbert Gyenge, Ruibo Liu, and Jie Fu. 2022. MAP-Music2Vec: A Simple and Effective Baseline for Self-Supervised Music Audio Representation Learning. ArXiv abs/2212.02508 (2022).

[43] Dawen Liang, Minshu Zhan, and Daniel P. W. Ellis. 2015. Content-Aware Collaborative Music Recommendation Using Pre-trained Neural Networks. In International Society for Music Information Retrieval Conference.

[44] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv:1907.11692 [cs.CL] https://arxiv.org/abs/1907.11692

[45] Paul Magron and Cédric Févotte. 2021. Neural content-aware collaborative filtering for cold-start music recommendation. Data Mining and Knowledge Discovery 36 (2021), 1971–2005.

[46] Millecamp Martijn, Cristina Conati, and Katrien Verbert. 2022. “Knowing me, knowing you”: personalized explanations for a music recommender system. User Modeling and User-Adapted Interaction 32 (2022), 215–252.

[47] David Martín-Gutiérrez, Gustavo Hernández Peñaloza, Alberto Belmonte-Hernández, and Federico Álvarez García. 2020. A Multimodal End-to-End Deep Learning Architecture for Music Popularity Prediction. IEEE Access 8 (2020), 39361–39374.

[49] Matthew C. McCallum, Filip Korzeniowski, Sergio Oramas, Fabien Gouyon, and Andreas F. Ehmann. 2022. Supervised and Unsupervised Learning of Audio Representations for Music Understanding. arXiv:2210.03799 [cs.SD] https://arxiv.org/abs/2210.03799

[50] Brian McFee, Luke Barrington, and Gert R. G. Lanckriet. 2011. Learning Content Similarity for Music Recommendation. IEEE Transactions on Audio, Speech, and Language Processing 20 (2011), 2207–2218.
[51] Brian McFee and Gert R. G. Lanckriet. 2011. The Natural Language of Playlists. In International Society for Music Information Retrieval Conference.

[52] M Jeffrey Mei, Florian Henkel, Samuel E Sandberg, Oliver Bembom, and Andreas F Ehmann. 2025. Semantic ids for music recommendation. In Proceedings of the Nineteenth ACM Conference on Recommender Systems. 1070–1073.

[53] Alessandro B. Melchiorre, Verena Haunschmid, Markus Schedl, and Gerhard Widmer. 2021. LEMONS: Listenable Explanations for Music recOmmeNder Systems. In European Conference on Information Retrieval.

[54] Scott Miller, Paul N. Reimer, S. Ness, and George Tzanetakis. 2010. Geoshuffle: Location-Aware, Content-based Music Browsing Using Self-organizing Tag Clouds. In International Society for Music Information Retrieval Conference.

[55] Marta Moscati, Emilia Parada-Cabaleiro, Yashar Deldjoo, Eva Zangerle, and Markus Schedl. 2022. Music4All-Onion – A Large-Scale Multi-faceted Content-Centric Music Recommendation Dataset. Proceedings of the 31st ACM International Conference on Information & Knowledge Management (2022).

[56] Lazaros Moysis, Lazaros Alexios Iliadis, Sotirios P. Sotiroudis, Achilles D. Boursianis, Maria S. Papadopoulou, Konstantinos-Iraklis D. Kokkinidis, Christos K. Volos, Panagiotis G. Sarigiannidis, Spiridon Nikolaidis, and Sotirios K Goudos. 2023. Music Deep Learning: Deep Learning Methods for Music Signal Processing—A Review of the State-of-the-Art. IEEE A...

[57] Alexandros Nanopoulos, Dimitrios Rafailidis, Panagiotis Symeonidis, and Yannis Manolopoulos. 2010. MusicBox: Personalized Music Recommendation Based on Cubic Analysis of Social Tags. IEEE Transactions on Audio, Speech, and Language Processing 18 (2010), 407–412.

[58] Sergio Oramas, Oriol Nieto, Mohamed Sordo, and Xavier Serra. 2017. A deep multimodal approach for cold-start music recommendation. In Proceedings of the 2nd workshop on deep learning for recommender systems. 32–37.

[59] Jayashree Padmanabhan and Melvin Jose Johnson Premkumar. 2015. Machine Learning in Automatic Speech Recognition: A Survey. IETE Technical Review 32, 4 (2015), 240–251. https://doi.org/10.1080/02564602.2015.1010611
[60] Elias Pampalk and Masataka Goto. 2007. MusicSun: A New Approach to Artist Recommendation. In International Society for Music Information Retrieval Conference.

[61] Minju Park and Kyogu Lee. 2022. Exploiting Negative Preference in Content-based Music Recommendation with Contrastive Learning. Proceedings of the 16th ACM Conference on Recommender Systems (2022).

[62] Anton Pembek, Artem Fatkulin, Anton Klenitskiy, and Alexey Vasilev. 2025. Let It Go? Not Quite: Addressing Item Cold Start in Sequential Recommendations with Content-Based Initialization. In Proceedings of the Nineteenth ACM Conference on Recommender Systems. 626–631.

[63] Leonardo Pepino, Pablo Ernesto Riera, and Luciana Ferrer. 2023. EnCodecMAE: Leveraging neural codecs for universal audio representation learning. ArXiv abs/2309.07391 (2023).

[64] Bruno Laporais Pereira, Alberto Hideki Ueda, Gustavo Penha, Rodrygo L. T. Santos, and Nivio Ziviani. 2019. Online learning to rank for sequential music recommendation. Proceedings of the 13th ACM Conference on Recommender Systems (2019).

[65] Jordi Pons and Xavier Serra. 2019. musicnn: Pre-trained convolutional neural networks for music audio tagging. ArXiv abs/1909.06654 (2019).

[66] Michael Pulis and Josef Bajada. 2021. Siamese Neural Networks for Content-based Cold-Start Music Recommendation. Proceedings of the 15th ACM Conference on Recommender Systems (2021).

[67] Hendrik Purwins, Bo Li, Tuomas Virtanen, Jan Schlüter, Shuo yiin Chang, and Tara N. Sainath. 2019. Deep Learning for Audio Signal Processing. IEEE Journal of Selected Topics in Signal Processing 13 (2019), 206–219.

[68] Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan Hulikal Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Tran, Jonah Samost, et al. 2023. Recommender systems with generative retrieval. Advances in Neural Information Processing Systems 36 (2023), 10299–10315.

[69] Seungmin Rho, Byeong jun Han, and Eenjun Hwang. 2009. SVR-based music mood classification and context-based music recommendation. Proceedings of the 17th ACM international conference on Multimedia (2009).

[70] Igor André Pegoraro Santana, Fabio Pinhelli, Juliano Donini, Leonardo Catharin, Rafael Biazus Mangolin, Valéria Delisandra Feltrim, Marcos Aurélio Domingues, et al. 2020. Music4all: A new music database and its applications. In 2020 International Conference on Systems, Signals and Image Processing (IWSSIP). IEEE, 399–404.
[71] Markus Schedl. 2019. Deep Learning in Music Recommendation Systems. Frontiers Appl. Math. Stat. 5 (2019), 44.

[72] Markus Schedl, Peter Knees, Brian McFee, and Dmitry Bogdanov. 2022. Music Recommendation Systems: Techniques, Use Cases, and Challenges. Springer US, New York, NY, 927–971. https://doi.org/10.1007/978-1-0716-2197-4_24

[73] Markus Schedl and Dominik Schnitzer. 2014. Location-Aware Music Artist Recommendation. In Conference on Multimedia Modeling.

[74] Dominik Schnitzer, Arthur Flexer, Markus Schedl, and Gerhard Widmer. 2012. Local and global scaling reduce hubs in space. J. Mach. Learn. Res. 13 (2012), 2871–2902.

[75] Hendrik Schreiber. 2015. Improving Genre Annotations for the Million Song Dataset. In International Society for Music Information Retrieval Conference. https://api.semanticscholar.org/CorpusID:16812873

[76] Klaus Seyerlehner, Peter Knees, Dominik Schnitzer, and Gerhard Widmer. 2009. Browsing Music Recommendation Networks. In International Society for Music Information Retrieval Conference.

[77] Bo Shao, Mitsunori Ogihara, Dingding Wang, and Tao Li. 2009. Music Recommendation Based on Acoustic Features and User Access Patterns. IEEE Transactions on Audio, Speech, and Language Processing 17 (2009), 1602–1611.

[78] Anima Singh, Trung Vu, Nikhil Mehta, Raghunandan Keshavan, Maheswaran Sathiamoorthy, Yilin Zheng, Lichan Hong, Lukasz Heldt, Li Wei, Devansh Tandon, et al. 2024. Better generalization with semantic ids: A case study in ranking for recommendations. In Proceedings of the 18th ACM Conference on Recommender Systems. 1039–1044.

[79] M. Soleymani, Anna Aljanaki, Frans Wiering, and Remco C. Veltkamp. 2015. Content-based music recommendation using underlying music preference structure. 2015 IEEE International Conference on Multimedia and Expo (ICME) (2015), 1–6.

[80] Janne Spijkervet and John Ashley Burgoyne. 2021. Contrastive Learning of Musical Representations. ArXiv abs/2103.09410 (2021).

Showing first 80 references.