Departamento de Sistemas Informaticos, Universidad de Castilla-La Mancha, Campus universitario s/n, Albacete 02071, Spain;
Laboratoire des Sciences du Numerique de Nantes, University of Nantes, Centre national de la recherche scientifique, Nantes, France;
Keywords:
Music information retrieval;
Artificial intelligence;
Journal:
Multimedia Tools and Applications
ISSN:
1380-7501
Year/Volume/Issue:
2024, Vol. 83, Issue 16
Pages:
48311-48330
Abstract:
Music information retrieval (MIR) is an interdisciplinary research field that focuses on the extraction, processing, and knowledge discovery of information contained in music. While previous studies have utilized Spotify audio features and Last.fm tags as input values for classification tasks, such as music genre recognition, their potential as target values has remained unexplored. In this article, we address this notable gap in the research landscape by proposing a novel approach to predict Spotify audio features based on a set of Last.fm tags. By predicting audio features, we aim to explore the relationship between subjective perception and concrete musical features, shedding light on patterns and hidden correlations between how music is perceived, consumed, and discovered. Additionally, the predicted audio features can be leveraged in recommendation systems to provide users with explainable recommendations, bridging the gap between algorithmic suggestions and user understanding. Our experiments involve training models such as GPT-2, XGBRegressor, and Bayesian Ridge regressor to predict Spotify audio features from Last.fm tags. Through our findings, we contribute to the advancement of MIR research by demonstrating the potential of Last.fm tags as target values and paving the way for future research on the connection between subjective and objective music characterization. Our approach holds promise for both listeners and researchers, offering new insights into the intricate relationship between perception and audio signal in music. Our study aims to explore the feasibility and efficacy of this unique approach, where we intentionally refrain from using traditional audio-based or metadata-driven methods.
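To make the regression setup described in the abstract concrete, below is a minimal sketch (not the authors' code) of predicting Spotify audio features from Last.fm tags with a multi-output XGBRegressor. The tag sets, feature values, and hyperparameters are illustrative assumptions, not the paper's dataset or configuration.

# Sketch: one-hot encode Last.fm tag sets and regress Spotify audio features.
# All data below is made up for illustration.
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.multioutput import MultiOutputRegressor
from xgboost import XGBRegressor

# Hypothetical training tracks: each has a set of Last.fm tags ...
tag_sets = [
    {"rock", "energetic", "guitar"},
    {"acoustic", "mellow", "singer-songwriter"},
    {"electronic", "dance", "energetic"},
    {"ambient", "mellow", "instrumental"},
]
# ... and a target vector of Spotify audio features
# (here: danceability, energy, valence).
audio_features = np.array([
    [0.45, 0.90, 0.60],
    [0.40, 0.30, 0.50],
    [0.80, 0.85, 0.70],
    [0.30, 0.20, 0.35],
])

# Binary tag matrix: one column per tag seen during fitting.
mlb = MultiLabelBinarizer()
X = mlb.fit_transform(tag_sets)

# One gradient-boosted regressor per audio feature dimension.
model = MultiOutputRegressor(XGBRegressor(n_estimators=50, max_depth=3))
model.fit(X, audio_features)

# Predict audio features for an unseen combination of known tags.
new_tags = mlb.transform([{"electronic", "mellow"}])
print(model.predict(new_tags))

The same pipeline could swap in a Bayesian Ridge regressor (sklearn.linear_model.BayesianRidge) per output dimension; the paper's GPT-2 variant would instead treat the tag list as text input, which is not shown here.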