High-quality user feedback data is essential for training and evaluating a successful music recommendation system, particularly one that must balance the needs of multiple stakeholders. Most existing music datasets suffer from noisy feedback and from the self-selection biases inherent in data collected by music platforms. Using the Piki Music dataset of 500k ratings collected over two years, we evaluate the performance of classic recommendation algorithms for three important stakeholders: consumers, well-known artists, and lesser-known artists. We show that a matrix factorization algorithm trained on both likes and dislikes performs significantly better than one trained only on likes, for all three stakeholders.
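To make the comparison between the two training regimes concrete, here is a minimal sketch. It is not the authors' implementation, and several things in it are assumptions. The factorization is plain SGD on squared error. Likes are labeled 1.0 and dislikes 0.0. The likes-only baseline discards dislikes and samples random unobserved items as pseudo-negatives, a common implicit-feedback convention. All function names (`train_mf`, `score`, `likes_and_dislikes`, `likes_only`) and the toy data are hypothetical.

```python
import random

# Hypothetical helpers for illustration only; not the paper's implementation.


def score(P, Q, u, i):
    """Predicted preference of user u for item i (dot product of latent factors)."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def train_mf(triples, n_users, n_items, k=8, epochs=500, lr=0.05, reg=0.01, seed=0):
    """Fit user/item factors by SGD on squared error over (user, item, label) triples."""
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(triples)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, y in data:
            err = y - score(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Update both factors from the same error, with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def likes_and_dislikes(feedback):
    """Explicit setting: a like becomes label 1.0, a dislike becomes label 0.0."""
    return [(u, i, 1.0 if liked else 0.0) for u, i, liked in feedback]


def likes_only(feedback, n_items, n_neg=1, seed=0):
    """Likes-only setting: dislikes are dropped, and for each like we sample
    n_neg random items the user has not liked as 0.0 pseudo-negatives
    (an assumed implicit-feedback convention)."""
    rng = random.Random(seed)
    liked = sorted({(u, i) for u, i, is_like in feedback if is_like})
    liked_set = set(liked)
    triples = [(u, i, 1.0) for u, i in liked]
    for u, _ in liked:
        for _ in range(n_neg):
            j = rng.randrange(n_items)
            if (u, j) not in liked_set:
                triples.append((u, j, 0.0))
    return triples


# Toy feedback as (user, item, liked?). Made-up data, not drawn from Piki.
feedback = [(0, 0, True), (0, 1, False), (0, 2, True),
            (1, 1, True), (1, 2, False), (1, 3, True)]
n_users, n_items = 2, 4

P_exp, Q_exp = train_mf(likes_and_dislikes(feedback), n_users, n_items)
P_imp, Q_imp = train_mf(likes_only(feedback, n_items), n_users, n_items)
print("explicit:", round(score(P_exp, Q_exp, 0, 0), 2), round(score(P_exp, Q_exp, 0, 1), 2))
print("likes-only:", round(score(P_imp, Q_imp, 0, 0), 2), round(score(P_imp, Q_imp, 0, 1), 2))
```

The only difference between the two models is the training set. With dislikes, the model sees true negative labels. In the likes-only variant, negatives are guesses drawn from unobserved items, and these may include songs the user would actually enjoy. This is one plausible intuition for why explicit dislikes help, not a claim about the paper's analysis.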