Imbalanced music genre classification is a crucial task in Music Information Retrieval (MIR): it aims to identify long-tail, data-poor genres from the associated music audio segments, a setting that is very prevalent in real-world scenarios. Most existing models are designed for class-balanced music datasets and therefore perform poorly, in both accuracy and generalization, on genres at the tail of the distribution. Inspired by the success of Multi-instance Learning (MIL) in various classification tasks, we propose a novel mechanism named Multi-instance Attention (MATT) to boost performance on tail classes. Specifically, we first construct bag-level datasets by generating album-artist pair bags. Second, we leverage neural networks to encode the music audio segments. Finally, under the guidance of a multi-instance attention mechanism, the neural network-based models can select the most informative genre to match a given music segment. Comprehensive experiments on a large-scale music genre benchmark dataset with a long-tail distribution demonstrate that MATT significantly outperforms other state-of-the-art baselines.
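The three steps above (bag construction, segment encoding, attention over the instances in a bag) can be illustrated with a minimal sketch. This is not the MATT implementation: it assumes a generic dot-product attention pooling, hand-written two-dimensional vectors in place of neural-network encodings, and hypothetical names (`build_bags`, `attention_pool`, `genre_query`) that do not come from the paper.

```python
import math
from collections import defaultdict


def build_bags(segments):
    """Group segment embeddings into bags keyed by (album, artist)."""
    bags = defaultdict(list)
    for album, artist, embedding in segments:
        bags[(album, artist)].append(embedding)
    return dict(bags)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(instances, query):
    """Score each instance by its dot product with the query, normalize
    the scores with softmax, and return the weighted sum of instances."""
    scores = [sum(x * q for x, q in zip(inst, query)) for inst in instances]
    weights = softmax(scores)
    pooled = [
        sum(w * inst[d] for w, inst in zip(weights, instances))
        for d in range(len(query))
    ]
    return pooled, weights


# Toy stand-ins for encoded audio segments (assumption: a real system
# would obtain these vectors from a neural encoder).
segments = [
    ("album_a", "artist_x", [1.0, 0.0]),
    ("album_a", "artist_x", [0.0, 1.0]),
    ("album_b", "artist_y", [0.5, 0.5]),
]
bags = build_bags(segments)

# Hypothetical genre query vector; in a trained model this would be learned.
genre_query = [2.0, 0.0]
pooled, weights = attention_pool(bags[("album_a", "artist_x")], genre_query)
```

In this sketch the instance that aligns best with the query receives the largest weight, so the bag representation is dominated by the segments most informative for that genre. How the actual model defines its queries and scores is not stated in the abstract.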