We present a content-based automatic music tagging algorithm using fully convolutional neural networks (FCNs). We evaluate different architectures consisting only of 2D convolutional layers and subsampling layers. In the experiments, we measure the AUC-ROC scores of architectures with different complexities and input types on the MagnaTagATune dataset, where a 4-layer architecture achieves state-of-the-art performance with mel-spectrogram input. Furthermore, we evaluate the performance of architectures with varying numbers of layers on a larger dataset (the Million Song Dataset) and find that deeper models outperform the 4-layer architecture. The experiments show that mel-spectrogram is an effective time-frequency representation for automatic tagging and that more complex models benefit from more training data.
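To make the described setup concrete, the following is a minimal PyTorch sketch of a 4-layer fully convolutional tagger of this kind: stacked 2D convolution plus subsampling blocks over a mel-spectrogram, with sigmoid outputs for multi-label tags. The channel counts, pooling sizes, input shape, and 50-tag output are illustrative assumptions, not the exact configuration reported in the paper.

```python
import torch
import torch.nn as nn


class FCN4(nn.Module):
    """A 4-layer fully convolutional tagger: conv + max-pool blocks over a
    mel-spectrogram, ending in per-tag sigmoid probabilities."""

    def __init__(self, n_tags: int = 50):
        super().__init__()
        self.blocks = nn.Sequential(
            # Each block: 2D convolution, batch norm, ReLU, then subsampling.
            # Pool sizes (freq, time) are assumptions chosen to shrink a
            # 96-band input; the paper's actual sizes may differ.
            self._block(1, 64, pool=(2, 4)),
            self._block(64, 128, pool=(2, 4)),
            self._block(128, 128, pool=(2, 4)),
            self._block(128, 128, pool=(3, 5)),
        )
        # 1x1 convolution as the classification head (keeps the network
        # fully convolutional, i.e., no dense layers).
        self.out = nn.Conv2d(128, n_tags, kernel_size=1)

    @staticmethod
    def _block(c_in: int, c_out: int, pool: tuple[int, int]) -> nn.Sequential:
        return nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(pool),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, n_mels, n_frames), e.g. a 96-band mel-spectrogram.
        h = self.blocks(x)
        # Global max pool over whatever time-frequency extent remains, so
        # the model accepts variable-length inputs.
        h = torch.amax(h, dim=(2, 3), keepdim=True)
        return torch.sigmoid(self.out(h)).flatten(1)  # (batch, n_tags)


if __name__ == "__main__":
    # Example: a batch of two hypothetical 96x1366 mel-spectrogram excerpts.
    model = FCN4(n_tags=50)
    x = torch.randn(2, 1, 96, 1366)
    print(model(x).shape)  # torch.Size([2, 50])
```

Because tagging is multi-label, such a model would typically be trained with a binary cross-entropy loss (e.g., `nn.BCELoss`) against per-tag 0/1 targets, and evaluated with AUC-ROC as described above.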