This paper proposes a 1D residual convolutional neural network (CNN) architecture for music genre classification and compares it with other recent 1D CNN architectures. The 1D CNNs learn both a representation and a discriminant directly from the raw audio signal. Several convolutional layers capture the time-frequency characteristics of the audio signal and learn various filters relevant to the music genre recognition task. To satisfy the fixed-length input constraint of 1D CNNs, the proposed approach uses a sliding window to split the audio signal into overlapping segments. As a result, music genre classification can be carried out either on a single audio segment or by aggregating the predictions over several segments, and aggregation improves the final accuracy. The performance of the proposed 1D residual CNN is assessed on a public dataset of 1,000 audio clips. The experimental results show that it achieves a mean accuracy of 80.93% in classifying music genres and outperforms the other 1D CNN architectures.
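The segmentation and aggregation steps can be illustrated with a minimal sketch. The abstract does not state the window length, the amount of overlap, or the aggregation rule, so everything below is an assumption: the function names `segment_signal`, `aggregate_mean` and `aggregate_majority` are hypothetical, incomplete trailing windows are simply dropped, and two common aggregation rules (averaging per-segment class probabilities and majority voting over per-segment labels) are shown side by side. The per-segment probabilities would come from a trained 1D CNN, which is not modeled here.

```python
import numpy as np


def segment_signal(signal, window, hop):
    """Split a 1D signal into overlapping fixed-length windows.

    Assumption: samples at the end that do not fill a complete window
    are dropped (padding the last window is an alternative).
    """
    signal = np.asarray(signal)
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    if len(signal) < window:
        raise ValueError("signal is shorter than one window")
    n_segments = 1 + (len(signal) - window) // hop
    # Each row is one fixed-length input for the 1D CNN.
    return np.stack([signal[i * hop : i * hop + window] for i in range(n_segments)])


def aggregate_mean(segment_probs):
    """Clip label = argmax of the class probabilities averaged over segments."""
    segment_probs = np.asarray(segment_probs)
    return int(np.argmax(segment_probs.mean(axis=0)))


def aggregate_majority(segment_probs):
    """Clip label = most frequent per-segment label (ties go to the lowest class index)."""
    votes = np.argmax(np.asarray(segment_probs), axis=1)
    return int(np.bincount(votes).argmax())


if __name__ == "__main__":
    # Hypothetical 30-second clip at 22,050 Hz, cut into 3-second
    # windows with 50% overlap (illustrative values only).
    sr = 22050
    clip = np.random.default_rng(0).standard_normal(30 * sr)
    segments = segment_signal(clip, window=3 * sr, hop=3 * sr // 2)
    print(segments.shape)  # (19, 66150)
```

The two rules can disagree: a clip with one very confident segment and several weakly opposed segments may be labeled differently by averaging than by voting, which is why the choice of aggregation rule is worth stating explicitly when reporting clip-level accuracy.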