Driven by the growing demand for music streaming and recommender services and by recent developments in music information retrieval frameworks, Music Genre Classification (MGC) has attracted the community's attention. However, convolution-based approaches are known to lack the ability to encode and localize temporal features efficiently. In this paper, we study broadcast-based neural networks with the aim of improving localization and generalizability under a small parameter budget (about 180k). We investigate twelve variants of broadcast networks and discuss the effects of block configuration, pooling method, activation function, normalization mechanism, label smoothing, channel interdependency, LSTM block inclusion, and variants of inception schemes. Our computational experiments on relevant datasets such as GTZAN, Extended Ballroom, HOMBURG, and Free Music Archive (FMA) show state-of-the-art classification accuracies in music genre classification. Our approach offers insights and has the potential to enable compact and generalizable broadcast networks for music and audio classification.