Music segmentation refers to the dual problem of identifying boundaries between, and labeling, distinct music segments, e.g., the chorus, verse, and bridge in popular music. The performance of a range of music segmentation algorithms has been shown to depend on the audio features chosen to represent the audio. Some approaches have proposed learning feature transformations from music segment annotation data; however, such data is time-consuming and expensive to create, so these approaches are likely limited by the size of their datasets. While annotated music segmentation data is a scarce resource, the amount of available music audio is much greater. In the neighboring field of semantic audio, unsupervised deep learning has shown promise in improving the performance of solutions to the query-by-example and sound classification tasks. In this work, unsupervised training of deep feature embeddings using convolutional neural networks (CNNs) is explored for music segmentation. The proposed techniques exploit only the time proximity of audio features that is implicit in any audio timeline. Employing these embeddings in a classic music segmentation algorithm is shown not only to significantly improve the performance of this algorithm, but also to achieve state-of-the-art performance in unsupervised music segmentation.
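The core idea of exploiting time proximity can be illustrated with a minimal sketch. The abstract does not specify the training objective, so the triplet formulation below (positives sampled within a small window of the anchor, negatives sampled far away, compared under a margin loss) is one plausible instantiation, shown here with random feature frames in place of real audio features and without the CNN embedding itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a sequence of audio feature frames (e.g., log-mel
# spectra over time). Shape: (n_frames, n_features).
features = rng.normal(size=(1000, 64))

def sample_triplet(n_frames, near=16, far=128, rng=rng):
    """Sample (anchor, positive, negative) frame indices using only time
    proximity: the positive lies within `near` frames of the anchor, the
    negative at least `far` frames away. No segment labels are needed.
    The window sizes here are illustrative, not taken from the paper."""
    a = int(rng.integers(0, n_frames))
    p = int(np.clip(a + rng.integers(-near, near + 1), 0, n_frames - 1))
    candidates = np.concatenate([np.arange(0, max(a - far, 0)),
                                 np.arange(min(a + far, n_frames), n_frames)])
    neg = int(rng.choice(candidates))
    return a, p, neg

def triplet_loss(f_a, f_p, f_n, margin=0.1):
    """Standard margin-based triplet loss: pull the positive toward the
    anchor, push the negative at least `margin` further away."""
    d_ap = np.linalg.norm(f_a - f_p)
    d_an = np.linalg.norm(f_a - f_n)
    return max(0.0, d_ap - d_an + margin)

a, p, neg = sample_triplet(len(features))
loss = triplet_loss(features[a], features[p], features[neg])
```

In the described approach, a CNN embedding would be trained on many such triplets so that temporally close frames map to nearby points in the embedding space; the resulting embeddings then replace hand-crafted features in a classic segmentation algorithm.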