This paper presents a novel supervised approach to detecting chorus segments in popular music. Traditional approaches to this task are mostly unsupervised, with pipelines designed to target some quality that is assumed to define "chorusness," usually by seeking the loudest or most frequently repeated sections. We propose a convolutional neural network with a multi-task learning objective that simultaneously fits two temporal activation curves: one indicating "chorusness" as a function of time, and the other indicating the location of the boundaries. We also propose a post-processing method that jointly considers the chorus and boundary predictions to produce binary output. In experiments on three datasets, we compare our system to a set of public implementations of other segmentation and chorus-detection algorithms, and find that our approach performs significantly better.
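As an illustration of how two activation curves might be combined into binary output, the following is a minimal sketch and not the post-processing method proposed in the paper, whose details are not given here. It assumes that boundaries are taken as local maxima of the boundary curve above a threshold, and that each resulting segment is labeled chorus when its mean "chorusness" exceeds a second threshold. The function names, the peak-picking rule, and both threshold values are hypothetical.

```python
import numpy as np


def pick_boundaries(boundary_curve, threshold=0.5):
    """Return frame indices of local maxima at or above `threshold`.

    Hypothetical peak-picking rule, chosen only for illustration.
    """
    b = np.asarray(boundary_curve, dtype=float)
    return [
        i for i in range(1, len(b) - 1)
        if b[i] >= threshold and b[i] > b[i - 1] and b[i] >= b[i + 1]
    ]


def binarize_chorus(chorus_curve, boundary_curve,
                    boundary_threshold=0.5, chorus_threshold=0.5):
    """Label each frame as chorus (True) or non-chorus (False).

    Assumed procedure: split the track at the picked boundaries, then
    mark a whole segment as chorus if its mean activation is high enough.
    """
    c = np.asarray(chorus_curve, dtype=float)
    # Segment edges: track start, picked boundaries, track end.
    edges = [0] + pick_boundaries(boundary_curve, boundary_threshold) + [len(c)]
    labels = np.zeros(len(c), dtype=bool)
    for start, end in zip(edges[:-1], edges[1:]):
        if end > start and c[start:end].mean() >= chorus_threshold:
            labels[start:end] = True
    return labels


# Toy example with made-up activation values (not data from the paper).
chorus = [0.1, 0.2, 0.1, 0.9, 0.8, 0.7, 0.2, 0.1]
boundary = [0.0, 0.0, 0.0, 0.9, 0.0, 0.0, 0.8, 0.0]
print(binarize_chorus(chorus, boundary).astype(int).tolist())
```

Labeling whole segments rather than thresholding each frame independently keeps the binary output aligned with the predicted boundaries, which is one plausible reason to consider the two curves jointly.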