Singing techniques enable expressive vocal performances through temporal fluctuations of timbre, pitch, and other components of the voice. Classifying them is challenging, mainly for two reasons: 1) the fluctuations in singing techniques vary widely and are affected by many factors, and 2) existing datasets are imbalanced. To address these problems, we developed a novel audio feature learning method based on deformable convolution, with decoupled training of the feature extractor and the classifier using a class-weighted loss function. The experimental results show that 1) deformable convolution improves classification, particularly when applied to the last two convolutional layers, and 2) both re-training the classifier and weighting the cross-entropy loss by a smoothed inverse frequency enhance classification performance.
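To illustrate the class-weighting idea, here is a minimal NumPy sketch of one common form of smoothed inverse-frequency weighting, where the weight of class c is proportional to (1/freq_c)^alpha; the exact exponent form, the `alpha` parameter, and the normalization are assumptions for illustration, not the paper's definition:

```python
import numpy as np

def smoothed_inverse_frequency_weights(counts, alpha=0.5):
    """Per-class weights proportional to (1/frequency)^alpha.

    alpha=1 gives plain inverse frequency, alpha=0 gives uniform
    weights (no re-weighting). Assumed form, for illustration only.
    """
    counts = np.asarray(counts, dtype=float)
    freqs = counts / counts.sum()
    weights = (1.0 / freqs) ** alpha
    return weights / weights.mean()  # normalize so weights average to 1

def weighted_cross_entropy(logits, label, weights):
    """Cross-entropy for one sample, scaled by its class weight."""
    logits = logits - logits.max()  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -weights[label] * log_probs[label]

# Imbalanced dataset: class 0 is 100x more frequent than class 2,
# so the rare class receives a proportionally larger weight.
w = smoothed_inverse_frequency_weights([1000, 100, 10], alpha=0.5)
```

With `alpha` between 0 and 1, rare classes are up-weighted less aggressively than under plain inverse frequency, which can stabilize training on heavily imbalanced data.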