Area under the ROC curve (AUC) optimisation techniques developed for neural networks have recently demonstrated their capabilities in different audio- and speech-related tasks. However, due to its intrinsic nature, AUC optimisation has so far focused only on binary tasks. In this paper, we introduce an extension to the AUC optimisation framework so that it can be easily applied to an arbitrary number of classes, aiming to overcome the issues derived from training data limitations in deep learning solutions. Building upon the multiclass definitions of the AUC metric found in the literature, we define two new training objectives, one using a one-versus-one approach and the other a one-versus-rest approach. To demonstrate their potential, we apply them to an audio segmentation task with limited training data that aims to differentiate three classes: foreground music, background music and no music. Experimental results show that our proposal can significantly improve the performance of audio segmentation systems compared to traditional training criteria such as cross entropy.
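The abstract does not give the exact form of the two objectives, so the following is only a minimal sketch of how one-versus-rest and one-versus-one AUC losses could be written. It assumes the non-differentiable step function in the AUC is replaced by a sigmoid surrogate, and it averages the pairwise AUC terms with equal weight. The function names (`soft_auc`, `ovr_auc_loss`, `ovo_auc_loss`), the `delta` sharpness parameter and the label coding (0 = foreground music, 1 = background music, 2 = no music) are hypothetical and are not taken from the paper.

```python
import numpy as np


def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def soft_auc(pos, neg, delta=10.0):
    """Smooth AUC estimate: mean sigmoid of all positive-minus-negative score gaps.

    Assumption: a sigmoid with sharpness `delta` stands in for the step function.
    """
    diff = pos[:, None] - neg[None, :]  # every (positive, negative) pair
    return _sigmoid(delta * diff).mean()


def ovr_auc_loss(scores, labels, delta=10.0):
    """One-versus-rest: each class against all remaining classes pooled together."""
    aucs = []
    for c in range(scores.shape[1]):
        pos = scores[labels == c, c]
        neg = scores[labels != c, c]
        if pos.size and neg.size:  # skip classes absent from the batch
            aucs.append(soft_auc(pos, neg, delta))
    return 1.0 - np.mean(aucs)


def ovo_auc_loss(scores, labels, delta=10.0):
    """One-versus-one: every ordered pair of classes (i, j), scored with output i."""
    n_classes = scores.shape[1]
    aucs = []
    for i in range(n_classes):
        for j in range(n_classes):
            if i == j:
                continue
            pos = scores[labels == i, i]
            neg = scores[labels == j, i]
            if pos.size and neg.size:
                aucs.append(soft_auc(pos, neg, delta))
    return 1.0 - np.mean(aucs)


# Hypothetical 3-class batch: 0 = foreground music, 1 = background music, 2 = no music.
labels = np.array([0, 0, 1, 1, 2, 2])
perfect_scores = np.eye(3)[labels]            # correct class gets score 1
uninformative_scores = np.full((6, 3), 1 / 3)  # all classes scored equally
```

With `perfect_scores` both losses are close to 0, and with `uninformative_scores` both equal 0.5, which corresponds to a chance-level AUC. In an actual training setup the same computation would be written in an automatic-differentiation framework so that gradients flow through the sigmoid.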