The success of deep neural networks depends on both high annotation quality and large amounts of data. In practice, however, dataset size and quality are usually a trade-off, as data collection and cleaning are expensive and time-consuming. Automatic noisy label detection (NLD) techniques are therefore critical for real-world applications, especially those built on crowdsourced datasets. As NLD is under-explored in automatic speaker verification (ASV), we present a simple but effective solution to the task. First, we compare the effectiveness of commonly used metric learning loss functions under different noise settings. Then, we propose two ranking-based NLD methods: inter-class inconsistency ranking and intra-class inconsistency ranking. Both leverage the inconsistent nature of noisy labels and achieve high detection precision even under heavy noise. Our solution enables efficient and effective cleaning of large-scale speaker recognition datasets.
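The abstract does not spell out the ranking mechanics, but a minimal sketch of one plausible reading of intra-class inconsistency ranking is shown below: score each sample by the cosine similarity between its embedding and the centroid of its labeled class, then rank samples from least to most consistent, so likely mislabeled samples surface first. The function name and centroid-based scoring are illustrative assumptions, not the paper's exact method.

```python
# Hypothetical sketch of intra-class inconsistency ranking
# (assumed centroid-based scoring; not the paper's exact algorithm).
import numpy as np

def intra_class_inconsistency_rank(embeddings, labels):
    """Return sample indices ranked most-suspect first."""
    # L2-normalize embeddings so dot products are cosine similarities.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = np.empty(len(labels))
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        centroid = emb[idx].mean(axis=0)
        centroid /= np.linalg.norm(centroid)
        sims[idx] = emb[idx] @ centroid
    # Low similarity to the labeled class = high inconsistency.
    return np.argsort(sims)

# Toy example: two tight clusters with one injected label error.
rng = np.random.default_rng(0)
a = rng.normal([5.0, 0.0], 0.1, (10, 2))
b = rng.normal([0.0, 5.0], 0.1, (10, 2))
X = np.vstack([a, b])
y = np.array([0] * 10 + [1] * 10)
y[3] = 1  # mislabel one cluster-a sample as class 1
ranked = intra_class_inconsistency_rank(X, y)
print(ranked[0])  # the injected error should rank first
```

On toy data like this, the mislabeled point sits far from its labeled class centroid and is ranked first; in practice the top-ranked fraction would be sent for relabeling or removal.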