Recently, attention mechanisms such as the squeeze-and-excitation (SE) module and the convolutional block attention module (CBAM) have achieved great success in deep learning-based speaker verification systems. This paper introduces a simple yet effective alternative, the simple attention module (SimAM), for speaker verification. SimAM is a plug-and-play module that adds no extra model parameters. In addition, because a large-scale dataset labeled by human annotators or automated processes may contain noisy labels, we propose a noisy label detection method that iteratively filters out training samples with noisy labels. Owing to the memorization effect, an over-parameterized deep neural network (DNN) can fit such noisy labels, which results in a performance drop. Experiments are conducted on the VoxCeleb dataset. The speaker verification model with SimAM achieves a 0.675% equal error rate (EER) on the VoxCeleb1 original test trials. Our proposed iterative noisy label detection method further reduces the EER to 0.643%.
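For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an equal error rate can be approximated from raw trial scores. It is not the evaluation code used in the paper: the function name `equal_error_rate`, the convention that a higher score means "same speaker", and the threshold sweep are all assumptions made for illustration. The EER is the operating point where the false acceptance rate (non-target trials accepted) equals the false rejection rate (target trials rejected); with a finite set of trials the two rarely match exactly, so the sketch returns their average at the threshold where they are closest.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER from trial scores (assumed: higher = more likely same speaker)."""
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target trial")

    # Candidate thresholds: every observed score, plus one above all of them
    # so that the "reject everything" operating point is also considered.
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    thresholds.append(thresholds[-1] + 1.0)

    best_gap = None
    best_eer = None
    for t in thresholds:
        # False acceptance rate: non-target trials scored at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # False rejection rate: target trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    targets = [0.9, 0.8, 0.7, 0.4]
    nontargets = [0.6, 0.3, 0.2, 0.1]
    print(f"EER = {equal_error_rate(targets, nontargets):.3f}")
```

On this scale, the 0.675% EER reported above corresponds to a return value of about 0.00675. Evaluation toolkits usually interpolate along the ROC curve rather than taking the nearest threshold, so values from this sketch can differ slightly on small trial lists.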