In this paper, we propose a novel strategy for text-independent speaker identification systems: Multi-Label Training (MLT). Instead of the commonly used one-to-one correspondence between utterances and speaker labels, we divide all the utterances of each speaker into several subgroups and assign each subgroup a different label. During the identification process, a specific speaker is considered correctly identified as long as the predicted label matches any one of his/her corresponding labels. We found that this method forces the model to distinguish the data more accurately and, to some extent, takes advantage of ensemble learning, while avoiding a significant increase in computation and storage burden. In our experiments, Multi-Label Training achieved better identification performance than common methods, not only in clean conditions but also in noisy conditions with speech enhancement. It should be noted that the proposed strategy can easily be applied to almost all current text-independent speaker identification models to achieve further improvements.
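The labeling scheme described above can be illustrated with a minimal sketch. The function names and the subgroup count are illustrative assumptions, not part of the paper: each speaker's utterances are split into several subgroups, each subgroup receives its own label, and a prediction counts as correct if the predicted label belongs to the true speaker's label set.

```python
# Minimal sketch of the Multi-Label Training (MLT) label scheme.
# All names here (build_mlt_labels, is_correct) are hypothetical.

def build_mlt_labels(num_speakers, subgroups_per_speaker):
    """Assign each speaker one label per subgroup of his/her utterances."""
    label_to_speaker = {}
    speaker_to_labels = {s: [] for s in range(num_speakers)}
    label = 0
    for speaker in range(num_speakers):
        for _ in range(subgroups_per_speaker):
            label_to_speaker[label] = speaker
            speaker_to_labels[speaker].append(label)
            label += 1
    return label_to_speaker, speaker_to_labels

def is_correct(predicted_label, true_speaker, label_to_speaker):
    """Identification succeeds if the predicted label is any of the
    true speaker's labels."""
    return label_to_speaker[predicted_label] == true_speaker
```

For example, with 2 speakers and 3 subgroups each, speaker 0 owns labels {0, 1, 2}, so predicting any of those three labels for one of speaker 0's utterances counts as a correct identification.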