For self-supervised speaker verification, the quality of pseudo labels determines the upper bound of system performance, because a large share of the estimated labels are unreliable. In this work, we propose dynamic loss-gate and label correction (DLG-LC) to alleviate the performance degradation caused by unreliable estimated labels. In DLG, we adopt a Gaussian Mixture Model (GMM) to dynamically model the loss distribution and use the estimated GMM to automatically distinguish reliable labels from unreliable ones. In addition, to make better use of the unreliable data instead of discarding them outright, we correct the unreliable labels with model predictions. Finally, we apply the DINO framework, which requires no negative pairs, for further improvement. Compared with the best-known self-supervised speaker verification system, our proposed DLG-LC converges faster and achieves 11.45%, 18.35% and 15.16% relative improvement on the Vox-O, Vox-E and Vox-H trials of the VoxCeleb1 evaluation dataset.
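The abstract does not spell out the exact gating or correction rule, so the following is only a minimal, dependency-free Python sketch under stated assumptions. A two-component 1-D GMM is fit with EM to the per-sample losses (refitting it every epoch is one plausible reading of "dynamic"), a sample counts as reliable when the posterior of the lower-mean component exceeds 0.5, and label correction replaces an unreliable pseudo label with the model's argmax prediction. The function names, the 0.5 threshold and the argmax rule are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

# Sketch only: the gating rule, threshold and argmax correction are assumptions,
# not the paper's code. Params = (weights, means, variances) of the 2-component GMM.
Params = Tuple[List[float], List[float], List[float]]


def _posteriors(v: float, w: List[float], mu: List[float], var: List[float]) -> List[float]:
    """Posterior of each mixture component for one loss value, computed in log space."""
    logp = [
        math.log(w[k]) - 0.5 * math.log(2 * math.pi * var[k]) - (v - mu[k]) ** 2 / (2 * var[k])
        for k in range(2)
    ]
    top = max(logp)  # subtract the max before exp() to avoid underflow
    p = [math.exp(lp - top) for lp in logp]
    s = sum(p)
    return [pk / s for pk in p]


def fit_gmm_1d(x: Sequence[float], n_iter: int = 100, eps: float = 1e-6) -> Params:
    """Fit a two-component 1-D GMM to losses with EM; component 0 has the lower mean."""
    n = len(x)
    mean = sum(x) / n
    var0 = sum((v - mean) ** 2 for v in x) / n + eps
    w, mu, var = [0.5, 0.5], [min(x), max(x)], [var0, var0]
    for _ in range(n_iter):
        resp = [_posteriors(v, w, mu, var) for v in x]  # E-step: responsibilities
        for k in range(2):  # M-step: re-estimate weight, mean, variance
            nk = sum(r[k] for r in resp) + eps
            w[k] = nk / n
            mu[k] = sum(r[k] * v for r, v in zip(resp, x)) / nk
            var[k] = sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, x)) / nk + eps
    order = sorted(range(2), key=lambda k: mu[k])
    return [w[k] for k in order], [mu[k] for k in order], [var[k] for k in order]


def dynamic_loss_gate(losses: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Hypothetical gate: reliable if the low-loss component's posterior exceeds threshold."""
    w, mu, var = fit_gmm_1d(losses)
    return [_posteriors(v, w, mu, var)[0] > threshold for v in losses]


def correct_labels(
    pseudo: Sequence[int], probs: Sequence[Sequence[float]], reliable: Sequence[bool]
) -> List[int]:
    """Hypothetical correction: keep reliable labels, use the model's argmax otherwise."""
    return [
        y if ok else max(range(len(p)), key=lambda c: p[c])
        for y, p, ok in zip(pseudo, probs, reliable)
    ]


# Hypothetical per-sample losses for one epoch: five low-loss and four high-loss samples.
losses = [0.10, 0.12, 0.09, 0.11, 0.10, 2.00, 2.10, 1.90, 2.05]
mask = dynamic_loss_gate(losses)
fixed = correct_labels([5, 7], [[0.1, 0.9], [0.8, 0.2]], [True, False])
```

Refitting the mixture on each epoch's losses, instead of fixing a loss threshold by hand, lets the reliable/unreliable boundary follow the loss distribution as training progresses. In practice `sklearn.mixture.GaussianMixture(n_components=2)` with `fit` and `predict_proba` could replace the hand-written EM; the pure-Python version only keeps the sketch self-contained.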