State-of-the-art speaker recognition systems are trained on large amounts of human-labeled data. Such a training set is usually composed of data from various sources to enhance the models' capability. However, in practical deployment, unseen conditions are almost inevitable. Domain mismatch is a common problem in real-life applications due to the statistical difference between the training and testing data sets. To alleviate the degradation caused by domain mismatch, we propose a new feature-based unsupervised domain adaptation algorithm. The proposed algorithm is a further optimization of the well-known CORrelation ALignment (CORAL), so we call it CORAL++. On the NIST 2019 Speaker Recognition Evaluation (SRE19), we use the SRE18 CTS set as the development set to verify the effectiveness of CORAL++. With the typical x-vector/PLDA setup, CORAL++ outperforms CORAL by 9.40% relative in EER.
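As background for the baseline this work builds on (not the CORAL++ refinement itself), the classical CORAL transform aligns second-order statistics by whitening the out-of-domain embeddings with their own covariance and re-coloring them with the in-domain covariance. A minimal NumPy sketch is given below; the function name and the `eps` regularizer are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np
from scipy.linalg import sqrtm

def coral_transform(source_embeddings, target_embeddings, eps=1e-5):
    """Classic CORAL: whiten source covariance, re-color with target covariance.

    source_embeddings: (N_s, d) out-of-domain x-vectors (e.g., training data)
    target_embeddings: (N_t, d) unlabeled in-domain x-vectors (e.g., SRE CTS)
    """
    d = source_embeddings.shape[1]
    # Regularize covariances with a small identity term for numerical stability.
    cs = np.cov(source_embeddings, rowvar=False) + eps * np.eye(d)
    ct = np.cov(target_embeddings, rowvar=False) + eps * np.eye(d)

    # D_S* = D_S @ C_S^{-1/2} @ C_T^{1/2}
    whiten = np.linalg.inv(sqrtm(cs).real)
    recolor = sqrtm(ct).real
    return source_embeddings @ whiten @ recolor
```

In an x-vector/PLDA pipeline of this kind, the adapted source embeddings would typically be used to train (or re-train) the PLDA back-end, so that its parameters better match the statistics of the evaluation domain.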