In recent years, various deep learning-based methods have been proposed for extracting a fixed-dimensional embedding vector from speech signals. Although these embedding extraction methods perform well in numerous tasks, including speaker verification, language identification, and anti-spoofing, their performance degrades under mismatched conditions because the embeddings retain variability unrelated to the main task. To alleviate this problem, we propose a novel training strategy that regularizes the embedding network so that its embeddings carry minimal information about nuisance attributes. To achieve this, the proposed method incorporates the information bottleneck scheme directly into the training process, where the mutual information is estimated using the main task classifier and an auxiliary normalizing flow network. The proposed method was evaluated on different speech processing tasks and showed improvement over the standard training strategy in all experiments.