The objective of this work is to train noise-robust speaker embeddings adapted for speaker diarisation. Speaker embeddings play a crucial role in the performance of diarisation systems, but they often capture spurious information such as noise, which adversely affects performance. Our previous work proposed an auto-encoder-based dimensionality reduction module to help remove this redundant information. However, that module does not explicitly separate such information and has been found to be sensitive to hyper-parameter values. To overcome these issues, we make two contributions: (i) a novel dimensionality reduction framework that disentangles spurious information from the speaker embeddings; and (ii) the use of a speech activity vector to prevent the speaker code from representing background noise. In experiments on four datasets, our approach consistently achieves state-of-the-art performance among models that do not use system fusion.