The goal of this paper is to adapt speaker embeddings to the problem of speaker diarisation. The quality of speaker embeddings is paramount to the performance of speaker diarisation systems. Despite this, prior work in the field has directly used embeddings designed solely to be effective on the speaker verification task. In this paper, we propose three techniques that better adapt speaker embeddings for diarisation: dimensionality reduction, attention-based embedding aggregation, and non-speech clustering. We perform a wide range of experiments on several challenging datasets. The results demonstrate that all three techniques contribute positively to the performance of the diarisation system, which achieves an average relative improvement of 25.07% in diarisation error rate (DER) over the baseline.
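The abstract names the techniques without detailing them. As a rough illustration only, the NumPy sketch below shows two generic operations of the kind involved: a PCA projection standing in for dimensionality reduction, and softmax attention pooling that merges a window of embeddings into one vector. Neither function is the paper's implementation. The function names, the choice of PCA, and the `query` vector (which would normally be learned) are all assumptions made for this example.

```python
import numpy as np


def reduce_dimensionality(embeddings: np.ndarray, out_dim: int) -> np.ndarray:
    """Project embeddings (n, d) onto their top `out_dim` principal components.

    PCA is a stand-in chosen for illustration; it is not necessarily the
    reduction method used in the paper.
    """
    centred = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    if out_dim > vt.shape[0]:
        raise ValueError("out_dim cannot exceed min(n, d)")
    return centred @ vt[:out_dim].T


def attention_aggregate(embeddings: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Merge a window of embeddings (k, d) into a single (d,) vector.

    Each embedding is scored against a hypothetical query vector; the scores
    are turned into softmax weights and used for a weighted average.
    """
    scores = embeddings @ query          # one score per embedding
    scores = scores - scores.max()       # subtract max for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()    # softmax weights sum to 1
    return weights @ embeddings          # attention-weighted average
```

For example, 100 hypothetical 256-dimensional segment embeddings can be reduced to 32 dimensions with `reduce_dimensionality(segs, 32)`, after which any window of consecutive rows can be pooled with `attention_aggregate`. With a zero query the weights are uniform, so the pooled vector reduces to the plain mean of the window.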