The objective of this work is speaker diarisation of speech recordings 'in the wild'. Determining which segments contain speech is a crucial part of any diarisation system, and mistakes at this stage account for a large proportion of diarisation errors. In this paper, we present a simple but effective solution for speech activity detection based on speaker embeddings. In particular, we find that the norm of the speaker embedding is an extremely effective indicator of speech activity. The method does not require an independent model for speech activity detection, and therefore allows speaker diarisation to be performed with a single, unified representation for both speaker modelling and speech activity detection. We perform a number of experiments on in-house and public datasets, in which our method outperforms popular baselines.
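To make the core idea concrete, the following is a minimal sketch of norm-based speech activity detection, not the implementation used in this work. It assumes that frame-level speaker embeddings are already available as a NumPy array of shape `(num_frames, dim)`, that a larger embedding norm indicates speech (the abstract does not state the direction), and that a fixed threshold is tuned on held-out data. The function names `speech_mask_from_embeddings` and `mask_to_segments` and the `threshold` parameter are hypothetical.

```python
import numpy as np


def speech_mask_from_embeddings(embeddings: np.ndarray, threshold: float) -> np.ndarray:
    """Return a boolean mask marking frames whose embedding norm exceeds the threshold.

    Assumption: a larger L2 norm means the frame is more likely to contain speech.
    """
    norms = np.linalg.norm(embeddings, axis=1)  # one L2 norm per frame
    return norms > threshold


def mask_to_segments(mask: np.ndarray) -> list[tuple[int, int]]:
    """Convert a boolean frame mask into (start, end) frame index pairs, end exclusive."""
    # Pad with False on both sides so every speech run has a rising and a falling edge.
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    # Edges alternate between segment starts (even positions) and ends (odd positions).
    return [(int(start), int(end)) for start, end in zip(edges[0::2], edges[1::2])]


# Toy embeddings (hypothetical values): five frames, two dimensions.
toy_embeddings = np.array(
    [[0.1, 0.0], [3.0, 4.0], [0.0, 2.0], [0.2, 0.1], [1.0, 1.0]]
)
toy_mask = speech_mask_from_embeddings(toy_embeddings, threshold=1.0)
toy_segments = mask_to_segments(toy_mask)
```

In this sketch the same embeddings that would later be clustered into speakers also supply the speech/non-speech decision, which is what makes a separate speech activity detection model unnecessary. A practical system would likely smooth the norms over time before thresholding.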