We describe the system our team used in the speaker diarization track of the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC 2022). Our solution is built around a new combination of voice activity detection algorithms that draws on the strengths of several systems. We introduce a novel multi-stream approach with a decision protocol based on classifier entropy. We call this method multi-stream voice activity detection and use it with standard baseline diarization embeddings, clustering, and resegmentation. With this work, we demonstrate that, starting from a strong baseline and working only on voice activity detection, one can achieve close to state-of-the-art results.
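The abstract does not spell out the entropy-based decision protocol, so the following is only a minimal sketch of one plausible reading: for each frame, trust the voice activity detector whose speech posterior has the lowest binary entropy, that is, the most confident stream. The function names `binary_entropy` and `fuse_streams`, the 0.5 threshold, and the per-frame selection rule are all assumptions made for illustration and are not taken from the described system.

```python
import math


def binary_entropy(p, eps=1e-12):
    """Shannon entropy (in bits) of a Bernoulli speech posterior p."""
    # Clamp to avoid log(0) for fully confident posteriors.
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def fuse_streams(streams, threshold=0.5):
    """Fuse several VAD streams frame by frame (hypothetical protocol).

    streams: list of equal-length lists of speech posteriors, one per VAD.
    Returns a list of booleans, True meaning "speech".
    """
    if not streams:
        raise ValueError("need at least one stream")
    n_frames = len(streams[0])
    if any(len(s) != n_frames for s in streams):
        raise ValueError("all streams must have the same number of frames")

    decisions = []
    for frame in zip(*streams):
        # Assumed rule: the lowest-entropy (most confident) stream decides.
        most_confident = min(frame, key=binary_entropy)
        decisions.append(most_confident >= threshold)
    return decisions


# Two hypothetical VAD streams over three frames.
vad_a = [0.9, 0.55, 0.2]
vad_b = [0.6, 0.05, 0.45]
print(fuse_streams([vad_a, vad_b]))  # [True, False, False]
```

In this sketch, a stream that is uncertain on a given frame (posterior near 0.5) is overruled by a stream that is confident on that frame, which is one way several detectors could complement each other.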