This paper describes our submission to the Third DIHARD Speech Diarization Challenge. Beyond a traditional clustering-based system, the novelty of our approach lies in combining several front-end techniques to solve the diarization problem, including speech separation and target-speaker voice activity detection (TS-VAD), together with iterative data purification. We also use audio domain classification to design domain-dependent processing. Finally, we apply post-processing for system fusion and selection. Our best system achieves diarization error rates (DERs) of 11.30% on track 1 and 16.78% on track 2 of the evaluation set.
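For readers unfamiliar with the metric, DER is conventionally the sum of missed speech, false-alarm speech, and speaker-confusion time divided by the total reference speech time. The following is a minimal frame-level sketch of that definition, not the challenge's official scoring tool. It assumes one label per frame, no overlapping speech, no forgiveness collar, and hypothesis speaker labels that have already been mapped to reference labels; the function name `frame_der` and the use of `None` for non-speech are illustrative choices.

```python
def frame_der(reference, hypothesis):
    """Frame-level DER sketch.

    Each sequence holds one label per frame: a speaker ID, or None for
    non-speech. Assumes hypothesis labels are already mapped to reference
    labels and that no frame contains overlapping speakers.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have equal length")

    missed = false_alarm = confusion = speech = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            speech += 1              # frame counts toward reference speech
            if hyp is None:
                missed += 1          # speech labeled as non-speech
            elif hyp != ref:
                confusion += 1       # speech attributed to the wrong speaker
        elif hyp is not None:
            false_alarm += 1         # non-speech labeled as speech

    if speech == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / speech


# Toy example: one missed frame, one confused frame, one false alarm
# over four reference speech frames.
ref = ["A", "A", "B", "B", None]
hyp = ["A", None, "B", "A", "A"]
print(f"DER = {frame_der(ref, hyp):.2%}")
```

A full scorer would additionally search for the optimal speaker mapping and handle overlapping speech, both of which this sketch omits.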