In speaker diarization, small-scale meetings account for a large proportion of recordings. When a microphone array is used as the recording device, most researchers ignore its spatial information. In this paper, inspired by a clustering method that combines d-vectors with microphone-array spatial vectors, we propose a diarization method that uses multi-channel microphone arrays for meetings with no more than 4 speakers. We first apply speech enhancement to preprocess the audio from the microphone array. The Steered-Response Power Phase Transform (SRP-PHAT) algorithm is then employed to obtain a more accurate estimate of the speakers, and the number of speakers is used to recluster the speech segments for better performance. Finally, we fuse the outputs of our systems with DOVER-LAP to obtain the best result. We evaluate our system on the AMI corpus. Compared with the best previously reported results, our system achieves a large improvement in diarization error rate (DER).
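
As an illustration of the spatial cue that SRP-PHAT relies on, the sketch below estimates the time delay between two microphone channels with PHAT-weighted generalized cross-correlation (GCC-PHAT). SRP-PHAT sums such PHAT-weighted correlations over microphone pairs for each candidate direction. This is a minimal sketch and not the implementation used in the paper: the function name `gcc_phat` and its parameters (`sig`, `ref`, `fs`, `max_tau`) are assumptions made for this example.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref`, in seconds, using GCC-PHAT.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(sig) + len(ref)
    sig_fft = np.fft.rfft(sig, n=n)
    ref_fft = np.fft.rfft(ref, n=n)

    # Cross-power spectrum with PHAT weighting: keep phase, discard magnitude.
    cross = sig_fft * np.conj(ref_fft)
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to physically plausible lags if max_tau is given.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs
```

Calling `gcc_phat(delayed, original, fs=16000)` on a broadband signal and a copy of it delayed by a few samples should return a positive delay close to the true one. Given the array geometry, delays of this kind across all microphone pairs map to a direction of arrival.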