This technical report describes our systems for tracks 1, 2 and 4 of the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22). By combining several ResNet variants, our submission for track 1 attained a minimum detection cost function (minDCF) of 0.090 with an equal error rate (EER) of 1.401%. By further incorporating three fine-tuned pre-trained models, our submission for track 2 achieved a minDCF of 0.072 with an EER of 1.119%. For track 4, our system consisted of voice activity detection (VAD), speaker embedding extraction, agglomerative hierarchical clustering (AHC) followed by a re-clustering step based on a Bayesian hidden Markov model, and overlapped speech detection and handling. Our submission for track 4 achieved a diarisation error rate (DER) of 4.86%. All three submissions ranked second in their respective tracks.
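To make the two verification metrics quoted above concrete, the following is a minimal sketch of how an EER and a normalised minDCF can be computed from trial scores. It is not the challenge's official scoring tool: the function names are hypothetical, and the cost parameters (`p_target=0.05`, `c_miss=1`, `c_fa=1`) are assumed defaults that this report does not state.

```python
def error_rates(scores, labels):
    """Sweep every score as a decision threshold.

    A trial is accepted when score >= threshold. labels use 1 for target
    (same speaker) and 0 for non-target trials.
    Returns a list of (miss_rate, false_alarm_rate) pairs.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    # One extra threshold above the maximum rejects every trial.
    thresholds = sorted(set(scores)) + [float("inf")]
    rates = []
    for t in thresholds:
        misses = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        false_alarms = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        rates.append((misses / n_target, false_alarms / n_nontarget))
    return rates


def equal_error_rate(scores, labels):
    """Approximate the EER at the threshold where miss and false-alarm rates are closest."""
    p_miss, p_fa = min(error_rates(scores, labels), key=lambda r: abs(r[0] - r[1]))
    return (p_miss + p_fa) / 2


def min_dcf(scores, labels, p_target=0.05, c_miss=1.0, c_fa=1.0):
    """Normalised minimum detection cost (cost parameters are assumptions)."""
    best = min(
        c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)
        for p_miss, p_fa in error_rates(scores, labels)
    )
    # Normalise by the cost of the better trivial system (accept all or reject all).
    return best / min(c_miss * p_target, c_fa * (1 - p_target))
```

The threshold sweep here is quadratic in the number of trials, which keeps the sketch short; a practical scorer would sort the scores once and accumulate the error counts in a single pass.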