Speaker identification systems are deployed in diverse environments, often different from the lab conditions on which they are trained and tested. In this paper, first, we show the problem of generalization using fixed thresholds computed using the equal error rate metric. Secondly, we introduce a novel and generalizable speaker-specific thresholding technique for robust imposter identification in unseen speaker identification. We propose a speaker-specific adaptive threshold, which can be computed using the enrollment audio samples, for identifying imposters in unseen speaker identification. Furthermore, we show the efficacy of the proposed technique on VoxCeleb1, VCTK and the FFSVC 2022 datasets, beating the baseline fixed thresholding by up to 25%. Finally, we exhibit that the proposed algorithm is also generalizable, demonstrating its performance on ResNet50, ECAPA-TDNN and RawNet3 speaker encoders.
翻译:暂无翻译