Automatic speaker verification (ASV) systems, which determine whether two utterances come from the same speaker, mainly focus on verification accuracy and neglect inference speed. In real applications, however, both inference speed and verification accuracy are essential. This study proposes cross-sequential re-parameterization (CS-Rep), a novel topology re-parameterization strategy for multi-type networks, to increase both the inference speed and the verification accuracy of models. CS-Rep addresses the problem that existing re-parameterization methods are unsuitable for typical ASV backbones. When a model applies CS-Rep, the training-period network uses a multi-branch topology to capture speaker information, whereas the inference-period model is converted to a time-delay neural network (TDNN)-like plain backbone of stacked TDNN layers to achieve fast inference. Based on CS-Rep, an improved TDNN that is friendly to testing and deployment, called Rep-TDNN, is proposed. Compared with the state-of-the-art model ECAPA-TDNN, which is highly recognized in the industry, Rep-TDNN increases the actual inference speed by about 50% and reduces the EER by 10%. The code will be released.
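As a rough illustration of the general idea behind re-parameterization (training with several parallel branches, then folding them into one plain layer for inference), the sketch below fuses a kernel-size-3 convolution, a kernel-size-1 convolution, and an identity shortcut into a single kernel-size-3 convolution. This is generic branch fusion, not the CS-Rep procedure itself, which the abstract does not detail. The helper names `conv1d` and `fuse_branches`, the branch layout, and the sizes are assumptions made for illustration; batch normalization, dilation, and nonlinearities are omitted.

```python
import random


def conv1d(x, w, b):
    """'Same'-padded 1-D convolution (cross-correlation form, stride 1).

    x: [C_in][T], w: [C_out][C_in][K] with odd K, b: [C_out].
    """
    c_in, t_len = len(x), len(x[0])
    k = len(w[0][0])
    pad = k // 2
    y = []
    for o in range(len(w)):
        row = []
        for t in range(t_len):
            acc = b[o]
            for c in range(c_in):
                for j in range(k):
                    s = t + j - pad
                    if 0 <= s < t_len:  # zero padding outside the signal
                        acc += w[o][c][j] * x[c][s]
            row.append(acc)
        y.append(row)
    return y


def fuse_branches(w3, b3, w1, b1):
    """Fold a K=3 branch, a K=1 branch and an identity branch into one K=3 layer.

    Hypothetical helper: assumes all branches see the same input, are summed,
    and that C_out == C_in so the identity shortcut is well defined.
    """
    n = len(w3)
    assert all(len(row) == n for row in w3)
    centre = len(w3[0][0]) // 2
    w = [[list(taps) for taps in row] for row in w3]  # deep copy of the K=3 kernel
    for o in range(n):
        for c in range(n):
            w[o][c][centre] += w1[o][c][0]  # a K=1 kernel acts only on the centre tap
            if o == c:
                w[o][c][centre] += 1.0      # identity = K=1 conv with an identity matrix
    b = [b3[o] + b1[o] for o in range(n)]
    return w, b


def rand():
    return random.uniform(-1.0, 1.0)


random.seed(0)
C, T = 4, 10  # illustrative channel count and sequence length
x = [[rand() for _ in range(T)] for _ in range(C)]
w3 = [[[rand() for _ in range(3)] for _ in range(C)] for _ in range(C)]
w1 = [[[rand()] for _ in range(C)] for _ in range(C)]
b3 = [rand() for _ in range(C)]
b1 = [rand() for _ in range(C)]

# Training-time topology: three parallel branches summed together.
y3, y1 = conv1d(x, w3, b3), conv1d(x, w1, b1)
y_train = [[y3[o][t] + y1[o][t] + x[o][t] for t in range(T)] for o in range(C)]

# Inference-time topology: one plain convolution with the fused parameters.
w_f, b_f = fuse_branches(w3, b3, w1, b1)
y_infer = conv1d(x, w_f, b_f)

max_diff = max(abs(y_train[o][t] - y_infer[o][t]) for o in range(C) for t in range(T))
print(max_diff < 1e-9)
```

The fusion is exact up to floating-point rounding because convolution is linear in its weights, so the multi-branch and single-branch models compute the same function while the latter needs only one layer evaluation.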