DNN-based models achieve strong performance on the speaker verification (SV) task, but at substantial computation cost. Model compression can be applied to reduce the model size and thus lower resource consumption. The present study exploits weight quantization to compress two widely used SV models, ECAPA-TDNN and ResNet. Experiments on VoxCeleb indicate that quantization is effective for compressing SV models: the model size can be reduced severalfold with no noticeable performance decline. ResNet achieves more robust results than ECAPA-TDNN under lower-bitwidth quantization. An analysis of layer weights suggests that the smoother weight distribution of ResNet may contribute to its robustness. Additional experiments on CN-Celeb validate the quantized models' generalization ability in a language-mismatch scenario. Furthermore, information probing results demonstrate that the quantized models preserve most of the speaker-relevant knowledge learned by the original models.
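As background, the weight quantization referred to above can be illustrated with a minimal symmetric uniform quantization sketch. This is a generic illustration only, not the exact scheme used in the study; the function names and the per-tensor scaling choice are assumptions for the example.

```python
import numpy as np

def quantize_weights(w, bits=8):
    """Symmetric uniform quantization of a float weight tensor to `bits` bits.

    Returns the integer codes and the per-tensor scale needed to dequantize.
    """
    qmax = 2 ** (bits - 1) - 1              # e.g. 127 for 8-bit
    scale = np.abs(w).max() / qmax          # per-tensor scale (an assumed design choice)
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int8 if bits <= 8 else np.int32), scale

def dequantize(q, scale):
    """Recover an approximate float tensor from the integer codes."""
    return q.astype(np.float32) * scale

# Quantize a random weight matrix and check the reconstruction error,
# which round-to-nearest bounds by half a quantization step (scale / 2).
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(4, 4)).astype(np.float32)
q, scale = quantize_weights(w, bits=8)
max_err = np.abs(w - dequantize(q, scale)).max()
assert max_err <= scale / 2 + 1e-8
```

Storing 8-bit (or lower-bitwidth) codes plus a scale in place of 32-bit floats is what yields the severalfold size reduction mentioned in the abstract.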