The countermeasure (CM) model is developed to protect Automatic Speaker Verification (ASV) systems from spoofing attacks and to prevent the resulting leakage of personal information. For practicality and security, the CM model is usually deployed on edge devices, which have more limited computing resources and storage space than cloud-based systems, imposing a strict limit on model size. To better trade off CM model size against performance, we propose an adversarial speaker distillation method, an improved knowledge distillation approach combined with generalized end-to-end (GE2E) pre-training and adversarial fine-tuning. In the evaluation phase of the ASVspoof 2021 Logical Access task, our proposed adversarial speaker distillation ResNetSE (ASD-ResNetSE) model achieves 0.2695 min t-DCF and 3.54% EER, while using only 22.5% of the parameters and 19.4% of the multiply-and-accumulate operations of the ResNetSE model.
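The abstract does not spell out the distillation objective itself, but methods of this kind typically build on the standard soft-label knowledge distillation loss, where the student matches the teacher's temperature-softened output distribution. A minimal NumPy sketch of that standard term (the temperature value and the `1e-12` smoothing constant are illustrative assumptions, not from the paper):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Soft-label KD loss: KL(teacher || student) on softened
    distributions, scaled by T^2 as in standard knowledge distillation."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    return (temperature ** 2) * kl.mean()
```

In practice this soft-label term is combined with a hard-label classification loss on the student; the adversarial speaker distillation described above additionally wraps such training with GE2E pre-training and adversarial fine-tuning.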