Countermeasure (CM) models protect automatic speaker verification (ASV) systems from spoofing attacks and the resulting leakage of personal information. For reasons of practicality and security, the CM model is usually deployed on edge devices, which have more limited computing resources and storage space than cloud-based systems, placing a hard constraint on model size. To better trade off CM model size against performance, we propose an adversarial speaker distillation method, an improved knowledge distillation approach combined with generalized end-to-end (GE2E) pre-training and adversarial fine-tuning. On the evaluation set of the ASVspoof 2021 Logical Access task, our proposed adversarial speaker distillation ResNetSE (ASD-ResNetSE) model achieves 0.2695 min t-DCF and 3.54\% EER, while using only 22.5\% of the parameters and 19.4\% of the multiply-and-accumulate operations (MACs) of the original ResNetSE model.
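The abstract builds on knowledge distillation. As background, here is a minimal sketch of the standard soft-target distillation objective (temperature-softened KL term plus a hard-label term); this illustrates the generic technique only, not the paper's exact adversarial speaker distillation loss, and all function names and hyperparameters are illustrative assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T yields a softer distribution.
    e = np.exp((z - z.max()) / T)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, label, T=2.0, alpha=0.5):
    """Generic knowledge-distillation loss (illustrative sketch):
    alpha-weighted sum of
      (1) KL divergence between temperature-softened teacher and student
          distributions (scaled by T^2, as is conventional), and
      (2) cross-entropy between the student prediction and the hard label.
    """
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    soft = np.sum(p_t * (np.log(p_t) - np.log(p_s))) * T * T
    hard = -np.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard
```

A compact student trained with such an objective can approach the teacher's decision behavior at a fraction of the parameter and MAC budget, which is the trade-off the abstract targets for edge deployment.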