Fake audio attacks have become a major threat to speaker verification systems. Although current detection approaches have achieved promising results in dataset-specific scenarios, they struggle to generalize to unseen spoofing data. Fine-tuning and retraining from scratch have been applied to incorporate new data. However, fine-tuning degrades performance on previous data, and retraining requires substantial time and computational resources. Moreover, in some situations previous data are unavailable due to privacy concerns. To solve these problems, this paper proposes Detecting Fake Without Forgetting, a continual-learning-based method that enables the model to learn new spoofing attacks incrementally. A knowledge distillation loss is added to the loss function to preserve the memory of the original model. Assuming that the distribution of genuine speech is consistent across different scenarios, an additional embedding similarity loss is used as a further constraint to align positive (genuine) samples. Experiments are conducted on the ASVspoof2019 dataset. The results show that our proposed method outperforms fine-tuning, achieving a relative reduction in average equal error rate of up to 81.62%.
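The combined objective described above can be sketched as follows. This is a minimal, framework-free illustration under stated assumptions, not the authors' implementation. The KL-divergence form of the distillation term, the temperature, the weights `alpha` and `beta`, the use of cosine similarity for the embedding term, and the label convention (`GENUINE = 0`) are all hypothetical choices. Only genuine samples contribute to the embedding similarity term, reflecting the assumption that genuine speech is distributed consistently across scenarios.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical label convention: 0 = genuine (bona fide), 1 = spoofed.
GENUINE, SPOOF = 0, 1


@dataclass
class Sample:
    label: int
    new_logits: List[float]  # logits from the model being trained on new data
    old_logits: List[float]  # logits from the frozen original model
    new_emb: List[float]     # embedding from the model being trained
    old_emb: List[float]     # embedding from the frozen original model


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    return -math.log(softmax(logits)[label])


def distillation_loss(new_logits: Sequence[float], old_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """Assumed form: KL(p_old || p_new) on temperature-softened outputs, scaled by T^2."""
    p_old = softmax(old_logits, temperature)
    p_new = softmax(new_logits, temperature)
    kl = sum(po * (math.log(po) - math.log(pn))
             for po, pn in zip(p_old, p_new) if po > 0)
    return kl * temperature ** 2


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def continual_loss(batch: List[Sample], alpha: float = 1.0, beta: float = 1.0,
                   temperature: float = 2.0) -> float:
    """Batch mean of: CE + alpha * KD + beta * (1 - cos), the last term for genuine samples only."""
    total = 0.0
    for s in batch:
        # Learn the new spoofing attacks from the new data.
        loss = cross_entropy(s.new_logits, s.label)
        # Keep the new model's outputs close to the original model's (preserve old knowledge).
        loss += alpha * distillation_loss(s.new_logits, s.old_logits, temperature)
        if s.label == GENUINE:
            # Align genuine embeddings with those of the original model.
            loss += beta * (1.0 - cosine_similarity(s.new_emb, s.old_emb))
        total += loss
    return total / len(batch)
```

In a real training loop these quantities would be autograd-enabled tensors from a deep-learning framework, and only the new model's parameters would be updated; the pure-Python version just makes the arithmetic of each term explicit.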