In this paper, we perform an in-depth study of how data augmentation techniques improve synthetic or spoofed audio detection. Specifically, we propose methods to deal with channel variability, different audio compressions, different bandwidths, and unseen spoofing attacks, all of which have been shown to significantly degrade the performance of audio-based and anti-spoofing systems. Our results are based on the Logical Access (LA) and Deep Fake (DF) categories of the ASVspoof 2021 challenge. Our study is data-centric: the models are kept fixed, and we significantly improve the results solely by making changes to the data. We introduce two forms of data augmentation: compression augmentation for the DF part, and combined compression and channel augmentation for the LA part. In addition, we introduce a new type of online data augmentation, SpecAverage, in which the audio features are masked with their average value in order to improve generalization. We also introduce a log-spectrogram feature design that further improves the results. Our best single system and our fusion scheme both achieve state-of-the-art performance in the DF category, with EERs of 15.46% and 14.46%, respectively. Our best system for the LA task reduces the EER of the best baseline by 50% and its min t-DCF by 16%. Our techniques for handling spoofed data drawn from a wide variety of distributions can be replicated and can help anti-spoofing and other speech-based systems improve their results.
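As a rough illustration of the SpecAverage idea, the sketch below masks one frequency band and one time span of a feature matrix with the matrix's average value, in the spirit of SpecAugment-style masking. The function name `spec_average`, the parameters `max_freq_width` and `max_time_width`, the choice of a single band and span per call, and the use of the global per-utterance mean as the fill value are all hypothetical choices made for illustration, not details taken from the paper.

```python
import random
from typing import List, Optional

# Feature matrix laid out as features[freq_bin][frame].
Matrix = List[List[float]]


def spec_average(
    features: Matrix,
    max_freq_width: int = 8,
    max_time_width: int = 20,
    rng: Optional[random.Random] = None,
) -> Matrix:
    """Return a copy of `features` with one frequency band and one time span
    overwritten by the mean of the whole matrix (SpecAverage-style masking).

    Hypothetical assumptions: one band and one span per call, widths drawn
    uniformly from [0, max_*_width], fill value = global utterance mean.
    """
    rng = rng or random.Random()
    n_freq = len(features)
    n_time = len(features[0]) if n_freq else 0
    if n_freq == 0 or n_time == 0:
        return [row[:] for row in features]

    # Fill value: average over every time-frequency bin of this utterance.
    mean = sum(sum(row) for row in features) / (n_freq * n_time)
    out = [row[:] for row in features]  # never mutate the caller's data

    # Frequency mask: overwrite rows f0 .. f0 + fw - 1 with the mean.
    fw = rng.randint(0, min(max_freq_width, n_freq))
    f0 = rng.randint(0, n_freq - fw)
    for f in range(f0, f0 + fw):
        out[f] = [mean] * n_time

    # Time mask: overwrite frames t0 .. t0 + tw - 1 in every row with the mean.
    tw = rng.randint(0, min(max_time_width, n_time))
    t0 = rng.randint(0, n_time - tw)
    for row in out:
        for t in range(t0, t0 + tw):
            row[t] = mean
    return out


if __name__ == "__main__":
    # Toy 40-bin x 50-frame "spectrogram" for demonstration.
    feats = [[float(f * 10 + t) for t in range(50)] for f in range(40)]
    masked = spec_average(feats, rng=random.Random(0))
    changed = sum(o != x for ro, rx in zip(masked, feats) for o, x in zip(ro, rx))
    print(f"{changed} of {40 * 50} bins replaced by the mean")
```

Filling masked regions with the mean rather than zero presumably keeps them within the typical range of the features, so the model is not exposed to unnaturally low values; whether the original work computes the average per utterance, per frequency bin, or over the training set is not specified in this abstract.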