Detection of Doctored Speech: Towards an End-to-End Parametric Learn-able Filter Approach

Automatic Speaker Verification (ASV) systems have potential in biometric applications for logical access control and authentication, and much is at stake if an ASV system is compromised. Our preliminary work presents a comparative analysis of the state-of-the-art wavelet-based and MFCC-based spoof detection techniques developed in (Novoselov et al., 2016) and (Alam et al., 2016a), respectively. The results on ASVspoof 2015 justify our preference for wavelet-based features over MFCC features. Experiments on the ASVspoof 2019 database show that traditional handcrafted features lack credibility, which gives us further reason to move towards end-to-end (E2E) deep neural networks and more recent techniques. We use the Sincnet architecture as our baseline. By replacing its Sinc layer with a Wavelet Scattering layer and a Continuous Wavelet Transform (CWT) layer, we obtain two E2E deep learning models, which we call WSTnet and CWTnet, respectively. The fusion model achieved 62% and 17% relative improvement over the traditional handcrafted models and our Sincnet baseline, respectively, when evaluated on the modern spoofing attacks in ASVspoof 2019. However, the final scale distribution and the number of scales used in CWTnet are far from optimal for the task at hand. To address this, we replaced the CWT layer in our CWTnet architecture with a Wavelet Deconvolution (WD) layer (Khan and Yener, 2018). Like the CWT layer, this layer computes the Discrete-Continuous Wavelet Transform, but it also optimizes the scale parameter through back-propagation. The WDnet model achieved 26% and 7% relative improvement over the CWTnet and Sincnet models, respectively, when evaluated on the ASVspoof 2019 dataset. This suggests that WDnet extracts more generalized features than CWTnet, because it focuses only on the most important and relevant frequency regions.
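The idea of a wavelet filter whose scale is tuned by gradient descent, rather than fixed in advance, can be illustrated with a small sketch. It is not the WDnet implementation: it assumes a Mexican hat (Ricker) mother wavelet, uses a toy regression objective (matching the output of a filter with a known scale), and approximates the gradient with a central finite difference instead of back-propagation. All function names and numeric settings below are hypothetical and chosen only for illustration.

```python
import math


def mexican_hat(t, scale):
    """Mexican hat (Ricker) wavelet at time t, dilated by `scale` (assumed wavelet)."""
    u = t / scale
    norm = 2.0 / (math.sqrt(3.0 * scale) * math.pi ** 0.25)
    return norm * (1.0 - u * u) * math.exp(-0.5 * u * u)


def wavelet_filter(signal, scale, half_width=16):
    """Filter `signal` with the wavelet sampled at integer offsets in [-half_width, half_width].

    The wavelet is symmetric, so this sliding dot product equals a convolution.
    Only fully overlapping positions are kept ("valid" mode).
    """
    taps = [mexican_hat(k, scale) for k in range(-half_width, half_width + 1)]
    n_out = len(signal) - len(taps) + 1
    return [sum(taps[j] * signal[i + j] for j in range(len(taps))) for i in range(n_out)]


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def fit_scale(signal, target, scale, steps=20, lr=1.0, eps=1e-4, min_scale=0.5):
    """Gradient descent on the scale parameter with a backtracking step size.

    The gradient is a central finite difference, a stand-in for the
    back-propagated gradient a deep learning framework would provide.
    """
    def loss(s):
        return mse(wavelet_filter(signal, s), target)

    current = loss(scale)
    for _ in range(steps):
        grad = (loss(scale + eps) - loss(scale - eps)) / (2.0 * eps)
        step = lr
        # Halve the step until the loss strictly decreases, or give up on this iteration.
        while step > 1e-8:
            candidate = max(scale - step * grad, min_scale)
            candidate_loss = loss(candidate)
            if candidate_loss < current:
                scale, current = candidate, candidate_loss
                break
            step *= 0.5
    return scale, current


# Toy data: a two-tone signal, and the response of a filter with a "true" scale of 4.0.
signal = [math.sin(0.2 * n) + 0.5 * math.sin(0.9 * n) for n in range(128)]
target = wavelet_filter(signal, 4.0)

initial_loss = mse(wavelet_filter(signal, 2.0), target)
learned_scale, final_loss = fit_scale(signal, target, scale=2.0)
print(f"initial loss {initial_loss:.4f}, final loss {final_loss:.4f}, scale {learned_scale:.3f}")
```

In an actual network the scale would be one trainable parameter per filter, and the loss would be the spoof-detection objective rather than this toy regression target.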
