Synthetic voices and spliced audio clips have been generated to spoof Internet users and artificial intelligence (AI) technologies such as voice authentication. Existing research treats spoofing countermeasures as a binary classification problem: bona fide vs. spoof. This paper extends the existing Res2Net by incorporating the recent Conformer block to further exploit local patterns in acoustic features. Experimental results on the ASVspoof 2019 database show that the proposed SE-Res2Net-Conformer architecture improves spoofing countermeasure performance in the logical access scenario. This paper also proposes to reformulate the existing audio splicing detection problem: instead of identifying complete spliced segments, it is more useful to detect the boundaries of the spliced segments. Moreover, unlike previous signal processing techniques, a deep learning approach can be used to solve this problem.
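The abstract does not say how boundary targets are defined. Below is a minimal sketch under stated assumptions, not the paper's method. It poses boundary detection as frame-wise binary classification, which a deep network could be trained on, and decodes frame probabilities back into boundary times. Everything here is a hypothetical illustration: the function names `boundary_labels` and `decode_boundaries`, the 10 ms frame shift, the 20 ms tolerance window, and the 0.5 threshold.

```python
from typing import List, Sequence


def boundary_labels(
    splice_points: Sequence[float],
    duration: float,
    frame_shift: float = 0.01,   # hypothetical 10 ms hop
    tolerance: float = 0.02,     # hypothetical +/- 20 ms window
) -> List[int]:
    """Return one 0/1 target per frame: 1 if the frame centre lies
    within `tolerance` seconds of any splice point (in seconds)."""
    n_frames = int(round(duration / frame_shift))
    labels = [0] * n_frames
    for t in splice_points:
        for i in range(n_frames):
            centre = (i + 0.5) * frame_shift
            if abs(centre - t) <= tolerance:
                labels[i] = 1
    return labels


def decode_boundaries(
    probs: Sequence[float],
    frame_shift: float = 0.01,
    threshold: float = 0.5,      # hypothetical decision threshold
) -> List[float]:
    """Group consecutive frames with prob >= threshold into runs and
    return the centre time (seconds) of each run as one boundary."""
    boundaries: List[float] = []
    start = None
    # A trailing 0.0 sentinel closes a run that reaches the last frame.
    for i, p in enumerate(list(probs) + [0.0]):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            # Mean of frame centres for frames start..i-1.
            boundaries.append((start + i) / 2 * frame_shift)
            start = None
    return boundaries
```

For example, `boundary_labels([0.05], 0.1)` marks frames 3 to 6 of a 10-frame clip. Feeding those labels back through `decode_boundaries` recovers a single boundary near 0.05 s. Labelling a small window around each splice point, rather than a single frame, is one common way to soften the heavy class imbalance between boundary and non-boundary frames.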