RW-Resnet: A Novel Speech Anti-Spoofing Model Using Raw Waveform

In recent years, synthetic speech generated by advanced text-to-speech (TTS) and voice conversion (VC) systems has posed a serious threat to automatic speaker verification (ASV) systems, motivating the design of synthetic speech detection systems to protect them. In this paper, we propose a new speech anti-spoofing model named ResWavegram-Resnet (RW-Resnet). The model consists of two parts: stacked Conv1D Resblocks and a Resnet34 backbone. A Conv1D Resblock is a Conv1D block with a residual connection. In the first part, the raw waveform is fed to the stacked Conv1D Resblocks to produce a representation called the ResWavegram. Compared with traditional methods, the ResWavegram retains all the information in the audio signal and has a stronger feature extraction ability. In the second part, the extracted features are fed to the Resnet34 backbone, which decides whether the input is spoofed or bona fide. We evaluate RW-Resnet on the ASVspoof2019 logical access (LA) corpus. Experimental results show that RW-Resnet outperforms other state-of-the-art anti-spoofing models, demonstrating its effectiveness in detecting synthetic speech attacks.
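The front end described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no kernel sizes, strides, channel counts, or block count, so every such value below is an assumption. The helper names `conv1d`, `conv1d_resblock`, and `reswavegram` are hypothetical, as are the strided stem convolution and the ReLU after the residual sum. The sketch only shows how a residual connection wraps a Conv1D block and how stacking such blocks turns a raw waveform into a 2-D (channels x frames) feature map.

```python
import numpy as np


def conv1d(x, w, stride=1):
    """'Same'-padded 1-D cross-correlation (the deep-learning 'convolution').

    x: (in_channels, time); w: (out_channels, in_channels, kernel), odd kernel.
    """
    kernel = w.shape[2]
    pad = kernel // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    # windows has shape (in_channels, time, kernel)
    windows = np.lib.stride_tricks.sliding_window_view(xp, kernel, axis=1)
    windows = windows[:, ::stride, :]
    return np.einsum("oik,itk->ot", w, windows)


def conv1d_resblock(x, w1, w2):
    """Conv1D block with a residual (skip) connection.

    Assumed layout: conv -> ReLU -> conv, add the input, then ReLU.
    Input and output channel counts must match so the sum is defined.
    """
    h = np.maximum(conv1d(x, w1), 0.0)
    h = conv1d(h, w2)
    return np.maximum(h + x, 0.0)


def reswavegram(waveform, stem_w, block_weights, stride=4):
    """Raw waveform (time,) -> feature map (channels, frames).

    The strided stem convolution is an assumption used to lift the
    single-channel waveform to several channels before the Resblocks.
    """
    x = conv1d(waveform[None, :], stem_w, stride=stride)
    for w1, w2 in block_weights:
        x = conv1d_resblock(x, w1, w2)
    return x


# Usage with random (untrained) weights; all sizes are illustrative.
rng = np.random.default_rng(0)
waveform = rng.standard_normal(16000)  # e.g. 1 s of audio at 16 kHz
stem_w = 0.1 * rng.standard_normal((8, 1, 11))
block_weights = [
    (0.1 * rng.standard_normal((8, 8, 3)), 0.1 * rng.standard_normal((8, 8, 3)))
    for _ in range(2)
]
features = reswavegram(waveform, stem_w, block_weights)
print(features.shape)  # (8, 4000): 8 channels, 16000 / 4 frames
```

In the model described above, a feature map of this kind would then be passed to the Resnet34 backbone for the spoofed or bona fide decision; that stage is omitted here.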

