Deep generative modeling has the potential to cause significant harm to society. Recognizing this threat, a large body of research into detecting so-called "Deepfakes" has emerged. This research most often focuses on the image domain, while studies exploring generated audio signals have, so far, been neglected. In this paper, we make three key contributions to narrow this gap. First, we provide researchers with an introduction to common signal processing techniques used for analyzing audio signals. Second, we present a novel data set, for which we collected nine sample sets from five different network architectures, spanning two languages. Finally, we supply practitioners with two baseline models, adopted from the signal processing community, to facilitate further research in this area.