This paper proposes a waveform boundary detection system for audio spoofing attacks that contain partially manipulated segments. Partially spoofed/fake audio, in which part of an utterance is replaced with either synthetic or natural audio clips, has recently been reported as one scenario of audio deepfakes. As deepfakes can pose a threat to societal security, detecting such spoofed audio is essential. Accordingly, we address the problem with a deep-learning-based frame-level detection system that detects partially spoofed audio and locates the manipulated segments. Our proposed method is trained and evaluated on data provided by the ADD2022 Challenge. We evaluate the detection model across various acoustic features and network configurations. As a result, our detection system achieves an equal error rate (EER) of 6.58% on the ADD2022 challenge test set, the best performance among partially spoofed audio detection systems that can also locate the manipulated clips.
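For reference, the equal error rate (EER) quoted above is the operating point at which the false-acceptance and false-rejection rates coincide. The minimal sketch below (not code from the paper; the score and label arrays are hypothetical placeholders) illustrates how such a score-level EER is commonly computed:

```python
import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(scores, labels):
    """Approximate the equal error rate: the threshold at which the
    false-positive rate (false acceptance) equals the false-negative
    rate (false rejection)."""
    fpr, tpr, _ = roc_curve(labels, scores, pos_label=1)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))     # closest crossing point
    return (fpr[idx] + fnr[idx]) / 2.0

# Hypothetical usage: higher scores indicate the target (spoofed) class.
scores = np.array([0.9, 0.8, 0.3, 0.6, 0.1, 0.4])
labels = np.array([1,   1,   0,   1,   0,   0])
print(f"EER = {compute_eer(scores, labels):.2%}")
```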